summaryrefslogtreecommitdiff
path: root/spec
diff options
context:
space:
mode:
authorEdouard CHIN <chin.edouard@gmail.com>2025-11-17 00:18:33 +0100
committergit <svn-admin@ruby-lang.org>2025-12-04 06:47:46 +0000
commit932762f29457ad1def6fbab7eca7bcbeeb58ea5c (patch)
tree8f6665e29fddb050365ba73f18373900e8bac7d5 /spec
parent0af85a1fe291856d6b3d2b3bb68e6edb8cb44b4f (diff)
[ruby/rubygems] Increase connection pool to allow for up to 70% speed increase:
- ### TL;DR Bundler is heavily limited by the connection pool which manages a single connection. By increasing the number of connection, we can drastiscally speed up the installation process when many gems need to be downloaded and installed. ### Benchmark There are various factors that are hard to control such as compilation time and network speed but after dozens of tests I can consistently get aroud 70% speed increase when downloading and installing 472 gems, most having no native extensions (on purpose). ``` # Before bundle install 28.60s user 12.70s system 179% cpu 23.014 total # After bundle install 30.09s user 15.90s system 281% cpu 16.317 total ``` You can find on this gist how this was benchmarked and the Gemfile used https://gist.github.com/Edouard-chin/c8e39148c0cdf324dae827716fbe24a0 ### Context A while ago in #869, Aaron introduced a connection pool which greatly improved Bundler speed. It was noted in the PR description that managing one connection was already good enough and it wasn't clear whether we needed more connections. Aaron also had the intuition that we may need to increase the pool for downloading gems and he was right. > We need to study how RubyGems uses connections and make a decision > based on request usage (e.g. only use one connection for many small > requests like bundler API, and maybe many connections for > downloading gems) When bundler downloads and installs gem in parallel https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/installer/parallel_installer.rb#L128 most threads have to wait for the only connection in the pool to be available which is not efficient. ### Solution This commit modifies the pool size for the fetcher that Bundler uses. RubyGems fetcher will continue to use a single connection. The bundler fetcher is used in 2 places. 1. When downloading gems https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/source/rubygems.rb#L481-L484 2. When grabing the index (not the compact index) using the `bundle install --full-index` flag. https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/fetcher/index.rb#L9 Having more connections in 2) is not any useful but tweaking the size based on where the fetcher is used is a bit tricky so I opted to modify it at the class level. I fiddle with the pool size and found that 5 seems to be the sweet spot at least for my environment. https://github.com/ruby/rubygems/commit/6063fd9963
Diffstat (limited to 'spec')
-rw-r--r--spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb60
1 files changed, 60 insertions, 0 deletions
diff --git a/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb b/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb
new file mode 100644
index 0000000000..df1a58d843
--- /dev/null
+++ b/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb
@@ -0,0 +1,60 @@
+# frozen_string_literal: true
+
+require "rubygems/remote_fetcher"
+require "bundler/fetcher/gem_remote_fetcher"
+require_relative "../../support/artifice/helpers/artifice"
+require "bundler/vendored_persistent.rb"
+
+RSpec.describe Bundler::Fetcher::GemRemoteFetcher do
+ describe "Parallel download" do
+ it "download using multiple connections from the pool" do
+ unless Bundler.rubygems.provides?(">= 4.0.0.dev")
+ skip "This example can only run when RubyGems supports multiple http connection pool"
+ end
+
+ require_relative "../../support/artifice/helpers/endpoint"
+ concurrent_ruby_path = Dir[scoped_base_system_gem_path.join("gems/concurrent-ruby-*/lib/concurrent-ruby")].first
+ $LOAD_PATH.unshift(concurrent_ruby_path)
+ require "concurrent-ruby"
+
+ require_rack_test
+ responses = []
+
+ latch1 = Concurrent::CountDownLatch.new
+ latch2 = Concurrent::CountDownLatch.new
+ previous_client = Gem::Request::ConnectionPools.client
+ dummy_endpoint = Class.new(Endpoint) do
+ get "/foo" do
+ latch2.count_down
+ latch1.wait
+
+ responses << "foo"
+ end
+
+ get "/bar" do
+ responses << "bar"
+
+ latch1.count_down
+ end
+ end
+
+ Artifice.activate_with(dummy_endpoint)
+ Gem::Request::ConnectionPools.client = Gem::Net::HTTP
+
+ first_request = Thread.new do
+ subject.fetch_path("https://example.org/foo")
+ end
+ second_request = Thread.new do
+ latch2.wait
+ subject.fetch_path("https://example.org/bar")
+ end
+
+ [first_request, second_request].each(&:join)
+
+ expect(responses).to eq(["bar", "foo"])
+ ensure
+ Artifice.deactivate
+ Gem::Request::ConnectionPools.client = previous_client
+ end
+ end
+end