diff options
| author | Edouard CHIN <chin.edouard@gmail.com> | 2025-11-17 00:18:33 +0100 |
|---|---|---|
| committer | git <svn-admin@ruby-lang.org> | 2025-12-04 06:47:46 +0000 |
| commit | 932762f29457ad1def6fbab7eca7bcbeeb58ea5c (patch) | |
| tree | 8f6665e29fddb050365ba73f18373900e8bac7d5 | |
| parent | 0af85a1fe291856d6b3d2b3bb68e6edb8cb44b4f (diff) | |
[ruby/rubygems] Increase connection pool to allow for up to 70% speed increase:
- ### TL;DR
Bundler is heavily limited by the connection pool which manages a
single connection. By increasing the number of connection, we can
drastiscally speed up the installation process when many gems need
to be downloaded and installed.
### Benchmark
There are various factors that are hard to control such as
compilation time and network speed but after dozens of tests I
can consistently get aroud 70% speed increase when downloading and
installing 472 gems, most having no native extensions (on purpose).
```
# Before
bundle install 28.60s user 12.70s system 179% cpu 23.014 total
# After
bundle install 30.09s user 15.90s system 281% cpu 16.317 total
```
You can find on this gist how this was benchmarked and the Gemfile
used https://gist.github.com/Edouard-chin/c8e39148c0cdf324dae827716fbe24a0
### Context
A while ago in #869, Aaron introduced a connection pool which
greatly improved Bundler speed. It was noted in the PR description
that managing one connection was already good enough and it wasn't
clear whether we needed more connections. Aaron also had the
intuition that we may need to increase the pool for downloading
gems and he was right.
> We need to study how RubyGems uses connections and make a decision
> based on request usage (e.g. only use one connection for many small
> requests like bundler API, and maybe many connections for
> downloading gems)
When bundler downloads and installs gem in parallel https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/installer/parallel_installer.rb#L128
most threads have to wait for the only connection in the pool to be
available which is not efficient.
### Solution
This commit modifies the pool size for the fetcher that Bundler
uses. RubyGems fetcher will continue to use a single connection.
The bundler fetcher is used in 2 places.
1. When downloading gems https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/source/rubygems.rb#L481-L484
2. When grabing the index (not the compact index) using the
`bundle install --full-index` flag.
https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/fetcher/index.rb#L9
Having more connections in 2) is not any useful but tweaking the
size based on where the fetcher is used is a bit tricky so I opted
to modify it at the class level.
I fiddle with the pool size and found that 5 seems to be the sweet
spot at least for my environment.
https://github.com/ruby/rubygems/commit/6063fd9963
| -rw-r--r-- | lib/bundler/fetcher/gem_remote_fetcher.rb | 6 | ||||
| -rw-r--r-- | lib/rubygems/remote_fetcher.rb | 3 | ||||
| -rw-r--r-- | lib/rubygems/request/connection_pools.rb | 7 | ||||
| -rw-r--r-- | lib/rubygems/request/http_pool.rb | 15 | ||||
| -rw-r--r-- | spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb | 60 | ||||
| -rw-r--r-- | test/rubygems/test_gem_request_connection_pools.rb | 12 | ||||
| -rw-r--r-- | tool/bundler/test_gems.rb | 1 | ||||
| -rw-r--r-- | tool/bundler/test_gems.rb.lock | 3 |
8 files changed, 99 insertions, 8 deletions
diff --git a/lib/bundler/fetcher/gem_remote_fetcher.rb b/lib/bundler/fetcher/gem_remote_fetcher.rb index 3fc7b68263..3c3c1826a1 100644 --- a/lib/bundler/fetcher/gem_remote_fetcher.rb +++ b/lib/bundler/fetcher/gem_remote_fetcher.rb @@ -5,6 +5,12 @@ require "rubygems/remote_fetcher" module Bundler class Fetcher class GemRemoteFetcher < Gem::RemoteFetcher + def initialize(*) + super + + @pool_size = 5 + end + def request(*args) super do |req| req.delete("User-Agent") if headers["User-Agent"] diff --git a/lib/rubygems/remote_fetcher.rb b/lib/rubygems/remote_fetcher.rb index 01788a6a5f..805f7aaf82 100644 --- a/lib/rubygems/remote_fetcher.rb +++ b/lib/rubygems/remote_fetcher.rb @@ -82,6 +82,7 @@ class Gem::RemoteFetcher @proxy = proxy @pools = {} @pool_lock = Thread::Mutex.new + @pool_size = 1 @cert_files = Gem::Request.get_cert_files @headers = headers @@ -338,7 +339,7 @@ class Gem::RemoteFetcher def pools_for(proxy) @pool_lock.synchronize do - @pools[proxy] ||= Gem::Request::ConnectionPools.new proxy, @cert_files + @pools[proxy] ||= Gem::Request::ConnectionPools.new proxy, @cert_files, @pool_size end end end diff --git a/lib/rubygems/request/connection_pools.rb b/lib/rubygems/request/connection_pools.rb index 6c1b04ab65..01e7e0629a 100644 --- a/lib/rubygems/request/connection_pools.rb +++ b/lib/rubygems/request/connection_pools.rb @@ -7,11 +7,12 @@ class Gem::Request::ConnectionPools # :nodoc: attr_accessor :client end - def initialize(proxy_uri, cert_files) + def initialize(proxy_uri, cert_files, pool_size = 1) @proxy_uri = proxy_uri @cert_files = cert_files @pools = {} @pool_mutex = Thread::Mutex.new + @pool_size = pool_size end def pool_for(uri) @@ -20,9 +21,9 @@ class Gem::Request::ConnectionPools # :nodoc: @pool_mutex.synchronize do @pools[key] ||= if https? uri - Gem::Request::HTTPSPool.new(http_args, @cert_files, @proxy_uri) + Gem::Request::HTTPSPool.new(http_args, @cert_files, @proxy_uri, @pool_size) else - Gem::Request::HTTPPool.new(http_args, @cert_files, @proxy_uri) + Gem::Request::HTTPPool.new(http_args, @cert_files, @proxy_uri, @pool_size) end end end diff --git a/lib/rubygems/request/http_pool.rb b/lib/rubygems/request/http_pool.rb index 52543de41f..468502ca6b 100644 --- a/lib/rubygems/request/http_pool.rb +++ b/lib/rubygems/request/http_pool.rb @@ -9,12 +9,14 @@ class Gem::Request::HTTPPool # :nodoc: attr_reader :cert_files, :proxy_uri - def initialize(http_args, cert_files, proxy_uri) + def initialize(http_args, cert_files, proxy_uri, pool_size) @http_args = http_args @cert_files = cert_files @proxy_uri = proxy_uri - @queue = Thread::SizedQueue.new 1 - @queue << nil + @pool_size = pool_size + + @queue = Thread::SizedQueue.new @pool_size + setup_queue end def checkout @@ -31,7 +33,8 @@ class Gem::Request::HTTPPool # :nodoc: connection.finish end end - @queue.push(nil) + + setup_queue end private @@ -44,4 +47,8 @@ class Gem::Request::HTTPPool # :nodoc: connection.start connection end + + def setup_queue + @pool_size.times { @queue.push(nil) } + end end diff --git a/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb b/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb new file mode 100644 index 0000000000..df1a58d843 --- /dev/null +++ b/spec/bundler/bundler/fetcher/gem_remote_fetcher_spec.rb @@ -0,0 +1,60 @@ +# frozen_string_literal: true + +require "rubygems/remote_fetcher" +require "bundler/fetcher/gem_remote_fetcher" +require_relative "../../support/artifice/helpers/artifice" +require "bundler/vendored_persistent.rb" + +RSpec.describe Bundler::Fetcher::GemRemoteFetcher do + describe "Parallel download" do + it "download using multiple connections from the pool" do + unless Bundler.rubygems.provides?(">= 4.0.0.dev") + skip "This example can only run when RubyGems supports multiple http connection pool" + end + + require_relative "../../support/artifice/helpers/endpoint" + concurrent_ruby_path = Dir[scoped_base_system_gem_path.join("gems/concurrent-ruby-*/lib/concurrent-ruby")].first + $LOAD_PATH.unshift(concurrent_ruby_path) + require "concurrent-ruby" + + require_rack_test + responses = [] + + latch1 = Concurrent::CountDownLatch.new + latch2 = Concurrent::CountDownLatch.new + previous_client = Gem::Request::ConnectionPools.client + dummy_endpoint = Class.new(Endpoint) do + get "/foo" do + latch2.count_down + latch1.wait + + responses << "foo" + end + + get "/bar" do + responses << "bar" + + latch1.count_down + end + end + + Artifice.activate_with(dummy_endpoint) + Gem::Request::ConnectionPools.client = Gem::Net::HTTP + + first_request = Thread.new do + subject.fetch_path("https://example.org/foo") + end + second_request = Thread.new do + latch2.wait + subject.fetch_path("https://example.org/bar") + end + + [first_request, second_request].each(&:join) + + expect(responses).to eq(["bar", "foo"]) + ensure + Artifice.deactivate + Gem::Request::ConnectionPools.client = previous_client + end + end +end diff --git a/test/rubygems/test_gem_request_connection_pools.rb b/test/rubygems/test_gem_request_connection_pools.rb index 966447bff6..2860deabf7 100644 --- a/test/rubygems/test_gem_request_connection_pools.rb +++ b/test/rubygems/test_gem_request_connection_pools.rb @@ -148,4 +148,16 @@ class TestGemRequestConnectionPool < Gem::TestCase end end.join end + + def test_checkouts_multiple_connections_from_the_pool + uri = Gem::URI.parse("http://example/some_endpoint") + pools = Gem::Request::ConnectionPools.new nil, [], 2 + pool = pools.pool_for uri + + pool.checkout + + Thread.new do + assert_not_nil(pool.checkout) + end.join + end end diff --git a/tool/bundler/test_gems.rb b/tool/bundler/test_gems.rb index 9776a96b90..ddc19e2939 100644 --- a/tool/bundler/test_gems.rb +++ b/tool/bundler/test_gems.rb @@ -11,6 +11,7 @@ gem "builder", "~> 3.2" gem "rb_sys" gem "fiddle" gem "rubygems-generate_index", "~> 1.1" +gem "concurrent-ruby" gem "psych" gem "etc", platforms: [:ruby, :windows] gem "open3" diff --git a/tool/bundler/test_gems.rb.lock b/tool/bundler/test_gems.rb.lock index bc103af860..b11a960725 100644 --- a/tool/bundler/test_gems.rb.lock +++ b/tool/bundler/test_gems.rb.lock @@ -4,6 +4,7 @@ GEM base64 (0.3.0) builder (3.3.0) compact_index (0.15.0) + concurrent-ruby (1.3.5) date (3.5.0) date (3.5.0-java) etc (1.4.6) @@ -59,6 +60,7 @@ PLATFORMS DEPENDENCIES builder (~> 3.2) compact_index (~> 0.15.0) + concurrent-ruby etc fiddle open3 @@ -75,6 +77,7 @@ CHECKSUMS base64 (0.3.0) sha256=27337aeabad6ffae05c265c450490628ef3ebd4b67be58257393227588f5a97b builder (3.3.0) sha256=497918d2f9dca528fdca4b88d84e4ef4387256d984b8154e9d5d3fe5a9c8835f compact_index (0.15.0) sha256=5c6c404afca8928a7d9f4dde9524f6e1610db17e675330803055db282da84a8b + concurrent-ruby (1.3.5) sha256=813b3e37aca6df2a21a3b9f1d497f8cbab24a2b94cab325bffe65ee0f6cbebc6 date (3.5.0) sha256=5e74fd6c04b0e65d97ad4f3bb5cb2d8efb37f386cc848f46310b4593ffc46ee5 date (3.5.0-java) sha256=d6876651299185b935e1b834a353e3a1d1db054be478967e8104e30a9a8f1127 etc (1.4.6) sha256=0f7e9e7842ea5e3c3bd9bc81746ebb8c65ea29e4c42a93520a0d638129c7de01 |
