From 519a4bdbc19693be9020419eb4dea9e66a256a41 Mon Sep 17 00:00:00 2001
From: Jean Boussier
Date: Tue, 20 Jan 2026 17:59:15 +0100
Subject: Optimize File.basename

The actual algorithm is largely unchanged, just allowed to use
single-byte checks for common encodings.

It could certainly be optimized much further, as here again it often
scans from the front of the string when we're interested in the back
of it.

But the algorithm has many Windows-only corner cases, so I'd rather ship
a good improvement now and eventually come back to it later.

Most of the improvement here is from the reduced setup cost (avoid
double null checks, avoid duping the argument, etc), and from skipping
the multi-byte checks.

```
compare-ruby: ruby 4.1.0dev (2026-01-19T03:51:30Z master 631bf19b37) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-21T08:21:05Z opt-basename 7eb11745b2) +PRISM [arm64-darwin25]
```

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|long       |      3.412M|   18.158M|
|           |           -|     5.32x|
|long_name  |      1.981M|    8.580M|
|           |           -|     4.33x|
|withext    |      3.200M|   12.986M|
|           |           -|     4.06x|
---
 spec/ruby/core/file/basename_spec.rb | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

(limited to 'spec/ruby')

diff --git a/spec/ruby/core/file/basename_spec.rb b/spec/ruby/core/file/basename_spec.rb
index 989409d76b..87695ab97b 100644
--- a/spec/ruby/core/file/basename_spec.rb
+++ b/spec/ruby/core/file/basename_spec.rb
@@ -151,8 +151,34 @@ describe "File.basename" do
       File.basename("c:\\bar.txt", ".*").should == "bar"
       File.basename("c:\\bar.txt.exe", ".*").should == "bar.txt"
     end
+
+    it "handles Shift JIS 0x5C (\\) as second byte of a multi-byte sequence" do
+      # dir\fileソname.txt
+      path = "dir\\file\x83\x5cname.txt".b.force_encoding(Encoding::SHIFT_JIS)
+      path.valid_encoding?.should be_true
+      File.basename(path).should == "file\x83\x5cname.txt".b.force_encoding(Encoding::SHIFT_JIS)
+    end
   end
 
+  it "rejects strings encoded with non ASCII-compatible encodings" do
+    Encoding.list.reject(&:ascii_compatible?).reject(&:dummy?).each do |enc|
+      begin
+        path = "/foo/bar".encode(enc)
+      rescue Encoding::ConverterNotFoundError
+        next
+      end
+
+      -> {
+        File.basename(path)
+      }.should raise_error(Encoding::CompatibilityError)
+    end
+  end
+
+  it "works with all ASCII-compatible encodings" do
+    Encoding.list.select(&:ascii_compatible?).each do |enc|
+      File.basename("/foo/bar".encode(enc)).should == "bar".encode(enc)
+    end
+  end
 
   it "returns the extension for a multibyte filename" do
     File.basename('/path/Офис.m4a').should == "Офис.m4a"
-- 
cgit v1.2.3
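
Note on the Shift JIS spec in this patch: the sketch below is an illustration only, not code from the commit. It shows why a plain byte scan for `\` (0x5C) is safe for UTF-8 but not for Shift JIS, where 0x5C can be the trailing byte of a two-byte character. The helper name `backslash_report` is hypothetical and exists only for this example.

```ruby
# Hypothetical helper (not part of the patch): compares how many times
# byte 0x5C occurs in the raw bytes with how many times it occurs as a
# standalone single-byte character in the string's own encoding.
def backslash_report(str)
  {
    byte_hits: str.bytes.count(0x5C),
    char_hits: str.each_char.count { |c| c.bytesize == 1 && c.ord == 0x5C },
  }
end

# Same path as in the spec: "dir\file" + KATAKANA SO + "name.txt".
# In Shift JIS, KATAKANA SO is encoded as 0x83 0x5C.
sjis = "dir\\file\x83\x5cname.txt".b.force_encoding(Encoding::SHIFT_JIS)

# The same text in UTF-8, where KATAKANA SO (U+30BD) is 0xE3 0x82 0xBD.
utf8 = "dir\\file\u30BDname.txt"

sjis_report = backslash_report(sjis)
utf8_report = backslash_report(utf8)

# Shift JIS: the byte scan sees two 0x5C bytes, but only one is a real "\".
puts "Shift_JIS bytes=#{sjis_report[:byte_hits]} chars=#{sjis_report[:char_hits]}"
# UTF-8: byte scan and character scan agree, so single-byte checks are safe.
puts "UTF-8     bytes=#{utf8_report[:byte_hits]} chars=#{utf8_report[:char_hits]}"
```

In UTF-8 every byte of a multi-byte character has its high bit set, so an ASCII separator can never appear inside one. That property is presumably what lets the common encodings take the single-byte path described in the commit message, while encodings such as Shift JIS still need the multi-byte checks.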