source: Remove unnecessary string length comparisons in the case of string comparisons (#116)

## Why

https://github.com/ruby/rexml/blob/370666e314816b57ecd5878e757224c3b6bc93f5/lib/rexml/source.rb#L208-L234

Because `@line_break = encode(">")`, each chunk appended by `@scanner << readline`
ends in one of the following ways.

1. ">"
2. "X>"
3. "X" (eof)
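The three endings listed above can be reproduced with a small sketch. It assumes that splitting a `StringIO` on `">"` is a fair stand-in for `readline` with `@line_break = encode(">")`; the input string is hypothetical and not taken from REXML.

```ruby
require "stringio"

# Hypothetical input chosen to hit all three endings; not from the commit.
io = StringIO.new("<a>>text</a>tail")

# Splitting on ">" stands in for readline with @line_break = encode(">").
chunks = io.each_line(">").to_a

# Print each chunk so the ending of every read is visible.
chunks.each { |chunk| p chunk }
```

This prints `"<a>"`, `">"`, `"text</a>"` and `"tail"`: two chunks of the form "X>", one bare ">", and a final "X" at eof.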

In the following cases, a match that failed on the current buffer will not succeed after additional reads.

- `@source.match("<?")`
- `@source.match("--")`
- `@source.match("DOCTYPE")`

In the following cases, additional reads could produce a match. Prohibiting
such patterns explicitly in a comment makes the string length check
unnecessary.

- `@source.match(">>")`
- `@source.match(">X")`
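The reasoning above can be sketched with a simplified stand-in for the patched loop. `match_sketch` and its `chunks` argument are hypothetical names for illustration: `chunks` plays the role of successive `readline` results, and the real `match` method differs in details such as the `cons` flag.

```ruby
require "strscan"

# Simplified, hypothetical stand-in for the patched match loop.
# `chunks` simulates the strings that successive readline calls would return.
def match_sketch(chunks, pattern)
  scanner = StringScanner.new(+"")
  pending = chunks.dup
  scanner << pending.shift unless pending.empty?
  loop do
    md = scanner.check(pattern)
    return md if md
    # Patched behaviour: a String pattern is never retried after a read.
    return nil if pattern.is_a?(String)
    # Regexp patterns keep reading until the simulated input is exhausted.
    return nil if pending.empty?
    scanner << pending.shift
  end
end

p match_sketch(["<a>", "text</a>"], "<?")  # no match, no additional read
p match_sketch(["<a>", "text</a>"], "<a>") # matches on the first chunk
p match_sketch([">", ">"], ">>")           # prohibited form: gives up early
p match_sketch([">", ">"], />>/)           # Regexp still triggers a read
```

The third call shows why patterns such as `">>"` have to be ruled out: the data would match after one more read, but the string branch returns `nil` first.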

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.689      10.736        18.484       18.108 i/s -     100.000 times in 9.355754s 9.314792s 5.409984s 5.522527s
                 sax     30.793      31.583        52.965       52.641 i/s -     100.000 times in 3.247486s 3.166258s 1.888036s 1.899660s
                pull     36.308      37.182        63.773       64.669 i/s -     100.000 times in 2.754203s 2.689440s 1.568069s 1.546325s
              stream     34.936      35.991        56.830       57.729 i/s -     100.000 times in 2.862361s 2.778467s 1.759632s 1.732238s

Comparison:
                              dom
        before(YJIT):        18.5 i/s
         after(YJIT):        18.1 i/s - 1.02x  slower
               after:        10.7 i/s - 1.72x  slower
              before:        10.7 i/s - 1.73x  slower

                              sax
        before(YJIT):        53.0 i/s
         after(YJIT):        52.6 i/s - 1.01x  slower
               after:        31.6 i/s - 1.68x  slower
              before:        30.8 i/s - 1.72x  slower

                             pull
         after(YJIT):        64.7 i/s
        before(YJIT):        63.8 i/s - 1.01x  slower
               after:        37.2 i/s - 1.74x  slower
              before:        36.3 i/s - 1.78x  slower

                           stream
         after(YJIT):        57.7 i/s
        before(YJIT):        56.8 i/s - 1.02x  slower
               after:        36.0 i/s - 1.60x  slower
              before:        34.9 i/s - 1.65x  slower
```

- YJIT=ON : 0.98x - 1.02x faster
- YJIT=OFF : 1.00x - 1.03x faster
naitoh authored Mar 3, 2024
1 parent 370666e commit 19975fe
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion lib/rexml/source.rb
@@ -161,6 +161,9 @@ def read
       end
     end

+    # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
+    # - ">"
+    # - "XXX>" (X is any string excluding '>')
     def match( pattern, cons=false )
       read if @scanner.eos? && @source
       while true
@@ -170,7 +173,7 @@ def match( pattern, cons=false )
           md = @scanner.check(pattern)
         end
         break if md
-        return nil if pattern.is_a?(String) && pattern.bytesize <= @scanner.rest_size
+        return nil if pattern.is_a?(String)
         return nil if @source.nil?
         return nil unless read
       end
