Description
Previous discussion: https://bugs.ruby-lang.org/issues/20394
Context
When trying to write pure Ruby gems that are competitive in terms of performance with C extensions, a very common bottleneck is the parsing of text-based protocols and formats, such as the Redis RESP protocol, or even the PDF format (FYI @gettalong).
As a result, currently the most efficient way to parse integers in a string in Ruby is to reimplement atoi using String#getbyte, which is a bit ridiculous.
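For illustration, here is a minimal sketch of what such a hand-rolled atoi tends to look like. The helper name `parse_uint` and its return shape are hypothetical, and it only handles unsigned decimal digits:

```ruby
# Hypothetical helper: parse an unsigned decimal integer starting at byte
# offset `pos`, reading bytes directly so no substring is allocated.
# Returns [value, next_pos], or nil if there is no digit at `pos`.
def parse_uint(str, pos = 0)
  start = pos
  value = 0
  # 48..57 are the ASCII codes for '0'..'9'; getbyte returns nil past the end.
  while (byte = str.getbyte(pos)) && byte >= 48 && byte <= 57
    value = value * 10 + (byte - 48)
    pos += 1
  end
  return nil if pos == start

  [value, pos]
end
```

The loop works, but every gem that needs it has to carry its own copy, and it runs as interpreted Ruby rather than native code.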
Otherwise, if you create a substring with String#slice or StringScanner#scan and then call to_i or Integer, instantiating the substring and copying the bytes really tanks the performance.
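As a rough sketch of the allocating pattern being described, assuming a RESP-style integer reply as the input (the sample payload is only an illustration):

```ruby
require "strscan"

scanner = StringScanner.new(":42\r\n")
scanner.skip(/:/)            # consume the type prefix, no String returned
digits = scanner.scan(/\d+/) # allocates a new String holding the digits
length = digits.to_i         # the temporary String is discarded right after
```

The intermediate `digits` string exists only to be converted and thrown away, which is the cost the proposal aims to avoid.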
Proposal
Given that StringScanner is a default gem, is often involved in string parsing, and already acts as a "pointer into a String", I think it's well positioned to offer an efficient way to parse an Integer without instantiating a useless temporary string.
Basically an optimized way to do scanner.scan(/\d+/).to_i.
The API could be any of:
scanner.scan(/\d+/, :to_i)
scanner.scan(/\d+/, Integer)
scanner.scan_integer(/\d+/)
Logically the two supported types would be Integer and Float, but perhaps others would be helpful for other protocols?
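To make the intended semantics concrete, here is a pure-Ruby stand-in for the Integer case. It is only a sketch: `scan_integer` is written here as a hypothetical free-standing method rather than the final API, it accepts unsigned decimal digits only, and a real implementation would presumably live in the C extension:

```ruby
require "strscan"

# Hypothetical emulation: read digits at the scanner's current byte position,
# advance the scanner past them, and return the Integer. Returns nil and
# leaves the position untouched when no digit is found, mirroring how
# StringScanner#scan behaves on a failed match.
def scan_integer(scanner)
  str = scanner.string
  pos = scanner.pos
  start = pos
  value = 0
  # 48..57 are the ASCII codes for '0'..'9'.
  while (byte = str.getbyte(pos)) && byte >= 48 && byte <= 57
    value = value * 10 + (byte - 48)
    pos += 1
  end
  return nil if pos == start

  scanner.pos = pos
  value
end
```

Called on `StringScanner.new("123 abc")`, this returns the Integer and moves the scanner past the digits without ever building a temporary String.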
@kou as maintainer of strscan, do you have any opinion? I'm happy to put in the work on this, but I'd need to know if the feature is desired, and which API would be deemed acceptable.
Also cc @tenderlove @mame from previous discussions.