Provide an efficient way to parse Integers (and Floats)? #113

Closed
@casperisfine


Previous discussion: https://bugs.ruby-lang.org/issues/20394

Context

When trying to write pure Ruby gems that are competitive in terms of performance with C extensions, a very common bottleneck is the parsing of text-based protocols and formats, such as the Redis RESP protocol, or even the PDF format (FYI @gettalong).

As a result, the most efficient way to parse integers in a string in Ruby is currently to reimplement atoi using String#getbyte, which is a bit ridiculous.
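For illustration, a minimal sketch of what such a hand-rolled atoi can look like. The helper name `parse_uint` and its return shape are assumptions made for this example, and it only handles unsigned decimal digits:

```ruby
# Hypothetical helper (not part of any library): parse an unsigned decimal
# integer starting at byte offset `pos`, without allocating a substring.
# Returns [value, next_pos], or nil if there is no digit at `pos`.
def parse_uint(str, pos = 0)
  value = 0
  start = pos
  # getbyte returns nil past the end of the string, which ends the loop.
  while (byte = str.getbyte(pos)) && byte >= 48 && byte <= 57 # "0".."9"
    value = value * 10 + (byte - 48)
    pos += 1
  end
  pos == start ? nil : [value, pos]
end

# Parse the digits that follow the "$" type marker of a RESP-style line.
value, next_pos = parse_uint("$1234\r\n", 1)
```

The loop reads one byte at a time and never creates an intermediate String for the digits, which is where the speedup comes from.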

Otherwise, if you create a substring with String#slice or StringScanner#scan and then call to_i or Integer, instantiating the substring and copying the bytes really tanks performance.
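For comparison, the two substring-based approaches look roughly like this. The input line is made up for the example; both variants produce the right value but allocate a temporary String along the way:

```ruby
require "strscan"

line = "$1234\r\n"

# String#slice allocates a new 4-byte String just to hand it to to_i.
from_slice = line.slice(1, 4).to_i

# StringScanner#scan also returns a freshly allocated String for the match.
scanner = StringScanner.new(line)
scanner.skip(/\$/)
from_scan = scanner.scan(/\d+/).to_i
```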

Proposal

Given that StringScanner is a default gem, is often involved in string parsing, and already acts as a "pointer into a String", I think it's well positioned to offer an efficient way to parse an Integer without instantiating a useless temporary string.

Basically an optimized way to do scanner.scan(/\d+/).to_i.

The API could be any of:

  • scanner.scan(/\d+/, :to_i)
  • scanner.scan(/\d+/, Integer)
  • scanner.scan_integer(/\d+/)
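To make the intended behavior concrete, here is a rough pure-Ruby stand-in for what any of these API shapes would do. The name `scan_integer_sketch` is hypothetical, the pattern is hard-coded to unsigned decimal digits, and a real implementation would presumably live in the strscan C extension rather than in Ruby:

```ruby
require "strscan"

# Hypothetical stand-in for the proposed feature: read decimal digits at the
# scanner's current byte position, advance the scanner past them, and return
# the Integer without building a temporary String. Returns nil (and leaves
# the scanner untouched) when there is no digit at the current position.
def scan_integer_sketch(scanner)
  str = scanner.string
  pos = scanner.pos
  start = pos
  value = 0
  while (byte = str.getbyte(pos)) && byte >= 48 && byte <= 57 # "0".."9"
    value = value * 10 + (byte - 48)
    pos += 1
  end
  return nil if pos == start

  scanner.pos = pos
  value
end

scanner = StringScanner.new("42,7")
first = scan_integer_sketch(scanner)
scanner.skip(/,/)
second = scan_integer_sketch(scanner)
```

From the caller's point of view this behaves like scanner.scan(/\d+/).to_i, minus the intermediate String.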

Logically the two supported types would be Integer and Float, but perhaps others would be helpful for other protocols?

@kou, as maintainer of strscan, do you have any opinion? I'm happy to put in the work on this, but I'd need to know if the feature is desired, and which API would be deemed acceptable.

Also cc @tenderlove @mame from previous discussions.
