[Compatibility] Add String#byteindex and String#byterindex #3043

itarato · 2023-05-10T12:51:13Z

Source: #3039

String#byteindex and String#byterindex have been added. [Feature #13110]
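For orientation, here is a minimal sketch of what the two methods are expected to return, based on the Ruby 3.2 behaviour described for CRuby. The helpers `byteindex_sketch` and `byterindex_sketch` are hypothetical names used only for illustration. They search the binary view of the strings and skip the character-boundary checks the real methods perform, so they are an approximation and not the implementation in this PR.

```ruby
# Hypothetical helpers, for illustration only: approximate String#byteindex and
# String#byterindex by searching the byte-wise (binary) view of both strings.
# Unlike the real methods, they do not check character boundaries.
def byteindex_sketch(haystack, needle, offset = 0)
  haystack.b.index(needle.b, offset)
end

def byterindex_sketch(haystack, needle, offset = haystack.bytesize)
  haystack.b.rindex(needle.b, offset)
end

s = "\u00e9t\u00e9" # "e-acute, t, e-acute": 3 characters, 5 bytes in UTF-8

p s.index("t")                     # => 1 (character index)
p byteindex_sketch(s, "t")         # => 2 (byte index)
p byterindex_sketch(s, "\u00e9")   # => 3 (byte index of the last match)

# On Ruby 3.2+ the built-in methods can be compared against the sketch.
if s.respond_to?(:byteindex)
  p s.byteindex("t") == byteindex_sketch(s, "t")
  p s.byterindex("\u00e9") == byterindex_sketch(s, "\u00e9")
end
```

The `respond_to?` guard only keeps the snippet runnable on Ruby versions older than 3.2.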

src/main/ruby/truffleruby/core/string.rb

spec/ruby/core/string/byteindex_spec.rb

src/main/ruby/truffleruby/core/string.rb

andrykonchin

Thank you!

spec/ruby/core/string/byterindex_spec.rb

src/main/ruby/truffleruby/core/string.rb

andrykonchin · 2023-05-19T14:14:56Z

It would also be useful to run these Ruby 3.2 specs on CI. To do so, the spec files need to be listed in the spec/truffleruby.next-specs file (both here and in #3044).

CHANGELOG.md

itarato · 2023-05-19T14:52:15Z

Done

spec/ruby/core/string/shared/byte_index_ops.rb

eregon · 2023-05-25T11:39:10Z

src/main/ruby/truffleruby/core/string.rb

+    finish_adjusted = Primitive.byte_index_to_character_index(self, finish)
+    finish_adjusted += str.size
+    finish_adjusted = size if finish_adjusted > size
+    finish = Primitive.character_index_to_byte_index(self, finish_adjusted)


Can't this be something like finish += str.bytesize?
I don't understand why this is necessary or what it does, though.

itarato · 2023-05-25

I guess I could have made the comment above clearer. Let's say the call is:

("x" * 10).byterindex("xxx", 5)

Ruby will start the lookup by matching the pattern at index 5:

xxxxxxxxxx
     xxx

StringByteReverseIndexNode, on the other hand, treats the index as the (exclusive) end of the match, so the pattern is matched at index 2:

xxxxxxxxxx
  xxx

What makes this offset adjustment non-trivial is the difference in encoding. Since the characters of str might have a different byte length than those of the receiver, finish += str.bytesize can fall on a non-codepoint boundary. Conceptually, we need to advance by str.size characters, expressed in bytes. Is there a better way to do this?
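The concern can be illustrated with a small plain-Ruby sketch. It does not use the TruffleRuby-internal `Primitive.*` calls from the snippet under review. The helpers `byte_to_char_index`, `char_to_byte_index` and `on_char_boundary?` are hypothetical stand-ins, and the sample strings are made up for illustration. It assumes a validly encoded UTF-8 receiver.

```ruby
# Hypothetical plain-Ruby stand-ins for the two conversions in the snippet.
# Both assume the given index falls on a character boundary.
def byte_to_char_index(str, byte_index)
  str.byteslice(0, byte_index).size
end

def char_to_byte_index(str, char_index)
  str[0, char_index].bytesize
end

# A byte position is a character boundary if the prefix up to it is valid.
# Assumes `str` itself is validly encoded.
def on_char_boundary?(str, byte_index)
  str.byteslice(0, byte_index).valid_encoding?
end

haystack = "ab\u00e9cd" # bytes: a=0, b=1, e-acute=2..3, c=4, d=5
needle   = "xy"         # 2 characters, 2 bytes
finish   = 1            # byte offset passed by the caller

# Byte-based adjustment: may land inside a multi-byte character.
naive = finish + needle.bytesize

# Character-based adjustment, mirroring the four lines under review.
chars = byte_to_char_index(haystack, finish) + needle.size
chars = haystack.size if chars > haystack.size
adjusted = char_to_byte_index(haystack, chars)

p [naive, on_char_boundary?(haystack, naive)]       # => [3, false]
p [adjusted, on_char_boundary?(haystack, adjusted)] # => [4, true]
```

Whether landing mid-character actually matters depends on how the underlying search treats such an offset, which is the question raised in the replies.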

eregon · 2023-05-26

> Ruby will start the lookup matching the pattern on index 5:

These docs are pretty confusing: https://docs.ruby-lang.org/en/master/String.html#method-i-byterindex

> Integer argument offset, if given and non-negative, specifies the maximum starting byte-based position in the string to end the search:

But indeed:

("x" * 10).byterindex("xxx", 10) # => 7
("x" * 10).byterindex("xxx", 9)  # => 7
("x" * 10).byterindex("xxx", 8)  # => 7
("x" * 10).byterindex("xxx", 7)  # => 7
("x" * 10).byterindex("xxx", 6)  # => 6
("x" * 10).byterindex("xxx", 5)  # => 5
("x" * 10).byterindex("xxx", 4)  # => 4

So conceptually it's like a maximum limit to the returned value.

In my understanding, adding finish = Primitive.min(finish + str.bytesize, self.bytesize) is correct then.
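As a rough check of that arithmetic, the sketch below models a reverse search whose limit is the exclusive end of the match (the behaviour this thread attributes to StringByteReverseIndexNode) and applies the proposed clamp on top of it. Both method names are hypothetical, the search works on bytes only, and it is built on `String#rindex` purely for illustration. It is not the TruffleRuby implementation.

```ruby
# Hypothetical stand-in for a reverse search where `finish` is the exclusive
# END of the match: the match must lie entirely within bytes [0, finish).
def reverse_index_ending_at_or_before(haystack, needle, finish)
  start = finish - needle.bytesize
  return nil if start < 0
  haystack.b.rindex(needle.b, start)
end

# byterindex expressed on top of the end-exclusive search, using the
# adjustment proposed above: min(offset + needle.bytesize, haystack.bytesize).
def byterindex_sketch(haystack, needle, offset)
  finish = [offset + needle.bytesize, haystack.bytesize].min
  reverse_index_ending_at_or_before(haystack, needle, finish)
end

s = "x" * 10

# Without the adjustment, a limit of 5 yields the match at index 2.
p reverse_index_ending_at_or_before(s, "xxx", 5) # => 2

# With the adjustment, the results follow the table quoted above.
10.downto(4) do |offset|
  puts "#{offset} => #{byterindex_sketch(s, 'xxx', offset)}"
end
```

The clamp to `haystack.bytesize` is what makes offsets 7 through 10 all return 7.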

> can fall on a non codepoint boundary.

Why does it matter? I think if TruffleString can handle it we can just ignore that.
If it can't we could keep increasing finish by 1 until that position is a character head.
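The fallback mentioned here could look roughly like the following plain-Ruby sketch. `next_char_head` is a hypothetical helper, not a TruffleRuby or TruffleString API, and it assumes the string is validly encoded. It tests each candidate position by checking whether the prefix up to it is valid, which is simple but linear in the prefix length for every step.

```ruby
# Hypothetical helper: advance a byte position until it sits on a character
# head (or reaches the end of the string). Assumes `str` is validly encoded,
# so a prefix is valid exactly when it ends on a character boundary.
def next_char_head(str, pos)
  pos += 1 until pos >= str.bytesize || str.byteslice(0, pos).valid_encoding?
  pos
end

s = "ab\u00e9cd" # bytes: a=0, b=1, e-acute=2..3, c=4, d=5

p next_char_head(s, 2) # => 2 (already on a character head)
p next_char_head(s, 3) # => 4 (3 is inside the two-byte character)
p next_char_head(s, 6) # => 6 (end of string)
```

As the reply below notes, this turned out not to be needed in the PR.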

itarato · 2023-05-26

> Why does it matter? I think if TruffleString can handle it we can just ignore that.

I had assumed it matters. If we don't care, and the code doesn't either (tested, it doesn't), then I guess it's fine. Fixed.

src/main/ruby/truffleruby/core/truffle/string_operations.rb

src/main/java/org/truffleruby/core/string/StringNodes.java

src/main/ruby/truffleruby/core/string.rb

itarato self-assigned this May 10, 2023

oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) May 10, 2023

itarato added the shopify label May 10, 2023

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from 657d2a8 to 7f21400 May 10, 2023 14:06

itarato marked this pull request as ready for review May 12, 2023 12:40

andrykonchin reviewed May 18, 2023

andrykonchin approved these changes May 18, 2023

andrykonchin reviewed May 19, 2023

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 7f21400 to 61b5b60 May 19, 2023 14:43

itarato requested a review from andrykonchin May 19, 2023 14:47

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from ccb2732 to 2cfaa3a May 19, 2023 14:51

andrykonchin mentioned this pull request May 19, 2023

[Compatibility] Add String#bytesplice #3044

Merged

andrykonchin approved these changes May 19, 2023

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 2cfaa3a to 4ff1b34 May 23, 2023 14:43

andrykonchin approved these changes May 23, 2023

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 4ff1b34 to 63e2718 May 23, 2023 14:54

andrykonchin added the in-ci label (The PR is being tested in CI. Do not push new commits.) May 23, 2023

eregon requested changes May 25, 2023

andrykonchin removed the in-ci label (The PR is being tested in CI. Do not push new commits.) May 25, 2023

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from de5d8a7 to 7eb9724 May 26, 2023 13:22

eregon reviewed May 26, 2023

Add String#byteindex

45d9756

Add String#byterindex Add tests

itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 7eb9724 to 45d9756 May 26, 2023 20:50

itarato requested review from eregon and andrykonchin May 26, 2023 20:51

eregon approved these changes May 29, 2023

eregon added the in-ci label (The PR is being tested in CI. Do not push new commits.) May 29, 2023

graalvmbot merged commit 188cf00 into oracle:master May 29, 2023
