Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Compatibility] Add String#byteindex and String#byterindex #3043

Merged

Conversation

itarato
Copy link
Collaborator

@itarato itarato commented May 10, 2023

Source: #3039

String#byteindex and String#byterindex have been added. [Feature #13110]

@itarato itarato self-assigned this May 10, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 10, 2023
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from 657d2a8 to 7f21400 Compare May 10, 2023 14:06
@itarato itarato marked this pull request as ready for review May 12, 2023 12:40
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
spec/ruby/core/string/byteindex_spec.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Show resolved Hide resolved
Copy link
Member

@andrykonchin andrykonchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

spec/ruby/core/string/byterindex_spec.rb Outdated Show resolved Hide resolved
spec/ruby/core/string/byterindex_spec.rb Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
@andrykonchin
Copy link
Member

Also it will be useful to run these Ruby 3.2 related specs on CI. To do so the spec files are supposed to be listed in the spec/truffleruby.next-specs file (here and in #3044).

CHANGELOG.md Outdated Show resolved Hide resolved
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 7f21400 to 61b5b60 Compare May 19, 2023 14:43
@itarato itarato requested a review from andrykonchin May 19, 2023 14:47
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from ccb2732 to 2cfaa3a Compare May 19, 2023 14:51
@itarato
Copy link
Collaborator Author

itarato commented May 19, 2023

Also it will be useful to run these Ruby 3.2 related specs on CI. To do so the spec files are supposed to be listed in the spec/truffleruby.next-specs file (here and in #3044).

Done

spec/ruby/core/string/shared/byte_index_ops.rb Outdated Show resolved Hide resolved
spec/ruby/core/string/shared/byte_index_ops.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Show resolved Hide resolved
spec/ruby/core/string/shared/byte_index_ops.rb Outdated Show resolved Hide resolved
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 2cfaa3a to 4ff1b34 Compare May 23, 2023 14:43
CHANGELOG.md Outdated Show resolved Hide resolved
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 4ff1b34 to 63e2718 Compare May 23, 2023 14:54
@andrykonchin andrykonchin added the in-ci The PR is being tested in CI. Do not push new commits. label May 23, 2023
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
src/main/ruby/truffleruby/core/string.rb Outdated Show resolved Hide resolved
Comment on lines 1167 to 1162
finish_adjusted = Primitive.byte_index_to_character_index(self, finish)
finish_adjusted += str.size
finish_adjusted = size if finish_adjusted > size
finish = Primitive.character_index_to_byte_index(self, finish_adjusted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be something like finish += str.bytesize ?
I don't understand why this is necessary or what it does though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I could have made the comment above better. Let's say the call is

("x" * 10).byterindex("xxx", 5)

Ruby will start the lookup matching the pattern on index 5:

xxxxxxxxxx
     xxx

StringByteReverseIndexNode on the other hand is matching the pattern at the end (non inclusive) on index 2:

xxxxxxxxxx
  xxx

What makes this offset adjustment non trivial is the difference in encoding. Since str might have different char length, finish += str.bytesize can fall on a non codepoint boundary. Conceptually we need to adjust str.size characters in bytes. Is there a better way to do this?

Copy link
Member

@eregon eregon May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruby will start the lookup matching the pattern on index 5:

These docs are pretty confusing: https://docs.ruby-lang.org/en/master/String.html#method-i-byterindex

Integer argument offset, if given and non-negative, specifies the maximum starting byte-based position in the string to end the search:

But indeed:

> ("x" * 10).byterindex("xxx", 10)
=> 7
> ("x" * 10).byterindex("xxx", 9)
=> 7
> ("x" * 10).byterindex("xxx", 8)
=> 7
> ("x" * 10).byterindex("xxx", 7)
=> 7
> ("x" * 10).byterindex("xxx", 6)
=> 6
> ("x" * 10).byterindex("xxx", 5)
=> 5
> ("x" * 10).byterindex("xxx", 4)
=> 4

So conceptually it's like a maximum limit to the returned value.

In my understanding, adding finish = Primitive.min(finish + str.bytesize, self.bytesize) is correct then.

can fall on a non codepoint boundary.

Why does it matter? I think if TruffleString can handle it we can just ignore that.
If it can't we could keep increasing finish by 1 until that position is a character head.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it matter? I think if TruffleString can handle it we can just ignore that.

I had the assumption it matters. If we don't care and the code neither (tested, it doesn't), than I guess it's fine. Fixed.

src/main/java/org/truffleruby/core/string/StringNodes.java Outdated Show resolved Hide resolved
@andrykonchin andrykonchin removed the in-ci The PR is being tested in CI. Do not push new commits. label May 25, 2023
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch 2 times, most recently from de5d8a7 to 7eb9724 Compare May 26, 2023 13:22
Add String#byterindex
Add tests
@itarato itarato force-pushed the feature/PA-3039-string-byteindex-byterindex branch from 7eb9724 to 45d9756 Compare May 26, 2023 20:50
@itarato itarato requested review from eregon and andrykonchin May 26, 2023 20:51
@eregon eregon added the in-ci The PR is being tested in CI. Do not push new commits. label May 29, 2023
@graalvmbot graalvmbot merged commit 188cf00 into oracle:master May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in-ci The PR is being tested in CI. Do not push new commits. OCA Verified All contributors have signed the Oracle Contributor Agreement. shopify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants