Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the capabilities of convert_ruby_to_v8 #128

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Taiki-San
Copy link
Contributor

Extend the capabilities of convert_ruby_to_v8 to convert more types, and to work better when the encoding of strings isn't UTF-8.

This new code will call to_s when it's available in order to avoid simply sending an error message to V8.
Moreover, this PR make the string conversion code more resilient to varying string encodings, UTF-8 sometimes not being used. A fast path is used for the most common (UTF-8, ASCII, Latin1) and a slower path perform the actual conversion if more exotic encodings are used.

…and to work better when the encoding of strings isn't UTF-8.
@SamSaffron
Copy link
Collaborator

I am seeing lots of failures in travis, can you have a look? Overall I welcome this change but we need to make sure it works everywhere.

@Taiki-San Taiki-San force-pushed the backport/improve-string-conversion branch from b90b2f9 to 5b0afc8 Compare February 10, 2019 10:04
@Taiki-San
Copy link
Contributor Author

Oh, you're correct, I forgot to include a header. This should be good to go!

@SamSaffron
Copy link
Collaborator

Hmmm the utf checker seems to come from https://github.com/lemire/fastvalidate-utf-8 ...we need to properly attribute this

@SamSaffron
Copy link
Collaborator

Also I wonder if we can just go slower and carry a lot less code by force encoding if we get something that is not UTF8 from Ruby? I am uneasy about carrying all this extra c++ code

@Taiki-San
Copy link
Contributor Author

This change was initially instigated on our side as a performance optimisation. Due to the context in which our code is used, we have to be extremely robust to weird encodings and to avoid any corruption of string being transmitted to V8. This change resulted in substantial performance improvements in our benchmark (10-20% faster end-to-end, the overhead is now quite probably close to nil) over a similar Ruby implementation. Because those constraints are quite unusual, I can see why the maintenance tradeoff wouldn't be worth it for you.

If you prefer it as it was, I can rollback the encoding changes. I believe force_encoding only changes the string metadata and thus is pointless if we're about to collect its bytes into a C++ object.

If you feel comfortable proceeding with our changes, I agree that I initially didn't properly credit the UTF-8 checking code. Besides in a header, where do you think I should update the credit?

@cataphract
Copy link
Contributor

@Taiki-San @SamSaffron I've added the notices required for compliance with the terms of the Apache 2 license, under which the added header is licensed.

As to whether the header is necessary: not really, ruby has the same functionality for validating UTF-8: we can call valid_encoding?. It's just faster this way. The UTF-8 validation is critical in our case and it represents a big chunk of our overhead, so this is a valuable improvement for us.

@SamSaffron
Copy link
Collaborator

@cataphract / @Taiki-San I am open to accepting this on the condition that we aim for this to be temporary, can you open up a change request to ruby (at: https://bugs.ruby-lang.org/issues and PR to the ruby/ruby repo) to expose a "fast" valid_encoding? to c extensions, then longer term in 2.7 we can remove this extra file.

@ahorek
Copy link

ahorek commented Dec 23, 2019

too late for Ruby 2.7, but Shopify/ruby#2 looks promising!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants