Performance of decodeUtf8 on non-ascii text #1
Comments
Do you have any ideas for how it could be optimised? Why can you not reach the efficiency of decodeUtf8 from Data.Text? I think it can be achieved with sheer willpower alone.
It could certainly match the performance of
Is it possible to write something like 'isUtf8' (which we know can be made relatively efficient), and if that function returns true, make a pass over the text? By the way, the willpower thing was a joke.
We have to perform the check for UTF-8 and then decode the characters anyway, so maybe that would be an OK solution.
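For readers following along, the validate-then-decode idea discussed above might look roughly like the sketch below. This is not the code from this project or from Data.Text; the names isUtf8, decodeValid, and decodeUtf8Sketch are hypothetical, and it works on a plain list of bytes for clarity, so it says nothing about real performance. The first pass checks well-formedness (including overlong forms and surrogates), and the second pass decodes without re-checking.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Continuation bytes have the bit pattern 10xxxxxx.
isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Pass 1 (hypothetical helper): is the input well-formed UTF-8?
isUtf8 :: [Word8] -> Bool
isUtf8 [] = True
isUtf8 (b0 : rest)
  | b0 < 0x80 = isUtf8 rest
  | b0 >= 0xC2 && b0 <= 0xDF = case rest of
      (b1 : r) | isCont b1 -> isUtf8 r
      _ -> False
  | b0 >= 0xE0 && b0 <= 0xEF = case rest of
      (b1 : b2 : r) | isCont b1 && isCont b2 && okE b1 -> isUtf8 r
      _ -> False
  | b0 >= 0xF0 && b0 <= 0xF4 = case rest of
      (b1 : b2 : b3 : r)
        | isCont b1 && isCont b2 && isCont b3 && okF b1 -> isUtf8 r
      _ -> False
  | otherwise = False
  where
    -- Three-byte forms: reject overlong encodings and surrogates.
    okE b1
      | b0 == 0xE0 = b1 >= 0xA0
      | b0 == 0xED = b1 <= 0x9F
      | otherwise = True
    -- Four-byte forms: reject overlong encodings and values above U+10FFFF.
    okF b1
      | b0 == 0xF0 = b1 >= 0x90
      | b0 == 0xF4 = b1 <= 0x8F
      | otherwise = True

-- Pass 2: decode input that has already been validated by isUtf8.
decodeValid :: [Word8] -> String
decodeValid [] = []
decodeValid (b0 : rest)
  | b0 < 0x80 = chr (fromIntegral b0) : decodeValid rest
  | b0 < 0xE0, (b1 : r) <- rest =
      chr (bits 0x1F b0 `shiftL` 6 .|. bits 0x3F b1) : decodeValid r
  | b0 < 0xF0, (b1 : b2 : r) <- rest =
      chr (bits 0x0F b0 `shiftL` 12 .|. bits 0x3F b1 `shiftL` 6 .|. bits 0x3F b2)
        : decodeValid r
  | (b1 : b2 : b3 : r) <- rest =
      chr (bits 0x07 b0 `shiftL` 18 .|. bits 0x3F b1 `shiftL` 12
             .|. bits 0x3F b2 `shiftL` 6 .|. bits 0x3F b3)
        : decodeValid r
  | otherwise = error "decodeValid: input was not validated"
  where
    -- Keep only the payload bits selected by the mask.
    bits mask b = fromIntegral (b .&. mask) :: Int

-- Validate first, then make a second pass to decode.
decodeUtf8Sketch :: [Word8] -> Maybe String
decodeUtf8Sketch bs
  | isUtf8 bs = Just (decodeValid bs)
  | otherwise = Nothing
```

The point of splitting the work this way is that the validation pass can be optimized on its own, while the decoding pass is free of error handling.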
Actually, that's already what it does. It just passes over the
The performance of decodeUtf8 is excellent on ASCII text, since the check can be vectorized to operate on a full machine word at a time, and nearly all branch predictions are correct. However, for non-ASCII text, I haven't put much effort into optimizing it. There's no way it could ever compete with the decoding of ASCII text, but it could probably be much better than it is now.
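To illustrate what checking a full machine word at a time can mean, here is a minimal sketch. It is not this project's implementation; isAsciiWordwise, packWord, and highBits are hypothetical names, and packing bytes with a fold stands in for the single unaligned 64-bit load a real implementation would presumably use. The idea is that a byte is ASCII exactly when its high bit is clear, so eight bytes can be tested with one mask.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import qualified Data.ByteString as B
import Data.Word (Word64)

-- The high bit of every byte in a 64-bit word.
highBits :: Word64
highBits = 0x8080808080808080

-- Pack up to 8 bytes into one Word64 (illustrative stand-in for a raw load).
packWord :: B.ByteString -> Word64
packWord = B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- True when every byte is below 0x80, tested 8 bytes per comparison.
isAsciiWordwise :: B.ByteString -> Bool
isAsciiWordwise bs
  | B.null bs = True
  | otherwise =
      let (chunk, rest) = B.splitAt 8 bs
       in packWord chunk .&. highBits == 0 && isAsciiWordwise rest
```

On pure ASCII input the single comparison per word almost always succeeds, which is consistent with the branch-prediction remark above; a non-ASCII byte forces a fall back to per-character decoding.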