Fast codepointOffset #451
base: master
I'm not sure why older GHCs are unable to infer the types for the tests I've added, since the types should all be trivially known (Text and Char).
Thanks @axman6! I suggest we start with |
Yeah, I've been working on rewriting the C to avoid going via memmem, and removing twoway_memmem would significantly reduce the amount of code to maintain. I would guess there are faster memmem implementations out there, hopefully under permissive licenses too. I'll get the changes working and push those today.
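For illustration only, one way to search for a single encoded codepoint without memmem is to let memchr find the lead byte and then compare the remaining bytes. This is a minimal sketch, not the code in this PR: the name find_utf8_seq is hypothetical, the buffer is assumed to be UTF-8, and the result is a byte offset, which may not match what codepointOffset reports.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): returns the byte offset of the first
 * occurrence of the byte sequence `needle` (e.g. one UTF-8 encoded codepoint,
 * 1 to 4 bytes) in `hay`, or -1 if it does not occur. */
static ptrdiff_t find_utf8_seq(const unsigned char *hay, size_t hay_len,
                               const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;

    const unsigned char *p = hay;
    /* Last position at which a full match could still start. */
    const unsigned char *last = hay + (hay_len - needle_len);

    while (p <= last) {
        /* Let memchr skip ahead to the next candidate lead byte. */
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return -1;
        /* Check the trailing bytes, if any (a length of 0 compares equal). */
        if (memcmp(p + 1, needle + 1, needle_len - 1) == 0)
            return p - hay;
        p++;
    }
    return -1;
}
```

Because a needle is at most four bytes here, the general two-way string search is not needed; the cost is dominated by memchr, which most C libraries already optimise.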
I have a suspicion that Anyways, let's separate concerns. From my perspective the first task is to add |
I'll try to find some time to write a Haskell-only version, and then we can think about making a faster C one later. I wonder if it's worth having both, and only moving to the C call when there's enough data to justify it.
Implements codepointOffset with code from the FreeBSD project.
I'm planning to explore making a vectorised implementation of the search for 2, 3 and 4 char codepoints, but will leave that out in the first iteration.
This may be relevant to #369, by eliminating the need to decode codepoints via Haskell.
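As background for the 2, 3 and 4 char cases mentioned above, here is a rough sketch of how a codepoint maps to its byte sequence, assuming the underlying buffer is UTF-8. The function utf8_encode is a hypothetical name used only for this example and is not part of the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): writes the UTF-8 encoding of `cp`
 * into `out` and returns the number of bytes written (1 to 4), or 0 if `cp`
 * is a surrogate or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding the needle once up front means the search itself only compares raw bytes and never has to decode the haystack.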