Further optimize the compose code #8
Comments
Just random thoughts:
I think we can go a long way even with full decomposition and composition. I wrote
It is pretty expensive even with this null definition of composePair. I guess quick-check-based normalization can be performed as we are writing to an array. It should roughly go like this: we look up the current char in the quick check table; if its value is No or Maybe, or the character is in the composition exclusion table, then we backtrack to the last starter. We already have access to the whole array to which the previous chars have been written, so backtracking in that array should be easy. We will have to generate the quick check properties file from the UCD in the ucd2haskell script. The data is in http://www.unicode.org/Public/UCD/latest/ucd/DerivedNormalizationProps.txt . In addition to the links in the issue description above, the normalization forms doc has a lot of useful information, especially these sections:
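A minimal sketch of the backtracking idea described above, under loud assumptions: the output "array" is modelled as a reversed list, the tables cover only U+0300..U+0304 and three composition pairs, and `nfcQuickCheck`, `combiningClass`, `slowCompose` and `composeQC` are hypothetical names, not the library's API. `composePair` here is a toy stand-in as well. A real slow path would decompose and reorder the segment before composing; this one only tries to combine the starter with the marks that follow it.

```haskell
-- Toy sketch: all tables are tiny hand-written excerpts, not generated data.
data QC = QCYes | QCNo | QCMaybe deriving (Eq, Show)

-- Assumption: only U+0300..U+0304 (NFC_QC=Maybe) are listed; everything
-- else is treated as Yes. Real data would come from
-- DerivedNormalizationProps.txt.
nfcQuickCheck :: Char -> QC
nfcQuickCheck c
  | c >= '\x0300' && c <= '\x0304' = QCMaybe
  | otherwise = QCYes

-- Assumption: the same five marks have class 230, all other chars are starters.
combiningClass :: Char -> Int
combiningClass c
  | c >= '\x0300' && c <= '\x0304' = 230
  | otherwise = 0

-- Toy composition table with three pairs only.
composePair :: Char -> Char -> Maybe Char
composePair 'e' '\x0300' = Just '\x00E8'
composePair 'e' '\x0301' = Just '\x00E9'
composePair 'a' '\x0301' = Just '\x00E1'
composePair _ _ = Nothing

-- Slow path stand-in: combine the head of the segment with the marks after
-- it, keeping marks that do not combine or that are blocked.
slowCompose :: String -> String
slowCompose [] = []
slowCompose (s : marks) = go s [] marks
  where
    go st kept [] = st : reverse kept
    go st kept (m : ms) = case composePair st m of
      Just c | not (blocked kept m) -> go c kept ms
      _ -> go st (m : kept) ms
    -- A mark is blocked by an earlier uncombined mark of equal or higher class.
    blocked kept m = any (\k -> combiningClass k >= combiningClass m) kept

-- Fast path: copy chars whose quick check value is Yes. On No or Maybe,
-- backtrack in the already written output to the last starter and run the
-- slow path on that segment only.
composeQC :: String -> String
composeQC = reverse . go []
  where
    isStarter ch = combiningClass ch == 0
    go out [] = out
    go out (c : cs)
      | nfcQuickCheck c == QCYes = go (c : out) cs
      | otherwise =
          let (marksRev, rest) = break isStarter out -- out is stored reversed
              (starter, before) = splitAt 1 rest
              (moreMarks, cs') = span (not . isStarter) cs
              segment = starter ++ reverse marksRev ++ (c : moreMarks)
           in go (reverse (slowCompose segment) ++ before) cs'
```

With these toy tables, `composeQC "cafe\x0301!"` copies `caf` and `e` on the fast path, backtracks one position when it meets U+0301, and recomposes only the segment `e` + U+0301. Text without any No or Maybe characters never leaves the copying branch, which is the case the quick check properties are meant to speed up.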
Another thing - |
I do not think that |
Decompose code is well optimized but compose still has a lot of scope for optimization. Though its performance is close to `utf8proc` that we were using earlier, it is still far away from `icu` compose performance.

Currently we are not using the quickcheck properties of the Unicode database. We can explore using the quickcheck properties to speed up the case when the string is already in composed or almost composed form. Some related links: