Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customise prefix code generation #2

Open
no-defun-allowed opened this issue Apr 8, 2021 · 2 comments
Open

Customise prefix code generation #2

no-defun-allowed opened this issue Apr 8, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@no-defun-allowed
Copy link
Collaborator

one-more-re-nightmare splits a regular expression into a prefix string and the rest of the RE. The prefix is scanned for more efficiently using Boyer-Moore-Horspool, before entering the DFA body generated for the rest of the RE. (This technique is also used in cl-ppcre and GNU grep to my knowledge.)

However, it is possible that BMH is not ideal for processing prefixes. Modern processors have vector arithmetic units which can perform many element comparisons at once, and it is not too hard to design a substring search which uses vectorised code. When combined with heuristics based on properties known about the vector to search, vectorised searching can be faster than clever serial searching algorithms.

It should be possible to allow the client to generate their own prefix scanning code. The interface to the BMH code generator provides most of the required information to generate code, with the lambda list (start length vector prefix aref-generator fail). The first three arguments are symbols naming internal variables o-m-r-n uses, then the prefix sequence, then the :aref-generator argument to compile-regular-expression, then some code to evaluate when there are no more matches.

@no-defun-allowed no-defun-allowed added the enhancement New feature or request label Apr 8, 2021
@no-defun-allowed
Copy link
Collaborator Author

bbfa2cf provides an object for prefix code generation, but it is "bare" and the heuristics it uses are implemented using inheritance and call-next-method instead of a compiler object making decisions.

@no-defun-allowed
Copy link
Collaborator Author

This has sort of been done now, but there is no public interface yet. I have to think about how to expose everything in the right way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant