rules for \0\0 #57
fm-index/src/suffix_array/sais.rs Lines 397 to 403 in 68c8357
Ah, providing
So why is \0\0 forbidden at the end but not elsewhere?
This restriction originates from the SA-IS algorithm used to build suffix arrays. SA-IS classifies every character in a text as S-type or L-type, and additionally marks an S-type character that directly follows an L-type character as LMS-type. The algorithm expects the given text to end with an LMS-type character, which is usually the end-marker zero. Appending more zeros to the text breaks this assumption.
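To make the LMS argument concrete, here is a minimal sketch of the textbook S/L/LMS classification. It is not the code in `sais.rs`; the function names `classify`, `is_lms`, and `last_is_lms` are hypothetical and exist only for illustration. It assumes the usual convention that the final character is S-type.

```rust
/// Classify each position as S-type (`true`) or L-type (`false`).
/// Illustrative sketch only; not taken from the fm-index crate.
fn classify(text: &[u8]) -> Vec<bool> {
    let n = text.len();
    let mut is_s = vec![false; n];
    if n == 0 {
        return is_s;
    }
    // Convention: the final character is S-type.
    is_s[n - 1] = true;
    // Scan right to left: a position is S-type if its suffix is smaller
    // than the suffix starting one position to the right.
    for i in (0..n - 1).rev() {
        is_s[i] = text[i] < text[i + 1] || (text[i] == text[i + 1] && is_s[i + 1]);
    }
    is_s
}

/// A position is LMS if it is S-type and its left neighbour is L-type.
fn is_lms(is_s: &[bool], i: usize) -> bool {
    i > 0 && is_s[i] && !is_s[i - 1]
}

/// Does the text end with an LMS-type character?
fn last_is_lms(text: &[u8]) -> bool {
    let is_s = classify(text);
    !text.is_empty() && is_lms(&is_s, text.len() - 1)
}
```

Under these assumptions, `last_is_lms(b"ab\0")` and `last_is_lms(b"a\0\0b\0")` both hold, while `last_is_lms(b"ab\0\0")` does not: the second trailing zero has an S-type zero to its left, so it is S-type but not LMS. That is why `\0\0` is harmless in the middle of the text but not at the end.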
So if I get this right:
is allowed. This is one empty string and one string. Yet
is not, you can't have So this means an empty string at the end is forbidden, which seems like an odd exception. I think the rule should be that empty strings are either forbidden everywhere or allowed everywhere.
I've been reading your changes with interest. One thing looks confusing: `test_sais_too_many_trailing_zero` confirms that you cannot have `\0\0` at the end of a string. This made me wonder whether "empty strings" (nothing between `\0\0`) are forbidden. But then I found that `test_sais_too_many_trailing_zero` explicitly allows this! What is the rule?