Allow transition from end states to self with EOS as the only token allowed #606
@rlouf Simple crash fix
LGTM, makes sense, haven't smoke tested. Would prefer a test case which fails before this change and passes after though.
A simpler regex to reproduce: In this case state The regression test will verify the resulting
5bc837f to c972ba0
Added a regression test case, which fails before the fix: Apparently

Please note that it is much cleaner if we use visibly different token IDs from the state numbers, like token IDs above 100 in the case of my test. This makes the values in

With the fix applied

    # We make sure that it is possible to generate strings in the language
    # of the regular expression with the tokens present in the model's
    # vocabulary.
    if not any(
        regex_fsm.finals.intersection(v.values())
        for v in states_to_token_maps.values()
    ):
        raise ValueError(
            "The vocabulary does not allow us to build a sequence that matches the input regex"
        )

I think such tests should not remain in the production code, but go into test cases. Also, the
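To illustrate the suggestion of moving this check out of production code, here is a minimal sketch of the same condition wrapped in a standalone helper that a test could call. The helper name `vocabulary_reaches_final` and the toy maps are assumptions made up for illustration; they are not part of the library.

```python
from typing import Dict, Set


def vocabulary_reaches_final(
    finals: Set[int], states_to_token_maps: Dict[int, Dict[int, int]]
) -> bool:
    """Return True if at least one transition lands in a final state.

    Hypothetical helper: it mirrors the inline check quoted above, with
    the FSM's final states passed in explicitly.
    """
    return any(
        finals.intersection(token_map.values())
        for token_map in states_to_token_maps.values()
    )


# Made-up data: token IDs are kept above 100 so they are easy to tell
# apart from state numbers.
toy_map = {0: {101: 1}, 1: {102: 2}}

reachable = vocabulary_reaches_final({2}, toy_map)    # state 2 is a target
unreachable = vocabulary_reaches_final({7}, toy_map)  # state 7 is never a target
```

A test could then assert on the helper's return value instead of relying on a `ValueError` raised at construction time.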
It reproduces the case where state 5 is missing from the generated `fsm.states_to_token_maps`.
This test case fails now, which is expected until the fix is applied.
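The shape of such a regression check can be sketched on toy data. This is only an illustration under assumptions: the real test presumably builds the FSM from a regex and a tokenizer, whereas the maps, token IDs, and the helper `states_without_entry` below are invented for the example.

```python
from typing import Dict, Set


def states_without_entry(
    states_to_token_maps: Dict[int, Dict[int, int]]
) -> Set[int]:
    """Return states that are transition targets but have no entry of their own."""
    targets = {
        next_state
        for token_map in states_to_token_maps.values()
        for next_state in token_map.values()
    }
    return targets - set(states_to_token_maps)


# Made-up maps. Token IDs above 100 keep them visibly distinct from states.
before_fix = {0: {101: 5}}               # state 5 is reachable but has no entry
after_fix = {0: {101: 5}, 5: {103: 5}}   # 103 stands in for the EOS token ID

missing_before = states_without_entry(before_fix)
missing_after = states_without_entry(after_fix)
```

With these toy inputs, `missing_before` contains state 5 and `missing_after` is empty, which is the fails-before, passes-after behaviour the reviewer asked for.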
c972ba0 to d284bdc
# Allow transitions to EOS from all terminals FSM states that are
# reachable
# TODO: Do we really need this anymore?
# Allow transitions to EOS from all terminals FSM states that are reachable
This should be covered by this function. We should understand why it isn't in your case.
I can get on board with this, but if I understand well we should be able to remove the `if state in self.final_states: return [eos_token_id]` in `RegexFSM`?
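The idea in this comment can be sketched as follows: once every final state carries an EOS self-transition in the map itself, a plain lookup returns the EOS token without a special case. The functions `add_eos_self_loops` and `allowed_token_ids`, the constant `EOS_TOKEN_ID`, and the toy map are hypothetical and only illustrate the idea; they are not the actual `RegexFSM` code.

```python
from typing import Dict, List, Set

EOS_TOKEN_ID = 103  # hypothetical value, chosen to differ from state numbers


def add_eos_self_loops(
    states_to_token_maps: Dict[int, Dict[int, int]],
    finals: Set[int],
    eos_token_id: int,
) -> Dict[int, Dict[int, int]]:
    """Return a copy of the map in which every final state can emit EOS and stay put."""
    result = {state: dict(tokens) for state, tokens in states_to_token_maps.items()}
    for state in finals:
        # Existing transitions out of a final state are kept; EOS is added.
        result.setdefault(state, {})[eos_token_id] = state
    return result


def allowed_token_ids(
    states_to_token_maps: Dict[int, Dict[int, int]], state: int
) -> List[int]:
    """Plain lookup with no special case for final states."""
    return list(states_to_token_maps.get(state, {}))


toy_map = {0: {101: 5}}  # made-up map: final state 5 has no entry yet
with_eos = add_eos_self_loops(toy_map, {5}, EOS_TOKEN_ID)
```

In this sketch, a final state that had no outgoing transitions ends up with EOS as its only allowed token, and the input map is left unmodified.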
I think the core issue is that a recent change removed the actual FSM final states from
Should be fixed by #734. Closing for now.
Fixes #605