-
Notifications
You must be signed in to change notification settings - Fork 116
Add gate to disable control token decoding #856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add gate to disable control token decoding #856
Conversation
llama-cpp-rs original usage required ommiting control tokens from the consumer of the library. This should not be the default though so now this behavior can be selectively be enabled through an environment variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. See comment.
given that the `special` function argument is used to toggle if the cpp bindings to llama.cpp render special tokens to the output the flag can also be reused to feature gate the exclusion of `token_bos` and `token_eos` from the output.
|
I have now changed the implementation to reused the I have read up on the llama.cpp docs of the relevant function: https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.llama_cpp.llama_token_to_piece. Still this leaves me with a few questions. In its current form the condition now looks like this: if attrs.is_empty()
|| attrs
.intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
|| attrs.contains(LlamaTokenAttr::Control)
&& (token == self.token_bos() || token == self.token_eos()) && special == Special::PlaintextGiven that I mean the same result should have been achievable by just the condition if attrs.is_empty()
|| attrs
.intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)and just using Also given that if attrs.is_empty()
|| attrs
.intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
|| attrs.contains(LlamaTokenAttr::Control) && special == Special::PlaintextAll of this feels a bit off to me, and I since I don't know enough about how this folds out in any downstream code that relies on this behavior, it is hard to reason about. Hence me asking a lot of questions 🤣 Maybe a better approach would be to add a new variant to the enum like this: pub enum Special {
/// Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
Tokenize,
/// Exclude `bos` and `eos` token from decoding but keep all other special tokens as is
ExcludeBosAndEos
/// Treat special and/or control tokens as plaintext.
Plaintext,
}Thinking about it this seems like a cleaner solution, so I will add a commit which implements it. If you think this is to complicated I can just remove it from the PR again ;) |
to keep all ranges of behavior possible a special variant to the `Special` enum was introduced (ExcludeBosAndEos). It allows decoding of tokens but excludes `bos` and `eos` tokens from the stream. This can be used to keep the old behavior of llama-cpp-rs for decoding streams while allowing the expected behavior that all tokens will be decoded by default. See utilityai#856 for the discussion about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should deprecate the current function (as well as all of our functions that call it), create a new versions that call token_to_peice without any special logic. (no messing around with attrs, as I imagine this is pretty application specific)
This aligns better with the goal of being a safe wrapper and once the deprecated functions are removed there is less code to maintain.
Special should remain an enum of two variants.
This PR implements the required changes to address #826
For easy opt in to the original behavior I added a check to an environment variable. This would make the behavior controllable
at runtime. I can easily change this to be a compile time opt in by using a rust feature flag. Let me know what you think …
Kind regards
ju6ge