
Conversation


@ju6ge ju6ge commented Nov 3, 2025

This PR implements the required changes to address #826

To make opting in to the original behavior easy, I added a check against an environment variable, which makes the behavior controllable at runtime. I could easily change this to a compile-time opt-in using a Rust feature flag. Let me know what you think …
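
A minimal sketch of that runtime check; the variable name here is made up for illustration, not the one in the PR:

    // opt back in to the original behavior (control tokens omitted);
    // the environment variable name is hypothetical
    fn omit_control_tokens() -> bool {
        std::env::var("LLAMA_CPP_OMIT_CONTROL_TOKENS")
            .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
            .unwrap_or(false)
    }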

Kind regards
ju6ge

llama-cpp-rs's original usage required omitting control tokens from the
consumer of the library. This should not be the default, though, so this
behavior can now be selectively enabled through an environment variable
Contributor

@MarcusDunn MarcusDunn left a comment


Thanks for the PR. See comment.

Given that the `special` function argument is used to toggle whether the
cpp bindings to llama.cpp render special tokens to the output, the flag
can also be reused to feature-gate the exclusion of `token_bos` and
`token_eos` from the output.
Author

ju6ge commented Nov 4, 2025

I have now changed the implementation to reuse the `special` parameter, which is already present in the function signature.

I have read up on the docs for the relevant llama.cpp function: https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.llama_cpp.llama_token_to_piece.

Still, this leaves me with a few questions. In its current form, the condition now looks like this:

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
            || attrs.contains(LlamaTokenAttr::Control)
                && (token == self.token_bos() || token == self.token_eos())
                && special == Special::Plaintext

Given that `special` is converted to a boolean that indicates to llama.cpp whether it will decode special tokens at all, I am again wondering why the original condition was there in the first place.
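
For reference, the conversion I mean is essentially this (a sketch; where exactly it happens in the code is an assumption):

    // assumed mapping from `Special` to the boolean handed to llama.cpp's
    // token-to-piece binding (`true` = render special tokens into text)
    let render_special = matches!(special, Special::Tokenize);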

I mean, the same result should have been achievable with just the condition

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)

together with just using `special == Special::Plaintext`, right? There are more special tokens than `bos` and `eos`, though, so what's the reasoning for excluding just those two?

Also, given that `bos` and `eos` are special tokens, if the explicit condition is to be kept, it should be reducible to

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
            || attrs.contains(LlamaTokenAttr::Control) && special == Special::Plaintext

All of this feels a bit off to me, and since I don't know enough about how this plays out in downstream code that relies on this behavior, it is hard to reason about. Hence my asking a lot of questions 🤣

Maybe a better approach would be to add a new variant to the enum like this:

pub enum Special {
    /// Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
    Tokenize,
    /// Exclude the `bos` and `eos` tokens from decoding but keep all other special tokens as-is.
    ExcludeBosAndEos,
    /// Treat special and/or control tokens as plaintext.
    Plaintext,
}

Thinking about it, this seems like the cleaner solution, so I will add a commit which implements it. If you think this is too complicated, I can just remove it from the PR again ;)
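
Concretely, the new variant would make the exclusion explicit in the decode path; roughly like this, following the snippets above (the surrounding function is assumed):

        // drop bos/eos from the decoded stream only when explicitly requested
        if special == Special::ExcludeBosAndEos
            && (token == self.token_bos() || token == self.token_eos())
        {
            return Ok(String::new());
        }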

To keep the full range of behavior available, a new variant
(`ExcludeBosAndEos`) was introduced to the `Special` enum. It allows
decoding of tokens but excludes the `bos` and `eos` tokens from the
stream. This can be used to keep the old llama-cpp-rs behavior for
decoding streams, while the expected behavior, that all tokens are
decoded, becomes the default.

See utilityai#856 for the discussion about this.
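
Hypothetical usage, assuming a decode helper that takes `Special` as in the snippets above:

    // old stream behavior, now opt-in: bos/eos are dropped from the output
    let piece = model.token_to_str(token, Special::ExcludeBosAndEos)?;
    // new default: every token is decoded, including bos/eos
    let piece = model.token_to_str(token, Special::Tokenize)?;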
Contributor

@MarcusDunn MarcusDunn left a comment


I think we should deprecate the current function (as well as all of our functions that call it) and create new versions that call `token_to_piece` without any special logic (no messing around with attrs, as I imagine this is pretty application-specific).

This aligns better with the goal of being a safe wrapper, and once the deprecated functions are removed there is less code to maintain.

`Special` should remain an enum of two variants.
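
For illustration, a sketch of that direction; every name and signature below is an assumption, not the crate's actual API:

    #[deprecated(note = "applies attrs-based special-casing; use `token_to_piece` instead")]
    pub fn token_to_str(&self, token: LlamaToken, special: Special) -> Result<String, TokenToStringError> {
        // unchanged old behavior, kept until the deprecation cycle ends
        self.token_to_str_with_attrs_logic(token, special)
    }

    /// New function: a thin wrapper that hands the token straight to
    /// llama.cpp's `llama_token_to_piece` with no attrs-based filtering.
    pub fn token_to_piece(&self, token: LlamaToken, special: Special) -> Result<String, TokenToStringError> {
        // `token_to_piece_raw` stands in for the direct binding call
        self.token_to_piece_raw(token, matches!(special, Special::Tokenize))
    }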

