
Conversation


@ju6ge ju6ge commented Nov 3, 2025

This PR implements the required changes to address #826

To make opting in to the original behavior easy, I added a check against an environment variable, which makes the behavior controllable at runtime. I could easily change this to a compile-time opt-in using a Rust feature flag. Let me know what you think …
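
A minimal sketch of that runtime check; the variable name here is made up for illustration, not the one in the PR:

    // opt back in to the original behavior (control tokens omitted);
    // the environment variable name is hypothetical
    fn omit_control_tokens() -> bool {
        std::env::var("LLAMA_CPP_OMIT_CONTROL_TOKENS")
            .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
            .unwrap_or(false)
    }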

Kind regards
ju6ge

llama-cpp-rs's original usage required omitting control tokens from the
consumer of the library. This should not be the default, though, so this
behavior can now be selectively enabled through an environment variable
Contributor

@MarcusDunn MarcusDunn left a comment


Thanks for the PR. See comment.

Given that the `special` function argument is used to toggle whether the
cpp bindings to llama.cpp render special tokens to the output, the flag
can also be reused to feature-gate the exclusion of `token_bos` and
`token_eos` from the output.
Author

ju6ge commented Nov 4, 2025

I have now changed the implementation to reuse the `special` parameter, which is already present in the function signature.

I have read up on the docs for the relevant llama.cpp function: https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.llama_cpp.llama_token_to_piece.

Still, this leaves me with a few questions. In its current form, the condition now looks like this:

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
            || attrs.contains(LlamaTokenAttr::Control)
                && (token == self.token_bos() || token == self.token_eos())
                && special == Special::Plaintext

Given that `special` is converted to a boolean that indicates to llama.cpp whether it will decode special tokens at all, I am again wondering why the original condition was there in the first place.
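
For reference, the conversion I mean is essentially this (a sketch; where exactly it happens in the code is an assumption):

    // assumed mapping from `Special` to the boolean handed to llama.cpp's
    // token-to-piece binding (`true` = render special tokens into text)
    let render_special = matches!(special, Special::Tokenize);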

I mean, the same result should have been achievable with just the condition

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)

together with just using `special == Special::Plaintext`, right? There are more special tokens than `bos` and `eos`, though, so what's the reasoning for excluding just those two?

Also, given that `bos` and `eos` are special tokens, if the explicit condition is to be kept, it should be reducible to

        if attrs.is_empty()
            || attrs
                .intersects(LlamaTokenAttr::Unknown | LlamaTokenAttr::Byte | LlamaTokenAttr::Unused)
            || attrs.contains(LlamaTokenAttr::Control) && special == Special::Plaintext

All of this feels a bit off to me, and since I don't know enough about how this plays out in downstream code that relies on this behavior, it is hard to reason about. Hence my asking a lot of questions 🤣

Maybe a better approach would be to add a new variant to the enum like this:

pub enum Special {
    /// Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
    Tokenize,
    /// Exclude the `bos` and `eos` tokens from decoding but keep all other special tokens as-is.
    ExcludeBosAndEos,
    /// Treat special and/or control tokens as plaintext.
    Plaintext,
}

Thinking about it, this seems like the cleaner solution, so I will add a commit which implements it. If you think this is too complicated, I can just remove it from the PR again ;)
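
Concretely, the new variant would make the exclusion explicit in the decode path; roughly like this, following the snippets above (the surrounding function is assumed):

        // drop bos/eos from the decoded stream only when explicitly requested
        if special == Special::ExcludeBosAndEos
            && (token == self.token_bos() || token == self.token_eos())
        {
            return Ok(String::new());
        }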

To keep the full range of behavior available, a new variant
(`ExcludeBosAndEos`) was introduced to the `Special` enum. It allows
decoding of tokens but excludes the `bos` and `eos` tokens from the
stream. This can be used to keep the old llama-cpp-rs behavior for
decoding streams, while the expected behavior, that all tokens are
decoded, becomes the default.

See utilityai#856 for the discussion about this.
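
Hypothetical usage, assuming a decode helper that takes `Special` as in the snippets above:

    // old stream behavior, now opt-in: bos/eos are dropped from the output
    let piece = model.token_to_str(token, Special::ExcludeBosAndEos)?;
    // new default: every token is decoded, including bos/eos
    let piece = model.token_to_str(token, Special::Tokenize)?;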
Contributor

@MarcusDunn MarcusDunn left a comment


I think we should deprecate the current function (as well as all of our functions that call it) and create new versions that call `token_to_piece` without any special logic (no messing around with attrs, as I imagine this is pretty application-specific).

This aligns better with the goal of being a safe wrapper, and once the deprecated functions are removed there is less code to maintain.

`Special` should remain an enum of two variants.
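
For illustration, a sketch of that direction; every name and signature below is an assumption, not the crate's actual API:

    #[deprecated(note = "applies attrs-based special-casing; use `token_to_piece` instead")]
    pub fn token_to_str(&self, token: LlamaToken, special: Special) -> Result<String, TokenToStringError> {
        // unchanged old behavior, kept until the deprecation cycle ends
        self.token_to_str_with_attrs_logic(token, special)
    }

    /// New function: a thin wrapper that hands the token straight to
    /// llama.cpp's `llama_token_to_piece` with no attrs-based filtering.
    pub fn token_to_piece(&self, token: LlamaToken, special: Special) -> Result<String, TokenToStringError> {
        // `token_to_piece_raw` stands in for the direct binding call
        self.token_to_piece_raw(token, matches!(special, Special::Tokenize))
    }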

