Fix logprobs when multiple tokens are returned at once. #141

Open · zewt wants to merge 2 commits into main from logprobs
Conversation

@zewt (Contributor) commented Jun 24, 2024

This fixes a few issues with logprobs:

  • Return the tokens that were actually selected in logprobs.tokens and logprobs.token_logprobs (currently they use the first entry in top_logprobs).
  • Handle output text that contains multiple tokens, emitting an entry for each token in the logprobs arrays.
  • Fix top_logprobs flattening multiple tokens. Currently, if an entry covers two tokens, top_logprobs contains the top N candidates from both in a single flattened array. Return each token's candidates in its own element instead (see the sketch after this list).
  • Include an entry for each token in text_offset.
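To make the intended shape concrete, here's a minimal Python sketch of the per-token grouping. The chunk fields used here (tokens, token_logprobs, top_tokens) are placeholders for whatever the generator actually provides, not the project's real API:

# Sketch only: `chunk` and its fields are hypothetical stand-ins,
# not exllamav2's or the server's real generator output.
def build_completion_logprobs(chunk, start_offset: int = 0) -> dict:
    logprobs = {"text_offset": [], "token_logprobs": [], "tokens": [], "top_logprobs": []}
    offset = start_offset
    for token, token_logprob, top_candidates in zip(
        chunk.tokens, chunk.token_logprobs, chunk.top_tokens
    ):
        logprobs["tokens"].append(token)                       # the token actually sampled
        logprobs["token_logprobs"].append(token_logprob)       # its logprob, not top_logprobs[0]
        logprobs["top_logprobs"].append(dict(top_candidates))  # one dict per token, not flattened
        logprobs["text_offset"].append(offset)
        offset += len(token)  # simplification; see the open questions about text_offset below
    return logprobs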

Here's an example of the current output. To reproduce this more easily, I set "Helloxxx" as a stop string, which causes "Hello" + "!" to be returned together by exllamav2:

"choices": [{
    "text": "Hello!",
    "logprobs": {
        "text_offset": [6],
        "token_logprobs": [-0.31962889432907104],
        "tokens": ["Hi"],
        "top_logprobs": [{
            "Hi": -0.31962889432907104,
            "Hello": -1.3464146852493286,
            "HI": -5.0294508934021,
            "Hey": -5.989271640777588,
            "_HI": -7.116503715515137,
            "!": -1.1828906536102295,
            "there": -7.935122966766357,
            " There": -7.935122966766357,
            " THERE": -8.649408340454102,
        }]
    }
}]

Note that "tokens" is "Hi", even though the actual text is "Hello!", and the logprobs for the two are lumped together. With this update:

"choices": [{
    "text": "Hello!"
    "logprobs": {
        "text_offset": [0, 1],
        "token_logprobs": [-1.3473599425440113, -1.253579154961466],
        "tokens": ["Hello", "!"],
        "top_logprobs": [{
            "Hi": -0.3383535146713257,
            "Hello": -1.2981750965118408,
            "HI": -4.947728633880615,
            "Hey": -6.052639484405518,
        }, {
            " there": -0.37861719727516174,
            "!": -1.159867286682129,
            "there": -7.878617286682129,
            " There": -7.92326021194458,
        }]
    }
}]

On the chat completion side, with a similar output where "Hello" + "!" are returned together:

"choices": [{
    "message": {
        "role": "assistant",
        "content": "Hello! It's nice to meet you."
    },
    "logprobs": {
        "content": [{
            "token": "Hello",
            "logprob": -0.0008049269672483206,
            "top_logprobs": []
        }, {
            "token": " It",
            "logprob": -0.16903051733970642,
            "top_logprobs": [
                { "token": "Hello", "logprob": -0.0008049269672483206, "top_logprobs": null },
                { "token": "Hi", "logprob": -8.594554901123047, "top_logprobs": null }
            ]
        }]
    }
}]

The tokens are mismatched: the "!" token is missing and the top_logprobs are off by one. This now returns:

"choices": [{
    "message": {
        "role": "assistant",
        "content": "Hello! It's nice to meet you."
    },
    "logprobs": {
        "content": [{
            "token": "Hello",
            "logprob": -0.0008823590930566984,
            "top_logprobs": [
                { "token": "Hello", "logprob": -0.0008823591051623225 },
                { "token": "Hi", "logprob": -8.235257148742676 },
            ]
        }, {
            "token": "!",
            "logprob": -0.0008548574740537445,
            "top_logprobs": [
                { "token": "!", "logprob": -0.0008548574987798929 },
                { "token": " there", "logprob": -7.094604969024658 }
            ]
        }]
    }
}]

A couple of things still need to be figured out:

  • I'm not sure whether text_offset is supposed to be the offset into the text string (that's close to what it was doing before, so I went with it for now) or the offset into the full context. I can't find OAI docs on this, but from some API snippets I've seen it might be the latter. (It's simple to derive from the other data, as sketched after this list, so maybe nobody is actually using this field right now.)

  • Results are odd when token healing is enabled, since the regenerated initial token is included in the list. For example, if the context was "https://" and token healing backs up by three characters and generates "://www", the API currently returns that whole underlying token (with a text_offset of -3, since the token starts three characters before the start of the output). From the client's perspective, though, all the model actually generated was "www". The token healing overlap should probably be trimmed from the output, so that concatenating the "token" in each entry always gives the same result as "text". I'll return to this after discussion.

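For reference, a client that needs text_offset today could derive it along these lines; this is just a sketch and assumes offsets are relative to the response text, which is exactly the open question above:

def derive_text_offsets(tokens: list[str], start: int = 0) -> list[int]:
    # Cumulative character offsets of each token, relative to the response
    # text (an assumption; I can't find OAI docs for this field).
    offsets = []
    position = start
    for token in tokens:
        offsets.append(position)
        position += len(token)
    return offsets

# e.g. derive_text_offsets(["Hello", "!"]) == [0, 5]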
@zewt force-pushed the logprobs branch 3 times, most recently from e55d3c7 to eefb572 on June 25, 2024 at 00:04
This also brings the two chat/completions code paths back into alignment.
@zewt (Contributor, Author) commented Jul 18, 2024

Token healing is tricky. I think the behavior I described above is correct (if the tokens in logprobs are "http://" and "https://", but "http" is part of the token healing overlap and wasn't actually output, the API should strip it from logprobs too and return "://" and "s://"). But doing this needs more information from exllamav2. I tried implementing it by guessing from the lengths of the tokens, but that's not correct in general (for example, it's wrong if skip_special_tokens is false).
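As a rough sketch of the trimming I have in mind, assuming the API layer somehow knew the healed prefix string (which is the piece exllamav2 would need to expose), a hypothetical helper could look like:

def trim_healed_prefix(token: str, healed_prefix: str) -> str:
    # Drop the part of the first generated token that was already present in
    # the prompt, so logprobs only cover text the model actually produced.
    # Purely illustrative; guessing the prefix from token lengths is exactly
    # what doesn't work in general.
    if healed_prefix and token.startswith(healed_prefix):
        return token[len(healed_prefix):]
    return token

# trim_healed_prefix("https://", "http") == "s://"
# trim_healed_prefix("http://", "http")  == "://"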

text_offset has a similar problem: it advances by the length of the token, but that's wrong in several cases (skip_special_tokens false, token healing, perhaps others). The same extra information from exllamav2 might help here too, for example having exllamav2 compute the offset into the text, since it has the missing information to do this correctly. I'm also still not sure whether text_offset is meant to be measured from the start of the response or of the context, since I can't find OAI docs for it.

I think these are separate issues and should be explored separately from this patch.

@zewt marked this pull request as ready for review on July 18, 2024 at 07:54
@zewt (Contributor, Author) commented Sep 25, 2024

Here's a simple repro:

curl 'http://10.0.0.7:5000/v1/completions' \
  -H 'Authorization: Bearer xxx' \
  -H 'Content-Type: application/json' \
  --data-raw '{"logprobs":2,"max_context_length":2048,"max_tokens":4,"token_healing":true,"prompt":"<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nSay hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", "stop": "Helloxxx"}'

This is with turboderp/Llama-3.1-8B-Instruct-exl2_5.0bpw, but I think any Llama version will repro it. The "Helloxxx" stop sequence just makes it easier to repro, since it triggers buffering in exllamav2 when the model says "Hello". (Attachment: response.txt)

This shows the issues from the original example:

  • top_logprobs shows ["Hello", " Hello", "!", " there"] in a single entry: the results for two separate tokens, "Hello" and "!", have been merged.
  • The "tokens" array is missing "!".
  • It also shows the token healing problem: the first entry in tokens and top_logprobs shows token healing regenerating "\n\n", which I think should be invisible to the API (this part isn't fixed by this PR).

One thing this doesn't show is that "tokens" always uses the first entry from top_logprobs instead of the token that was actually chosen. To see that, set temperature to 2 and change the prompt to just "Hello!". The tokens array will be completely different from the actual results: the temperature causes different tokens to be chosen, but the tokens array doesn't reflect them.
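For anyone testing this, here's a quick client-side sanity check of the invariant this PR aims for (setting aside the token healing caveat above). It assumes a parsed choice dict from /v1/completions:

def check_logprobs_consistency(choice: dict) -> None:
    logprobs = choice["logprobs"]
    # After this change (and outside the token healing case), concatenating
    # the tokens should reproduce the returned text, and the parallel arrays
    # should all have one entry per token.
    assert "".join(logprobs["tokens"]) == choice["text"]
    assert len(logprobs["tokens"]) == len(logprobs["token_logprobs"])
    assert len(logprobs["tokens"]) == len(logprobs["top_logprobs"])
    assert len(logprobs["tokens"]) == len(logprobs["text_offset"])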
