Weird issue with context length #220
What is the sequence length set to in the model config? Maybe something weird is happening if you haven't changed it from the default (2048), and the generator ends up trying to produce a negative number of tokens.
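(As an aside, a minimal sketch of the failure mode being described; the numbers are assumptions for illustration, not values taken from the project.)

```python
# If the prompt alone exceeds the configured sequence length, the remaining
# token budget for generation goes negative.
max_seq_len = 2048     # default sequence length mentioned above
prompt_tokens = 3100   # e.g. a long Llama-2 prompt written for a 4096 context

max_new_tokens = max_seq_len - prompt_tokens
print(max_new_tokens)  # -1052: a negative number of tokens to generate
```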
Thanks for the reply.
Here is the model config file; I got the model from Llama-2-70B-chat-gptq.
Is there more of this error message?
It looks like it's been cut off. Also, the line number is weird. Has something else been modified in
I got a similar error. It seems to come from putting too many tokens into the model; I was feeding it about 5k words.
I'm using the_bloke/vicuna-13B-v1.5-16K-GPTQ, which is supposed to be a 16k-context model, so it should be able to handle that. At any rate, these are the relevant portions of the config.json.
What I found worked was changing the parameters on lines 82-87 in model.py.
Previously, these were 2048, 2048, 4096, and 1.0 respectively. This worked and seems to give reasonable results, but I'm not sure it's the correct way to go about it.
@w013nad Where do you define those changes? In the source code or generator model settings?
I am using the same model but getting the following error: `RuntimeError: start (2048) + length (1265) exceeds dimension size (2048)`.
@w013nad You wouldn't need to hard-code new values into the config class. You can just override the values after creating the config. Also, it looks like that config file is incorrect: "max_sequence_length" and "max_position_embeddings" should mean the same thing, or at least I don't know how to interpret those values if they're different.
@Rajmehta123 The
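To make that concrete, here is a minimal sketch of overriding the context settings after creating the config rather than editing model.py, following the style of exllama's example scripts; the model directory and the 16384 / 4.0 values are illustrative assumptions for a 16k Vicuna model, not settings confirmed in this thread:

```python
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Illustrative path; point this at the downloaded GPTQ model directory.
model_directory = "/models/vicuna-13B-v1.5-16K-GPTQ/"
tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)   # read base values from config.json
config.model_path = model_path

# Override the context settings here instead of hard-coding them in model.py.
config.max_seq_len = 16384        # assumed full context of the 16k model
config.compress_pos_emb = 4.0     # assumed RoPE scaling factor: 16384 / 4096

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

prompt = "A long prompt that would not fit in a 2048-token window..."
print(generator.generate_simple(prompt, max_new_tokens=200))
```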
First of all, thanks a lot for this great project!
I got a weird issue when generating with Llama 2 on a 4096 context using `generator.generate_simple`. As I understand the code, it already limits the number of new tokens to stay under the context limit. Are there any settings that I might need to change?