---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[29], line 1
----> 1 python -VV
NameError: name 'python' is not defined
Not working in a SageMaker Studio notebook?
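The NameError above is expected: python -VV is a shell command, so typing it bare into a notebook cell makes Python look up an undefined name called python. A minimal sketch of two working alternatives in a Jupyter/SageMaker Studio cell:

```python
# "!" tells IPython to run the line in a shell rather than as Python code.
!python -VV

# Or stay in Python and ask the running interpreter directly.
import sys
print(sys.version)
```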
Recently I fine-tuned a Mistral 7B Instruct v0.3 model and deployed it on an AWS SageMaker endpoint, but inference from a SageMaker Studio notebook kept failing with the 422 validation error quoted in the issue below: the endpoint rejects any request where input tokens plus max_new_tokens exceed 4096, even though the model's context window is 32k.
First, I tested all of the model and fine-tuning parameters that had 4096 as their value, and there were quite a few, since everything is a multiple of 512. That changed nothing, so it was a bust.
After figuring out that this error usually points to the deployment container rather than the model, I at least had a lead. After lengthy Googling, it turned into a jackpot :)
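If the endpoint runs Hugging Face's text-generation-inference (TGI) container, which is what the model card's SageMaker snippet deploys, the 4096 cap typically comes from the container's default request limits rather than from the model. Below is a minimal sketch of raising those limits at deploy time; the environment variable names are documented TGI launcher options, while the image version, instance type, role placeholder, and limit values are illustrative assumptions. The original issue text follows for reference.

```python
import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# TGI caps each request at MAX_TOTAL_TOKENS (input + generated tokens);
# the container default of 4096 matches the 422 error above.
config = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3",  # or your fine-tuned model
    "SM_NUM_GPUS": json.dumps(1),
    "MAX_INPUT_LENGTH": json.dumps(8191),          # max prompt tokens
    "MAX_TOTAL_TOKENS": json.dumps(12288),         # prompt + max_new_tokens
    "MAX_BATCH_PREFILL_TOKENS": json.dumps(8192),  # must cover the prompt length
}

model = HuggingFaceModel(
    role="<your-sagemaker-execution-role>",        # hypothetical placeholder
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=config,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",                 # illustrative instance type
)
```

With limits like these, a request with 877 input tokens and max_new_tokens of 4096 totals 4973 tokens, which now fits under MAX_TOTAL_TOKENS instead of being rejected.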
Python -VV
Pip Freeze
Reproduction Steps
Recently I fine-tuned a Mistral 7B Instruct v0.3 model and deployed it on an AWS SageMaker endpoint, but during inference from a SageMaker Studio notebook I got errors like this:
Received client error (422) from primary with message "{"error":"Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 877 inputs tokens and 4096 max_new_tokens","error_type":"validation"}"
This means requests are capped at 4096 tokens in total (877 input tokens + 4096 max_new_tokens = 4973 > 4096, hence the rejection), but the maximum context lengths should be:
Mistral 7B Instruct v0.1 = 8192
Mistral 7B Instruct v0.2, v0.3 = 32768 (32k)
The input parameters were: "parameters": {"max_new_tokens": 4096, "do_sample": True}
I also hosted the base models from Hugging Face on SageMaker endpoints, and they all appear to be limited to 4096 tokens as well.
Does anyone know how to fix this?
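For context, a minimal sketch of the kind of call that triggers the 422; predictor stands for the deployed endpoint handle, and the prompt is hypothetical:

```python
# 877 prompt tokens + 4096 max_new_tokens = 4973 > 4096, so TGI's request
# validation rejects this before any generation happens.
payload = {
    "inputs": "<roughly 877 tokens of prompt text>",  # hypothetical prompt
    "parameters": {"max_new_tokens": 4096, "do_sample": True},
}
response = predictor.predict(payload)
```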
Expected Behavior
During inference, the token limit should be far higher than 4k. Below 4k, inference works as intended.
Additional Context
I got the code for deployment on AWS SageMaker from here: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Suggested Solutions
No response