
Unable to load Llama-2-13b with max_input_length > context_length (4096 tokens) #348

Closed
2 of 4 tasks
rajasbansal opened this issue Mar 20, 2024 · 1 comment · Fixed by #350
Labels
bug Something isn't working

Comments


rajasbansal commented Mar 20, 2024

System Info

lorax:latest with docker

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

```shell
sudo docker run --gpus all -e DISABLE_SGMV=1 -e ROPE_SCALING=linear -e ROPE_FACTOR=4 --shm-size 1g -p 80:80 ghcr.io/predibase/lorax:latest --model-id NousResearch/Llama-2-13b-hf --dtype float16 --port 80 --num-shard 4 --max-input-length 8000 --max-total-tokens 8002 --max-batch-prefill-tokens 8000
```

Expected behavior

Here I am trying to use linear RoPE scaling to run Llama-2-13b with input sizes longer than 4096 tokens, but the model fails to come up during warmup. The failure appears to be related to `max_position_embeddings` in the Llama config. When I set that value higher, it works as expected as long as `max_total_tokens <= max_position_embeddings`. If `max_total_tokens > max_position_embeddings` and `max_input_length < max_position_embeddings`, then the model comes up during warmup but hangs when a call is made to the model.
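For context, linear RoPE scaling divides each position index by the scaling factor before computing the rotary angles, so a factor of 4 maps a 16384-token range onto the 4096 positions the model was trained on. A minimal sketch of that idea (a hypothetical helper for illustration, not lorax's actual implementation):

```python
import math

def rope_angles(position, dim, base=10000.0, scaling_factor=1.0):
    # Linear RoPE scaling: divide the position index by the factor,
    # stretching the effective context window. (Hypothetical helper,
    # not the code path lorax uses internally.)
    pos = position / scaling_factor
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With ROPE_FACTOR=4, token position 8000 is rotated exactly like
# unscaled position 2000, which lies inside the original 4096-token
# training window.
assert rope_angles(8000, dim=128, scaling_factor=4.0) == \
       rope_angles(2000, dim=128, scaling_factor=1.0)
```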

@tgaddair
Contributor

Hey @rajasbansal, I just put up PR #350 which should fix this issue. However, I could not repro the hanging issue you saw after fixing the max position embeddings. I suspect the issue there is related to the available memory on the device somehow. What kind of GPUs are you running on?
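One plausible shape for such a fix (a hypothetical sketch only, not necessarily what PR #350 actually does) is to derive the usable position range from the config value combined with the rope factor, rather than from the raw config value alone:

```python
def effective_max_positions(max_position_embeddings, rope_factor=None):
    # Hypothetical sketch: when linear rope scaling is enabled,
    # the usable position range grows by the scaling factor, so the
    # warmup check should compare max_total_tokens against the
    # scaled value instead of the raw config value.
    if rope_factor is not None:
        return int(max_position_embeddings * rope_factor)
    return max_position_embeddings

# Llama-2's config value of 4096 with ROPE_FACTOR=4 would then admit
# the 8002 max_total_tokens from the repro command above.
assert effective_max_positions(4096, rope_factor=4) == 16384
assert effective_max_positions(4096) == 4096
```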
