Add Llama 3.1 support #87
Conversation
- Add Llama 3.1 in test_decode.py
- Set generation_config._eos_token_tensor to None
Add Llama 3.1 support
We only require […]. So how can we handle this?
I understand your point about maintaining alignment with the original transformers code to simplify updates and maintenance. After reviewing the changes in […]. In the meantime, as a workaround to address the […]:
```diff
@@ -120,6 +120,7 @@ def create(
         )
         generation_config.max_length = max_seq_length

+        generation_config._eos_token_tensor = None
```
why do you need to modify this?
@tengomucho Sorry, I missed informing you about this. While running TGI with Llama 3.1 8B, the error below was raised. I feel this too is an unreasonable hack to run Llama 3.1 and not a proper way to handle the error. Please suggest how we can handle this.
```
2024-08-05T16:51:28.059351Z INFO text_generation_launcher: Warmup (this can take several minutes)
2024-08-05T16:51:28.062291Z DEBUG text_generation_launcher: Generator@0 WARMUP
2024-08-05T16:51:28.062400Z DEBUG text_generation_launcher: Warming up the model
2024-08-05T16:51:28.063135Z DEBUG text_generation_launcher: Warmup for 1 requests, truncate value 256 seq_len 150
2024-08-05T16:51:28.063287Z DEBUG text_generation_launcher: Prefilling 1 new request(s) adding to 0 active slot(s)
2024-08-05T16:51:28.063567Z DEBUG text_generation_launcher: Request 0 assigned to slot 0
2024-08-05T16:51:28.066032Z DEBUG text_generation_launcher: Error in command WARMUP
2024-08-05T16:51:28.067444Z DEBUG text_generation_launcher: Traceback (most recent call last):
2024-08-05T16:51:28.067451Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/generator.py", line 842, in _mp_fn
2024-08-05T16:51:28.067455Z DEBUG text_generation_launcher: return_to_caller(generator.warmup(batch=batch))
2024-08-05T16:51:28.067457Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/generator.py", line 420, in warmup
2024-08-05T16:51:28.067475Z DEBUG text_generation_launcher: _generations, next_batch = self.prefill(warmup_batch)
2024-08-05T16:51:28.067480Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-08-05T16:51:28.067484Z DEBUG text_generation_launcher: return func(*args, **kwargs)
2024-08-05T16:51:28.067493Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/generator.py", line 513, in prefill
2024-08-05T16:51:28.067496Z DEBUG text_generation_launcher: selector = TokenSelector.create(
2024-08-05T16:51:28.067498Z DEBUG text_generation_launcher: File "/opt/optimum-tpu/optimum/tpu/generation/token_selector.py", line 124, in create
2024-08-05T16:51:28.067501Z DEBUG text_generation_launcher: logits_processor = model._get_logits_processor(
2024-08-05T16:51:28.067504Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 871, in _get_logits_processor
2024-08-05T16:51:28.067508Z DEBUG text_generation_launcher: and generation_config._eos_token_tensor is not None
2024-08-05T16:51:28.067512Z DEBUG text_generation_launcher: AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
2024-08-05T16:51:28.067521Z DEBUG text_generation_launcher:
2024-08-05T16:51:28.067780Z ERROR text_generation_launcher: Method Warmup encountered an error.
```
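For context, `_eos_token_tensor` is a private attribute that transformers only sets on the config during its own generation setup, which is why a config built outside `generate()` can lack it. Below is a minimal sketch of the failure mode and the defensive read; the version note and values are assumptions for illustration, not from this PR:

```python
# Sketch of the failure mode, assuming transformers 4.43.x, where
# GenerationConfig gains _eos_token_tensor only during generate()'s
# special-token preparation step, not at construction time.
from transformers import GenerationConfig

config = GenerationConfig(max_length=256)

# A freshly built config has no _eos_token_tensor attribute, so code that
# dereferences it directly (as _get_logits_processor does in the traceback
# above) raises AttributeError; a getattr with a default avoids that.
eos_tensor = getattr(config, "_eos_token_tensor", None)
print(eos_tensor)  # None on affected versions
```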
Ok, I see: this is a bug in transformers. There is an issue open about it already: huggingface/transformers#32207
I will try to open a PR to fix that in transformers. In the meantime, a workaround could be done in the generator.py file. We already modify the generation_config to suit TGI, so we could add one more change in the Slot.assign method, something like this:
```python
# Workaround to avoid bug in token_utils in transformers.
self._generation_config._eos_token_tensor = getattr(self._generation_config, "_eos_token_tensor", None)
```
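For illustration, here is a rough sketch of where that line could live; the Slot class shape and the assign signature below are assumptions for context, and only the getattr line comes from the suggestion above:

```python
import copy

from transformers import GenerationConfig


class Slot:
    """Simplified stand-in for TGI's Slot; only the relevant logic is shown."""

    def assign(self, request, generation_config: GenerationConfig):
        # Deep-copy so per-request tweaks never leak into the shared config.
        self._generation_config = copy.deepcopy(generation_config)

        # ... existing TGI-specific adjustments to the config would go here ...

        # Workaround to avoid bug in token_utils in transformers: make sure
        # the private attribute exists before transformers dereferences it.
        self._generation_config._eos_token_tensor = getattr(
            self._generation_config, "_eos_token_tensor", None
        )
```

Deep-copying the shared config first keeps the per-request workaround from mutating global state.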
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This PR is closed, as it does not contain the recent changes from the main branch. A new PR has been created as a replacement.
FYI @Bihan, next time you can just rebase onto the master branch and force-push:
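A typical sequence might look like the following (the `upstream` remote and branch names are assumptions; adjust them to your fork, and note the default branch may be `main` rather than `master`):

```bash
# Fetch the latest upstream history, replay your work on top of it,
# then update the PR branch in place.
git fetch upstream
git rebase upstream/master
git push --force-with-lease origin <your-branch>
```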
This way you do not need to open a new PR 🤗
What does this PR do?
This PR adds Llama 3.1 8B support. Please refer to the PR for discussion history.
Fixes