
Fix regression tests on CUDA and ROCm #23

Closed
jakki-amd opened this issue Nov 18, 2024 · 1 comment
jakki-amd (Collaborator) commented Nov 18, 2024

Fix regression tests

jakki-amd self-assigned this Nov 18, 2024
jakki-amd added this to the Week 47 milestone Nov 18, 2024
jakki-amd assigned eppane and unassigned jakki-amd Nov 25, 2024
jakki-amd modified the milestones: Week 47, Week 48 Nov 25, 2024
eppane (Collaborator) commented Nov 25, 2024

Reposting the comment appended to pull request #5:

Note: the test test_handler.py::test_huggingface_bert_model_parallel_inference fails due to:

ValueError: Input length of input_ids is 150, but max_length is set to 50. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

This indicates that preprocessing uses a different max_length than inference, which can be verified by looking at the handler as it was when the test was originally implemented: model.generate() defaults to max_length=50, while the tokenizer uses the max_length from setup_config (max_length=150). It seems that the BERT-based Textgeneration.mar needs an update.
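For illustration, here is a minimal sketch of the mismatch and of the max_new_tokens fix the error message suggests, assuming a Hugging Face causal LM. This is not the actual handler code: "gpt2" and the literal input text are placeholders standing in for whatever Textgeneration.mar packages.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical reproduction; "gpt2" stands in for the packaged model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

# Preprocessing: tokenize with the max_length from setup_config (150).
inputs = tokenizer(
    "some input text",
    max_length=150,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

# Inference as originally written: generate() falls back to its default
# max_length=50, shorter than the 150-token input, which triggers the
# ValueError quoted above on recent transformers versions.
# model.generate(**inputs)

# Fix: bound the number of newly generated tokens rather than the total
# sequence length, so the 150-token prompt is always accommodated.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Alternatively, the tokenizer's max_length from setup_config could be passed through to generate() so both stages agree; either way the archive needs repackaging.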

eppane closed this as completed Nov 25, 2024