[QUESTION] large file scoring #206

Closed · vince62s opened this issue Mar 4, 2024 · 3 comments
Labels: question (Further information is requested)

vince62s commented Mar 4, 2024

When scoring a large file (say more than 100K records), why does it start at a high throughput, for instance around 50 it/s, and then drop significantly (by more than half) after a few tens of thousands of records?

Thanks

vince62s commented Mar 4, 2024

Could it be the same as #158?

ricardorei (Collaborator) commented

Training throughput is influenced by many factors, but for inference the batches are sorted by length to minimize padding. As a consequence, the longest segments end up in the final batches, so each batch late in the run contains far more tokens than the early ones, and the iterations-per-second counter falls accordingly.
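
A minimal sketch of the idea (illustrative only, not COMET's actual implementation; the helper name and whitespace tokenization are my own):

```python
# Sort segments by length before batching so each batch holds
# similar-length inputs and padding is minimal. Short segments form the
# first batches, long segments the last ones, which is why iterations/sec
# falls as scoring progresses even though no per-token slowdown occurs.
def length_sorted_batches(segments, batch_size):
    # Order indices by (whitespace) token count, shortest first.
    order = sorted(range(len(segments)), key=lambda i: len(segments[i].split()))
    for start in range(0, len(order), batch_size):
        yield [segments[i] for i in order[start:start + batch_size]]

segments = ["short one", "another short segment", "a much longer segment " * 30]
for batch in length_sorted_batches(segments, batch_size=2):
    print([len(s.split()) for s in batch])  # token counts per batch grow over the run
```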

ricardorei (Collaborator) commented

You can check the difference by setting `length_batching` to `False`, which disables the sorting so batches follow the original file order.
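
For completeness, a minimal sketch of that comparison via the Python API, assuming a recent unbabel-comet release where `predict()` exposes the `length_batching` flag; the checkpoint name and the example triplet are just placeholders:

```python
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire.",
    },
]

# With length_batching=False, inputs are batched in file order, so
# throughput should stay roughly flat instead of dropping near the end.
output = model.predict(data, batch_size=8, gpus=1, length_batching=False)
print(output.scores)
```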
