Add parallel processing for OpenAI completion models #1460
Conversation
Thanks very much for this PR! I will try to review it as soon as I can.
Hi @pbevan1,
AFAIK, the lm-eval-harness does not support any parallelization. Is it possible for you to do a stress test or an evaluation for both approaches and post the numbers here? It would be a great exercise and would help us make an informed decision about how we want to proceed with supporting these kinds of operations for other LM models as well. (Again, @haileyschoelkopf, please feel free to correct me here; I might be wrong.) I believe adding tests should be a priority here.
Any plans on resolving conflicts, adding changes/updates, and merging before the new release?
Bump: this would be a really useful feature that would massively speed up eval time.
Is this still needed, given that #1095 is closed?
Yeah. For example, with the defaults I get 0:23 for 40 requests to OpenAI and 0:02 with
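(For context, a rough sketch of how a comparison like the one above can be measured: N small chat requests timed sequentially vs. with `asyncio.gather`. The model name, prompt, and N below are placeholders, and the actual numbers will depend entirely on rate limits and latency.)

```python
import asyncio
import time

from openai import AsyncOpenAI, OpenAI

N = 40
MESSAGES = [{"role": "user", "content": "Say hi."}]


def timed_sequential() -> float:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    for _ in range(N):
        client.chat.completions.create(model="gpt-3.5-turbo", messages=MESSAGES, max_tokens=4)
    return time.perf_counter() - start


def timed_parallel(max_concurrency: int = 8) -> float:
    async def _run() -> None:
        client = AsyncOpenAI()
        sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

        async def fire() -> None:
            async with sem:
                await client.chat.completions.create(
                    model="gpt-3.5-turbo", messages=MESSAGES, max_tokens=4
                )

        await asyncio.gather(*(fire() for _ in range(N)))

    start = time.perf_counter()
    asyncio.run(_run())
    return time.perf_counter() - start


# print(f"sequential: {timed_sequential():.1f}s  parallel: {timed_parallel():.1f}s")
```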
Add parallel processing for OpenAI completion models
Implements https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py to speed up OpenAI API calls, as per #1410.
Gives a many-fold speedup for OpenAI models (dependent on the user's rate limits).
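For reference, a minimal sketch of the pattern the cookbook script is built around: a continuously refilling token budget that requests must acquire from before being sent. This is simplified (a single TPM bucket, no retries or request-count tracking), and the model name, prompt shape, per-request token estimate, and default limit are placeholders rather than the harness's actual code.

```python
import asyncio
import time

from openai import AsyncOpenAI


class TokenBucket:
    """Continuously refilling TPM budget; callers wait until capacity is free."""

    def __init__(self, max_tokens_per_minute: float):
        self.rate = max_tokens_per_minute / 60.0  # tokens replenished per second
        self.capacity = max_tokens_per_minute
        self.available = max_tokens_per_minute
        self.last = time.monotonic()

    async def acquire(self, tokens: float) -> None:
        while True:
            now = time.monotonic()
            self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
            self.last = now
            if self.available >= tokens:
                self.available -= tokens
                return
            await asyncio.sleep((tokens - self.available) / self.rate)


async def run_parallel(prompts: list[str], max_tokens_per_minute: float = 90_000) -> list[str]:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    bucket = TokenBucket(max_tokens_per_minute)

    async def one(prompt: str) -> str:
        await bucket.acquire(tokens=100)  # crude per-request token estimate
        resp = await client.chat.completions.create(
            model="gpt-3.5-turbo",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
            max_tokens=32,
        )
        return resp.choices[0].message.content

    return await asyncio.gather(*(one(p) for p in prompts))
```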
Uses a 1-token dummy request to get a user- and model-specific Tokens Per Minute (TPM) rate limit. Requests Per Minute (RPM) are not available programmatically for some reason, but I've raised this with OpenAI. Both RPM and TPM can be overridden in the `gen_kwargs` using `max_tokens_per_minute` and `max_requests_per_minute`. Not sure if/where that should be documented?

Just implemented for OpenAI chat completions, since completions is now legacy, but it would be easy to also do this for completions. I've also separated the local model call from the OpenAI model call, as I'm not sure people would want async/parallel requests for a local model.
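As a rough illustration of the dummy-request idea (not the PR's exact code): with the `openai>=1.0` client you can issue a 1-token request and read the TPM limit back from the `x-ratelimit-limit-tokens` response header. The header name, fallback value, and model below are assumptions worth double-checking against the API docs.

```python
from openai import OpenAI

client = OpenAI()


def probe_tokens_per_minute(model: str = "gpt-3.5-turbo") -> int:
    # 1-token "hi" request issued purely to get the rate-limit headers back.
    raw = client.chat.completions.with_raw_response.create(
        model=model,
        messages=[{"role": "user", "content": "hi"}],
        max_tokens=1,
    )
    # Fall back to a conservative default if the header is missing.
    return int(raw.headers.get("x-ratelimit-limit-tokens", 10_000))
```

With the harness CLI, the overrides described above would presumably look something like `--gen_kwargs max_tokens_per_minute=90000,max_requests_per_minute=3500` (values illustrative).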
Also, I follow the OpenAI example and cache requests to a JSONL file. I cache in temp and clean up afterwards, but please let me know if it would be better to just do it in memory (I'm not certain of the size of all of the evaluations).
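A minimal sketch of that temp-file caching, assuming the cookbook-style layout of one JSON request per line (the payload shape below is illustrative):

```python
import json
import os
import tempfile

requests = [
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]},
]

fd, cache_path = tempfile.mkstemp(suffix=".jsonl")  # created under the system temp dir
try:
    with os.fdopen(fd, "w") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")  # one request per line
    # ... hand cache_path to the parallel request processor here ...
finally:
    os.remove(cache_path)  # clean up the temp cache after the run
```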