Add parallel processing for OpenAI completion models #1460
Conversation
Thanks very much for this PR! I will try to review it as soon as I can.
Hi @pbevan1,
AFAIK, the lm-eval-harness does not support any parallelization. Is it possible for you to do a stress test or an evaluation for both approaches and post the numbers here? It would be a great exercise and would help us make an informed decision about how we want to proceed with supporting these kinds of operations for other LM models as well. (Again, @haileyschoelkopf, please feel free to correct me here; I might be wrong.) I believe adding tests should be a priority here.
Any plans on resolving conflicts, adding changes/updates, and merging before the new release?
Bump: this would be a really useful feature that would massively speed up eval time.
Is this still needed, given that #1095 is closed?
Yeah. For example, with the defaults I get 0:23 for 40 requests to OpenAI and 0:02 with
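(For context, a rough sketch of how a comparison like the one above can be measured: N small chat requests timed sequentially vs. with `asyncio.gather`. The model name, prompt, and N below are placeholders, and the actual numbers will depend entirely on rate limits and latency.)

```python
import asyncio
import time

from openai import AsyncOpenAI, OpenAI

N = 40
MESSAGES = [{"role": "user", "content": "Say hi."}]


def timed_sequential() -> float:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    for _ in range(N):
        client.chat.completions.create(model="gpt-3.5-turbo", messages=MESSAGES, max_tokens=4)
    return time.perf_counter() - start


def timed_parallel(max_concurrency: int = 8) -> float:
    async def _run() -> None:
        client = AsyncOpenAI()
        sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

        async def fire() -> None:
            async with sem:
                await client.chat.completions.create(
                    model="gpt-3.5-turbo", messages=MESSAGES, max_tokens=4
                )

        await asyncio.gather(*(fire() for _ in range(N)))

    start = time.perf_counter()
    asyncio.run(_run())
    return time.perf_counter() - start


# print(f"sequential: {timed_sequential():.1f}s  parallel: {timed_parallel():.1f}s")
```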
Add parallel processing for OpenAI completion models
Implements https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py to speed up OpenAI API calls, as per #1410.
Gives a many-fold speedup for OpenAI models (dependent on the user's rate limits).
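For reference, a minimal sketch of the pattern the cookbook script is built around: a continuously refilling token budget that requests must acquire from before being sent. This is simplified (a single TPM bucket, no retries or request-count tracking), and the model name, prompt shape, per-request token estimate, and default limit are placeholders rather than the harness's actual code.

```python
import asyncio
import time

from openai import AsyncOpenAI


class TokenBucket:
    """Continuously refilling TPM budget; callers wait until capacity is free."""

    def __init__(self, max_tokens_per_minute: float):
        self.rate = max_tokens_per_minute / 60.0  # tokens replenished per second
        self.capacity = max_tokens_per_minute
        self.available = max_tokens_per_minute
        self.last = time.monotonic()

    async def acquire(self, tokens: float) -> None:
        while True:
            now = time.monotonic()
            self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
            self.last = now
            if self.available >= tokens:
                self.available -= tokens
                return
            await asyncio.sleep((tokens - self.available) / self.rate)


async def run_parallel(prompts: list[str], max_tokens_per_minute: float = 90_000) -> list[str]:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    bucket = TokenBucket(max_tokens_per_minute)

    async def one(prompt: str) -> str:
        await bucket.acquire(tokens=100)  # crude per-request token estimate
        resp = await client.chat.completions.create(
            model="gpt-3.5-turbo",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
            max_tokens=32,
        )
        return resp.choices[0].message.content

    return await asyncio.gather(*(one(p) for p in prompts))
```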
Uses a 1-token dummy request to get a user- and model-specific Tokens Per Minute (TPM) rate limit. Requests Per Minute (RPM) are not available programmatically for some reason, but I've raised this with OpenAI. Both RPM and TPM can be overridden in the `gen_kwargs` using `max_tokens_per_minute` and `max_requests_per_minute`. Not sure if/where that should be documented?

Just implemented for OpenAI chat completions, since completions is now legacy, but it would be easy to also do this for completions. I've also separated the local model call from the OpenAI model call, as I'm not sure people would want async/parallel requests for a local model.
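As a rough illustration of the dummy-request idea (not the PR's exact code): with the `openai>=1.0` client you can issue a 1-token request and read the TPM limit back from the `x-ratelimit-limit-tokens` response header. The header name, fallback value, and model below are assumptions worth double-checking against the API docs.

```python
from openai import OpenAI

client = OpenAI()


def probe_tokens_per_minute(model: str = "gpt-3.5-turbo") -> int:
    # 1-token "hi" request issued purely to get the rate-limit headers back.
    raw = client.chat.completions.with_raw_response.create(
        model=model,
        messages=[{"role": "user", "content": "hi"}],
        max_tokens=1,
    )
    # Fall back to a conservative default if the header is missing.
    return int(raw.headers.get("x-ratelimit-limit-tokens", 10_000))
```

With the harness CLI, the overrides described above would presumably look something like `--gen_kwargs max_tokens_per_minute=90000,max_requests_per_minute=3500` (values illustrative).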
Also, I follow the OpenAI example and cache requests to a JSONL file. I cache in temp and clean up afterwards, but please let me know if it would be better to just do it in memory (I'm not certain of the size of all of the evaluations).
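A minimal sketch of that temp-file caching, assuming the cookbook-style layout of one JSON request per line (the payload shape below is illustrative):

```python
import json
import os
import tempfile

requests = [
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]},
]

fd, cache_path = tempfile.mkstemp(suffix=".jsonl")  # created under the system temp dir
try:
    with os.fdopen(fd, "w") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")  # one request per line
    # ... hand cache_path to the parallel request processor here ...
finally:
    os.remove(cache_path)  # clean up the temp cache after the run
```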