
Add cost estimation for use of closed source models #15

Open · wants to merge 2 commits into main

Conversation

fraboniface (Collaborator)

Work done
Very simple analysis:

  • take official API pricing for the leading small models, GPT-4o-mini and Claude 3.5 Haiku;
  • take the MediaTree dataset from https://drive.google.com/drive/folders/1d0idkOmMIXabj7ajYhvkitMMHnH_woSN and compute the average number of tokens per sample: ~500;
  • compute the total number of tokens if we instead have 19 channels for 800 h each: ~200M;
  • write a cost function accounting for input tokens, output tokens, and cached prompt tokens (sketched below);
  • run this function over a range of output-token counts, assuming a rather long example prompt.

Results: it could cost as little as $60, and a few hundred USD for more output tokens (the maximum is ~$600 for 1,000 output tokens using Claude).
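For reference, here is a minimal sketch of what such a cost function could look like, using the figures from the bullets above (~500 tokens/sample, ~200M total tokens). This is not the exact notebook code: the per-million-token rates and cache discounts are the publicly announced prices around the time of this PR (October 2024) and may have changed since, the 636-token prompt length comes from the review comment below, and the `run_cost` name is illustrative.

```python
# Sketch of the cost estimate described above (not the exact notebook code).
# Prices are USD per 1M tokens, taken from the providers' public pricing
# pages around October 2024; they may be out of date.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60, "cache_discount": 0.50},
    # Claude 3.5 Haiku pricing as announced at the time of this PR.
    "claude-3.5-haiku": {"input": 0.25, "output": 1.25, "cache_discount": 0.10},
}

TOTAL_TOKENS = 200_000_000   # ~200M tokens (19 channels x 800 h each)
TOKENS_PER_SAMPLE = 500      # average measured on the MediaTree dataset
N_SAMPLES = TOTAL_TOKENS // TOKENS_PER_SAMPLE
PROMPT_TOKENS = 636          # example prompt length (see review comment below)

def run_cost(model: str, output_tokens: int) -> float:
    """Estimated USD cost of one full run over the dataset."""
    p = PRICES[model]
    # Fixed prompt, assumed served from the provider's prompt cache.
    prompt = PROMPT_TOKENS * p["input"] * p["cache_discount"] / 1e6
    # Per-sample transcript tokens, billed at the full input rate.
    sample = TOKENS_PER_SAMPLE * p["input"] / 1e6
    # Model output, billed at the output rate.
    out = output_tokens * p["output"] / 1e6
    return N_SAMPLES * (prompt + sample + out)

for model in PRICES:
    for ot in (10, 100, 1000):
        print(f"{model:>17} @ {ot:>4} output tokens: ${run_cost(model, ot):,.0f}")
```

Under these assumptions the function reproduces the ballpark above: roughly $50 to $75 for short outputs with GPT-4o-mini, up to a few hundred USD at 1,000 output tokens with Claude.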

Related issue: #11

fraboniface requested a review from @ycrouin on October 29, 2024 at 19:48
@ycrouin (Collaborator) left a comment


LGTM 👍

I think one interesting result is that a single run will cost at least $60. The good thing is that this is acceptable and manageable, but we probably won't be able to rerun our pipeline again and again carelessly. The other result is that larger models like gpt-4o or sonnet-3.5 are probably too expensive at this scale.

Maybe we can rerun this later, once we have the final dataset and prompts, to get a more accurate cost estimate before running it on the entire data history.

@ycrouin (Collaborator) commented on Oct 29, 2024:


```python
prompt_cost = prompt_tokens * prices['input'] * prices['cache_discount'] / 1e6
```

Caching is only possible starting from 1024/2048 tokens (see the OpenAI & Anthropic docs), so this is a bit optimistic for this prompt, which contains only 636 tokens. Maybe we'll include few-shot examples and more, and eventually reach this limit, though. I guess that if we use structured outputs, that counts as extra tokens too. But this is really a detail 😛, I just wasn't sure if you were aware of this lower limit of 1024/2048.
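As a hedged illustration of this point, the cost function could gate the discount on the provider's minimum cacheable prompt length. The constant name and value below are placeholders; check the current OpenAI and Anthropic docs for the right threshold per provider and model.

```python
# Cache discounts only apply above a provider-specific minimum prompt
# length: 1024 tokens for OpenAI, and 1024 or 2048 tokens depending on
# the Claude model. The threshold here is illustrative.
CACHE_MIN_TOKENS = 1024

def prompt_cost(prompt_tokens: int, prices: dict) -> float:
    """USD cost of the fixed prompt, discounting only cacheable prompts."""
    cacheable = prompt_tokens >= CACHE_MIN_TOKENS
    discount = prices["cache_discount"] if cacheable else 1.0
    return prompt_tokens * prices["input"] * discount / 1e6
```

With the 636-token prompt above, this version would bill the full input rate until few-shot examples push the prompt past the threshold.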
