This Python script benchmarks the latency improvement from prompt caching for the OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet language models. It measures API call latency with and without prompt caching, requesting only a small number of output tokens so that the measurement approximates the time to first byte. It then calculates the percentage improvement in latency.
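The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the code in `benchmark.py`: the helper names `measure_latency` and `percent_improvement` are hypothetical, the median over a few runs is an assumed choice, and `time.sleep` stands in for the real API calls.

```python
import time
from statistics import median
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 3) -> float:
    """Return the median wall-clock latency of `call`, in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # in a real benchmark, an API request with a small output-token limit
        samples.append(time.perf_counter() - start)
    return median(samples)


def percent_improvement(uncached_s: float, cached_s: float) -> float:
    """Percentage reduction in latency relative to the uncached baseline."""
    if uncached_s <= 0:
        raise ValueError("uncached latency must be positive")
    return (uncached_s - cached_s) / uncached_s * 100.0


if __name__ == "__main__":
    # Stand-ins for real requests (assumption): sleeps simulate network latency.
    uncached = measure_latency(lambda: time.sleep(0.05))
    cached = measure_latency(lambda: time.sleep(0.01))
    print(f"uncached: {uncached:.3f}s, cached: {cached:.3f}s")
    print(f"improvement: {percent_improvement(uncached, cached):.1f}%")
```

Taking the median of several runs is one way to dampen network jitter; a single run per condition would also match the description but would be noisier.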
- Install the required Python packages: `pip install -r requirements.txt`
- Set the API keys for OpenAI and Anthropic in the `benchmark.py` script.
- Run the script: `python benchmark.py`