speed and memory plots #10
Could you add a
and check that the GitHub page stops displaying "74% Jupyter" in the languages section? |
(I can't add inline comments to the notebook in the GitHub UI, so I'll add them in this main thread.) Curiosity question: is |
Maybe use a global variable for |
About the plots:
|
It would be great to add a single plot that illustrates why to use kvpress. I propose recycling the data you created in the notebook, focusing on what I believe will interest users the most:
Also, I would add a horizontal dashed line at X and clip all curves below this X value to clearly show that with compression / quantization you can fit more context length in your GPU. The plot could be saved in an |
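The suggestion above (a horizontal dashed line at the memory budget, with curves clipped to it) could be sketched roughly as follows. This is only an illustration: the helper names `clip_to_budget` and `plot_memory`, the output file name, and the axis labels are hypothetical, and matplotlib is assumed as the plotting library. No measured numbers are included; the curves would come from the notebook's own data.

```python
def clip_to_budget(context_lengths, memory_gb, budget_gb):
    """Keep only the points whose memory usage fits within budget_gb."""
    kept = [(n, m) for n, m in zip(context_lengths, memory_gb) if m <= budget_gb]
    return [n for n, _ in kept], [m for _, m in kept]


def plot_memory(curves, budget_gb, path="memory_vs_context.png"):
    """Plot clipped curves plus a dashed line at the budget.

    curves: dict mapping a label to (context_lengths, memory_gb).
    path: hypothetical output file name.
    """
    # Imported lazily so clip_to_budget can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, (lengths, mem) in curves.items():
        x, y = clip_to_budget(lengths, mem, budget_gb)
        ax.plot(x, y, marker="o", label=label)
    # Dashed horizontal line marking the available GPU memory.
    ax.axhline(budget_gb, linestyle="--", color="black",
               label=f"GPU budget ({budget_gb} GB)")
    ax.set_xlabel("Context length (tokens)")
    ax.set_ylabel("Peak memory (GB)")
    ax.legend()
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

With this layout, a curve that stops further to the right before hitting the dashed line corresponds to a configuration that fits a longer context in the same GPU.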
Something I don't understand about memory usage:
I guess it's related to the _scale and _shift parameters. It would be great to add a comment on this; otherwise it's a bit confusing |
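If the extra memory does come from per-group scale and shift values, a back-of-the-envelope estimate could look like the sketch below. Everything here is an assumption for illustration: the function name `kv_cache_bytes`, the group size, and the choice of 16-bit scale and shift values are not taken from the notebook or from any library.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bits=16, group_size=None, meta_bits=16):
    """Rough KV cache size in bytes.

    bits: storage bits per cached element (16 for fp16/bf16).
    group_size: if set, one scale and one shift value are stored per
        group of this many elements (assumed quantization layout).
    meta_bits: bits used for each scale or shift value (assumed).
    """
    # Factor 2 accounts for keys and values.
    n_elements = 2 * n_layers * n_kv_heads * head_dim * seq_len
    total_bits = n_elements * bits
    if group_size is not None:
        n_groups = n_elements / group_size
        total_bits += n_groups * 2 * meta_bits  # scale + shift per group
    return total_bits / 8


# Hypothetical model shape, for illustration only.
full = kv_cache_bytes(1024, n_layers=32, n_kv_heads=8, head_dim=128)
quant = kv_cache_bytes(1024, n_layers=32, n_kv_heads=8, head_dim=128,
                       bits=4, group_size=64)
print(quant / full)
```

Under these assumptions a 4-bit cache costs 4.5 bits per element rather than 4, so it is slightly larger than a naive 4/16 of the fp16 cache.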
Thanks for the feedback!
|
Thanks for the updates and the cool plots. Could you add the last plot to the main README? (and maybe create an assets dir for the image and kvpress.jlg) |
It is in the README under the evaluation tab (I can also move it somewhere else or create a new section). I placed the image under |
This PR updates the speed_and_memory.ipynb notebook. The notebook now plots
for both default and quantized cache.
I did not include prefilling speed, as this is mostly independent of the cache size (and the repo isn't designed to optimize this part).
Apart from that, the spelling of "Hugging Face" was changed in various files.