
Use llama-1b for faster and more accessible examples #924

Closed
wants to merge 3 commits

Conversation

kylesayrs
Collaborator

Purpose

  • Make the examples more accessible to users without large GPU resources. Since these are just examples, they should be as accessible as possible to encourage easy use and adoption of llm-compressor; see the sketch below.

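As an illustrative sketch (not the actual diff of this PR), the change amounts to pointing the existing one-shot quantization examples at a smaller checkpoint. The model IDs and recipe values below are assumptions modeled on the standard llm-compressor example:

```python
# Sketch only: swap the example checkpoint for a smaller one.
# Model IDs and recipe values are illustrative assumptions, not the PR's diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Before: MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # small enough for a single modest GPU

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# One-shot W4A16 quantization, mirroring the structure of the existing examples
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```
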
@kylesayrs kylesayrs self-assigned this Nov 19, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs requested a review from mgoin November 19, 2024 03:44
Collaborator

@dsikka dsikka left a comment


I think we actually want to keep the larger models, as most of our questions on vLLM are around compressing larger models to run them there. So I don't think we should make this change; these examples are an easy reference for our most common case.

These examples are also used directly in our testing, which helps identify issues with larger memory requirements that would otherwise go unnoticed.

I wouldn't be opposed to adding a smaller model in addition to the larger models.

@horheynm
Collaborator

horheynm commented Nov 19, 2024

I think if the model structure is the same, we can use a smaller model. For example, if the only difference between the large and small models is the number of attention heads, then it should be fine, because the vLLM code execution path will be the same.

If the larger model has a different architecture, then we should keep the larger model, since the execution path will be different.
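One way to check this criterion, as a hedged sketch: compare the Hugging Face configs of the two checkpoints. The specific model IDs below are assumptions for illustration.

```python
from transformers import AutoConfig

# Illustrative model IDs; substitute the actual example checkpoints.
small = AutoConfig.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
large = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Same model_type and architecture class means vLLM loads both through the
# same model implementation, so the execution path is shared.
assert small.model_type == large.model_type        # both "llama"
assert small.architectures == large.architectures  # both ["LlamaForCausalLM"]

# Only the sizes differ between the two checkpoints.
for field in ("hidden_size", "num_attention_heads", "num_hidden_layers"):
    print(field, getattr(small, field), getattr(large, field))
```
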

@kylesayrs
Collaborator Author

I agree, it makes sense to keep the 8b models. We already have an example of quantizing a small TinyLlama model in the README.

@kylesayrs kylesayrs closed this Nov 21, 2024