[Question] The end-to-end generation speed and W4A4 #16

Open
aur61 opened this issue Oct 13, 2024 · 0 comments
aur61 commented Oct 13, 2024

Great job, starred! I do have a few questions:

  1. Did you measure end-to-end generation speed, specifically throughput in tokens/second or the latency of the first token?
  2. For W4A4, the speedup is only about 1x. Could you share the reason for this, and is there any potential for improvement?
  3. For W4A4, did you compare it against an fp16 or a bf16 baseline?
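
To make question 1 concrete, here is a minimal sketch of how the two metrics could be measured. It assumes generation is exposed as a lazy iterator that yields one token at a time, so the prefill cost falls on the first `next()` call. `measure_generation` and `fake_stream` are hypothetical names used only for illustration and are not part of this project's API. On a GPU, the stream would also need to synchronize before each token is yielded for the timestamps to be meaningful.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_generation(
    token_stream: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a streaming generation call (hypothetical helper).

    Returns the time to first token in seconds, the token count,
    the decode throughput (tokens/second after the first token),
    and the end-to-end throughput (all tokens over total time).
    """
    start = clock()
    first = None   # timestamp of the first token
    last = None    # timestamp of the most recent token
    count = 0
    for _ in token_stream:
        last = clock()
        if first is None:
            first = last
        count += 1
    if count == 0:
        raise ValueError("stream produced no tokens")

    decode_time = last - first
    total_time = last - start
    return {
        "ttft_s": first - start,
        "tokens": count,
        # Decode throughput excludes the first token, which is dominated by prefill.
        "decode_tok_per_s": (count - 1) / decode_time if decode_time > 0 else float("nan"),
        "e2e_tok_per_s": count / total_time if total_time > 0 else float("nan"),
    }


def fake_stream(n: int, prefill_s: float, step_s: float) -> Iterator[int]:
    """Stand-in for a real model: sleeps to imitate prefill and decode steps."""
    time.sleep(prefill_s)
    yield 0
    for i in range(1, n):
        time.sleep(step_s)
        yield i


if __name__ == "__main__":
    print(measure_generation(fake_stream(n=20, prefill_s=0.05, step_s=0.005)))
```

Reporting decode throughput separately from first-token latency matters here because weight and activation quantization can affect the compute-bound prefill phase and the memory-bound decode phase differently.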

image
