Add support for ngram speculation with --speculative-tokens param #259

Closed
1 of 4 tasks
abhibst opened this issue Feb 19, 2024 · 5 comments · Fixed by #375

Labels
enhancement New feature or request

Comments

@abhibst

abhibst commented Feb 19, 2024

System Info

Using the latest Docker image.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Similar to this parameter in TGI:
https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#speculate

Expected behavior

Expected to run without errors when using the --speculate 3 param.

In our experiments with TGI, it increased throughput (tokens per second, TPS) by around 2x with Mixtral 8x7B.
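For context, n-gram speculation drafts candidate tokens without a separate draft model: it looks for an earlier occurrence of the most recent n tokens in the sequence and proposes the tokens that followed it, and the model then verifies those drafts in a single forward pass, keeping the longest matching prefix. The sketch below is a minimal illustration under stated assumptions (greedy decoding, integer token IDs). `ngram_draft` and `accept_drafts` are hypothetical helpers, not TGI's or LoRAX's actual implementation, and `k` plays the role of the number of speculative tokens (3 in the --speculate 3 example).

```python
from typing import List, Sequence


def ngram_draft(tokens: Sequence[int], n: int = 2, k: int = 3) -> List[int]:
    """Hypothetical helper: propose up to k draft tokens by n-gram lookup.

    Finds the most recent earlier occurrence of the last n tokens and
    returns the (at most k) tokens that followed that occurrence.
    """
    if len(tokens) <= n:
        return []
    suffix = list(tokens[-n:])
    # Scan earlier start positions from newest to oldest, skipping the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == suffix:
            return list(tokens[start + n:start + n + k])
    return []  # no earlier match: nothing to speculate on


def accept_drafts(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Hypothetical helper: greedy verification of draft tokens.

    target[i] is the model's greedy token at draft position i; target has
    len(draft) + 1 entries, the last being the token after all drafts.
    """
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # first mismatch: keep the model's token and stop
            return accepted
        accepted.append(d)
    accepted.append(target[len(draft)])  # every draft matched: keep the extra token
    return accepted


if __name__ == "__main__":
    context = [1, 2, 3, 4, 1, 2]
    draft = ngram_draft(context, n=2, k=3)
    print(draft)
    print(accept_drafts(draft, [3, 4, 5, 9]))
```

When drafts are accepted, one forward pass yields several tokens, which is where the speedup comes from on memory-bound workloads. When they are rejected, the extra positions were scored for nothing, so the gain can shrink when the GPU is already compute-bound.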

@jeffreyftang
Contributor

Hi @abhibst, we're in the process of adding speculative decoding using Medusa and other adapter-based methods. We're open to exploring n-gram speculation as well, but based on some of the benchmarks we've seen, it may not actually help, especially if you're already compute-bound. It's definitely something we're open to investigating after landing the adapter-based work.

@jeffreyftang jeffreyftang added the enhancement New feature or request label Feb 22, 2024
@abhibst
Author

abhibst commented Mar 5, 2024

Hi team, any update on this?

@tgaddair
Contributor

tgaddair commented Apr 2, 2024

Hey @abhibst, PR #372 adds support for Medusa. N-gram speculation will follow immediately afterward.

@tgaddair tgaddair changed the title does loraX support --speculate param as TGI ? Add support for ngram speculation with --speculative-tokens param Apr 3, 2024
@tgaddair tgaddair self-assigned this Apr 3, 2024
@tgaddair
Contributor

tgaddair commented Apr 3, 2024

Hey @abhibst, PR #375 adds support for ngram speculation. Should be able to land this today!

@abhibst
Author

abhibst commented Apr 3, 2024

Thanks @tgaddair
