Add support for ngram speculation with --speculative-tokens param #259
Labels
enhancement
New feature or request
System Info
Using the latest Docker image.
Information
Tasks
Reproduction
This would be similar to the --speculate parameter in TGI:
https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#speculate
Expected behavior
The server should run without error when started with a speculation parameter, e.g. --speculate 3.
In our experiments with TGI, enabling this increased throughput (TPS) roughly 2x with Mixtral 8x7B.
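For context, here is a minimal sketch of the idea behind n-gram speculation, assuming a simple "look up the most recent matching n-gram in the context" strategy. The function name `propose_ngram_draft` and its parameters are hypothetical and chosen for illustration only; this is not TGI's or this project's implementation. The proposed draft tokens would then be verified by the model in a single forward pass, and only the tokens the model agrees with are kept.

```python
from typing import List, Sequence


def propose_ngram_draft(
    tokens: Sequence[int],
    ngram_size: int = 2,        # hypothetical parameter: length of the suffix to match
    num_speculative: int = 3,   # hypothetical parameter: analogous to `--speculate 3`
) -> List[int]:
    """Propose up to `num_speculative` draft tokens by n-gram lookup.

    Takes the last `ngram_size` tokens of the context, finds the most recent
    earlier occurrence of that same n-gram, and returns the tokens that
    followed it. Returns an empty list when no earlier match exists.
    """
    if ngram_size <= 0 or num_speculative <= 0 or len(tokens) <= ngram_size:
        return []

    suffix = list(tokens[-ngram_size:])

    # Scan earlier start positions from newest to oldest; the upper bound
    # excludes the suffix itself and guarantees at least one following token.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if list(tokens[start:start + ngram_size]) == suffix:
            follow = start + ngram_size
            return list(tokens[follow:follow + num_speculative])

    return []


if __name__ == "__main__":
    # The context ends with (1, 2), which also appears at the start,
    # so the tokens that followed it there are proposed as the draft.
    context = [1, 2, 3, 4, 5, 1, 2]
    print(propose_ngram_draft(context, ngram_size=2, num_speculative=3))
```

Because the draft comes from a lookup rather than a second model, the speedup depends heavily on how repetitive the workload is.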