This repository has been archived by the owner on Jun 21, 2024. It is now read-only.
I need to optimize every tool that uses a Hugging Face model, such as NMT. Candidates include Kernl (which replaces parts of the compute graph with optimized kernels), torch.jit, and FlashAttention. Inference speed is the key metric for these tools.
We should also investigate FasterTransformer and Triton Inference Server.
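As a possible starting point before adopting Kernl or FasterTransformer, here is a minimal sketch of one of the optimizations mentioned above: swapping a naive attention implementation for PyTorch's fused `scaled_dot_product_attention`, which can dispatch to FlashAttention kernels on supported GPUs. This assumes PyTorch >= 2.0; the tensor shapes are illustrative, not taken from any of our tools.

```python
# Sketch: naive attention vs. PyTorch's fused SDPA, which can dispatch to
# FlashAttention kernels on supported hardware (assumes torch >= 2.0).
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Naive softmax(QK^T / sqrt(d)) V; materializes the full seq x seq
    # attention matrix, which is what the fused kernel avoids.
    scale = q.size(-1) ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# Illustrative shapes: (batch, heads, sequence length, head dim).
batch, heads, seq, dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

fused = F.scaled_dot_product_attention(q, k, v)  # fused kernel path
ref = eager_attention(q, k, v)
print(torch.allclose(fused, ref, atol=1e-4))
```

The two paths agree numerically, so the swap should be a drop-in change inside a model's attention module; the speed and memory win only shows up on long sequences and on hardware where the FlashAttention backend is available.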