This repository has been archived by the owner on Jun 21, 2024. It is now read-only.

Optimization #19

Open
conceptofmind opened this issue Feb 28, 2023 · 2 comments

@conceptofmind
Owner

I need to optimize every tool that uses a Hugging Face model, such as NMT. Maybe Kernl to replace the model graphs, or TorchScript (torch.jit), or FlashAttention. Inference speed is key for these tools.

Investigate FasterTransformer and Triton Inference Server as well.
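Since inference speed is the deciding metric here, it's worth measuring each option the same way. A minimal stdlib-only timing harness (a sketch; `run_model` is a hypothetical stand-in for a Hugging Face pipeline call, not part of this repo):

```python
import time
import statistics

def benchmark(fn, *args, warmup=3, iters=20):
    """Time fn(*args): a few warmup calls, then latency stats in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm caches / JIT before measuring
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Hypothetical stand-in for a translation pipeline call.
def run_model(text):
    return text[::-1]

stats = benchmark(run_model, "hello world")
```

Running the same harness over the baseline model and each optimized variant (Kernl, TorchScript, FlashAttention, FasterTransformer) gives comparable numbers before committing to one.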

@conceptofmind conceptofmind self-assigned this Mar 1, 2023
@conceptofmind
Owner Author

LoRA + DeepSpeed + FlashAttention + maybe 8-bit
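For the LoRA part of that stack, the core idea is a frozen weight `W` plus a trainable low-rank update `(alpha / r) * B @ A`. A NumPy sketch under illustrative sizes (all names and dimensions here are made up for the example, not taken from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4          # layer shape and LoRA rank (illustrative)
alpha = 8.0                  # LoRA scaling factor

W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # B starts at zero -> update is a no-op at init

def lora_forward(x):
    # y = x W^T + (alpha / r) * x (B A)^T, computed without forming B @ A
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, k))
y = lora_forward(x)

full_params = W.size            # what full fine-tuning would train
lora_params = A.size + B.size   # what LoRA trains instead
```

At initialization `B` is zero, so the output matches the frozen layer exactly; training only `A` and `B` touches a small fraction of the parameters, which is what makes it cheap to combine with DeepSpeed and 8-bit weights.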

@conceptofmind
Owner Author

Just gonna do GPTQ
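For intuition on what GPTQ buys: the baseline it improves on is plain round-to-nearest weight quantization. A NumPy sketch of per-row symmetric RTN (this is NOT GPTQ itself, which additionally corrects rounding error using second-order weight statistics; it only shows the quantize/dequantize mechanics and the error bound GPTQ beats):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # toy weight matrix

def quantize_rtn(W, bits=4):
    """Per-row symmetric round-to-nearest quantization (baseline, not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for signed 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

q, scale = quantize_rtn(W)
W_hat = dequantize(q, scale)
err = np.abs(W - W_hat).max()  # RTN error is at most half a quantization step
```

The per-element error of RTN is bounded by half a step (`scale / 2`); GPTQ-style methods trade a slightly more expensive offline pass for lower effective error at the same bit width, which is why it works as the single optimization here.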
