This repository has been archived by the owner on Jun 21, 2024. It is now read-only.
I need to optimize every tool that uses a Hugging Face model, such as NMT. Candidates include Kernl (which replaces parts of the compute graph with optimized kernels), torch.jit, and FlashAttention. Inference speed is the key metric for these tools.
We should also investigate FasterTransformer and Triton Inference Server.
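As a possible starting point before adopting Kernl or FasterTransformer, here is a minimal sketch of one of the optimizations mentioned above: swapping a naive attention implementation for PyTorch's fused `scaled_dot_product_attention`, which can dispatch to FlashAttention kernels on supported GPUs. This assumes PyTorch >= 2.0; the tensor shapes are illustrative, not taken from any of our tools.

```python
# Sketch: naive attention vs. PyTorch's fused SDPA, which can dispatch to
# FlashAttention kernels on supported hardware (assumes torch >= 2.0).
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Naive softmax(QK^T / sqrt(d)) V; materializes the full seq x seq
    # attention matrix, which is what the fused kernel avoids.
    scale = q.size(-1) ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# Illustrative shapes: (batch, heads, sequence length, head dim).
batch, heads, seq, dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

fused = F.scaled_dot_product_attention(q, k, v)  # fused kernel path
ref = eager_attention(q, k, v)
print(torch.allclose(fused, ref, atol=1e-4))
```

The two paths agree numerically, so the swap should be a drop-in change inside a model's attention module; the speed and memory win only shows up on long sequences and on hardware where the FlashAttention backend is available.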