【feature advice】Int8 mode to run original model #15

LiuLinyun · 2023-05-14T03:03:40Z

When fine-tuning with LoRA, the original model parameters are frozen, so they could be converted to INT8 for the forward pass, while the remaining trainable parameters stay in fp16/bf16/fp32/tf32. This is similar to what the peft library does.
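To make the idea concrete, here is a minimal pure-Python sketch of one linear layer under that split. It is not peft's or bitsandbytes' implementation. It assumes symmetric per-row absmax INT8 quantization for the frozen weight and keeps the LoRA factors `A` and `B` in float. The names `quantize_rows_int8` and `Int8LoRALinear` are hypothetical. The deterministic values used for `A` stand in for LoRA's usual random initialization.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def quantize_rows_int8(w: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Assumed scheme: symmetric per-row absmax, so w[i][j] ~= q[i][j] * scales[i]."""
    q_rows, scales = [], []
    for row in w:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127 if absmax > 0 else 1.0
        # Clamp to the symmetric int8 range [-127, 127].
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain float matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class Int8LoRALinear:
    """Hypothetical layer: frozen base weight stored as int8, trainable LoRA factors in float."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0):
        out_f, in_f = len(weight), len(weight[0])
        # Frozen path: quantized once and never updated during fine-tuning.
        self.q_weight, self.scales = quantize_rows_int8(weight)
        # Trainable path: A is (rank x in_f), B is (out_f x rank).
        # B starts at zero as in LoRA; A uses deterministic stand-in values here.
        self.lora_a = [[0.01 * (i + j + 1) for j in range(in_f)] for i in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_f)]
        self.scaling = alpha / rank

    def forward(self, x: Vector) -> Vector:
        # Base path: integer dot product per row, then rescale to float (dequantize).
        base = [s * sum(q * xi for q, xi in zip(row, x))
                for row, s in zip(self.q_weight, self.scales)]
        # Adapter path stays in full precision: B (A x).
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
layer = Int8LoRALinear(w, rank=2)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # close to matvec(w, x) because B starts at zero
```

Only `lora_a`, `lora_b` (and their gradients and optimizer state) would need higher precision. The frozen weight contributes a small quantization error instead, which is why the output above only approximates the full-precision result.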

Using INT8 to run the original model would save GPU memory and could accelerate training.
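The memory side of the argument is simple arithmetic on the frozen weights: one byte per parameter in INT8 versus two in fp16/bf16 and four in fp32 (tf32 is a compute mode whose values are stored as 4-byte fp32). The sketch below compares them. The 7-billion-parameter model size is a hypothetical example. The estimate ignores quantization scales, activations, and the (small) LoRA parameters with their optimizer state. Whether training actually gets faster depends on the INT8 kernels used, so the sketch does not attempt to estimate speed.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "tf32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def frozen_weight_bytes(num_params: int, dtype: str) -> int:
    """Storage for the frozen base weights only (no scales, activations, or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


num_params = 7_000_000_000  # hypothetical model size
for dtype in ("fp32", "fp16", "int8"):
    gib = frozen_weight_bytes(num_params, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

For any model size, moving the frozen weights from fp16/bf16 to INT8 halves their footprint, and moving them from fp32 quarters it.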
