Hi,
I tried converting to both 4 bits and 2 bits, but in both cases inference outputs strange characters:
(base) ✘-1 desktop:~/dev/projects/ai/pyllama [main|✔]> python quant_infer.py --wbits 2 --load ~/data/ai/models/llama/pyllama-7B2b.pt --text "the meaning of life is" --max_length 32 --cuda cuda:0
⌛️ Loading model from /home/nico/data/ai/models/llama/pyllama-7B2b.pt...
✅ Model from /home/nico/data/ai/models/llama/pyllama-7B2b.pt is loaded successfully.
********************************************************************************
🦙: the meaning of life is a aapsamama� Achami0i�am Tam-fz ofz-� Spatchz�
I followed the instructions in the README.md and ran the quantization this way:
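For the 2-bit model, the command was along the lines of the README example (the model id and calibration dataset below are taken from the README, so they are assumptions rather than my exact shell history):

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt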
The evaluation step couldn't complete because of a lack of GPU memory, but the quantized model was saved successfully.
Does anyone have any advice?

Same weird output for me, with the evaluation process completed:
python quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "what are the planets of the milkyway ?" --max_length 24 --cuda cuda:0
🦙: what are the planets of the milkyway? dress Albhttps SEpoispois AlbēattanRef osc Int
****************************** GPU/CPU/Latency Profiling ******************************
...