Hi,
I tried converting to both 4 bits and 2 bits, but in both cases inference outputs strange characters:
(base) ✘-1 desktop:~/dev/projects/ai/pyllama [main|✔]> python quant_infer.py --wbits 2 --load ~/data/ai/models/llama/pyllama-7B2b.pt --text "the meaning of life is" --max_length 32 --cuda cuda:0
⌛️ Loading model from /home/nico/data/ai/models/llama/pyllama-7B2b.pt...
✅ Model from /home/nico/data/ai/models/llama/pyllama-7B2b.pt is loaded successfully.
********************************************************************************
🦙: the meaning of life is a aapsamama� Achami0i�am Tam-fz ofz-� Spatchz�
I followed the instructions in the README.md and ran the quantization this way:
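For the 2-bit model, the command was along the lines of the README example (the model id and calibration dataset below are taken from the README, so they are assumptions rather than my exact shell history):

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt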
The evaluation step couldn't complete because of a lack of GPU memory, but the quantized model was saved successfully.
Does anyone have any advice?

Same weird output for me, with the evaluation process completed:
python quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "what are the planets of the milkyway ?" --max_length 24 --cuda cuda:0
🦙: what are the planets of the milkyway? dress Albhttps SEpoispois AlbēattanRef osc Int
****************************** GPU/CPU/Latency Profiling ******************************
...