Cannot get take_test.py to run due to TypeError: vecquant4matmul() #3
edit the file
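(Presumably the checkpoint's tokenizer_config.json. A minimal sketch of the usual workaround, with an illustrative path, that swaps the deprecated "LLaMATokenizer" spelling for the "LlamaTokenizer" class newer transformers releases expect:)

import json

# Hedged sketch: point this at wherever the checkpoint actually lives
# (the path below is illustrative, not from the repo).
cfg_path = "models/alpaca-lora-7b-4bit/tokenizer_config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

# Older checkpoints ship the deprecated "LLaMATokenizer" spelling.
cfg["tokenizer_class"] = "LlamaTokenizer"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)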
That did indeed clear the errors about the tokenizers at the beginning; however, I am still getting the other error at the bottom: "TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:"
I'm using PyTorch 2., CUDA 11.7, and Python 3.9 in my environment. I don't know enough to figure out where this is coming from, but earlier I hit "AttributeError: module 'quant_cuda' has no attribute 'vecquant4recons'". I checked my version of quant_cuda and indeed it had no such attribute, but it did have "vecquant4recons_v1" and "_v2", so I changed the code to use "_v1". So I am wondering if this is a versioning problem? If it might be, may I ask which version of quant_cuda you're using? Again, I'm on Windows 10 PowerShell in a miniconda env I built just for this install, so it is relatively clean. Thanks!
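P.S. A quick sketch of how to check which 4-bit kernels a quant_cuda build actually exports (plain introspection, nothing repo-specific):

import quant_cuda

# List every vecquant4 kernel the compiled extension exposes; on my build
# this shows vecquant4recons_v1/_v2 but no plain vecquant4recons.
print([name for name in dir(quant_cuda) if name.startswith("vecquant4")])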
I think this all comes from:

atreat:~/dev/large_language_models/haltt4llm (main)> pip show gptq_llama

because:

atreat:~/dev/large_language_models/haltt4llm (main)> grep quant_cuda *
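For completeness, a small sketch to confirm which compiled extension Python is actually picking up and which gptq_llama release is installed (assuming the package imports under these names, as the pip output above suggests):

import importlib.metadata
import quant_cuda

print(quant_cuda.__file__)                       # the compiled extension in use
print(importlib.metadata.version("gptq_llama"))  # installed gptq_llama version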
Running on Windows 10 in a conda build environment. I think I have all the right torch, CUDA, etc. modules installed, and everything in requirements.txt installed fine. I'm running the NOTA (None of the Above) trivia test on Alpaca Lora 7B (4-bit):
python take_test.py --trivia fake_trivia_questions.json
The model and weights are installed from the Hugging Face repos and in the correct directories, but there is some error before the first question about the tokenizer class being different, so this could be related to that. Anyway, I get the following error:
Found 1 GPU(s) available.
Using device: cuda:0
Loading Model ...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loaded the model in 2.36 seconds.
Fitting 4bit scales and zeros to half
Question 1: What type of energy is used to power the Salkran Tower of Pertinax in the Carpathian Mountains of Romania?
A. Solar
B. Wind
C. Gravitonic
D. None of the above
E. I don't know
Traceback (most recent call last):
File "D:\devgit\haltt4llm\take_test.py", line 189, in
main()
....
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None
Invoked with: tensor([[-0.0148, -0.0238, 0.0097, ..., 0.0231, -0.0175, 0.0318]],
device='cuda:0'), tensor([[ 2004248423, 2020046951, 1734903431, ..., -2024113529,
-1772648858, 1988708488],
[ 2004318071, 1985447543, 1719101303, ..., 1738958728,
1734834296, 1988584549],
[-2006481289, -2038991241, 2003200134, ..., -1734780278,
-2055714936, -1401572265],
...,
[-2022213769, -2021226889, 1735947895, ..., 2002357398,
1483176039, -1215859063],
[ 2005366614, -2022148249, 1752733576, ..., 394557864,
1986418055, 1483962710],
[ 1735820935, 1988720743, -2056755593, ..., -1468438152,
1718123383, 1150911352]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0318],
[0.0154],
[0.0123],
...,
[0.0191],
[0.0206],
[0.0137]], device='cuda:0'), tensor([[0.2229],
[0.1078],
[0.0860],
...,
[0.1528],
[0.1439],
[0.0959]], device='cuda:0')
Any ideas why?
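Edit: reading the TypeError again, the compiled extension expects six arguments (five tensors plus a trailing int), while the call site passes only five tensors, which looks like the Python quant layer and the quant_cuda build coming from different versions. A minimal sketch to print the signature the binding was actually compiled with (pybind11 puts it in the docstring):

import quant_cuda

# The pybind11-generated docstring lists the exact argument types the
# installed extension expects for this kernel.
print(quant_cuda.vecquant4matmul.__doc__)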