Cannot get take_test.py to run due to TypeError: vecquant4matmul() #3

jasoninch opened this issue Mar 31, 2023 · 3 comments

@jasoninch

Running on Windows 10 in a conda build environment. I think I have all the right torch, CUDA, etc. modules installed, and everything in requirements.txt installed fine. I'm running the NOTA (None of the Above) trivia test on Alpaca Lora 7B (4-bit):

python take_test.py --trivia fake_trivia_questions.json

The model and weights were installed from the Hugging Face repos into the correct directories, but there is a warning before the first question about the tokenizer class being different, so this could be related. Anyway, I get the following error:

Found 1 GPU(s) available.
Using device: cuda:0
Loading Model ...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loaded the model in 2.36 seconds.
Fitting 4bit scales and zeros to half
Question 1: What type of energy is used to power the Salkran Tower of Pertinax in the Carpathian Mountains of Romania?
A. Solar
B. Wind
C. Gravitonic
D. None of the above
E. I don't know
Traceback (most recent call last):
File "D:\devgit\haltt4llm\take_test.py", line 189, in
main()

....

TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None

Invoked with: tensor([[-0.0148, -0.0238, 0.0097, ..., 0.0231, -0.0175, 0.0318]],
device='cuda:0'), tensor([[ 2004248423, 2020046951, 1734903431, ..., -2024113529,
-1772648858, 1988708488],
[ 2004318071, 1985447543, 1719101303, ..., 1738958728,
1734834296, 1988584549],
[-2006481289, -2038991241, 2003200134, ..., -1734780278,
-2055714936, -1401572265],
...,
[-2022213769, -2021226889, 1735947895, ..., 2002357398,
1483176039, -1215859063],
[ 2005366614, -2022148249, 1752733576, ..., 394557864,
1986418055, 1483962710],
[ 1735820935, 1988720743, -2056755593, ..., -1468438152,
1718123383, 1150911352]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0318],
[0.0154],
[0.0123],
...,
[0.0191],
[0.0206],
[0.0137]], device='cuda:0'), tensor([[0.2229],
[0.1078],
[0.0860],
...,
[0.1528],
[0.1439],
[0.0959]], device='cuda:0')

Any ideas why?

@manyoso
Owner

manyoso commented Mar 31, 2023

Edit the file ./models/llama-7b-hf/tokenizer_config.json and replace the string LLaMATokenizer with LlamaTokenizer.
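For example, a minimal sketch of that edit in Python (the path comes from the line above; adjust it if your checkpoint lives elsewhere):

import json

# Sketch: rewrite the tokenizer class name in the checkpoint's tokenizer config.
path = "./models/llama-7b-hf/tokenizer_config.json"
with open(path) as f:
    config = json.load(f)

config["tokenizer_class"] = "LlamaTokenizer"  # was "LLaMATokenizer"

with open(path, "w") as f:
    json.dump(config, f, indent=2)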

@jasoninch
Author

That did indeed clear the errors about the tokenizers at the beginning; however, I am still getting that other error at the bottom:

TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
  1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None

I'm using PyTorch 2, CUDA 11.7, and Python 3.9 in my environment. I don't know enough to figure out where this is coming from, but earlier I hit "AttributeError: module 'quant_cuda' has no attribute 'vecquant4recons'". I checked my version of quant_cuda and it indeed had no such attribute, though it did have vecquant4recons_v1 and _v2, so I changed the code to use _v1. So I am wondering if this is a versioning problem?
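In case it helps, this is roughly how I checked which kernels my build exposes (a quick diagnostic sketch, assuming quant_cuda imports the same way the repo's code does):

# Diagnostic sketch: list the 4-bit kernels the installed quant_cuda build exposes.
from gptq_llama import quant_cuda

print([name for name in dir(quant_cuda) if name.startswith("vecquant4")])
# Mine shows vecquant4recons_v1 and vecquant4recons_v2 but no plain
# vecquant4recons, hence the AttributeError above.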

If this might be part of the problem, may I ask which version of quant_cuda you're using? Again, I'm on Windows 10 in a PowerShell miniconda env I built just for this install, so it is relatively clean. Thanks!

@adtreat
Collaborator

adtreat commented Mar 31, 2023

I think this all comes from:

atreat:~/dev/large_language_models/haltt4llm (main)> pip show gptq_llama
Name: gptq-llama
Version: 0.1
Summary: GPTQ for Llama
Home-page:
Author:
Author-email:
License:
Location: /home/atreat/.local/lib/python3.10/site-packages
Requires: torch
Required-by:

because:

atreat:~/dev/large_language_models/haltt4llm (main)> grep quant_cuda *
grep: loras: Is a directory
matmul_utils_4bit.py:from gptq_llama import quant_cuda
matmul_utils_4bit.py: quant_cuda.vecquant4matmul_v1_faster(x, qweight, y, scales, zeros)
matmul_utils_4bit.py: quant_cuda.vecquant4matmul_faster(x, qweight, y, scales, zeros, groupsize, x.shape[-1] // 2)
matmul_utils_4bit.py: quant_cuda.vecquant4recons_v1(qweight, buffer, scales, zeros)
matmul_utils_4bit.py: quant_cuda.vecquant4recons_v2(qweight, buffer, scales, zeros, groupsize)
matmul_utils_4bit.py: quant_cuda.vecquant4recons_v1(z_mat, z_buffer, z_scales, z_zeros)
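The error format in the traceback is pybind11's signature-mismatch message, and pybind11 puts each bound function's accepted C++ signature in its docstring, so a quick check like this (a diagnostic sketch, not part of the repo) shows whether the installed build matches the call sites above:

# Sketch: compare the installed kernels against the calls in matmul_utils_4bit.py.
from gptq_llama import quant_cuda

for name in ("vecquant4matmul_v1_faster", "vecquant4matmul_faster",
             "vecquant4recons_v1", "vecquant4recons_v2"):
    fn = getattr(quant_cuda, name, None)
    # pybind11-generated functions carry their bound signature in __doc__.
    print(name, "->", fn.__doc__ if fn else "not present in this build")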
