Getting ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) when running inference on a model loaded via HF from_pretrained() with device_map="auto".
Error
File "/home/coder/Liger-Kernel/src/liger_kernel/ops/swiglu.py", line 111, in forward
a, b, c = swiglu_forward(a, b)
^^^^^^^^^^^^^^^^^^^^
File "/home/coder/Liger-Kernel/src/liger_kernel/ops/swiglu.py", line 74, in swiglu_forward
_swiglu_forward_kernel[(n_rows,)](
File "/usr/local/lib/python3.11/site-packages/triton/runtime/jit.py", line 345, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/triton/runtime/jit.py", line 691, in run
kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
File "/usr/local/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 365, in __call__
self.launch(*args, **kwargs)
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
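For context: Triton kernels can only be launched on CUDA tensors, so the traceback suggests a CPU tensor reached the patched SwiGLU forward. With device_map="auto" across several GPUs, accelerate may offload some modules to CPU if it estimates GPU memory is tight, and those modules would then feed CPU tensors into Liger's Triton kernels. A quick diagnostic (a sketch, not part of the original repro; hf_device_map is the attribute transformers sets when a device map is used) is to inspect where accelerate actually placed the layers:

# Diagnostic sketch: run after loading the model as in the Reproduce section.
# Any "cpu" or "disk" entries mean some layers will hand CPU tensors to Triton.
print(model.hf_device_map)
offloaded = {name: dev for name, dev in model.hf_device_map.items() if str(dev) in ("cpu", "disk")}
print("offloaded modules:", offloaded or "none")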
Reproduce
from liger_kernel.transformers import AutoLigerKernelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Qwen/Qwen2.5-Math-1.5B-Instruct"
device = "cuda"  # the device to load the model onto

apply_liger_kernel_to_qwen2(
    cross_entropy=False,
    fused_linear_cross_entropy=False,
    rms_norm=True,
    rope=True,
    swiglu=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_cache=False,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."  # CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"{response=}")
Versions
Operating System: Linux-4.4.0-x86_64-with-glibc2.36
Python version: 3.11.5
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
Triton version: 3.1.0
Transformers version: 4.46.3
GPU: L4 x 4
Liger-Kernel version: 0.4.2
Accelerate version: 1.1.1