
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) #401

Open
shivam15s opened this issue Nov 20, 2024 · 1 comment

Collaborator

shivam15s commented Nov 20, 2024

🐛 Describe the bug

Inference fails with ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) when the model is loaded through HF from_pretrained() with device_map="auto" and the Liger kernels are applied.

Error

  File "/home/coder/Liger-Kernel/src/liger_kernel/ops/swiglu.py", line 111, in forward
    a, b, c = swiglu_forward(a, b)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/Liger-Kernel/src/liger_kernel/ops/swiglu.py", line 74, in swiglu_forward
    _swiglu_forward_kernel[(n_rows,)](
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/jit.py", line 345, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/jit.py", line 691, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
  File "/usr/local/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 365, in __call__
    self.launch(*args, **kwargs)
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)

Reproduce

import torch
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-1.5B-Instruct"
device = "cuda"  # device for the tokenized inputs; the model itself is placed by device_map="auto"

apply_liger_kernel_to_qwen2(cross_entropy=False,
                            fused_linear_cross_entropy=False,
                            rms_norm=True,
                            rope=True,
                            swiglu=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_cache=False,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

# CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"{response=}")

Versions


Operating System: Linux-4.4.0-x86_64-with-glibc2.36
Python version: 3.11.5
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
Triton version: 3.1.0
Transformers version: 4.46.3
GPU: L4 x 4
Liger-Kernel version: 0.4.2
Accelerate version: 1.1.1
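
Since the machine has four GPUs, device_map="auto" may have spread the model over several devices or offloaded part of it, which would be worth ruling in or out before digging into the kernel. A minimal sketch for inspecting the placement, assuming the loaded model exposes `hf_device_map` (transformers sets it when a device map is used); `summarize_device_map`, `placement_warnings`, and the sample map are hypothetical names and data for illustration, not part of Liger-Kernel or transformers:

```python
from collections import defaultdict

# Labels used in a device map for weights that are kept off the GPU.
OFFLOAD_DEVICES = {"cpu", "disk"}


def summarize_device_map(device_map):
    """Group module names by the device they were placed on."""
    by_device = defaultdict(list)
    for module_name, device in device_map.items():
        by_device[str(device)].append(module_name)
    return dict(by_device)


def placement_warnings(device_map):
    """Return short notes about placements that are worth checking."""
    devices = {str(device) for device in device_map.values()}
    notes = []
    offloaded = sorted(devices & OFFLOAD_DEVICES)
    if offloaded:
        notes.append(f"modules offloaded to: {', '.join(offloaded)}")
    accelerators = sorted(devices - OFFLOAD_DEVICES)
    if len(accelerators) > 1:
        notes.append(f"model sharded across devices: {', '.join(accelerators)}")
    return notes


# Made-up map for illustration; on a real run, pass model.hf_device_map.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": "cpu",
}
print(summarize_device_map(example_map))
for note in placement_warnings(example_map):
    print(note)
```

On the repro above, calling `placement_warnings(model.hf_device_map)` right after from_pretrained() would show whether the failing layer could sit on a device other than the one the inputs were moved to.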

Collaborator Author

triton-lang/triton#5205
