
🐛 [Bug] Transformers GPT2 Model does not compile via FX Path #1741

Closed
@gs-olive

Description

Bug Description

When compiling the GPT2 model via the FX path, the following errors are encountered. Note that the model can optionally be pre-traced using the Hugging Face symbolic tracer; both cases (Pre-Traced / NOT Pre-Traced) are shown below.

NOT Pre-Traced: torch_tensorrt.fx.compile(model, is_aten=True, min_acc_module_size=5,...)
    fx_trt_model = torch_tensorrt.fx.compile(model, [input_ids, attention_mask],
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 86, in compile
    return lowerer(module, input)
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 316, in __call__
    return do_lower(module, inputs)
  File "~/TensorRT/py/torch_tensorrt/fx/passes/pass_utils.py", line 117, in pass_with_validation
    res0 = module(*input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 799, in forward
    past_length = past_key_values[0][0].size(-2)
IndexError: Dimension specified as -2 but tensor has no dimensions
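For reference, this `IndexError` is what PyTorch raises when `.size(-2)` is called on a zero-dimensional tensor, which suggests the `past_key_values` placeholder is materialized as a scalar tensor rather than `None` during validation. A minimal sketch of the failing operation (not the actual lowering code):

```python
import torch

# A zero-dimensional (scalar) tensor has no dimensions to index.
scalar = torch.tensor(0)

try:
    # Mirrors past_key_values[0][0].size(-2) in modeling_gpt2.py
    scalar.size(-2)
except IndexError as exc:
    print(exc)
```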
Pre-Traced: torch_tensorrt.fx.compile(model, is_aten=True, min_acc_module_size=5,...)
    fx_trt_model = torch_tensorrt.fx.compile(traced, [input_ids, attention_mask],
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 86, in compile
    return lowerer(module, input)
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 316, in __call__
    return do_lower(module, inputs)
  File "~/TensorRT/py/torch_tensorrt/fx/passes/pass_utils.py", line 118, in pass_with_validation
    processed_module = pass_(module, input, *args, **kwargs)
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 313, in do_lower
    lower_result = pm(module)
  File "/usr/local/lib/python3.8/dist-packages/torch/fx/passes/pass_manager.py", line 246, in __call__
    out = _pass(out)
  File "~/TensorRT/py/torch_tensorrt/fx/passes/lower_pass_manager_builder.py", line 68, in wrapped_fn
    return fn(gm, input)
  File "~/TensorRT/py/torch_tensorrt/fx/lower.py", line 262, in <lambda>
    trace_func=lambda module, inputs: aten_tracer.opt_trace(
  File "~/TensorRT/py/torch_tensorrt/fx/tracer/dispatch_tracer/aten_tracer.py", line 159, in opt_trace
    pr: PassResult = passes(fx_module)
  File "~/TensorRT/py/torch_tensorrt/fx/passes/lower_basic_pass_aten.py", line 447, in compose_bmm
    new_func,
UnboundLocalError: local variable 'new_func' referenced before assignment
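The `UnboundLocalError` follows the usual Python pattern: `new_func` is presumably assigned only inside certain branches of `compose_bmm`, so a graph containing a node shape the pass does not expect reaches the final reference with the name unbound. A minimal, hypothetical illustration (not the actual pass code):

```python
def pick_replacement(op_name):
    # new_func is bound only in the branches below; any unhandled
    # op_name falls through and the return raises UnboundLocalError.
    if op_name == "bmm":
        new_func = "aten.bmm.default"
    elif op_name == "baddbmm":
        new_func = "aten.baddbmm.default"
    return new_func

pick_replacement("bmm")  # OK

try:
    pick_replacement("matmul")  # unhandled branch
except UnboundLocalError as exc:
    print(exc)
```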
Pre-Traced: torch_tensorrt.fx.compile(model, is_aten=True, min_acc_module_size=5,...) + PR #1708 - Compilation Functional
Got 12 acc subgraphs and 13 non-acc subgraphs
Compilation Successful

To Reproduce

Steps to reproduce the behavior:

  1. Initialize model: GPT2Model.from_pretrained("gpt2").eval().cuda()
  2. Initialize the two input tensors ("input_ids" and "attention_mask"), for example: torch.randint(0, 1, (1, 14), dtype=torch.int32).to("cuda")
  3. (Optional) Use the transformers tools to trace the model via: transformers.utils.fx.symbolic_trace(model, input_names=["input_ids", "attention_mask"])
  4. Compile the model using FX

Expected behavior

The model should compile successfully via the FX path.

Environment

  • Transformers: 4.26.1
  • Torch-TensorRT Version: fce0a01
  • PyTorch Version: 2.1.0.dev20230313+cu117
  • CPU Architecture: Intel Xeon CPU
  • OS: Ubuntu 20.04
  • How you installed PyTorch: pip
  • Build command you used: python setup.py develop
  • Are you using local sources or building from archives: local
  • Python version: 3.8.13
  • CUDA version: 11.7
