
🐛 [Bug] Torch-TensorRT silently converts a boolean mask into an integer index containing zeros and ones #2024

Closed
airalcorn2 opened this issue Jun 16, 2023 · 3 comments
Labels: bug (Something isn't working), component: converters (Issues re: Specific op converters), No Activity

airalcorn2 commented Jun 16, 2023

Bug Description

When a PyTorch model calculates a mask using something like this:

mask = vals < mask_val

and uses it to index a Tensor like this:

vals[mask]

the Torch-TensorRT model effectively does this:

vals[mask.long()]
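To see why this matters, here is a minimal plain-PyTorch sketch (the values are made up for demonstration) contrasting boolean indexing with indexing by the cast mask:

```python
import torch

vals = torch.tensor([0.9, 0.1, 0.8, 0.2])
mask = vals < 0.5  # tensor([False, True, False, True])

# Boolean indexing selects the elements where the mask is True.
bool_indexed = vals[mask]          # tensor([0.1000, 0.2000])

# Integer indexing with the cast mask instead gathers at indices [0, 1, 0, 1].
long_indexed = vals[mask.long()]   # tensor([0.9000, 0.1000, 0.9000, 0.1000])
```

The two results differ in both shape and content, which is exactly the divergence described above.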

Replacing the mask indexing operation with torch.masked_select, e.g.:

torch.masked_select(vals, mask)

makes the Torch-TensorRT model work correctly.
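In eager PyTorch the two forms are equivalent, which is why the workaround is safe (a quick check with made-up values):

```python
import torch

vals = torch.tensor([0.9, 0.1, 0.8, 0.2])
mask = vals < 0.5

# torch.masked_select returns the same 1-D tensor as boolean indexing.
same = torch.equal(torch.masked_select(vals, mask), vals[mask])
print(same)  # True
```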

While the PyTorch model and its Torch-TensorRT counterpart happen to produce Tensors of different shapes here, a common use case for this kind of masking involves placing the masked values into a separate, fixed-size Tensor. In that case, the outputs of the PyTorch model and the Torch-TensorRT model will differ wildly, but because the mask casting happens silently, the discrepancy is difficult to debug. As far as I can tell, this behavior isn't mentioned in the documentation, and the only related material I could find when searching for this issue was this GitHub discussion.
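A minimal sketch of the fixed-size-buffer use case (the buffer size and values here are hypothetical, not from the original report): with the silent cast, the "selected" tensor has as many elements as the mask itself, so everything written into the buffer differs.

```python
import torch

vals = torch.tensor([0.9, 0.1, 0.8, 0.2])
mask = vals < 0.5

# Correct eager behavior: two values survive the mask.
selected = vals[mask]                  # tensor([0.1000, 0.2000])
buffer = torch.zeros(4)
buffer[: selected.numel()] = selected  # [0.1, 0.2, 0.0, 0.0]

# With the silent bool->long cast, four values are gathered instead,
# so the same downstream buffer ends up completely different.
bad_selected = vals[mask.long()]       # tensor([0.9, 0.1, 0.9, 0.1])
bad_buffer = torch.zeros(4)
bad_buffer[: bad_selected.numel()] = bad_selected
```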

To Reproduce

import torch
import torch_tensorrt

from torch import nn


class Masker(nn.Module):
    def __init__(self, use_masked_select, mask_val):
        super().__init__()
        self.use_masked_select = use_masked_select
        self.mask_val = mask_val

    def forward(self, vals):
        mask = vals < self.mask_val
        if self.use_masked_select:
            return torch.masked_select(vals, mask)
        else:
            return vals[mask]


torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Error)

device = torch.device("cuda:0")
vals = torch.rand(20).to(device)
inputs = [torch_tensorrt.Input(vals.shape)]
mask_val = 0.5
for use_masked_select in [False, True]:
    model = Masker(use_masked_select, mask_val).to(device)
    trt_model = torch_tensorrt.compile(model, inputs=inputs)
    with torch.no_grad():
        pt_out = model(vals)
        trt_out = trt_model(vals)

        print(f"use_masked_select: {use_masked_select}")
        print(f"pt_out.shape: {pt_out.shape}")
        print(f"trt_out.shape: {trt_out.shape}")
        if use_masked_select:
            print(f"(pt_out == trt_out).all(): {(pt_out == trt_out).all()}")
        else:
            # Reproduce the buggy behavior: index with the boolean mask
            # cast to 0/1 integer indices, as Torch-TensorRT effectively does.
            mask_0_1s = (vals < mask_val).long()
            pt_out_0_1s = pt_out[mask_0_1s]
            print(f"(pt_out_0_1s == trt_out).all(): {(pt_out_0_1s == trt_out).all()}\n")

Expected behavior

Either behave the same as boolean mask indexing does in PyTorch, or raise a warning or error when a mask indexing operation is detected and instruct the user to use torch.masked_select instead.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.4.0
  • PyTorch Version (e.g. 1.0): 2.0.1+cu117
  • CPU Architecture: i7-12800H
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.10.10
  • CUDA version: 11.7
  • GPU models and configuration: GeForce RTX 3080 Ti
  • Any other relevant information:

Additional context

@airalcorn2 airalcorn2 added the bug Something isn't working label Jun 16, 2023
@narendasan narendasan added the component: converters Issues re: Specific op converters label Jun 20, 2023
@narendasan (Collaborator) commented:

We will have to look at the specific operation decomposition here, but there are some limitations around boolean inputs in TensorRT: in certain places we cast to int so that the input will be accepted.

@github-actions commented:

This issue has not seen activity for 90 days. Remove the stale label or add a comment, or this will be closed in 10 days.

@airalcorn2 (Author) commented:

I no longer experience this bug when using PyTorch 2.0.1, Torch-TensorRT 1.4.0, and CUDA 12.2, and compiling the model with:

trt_model = torch.compile(model, backend="torch_tensorrt")
