
Error Code 10: Internal Error (Could not find any implementation for node) failure of TensorRT 8.5 when running on GPU Jetson Xavier NX #4255

Open
fettahyildizz opened this issue Nov 20, 2024 · 6 comments


@fettahyildizz

fettahyildizz commented Nov 20, 2024

Description

When I try to convert the SuperPoint model from ONNX to a TensorRT engine using trtexec, I get the following error:

[optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Flatten...(Unnamed Layer* 139) [Shuffle]]}

The conversion works with TensorRT 8.6, but our target is the Jetson Xavier NX, and the latest JetPack version supported on the Xavier NX ships TensorRT 8.5, so upgrading TensorRT is not an option for now.

Environment

TensorRT Version: 8.5

NVIDIA GPU: Jetson Xavier NX

CUDA Version: 11.4

Operating System: Jetpack 5.1.4

Python Version (if applicable): 3.8

Relevant Files

Model link: Superpoint ONNX model

Steps To Reproduce

/trtexec --onnx=superpoint_v1.onnx --saveEngine=superpoint_v1.trt
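When iterating on a fix, the useful part of the trtexec output is the name of the node TensorRT could not implement. A small, hypothetical helper that pulls those names out of a saved log (for example one captured with `--verbose` and `2>&1 | tee build.log`) could look like this; the regular expression is an assumption based only on the error text quoted above.

```python
import re
from typing import List

# Assumed message format, taken from the error quoted in this issue:
# "... Could not find any implementation for node {ForeignNode[...]}"
_NODE_RE = re.compile(r"Could not find any implementation for node \{(.+?)\}")


def failing_nodes(log_text: str) -> List[str]:
    """Return the names of all nodes reported as having no implementation."""
    return _NODE_RE.findall(log_text)
```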

@lix19937

You may need to locate the code block that corresponds to ForeignNode[/Flatten...(Unnamed Layer* 139) and rewrite some of its ops.
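TensorRT appears to report a fused region under one name of the form `ForeignNode[first...last]`, so the ops to look for in the ONNX graph presumably run from the first layer name to the last. A hypothetical helper to split such a name, assuming the first layer name itself contains no `...`:

```python
from typing import Tuple


def foreign_node_range(name: str) -> Tuple[str, str]:
    """Split '{ForeignNode[first...last]}' into (first, last)."""
    name = name.strip("{}")  # the error message wraps the name in braces
    prefix = "ForeignNode["
    if not (name.startswith(prefix) and name.endswith("]")):
        raise ValueError(f"not a ForeignNode name: {name!r}")
    inner = name[len(prefix):-1]  # drop 'ForeignNode[' and the final ']'
    first, sep, last = inner.partition("...")
    if not sep:
        # No '...': the region is a single layer.
        return inner, inner
    return first, last
```

Layers shown as `(Unnamed Layer* N)` are created inside TensorRT and likely have no ONNX node of that name, so the first name (`/Flatten` here) is probably the better starting point when searching the graph.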

@fettahyildizz
Author


Hi @lix19937, I haven't done anything like this before. Is there some example code I can follow? Do I need to rewrite the ops in the PyTorch model or in TensorRT?

@lix19937

That is, you need to figure out which network modules (layers/ops) correspond to this foreign node, using the surrounding context and the key layer information in the torch forward graph. Usually you can do this by exporting the model gradually and converting each partial export with trtexec to see where it starts failing, for example by going from

def forward(self, x, y):
    a = self.module1(x)
    b = self.module2(a)
    c = self.module3(y)
    return b * c

to

def forward(self, x, y):
    a = self.module1(x)
    # b = self.module2(a)
    # c = self.module3(y)
    return a

or

def forward(self, x, y):
    a = self.module1(x)
    b = self.module2(a)
    # c = self.module3(y)
    return b
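If the forward pass can be cut into an ordered list of stages, this gradual export amounts to finding the first stage whose inclusion breaks the engine build, which can be done as a binary search instead of one stage at a time. In this sketch `builds(n)` is a hypothetical callback you would write yourself (export the model truncated after `n` stages to ONNX, run trtexec, return whether it succeeded), and it is assumed that once a prefix fails, every longer prefix fails too.

```python
from typing import Callable


def first_failing_stage(num_stages: int, builds: Callable[[int], bool]) -> int:
    """Return the 1-based index of the first stage that breaks the build.

    Assumes builds(0) is True, builds(num_stages) is False, and the results
    are monotonic in n.
    """
    lo, hi = 0, num_stages  # invariant: prefix lo builds, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if builds(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

With N stages this needs about log2(N) conversions, which matters when each trtexec run on the Jetson takes minutes.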

@fettahyildizz
Author

fettahyildizz commented Nov 20, 2024

Hi @lix19937, what I don't understand is that Flatten is a basic operation; there is no way TensorRT wouldn't support it in 8.5 and only start supporting it in 8.6. I feel like I'm missing something basic here.

[Image: screenshot of the Flatten node in the ONNX graph]

This is the only Flatten node in my ONNX model.

I've shared the forward method below. I couldn't find which part of it maps to the Flatten node.

def forward(self, data):
    """ Compute keypoints, scores, descriptors for image """
    # Shared Encoder
    x = self.relu(self.conv1a(data))
    x = self.relu(self.conv1b(x))
    x = self.pool(x)
    x = self.relu(self.conv2a(x))
    x = self.relu(self.conv2b(x))
    x = self.pool(x)
    x = self.relu(self.conv3a(x))
    x = self.relu(self.conv3b(x))
    x = self.pool(x)
    x = self.relu(self.conv4a(x))
    x = self.relu(self.conv4b(x))

    # Compute the dense keypoint scores
    cPa = self.relu(self.convPa(x))
    scores = self.convPb(cPa)
    scores = torch.nn.functional.softmax(scores, 1)[:, :-1]
    b, _, h, w = scores.shape
    scores = scores.permute(0, 2, 3, 1).reshape(b, h, w, 8, 8)
    scores = scores.permute(0, 1, 3, 2, 4).reshape(b, h * 8, w * 8)
    scores = simple_nms(scores, default_config['nms_radius'])

    # Extract keypoints
    keypoints = [
        torch.nonzero(s > default_config['keypoint_threshold'])
        for s in scores]
    scores = [s[tuple(k.t())] for s, k in zip(scores, keypoints)]

    # Discard keypoints near the image borders
    keypoints, scores = list(zip(*[
        remove_borders(k, s, default_config['remove_borders'], h * 8, w * 8)
        for k, s in zip(keypoints, scores)]))

    # Keep the k keypoints with highest score
    if default_config['max_keypoints'] >= 0:
        keypoints, scores = list(zip(*[
            top_k_keypoints(k, s, default_config['max_keypoints'])
            for k, s in zip(keypoints, scores)]))

    # Convert (h, w) to (x, y)
    keypoints = [torch.flip(k, [1]).float() for k in keypoints]

    # Compute the dense descriptors
    cDa = self.relu(self.convDa(x))
    descriptors = self.convDb(cDa)
    descriptors = torch.nn.functional.normalize(descriptors, p=2, dim=1)

    # Extract descriptors
    descriptors = [sample_descriptors(k[None], d[None], 8)[0]
                   for k, d in zip(keypoints, descriptors)]

    return {
        'keypoints': keypoints,
        'scores': scores,
        'descriptors': descriptors,
    }
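One common workaround for graphs like this (an assumption here, not something confirmed in this thread) is to export only the dense convolutional part, i.e. return the raw outputs of `convPb` and `convDb`, and run the data-dependent steps on the host. As a sketch of what the host side would then do, here is a NumPy version of the score decoding in `forward()` above: softmax over channels, dropping the last ("dustbin", no-keypoint) channel, and the two permute/reshape steps that turn each 64-channel cell into an 8x8 patch. The function name is hypothetical.

```python
import numpy as np


def decode_scores(logits: np.ndarray) -> np.ndarray:
    """NumPy mirror of the dense score decoding in forward().

    logits: (b, 65, h, w) raw output of convPb.
    Returns a (b, h*8, w*8) full-resolution score map.
    """
    # Numerically stable softmax over the channel axis (dim=1 in the PyTorch code).
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    probs = probs[:, :-1]  # drop the dustbin channel -> (b, 64, h, w)
    b, _, h, w = probs.shape
    # Same two permute/reshape steps as forward(): channel c of cell (i, j)
    # lands at row i*8 + c//8, column j*8 + c%8.
    probs = probs.transpose(0, 2, 3, 1).reshape(b, h, w, 8, 8)
    probs = probs.transpose(0, 1, 3, 2, 4).reshape(b, h * 8, w * 8)
    return probs
```

NMS (`simple_nms`), thresholding, and the border and top-k filtering would then run on this array; the descriptor map could likewise be returned raw and normalised and sampled outside the engine.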

@fettahyildizz
Author

@lix19937 I have figured out that the line causing the problem seems to be this one:
scores = [s[tuple(k.t())] for s, k in zip(scores, keypoints)]
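That line gathers one score per keypoint, and the number of keypoints comes from `torch.nonzero`, so its output shape depends on the input data rather than being fixed at build time; that is plausibly why TensorRT 8.5 cannot find an implementation for the fused node around it. If the step is moved off the engine, a NumPy sketch of the equivalent host-side code could look like this (`np.argwhere` plays the role of `torch.nonzero`; the function name is hypothetical):

```python
from typing import List, Tuple

import numpy as np


def extract_keypoints(scores: np.ndarray,
                      threshold: float) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Host-side equivalent of the 'Extract keypoints' step in forward().

    scores: (b, H, W) score map, e.g. after NMS.
    Returns, per image, an (N_i, 2) array of (row, col) keypoints and the
    (N_i,) scores at those positions.
    """
    # np.argwhere returns an (N, 2) index array, like torch.nonzero on a 2-D tensor.
    keypoints = [np.argwhere(s > threshold) for s in scores]
    # s[tuple(k.T)] reads s[row, col] for every keypoint; N_i is only known at run time.
    kp_scores = [s[tuple(k.T)] for s, k in zip(scores, keypoints)]
    return keypoints, kp_scores
```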

@lix19937

Before you rewrite the code that maps to the foreign node, you can try using onnx-simplifier or Polygraphy to optimize your ONNX model, then run trtexec again.
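A minimal sketch of that two-step flow, assuming the `onnxsim` CLI from onnx-simplifier or the `polygraphy` CLI is installed. The helper name, the `_sim.onnx` output naming, and the `run` switch are hypothetical; `trtexec` is a parameter because on Jetson it is typically found under `/usr/src/tensorrt/bin` rather than on `PATH`.

```python
import os
import subprocess
from typing import List


def simplify_and_build(onnx_path: str, engine_path: str, tool: str = "onnxsim",
                       trtexec: str = "trtexec", run: bool = False) -> List[List[str]]:
    """Return (and optionally run) the commands: simplify the model, then build it."""
    base, _ = os.path.splitext(onnx_path)
    simplified = base + "_sim.onnx"
    if tool == "onnxsim":
        simplify_cmd = ["onnxsim", onnx_path, simplified]
    elif tool == "polygraphy":
        # Constant folding via Polygraphy's ONNX sanitizer.
        simplify_cmd = ["polygraphy", "surgeon", "sanitize", onnx_path,
                        "--fold-constants", "-o", simplified]
    else:
        raise ValueError(f"unknown tool: {tool}")
    build_cmd = [trtexec, f"--onnx={simplified}", f"--saveEngine={engine_path}"]
    commands = [simplify_cmd, build_cmd]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```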
