the speed of TRT #22

Open
JudasDie opened this issue Jun 8, 2023 · 6 comments

JudasDie commented Jun 8, 2023

Hi, thanks for the great work. I have tried to apply vanilla-9 to object detection; however, when converting the model to TensorRT, it seems much slower than ResNet-34. Is there any guidance? Thanks in advance.

ggjy (Collaborator) commented Jun 12, 2023

In object detection, the input image size is much larger than the 224×224 used on ImageNet. You can try FP16, lower the input resolution, or choose a platform that suits VanillaNet better (e.g., an A100) to narrow the gap between vanilla-9 and ResNet-34.
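
For illustration, a minimal sketch of building an FP16 engine from an ONNX export with the TensorRT 8.x Python API; the file names and the ONNX export of the detector itself are assumptions for the example, not something shipped with this repository:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# "vanilla9_det.onnx" is a hypothetical ONNX export of the detection model
with open("vanilla9_det.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where supported

serialized = builder.build_serialized_network(network, config)
with open("vanilla9_det_fp16.engine", "wb") as f:
    f.write(serialized)
```

Lowering the input resolution would then just be a matter of exporting the ONNX model with a smaller input shape before building the engine.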

@rememberBr

Hello, I've been testing the speed of VanillaNet recently. I tried VanillaNet6 with an image size of 128×128, which is smaller than 224×224. Using TensorRT for inference, MobileNetV3-Large takes 0.028 ms while VanillaNet6 takes 0.096 ms. The GPU is an A40. This is inconsistent with the results obtained by running PyTorch directly, as in the paper. It seems that VanillaNet is not good at fast inference in TRT?

@HantingChen (Collaborator)

Thank you for sharing your observations regarding the speed of VanillaNet under TensorRT. Note that changing the input resolution can lead to variations in speed. We recommend testing with an input size of 224×224, as per the model's original design and the conditions under which it was benchmarked in our paper.

If you're using an input size of 128×128, the model might require a redesign to optimize its performance for that resolution. The discrepancy you observed when comparing VanillaNet6 with MobileNetV3-Large under TensorRT might be attributed to this change in input size, which can significantly affect how the model processes data and, consequently, its inference speed.

@rememberBr

Hello, thanks for your reply. I tried a 224 model and, with the batch size set to 64, the speed matched the README: about 0.3 ms (but MobileNetV3-Large only takes 0.1 ms, emmmm). It seems that MobileNet is faster when using TensorRT. I ran test_latency.py again and found the conclusion consistent with the paper. However, when I set batch_size to 64 in the line "data_loader_val = create_loader(dataset_val, input_size=size, batch_size=1, is_training=False, use_prefetcher=False)" in test_latency.py, something unexpected happened: VanillaNet6 takes 0.8 ms while MobileNetV3-Large only takes 0.25 ms. Is my testing method wrong? This conclusion frustrates me; you could try it as well and see whether my conclusion is wrong.
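
For reference, a TensorRT-side timing loop along these lines might look like the sketch below (TensorRT 8.x Python API; the engine file name, binding order, and the 1000-class output shape are illustrative assumptions, not details taken from this thread):

```python
import time
import torch
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
# "vanillanet6_b64.engine" is a placeholder for an engine built with a fixed batch of 64
with open("vanillanet6_b64.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

batch = 64
inp = torch.randn(batch, 3, 224, 224, device="cuda")
out = torch.empty(batch, 1000, device="cuda")  # assumed ImageNet classifier output

# execute_v2 expects raw device pointers in binding order (input first, output second here)
bindings = [int(inp.data_ptr()), int(out.data_ptr())]

for _ in range(20):                # warm-up, not timed
    context.execute_v2(bindings)
torch.cuda.synchronize()

runs = 100
start = time.time()
for _ in range(runs):
    context.execute_v2(bindings)
torch.cuda.synchronize()
print("ms per batch:", (time.time() - start) / runs * 1000)
```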

@HantingChen (Collaborator)

Thank you for sharing your further testing results and observations. VanillaNet is designed with fewer layers, but each layer involves more computation, so it is better suited to scenarios with ample computational resources. In such cases the primary latency bottleneck tends to be the number of layers rather than FLOPs, which is a key point we aimed to highlight in our work.

We generally set the batch size to 1 in our tests, because at larger batch sizes VanillaNet may not exhibit its advantages, as your results show. I suggest setting the batch size to 1 when running the TensorRT tests as well; this should better reflect the performance characteristics described in the paper.
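
To make the batch-size effect concrete, a minimal PyTorch-side measurement in the spirit of test_latency.py might look like the sketch below; the timm model name and the warm-up/run counts are illustrative, not the repository's exact settings:

```python
import time
import torch
import timm

def latency_ms(model, batch_size, size=224, warmup=20, runs=100):
    """Average forward time per batch on the GPU, in milliseconds."""
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, size, size, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):   # warm-up iterations, not timed
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / runs * 1000

# MobileNetV3-Large baseline from timm; the VanillaNet models would be built
# from this repository's own model definitions instead.
model = timm.create_model("mobilenetv3_large_100", pretrained=False)
for bs in (1, 64):
    print(f"batch {bs}: {latency_ms(model, bs):.3f} ms")
```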

@rememberBr

Thank you, I see. VanillaNet is very interesting work. May I ask whether VanillaNet will continue to be developed in the future, for example a VanillaNetV2 or VanillaNetplus that can maintain its advantage even when batch size > 1? That would be exciting, because batch sizes greater than 1 are common in practical applications and give higher throughput.
