Nsight-compute fail when profile the kernel performance #50

Open
weilinquan opened this issue Sep 24, 2024 · 1 comment

@weilinquan

I compiled my example with the following pass pipeline:
oec-opt --stencil-shape-inference --convert-stencil-to-std --cse --parallel-loop-tiling='parallel-loop-tile-sizes=128,1,1' --canonicalize --test-gpu-greedy-parallel-loop-mapping --convert-parallel-loops-to-gpu --canonicalize --lower-affine --convert-scf-to-std --stencil-kernel-to-cubin ../test/Examples/test.mlir > temp.mlir
mlir-translate --mlir-to-llvmir temp.mlir > temp.bc
llc -O3 temp.bc -o temp.s
clang -c temp.s -o temp.o
nvcc --default-stream per-thread -allow-unsupported-compiler -ccbin clang main.cc temp.o -lcuda-runtime-wrappers -lcudart -lcuda
The main.cc and test.mlir files are in the attached zip. Is any step in my pipeline wrong? I want to use ncu to profile the kernel in more detail. Thank you for your help!
test.zip

@gysit
Collaborator

gysit commented Sep 24, 2024

I no longer have a setup to reproduce this. Does the code itself work? Note that this was tested with a much older CUDA version, so things may no longer work today.

In this post I briefly discussed a more extended pass pipeline with more optimizations:
#46 (comment)

Maybe this works. However, if the code is functional, then I suspect a tool incompatibility.
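
One way to tell these two cases apart is to run the binary on its own before running it under ncu. The following is a minimal sketch, assuming a POSIX shell. The helper name `check_then_profile` and the report name `profile_report` are hypothetical, and `./a.out` is simply nvcc's default output name when no `-o` is given, as in the command above.

```shell
#!/bin/sh
# Sketch: run a program on its own first, then under Nsight Compute (ncu).
# check_then_profile is a hypothetical helper, not part of any tool in this thread.
check_then_profile() {
    # Step 1: run the program without the profiler.
    if ! "$@"; then
        echo "program fails on its own: fix the build or runtime first"
        return 1
    fi
    # Step 2: make sure ncu is on PATH before trying to profile.
    if ! command -v ncu >/dev/null 2>&1; then
        echo "program runs, but ncu was not found on PATH"
        return 2
    fi
    # Step 3: profile. -o names the report file (assumed name), -f overwrites an old one.
    if ! ncu -f -o profile_report "$@"; then
        echo "program runs, but fails under ncu: possibly a tool incompatibility"
        return 3
    fi
    echo "program runs and was profiled"
}

# Example (commented out so that sourcing this file has no side effects):
# check_then_profile ./a.out
```

A return code of 1 points at the build or the pass pipeline, while a return code of 3 points at the profiler or driver side.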
