Add a native profiler. #2024
Segfaults pretty quickly on non-1.9. I guess this depends on foreign thread adoption, as CUPTI calls us from an unmanaged worker thread.
The hang on 11.1 seems real. EDIT: let's just not support CUDA <11.2; who's using that anyway.
[skip tests]
[skip julia] [skip cuda] [skip subpackages] [skip benchmarks]
Codecov Report

Patch coverage:

Additional details and impacted files

| Coverage Diff | master | #2024 | +/- |
|---|---|---|---|
| Coverage | 59.03% | 16.58% | -42.45% |
| Files | 152 | 152 | |
| Lines | 12851 | 13214 | +363 |
| Hits | 7586 | 2192 | -5394 |
| Misses | 5265 | 11022 | +5757 |
This PR adds a native profiler, built on top of CUPTI, that should make it easier to do some simple profiling without having to resort to NSight. Output is loosely based on the old `nvprof` tool, rendered using PrettyTables.jl. Requires Julia 1.9 and CUDA 11.2.

The old `CUDA.@profile`, which only activated an external profiler, has been moved to `CUDA.@profile external=true`. As such, this will probably need to be a breaking release.

TODO:
- NVTX integration: seems like a bug in CUPTI, I've contacted NVIDIA

Small demo:

Also features a `trace` mode where events are listed chronologically:

We do some filtering and pre-processing to make the output a little more compact; this can be disabled using `raw=true`:

Fixes #2017

Any suggestions for improvements are welcome. Reporting of metrics/performance counters, source-code correlation, or other advanced features is currently not on the table; just use NSight for that (this functionality is not intended to replace those tools, which work perfectly fine but are just a bit cumbersome to set up for most users' needs).
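
For reference, the invocation modes mentioned in this description could be exercised roughly as follows. This is only a sketch: the array `a` and the broadcast being profiled are placeholders, and spelling the trace mode as `trace=true` is an assumption based on how `raw=true` and `external=true` are written above. It needs a CUDA-capable GPU, and no output is shown here because it depends on the device and driver.

```julia
using CUDA

# Placeholder workload: any GPU operation would do.
a = CUDA.rand(1024, 1024)

# Default mode: summary of host-side and device-side activity.
CUDA.@profile sin.(a)

# Trace mode: events listed chronologically
# (keyword spelling assumed to be `trace=true`).
CUDA.@profile trace=true sin.(a)

# Disable the filtering and pre-processing of the output.
CUDA.@profile raw=true sin.(a)

# Old behavior: only activate an external profiler such as NSight.
CUDA.@profile external=true sin.(a)
```

The first call on a fresh session also includes compilation, so running the workload once before profiling may give a more representative picture.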