CI move to ALPS (daint-gpu -> alps_gh200) #1225
cscs-ci run
fe656de to cbef80a
5e7de35 to 075245f
a9d1caf to 3219200
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##           master    #1225   +/-   ##
=======================================
  Coverage   95.06%   95.06%
=======================================
  Files         139      139
  Lines        8573     8573
  Branches     1107     1107
=======================================
  Hits         8150     8150
  Misses        236      236
  Partials      187      187
Just to add a bit of info on my investigation of this from today: It seems like yes, the pika runtime takes some time to start and stop. However, the biggest chunk of time comes from creating and destroying cuSOLVER handles (of which we create 16 by default). I'm able to reproduce slow timings simply by creating and destroying n cuSOLVER handles, without starting or stopping the pika runtime. In particular, it's slower:
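For anyone wanting to repeat that kind of measurement, here is a minimal sketch of a reproducer, not the code actually used in the investigation. `time_create_destroy` is a hypothetical helper that times creating and then destroying n handles through caller-supplied callables. The commented lambdas show how `cusolverDnCreate` and `cusolverDnDestroy` could be plugged in (error checking omitted); no pika runtime is involved.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: times creating n handles and then destroying them.
// `create` returns a handle, `destroy` releases one.
//
// Assumed cuSOLVER usage (requires <cusolverDn.h> and linking cuSOLVER):
//   auto create  = [] { cusolverDnHandle_t h; cusolverDnCreate(&h); return h; };
//   auto destroy = [](cusolverDnHandle_t h) { cusolverDnDestroy(h); };
//   auto elapsed = time_create_destroy(16, create, destroy);
template <class Create, class Destroy>
std::chrono::duration<double> time_create_destroy(std::size_t n, Create create,
                                                  Destroy destroy) {
  using clock = std::chrono::steady_clock;
  using Handle = decltype(create());

  std::vector<Handle> handles;
  handles.reserve(n);

  const auto start = clock::now();
  // Keep all handles alive at once, as a pool of handles would.
  for (std::size_t i = 0; i < n; ++i)
    handles.push_back(create());
  for (auto& h : handles)
    destroy(h);
  return clock::now() - start;  // elapsed wall time in seconds
}
```

Calling it with n = 16 matches the default number of handles mentioned above; varying n should show how the cost scales.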
I've opened two PRs related to the slowness of API tests: #1268 and #1269. The former attempts to address starting and stopping the runtime (which includes initializing/finalizing DLA-Future and CUDA pools/cuSOLVER handles) unnecessarily frequently. The latter leaves one core free for non-pika threads, which I later found also has a big impact on test times.
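To illustrate the two ideas in general terms, here is a rough sketch. It is not the actual change in #1268 or #1269, and every name in it is hypothetical. `ScopedRuntime` stands in for an expensive start/stop pair, `run_tests_with_shared_runtime` starts it once for a whole batch of tests instead of once per test, and `worker_threads_leaving_one_free` computes a worker count that keeps one core available for other threads.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII guard: runs `start` on construction, `stop` on destruction.
// In the real tests these would be the runtime and library init/finalize calls.
class ScopedRuntime {
public:
  ScopedRuntime(std::function<void()> start, std::function<void()> stop)
      : stop_(std::move(stop)) {
    start();
  }
  ~ScopedRuntime() { stop_(); }
  ScopedRuntime(const ScopedRuntime&) = delete;
  ScopedRuntime& operator=(const ScopedRuntime&) = delete;

private:
  std::function<void()> stop_;
};

// Starts the runtime once, runs all tests inside it, then stops it once.
template <class Test>
void run_tests_with_shared_runtime(int num_tests, std::function<void()> start,
                                   std::function<void()> stop, Test test) {
  ScopedRuntime runtime(std::move(start), std::move(stop));
  for (int i = 0; i < num_tests; ++i)
    test(i);
}

// Worker thread count that leaves one core free, never going below one.
// A value of 0 (unknown core count) also maps to a single worker.
unsigned worker_threads_leaving_one_free(unsigned hardware_threads) {
  return hardware_threads > 1 ? hardware_threads - 1 : 1u;
}
```

With per-test start/stop the setup cost is paid once per test; with the shared guard it is paid once per batch, which is where the saving would come from if handle creation dominates.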
Still open points:
Quick test:
Note: we are seeing some hangs on eiger. Didn't track them, but IIRC at least a couple of cases in the same backtransformation.