Decrease _DEFAULT_SAFE_OPEN_INTERVAL (to what?)
#6599
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6599      +/-   ##
==========================================
+ Coverage   77.51%   77.87%   +0.37%
==========================================
  Files         560      567       +7
  Lines       41444    42121     +677
==========================================
+ Hits        32120    32797     +677
  Misses       9324     9324
Why change this default? If I understand correctly, the main problem is with the SSH plugin, right? The
Yes, good point, it should be changed in the
I think most users, especially new ones, will just be running with the default value. Especially if they go through
Report: Note: before the computer can be used, it has to be configured with the command:
Report: verdi -p <profile> computer configure core.ssh <computer>
they'll likely just copy-paste this command. Instead, if they run
The question is: is 30s really necessary? And if it's overkill, why not reduce it?
It is set to that value to prevent users from being banned by compute centers for opening too many SSH connections. Admittedly, this is more likely to happen when running high-throughput workloads with multiple daemon workers, and I am not sure that is still the main use case for AiiDA users. So we can lower it, at the risk that some users may get warnings or be banned. If @giovannipizzi also signs off on changing the default to favor casual users over protecting heavy high-throughput users, then it is fine by me as well.
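To put the trade-off in numbers, here is a back-of-the-envelope sketch of the worst case. It assumes (this is not confirmed in the thread) that every daemon worker enforces the safe interval on its own, so the bound scales with the number of workers; the function name and the choice of 8 workers are purely illustrative. At 30 s a single worker can open at most 120 connections per hour, at 5 s at most 720, and at 3 s at most 1200.

```python
def max_opens_per_hour(safe_interval_s: float, n_workers: int = 1) -> float:
    """Upper bound on new SSH connections per hour to a single computer.

    Hypothetical helper: assumes each daemon worker enforces the safe
    interval independently, so the per-worker bounds simply add up.
    """
    if safe_interval_s <= 0:
        raise ValueError("safe_interval_s must be positive")
    return n_workers * 3600.0 / safe_interval_s


for interval in (30.0, 5.0, 3.0):
    print(
        f"{interval:>4}s: {max_opens_per_hour(interval):.0f}/h per worker, "
        f"{max_opens_per_hour(interval, n_workers=8):.0f}/h with 8 workers"
    )
```

This is only an upper bound: requests that arrive while a connection is pending or open share it, so actual counts are typically much lower.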
In practice, as the tests by @khsrali (I think in a different issue?) also show, even with a very low value (like 0.1s, which I would not recommend) the number of open connections remains very low. This is because while the opening of a connection is being delayed, other tasks pile up and then reuse the same connection (@khsrali, can you point to your tests?). For me, 5s (or maybe even 3s) is OK; in any case, a supercomputer center should not ban you for connecting every 5 seconds now and then (e.g., I could do that myself by opening a few SSH shells). If I'm running mid-throughput, requests will pile up and reuse the connection. If I'm really running high throughput, I'll probably need to tweak a few options anyway. So I'd just check once more whether 3s is too aggressive, then change the default and update the doc page on how to run high throughput, mentioning what this value is and that it might need to be increased to avoid being banned.
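The pile-up argument can be illustrated with a toy model. This is a hypothetical sketch, not AiiDA's actual transport queue: a request that finds no pending or open connection schedules one at the earliest time the safe interval allows, every request arriving before that moment or while the connection is still open reuses it, and each connection is assumed to stay open for a fixed `open_duration`. The function name, the arrival pattern and the 2 s open duration are all made up for illustration.

```python
from typing import Sequence, Tuple


def simulate(
    request_times: Sequence[float], safe_interval: float, open_duration: float
) -> Tuple[int, float]:
    """Toy coalescing transport queue (hypothetical model, not AiiDA's code).

    Returns (number of connection opens, worst-case wait in seconds).
    """
    opens = 0
    max_wait = 0.0
    last_open = float("-inf")   # time of the most recent open
    open_at = float("-inf")     # open time of the current (or pending) connection
    busy_until = float("-inf")  # requests up to this time share that connection
    for t in sorted(request_times):
        if t > busy_until:
            # Nothing pending or open: schedule a new open, respecting the interval.
            open_at = max(t, last_open + safe_interval)
            last_open = open_at
            busy_until = open_at + open_duration
            opens += 1
        # Requests arriving before open_at wait for it; later ones are served at once.
        max_wait = max(max_wait, open_at - t)
    return opens, max_wait


# Hypothetical workload: one request every 0.5 s for 10 s.
requests = [i * 0.5 for i in range(20)]
for interval in (30.0, 5.0, 0.1):
    n_opens, wait = simulate(requests, safe_interval=interval, open_duration=2.0)
    print(f"safe_interval={interval}s: {n_opens} opens, worst-case wait {wait:.1f}s")
```

With these made-up numbers, 30 s gives 2 opens but a worst-case wait of 27.5 s, 5 s gives 3 opens and 2.5 s, and 0.1 s gives 4 opens with no wait: 20 requests never need more than 4 connections because they pile up on the open one. The absolute values are meaningless; real behaviour depends on how long connections stay open and how requests are spread in time.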
The benchmarks by @khsrali can be found here: #6544 (comment)
@@ -64,6 +64,9 @@ def convert_to_bool(string):
class SshTransport(Transport):
    """Support connection, command execution and data transfer to remote computers via SSH+SFTP."""

    # Reduced from the 30s set in the Transport abstract base class
    # Reduced from the 30s set in the Transport abstract base class
    # Reduced from the 5s set in the Transport abstract base class
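Changing the default only in the SSH plugin, as discussed above, boils down to a class attribute that the subclass shadows. A minimal sketch with simplified stand-in classes (not AiiDA's actual `Transport`/`SshTransport`; the constructor signature and the `safe_open_interval` property here are assumptions), using the 5 s value proposed in this PR:

```python
from typing import Optional


class Transport:
    """Simplified, hypothetical stand-in for the Transport abstract base class."""

    # Conservative default meant to protect high-throughput users.
    _DEFAULT_SAFE_OPEN_INTERVAL = 30.0

    def __init__(self, safe_interval: Optional[float] = None):
        # An explicitly configured value wins; otherwise the class-level default
        # is looked up on the concrete class, so a subclass can override it.
        if safe_interval is None:
            safe_interval = self._DEFAULT_SAFE_OPEN_INTERVAL
        self._safe_open_interval = float(safe_interval)

    @property
    def safe_open_interval(self) -> float:
        return self._safe_open_interval


class SshTransport(Transport):
    """Hypothetical SSH plugin that only changes the default."""

    # Reduced from the 30s set in the Transport abstract base class
    _DEFAULT_SAFE_OPEN_INTERVAL = 5.0


print(Transport().safe_open_interval)                     # 30.0
print(SshTransport().safe_open_interval)                  # 5.0
print(SshTransport(safe_interval=30).safe_open_interval)  # 30.0
```

In this design an explicitly configured value still takes precedence, so only the out-of-the-box behaviour of the SSH plugin changes while other transports keep the base-class default.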
@GeigerJ2, do you have a moment to apply the changes?
Hi @khsrali, thanks for the ping here! Before we actually merge this change, I'd like to do some proper benchmarking (e.g., with Thor). That's why this has been on hold for a while now. Maybe we can do it together after the holidays? I remember you already did some benchmarks in the past 😉
A possible quick fix, so that we don't spend more of our time (at least for now...) on:
safe_interval more dynamic for quick transport tasks #6544
While this is quite a minor change, it has rather far-reaching consequences, so we should bikeshed it for at least a week or so :D
Any thoughts on the duration, @sphuber, @mbercx, @unkcpz, @t-reents? @giovannipizzi proposed 3s, but I'm worried this is a bit short, so I went with 5s for now. I will add some benchmarks here later on.