-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch curl to using the multi-handle & workers #47
Conversation
2b51ba7
to
38309ce
Compare
I built this and ran my stress test on it to see if maybe it fixed the failures problem as a side effect. Quite the opposite! The existing code base fails at something like a 0-20% rate, with high levels of variability. This update is failing at more like a 90% or more rate. I don't have a ton of time this afternoon to dig deeper, but have a bunch of time tomorrow to see if this might help us uncover the failure problem(s). |
Here's the log of my latest failure. 95% of transfers failed. 241016 rich log.txt |
15cc21c
to
56656fd
Compare
We now depend on XRootD 5.7.0 or later -- that means we'll need upstream builds.
Start using the new XrdOucGatherConf-style syntax to process S3/HTTP server configurations. Used while otherwise debugging the branch.
This adds a new test fixture that launches an xrootd daemon for use during the HTTP unit tests.
Print a fixed size of the buffer, not a string. curl doesn't necessarily null-terminate.
2728449
to
11f6b31
Compare
When providing a successful response to a Stat request, initialize all the fields inside the struct. This ensures everything is zeroed out when it is later used. (Use of uninitialized data was reported by valgrind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was a little long to crawl through for line-by-line understanding like I usually like to do, but the overall curl_multi approach seems solid (aided by the fact that we use the same approach in xrdcl-pelican
) and the extra tests are always welcome. If I had one request, it would be to crank up the volume on comment frequency and verbosity.
My matrix of hand-run tests all passed.
We've noticed the performance of the plugin is horribly slow -- to the point where we don't have enough cores in machines to have the origin keep up with the caches (why the caches are pushing so hard is a separate story).
A few judicious uses of
pstack
indicates that almost all the CPU time is spent in TLS handshakes. It turns out that thecurl_easy_*
APIs have no connection reuse unless you reuse the handle itself. So, every single request is a separate TLS connection to the upstream HTTP / S3 server. That is indeed, no fun.This takes motivation (and code) from https://github.com/PelicanPlatform/xrdcl-pelican to have all the requests handled in a set of separate worker threads. The request object enters a queue, is processed by a worker, and then, on completion, the original thread is notified.
This means there's probably 5-ish connections held open to AWS (one per thread) instead of many thousands of short-lived ones.