Reuse OpenSSL::SSL::Context::Client in HTTP::Client (12x speedup for HTTPS requests) #15419

Open
compumike opened this issue Feb 6, 2025 · 11 comments · May be fixed by #15420

@compumike
Contributor

TLDR: HTTP::Client.get HTTPS request in 16.92 ms 🐢 vs 1.43 ms 🐇

HTTPS GET request:
  new SSL_CTX  59.11  ( 16.92ms) (± 5.70%)  66.9kB/op  11.87× slower
reuse SSL_CTX 701.40  (  1.43ms) (± 1.21%)  66.8kB/op        fastest

The bug: HTTP::Client creates a new OpenSSL::SSL::Context::Client for every https://... connection.

Desired fix: use a global, default OpenSSL::SSL::Context::Client in HTTP::Client

We have a sharded cluster of Crystal processes doing a large number of HTTPS requests per second at Heii On-Call because we're continuously monitoring our customers' API endpoints and websites. I'm embarrassed that I've only just discovered that we're burning an order of magnitude more CPU time on HTTPS requests than we need to be, all because HTTP::Client is creating a new OpenSSL::SSL::Context::Client for every new connection.


Creating a new SSL_CTX is slow because of loading CA certificates

OpenSSL::SSL::Context::Client.new calls OpenSSL::SSL::Context#set_default_verify_paths, which calls LibSSL.ssl_ctx_set_default_verify_paths. This loads all of the system CA certificates:

The CAfile is processed on execution of the SSL_CTX_load_verify_locations() function.

Here's a quick benchmark:

docker run --rm -it crystallang/crystal:1.15.1 /bin/bash

# crystal eval --release << EOF
require "benchmark"
require "openssl"
Benchmark.ips(calculation: 5.seconds, warmup: 1.second, interactive: false)  { |x|
  x.report("new") { OpenSSL::SSL::Context::Client.new }
  x.report("insecure") { OpenSSL::SSL::Context::Client.insecure }
  x.report("insecure+load") { OpenSSL::SSL::Context::Client.insecure.set_default_verify_paths }
}
EOF

          new 126.77  (  7.89ms) (± 1.80%)  118B/op  75.68× slower
     insecure   9.59k (104.23µs) (± 3.46%)  112B/op        fastest
insecure+load 126.61  (  7.90ms) (± 2.41%)  113B/op  75.78× slower

# crystal version

Crystal 1.15.1 [89944bf17] (2025-02-04)

LLVM: 18.1.6
Default target: x86_64-unknown-linux-gnu
  
# dpkg -l openssl | tail -n 1

ii  openssl        3.0.13-0ubuntu3.4 amd64        Secure Sockets Layer toolkit - cryptographic utility

(Note that this issue may vary based on OpenSSL versions. It's possibly a performance regression that is being tracked in openssl/openssl#20286.)


SSL_CTX is designed to be set up once and reused

A single SSL_CTX object can be used to create many connections (each represented by a separate SSL object). [...] Note that you should not normally make changes to an SSL_CTX after the first SSL object has been created from it.

SSL_CTX: This is the global context structure which is created by a server or client once per program life-time and which holds mainly default values for the SSL structures which are later created for the connections.

One SSL_CTX can be used for an unlimited number of connections

An SSL_CTX object should not be changed after it is used to create any SSL objects or from multiple threads concurrently, since the implementation does not provide serialization of access for these cases.

An SSL_CTX may be used on multiple threads provided it is not reconfigured.


HTTP::Client.get benchmark

docker run --rm -it crystallang/crystal:1.15.1 /bin/bash

# crystal eval --release << EOF
require "benchmark"
require "http"
require "openssl"

uri = URI.parse(ENV["SSLTEST_URL"])
global_openssl_client_context = OpenSSL::SSL::Context::Client.new
headers = HTTP::Headers{"Connection" => "close"}

puts "HTTPS GET request:"

Benchmark.ips(calculation: 5.seconds, warmup: 1.second, interactive: false) do |x|
  x.report("new SSL_CTX") do
    HTTP::Client.get(uri, headers: headers)
  end
  
  x.report("reuse SSL_CTX") do
    client = HTTP::Client.new(uri, tls: global_openssl_client_context)
    client.get(uri.request_target, headers: headers)
    client.close
  end
end
EOF

HTTPS GET request:
  new SSL_CTX  59.11  ( 16.92ms) (± 5.70%)  66.9kB/op  11.87× slower
reuse SSL_CTX 701.40  (  1.43ms) (± 1.21%)  66.8kB/op        fastest

(SSLTEST_URL is an internal HTTPS service running on the same machine.)


Possible resolution

In src/openssl/ssl/context.cr, I could imagine:

    class_getter(default) { new }

and then in src/http/client.cr#initialize replacing the call to OpenSSL::SSL::Context::Client.new with OpenSSL::SSL::Context::Client.default.

I've tested it -- this works and fixes the performance issue.

However, in #2689 I saw that @jhass removed an OpenSSL::SSL::Context.default so as not to expose a global mutable default, which makes sense.

Is there a good solution here? It would be great to use a single global default SSL_CTX for performance, but it seems like leaking the potentially-mutable context is inevitable. For example, could we create a ReadOnlyContext that didn't allow mutation?
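
As a rough illustration of the trade-off described above, here is a minimal sketch of a lazily built, shared client context kept in application code rather than in the standard library. The `SharedTLS` module and its `client_context` method are hypothetical names invented for this example; only `OpenSSL::SSL::Context::Client.new` and the `tls:` argument of `HTTP::Client.new` come from the discussion. It does not prevent mutation, so callers would still have to treat the returned context as read-only.

```crystal
require "http/client"
require "openssl"

# Hypothetical application-level module; not part of Crystal's stdlib.
module SharedTLS
  @@client_context : OpenSSL::SSL::Context::Client?

  # Builds the context (and loads the CA certificates) on first use only.
  # Later calls return the same instance, which must not be reconfigured.
  def self.client_context : OpenSSL::SSL::Context::Client
    @@client_context ||= OpenSSL::SSL::Context::Client.new
  end
end

# No connection is opened here; that happens on the first request.
client = HTTP::Client.new("example.com", tls: SharedTLS.client_context)
```

Keeping the shared instance behind an application-owned accessor sidesteps the question of exposing a mutable global from the standard library, at the cost of each application having to do this itself.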

@ysbaddaden
Contributor

ysbaddaden commented Feb 6, 2025

You can already create a context and tell HTTP::Client to use it:

MY_GLOBAL_CONTEXT = OpenSSL::SSL::Context::Client.new

HTTP::Client.new(host, port, tls: MY_GLOBAL_CONTEXT)

@straight-shoota
Member

Thanks for reporting and digging into the issue. Especially the references to related discussions are very valuable.

I think it's worth noting though that HTTP::Client.get is intended for one-off requests. If you want to make repeated requests to the same endpoint, it's recommended to use an HTTP::Client instance. In that case the SSL context, like the TCP connection, would be reused.
Now I'm not sure if your use case involves requests to the same server or different connections, so this might not be applicable.

Still creating explicit HTTP::Client instances with an explicit SSL context can be a good workaround. HTTP::Client.get also has a tls parameter.

I do agree that it would be a very useful feature if HTTP::Client.get would automatically reuse a default SSL context instead of creating a fresh (and usually identical) instance every time.

Instead of caching a generic, global instance as OpenSSL::SSL::Context::Client.default, maybe it would be better to keep this internal to HTTP::Client and only use it from the class methods. I believe in these use cases we can be sure to avoid inadvertent mutation of the context (this needs to be verified, though!).

@straight-shoota
Member

straight-shoota commented Feb 6, 2025

Actually, it appears that for older versions of libssl, every SSL client socket may mutate the context for hostname validation 😮
See last line in this excerpt:

    LibSSL.ssl_ctrl(
      @ssl,
      LibSSL::SSLCtrl::SET_TLSEXT_HOSTNAME,
      LibSSL::TLSExt::NAMETYPE_host_name,
      hostname.to_unsafe.as(Pointer(Void))
    )

    {% if LibSSL.has_method?(:ssl_get0_param) %}
      param = LibSSL.ssl_get0_param(@ssl)

      if ::Socket::IPAddress.valid?(hostname)
        unless LibCrypto.x509_verify_param_set1_ip_asc(param, hostname) == 1
          raise OpenSSL::Error.new("X509_VERIFY_PARAM_set1_ip_asc")
        end
      else
        unless LibCrypto.x509_verify_param_set1_host(param, hostname, 0) == 1
          raise OpenSSL::Error.new("X509_VERIFY_PARAM_set1_host")
        end
      end
    {% else %}
      context.set_cert_verify_callback(hostname)
    {% end %}

That would suggest that SSL context reuse isn't even concurrency-safe when used to connect to different hostnames, at least when ssl_get0_param is missing. That function was introduced in OpenSSL 1.0.2. That's been EOL for a long time but we still support it.
Maybe this would be a reason to drop it... 🤔

@straight-shoota
Member

There is also an issue with cache invalidation. If we store and reuse an SSL context, it won't pick up changes to the CA store on disk. This is particularly relevant for long-running processes. So we should consider how to handle cache invalidation.

For example, libcurl by default uses a CA cache timeout of 24 hours (https://curl.se/libcurl/c/CURLOPT_CA_CACHE_TIMEOUT.html).

Another option could be to compare timestamps of the CA files. I'm not sure how feasible this is though, as it would require that information from within the SSL library.

@ysbaddaden
Contributor

Maybe this would be a reason to drop it... 🤔

Agree. A perfect argument to drop support for 1.0.2.

Even Ubuntu 18.04 had 1.1.1 so that would likely break nothing.

Cache

Good point. We'll need a timeout, but it would have to be configurable somehow...

Note: curl applies the timeout per CA (not globally), and also checks if the CA changed to invalidate the cache. See https://github.com/curl/curl/blob/553248f501762735c6aa5531f5748e88aefb5314/lib/vtls/openssl.c#L3515-L3560
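
To make the timeout idea concrete, here is a minimal sketch of a context cache with a configurable lifetime. `ExpiringClientContext`, its `ttl` argument, and its `get` method are hypothetical names for illustration, not an existing Crystal API. The 24-hour default simply mirrors the libcurl default mentioned above. Unlike curl, this sketch applies one global timeout and does not check whether the CA files changed.

```crystal
require "openssl"

# Hypothetical helper, not part of Crystal's stdlib: hands out a cached
# client context and rebuilds it once it is older than `ttl`.
class ExpiringClientContext
  @context : OpenSSL::SSL::Context::Client? = nil
  @created_at : Time::Span = Time.monotonic

  def initialize(@ttl : Time::Span = 24.hours)
    @mutex = Mutex.new
  end

  def get : OpenSSL::SSL::Context::Client
    @mutex.synchronize do
      context = @context
      if context && (Time.monotonic - @created_at) < @ttl
        # Still fresh: reuse it without reloading the CA certificates.
        context
      else
        # Missing or expired: build a new context, which reloads the CA store.
        @created_at = Time.monotonic
        @context = OpenSSL::SSL::Context::Client.new
      end
    end
  end
end
```

An expired context is replaced rather than reconfigured, so connections that were created from the old one are not affected by the refresh.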

@ysbaddaden
Contributor

Hum, we should dig into SSL_CTX. Since the context is supposed to be reused, it might already deal with cache invalidation by itself.

It seems curl is doing its own cache for some reason —maybe because it supports many libraries?

@straight-shoota
Member

https://docs.openssl.org/3.4/man3/SSL_CTX_new/#description

An SSL_CTX object should not be changed after it is used to create any SSL objects or from multiple threads concurrently, since the implementation does not provide serialization of access for these cases.

@ysbaddaden
Contributor

That doesn't mean that OpenSSL itself can't safely maintain internal caches in the context.

@straight-shoota
Member

I don't think SSL context even remembers where certs are loaded from.

We probably need to take care of this ourselves. For a default context this shouldn't be too difficult though. We can use the X509_get_default_* functions to retrieve the default store path.

@compumike
Contributor Author

context.set_cert_verify_callback(hostname)

Oof, good catch. 😢 https://linux.die.net/man/3/ssl_ctx_set_cert_verify_callback says:

SSL_CTX_set_cert_verify_callback() sets the verification callback function for ctx. SSL objects that are created from ctx inherit the setting valid at the time when ssl_new is called.

so if we wanted to continue supporting older versions, we might do something like:

@context_mutex.synchronize do
  context.set_cert_verify_callback(hostname)
  @ssl = LibSSL.ssl_new(context)
  context.clear_cert_verify_callback # LibSSL.set_cert_verify_callback(@handle, nil)
end

Yes, once I discovered this issue, I made a global shared MY_GLOBAL_CONTEXT = OpenSSL::SSL::Context::Client.new, so we're not impacted by this performance issue any more! I made this issue to see if it was something that might be an improvement worth integrating more broadly.

For our monitoring use-case, we are intentionally trying to exercise the full network stack on every request (to make sure the customer's DNS / TCP / load balancer / SSL / application full-stack is all behaving as expected), so we do use Connection: close even though we are instantiating an explicit HTTP::Client.new.


I also dug in and I do think this is (at least partially) an OpenSSL 3.x issue.

In the last of these, the requests 2.32.0 changelog says:

Improvements

verify=True now reuses a global SSLContext which should improve
request time variance between first and subsequent requests.

Given that this is possibly mostly a performance regression that may eventually get fixed upstream in OpenSSL, and that an end-user workaround is easy by creating my own MY_GLOBAL_CONTEXT = OpenSSL::SSL::Context::Client.new, I'm not sure how much complexity we should add to dealing with it. It'd still be nice to reuse a default context, but there do seem to be a lot of concurrency and correctness issues around it in the general case.

@straight-shoota
Member

Given that this is possibly mostly a performance regression that may eventually get fixed upstream in OpenSSL

Maybe OpenSSL might be able to improve the performance of loading certificates. But loading them for every request is always going to be slower than not loading them.
And I don't think OpenSSL would be up for an implicit autoloading mechanism. Pretty sure that should be user-defined behaviour.
But we can do it. I think that makes sense.
It's nice that there's an easy workaround, but it would also be nice if HTTP::Client.get weren't worse-performing than it needs to be.
On a related note, we might also want to pool and re-use connections, see #6011. But I think this optimization can be implemented independently.

And I don't think it's actually very complex, at least not for the default of HTTP::Client.get & co.
We might want to take a look how other client libraries do this, though.
