Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2 server #687

Merged
merged 104 commits into from
Nov 19, 2023
Merged

HTTP/2 server #687

merged 104 commits into from
Nov 19, 2023

Conversation

KingMob
Copy link
Collaborator

@KingMob KingMob commented Aug 31, 2023

The code is ready for review, though there are admittedly, some details to work out and tests to fix.

Some things to know:

  1. There's extremely little continuity in the Netty code, so a lot of things are more different than they need to be. Some of that is genuinely caused by how HTTP/2 has a completely different binary format and multiplexed streams, unlike HTTP/1, but a lot is just Netty doing it differently over time.

  2. Organizationally, truly common code has been moved to a.http.common. Code specific to HTTP2, but shared between client and server is in a.http.http2. HTTP1-specific code is left wherever it was, in a.http.core/server/client. Public functions in a.http.server and a.http.client should handle both HTTP versions. This is messier than I'd like, but since we never hid the http sub-namespaces, there's a chance someone's code is using them. Otherwise, I'd have made a much cleaner design.

  3. Missing features: this does not support proxies or multipart. Multipart is kind of pointless in HTTP2 (just open a new stream for each part), but understandably some people want to keep using it. I think it's fine to tell them to wait, since Netty lacks all support for it in HTTP2, and we'd have to figure something out. I was working on shifting to the HTTP1 pathway when multipart is used, but never finished.

  4. pipeline-transform - fundamentally this assumed a single pipeline, but multiplexing breaks that. We now have 2 types of pipeline: a general connection pipeline that handles common concerns like SSL and HTTP frame decoding, and many (identical) stream pipelines that each handle one stream, aka a request/response pair. There's no safe way to know what the user intended with pipeline-transform, so I think perhaps two new params (conn-pipeline-transform and stream-pipeline-transform?) should be added, and an error should be thrown if the user supplies only pipeline-transform, and is using http2. It's not ideal, but it will prevent subtle problems.

  5. Some good news: While I think the multiplexing and Netty changes added some complexity, I believe the fundamental core is actually simpler in the HTTP2 code, because requests/responses are isolated in their own streams. There's no crazy state-tracking to know when one request/resp is finished, nor are there multiple queues feeding in/out on the client side.

  6. The broken tests are a mix of fragile, weird, and different.


This change is Reviewable

Update some deprecated usages
Refactored some names for clarity
The connection-level http2 pipelines are almost identical between client
and server, because they're just setting up the frame codec and the
multiplex handler.
Created top-level server pipeline building fns
Refactored out executor setup fns
Netty has an optimized formatter
Keeps 1s scheduled executor updates
Bumped up Netty to 4.1.96
If `continue-executor` is same as main executor .shutdown gets called
twice. Probably harmless, but also pointless.
Switch to `common/date-header-value` for HTTP/1
pipeline and handler now part of opts map
`ring-map->netty-http2-headers` now handles both requests and responses
correctly.

Use AsciiString/String consistently when comparing
Remove unneeded null check for header values (Netty will do it)
Updated add-header docstring
Add add-exception-handler to common to log exception that make it to the
end of the pipeline.
Rename attach-idle-handlers to add-idle-handlers to match the rest of
the system.
Move error-response and invalid-value-exception to common ns
Hint ssl-handler with SslHandler, not ChannelHandler
Hint coerce-ssl-context with SslContext
Fix SSL test when using BoringSSL
`keyword` doesn't work with CharSequences
Basic GET/HEAD requests work
Bump byte-streams to 0.3.4

Raise io.netty.util log level. It generates a lot of distracting logs.
@arnaudgeiser
Copy link
Collaborator

arnaudgeiser commented Nov 7, 2023

I don't know if it's on me or not, but I'm not able to either run lein test nor start a REPL with Leiningen 2.8.
It always ends up with a StackOverflowError.
(I don't have this issue with current master)

❯ lein test             
Pathological dependency tree detected.
Consider setting :pedantic? false in project.clj to bypass.
Execution error (StackOverflowError) at (REPL:1).
null

Full report at:
/tmp/clojure-12621424673382473380.edn

I bumped to Leiningen 2.10.0 I don't have this issue anymore.

@KingMob
Copy link
Collaborator Author

KingMob commented Nov 8, 2023

@arnaudgeiser Never seen that problem, but I'm already on lein 2.10.0, for no idea how long. What was in that report?

@KingMob KingMob mentioned this pull request Nov 8, 2023
@arnaudgeiser
Copy link
Collaborator

Anyway, it's not a "problem" since bumping to 2.1.0 is enough to fix it.
But there is something that changed underneath with this PR and I don't have a problem with any other projects I'm working on.

❯ cat /tmp/clojure-12621424673382473380.edn
{:clojure.main/message
 "Execution error (StackOverflowError) at (REPL:1).\nnull\n",
 :clojure.main/triage
 {:clojure.error/class java.lang.StackOverflowError,
  :clojure.error/phase :execution},
 :clojure.main/trace
 {:via
  [{:type java.lang.StackOverflowError,
    :at [clojure.lang.RT boundedLength "RT.java" 1793]}],
  :trace
  [[clojure.lang.RT boundedLength "RT.java" 1793]
   [clojure.lang.AFn applyToHelper "AFn.java" 148]
   [clojure.lang.RestFn applyTo "RestFn.java" 132]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$mapcat invokeStatic "core.clj" 2787]
   [clojure.core$tree_seq$walk__6403$fn__6404 invoke "core.clj" 4934]
   [clojure.lang.LazySeq sval "LazySeq.java" 42]
   [clojure.lang.LazySeq seq "LazySeq.java" 51]
   [clojure.lang.RT seq "RT.java" 535]
   [clojure.core$seq__5419 invokeStatic "core.clj" 139]
   [clojure.core$concat$fn__5510 invoke "core.clj" 727]
   [clojure.lang.LazySeq sval "LazySeq.java" 42]
   [clojure.lang.LazySeq seq "LazySeq.java" 51]
   [clojure.lang.RT seq "RT.java" 535]
   [clojure.core$seq__5419 invokeStatic "core.clj" 139]
   [clojure.core$concat$cat__5512$fn__5513 invoke "core.clj" 736]
   [clojure.lang.LazySeq sval "LazySeq.java" 42]
   [clojure.lang.LazySeq seq "LazySeq.java" 51]
   [clojure.lang.RT seq "RT.java" 535]

@KingMob
Copy link
Collaborator Author

KingMob commented Nov 8, 2023

@arnaudgeiser I found this in Leiningen's NEWS.md. Looks like it was fixed in 2.9.9

Fix a bug in pedantic checks which resulted in infinite loops.

Copy link
Collaborator

@DerGuteMoritz DerGuteMoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 31 files at r1, 1 of 7 files at r3, 2 of 4 files at r4.
Reviewable status: 11 of 31 files reviewed, 33 unresolved discussions (waiting on @arnaudgeiser and @KingMob)


examples/src/aleph/examples/http2.clj line 68 at r1 (raw file):

   NB: `:insecure? true` is necessary for self-signed certs. Do not use in prod."
  (http/connection-pool {:connection-options {:http-versions [:http2 :http1]
                                              :insecure?     true}}))

You can actually leave the :insecure? true out by passing the server's self-signed certificate as :trust-store here. See https://github.com/clj-commons/aleph/pull/689/files where I changed the tests to do the same. I think we should genereally strive to discourage the use of :insecure? true even in examples so as not to (unintentionally) normalize it.


src/aleph/http.clj line 59 at r1 (raw file):

   | `transport`                       | The transport to use, one of `:nio`, `:epoll`, `:kqueue` or `:io-uring` (defaults to `:nio` ) |
   | `compression?`                    | When `true`, enables HTTP body compression, defaults to `false` |
   | `compression-options`             | An optional Java array of CompressionOptions. If supplied, any codec options not passed in will be unavailable. Otherwise, uses default options for all available codecs. For Brotli/Zstd, be sure to guard your use with their `.isAvailable()` methods and add their deps to your classpath. |

Class name should be fully qualified and in backticks for consistency with other option docstrings.


src/aleph/http.clj line 83 at r1 (raw file):

   | `conn-go-away-handler`            | A connection-level cleanup handler for when a GOAWAY is received. Indicates the peer has, or soon will, close the TCP connection. Contains the last-processed stream ID.|
   | `stream-go-away-handler`          | A stream-level cleanup handler called for streams above the last-stream-id in a GOAWAY frame. Indicates the stream will not be processed. Called with the context and the Http2GoAwayFrame. |
   | `reset-stream-handler`            | A stream-level cleanup handler called for streams that have been sent RST_STREAM. Called with the context and the Http2ResetFrame. |

Would it be clearer to also prefix these options with http2- since they indeed all exclusively apply to H2? Likewise, also prefix all H1-specific options with http1-? Perhaps switch to keyword namespaces as in http2/reset-stream-handler etc.? (Sorry for the bikeshed 😄)


src/aleph/http.clj line 483 at r4 (raw file):

   | `request-timeout`    | timeout in milliseconds for the arrival of a response over the established connection
   | `read-timeout`       | timeout in milliseconds for the response to be completed
   | `response-executor`  | optional `java.util.concurrent.Executor` that will handle the requests (defaults to a `flow/utilization-executor` of 256 `max-threads` and a `queue-length` of 0)")

Nice that you filled these in! Now I can't help but notice that we inconsistently capitalize these option docstrings 🙈 Maybe we should just break out a separate issue for overhauling the docstrings for consistency.


src/aleph/netty.clj line 822 at r1 (raw file):

  ^ChannelInitializer
  [pipeline-builder]
  (AlephChannelInitializer. #_ensure-dynamic-classloader

Dev leftover?


src/aleph/netty.clj line 961 at r1 (raw file):

     | `session-cache-size`          | The size of the cache used for storing SSL session objects.
     | `session-timeout`             | The timeout for the cached SSL session objects, in seconds.
     | `application-protocol-config` | An `ApplicationProtocolConfig` instance to configure ALPN. See the `application-protocol-config` function.

Do we also want to accept "pure data" here for symmetry with ssl-context (via coerce-ssl-context)?


src/aleph/netty.clj line 1247 at r1 (raw file):

    (max 1 (SystemPropertyUtil/getInt "io.netty.eventLoopThreads" (* cpu-count 2)))))

#_(defn ^:no-doc enumerating-thread-factory

Delete this or leave a comment on why it might sense to revive it for future readers?


src/aleph/netty.clj line 1480 at r1 (raw file):

  "Generates a new SslHandler for the given SslContext.

   The 2-arity version is for servers.

Heh I had actually changed it from n-arity to n-ary here last time around 🙈 Not a native speaker but isn't arity the noun and -ary the adjectivization suffix?


src/aleph/netty.clj line 1273 at r2 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Don't recall it off the top of my head, but according to a35226a, there was an issue with byte-streams.graph.Type being loaded by two different classloaders.

See clj-commons/byte-streams#68 for some more details


src/aleph/http/common.clj line 122 at r4 (raw file):

          (.contains ApplicationProtocolNames/HTTP_2)))

(defn validate-http1-pipeline-xform

Would spell out transform here, too, since xform (at least to me) evokes transducers.


src/aleph/http/compression.clj line 70 at r4 (raw file):

        (Double/parseDouble (.substring enc (inc equals-idx)))
        (catch NumberFormatException e
          0.0))

Hmm I guess this parser is designed to be liberal in the spirit of Postel's Law (which has recently been challenged by the IETF itself) but do we really want to accept q=NaN and things like that? 🤔


src/aleph/http/compression.clj line 149 at r4 (raw file):

                 (.contains enc "zstd") (assoc qvs :zstd qval)
                 (.contains enc "gzip") (assoc qvs :gzip qval)
                 (.contains enc "deflate") (assoc qvs :deflate qval)

IMHO this parser is definitely too liberal even in the spirit of Postel's Law since it would misinterpret overlapping values (e.g. a hypothetical future codec named "cobra" would get parsed as :br). It would be more robust to match exactly and thus not compress then.

Copy link
Collaborator Author

@KingMob KingMob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 31 files reviewed, 33 unresolved discussions (waiting on @arnaudgeiser and @DerGuteMoritz)


project.clj line 36 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

The comment is a bit redundant now given that we're in the :dev profile anyway

I think the point was to remind people not to try and require it in their own projects for prod, but you have a point.


examples/src/aleph/examples/http2.clj line 39 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Not generally true: If you control your clients and supply them with your self-signed server certificate, it's perfectly fine in production, too 🙂

This comment was for the earlier version (2eb16ba) of this line, right? Do you still think it needs changing?

Another issue is io.netty.handler.ssl.util.SelfSignedCertificate uses an insecure PRNG, so it shouldn't be used even if you DO want a self-signed cert.


examples/src/aleph/examples/http2.clj line 68 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

You can actually leave the :insecure? true out by passing the server's self-signed certificate as :trust-store here. See https://github.com/clj-commons/aleph/pull/689/files where I changed the tests to do the same. I think we should genereally strive to discourage the use of :insecure? true even in examples so as not to (unintentionally) normalize it.

Removing :insecure? true might make the example more secure, but it also makes it more confusing. To get it to work, you have to manually make an SslContext with :trust-store set. However, once you pass in an SslContext, the code won't handle ALPN for you, but it will check that the ALPN config and http-versions match, so you then have to add an ALPN config. You can remove :http-versions, but that shouldn't usually be needed for clients. The results looks like:

(http/connection-pool
    {:connection-options
     {:http-versions [:http2 :http1] ; <- optional
      :ssl-context   (netty/ssl-client-context
                       {:trust-store                 [(.cert self-signed-cert)]
                        :application-protocol-config (netty/application-protocol-config [:http2 :http1])})}})

With :insecure? true, it at least looks alarming enough that people probably assume they need to remove it in real code.

With the :trust-store and ALPN, people might not know they're only needed for this particular example, and will assume they need to use those for all clients. We can add copious notes, but a lot of people won't read them, and will ask ?s in slack.

I'm open to removing :insecure? true, but do we have a solution that will resemble production usage?


examples/src/aleph/examples/http2.clj line 57 at r2 (raw file):

Previously, arnaudgeiser (Arnaud Geiser) wrote…

It can be kept that way if you are sure, but I don't think we will change to default to HTTP2 within the following year.
We'll need to be sure the support of HTTP2 is correctly supported on the Aleph HTTP2 client... and it won't have any side effects.

I'm hoping it won't take a whole year.


src/aleph/http.clj line 59 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Class name should be fully qualified and in backticks for consistency with other option docstrings.

Done.


src/aleph/http.clj line 83 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Would it be clearer to also prefix these options with http2- since they indeed all exclusively apply to H2? Likewise, also prefix all H1-specific options with http1-? Perhaps switch to keyword namespaces as in http2/reset-stream-handler etc.? (Sorry for the bikeshed 😄)

Heh. No, because they'll apply to HTTP/3 if/when we get around to that. And many of the now-H1-only options should be left alone to preserve backwards-compatibility. I guess we could add extra :http1/ variants, but I'm fine with this.


src/aleph/http.clj line 483 at r4 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Nice that you filled these in! Now I can't help but notice that we inconsistently capitalize these option docstrings 🙈 Maybe we should just break out a separate issue for overhauling the docstrings for consistency.

Yeah, they should really be capitalized since so many of them turn into multiple sentences...


src/aleph/netty.clj line 822 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Dev leftover?

Yeah, from when I thought I'd need to replace Netty's ClassLoaders with the clj classloader. Deleted.


src/aleph/netty.clj line 961 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Do we also want to accept "pure data" here for symmetry with ssl-context (via coerce-ssl-context)?

I'm not the biggest fan of all those ssl fns taking maps. I've ended up constantly referring to their impls to see what options I'm actually getting out.

I'd say go ahead, but the only parameter to application-protocol-config is the list of HTTP versions, just like :http-versions, which would make it confusing to have both. :http-versions was meant to save people from fiddling with the ALPN config entirely.


src/aleph/netty.clj line 1247 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Delete this or leave a comment on why it might sense to revive it for future readers?

Deleted


src/aleph/netty.clj line 1480 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Heh I had actually changed it from n-arity to n-ary here last time around 🙈 Not a native speaker but isn't arity the noun and -ary the adjectivization suffix?

Yes, but I've only seen that with actual nouns, not digit prefixes.

NM, a quick google/wikipedia check shows me I'm wrong.


src/aleph/http/common.clj line 122 at r4 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Would spell out transform here, too, since xform (at least to me) evokes transducers.

Hmmm. I don't like the concept of transforms in general being considered transducer-only, and "xform" is the natural abbreviation, but you have a point. If nothing else, it's more consistent with the rest of the codebase.


src/aleph/http/compression.clj line 70 at r4 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Hmm I guess this parser is designed to be liberal in the spirit of Postel's Law (which has recently been challenged by the IETF itself) but do we really want to accept q=NaN and things like that? 🤔

Hmm, well, it's a transliterated version of the Java fn io.netty.handler.codec.http.HttpContentCompressor/determineEncoding, so they have the same bug, FWIW.

And HTTP quality vals should also theoretically be checked to ensure they lie between 0.0 and 1.0, according to https://www.rfc-editor.org/rfc/rfc9110#quality.values.

But the only consequence is they might get a different codec (or no compression) than their qvals would specify. They have to handle all the codecs they send anyway, so it's not a big deal.

Still, it's just some math.

OK, fixed. Anything outside of [0.0, 1.0], including Infs and NaNs, is set to 0.


src/aleph/http/compression.clj line 149 at r4 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

IMHO this parser is definitely too liberal even in the spirit of Postel's Law since it would misinterpret overlapping values (e.g. a hypothetical future codec named "cobra" would get parsed as :br). It would be more robust to match exactly and thus not compress then.

This also a transliteration of a Netty fn. And given that only one new codec ("br") has been added to browsers in twenty years, I'm not too worried about being caught by surprise 😄

Copy link
Collaborator

@DerGuteMoritz DerGuteMoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 31 files at r1, 1 of 4 files at r4.
Reviewable status: 10 of 31 files reviewed, 24 unresolved discussions (waiting on @arnaudgeiser and @KingMob)


project.clj line 36 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

I think the point was to remind people not to try and require it in their own projects for prod, but you have a point.

I see - your call then!


examples/src/aleph/examples/http2.clj line 39 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

This comment was for the earlier version (2eb16ba) of this line, right? Do you still think it needs changing?

Another issue is io.netty.handler.ssl.util.SelfSignedCertificate uses an insecure PRNG, so it shouldn't be used even if you DO want a self-signed cert.

Ah good to know about the PRNG! New comment lgtm.


examples/src/aleph/examples/http2.clj line 68 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Removing :insecure? true might make the example more secure, but it also makes it more confusing. To get it to work, you have to manually make an SslContext with :trust-store set. However, once you pass in an SslContext, the code won't handle ALPN for you, but it will check that the ALPN config and http-versions match, so you then have to add an ALPN config. You can remove :http-versions, but that shouldn't usually be needed for clients. The results looks like:

(http/connection-pool
    {:connection-options
     {:http-versions [:http2 :http1] ; <- optional
      :ssl-context   (netty/ssl-client-context
                       {:trust-store                 [(.cert self-signed-cert)]
                        :application-protocol-config (netty/application-protocol-config [:http2 :http1])})}})

With :insecure? true, it at least looks alarming enough that people probably assume they need to remove it in real code.

With the :trust-store and ALPN, people might not know they're only needed for this particular example, and will assume they need to use those for all clients. We can add copious notes, but a lot of people won't read them, and will ask ?s in slack.

I'm open to removing :insecure? true, but do we have a solution that will resemble production usage?

Aaah, now I remember struggling with exactly this when I changed the tests. We even talked about this here and here where I said I would file a ticket for making this more convenient which I never ended up doing 😅 With the change proposed there, it would then work like this:

(http/connection-pool
 {:connection-options
  {:http-versions [:http2 :http1]
   :ssl-context {:trust-store [(.cert self-signed-cert)]}}})

How about we add this change to this here PR then while we're at it? I could try my hand at it, too, if you prefer.


src/aleph/http.clj line 83 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Heh. No, because they'll apply to HTTP/3 if/when we get around to that. And many of the now-H1-only options should be left alone to preserve backwards-compatibility. I guess we could add extra :http1/ variants, but I'm fine with this.

Ah right, didn't think about H3, makes sense! 👍


src/aleph/http.clj line 483 at r4 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Yeah, they should really be capitalized since so many of them turn into multiple sentences...

Filed as #695


src/aleph/netty.clj line 961 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

I'm not the biggest fan of all those ssl fns taking maps. I've ended up constantly referring to their impls to see what options I'm actually getting out.

I'd say go ahead, but the only parameter to application-protocol-config is the list of HTTP versions, just like :http-versions, which would make it confusing to have both. :http-versions was meant to save people from fiddling with the ALPN config entirely.

Yeah, this could probably be documented a bit better, too.

Hm if the ALPN config has no further options anyway, maybe we can just completely remove it and always derive it from :http-versions instead? It's still in alpha after all.


src/aleph/netty.clj line 1480 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Yes, but I've only seen that with actual nouns, not digit prefixes.

NM, a quick google/wikipedia check shows me I'm wrong.

Thanks for confirming! I guess both ways to spell it make sense after all, let's keep it as is then.


src/aleph/http/common.clj line 122 at r4 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Hmmm. I don't like the concept of transforms in general being considered transducer-only, and "xform" is the natural abbreviation, but you have a point. If nothing else, it's more consistent with the rest of the codebase.

Oh yeah, internal consistency is the better argument here.


src/aleph/http/compression.clj line 149 at r4 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

This also a transliteration of a Netty fn. And given that only one new codec ("br") has been added to browsers in twenty years, I'm not too worried about being caught by surprise 😄

Haha alright, let's see what the future holds!

Copy link
Collaborator Author

@KingMob KingMob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 31 files reviewed, 24 unresolved discussions (waiting on @arnaudgeiser and @DerGuteMoritz)


project.clj line 36 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

I see - your call then!

Damn, a sweet Quickman avatar!


examples/src/aleph/examples/http2.clj line 68 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Aaah, now I remember struggling with exactly this when I changed the tests. We even talked about this here and here where I said I would file a ticket for making this more convenient which I never ended up doing 😅 With the change proposed there, it would then work like this:

(http/connection-pool
 {:connection-options
  {:http-versions [:http2 :http1]
   :ssl-context {:trust-store [(.cert self-signed-cert)]}}})

How about we add this change to this here PR then while we're at it? I could try my hand at it, too, if you prefer.

If you're willing, want to do the PR as soon as this is merged, and before we cut a release? It would be a good way to get your hands dirty with the new codebase.

To be clear, the idea is to hide the ALPN config from users entirely, right?


src/aleph/http.clj line 418 at r2 (raw file):

Previously, arnaudgeiser (Arnaud Geiser) wrote…

Sorry, just saw your suggestion. Thing is, terminated-with-error is past tense, but nothing is actually terminated at the time of setting it, at least not in the H1 code.

Let's use destroy-conn? then.

Done.


src/aleph/netty.clj line 961 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Yeah, this could probably be documented a bit better, too.

Hm if the ALPN config has no further options anyway, maybe we can just completely remove it and always derive it from :http-versions instead? It's still in alpha after all.

When does Reviewable offer a blank "Acknowledge" option vs offering a "Done" option that clutters up the comments? I can't figure it out, but it's annoying.

UPDATE: figured it out. If I click on the circles in the lower-right, and set my status to "Satisfied", I get the option to "Acknowledge".


src/aleph/http/common.clj line 122 at r4 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Oh yeah, internal consistency is the better argument here.

Done.


src-java/aleph/http/AlephChannelInitializer.java line 7 at r2 (raw file):

Previously, arnaudgeiser (Arnaud Geiser) wrote…

Sorry for the misunderstanding.
Reverting the changes on this class since there are actually none but comments.

Done.

Copy link
Collaborator Author

@KingMob KingMob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 31 files reviewed, 24 unresolved discussions (waiting on @arnaudgeiser and @DerGuteMoritz)


src/aleph/netty.clj line 390 at r2 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

I checked, and looks like it's necessary.

Some of the Netty init fns put themselves on the event loop if necessary, but once a channel is active (i.e., on the ev loop), the lifecycle methods all just use asserts.

Will remove

Hmm, comments originating in Github don't maintain location when moved over to Reviewable.

@KingMob
Copy link
Collaborator Author

KingMob commented Nov 15, 2023

@DerGuteMoritz I think this is the only thing left: https://reviewable.io/reviews/clj-commons/aleph/687#-

Copy link
Collaborator

@DerGuteMoritz DerGuteMoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 31 files at r1, 2 of 7 files at r3, 1 of 4 files at r4, 6 of 6 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @arnaudgeiser and @KingMob)


project.clj line 36 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Damn, a sweet Quickman avatar!

Ha, you must be the first person I encounter to even know it's Quickman 😄


examples/src/aleph/examples/http2.clj line 68 at r1 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

If you're willing, want to do the PR as soon as this is merged, and before we cut a release? It would be a good way to get your hands dirty with the new codebase.

To be clear, the idea is to hide the ALPN config from users entirely, right?

Yep, can do! Right, the idea is to drop the the whole :application-protocol-config setting.


src/aleph/netty.clj line 390 at r2 (raw file):

Previously, KingMob (Matthew Davidson) wrote…

Hmm, comments originating in Github don't maintain location when moved over to Reviewable.

😢


src/aleph/http/http2.clj line 188 at r5 (raw file):

     EmptyArrays/EMPTY_OBJECTS)))

(defn- go-away

Intentionally private? Seems like it would be nice to haveg in user code.


src/aleph/http/http2.clj line 598 at r5 (raw file):

    (netty/write-and-flush ch (end-of-stream-frame stream))))

;; TODO: (minor) switch to sending over Context? should avoid tail and error-handler, anyway

I'm not following what this means 😅 Can you elaborate, @KingMob?


src/aleph/http/http2.clj line 792 at r5 (raw file):

  "Given Netty Http2Headers and a body stream, returns a Ring request map.

   For advanced users only:

This reads as if the function is intended to be part of the public API but it's marked as private. Intentional?


src/aleph/http/http2.clj line 972 at r5 (raw file):

        body-stream
        (if raw-stream?
          (netty/buffered-source ch #(.readableBytes ^ByteBuf %) buffer-capacity)

FWIW, the H1 implementation doesn't use a buffered source in raw mode. I guess this is done to prevent leaks from unconsumed ByteBufs buffered in the stream? Would make sense to do the same here then.


src/aleph/http/http2.clj line 1038 at r5 (raw file):

                 ;; else handle it inline (hope you know what you're doing!)
                 (try
                   (log/warn "Running user handler inline")

Might be a bit annoying to log this warning here in case I really do know what I'm doing 😄 Suggesting to log this only once when instantiating the handler instead.


src/aleph/http/http2.clj line 1113 at r5 (raw file):

        body-stream
        (if raw-stream?
          (netty/buffered-source ch #(.readableBytes ^ByteBuf %) buffer-capacity)

Hmm oddly enough, here we were indeed already using a buffered source even in raw mode. If my rationale for not using it above makes sense, the same would apply here...


src/aleph/http/http2.clj line 1200 at r5 (raw file):

  [max-request-body-size]
  (let [byte-count (AtomicLong. 0)
        frame-list (atom [])]

Could use volatile! here for performance since it's bound to a single thread anyway, right?


src-java/aleph/http/AlephHttp2FrameCodecBuilder.java line 19 at r5 (raw file):

        // Can't call super() because it's only package-visible, so the
        // below is copied from the Http2FrameCodecBuilder ctor. Let's hope
        // parent doesn't change.

The joys of inheritance 😬


test/aleph/tcp_ssl_test.clj line 77 at r5 (raw file):

          ;; JDK built-in SSL doesn't throw this, but BoringSSL does, so we
          ;; catch, instead of asserting
          ;; TODO: enforce use of BoringSSL or OpenSSL?

You mean enforce in the test suite or in general?

Copy link
Collaborator Author

@KingMob KingMob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @arnaudgeiser and @DerGuteMoritz)


project.clj line 36 at r1 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Ha, you must be the first person I encounter to even know it's Quickman 😄

I played the hell out of Megaman 2 in the fifth grade. My wife bought me a Megaman tee...I'll send a photo.


src/aleph/netty.clj line 390 at r2 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

😢

Done.


src/aleph/http/http2.clj line 188 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Intentionally private? Seems like it would be nice to haveg in user code.

Changed


src/aleph/http/http2.clj line 598 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

I'm not following what this means 😅 Can you elaborate, @KingMob?

Updated the comment.

I started using the Http2StreamChannel because we need stream info, but it's faster to send over the context instead of the channel. Also, if the handlers after the Ring handler did anything with writes, sending thru the channel would pass thru them, which is usually the wrong thing in Netty.


src/aleph/http/http2.clj line 792 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

This reads as if the function is intended to be part of the public API but it's marked as private. Intentional?

Yes, this is correct. Those keys are readable from the user Ring handler, which is why the docstring mentions it, even if the fn is not meant to be called by users.


src/aleph/http/http2.clj line 972 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

FWIW, the H1 implementation doesn't use a buffered source in raw mode. I guess this is done to prevent leaks from unconsumed ByteBufs buffered in the stream? Would make sense to do the same here then.

I'm not following you. It looks to me that raw handlers in server use a buffered stream too.

See line 463 in server.clj:
s (netty/buffered-source (netty/channel ctx) #(.readableBytes ^ByteBuf %) buffer-capacity)

(They don't buffer when they see a FullHttpRequest, but that's because it's going to be opened and closed almost immediately.)

I don't think using an unbuffered stream will actually prevent leaks. If the user code doesn't consume all ByteBufs and free them, there will be leaks regardless of where they're located.


src/aleph/http/http2.clj line 1038 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Might be a bit annoying to log this warning here in case I really do know what I'm doing 😄 Suggesting to log this only once when instantiating the handler instead.

Done.


src/aleph/http/http2.clj line 1113 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Hmm oddly enough, here we were indeed already using a buffered source even in raw mode. If my rationale for not using it above makes sense, the same would apply here...

See related comment. Pretty sure we actually use buffered streams in H1 code, and I don't think unbuffered streams will prevent ByteBuf leaks


src/aleph/http/http2.clj line 1200 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

Could use volatile! here for performance since it's bound to a single thread anyway, right?

Good point


src-java/aleph/http/AlephHttp2FrameCodecBuilder.java line 19 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

The joys of inheritance 😬

Heh. It's less a problem of inheritance, and more about encapsulation. Netty moves slowly, and there's no guarantee they'd open up the permissions just for Aleph, so I'm reserving my interactions with Netty for more important issues.

When I ask a question on the Netty Discord, I only get an answer maybe half the time.

Getting the multiplex code to support compression is probably a more worthy battle, but it might take weeks/months.


test/aleph/tcp_ssl_test.clj line 77 at r5 (raw file):

Previously, DerGuteMoritz (Moritz Heidkamp) wrote…

You mean enforce in the test suite or in general?

Just the test suite. IIRC, the issue is the built in JDK SSL layer, BoringSSL/OpenSSL have different failure modes.

go-away and reset-stream now public
Updated docstrings and comments
Print inline handler warning only once now
Switch frame-list from atom to volatile!, since it's thread-bound
@KingMob
Copy link
Collaborator Author

KingMob commented Nov 19, 2023

I'm going to merge this for now, and hold off on cutting a release until @DerGuteMoritz's http-versions PR merges

@KingMob KingMob merged commit 0a01265 into master Nov 19, 2023
1 check passed
@KingMob KingMob deleted the feature/http2-server branch November 19, 2023 10:25
@DerGuteMoritz
Copy link
Collaborator

(Responding via GH now since Reviewable doesn't allow me to respond to a closed PR anymore it seems)

@KingMob

I'm not following you. It looks to me that raw handlers in server use a buffered stream too.

See line 463 in server.clj: s (netty/buffered-source (netty/channel ctx) #(.readableBytes ^ByteBuf %) buffer-capacity)

(They don't buffer when they see a FullHttpRequest, but that's because it's going to be opened and closed almost immediately.)

Ah damn, you're right, I mixed it up with aleph.tcp which always uses unbuffered streams, regardless of the :raw-stream? setting even (might be worth changing here, too ...). Then when I wanted to confirm my memory, I only took a quick look and indeed ended up in the branch for FullHttpRequest without realizing it. Sorry for the confusion 🙏

I don't think using an unbuffered stream will actually prevent leaks. If the user code doesn't consume all ByteBufs and free them, there will be leaks regardless of where they're located.

Indeed and even in the case of an unbuffered stream, there may still be one pending buffer there which could be leaked. Alright then!

Heh. It's less a problem of inheritance, and more about encapsulation.

Right, and implementation inheritance makes it very easy to break encapsulation accidentally!

Netty moves slowly, and there's no guarantee they'd open up the permissions just for Aleph, so I'm reserving my interactions with Netty for more important issues.

Seems wise.

Just the test suite. IIRC, the issue is the built in JDK SSL layer, BoringSSL/OpenSSL have different failure modes.

Ah good, yes, +1 on that!

#(.shutdownGracefully group 0 0 TimeUnit/SECONDS)))
;; We still wait 1.5s to gracefully end repeating tasks
;; like the date-header-value.
#(.shutdownGracefully group 1500 1500 TimeUnit/MILLISECONDS)))
Copy link
Collaborator

@DerGuteMoritz DerGuteMoritz Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only noticed this now when working on the follow-up change for the ALPN config: can you explain the rationale behind this change a bit more? I was confused where that 1.5s slowdown on HTTP tests was coming from and only then noticed this. Why is it important to keep the date-header-value task running when the server is already shutdown? And if it's not essential, I would tend to introduce another setting for overriding this - our test suite contains hundreds of tests which start and stop aleph servers so 1.5 seconds on each would add up quite badly 😬

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was some race condition causing test-request-timeout to fail.

I agree that there's no need to delay shutdown just for one test. I think I was really exasperated at the time I made the change. I'll see about undoing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh, whatever situtation led me to change that no longer holds. I can't replicate the error now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, glad to hear 💯

@KingMob KingMob mentioned this pull request Jan 3, 2024
DerGuteMoritz added a commit that referenced this pull request Feb 9, 2024
This is a WIP artifact from the patch which introduced HTTP/2 support. It is no longer needed and
dropping it reduces test suite runtime from roughly 5 minutes to roughly 40 seconds.

See #687 (comment) for discussion.
DerGuteMoritz added a commit that referenced this pull request Feb 11, 2024
This is a WIP artifact from the patch which introduced HTTP/2 support. It is no longer needed and
dropping it reduces test suite runtime from roughly 5 minutes to roughly 40 seconds.

See #687 (comment) for discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants