Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fast compiled route matcher #10131

Draft
wants to merge 24 commits into
base: 4.3.x
Choose a base branch
from
Draft

Add fast compiled route matcher #10131

wants to merge 24 commits into from

Conversation

yawkat
Copy link
Member

@yawkat yawkat commented Nov 17, 2023

Note this is two commits that are mostly independent, but the second commit has the vast majority of the changes.


This PR adds an alternate request processing path that mostly bypasses RoutingInBoundHandler. The goal is to speed up processing of simple routes significantly.

The approach taken here is to accumulate all routes, prepare those that can be executed quickly (i.e. simple argument / return binding, exact path match, no difficult filters), and then compile the routing tree into an optimized MatchPlan. In principle this approach could be extended to all routes, we "just" need to adapt other framework features (e.g. path parameters) to be included in the compiled MatchPlan. This is my long-term vision for the future of routing in the Micronaut HTTP stack.

The results are very promising so far. FullHttpStackBenchmark goes from ~7µs to ~5µs, a big improvement. And there aren't even very many microoptimizations yet.

This PR adds an alternate request processing path that mostly bypasses RoutingInBoundHandler. The goal is to speed up processing of simple routes significantly.

The approach taken here is to accumulate all routes, prepare those that can be executed quickly (i.e. simple argument / return binding, exact path match, no difficult filters), and then compile the routing tree into an optimized MatchPlan. In principle this approach could be extended to all routes, we "just" need to adapt other framework features (e.g. path parameters) to be included in the compiled MatchPlan. This is my long-term vision for the future of routing in the Micronaut HTTP stack.
@yawkat yawkat added the type: improvement A minor improvement to an existing feature label Nov 17, 2023
@yawkat yawkat added this to the 4.3.0 milestone Nov 17, 2023
@graemerocher graemerocher marked this pull request as draft November 18, 2023 08:56
@graemerocher
Copy link
Contributor

Converted to draft because I think we will need to iterate a bit on this and @dstepanov and me are away next weeks. Will try do an initial review Monday

* @throws CodecException If an error occurs decoding
*/
void writeTo(
HttpHeaders requestHeaders,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean we need to parse the headers? Or could this be the raw Netty headers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new NettyHttpHeaders(headers, conversionService) has no logic so this is not a big concern. i tried adding a new header abstraction first to abstract over netty and micronaut http header classes but i saw NettyHttpHeaders is so light that it's pointless.

}
}

private ExecutionLeaf<RequestHandler> shortCircuitHandler(MatchRule rule, UriRouteInfo<?, ?> routeInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a long method I think we need to make this logic extensible and split this up, can the MathRule interface be extended somehow such that it can handle different return types, method parameters etc.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by extending MatchRule. MatchRule's only purpose is to match the request, essentially what Router.find does at the moment. The one thing Router.find does that MatchRule can't is to return e.g. path parameters, but imo this is a good thing because the stateless nature enables NettyShortCircuitRouterBuilder to do logical operations (eg DNF transformation) without restrictions. It is imo better to keep MatchRule simple and add further steps (eg path param extraction) later only if they are necessary.

wrt extensibility, yes it would be nice. What I did for this method is go through RequestLifecycle and exclude all the weird cases (e.g. HttpStatus method return type). When you remove those exclusions, the prepare step already becomes a lot simpler:

MethodExecutionHandle<?, ?> executionHandle = routeInfo.getTargetMethod();
        boolean unwrapResponse = HttpResponse.class.isAssignableFrom(executionHandle.getReturnType().getType());
        MatchRule.ContentType fixedContentType = findFixedContentType(rule);
        MediaType responseMediaType;
        if (fixedContentType != null) {
            responseMediaType = fixedContentType.expectedType();
        } else {
            List<MediaType> produces = routeInfo.getProduces();
            if (!produces.isEmpty()) {
                responseMediaType = produces.get(0);
            } else {
                responseMediaType = MediaType.APPLICATION_JSON_TYPE;
            }
        }
        RequestArgumentBinder<Object>[] argumentBinders = routeInfo.resolveArgumentBinders(requestArgumentSatisfier.getBinderRegistry());
        ShortCircuitArgumentBinder.Prepared[] shortCircuitBinders = new ShortCircuitArgumentBinder.Prepared[argumentBinders.length];
        for (int i = 0; i < argumentBinders.length; i++) {
            Optional<ShortCircuitArgumentBinder.Prepared> prep = scb.prepare(executionHandle.getArguments()[i], fixedContentType);
            shortCircuitBinders[i] = prep.get();
        }
        MessageBodyWriter<Object> messageBodyWriter = (MessageBodyWriter<Object>) routeInfo.getMessageBodyWriter();

It's essentially just content type determination for the MessageBodyWriter, and the prep stage for the argument binders. Similarly, the actual accept implementation is very simple (call arg binders, invoke method, invoke body writer), it's just a bit blown up by the need to handle HttpResponse return types for FullHttpStackBenchmark.

So I think that the current extension points (MessageBodyWriter, argument binders) are already fine conceptually, they just need some refinement. For example maybe there should just be a MessageBodyWriter for HttpStatus, and the special handling for HttpStatus could be removed. And there should also be a MessageBodyWriter for HttpResponse though that is a bit harder.

Until we have some improvements there I don't think we should add new abstractions to simplify this method.

@Override
public void accept(ChannelHandlerContext ctx, io.netty.handler.codec.http.HttpRequest request, ByteBody body, PipeliningServerHandler.OutboundAccess outboundAccess) {
try {
NettyHttpHeaders requestHeaders = new NettyHttpHeaders(request.headers(), conversionService);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like I said do we need this headers object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -159,4 +163,38 @@ Optional<T> transform(NettyHttpRequest<?> nhr, ArgumentConversionContext<T> cont
.convert(conversionService, context)
.map(o -> (T) o.claimForExternal());
}

@Override
public Optional<Prepared> prepare(Argument<T> argument, MatchRule.ContentType fixedContentType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is going to be an internal non-public API do we need the overhead of Optional? Can we just make the return value nullable and return null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either is fine but this is not a hot path. this is called once per route, so i think it's better to have a nicer api here than to overoptimize this.

.filter(e -> e.getKey() instanceof MatchRule.PathMatchExact)
.collect(Collectors.toMap(e -> ((MatchRule.PathMatchExact) e.getKey()).path(), Map.Entry::getValue));
return request -> {
// this replicates AbstractNettyHttpRequest.getPath but not exactly :(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please clarify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractNettyHttpRequest.getPath for some reason uses the charset from the content-type of the request, if set. i do not want the charset parsing logic anymore. i dont think anyone uses it and it adds overhead to every request.

Map<String, MatchPlan<R>> byContentType = coerceRules(MatchRule.ContentType.class, nextPlans)
.entrySet().stream().collect(Collectors.toMap(e -> {
MediaType expectedType = e.getKey().expectedType();
return expectedType == null ? null : expectedType.getName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a null key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MatchRule.ContentType, null means no content type header. it's convenient because request.headers().get(HttpHeaderNames.CONTENT_TYPE) returns null when the content type is missing, which will then match MatchRule.ContentType(null)

* @param body The request body
* @return The bound argument
*/
Object bind(@NonNull HttpRequest nettyRequest, HttpHeaders mnHeaders, @NonNull ImmediateByteBody body);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a new interface maybe httpHeaders can be the netty headers object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

static class MyController {
@Get("/simple")
String simple() {
return "foo: " + ServerRequestContext.currentRequest().isPresent()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of completely disabling ServerRequestContext.currentRequest() perhaps we can make it lazy. We bind the netty request to the propagation context and then if the Micronaut request is resolved with currentRequest() we initialise it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is a good idea, it would also make things like fallback for error routes easier.

@dstepanov
Copy link
Contributor

dstepanov commented Nov 19, 2023

If I understand correctly, we have to duplicate matching and method invocation logic to do this. The already complicated code gets more complex, and it's not worse IMHO. I feel it's going to be another option like HttpServerType, and is never going to be enabled. We should try to optimize RoutingInBoundHandler if it's that problematic. I see that we don't cache filters, which might be a good idea to add.

When we discussed this task, I assumed we wanted to optimize the Netty pipeline as it is the most visible in the profiling. (Note: I don't fully understand Netty)
My idea was to rewrite this code to be more dynamic based on the route we resolve:

private void insertMicronautHandlers(boolean zeroCopySupported) {
            channel.attr(STREAM_PIPELINE_ATTRIBUTE.get()).set(this);
            if (sslHandler != null) {
                channel.attr(CERTIFICATE_SUPPLIER_ATTRIBUTE.get()).set(sslHandler.findPeerCert());
            }

            SmartHttpContentCompressor contentCompressor = new SmartHttpContentCompressor(embeddedServices.getHttpCompressionStrategy());
            if (zeroCopySupported) {
                channel.attr(PipeliningServerHandler.ZERO_COPY_PREDICATE.get()).set(contentCompressor);
            }
            pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_COMPRESSOR, contentCompressor);
            pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_DECOMPRESSOR, new HttpContentDecompressor());

            Optional<NettyServerWebSocketUpgradeHandler> webSocketUpgradeHandler = embeddedServices.getWebSocketUpgradeHandler(server);
            if (webSocketUpgradeHandler.isPresent()) {
                pipeline.addLast(NettyServerWebSocketUpgradeHandler.COMPRESSION_HANDLER, new WebSocketServerCompressionHandler());
            }
            if (server.getServerConfiguration().getServerType() != NettyHttpServerConfiguration.HttpServerType.STREAMED) {
                pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_AGGREGATOR,
                    new HttpObjectAggregator(
                        (int) server.getServerConfiguration().getMaxRequestSize(),
                        server.getServerConfiguration().isCloseOnExpectationFailed()
                    )
                );
            }
            pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_CHUNK, new ChunkedWriteHandler()); // todo: move to PipeliningServerHandler

            RequestHandler requestHandler = routingInBoundHandler;
            if (webSocketUpgradeHandler.isPresent()) {
                webSocketUpgradeHandler.get().setNext(routingInBoundHandler);
                requestHandler = webSocketUpgradeHandler.get();
            }
            if (server.getServerConfiguration().isDualProtocol() && server.getServerConfiguration().isHttpToHttpsRedirect() && sslHandler == null) {
                requestHandler = new HttpToHttpsRedirectHandler(routingInBoundHandler.conversionService, server.getServerConfiguration(), sslConfiguration, hostResolver);
            }
            pipeline.addLast(ChannelPipelineCustomizer.HANDLER_MICRONAUT_INBOUND, new PipeliningServerHandler(requestHandler));
        }

We add those steps after we resolve the route. So the idea:

  1. Add a step as much in the front as possible where we can get the request
  2. Convert the request into our abstraction, probably as lazy as possible. I don't think a basic hello-world with disabled CORS should touch headers
  3. Find a route
  4. Based on the route, add pipeline steps. GET method should not add content decompression etc. Websocket added only for WS routes. I assume we can add the steps needed for the response (chunks, decompression) after we know what kind it is and if it does exist. Looking at HttpContentDecompressor and HttpContentCompressor, those can be skipped for some content types.

@dstepanov
Copy link
Contributor

I guess you can comment-out those steps and see if that is worse the change.

@yawkat
Copy link
Member Author

yawkat commented Nov 20, 2023

@dstepanov

This PR addresses RoutingInBoundHandler, RequestLifecycle and so on. It is true that routing and execution logic is duplicated, but I do not see a way around this. The current routing API (not just the implementation) is sub-optimal, and the only solution I could see is to flip it around (accumulate the routes in a builder and use essentially a compiler to build them into a final optimized routing tree). Yes it is similar to HttpServerType, and people probably won't turn it on, but don't forget that HttpServerType also got us the PipeliningServerHandler which now gives the benefit of HttpServerType.FULL_CONTENT for most requests (those that don't exceed a certain body size) for everyone even without FULL_CONTENT enabled. Similarly, in my opinion this new approach can be extended (maybe with breaking some compat, so 5.x) to extend to most or all routes by default.


The netty pipeline is a separate issue. It is not as easy as you say to simply add the handlers dynamically. Many of the handlers need to see the full request lifecycle so cannot be added halfway into the request dynamically. Additionally, HTTP pipelining is a huge issue when it comes to modifying the handler pipeline. There can be many requests in flight at the same time in the same pipeline and it's hard to avoid bugs when modifying it dynamically.

However we already have compressed some parts of the pipeline with the introduction of PipeliningServerHandler. The job that was done before by I think three separate handlers is now done by one. In my opinion, this is the correct approach to remove the other handlers also: The chunked write handler and the compression handlers should go into PipeliningServerHandler where they can be loaded dynamically per request, and where HTTP pipelining is already handled properly. PipeliningServerHandler with its InboundHandler and OutboundHandler internal abstractions is already set up to do exactly this, without adding overhead for uncompressed requests, it's just not implemented yet. But such an optimization cannot be a substitute for improved routing.

@graemerocher
Copy link
Contributor

I think this is an interesting line of investigation and we should continue it, it could be that it becomes the default if we can make it pass the HTTP TCK.

I think compatibility is possible if we make certain things lazy (ServerRequestContext). I agree we should probably not have a configuration option for it as no one will enable it, and the goal should be full compatibility somehow.

Also I think we should investigate optimising route argument binding so that it can use the raw netty headers and provide an optimised API without all the BindingResult/Optional wrapping which will reduce allocations.

Looking at something like Vertx, they have separate routing for normal/templates routes for regex routes as well. So I think the only way to achieve that level of speed is to have static route lookups without the current route matching we have now.

@yawkat
Copy link
Member Author

yawkat commented Nov 20, 2023

I think a config option is unavoidable for 4.x. Though they are a bit reduced from 3.x with the MessageBodyHandler introduction, we still have many cases where certain decisions cannot be done statically. For example, RouteExecutor does an body instanceof HttpStatus every time on controller return type instead of determining this from the route statically. So if you have a controller Object get() {return HttpStatus.NOT_FOUND;} we can't know statically that this needs special handling instead of going through the standard json body writer.

I want to get rid of these cases and make all response handling decisions statically (this example decision is made statically in this PR), but I don't think this is viable for 4.x without an explicit config option.

Copy link

sonarcloud bot commented Nov 20, 2023

SonarCloud Quality Gate failed.    Quality Gate failed

Bug C 2 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 12 Code Smells

75.7% 75.7% Coverage
0.0% 0.0% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@graemerocher
Copy link
Contributor

You do know statically that it is a dynamic route since the return type is Object and therefore has to fallback to the slower original behaviour

@yawkat
Copy link
Member Author

yawkat commented Nov 20, 2023

yea i guess we can be more conservative with Object return type and skip a lot of the logic.

@dstepanov
Copy link
Contributor

I will take a look in more detail next week, but I would prefer optimizing the current routing/invocation or reusing it instead of duplicating the implementation.

@graemerocher
Copy link
Contributor

Agree that if it is possible to optimise the existing routing that is preferable to duplicate routing

yawkat added a commit that referenced this pull request Nov 20, 2023
This PR replaces the writeChunked and writeFile APIs with a new writeStream API that takes an InputStream. This removes the need for the ChunkedWriteHandler.

Chunked writes were used for two purposes: Sending file regions and sending InputStreams. This has always complicated the HTTP pipeline somewhat as the pipeline had to deal with not just HttpContent objects but also ChunkedInput and FileRegion objects.

This PR replaces the machinery for InputStream writing with a more straight-forward solution that reads the data on the IO thread and then sends it down the channel.

Additionally, the file-specific APIs based on RandomAccessFile are removed. The body writer now just creates an InputStream for the file region in question and sends that. This removes support for zero-copy transfers, however that is a niche feature anyway because it doesn't work with TLS or HTTP/2. If someone wants a performant HTTP server, HTTP/2 takes priority over zero-copy so it makes little sense.

This PR may have small conflicts with #10131 as that PR changed the PipeliningServerHandler body handling a little bit. Otherwise this PR should have no visible impact on users.
yawkat added a commit that referenced this pull request Nov 21, 2023
PipeliningServerHandler was supposed to implement backpressure, but it turns out that auto read was still enabled and that the implementation didn't really work. This means that it would keep reading even if that means buffering data when the downstream can't keep up.

This PR disables auto read and fixes the read implementation in PipeliningServerHandler. In principle there should be no change to users, this just means that instead of buffering any incoming data internally, backpressure is now applied to the client.

This PR is based on #10138 but is separate for easier review. It also has conflicts with #10131.
@yawkat
Copy link
Member Author

yawkat commented Nov 22, 2023

fyi @dstepanov when i'm on vacation, feel free to look at the routing side but please leave the netty pipeline alone, i have changes already lined up for that.

@dstepanov
Copy link
Contributor

I'm working on optimizing filtering/routing. It looks like this change detected the overhead of it.

# Conflicts:
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/HttpToHttpsRedirectHandler.java
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/NettyHttpRequest.java
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/RoutingInBoundHandler.java
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/binders/NettyBodyAnnotationBinder.java
#	http-server-netty/src/test/groovy/io/micronaut/http/server/netty/stack/InvocationStackSpec.groovy
#	http-server-netty/src/test/groovy/io/micronaut/http/server/netty/websocket/WebSocketContextValidationFilter.java
#	router/src/main/java/io/micronaut/web/router/DefaultRouteInfo.java
#	router/src/main/java/io/micronaut/web/router/RouteInfo.java
# Conflicts:
#	gradle.properties
#	gradle/libs.versions.toml
#	http-client/src/main/java/io/micronaut/http/client/netty/DefaultHttpClient.java
#	inject-java/src/test/groovy/io/micronaut/visitors/ClassElementSpec.groovy
# Conflicts:
#	http-server/src/main/java/io/micronaut/http/server/exceptions/response/HateoasErrorResponseProcessor.java
# Conflicts:
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/handler/PipeliningServerHandler.java
yawkat added a commit that referenced this pull request Jan 4, 2024
Decompression errors should be treated as DEBUG to prevent log spam.

I've also copied over some other fuzz tests from #10131. None of them were an issue except one caused by a bug in netty EmbeddedChannel ( netty/netty#13730 ).
yawkat added a commit that referenced this pull request Jan 4, 2024
This PR replaces the StreamedHttpRequest passed to RequestHandler by a separate ByteBody argument. This removes the need for some instanceof checks.

This is a small part of #10131, but it makes for annoying merge conflicts, so I want to pull it into 4.3.x. It should have no functional or performance difference.
yawkat added a commit that referenced this pull request Jan 4, 2024
Decompression errors should be treated as DEBUG to prevent log spam.

I've also copied over some other fuzz tests from #10131. None of them were an issue except one caused by a bug in netty EmbeddedChannel ( netty/netty#13730 ).
yawkat added a commit that referenced this pull request Jan 8, 2024
* Small RequestHandler refactor
This PR replaces the StreamedHttpRequest passed to RequestHandler by a separate ByteBody argument. This removes the need for some instanceof checks.

This is a small part of #10131, but it makes for annoying merge conflicts, so I want to pull it into 4.3.x. It should have no functional or performance difference.

* checkstyle

* fix merge
# Conflicts:
#	gradle/libs.versions.toml
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/RoutingInBoundHandler.java
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/handler/PipeliningServerHandler.java
#	http-server-netty/src/main/java/io/micronaut/http/server/netty/handler/RequestHandler.java
#	inject-java/src/test/groovy/io/micronaut/visitors/ClassElementSpec.groovy
yawkat added a commit that referenced this pull request Jan 8, 2024
This leads to a slight speedup in #10131 but can be merged without the rest of the PR. Not sure if this has impact outside of #10131.
sdelamo pushed a commit that referenced this pull request Jan 8, 2024
This leads to a slight speedup in #10131 but can be merged without the rest of the PR. Not sure if this has impact outside of #10131.
return ExecutionLeaf.indeterminate();
}
// CorsFilter is handled specially here. It's always present, so we can't bail, but it only does anything when the Origin header is set, which is checked in accept().
fixedFilters = fixedFilters.stream().filter(f -> !FilterRunner.isCorsFilter(f, CorsFilter.class)).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this? Filters can now be filtered by request, eliminating the CORS filter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will profile

@CLAassistant
Copy link

CLAassistant commented Feb 7, 2024

CLA assistant check
All committers have signed the CLA.

@altro3
Copy link
Contributor

altro3 commented Apr 11, 2024

Is it dead improvement?

@yawkat
Copy link
Member Author

yawkat commented Apr 12, 2024

@altro3 we've decided that for now the marginal perf benefit does not outweigh the maintenance effort of duplicating the routing code paths. this decision may change in the future however

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: improvement A minor improvement to an existing feature
Projects
No open projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants