Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(node): Improve Express URL Parameterization #5450

Merged
merged 8 commits into from
Jul 26, 2022
Merged

Conversation

Lms24
Copy link
Member

@Lms24 Lms24 commented Jul 22, 2022

This PR improves the URL parameterization for transaction names in our Express integration.

Previously, we would only obtain a parameterized version of the incoming request's URL after the registered handler(s) finished their job, right before we called transaction.finish(). This is definitely too late for DSC propagation, if the handlers made any outgoing request. In that case, the DSC (propagated via the baggage header) would not contain a transaction name.

As of this PR, we patch the Express.Router prototype, more precisely the process_params function. This function contains a reference to the incoming req object as well as to the layer that matched the route. We hook into the function to extract the matched and parameterized partial route of the layer and we attach it to the req object. This happens for every matched layer, until the entire route is resolved, in which stichtched together the entire parameterized route.

For the time being, we deliberately ignore route registrations with wildcards (e.g. app.get('/*', ...)) when stitching together the parameterized route. After we added a new part to the reconstructed route, we check if it has the same number of segments as the original (raw) URL. In case it has, we assume that the parameterized route is complete and we update the active transaction with the parameterized name. Additionally, we set the transaction source to route at this point. In case we never get to the point where we have an equal amount of URL segments, the transaction name is never updated, meaning its source stays url (and therefore, it's not added to the DSC). In that case, we continue to update the transaction name like before right before the end of the transaction.

After reading the Express source code, we confirmed that the process for resolving parts of a route and handling it is performed sequentially for each matched route segment. This means that each matched layer's handle function is invoked before the next part of the URL is matched. We therefore still have timing issues around DSC population if any of those intermediate handler functions make outgoing requests. A simple example:

The incoming request is /api/users/123

  • We intercept the request and start the transaction with the name /api/users/123 and source url
  • The first layer that matches is matches the route /*.
    • We start reconstructing the parameterized route with /
    • The handle function of this layer is invoked (e.g. to check authentication)
      • the handler makes an XHR request
      • at this point, we populate and freeze the DSC (without transaction name)
  • The second handler matches the route /api/users/123
    • We obtain the parameterized route and our reconstructed route is now /api/users/:id
    • Now we have 3 segments in the original and in the reconstructed route
    • We update the transaction name with /api/users/:id and set its source to route
    • The handle function of this layer is invoked
      • every request that might happen here will still propagate the DSC from layer 1 because it can't be modified
  • We finish the transaction

This example shows that we still have timing issues w.r.t DSC propagation. That is, assuming that the Node SDK is in this case the head of the trace and it didn't intercept incoming DSC from the incoming request.
However, with this PR we now at least have the chance to get the correct transaction name early enough.

ref: #5342

@Lms24 Lms24 self-assigned this Jul 22, 2022
@Lms24 Lms24 marked this pull request as ready for review July 22, 2022 15:09
@Lms24 Lms24 requested a review from lobsterkatie July 22, 2022 15:09
Copy link
Member

@lobsterkatie lobsterkatie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

In your PR description, where you say "The second handler matches the route /api/users/123," do you mean that to be the parameterized route instead?

packages/tracing/src/integrations/node/express.ts Outdated Show resolved Hide resolved
packages/tracing/src/integrations/node/express.ts Outdated Show resolved Hide resolved
packages/tracing/src/integrations/node/express.ts Outdated Show resolved Hide resolved
packages/tracing/src/integrations/node/express.ts Outdated Show resolved Hide resolved
packages/tracing/src/integrations/node/express.ts Outdated Show resolved Hide resolved
packages/utils/src/requestdata.ts Show resolved Hide resolved
packages/utils/src/requestdata.ts Outdated Show resolved Hide resolved
packages/utils/src/requestdata.ts Outdated Show resolved Hide resolved
@Lms24 Lms24 requested a review from lobsterkatie July 26, 2022 10:55
Copy link
Member

@lobsterkatie lobsterkatie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Lms24 Lms24 merged commit 5e9327f into master Jul 26, 2022
@Lms24 Lms24 deleted the lms-express-routes branch July 26, 2022 16:18
Lms24 added a commit that referenced this pull request Jul 28, 2022
make our URL parameterization for Express (introduced in #5450) compatible with RegEx-defined routes.

Previously, as reported in #5481, our parameterization logic would cause a crash because instead of a string, the matched route would be of type `RegExp`. This PR adjusts our logic so that we detect if we get a matched string our regex. In the latter case, we also append a `'/'` to the reconstructed partial route name so that the regex is closed.
Lms24 added a commit that referenced this pull request Aug 1, 2022
Fix an error thrown from our Node SDK that  when our Express router parameterization logic (introduced in #5450) would try to parameterize a route consisting of an array of paths (which could be strings or RegExes). 

Since a crucial part of our parameterization approach is to determine if the currently resolved layer is the "last" part of the route (in which case we update the transaction name and source), this patch also makes a few modifications to determine this correctly for arrays. In order to do so, we check for the number of URL segments in the original URL vs. that of the parameterized URL. In the case of arrays, we'll likely have more segments than in the raw URL path. Therefore, we balance this out by determining the number of extra segments we got from the array.

Additionally, added tests for array routes.
Lms24 added a commit that referenced this pull request Aug 1, 2022
Hotfixes a problem in our router instrumentation introduced in #5450 which would cause Express 5 Node apps to crash on startup. Because the internals of Express 5 (which is still in beta) have changed quite a bit compared to Express 4, there is unfortunately no quick way of supporting the new version in our current router instrumentation. 

Therefore, this patch simply checks if the router we retrieve from Express 4 apps (which we get from `app._router`) exists. In case it does, we can patch it; in case it doesn't, we know that the integration is either used with Express 3 or 5. In both cases, we early return and do not patch the router.

We can revisit adding proper support for early URL parameterization of Express 5 apps but for now, this PR will unblock Express 5 users by skipping instrumentation. This means that for Express 5, we fall back to our old way of instrumenting routes (which means we get paramterized routes for transaction names but only after the route handler was executed and the transaction is finished).

fixes #5501
@Lms24 Lms24 mentioned this pull request Aug 2, 2022
14 tasks
@brotherko
Copy link

Hi Team I'm wondering if this PR also solves express with a nested router?

@Lms24
Copy link
Member Author

Lms24 commented Dec 16, 2022

Hi @brotherko I think transaction-name-parameterization-wise nested routers should work. However, there are still missing spans on nested routers as outlined in #5510. If you want to try it out, feel free to do so and open an issue in case something is not working as expected. Thanks!

@brotherko
Copy link

brotherko commented Dec 20, 2022

Hi @Lms24 Thanks i'm not sure if we are referring to the same kind of parameterization here's the example

const route1 = Router();
const route2 = Router();

route1.use('/route2', route2)
route2.use('/:param1', (req, res) => res.send(`hello ${req.params.param1}`))

app.use('/route1', route1)

When I browse => route1/route2/someparameter
Sentry isn't able to capture the transaction as /route1/route2/:param1 but as route1/route2/someparameter

Is it the intended behavior? Or I'm just misusing express? thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants