Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Express 5 Instrumentation #4913

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

Express 5 Instrumentation #4913

wants to merge 27 commits into from

Conversation

IlyasShabi
Copy link
Contributor

@IlyasShabi IlyasShabi commented Nov 20, 2024

What does this PR do?

This PR adds instrumentation for Express5 to support its new features and behavior. Some related changes were previously made by @wconti27 please check their commits for reference.

Checklist

  • Path params: Instrumente handleRequest in Layer class in router library
  • Query string: Directly instrumente query getter in express
  • Response: Added instrumentation for json, jsonp, and render in response.js
  • Route: Instrumente router prototype for use and route methods
  • Enable support for tests across different Express versions
  • Skip Path not named widcard for Express5 since they are not supported
  • Implement request blocking exception handling (to discuss)
  • Fix router params callback tests
  • Fix RASP lfi and ssrf tests (not related to Express 🤔 )
  • Instrumente Node.js querystring used by express to parse query params
  • remove qs instrumentation

Copy link

github-actions bot commented Nov 20, 2024

Overall package size

Self size: 8.12 MB
Deduped: 94.62 MB
No deduping: 95.19 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.2.2 | 29.27 MB | 29.27 MB | | @datadog/native-appsec | 8.3.0 | 19.37 MB | 19.38 MB | | @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB | | @datadog/pprof | 5.4.1 | 9.76 MB | 10.13 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.5.0 | 2.51 MB | 2.65 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 3.0.1 | 1.06 MB | 1.46 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB | | msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | path-to-regexp | 0.1.10 | 6.38 kB | 6.38 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@IlyasShabi IlyasShabi changed the title Express5 Express 5 Instrumentation Nov 20, 2024
Copy link
Contributor

@CarlesDD CarlesDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth reviewing this and optimizing it to avoid invoking the query getter 3 times.

@IlyasShabi
Copy link
Contributor Author

Yes I forgot to push this change too @CarlesDD

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 50.82%. Comparing base (0b4dab7) to head (efd2f71).
Report is 38 commits behind head on master.

Files with missing lines Patch % Lines
.../dd-trace/src/appsec/iast/taint-tracking/plugin.js 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4913       +/-   ##
===========================================
- Coverage   93.86%   50.82%   -43.04%     
===========================================
  Files         107       94       -13     
  Lines        3373     2601      -772     
===========================================
- Hits         3166     1322     -1844     
- Misses        207     1279     +1072     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@pr-commenter
Copy link

pr-commenter bot commented Nov 21, 2024

Benchmarks

Benchmark execution time: 2024-11-27 15:20:31

Comparing candidate commit c44d539 in PR branch express5 with baseline commit d3b9b3a in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 264 metrics, 2 unstable metrics.

@IlyasShabi IlyasShabi force-pushed the express5 branch 4 times, most recently from 11d95b3 to d33e0be Compare November 22, 2024 09:58
@IlyasShabi IlyasShabi force-pushed the express5 branch 2 times, most recently from 0e039c8 to 22aa8c6 Compare November 25, 2024 13:58
@IlyasShabi IlyasShabi force-pushed the express5 branch 2 times, most recently from 27566d4 to ef71918 Compare November 25, 2024 14:19
@@ -98,7 +98,7 @@
},
{
"name": "express",
"versions": [">=4", ">=4.0.0 <4.3.0", ">=4.3.0"]
"versions": [">=4", ">=4.0.0 <4.3.0", ">=4.0.0 <5.0.0", ">=4.3.0 <5.0.0"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IlyasShabi IlyasShabi marked this pull request as ready for review November 25, 2024 14:22
@IlyasShabi IlyasShabi requested review from a team as code owners November 25, 2024 14:22
Comment on lines 164 to 168
const abortController = new AbortController()

queryParserReadCh.publish({ req: this, res: this.res, query, abortController })

if (abortController.signal.aborted) return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right place to make iast work, because we need to taint the values each time the request is called, but is this the right place to use the abortController and call to the waf? Calling to the waf each time that req.query code is in the app, could be quite expensive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically the parser can change during the middle of the request maybe ? not sure the edge case is worth the perf decrease

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah maybe the best thing to do would be to just remove the waf call here, and instead call it in the router handle request hook

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you about how expensive this operation could be, but just to clarify:

  1. In express v4, we don’t taint the query string using express’ query method but instead rely on the qs parse function.
  2. IMO, we should harmonize how we taint query strings between express v4 and express v5. This could be achieved either by using both qs and querystring (though this might lead to false positives since we can’t confirm if the source is from express) or by keeping the getter instrumentation here and using it for IAST, as you suggested.
    We could also apply the same approach to express v4 and then (remove ?) the qs and querystring instrumentation altogether.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, we can remove the qs instrumentation to do it in the same way in express v4 and v5.
I can't recall why are we instrumenting the qs and not reading the query in other position of the code, I think that we can move all to express instrumentation.

Not necessarily in this PR, if you see that could be done easily, go ahead, but we can change express v4 stuff in other PR.

const shimmer = require('../../datadog-shimmer')
const names = ['querystring', 'node:querystring']

const querystringParseCh = channel('datadog:querystring:parse:finish')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? isn't this redundant?
We have instrumentation for req.query getter in express, why do we need to instrument this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe for IAST, but if you're also confused then idk hahaha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails because in express v4 queries are read directly via query middleware, and IAST subscribes to this event here.

In express v5, query parsing uses both qs (same as express v4) and querystring (code here), so we need to instrument querystring to ensure the same functionality.

Comment on lines 189 to 193
req.query
// If query parsing was aborted, don't continue request handling
if (res.writableEnded) {
return
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit hacky and feature related, I think that we should send a custom event in diagnostic_channels in this point, call to the waf and use the AbortController to decide if we should continue or not.

return router
}

WrappedRouter.prototype = originalRouter.prototype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shimmer.wrapFunction should deal with these things.

Comment on lines 80 to 86
addHook({ name: 'express', file: ['lib/response.js'], versions: ['>=5.0.0'] }, response => {
shimmer.wrap(response, 'json', wrapResponseJson)
shimmer.wrap(response, 'jsonp', wrapResponseJson)
shimmer.wrap(response, 'render', wrapResponseRender)

return response
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change needed ? express branch v5.x still has the proto exposed no ?
https://github.com/expressjs/express/blob/5.x/lib/express.js#L63-L64

and unrelated but yes i think that would work for express 4 too.

shimmer.wrap(express.Router, 'process_params', wrapProcessParamsMethod(2))
return express
})

addHook({ name: 'express', file: ['lib/express.js'], versions: ['>=5.0.0'] }, express => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit above you're hooking on express index, and here on lib/express.js
But the express index is just a one line that re-exports lib/express.js, so why not merge those two hooks ?

const shimmer = require('../../datadog-shimmer')
const names = ['querystring', 'node:querystring']

const querystringParseCh = channel('datadog:querystring:parse:finish')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe for IAST, but if you're also confused then idk hahaha

shimmer.wrapFunction(express, function (original) {
const app = original.call(this, arguments)
const requestProto = Object.getPrototypeOf(app.request)
const requestDescriptor = Object.getOwnPropertyDescriptor(requestProto, 'query')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't work ?

Suggested change
const requestDescriptor = Object.getOwnPropertyDescriptor(requestProto, 'query')
const requestDescriptor = Object.getOwnPropertyDescriptor(express.request.prototype, 'query')

Copy link
Contributor Author

@IlyasShabi IlyasShabi Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this gonna work, we need to execute createApplication first in order to have an app object here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tried it together and apparently it works 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

return Layer
})

function wrapParam (original) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need two hooks for params ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've tried together and apparently only one i needed 👍 make sure to double check there is no edge case

Copy link
Contributor

@wconti27 wconti27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far, I will re-review once the outstanding comments are resolved.

return function wrappedMethod (options) {
const router = originalRouter.call(this, options)

if (!router._queryParsingWrapped) {
Copy link
Contributor Author

@IlyasShabi IlyasShabi Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not modify router object, use weakSet instead
But first check if it's needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uurien after checking express 5 codebase, we do call handle function only once not at each use(...)
I removed _queryParsingWrapped and seems to be good

@IlyasShabi IlyasShabi marked this pull request as draft November 27, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants