From 273056b74d24535294ac9950032188ee8d1ede15 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Sun, 10 Nov 2024 11:38:10 -0800 Subject: [PATCH] Specific URL query string values should be redacted (#971) Co-authored-by: Liudmila Molkova --- .chloggen/971.yaml | 22 +++++++++++++++++++ docs/attributes-registry/url.md | 39 ++++++++++++++++++++++++++++++--- docs/database/elasticsearch.md | 26 +++++++++++++++++++--- docs/http/http-spans.md | 39 ++++++++++++++++++++++++++++++--- docs/url/url.md | 39 ++++++++++++++++++++++++++++++--- model/url/registry.yaml | 32 +++++++++++++++++++++++++-- 6 files changed, 183 insertions(+), 14 deletions(-) create mode 100644 .chloggen/971.yaml diff --git a/.chloggen/971.yaml b/.chloggen/971.yaml new file mode 100644 index 0000000000..874082aab4 --- /dev/null +++ b/.chloggen/971.yaml @@ -0,0 +1,22 @@ +# Use this changelog template to create an entry for release notes. +# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: bug_fix + +# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) +component: url + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: Specific URL query string values should now be redacted by default due to concerns around leaking credentials. + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +# The values here must be integers. +issues: [ 971 ] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. 
+subtext: diff --git a/docs/attributes-registry/url.md b/docs/attributes-registry/url.md index ba44a2f815..740bcfd7b1 100644 --- a/docs/attributes-registry/url.md +++ b/docs/attributes-registry/url.md @@ -30,9 +30,29 @@ Attributes describing URL. **[2]:** The file extension is only set if it exists, as not every url has a file extension. When the file name has multiple extensions `example.tar.gz`, only the last one should be captured `gz`, not `tar.gz`. -**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. -`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. -`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. +**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment +is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. + +`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. +In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. + +`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + +Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. 
+ +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the +value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`https://www.example.com/path?color=blue&sig=REDACTED`. **[4]:** In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not. `url.original` might contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same. @@ -41,6 +61,19 @@ Attributes describing URL. **[6]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it. 
+![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`q=OpenTelemetry&sig=REDACTED`. + **[7]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for `foo.example.com` is `example.com`. Trying to approximate this by simply taking the last two labels will not work well for TLDs such as `co.uk`. **[8]:** The subdomain portion of `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels of subdomain, such as `sub2.sub1.example.com`, the subdomain field should contain `sub2.sub1`, with no trailing period. diff --git a/docs/database/elasticsearch.md b/docs/database/elasticsearch.md index 2cccdc5a36..a536172f07 100644 --- a/docs/database/elasticsearch.md +++ b/docs/database/elasticsearch.md @@ -56,9 +56,29 @@ HTTP method names are case-sensitive and `http.request.method` attribute value M Instrumentations for specific web frameworks that consider HTTP methods to be case insensitive, SHOULD populate a canonical equivalent. Tracing instrumentations that do so, MUST also set `http.request.method_original` to the original value. 
-**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. -`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. -`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. +**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment +is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. + +`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. +In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. + +`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + +Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. + +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the +value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. 
+ +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`https://www.example.com/path?color=blue&sig=REDACTED`. **[4]:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. diff --git a/docs/http/http-spans.md b/docs/http/http-spans.md index 0044d83fce..2073e6891f 100644 --- a/docs/http/http-spans.md +++ b/docs/http/http-spans.md @@ -182,9 +182,29 @@ Tracing instrumentations that do so, MUST also set `http.request.method_original **[3]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. -**[4]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. -`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. -`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. +**[4]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment +is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. + +`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. 
+In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. + +`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + +Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. + +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the +value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`https://www.example.com/path?color=blue&sig=REDACTED`. **[5]:** If the request fails with an error before response status code was sent or received, `error.type` SHOULD be set to exception type (its fully-qualified class name, if applicable) @@ -434,6 +454,19 @@ SHOULD include the [application root](/docs/http/http-spans.md#http-server-defin **[10]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it. 
+![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`q=OpenTelemetry&sig=REDACTED`. + **[11]:** The IP address of the original client behind all proxies, if known (e.g. from [Forwarded#for](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#for), [X-Forwarded-For](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-For), or a similar header). Otherwise, the immediate client peer address. **[12]:** If protocol version is subject to negotiation (for example using [ALPN](https://www.rfc-editor.org/rfc/rfc7301.html)), this attribute SHOULD be set to the negotiated version. If the actual protocol version is not known, this attribute SHOULD NOT be set. diff --git a/docs/url/url.md b/docs/url/url.md index 2128ddc0f4..af52260559 100644 --- a/docs/url/url.md +++ b/docs/url/url.md @@ -37,14 +37,47 @@ This document defines semantic conventions that describe URL and its components. 
| [`url.query`](/docs/attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=OpenTelemetry` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`url.scheme`](/docs/attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. -`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. -`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. +**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment +is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. + +`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. +In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. + +`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + +Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. 
+ +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the +value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`https://www.example.com/path?color=blue&sig=REDACTED`. **[2]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it. **[3]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it. +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`q=OpenTelemetry&sig=REDACTED`. 
+ diff --git a/model/url/registry.yaml b/model/url/registry.yaml index dc06b6b2fe..d57c04733e 100644 --- a/model/url/registry.yaml +++ b/model/url/registry.yaml @@ -40,7 +40,7 @@ groups: stability: stable type: string brief: Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) - note: > + note: | For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. @@ -48,7 +48,22 @@ groups: In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. `url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. + + ![Experimental](https://img.shields.io/badge/-experimental-blue) + Query string values for the following keys SHOULD be redacted by default and replaced by the + value `REDACTED`: + + * [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) + * [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) + * [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) + * [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + + This list is subject to change over time. + + When a query string value is redacted, the query string key SHOULD still be preserved, e.g. + `https://www.example.com/path?color=blue&sig=REDACTED`. 
examples: ["https://www.foo.bar/search?q=OpenTelemetry#SemConv", "//localhost"] - id: url.original @@ -87,8 +102,21 @@ groups: brief: > The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component examples: ["q=OpenTelemetry"] - note: > + note: | Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it. + + ![Experimental](https://img.shields.io/badge/-experimental-blue) + Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`: + + * [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) + * [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) + * [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token) + * [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + + This list is subject to change over time. + + When a query string value is redacted, the query string key SHOULD still be preserved, e.g. + `q=OpenTelemetry&sig=REDACTED`. - id: url.registered_domain type: string stability: experimental
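
The redaction rule this patch adds to the `url.full` and `url.query` notes can be sketched as below. This is a minimal illustration, not part of the patch and not any instrumentation's actual implementation. The names `SENSITIVE_QUERY_KEYS`, `redact_query`, and `redact_url` are hypothetical, and exact case-sensitive key matching is an assumption, since the patch text does not specify case handling.

```python
from urllib.parse import urlsplit, urlunsplit

# Keys listed in the patch. Exact, case-sensitive matching is an assumption;
# the patch does not say how case should be handled.
SENSITIVE_QUERY_KEYS = frozenset(
    {"AWSAccessKeyId", "Signature", "sig", "X-Goog-Signature"}
)


def redact_query(query: str) -> str:
    """Replace the values of sensitive keys with REDACTED, keeping the keys.

    The query is split on "&" by hand (rather than parsed and re-encoded)
    so that non-sensitive pairs are passed through byte-for-byte.
    """
    parts = []
    for pair in query.split("&"):
        key, sep, _value = pair.partition("=")
        if sep and key in SENSITIVE_QUERY_KEYS:
            parts.append(f"{key}=REDACTED")
        else:
            # Not sensitive, or a bare key with no "=": leave untouched.
            parts.append(pair)
    return "&".join(parts)


def redact_url(url: str) -> str:
    """Apply redact_query to the query component of a full URL.

    Note: this sketch does not handle `username:password@` credentials,
    which the `url.full` note covers separately.
    """
    split = urlsplit(url)
    return urlunsplit(split._replace(query=redact_query(split.query)))


if __name__ == "__main__":
    print(redact_query("q=OpenTelemetry&sig=abc123"))
    print(redact_url("https://www.example.com/path?color=blue&sig=abc123"))
```

With the inputs above, the two calls reproduce the examples given in the notes: `q=OpenTelemetry&sig=REDACTED` and `https://www.example.com/path?color=blue&sig=REDACTED`. Keeping the key list in one set reflects the patch's statement that the list is subject to change over time.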