Skip to content

Commit

Permalink
Specific URL query string values should be redacted (#971)
Browse files Browse the repository at this point in the history
Co-authored-by: Liudmila Molkova <[email protected]>
  • Loading branch information
trask and lmolkova authored Nov 10, 2024
1 parent 43f9e30 commit 273056b
Show file tree
Hide file tree
Showing 6 changed files with 183 additions and 14 deletions.
22 changes: 22 additions & 0 deletions .chloggen/971.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: url

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Specific URL query string values should now be redacted by default due to concerns around leaking credentials.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [ 971 ]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
39 changes: 36 additions & 3 deletions docs/attributes-registry/url.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,29 @@ Attributes describing URL.

**[2]:** The file extension is only set if it exists, as not every url has a file extension. When the file name has multiple extensions `example.tar.gz`, only the last one should be captured `gz`, not `tar.gz`.

**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment
is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.

`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.

`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).

Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the
value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`https://www.example.com/path?color=blue&sig=REDACTED`.

**[4]:** In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not.
`url.original` might contain credentials passed via URL in form of `https://username:[email protected]/`. In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same.
Expand All @@ -41,6 +61,19 @@ Attributes describing URL.

**[6]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`q=OpenTelemetry&sig=REDACTED`.

**[7]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for `foo.example.com` is `example.com`. Trying to approximate this by simply taking the last two labels will not work well for TLDs such as `co.uk`.

**[8]:** The subdomain portion of `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels of subdomain, such as `sub2.sub1.example.com`, the subdomain field should contain `sub2.sub1`, with no trailing period.
Expand Down
26 changes: 23 additions & 3 deletions docs/database/elasticsearch.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,29 @@ HTTP method names are case-sensitive and `http.request.method` attribute value M
Instrumentations for specific web frameworks that consider HTTP methods to be case insensitive, SHOULD populate a canonical equivalent.
Tracing instrumentations that do so, MUST also set `http.request.method_original` to the original value.

**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment
is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.

`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.

`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).

Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the
value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`https://www.example.com/path?color=blue&sig=REDACTED`.

**[4]:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.

Expand Down
39 changes: 36 additions & 3 deletions docs/http/http-spans.md
Original file line number Diff line number Diff line change
Expand Up @@ -182,9 +182,29 @@ Tracing instrumentations that do so, MUST also set `http.request.method_original

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[4]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
**[4]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment
is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.

`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.

`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).

Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the
value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`https://www.example.com/path?color=blue&sig=REDACTED`.

**[5]:** If the request fails with an error before response status code was sent or received,
`error.type` SHOULD be set to exception type (its fully-qualified class name, if applicable)
Expand Down Expand Up @@ -434,6 +454,19 @@ SHOULD include the [application root](/docs/http/http-spans.md#http-server-defin

**[10]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`q=OpenTelemetry&sig=REDACTED`.

**[11]:** The IP address of the original client behind all proxies, if known (e.g. from [Forwarded#for](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#for), [X-Forwarded-For](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-For), or a similar header). Otherwise, the immediate client peer address.

**[12]:** If protocol version is subject to negotiation (for example using [ALPN](https://www.rfc-editor.org/rfc/rfc7301.html)), this attribute SHOULD be set to the negotiated version. If the actual protocol version is not known, this attribute SHOULD NOT be set.
Expand Down
39 changes: 36 additions & 3 deletions docs/url/url.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,14 +37,47 @@ This document defines semantic conventions that describe URL and its components.
| [`url.query`](/docs/attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=OpenTelemetry` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`url.scheme`](/docs/attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment
is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.

`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.

`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).

Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the
value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`https://www.example.com/path?color=blue&sig=REDACTED`.

**[2]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.

**[3]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.

![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`:

* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`q=OpenTelemetry&sig=REDACTED`.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
Expand Down
32 changes: 30 additions & 2 deletions model/url/registry.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,15 +40,30 @@ groups:
stability: stable
type: string
brief: Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986)
note: >
note: |
For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment
is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:[email protected]/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:[email protected]/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the
value `REDACTED`:
* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)
This list is subject to change over time.
When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`https://www.example.com/path?color=blue&sig=REDACTED`.
examples:
["https://www.foo.bar/search?q=OpenTelemetry#SemConv", "//localhost"]
- id: url.original
Expand Down Expand Up @@ -87,8 +102,21 @@ groups:
brief: >
The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component
examples: ["q=OpenTelemetry"]
note: >
note: |
Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
![Experimental](https://img.shields.io/badge/-experimental-blue)
Query string values for the following keys SHOULD be redacted by default and replaced by the value `REDACTED`:
* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth)
* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)
This list is subject to change over time.
When a query string value is redacted, the query string key SHOULD still be preserved, e.g.
`q=OpenTelemetry&sig=REDACTED`.
- id: url.registered_domain
type: string
stability: experimental
Expand Down

0 comments on commit 273056b

Please sign in to comment.