Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding extra fields to access logging #1

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Conversation

neoreddog
Copy link
Member

Problem:

Access logs don't show the origin IP of requests which impacts observability of network events.

As partly described in Issue #9842 https://github.com/linkerd /linkerd2/issues/9842.

Solution:

Added in the field

  • x_forwarded_for

This field uses the get_header method with a HeaderName::from_static

Validation:

Building and deploying the customised linkerd proxy with access logs enabled alongside a nginx container.

Sending network requests and validating that all pre-existing fields persist and the new ones function as intended.

Fixes:

Part fixes #9842

I agree to the DCO for all the commits in this PR.

Linkerd access logs are missing the origin IP address.

Solved by adding in the field x_forwarded_for to access logs which has the origin ip address.

Validated by compiling and deploying the modified proxy image alongside a nginx container, checking access log content for existing fields + the new field

Part fixes issue #9842

Signed-off-by: Dan Ambrose <[email protected]>
hawkw and others added 8 commits June 12, 2023 11:45
…d#2418)

The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.

This branch updates the proxy-api dependency to v0.10.0 and adds the new
timeout configuration fields to the proxy's internal client policy
types. In addition, this branch adds a timeout middleware to the HTTP
client policy stack, so that the timeout described by the
`Rule.request_timeout` field is now applied.

Implementing the `RouteBackend.request_timeout` field with semantics as
close as possible to those described in GEP-1742 will be somewhat more
complex, and will be added in a separate PR.
…kerd#2427)

Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 35.7.7 to 36.2.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](tj-actions/changed-files@db5dd7c...c912451)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…erd#2419)

Depends on linkerd#2418 

The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR linkerd#2418 updates the proxy to depend on the new proxy-api release, and
implements the `Rule.request_timeout` field added to the API. However,
that branch does *not* add a timeout for the
`RouteBackend.request_timeout` field. This branch changes the proxy to
apply the backend request timeout when configured by the policy
controller.

This branch implements `RouteBackend.request_timeout` by adding an
additional timeout layer in the `MatchedBackend` stack. This applies the
per-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.

Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.
PRs linkerd#2418 and linkerd#2419 add per-route and per-backend request timeouts
configured by the `OutboundPolicies` API to the `MatchedRoute` and
`MatchedBackend` layers in the outbound `ClientPolicy` stack,
respectively. This means that — unlike in the `ServiceProfile` stack —
two separate request timeouts can be configured in `ClientPolicy`
stacks. However, because both the `MatchedRoute` and `MatchedBackend`
layers are in the HTTP logical stack, the errors emitted by both
timeouts will have a `LogicalError` as their most specific error
metadata, meaning that the log messages and `l5d-proxy-error` headers
recorded for these timeouts do not indicate whether the timeout that
failed the request was the route request timeout or the backend request
timeout.

In order to ensure this information is recorded and exposed to the user,
this branch adds two new error wrapper types, one of which enriches an
error with a `RouteRef`'s metadata, and one of which enriches an error
with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all
errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps
errors with `BackendRef` metadata. This way, when the route timeout
fails a request, the error will include the route metadata, while when
the backend request timeout fails a request, the error will include both
the route and backend metadata.

Adding these new error wrappers also has the additional side benefit of
adding this metadata to errors returned by filters, allowing users to
distinguish between errors emitted by a filter on a route rule and
errors emitted by a per-backend filter. Also, any other errors emitted
lower in the stack for requests that are handled by a client policy
stack will now also include this metadata, which seems generally useful.

Example errors, taken from a proxy unit test:

backend request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s
```
route request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s
```
…2431)

When the outbound proxy resolves an outbound policy from the policy
controller's `OutboundPolicies` API, the policy controller may return an
error with the `grpc-status` code `NotFound` in order to indicate that
the destination is not a ClusterIP service. When this occurs, the proxy
will fall back to either using a ServiceProfile, if the ServiceProfile
contains non-trivial configuration, or synthesizing a default client
policy from the ServiceProfile.

However, when the outbound proxy is configured to run in ingress mode,
the fallback behavior does not occur. Instead, the ingress mode proxy
treats any error returned by the policy controller's `OutboundPolicies`
API as fatal. This means that when an ingress controller performs its
own load-balancing and opens a connection to a pod IP directly, the
ingress mode proxy will fail any requests on that connection. This is a
bug, and is the cause of the issues described in linkerd/linkerd2#10908.

This branch fixes this by changing the ingress mode proxy to handle
`NotFound` errors returned by the policy controller. I've added similar
logic for synthesizing default policies from a discovered
ServiceProfile, or using the profile if it's non-trivial. Unfortunately,
we can't just reuse the existing `Outbound::resolver` method, as ingress
discovery may be performed for an original destination address *or* for
a DNS name, and it's necessary to construct fallback policies in either
case. Instead, I've added a new function with similar behavior that's
ingress-specific.

I've manually tested this change against the repro steps[^1] described
in linkerd/linkerd2#10908, and verified that the proxy 503s on 2.13.4,
and that it once again routes correctly after applying this change.

Fixes linkerd/linkerd2#10908.

[^1]: As described in the first comment, using Contour and podinfo.
The proxy currently emits very little useful version information.

This change updates the proxy to support new build-time environment
variables that are used to report version information:

* LINKERD2_PROXY_BUILD_TIME
* LINKERD2_PROXY_VENDOR
* LINKERD2_PROXY_VERSION

Additionally, several pre-existing Git-oriented metadata have been
removed, as they were generally redundant or uninformative. The Rustc
version has also been removed (since it has no real user-facing value
and can be easily determined by the version/tag).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants