Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datadog exporter improvements #5609

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from
Open

Datadog exporter improvements #5609

wants to merge 13 commits into from

Conversation

BrynCooke
Copy link
Contributor

@BrynCooke BrynCooke commented Jul 4, 2024

This PR updates the Datadog exporter with a backport from the latest code.
The exporter code has been placed in the apollo-router/src/plugins/telemetry/tracing/datadog_exporter directory.

It also fixes:

  • span.kind not being propagated
  • Missing span stats. (Users may override configuration for these).

Span metrics are enabled for major built in spans by default.

telemetry:
  exporters:
    tracing:
      experimental_response_trace_id:
        enabled: true
        header_name: apollo-custom-trace-id
        format: datadog
      common:
        service_name: router
      datadog:
        enabled: true
        batch_processor:
          scheduled_delay: 100ms
        span_metrics: # New config option to enable or disable span metrics. This is a free format span to boolean map.
          supergraph: false

  instrumentation:
    spans:
      mode: spec_compliant
      supergraph:
        attributes:
          graphql.operation.name: true

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This comment has been minimized.

@router-perf
Copy link

router-perf bot commented Jul 4, 2024

CI performance tests

  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@BrynCooke BrynCooke requested review from bnjjj and garypen July 5, 2024 08:27
@BrynCooke BrynCooke marked this pull request as ready for review July 5, 2024 08:27
@BrynCooke BrynCooke requested review from a team as code owners July 5, 2024 08:27
@BrynCooke BrynCooke requested a review from a team as a code owner July 5, 2024 09:01
Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration comment: When we update otel to the latest library, what will be the impact on this configuration that we're adding. Will it continue to work unchanged or will it become redundant or ...?

### Datadog `span.kind` now populated ([PR #5609](https://github.com/apollographql/router/pull/5609))

Datadog traces use `span.kind` to differentiate between different types of spans.
This change ensures that the `span.kind` is correctly populated using the Open Telemetry span kind which has a 1-2-1 mapping to thouse set out in [dd-trace](https://github.com/DataDog/dd-trace-go/blob/main/ddtrace/ext/span_kind.go).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This change ensures that the `span.kind` is correctly populated using the Open Telemetry span kind which has a 1-2-1 mapping to thouse set out in [dd-trace](https://github.com/DataDog/dd-trace-go/blob/main/ddtrace/ext/span_kind.go).
This change ensures that the `span.kind` is correctly populated using the Open Telemetry span kind which has a 1-2-1 mapping to those set out in [dd-trace](https://github.com/DataDog/dd-trace-go/blob/main/ddtrace/ext/span_kind.go).

@@ -0,0 +1,32 @@
### Datadog span metrics now supported ([PR #5609](https://github.com/apollographql/router/pull/5609))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Datadog span metrics now supported ([PR #5609](https://github.com/apollographql/router/pull/5609))
### Datadog span metrics are now supported ([PR #5609](https://github.com/apollographql/router/pull/5609))

@@ -0,0 +1,32 @@
### Datadog span metrics now supported ([PR #5609](https://github.com/apollographql/router/pull/5609))

When using the APM view in Datadog, span metrics will be displayed for any span that was a top level span or has been explicitly has `_dd.measured` flag set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When using the APM view in Datadog, span metrics will be displayed for any span that was a top level span or has been explicitly has `_dd.measured` flag set.
When using the APM view in Datadog, span metrics will be displayed for any span that was a top level span or has the `_dd.measured` flag set.

@@ -18,8 +18,9 @@ pub(crate) const REQUEST_SPAN_NAME: &str = "request";
pub(crate) const QUERY_PLANNING_SPAN_NAME: &str = "query_planning";
pub(crate) const HTTP_REQUEST_SPAN_NAME: &str = "http_request";
pub(crate) const SUBGRAPH_REQUEST_SPAN_NAME: &str = "subgraph_request";
pub(crate) const QUERY_PARSING_SPAN_NAME: &str = "parse_query";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a "drive-by" fix. Should this be called out in the changelog as additional functionality or split into its own PR?

Comment on lines +96 to +97
/// Which spans will be eligable for span stats to be collected for viewing in the APM view.
/// Defaults to true for `router` and `supergraph`, `subgraph`, `subgraph_request`, `http_request` and `query_planning`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't accurate. It is missing at least execution. I would expect the defaults to be as listed in BUILT_IN_SPAN_NAMES, so probably better to just reference that.

@@ -170,13 +196,97 @@ impl TracingConfigurator for Config {
.expect("cargo version is set as a resource default;qed")
.to_string(),
)
// This prevents spurious errors in the log when talking to the agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another "drive-by", which probably merits some mention in the changelog.

@@ -166,7 +166,7 @@ The default resource mappings are:
| `subgraph_request` | `graphql.operation.name` |
| `http_request` | `http.route` |

You may override these mappings by specifying the `resource_mapping` configuration:
You may override these mappings by specifying `resource_mapping` configuration:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You may override these mappings by specifying `resource_mapping` configuration:
You may override these mappings by specifying the `resource_mapping` configuration:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants