fix(datadog_agent source): bugs in internal component metric reporting #20044

Merged

neuronull
Contributor

@neuronull neuronull commented Mar 8, 2024

child of: #20043

This change partially implements the ValidatableComponent trait for the datadog_agent source. It only covers the logs endpoint.

While implementing this, two bugs were found; both are fixed in this PR:

  1. The component_received_event_bytes_total metric was computed after events had been enriched with Vector metadata, so it counted bytes that were never received.
  2. component_errors_total was not emitted when the request body was successfully decompressed but then failed JSON parsing.
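
The two fixes can be illustrated with a small std-only model. This is a sketch, not the code changed in this PR: `Event`, `Metrics`, `estimated_size`, `decode`, and `handle_request` are hypothetical stand-ins, and the `key=value` body format merely takes the place of JSON parsing.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; none of these names are Vector's real API.
type Event = BTreeMap<String, String>;

#[derive(Debug, Default)]
struct Metrics {
    received_event_bytes_total: usize,
    errors_total: u64,
}

/// Stand-in size estimate: sum of key and value lengths.
fn estimated_size(event: &Event) -> usize {
    event.iter().map(|(k, v)| k.len() + v.len()).sum()
}

/// Stand-in for JSON parsing: accepts only bodies of the form "key=value".
fn decode(body: &str) -> Result<Event, String> {
    let (key, value) = body
        .split_once('=')
        .ok_or_else(|| format!("malformed body: {}", body))?;
    let mut event = Event::new();
    event.insert(key.to_string(), value.to_string());
    Ok(event)
}

fn handle_request(body: &str, metrics: &mut Metrics) -> Option<Event> {
    let mut event = match decode(body) {
        Ok(event) => event,
        Err(_) => {
            // Fix 2: count the error when parsing fails, even if earlier
            // stages (such as decompression) succeeded.
            metrics.errors_total += 1;
            return None;
        }
    };

    // Fix 1: measure the event size before enrichment adds metadata.
    metrics.received_event_bytes_total += estimated_size(&event);

    // Enrichment happens only after the size has been recorded.
    event.insert("source_type".to_string(), "datadog_agent".to_string());
    Some(event)
}

fn main() {
    let mut metrics = Metrics::default();
    let accepted = handle_request("message=hello", &mut metrics);
    let rejected = handle_request("not-a-pair", &mut metrics);
    println!("{:?} {:?} {:?}", accepted, rejected, metrics);
}
```

In this model the received-bytes counter reflects only what arrived on the wire, and a malformed body increments the error counter without touching the byte counter.
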
make test-component-validation
cargo nextest run --no-fail-fast --no-default-features --features component-validation-tests --status-level pass --test-threads 4 components::validation::tests
    Finished test [unoptimized + debuginfo] target(s) in 22.22s
    Starting 7 tests across 3 binaries (461 skipped)
        PASS [   6.060s] vector components::validation::tests::validate_component_tests_validation_components_sources_datadog_agent_yaml
        PASS [   9.931s] vector components::validation::tests::validate_component_tests_validation_components_sources_http_client_yaml
        PASS [  21.398s] vector components::validation::tests::validate_component_tests_validation_components_sinks_http_yaml
        PASS [   5.469s] vector components::validation::tests::validate_component_tests_validation_components_sources_http_server_yaml
        PASS [  21.652s] vector components::validation::tests::validate_component_tests_validation_components_sinks_splunk_hec_logs_yaml
        PASS [  26.212s] vector components::validation::tests::validate_component_tests_validation_components_sinks_datadog_logs_yaml
        PASS [   5.293s] vector components::validation::tests::validate_component_tests_validation_components_sources_splunk_hec_yaml
------------
     Summary [  26.692s] 7 tests run: 7 passed, 461 skipped
2024-03-08T22:34:29.258616Z  INFO vector::components::validation: Successfully validated component 'datadog_agent':
  test case 'happy path': passed
    - sent 2 inputs and received 2 outputs
    - received 1429 telemetry events
    - component_received_events_total: 2
    - component_received_event_bytes_total: 419
    - component_received_bytes_total: 397
    - component_sent_events_total: 2
    - component_sent_event_bytes_total: 483
    - component_sent_bytes_total: 0
    - component_errors_total: 0
    - component_discarded_events_total: 0
  test case 'sad path': passed
    - sent 3 inputs and received 2 outputs
    - received 1376 telemetry events
    - component_received_events_total: 2
    - component_received_event_bytes_total: 419
    - component_received_bytes_total: 568
    - component_sent_events_total: 2
    - component_sent_event_bytes_total: 483
    - component_sent_bytes_total: 0
    - component_errors_total: 1
    - component_discarded_events_total: 0

@neuronull neuronull requested a review from bruceg March 8, 2024 23:01
@neuronull neuronull self-assigned this Mar 8, 2024
@github-actions github-actions bot added the domain: sources label Mar 8, 2024
@neuronull neuronull added the source: datadog_agent, domain: observability, domain: tests, and type: bug labels Mar 8, 2024
@neuronull neuronull changed the title from "Neuronull/opa 1186 component validation datadog agent source" to "fix(datadog_agent source): bugs in internal component metric reporting" Mar 8, 2024
@neuronull neuronull marked this pull request as ready for review March 8, 2024 23:15
@neuronull neuronull requested a review from a team March 8, 2024 23:15

datadog-vectordotdev bot commented Mar 8, 2024

Datadog Report

Branch report: neuronull/OPA-1186_component_validation_datadog_agent_source
Commit report: 2980f45
Test service: vector

✅ 0 Failed, 44 Passed, 0 Skipped, 2m 43.63s Wall Time

Base automatically changed from neuronull/OPA-1186_component_validation_extensions to master March 18, 2024 22:20
@github-actions github-actions bot requested a review from a team as a code owner March 18, 2024 22:20
@neuronull neuronull added this pull request to the merge queue Mar 18, 2024

Regression Detector Results

Run ID: 15cf8ada-654e-450f-858d-9c99578e895c
Baseline: 80f63bb
Comparison: ad6a48e
Total CPUs: 7

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|------|------------|------|----------|-------------|
| ➖ | syslog_humio_logs | ingress throughput | +0.91 | [+0.81, +1.02] |
| ➖ | fluent_elasticsearch | ingress throughput | +0.83 | [+0.36, +1.30] |
| ➖ | file_to_blackhole | egress throughput | +0.67 | [-1.74, +3.08] |
| ➖ | http_to_s3 | ingress throughput | +0.63 | [+0.35, +0.91] |
| ➖ | socket_to_socket_blackhole | ingress throughput | +0.49 | [+0.43, +0.55] |
| ➖ | splunk_hec_route_s3 | ingress throughput | +0.38 | [-0.10, +0.86] |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | +0.10 | [+0.02, +0.19] |
| ➖ | http_to_http_noack | ingress throughput | +0.07 | [-0.03, +0.17] |
| ➖ | http_to_http_json | ingress throughput | +0.06 | [-0.03, +0.14] |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.15, +0.15] |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.01 | [-0.14, +0.13] |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.05 | [-0.16, +0.07] |
| ➖ | enterprise_http_to_http | ingress throughput | -0.11 | [-0.19, -0.03] |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -0.31 | [-0.40, -0.22] |
| ➖ | http_to_http_acks | ingress throughput | -0.32 | [-1.63, +0.99] |
| ➖ | http_elasticsearch | ingress throughput | -0.65 | [-0.72, -0.57] |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.73 | [-0.88, -0.59] |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | -0.75 | [-0.87, -0.64] |
| ➖ | syslog_loki | ingress throughput | -0.86 | [-0.95, -0.77] |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -1.08 | [-1.19, -0.97] |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -1.09 | [-1.18, -0.99] |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -1.42 | [-1.61, -1.23] |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -1.63 | [-1.78, -1.48] |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | -2.41 | [-2.54, -2.28] |
| ➖ | otlp_http_to_blackhole | ingress throughput | -2.55 | [-2.70, -2.40] |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -2.56 | [-2.70, -2.43] |
| ➖ | http_text_to_http_json | ingress throughput | -2.59 | [-2.72, -2.46] |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
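
The three criteria above amount to a simple predicate. The following is a minimal sketch under stated assumptions, not the regression detector's actual code: the `Experiment` struct, its field names, and `is_regression` are hypothetical, and the 5.00% tolerance is taken from the report.

```rust
/// One row of the report. Field names are illustrative assumptions.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance stated in the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only if all three criteria hold.
fn is_regression(e: &Experiment) -> bool {
    // 1. The estimated change is large enough to merit a closer look.
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // 2. The confidence interval lies entirely on one side of zero.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    // 3. The experiment is not marked "erratic".
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    // http_text_to_http_json from the report: -2.59 [-2.72, -2.46]
    let row = Experiment {
        delta_mean_pct: -2.59,
        ci_low: -2.72,
        ci_high: -2.46,
        erratic: false,
    };
    println!("regression: {}", is_regression(&row));
}
```

Applied to the largest change in the report (http_text_to_http_json at -2.59%), the interval excludes zero but the effect size is below the 5.00% tolerance, so the row is not flagged.
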

Merged via the queue into master with commit ad6a48e Mar 19, 2024
50 checks passed
@neuronull neuronull deleted the neuronull/OPA-1186_component_validation_datadog_agent_source branch March 19, 2024 00:19
AndrooTheChen pushed a commit to discord/vector that referenced this pull request Sep 23, 2024
vectordotdev#20044)

* chore(testing): further adjustments to component validation framework

* fix(datadog_agent source): bugs in internal component metric reporting

* TODO for other endpoints

* changelog

* spell checker

* feedback bruce- re-use detect json
Labels
domain: observability, domain: sources, domain: tests, source: datadog_agent, type: bug
2 participants