Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add middleware DatadogDiagnosticMiddleware #735

Merged
merged 4 commits into from
Jul 24, 2024
Merged

Conversation

timmc-edx
Copy link
Member

Adds logging diagnostics for traces in Datadog.

See #692

Merge checklist:
Check off if complete or not applicable:

  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Fixup commits are squashed away
  • Unit tests added/updated
  • Manual testing instructions provided
  • Noted any: Concerns, dependencies, migration issues, deadlines, tickets

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering when and where it might help to have the most basic of unit tests? I know you probably did local testing, but sometimes a really simple unit test can just keep things happy when there are seemingly small changes made.

Otherwise, looks good.

Comment on lines +16 to +17
# .. toggle_warning: This is a noisy feature and so it should only be enabled
# for a percentage of requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we are dealing with both Splunk and Datadog, so this makes sense. Also, our Log Ingest was high (over-budget), but SRE may have taken care of that for us? Otherwise, in the future, we could instead control this via Log Index rules that could be tuned based on noisiness. Although the log messages might still affect disk usage, if it really is high.

@timmc-edx
Copy link
Member Author

timmc-edx commented Jul 24, 2024

Hmm... it would require a significant amount of test harness to test the span and resolver logic, and there aren't easy ways to test both the datadog-installed and datadog-missing branches. Honestly, I'm pretty happy to fall back on the try/except here to protect against exceptions and just rely on manual testing for the rest.

In case you're curious about log size and what information is in the spans, here's a sample log line from devstack:

2024-07-24 19:14:21,990 INFO 159041 [edx_arch_experiments.datadog_diagnostics.middleware] [user 3] [ip 172.25.0.1] middleware.py:63 - Datadog span diagnostics: Route = api/user/v1/preferences/(?P<username>[\w .@_+-]+)$; local root span = name='django.request' id=16693121629397193611 trace_id=136418889693043940232725395409779696451 parent_id=None service='django' resource='__django_request' type='web' start=1721848461.88791 end=None duration=None error=0 tags={'DEFAULT_HASHING_ALGORITHM': 'sha256', 'component': 'django', 'django.request.class': 'django.core.handlers.wsgi.WSGIRequest', 'django_version': '4.2.14', 'http.method': 'GET', 'ip_chain.external.types': 'priv', 'ip_chain.raw': '172.25.0.1', 'ip_chain.safest_client_ip': '172.25.0.1', 'ip_chain.types': 'priv', 'is_enforce_session_email_match_enabled': 'False', 'python_version': '3.11.9', 'runtime-id': '64041d8c5a144b4bb2e4317837def5cf', 'session_email_mismatch': 'False', 'span.kind': 'server'} metrics={'_dd.measured': 1, 'cookies.header.size': 933, 'ip_chain.count': 1, 'ip_chain.external.count': 1, 'process_id': 159041, 'safe_sessions.session_cookie_count': 1}; current span = name='django.middleware' id=5174195364098946478 trace_id=136418889693043940232725395409779696451 parent_id=17630798428218980004 service='django' resource='edx_arch_experiments.datadog_diagnostics.middleware.DatadogDiagnosticMiddleware.process_view' type=None start=1721848461.9880073 end=None duration=None error=0 tags={'component': 'django'} metrics={}

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mocks are usually pretty simple to set up, although I know they are imperfect.

That said, it's not a blocker and at least I brought it up and had you respond. I agree that the exception handling is nice, but a test could ensure that your exception handler works as expected and doesn't have its own bug. ;)

@timmc-edx
Copy link
Member Author

Yeah, I could add ddtrace to the test requirements and set up a bunch of mocks, but I'm feeling lucky. I think if we go to add additional functionality (or if this fails when deployed) that's when I'd like to invest in tests.

@timmc-edx timmc-edx merged commit c7ca515 into main Jul 24, 2024
7 checks passed
@timmc-edx timmc-edx deleted the timmc/dd-root-span branch July 24, 2024 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants