Skip to content

Commit

Permalink
feat!: Detect anomalous Datadog traces; upgrade to Python 3.11 (#757)
Browse files Browse the repository at this point in the history
- Detect and log anomalous traces in DatadogDiagnosticMiddleware.
- Drop Python 3.8 support and change to Python 3.11 minimum due to testing needs.
- Add tests for existing middleware code

Manual testing:

1. Install in devstack, add middleware to lms/envs/devstack.py, and
   enable Waffle flag.
2. Make some requests and observe lack of new logging.
3. Add `self.dd_tracer.current_root_span().finish()` to start of
   `process_view` (faking an anomalous trace) and reinstall in devstack
4. Make some more requests and observe log output like the following:
   `Anomalous Datadog local root span (duration already set): id = 66abf66b000000001168265a391d04a3; duration = 0.108 sec; worker age = 1.560 sec`

Unit test improvements:

- Install ddtrace at the test.in level
- Add tests for new and existing middleware functionality
- Test with Python 3.11, not 3.8. This isn't just because we're upgrading
  to 3.11 across our repos, but also because I ran into a really weird bug
  with ddtrace and/or jsonschema that only occurs with Python 3.8:
  DataDog/dd-trace-py#9950
  • Loading branch information
timmc-edx committed Aug 5, 2024
1 parent c7ca515 commit 36bc403
Show file tree
Hide file tree
Showing 22 changed files with 674 additions and 315 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.8', '3.12']
python-version: ['3.11', '3.12']
tox-env: [quality, django42]
steps:
- uses: actions/checkout@v4
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
- name: setup python
uses: actions/setup-python@v4
with:
python-version: 3.8
python-version: 3.11

- name: Install pip
run: pip install -r requirements/pip.txt
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/test_publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.8'
python-version: '3.11'
architecture: x64
- name: Install pip
run: pip install -r requirements/pip.txt
Expand Down
10 changes: 10 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,16 @@ Change Log
Unreleased
~~~~~~~~~~

[4.0.0] - 2024-08-05
~~~~~~~~~~~~~~~~~~~~
Changed
-------
* Dropped support for Python 3.8; only testing with 3.11 and above now.

Added
-----
* ``DatadogDiagnosticMiddleware`` can now detect and log anomalous traces, enabled by Waffle flag ``datadog.diagnostics.detect_anomalous_trace``

[3.6.0] - 2024-07-24
~~~~~~~~~~~~~~~~~~~~
Added
Expand Down
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ One Time Setup
cd edx-arch-experiments
# Set up a virtualenv using virtualenvwrapper with the same name as the repo and activate it
mkvirtualenv -p python3.8 edx-arch-experiments
mkvirtualenv -p python3.11 edx-arch-experiments
Local testing
~~~~~~~~~~~~~
Expand Down
2 changes: 1 addition & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -469,7 +469,7 @@ def get_version(*file_paths):

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
'python': ('https://docs.python.org/3.8', None),
'python': ('https://docs.python.org/3.11', None),
'django': ('https://docs.djangoproject.com/en/2.2/', 'https://docs.djangoproject.com/en/2.2/_objects/'),
'model_utils': ('https://django-model-utils.readthedocs.io/en/latest/', None),
}
Expand Down
2 changes: 1 addition & 1 deletion edx_arch_experiments/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
A plugin to include applications under development by the architecture team at 2U.
"""

__version__ = '3.6.0'
__version__ = '4.0.0'
7 changes: 0 additions & 7 deletions edx_arch_experiments/datadog_diagnostics/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,3 @@ Datadog Diagnostics
When installed in the LMS as a plugin app, the ``datadog_diagnostics`` app adds additional logging for debugging our Datadog integration.

This is intended as a temporary situation while we debug the `trace concatenation issue <https://github.com/edx/edx-arch-experiments/issues/692>`_.

Usage
*****

In LMS:

- Install ``edx-arch-experiments`` as a dependency
33 changes: 32 additions & 1 deletion edx_arch_experiments/datadog_diagnostics/middleware.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,31 @@
"""
Diagnostic middleware for Datadog.
To use, install edx-arch-experiments and add
``edx_arch_experiments.datadog_diagnostics.middleware.DatadogDiagnosticMiddleware``
to ``MIDDLEWARE``, then set the below settings as needed.
"""

import logging
import time

from django.core.exceptions import MiddlewareNotUsed
from edx_toggles.toggles import WaffleFlag

log = logging.getLogger(__name__)

# .. toggle_name: datadog.diagnostics.detect_anomalous_trace
# .. toggle_implementation: WaffleFlag
# .. toggle_default: False
# .. toggle_description: Enables logging of anomalous Datadog traces for web requests.
# .. toggle_warning: This is a noisy feature and so it should only be enabled
# for a percentage of requests.
# .. toggle_use_cases: temporary
# .. toggle_creation_date: 2024-08-01
# .. toggle_target_removal_date: 2024-11-01
# .. toggle_tickets: https://github.com/edx/edx-arch-experiments/issues/692
DETECT_ANOMALOUS_TRACE = WaffleFlag('datadog.diagnostics.detect_anomalous_trace', module_name=__name__)

# .. toggle_name: datadog.diagnostics.log_root_span
# .. toggle_implementation: WaffleFlag
# .. toggle_default: False
Expand Down Expand Up @@ -43,6 +60,8 @@ def __init__(self, get_response):
# If import fails, don't even load this middleware.
raise MiddlewareNotUsed # pylint: disable=raise-missing-from

self.worker_start_epoch = time.time()

def __call__(self, request):
return self.get_response(request)

Expand All @@ -63,9 +82,21 @@ def log_diagnostics(self, request):
"""
Contains all the actual logging logic.
"""
local_root_span = self.dd_tracer.current_root_span()

if DETECT_ANOMALOUS_TRACE.is_enabled():
root_duration_s = local_root_span.duration
if root_duration_s is not None:
worker_run_time_s = time.time() - self.worker_start_epoch
log.warning(
f"Anomalous Datadog local root span (duration already set): "
f"id = {local_root_span.trace_id:x}; "
f"duration = {root_duration_s:0.3f} sec; "
f"worker age = {worker_run_time_s:0.3f} sec"
)

if LOG_ROOT_SPAN.is_enabled():
route_pattern = getattr(request.resolver_match, 'route', None)
local_root_span = self.dd_tracer.current_root_span()
current_span = self.dd_tracer.current_span()
# pylint: disable=protected-access
log.info(
Expand Down
110 changes: 110 additions & 0 deletions edx_arch_experiments/datadog_diagnostics/tests/test_middleware.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
"""
Tests for diagnostic middleware.
"""

import re
from unittest.mock import Mock, patch

import ddt
import ddtrace
from django.test import TestCase

from ..middleware import DETECT_ANOMALOUS_TRACE, LOG_ROOT_SPAN, DatadogDiagnosticMiddleware


def fake_view(_request):
"""Fake get_response for middleware."""


@ddt.ddt
class TestDatadogDiagnosticMiddleware(TestCase):
"""Tests for DatadogDiagnosticMiddleware."""

def setUp(self):
self.middleware = DatadogDiagnosticMiddleware(fake_view)

def run_middleware(self):
"""Run the middleware using a fake request."""
resolver = Mock()
resolver.route = "/some/path"
request = Mock()
request.resolver_match = resolver
self.middleware.process_view(request, None, None, None)

@patch('edx_arch_experiments.datadog_diagnostics.middleware.log.error')
def test_log_diagnostics_error_only_once(self, mock_log_error):
"""
If the log_diagnostics function is broken, only log the error once.
The method should still be called every time in case it is still doing
useful work before the error, though.
"""

bad_method = Mock(side_effect=lambda request: 1/0)
self.middleware.log_diagnostics = bad_method

self.run_middleware()
self.run_middleware()

# Called twice
assert len(bad_method.call_args_list) == 2

# But only log once
mock_log_error.assert_called_once_with(
"Encountered error in DatadogDiagnosticMiddleware (suppressing further errors): "
"ZeroDivisionError('division by zero')"
)

@ddt.data(
# Feature disabled
(False, False),
(False, True),
# Enabled, but nothing anomalous
(True, False),
# Enabled and anomaly detected
(True, True),
)
@ddt.unpack
@patch('edx_arch_experiments.datadog_diagnostics.middleware.log.warning')
def test_anomalous_trace(self, enabled, cause_anomaly, mock_log_warning):
with (
patch.object(DETECT_ANOMALOUS_TRACE, 'is_enabled', return_value=enabled),
patch.object(LOG_ROOT_SPAN, 'is_enabled', return_value=False),
# Need at least two levels of spans in order to fake
# an anomaly. (Otherwise current_root_span returns None.)
ddtrace.tracer.trace("test_local_root"),
ddtrace.tracer.trace("inner_span"),
):
if cause_anomaly:
ddtrace.tracer.current_root_span().finish()
self.run_middleware()

if enabled and cause_anomaly:
mock_log_warning.assert_called_once()
log_msg = mock_log_warning.call_args_list[0][0][0] # first arg of first call
assert re.fullmatch(
r"Anomalous Datadog local root span \(duration already set\): "
r"id = [0-9A-Fa-f]+; duration = [0-9]\.[0-9]{3} sec; worker age = [0-9]\.[0-9]{3} sec",
log_msg
)
else:
mock_log_warning.assert_not_called()

@patch('edx_arch_experiments.datadog_diagnostics.middleware.log.info')
def test_log_root_span(self, mock_log_info):
with (
patch.object(DETECT_ANOMALOUS_TRACE, 'is_enabled', return_value=False),
patch.object(LOG_ROOT_SPAN, 'is_enabled', return_value=True),
# Need at least two levels of spans for interesting logging
ddtrace.tracer.trace("test_local_root"),
ddtrace.tracer.trace("inner_span"),
):
self.run_middleware()

mock_log_info.assert_called_once()
log_msg = mock_log_info.call_args_list[0][0][0] # first arg of first call
assert re.fullmatch(
r"Datadog span diagnostics: Route = /some/path; "
r"local root span = name='test_local_root' .*; "
r"current span = name='inner_span' .*",
log_msg
)
60 changes: 29 additions & 31 deletions requirements/base.txt
Original file line number Diff line number Diff line change
@@ -1,16 +1,16 @@
#
# This file is autogenerated by pip-compile with Python 3.12
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# make upgrade
#
asgiref==3.7.2
asgiref==3.8.1
# via django
attrs==23.2.0
# via
# jsonschema
# referencing
certifi==2024.2.2
certifi==2024.7.4
# via requests
cffi==1.16.0
# via
Expand All @@ -22,11 +22,11 @@ click==8.1.7
# via
# code-annotations
# edx-django-utils
code-annotations==1.6.0
code-annotations==1.8.0
# via edx-toggles
cryptography==42.0.5
cryptography==43.0.0
# via pyjwt
django==3.2.24
django==4.2.14
# via
# -c https://raw.githubusercontent.com/edx/edx-lint/master/edx_lint/files/common_constraints.txt
# -r requirements/base.in
Expand All @@ -47,75 +47,73 @@ django-waffle==4.1.0
# edx-django-utils
# edx-drf-extensions
# edx-toggles
djangorestframework==3.14.0
djangorestframework==3.15.2
# via
# -r requirements/base.in
# drf-jwt
# edx-drf-extensions
dnspython==2.6.1
# via pymongo
drf-jwt==1.19.2
# via edx-drf-extensions
edx-codejail==3.3.3
edx-codejail==3.4.1
# via -r requirements/base.in
edx-django-utils==5.10.1
edx-django-utils==5.15.0
# via
# -r requirements/base.in
# edx-drf-extensions
# edx-toggles
edx-drf-extensions==10.2.0
edx-drf-extensions==10.3.0
# via -r requirements/base.in
edx-opaque-keys==2.5.1
edx-opaque-keys==2.10.0
# via edx-drf-extensions
edx-toggles==5.1.1
edx-toggles==5.2.0
# via -r requirements/base.in
idna==3.6
idna==3.7
# via requests
jinja2==3.1.3
jinja2==3.1.4
# via code-annotations
jsonschema==4.21.1
jsonschema==4.23.0
# via -r requirements/base.in
jsonschema-specifications==2023.12.1
# via jsonschema
markupsafe==2.1.5
# via jinja2
newrelic==9.7.0
newrelic==9.12.0
# via edx-django-utils
pbr==6.0.0
# via stevedore
psutil==5.9.8
psutil==6.0.0
# via edx-django-utils
pycparser==2.21
pycparser==2.22
# via cffi
pyjwt[crypto]==2.8.0
pyjwt[crypto]==2.9.0
# via
# drf-jwt
# edx-drf-extensions
pymongo==3.13.0
pymongo==4.8.0
# via edx-opaque-keys
pynacl==1.5.0
# via edx-django-utils
python-slugify==8.0.4
# via code-annotations
pytz==2024.1
# via
# django
# djangorestframework
pyyaml==6.0.1
# via code-annotations
referencing==0.33.0
referencing==0.35.1
# via
# jsonschema
# jsonschema-specifications
requests==2.31.0
requests==2.32.3
# via edx-drf-extensions
rpds-py==0.18.0
rpds-py==0.19.1
# via
# jsonschema
# referencing
semantic-version==2.10.0
# via edx-drf-extensions
six==1.16.0
# via edx-codejail
sqlparse==0.4.4
sqlparse==0.5.1
# via django
stevedore==5.2.0
# via
Expand All @@ -124,11 +122,11 @@ stevedore==5.2.0
# edx-opaque-keys
text-unidecode==1.3
# via python-slugify
typing-extensions==4.10.0
typing-extensions==4.12.2
# via edx-opaque-keys
urllib3==2.2.1
urllib3==2.2.2
# via requests

# The following packages are considered to be unsafe in a requirements file:
setuptools==69.1.1
setuptools==72.1.0
# via -r requirements/base.in
Loading

0 comments on commit 36bc403

Please sign in to comment.