Disabling NR APM causes trace concatenation in Datadog #692
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 18, 2024
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 18, 2024
Other notes:
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 18, 2024
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 18, 2024
Additional thoughts, questions, ideas:
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 25, 2024:
Testing this in stage and edge LMS. See edx/edx-arch-experiments#692
timmc-edx added a commit to edx/configuration that referenced this issue on Jun 25, 2024
Currently, we're investigating whether using a NR Free Tier account for edxapp is enough to get DD traces working. Another possibility is to get tracing (or APM) disabled everywhere in Edge, including the places where spans were found in the last day:
timmc-edx added a commit to edx/newrelic-python-agent that referenced this issue on Jun 26, 2024:
…tation) (#1) If the Django setting `EDX_NEWRELIC_NO_REPORT` is present and enabled, the agent will not talk to New Relic's servers and will instead use a set of previously captured responses from our sandbox account. Instrumentation (tracing, etc.) will still be in place, but the data will be discarded rather than being reported. See edx/edx-arch-experiments#692
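The switch described in that commit message can be sketched roughly as follows, assuming settings are exposed as attributes on a Django-style settings object. `report_mode` and `DiscardingReporter` are hypothetical names used only for illustration; this is not the fork's actual code, and it does not model the captured sandbox responses.

```python
from types import SimpleNamespace


def report_mode(settings) -> str:
    """Return "discard" when EDX_NEWRELIC_NO_REPORT is present and truthy, else "report".

    Hypothetical helper; the fork's real check may be structured differently.
    """
    if getattr(settings, "EDX_NEWRELIC_NO_REPORT", False):
        return "discard"
    return "report"


class DiscardingReporter:
    """Hypothetical sink used in "discard" mode.

    Instrumentation still builds payloads, but nothing is sent to
    New Relic; payloads are only counted, then dropped.
    """

    def __init__(self) -> None:
        self.discarded = 0

    def send(self, payload: dict) -> None:
        # Deliberately no network I/O: count the payload and drop it.
        self.discarded += 1


# Usage with a Django-like settings object.
settings = SimpleNamespace(EDX_NEWRELIC_NO_REPORT=True)
reporter = DiscardingReporter()
if report_mode(settings) == "discard":
    reporter.send({"transaction": "example"})
```

The point of the design is that tracing code paths stay exactly as they are; only the final "send" step changes, so Datadog sees the same instrumentation side effects it would with reporting enabled.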
[idea] We might want 3 modes for our hacked NR agent:
timmc-edx added a commit that referenced this issue on Jul 10, 2024
timmc-edx added a commit that referenced this issue on Jul 10, 2024:
See #692

Testing setup: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1173618788/Running+Datadog+in+devstack

And then in lms-shell:

```
make requirements
pip install ddtrace
pip install -e /edx/src/archexp/
./wrap-datadog.sh ./server.sh
```

Expect to see this log message: `Attached MissingSpanProccessor for Datadog diagnostics`
timmc-edx added a commit that referenced this issue on Jul 10, 2024:
See #692

Testing setup: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1173618788/Running+Datadog+in+devstack

And then in lms-shell:

```
make requirements
pip install ddtrace
pip install -e /edx/src/archexp/
./wrap-datadog.sh ./server.sh
```

Expect to see this log message: `Attached MissingSpanProccessor for Datadog diagnostics`

NOTE: This prints "Spans created = 0; spans finished = 0" in devstack when shut down with ctrl-c, but not when restarted due to autoreload (where it prints correct info). Something is initializing Django twice, and one span processor is getting span info while the other is printing at shutdown. There's more to debug here, but it seems stable enough to at least try deploying it.
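The "Spans created = N; spans finished = M" diagnostic can be pictured as a processor that counts span starts and finishes. This is a minimal standalone sketch, assuming a tracer that calls `on_span_start` and `on_span_finish` hooks with a span ID; `SpanCountingProcessor` is a hypothetical class, not the edx-arch-experiments `MissingSpanProcessor` or ddtrace's processor API. Because the counters live on the instance, two instances created by a double Django initialization would each see only their own spans, which is consistent with the 0/0 output described in the note.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger(__name__)


@dataclass
class SpanCountingProcessor:
    """Hypothetical diagnostic processor that counts span starts and finishes.

    A span that starts but never finishes shows up as a gap between the
    two counters (and as an entry left in ``open_span_ids``).
    """

    created: int = 0
    finished: int = 0
    open_span_ids: set = field(default_factory=set)

    def on_span_start(self, span_id: int) -> None:
        self.created += 1
        self.open_span_ids.add(span_id)

    def on_span_finish(self, span_id: int) -> None:
        self.finished += 1
        self.open_span_ids.discard(span_id)

    def shutdown(self) -> str:
        # Same shape as the "Spans created = ...; spans finished = ..." message.
        message = f"Spans created = {self.created}; spans finished = {self.finished}"
        log.info(message)
        return message


# One instance receives the span events...
active = SpanCountingProcessor()
active.on_span_start(1)
active.on_span_start(2)
active.on_span_finish(1)
print(active.shutdown())  # Spans created = 2; spans finished = 1

# ...while a second instance from a duplicate initialization sees none.
idle = SpanCountingProcessor()
print(idle.shutdown())  # Spans created = 0; spans finished = 0
```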
timmc-edx added a commit that referenced this issue on Jul 10, 2024
timmc-edx added a commit that referenced this issue on Jul 24, 2024:
Adds logging diagnostics for traces in Datadog. See #692
timmc-edx added a commit that referenced this issue on Jul 24, 2024
timmc-edx added a commit to edx/configuration that referenced this issue on Aug 16, 2024:
No longer needed for edx/edx-arch-experiments#692
timmc-edx added a commit to edx/configuration that referenced this issue on Aug 20, 2024
timmc-edx added a commit to openedx/edx-platform that referenced this issue on Sep 13, 2024:

- Convert `/heartbeat` view into a celery test
- Send Celery tasks to a broker, rather than running in-process
- Hardcode a broker URL
- Log all celery signals

See edx/edx-arch-experiments#692
timmc-edx added a commit to openedx/edx-platform that referenced this issue on Sep 13, 2024:

- Add `/celery_repro` URL to run a sample task
- Send Celery tasks to a broker, rather than running in-process
- Hardcode a broker URL
- Log all celery signals

See edx/edx-arch-experiments#692
Ultimately, this ticket is for disabling New Relic APM across edxapp. We ran into trace-related issues in Datadog (DD) when we first attempted to disable NR APM. We later caused the same issue in Edge simply by disabling NR tracing.
This bug has been observed in edxapp (LMS and CMS), enterprise-catalog, and registrar. It can be identified by searching for spans matching `operation_name:django.request -@_top_level:*`.

Acceptance criteria
Things to try
- `tracer.start_span` checks `child_of` for a finished span. Subject to Waffle flags, log the situation and/or set `child_of` to `None`.
- `ddtrace.config.django['distributed_tracing_enabled'] = False`
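The first idea can be sketched with minimal stand-in `Span` and `Tracer` classes rather than ddtrace's real ones. The wrapper `guarded_start_span` and its `detach_finished_parents` parameter (standing in for a Waffle flag) are hypothetical names. If the requested parent has already finished, the wrapper logs the situation and, when the flag is on, starts a new root span instead of attaching to the finished trace.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger(__name__)


@dataclass
class Span:
    """Stand-in span: only the fields this sketch needs."""
    name: str
    trace_id: int
    parent: Optional["Span"] = None
    finished: bool = False

    def finish(self) -> None:
        self.finished = True


class Tracer:
    """Stand-in tracer: a child inherits its parent's trace_id; a root gets a new one."""

    def __init__(self) -> None:
        self._next_trace_id = 1

    def start_span(self, name: str, child_of: Optional[Span] = None) -> Span:
        if child_of is not None:
            return Span(name, child_of.trace_id, parent=child_of)
        trace_id = self._next_trace_id
        self._next_trace_id += 1
        return Span(name, trace_id)


def guarded_start_span(tracer: Tracer, name: str, child_of: Optional[Span] = None,
                       detach_finished_parents: bool = False) -> Span:
    # A finished parent means this span would be concatenated onto a
    # trace that has already completed.
    if child_of is not None and child_of.finished:
        log.warning("Span %r requested finished parent %r (trace %d)",
                    name, child_of.name, child_of.trace_id)
        if detach_finished_parents:
            child_of = None  # start a fresh root span instead
    return tracer.start_span(name, child_of=child_of)


tracer = Tracer()
stale = tracer.start_span("celery.apply")  # illustrative parent name
stale.finish()
attached = guarded_start_span(tracer, "django.request", child_of=stale)
detached = guarded_start_span(tracer, "django.request", child_of=stale,
                              detach_finished_parents=True)
```

With the flag off, `attached` joins the stale trace (the concatenation symptom) but is logged. With the flag on, `detached` becomes its own root, which is the behavior a service entry span should have.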
Things we have already tried
These should be checked off once they have been either reverted or made permanent:
- `DD_TRACE_HEADER_TAGS` and `DD_DJANGO_INSTRUMENT_MIDDLEWARE` were added in configuration#41 (temp: Add temporary debugging headers for edxapp tracing)
- `operation_name:django.request` on All Spans since service entry spans were unreliable. Maybe we want to change that back, or maybe not.
- `EDXAPP_NEWRELIC_LICENSE_TEST_FREE` but we still need to remove it from AWS Secrets Manager: https://2u-internal.atlassian.net/servicedesk/customer/portal/15/DOS-4980
- `DD_TRACE_PROPAGATION_STYLE_EXTRACT=none` in configuration#55 (feat: Stop accepting trace headers in DD for edge/stage LMS)
- `datadog.diagnostics.log_root_span` and `datadog.diagnostics.detect_anomalous_trace`
- `DD_TRACE_CELERY_ENABLED=false`, because some of the request spans in anomalous traces have missing parent spans that were celery-related: configuration#62 (temp: Try disabling Datadog Celery instrumentation in LMS (stage, edge))
- `DATADOG_DIAGNOSTICS_CELERY_LOG_SIGNALS` (using edx-arch-experiments 4.3.0)

Details
When we disabled NR APM in edxapp on June 6, we observed two anomalies with traces:

- `service:edx-edxapp-lms env:prod` dropped precipitously by 2-3x.

However, we believe the actual traffic was unchanged. This is corroborated by the Django hit metrics remaining steady, as seen in the Service Catalog. We cannot find any relevant code or config changes that would have been deployed around that time.
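If traffic really was unchanged, the size of the drop gives a rough estimate of how many traced requests stopped being recorded as service entry spans. This assumes that before June 6 essentially every traced request produced one top-level `django.request` span. Let $H$ be the number of requests in a window, $T$ the number of top-level request spans, $k$ the observed drop factor, and $f$ the share of traced requests that are no longer top-level:

$$
T_{\text{before}} \approx H, \qquad
T_{\text{after}} = \frac{H}{k}, \qquad
f = \frac{H - T_{\text{after}}}{H} = 1 - \frac{1}{k}
$$

For $k$ between 2 and 3, $f$ falls between $1/2$ and $2/3$, so roughly half to two-thirds of traced requests would be attached to some other trace.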
Our current understanding is that the majority of Django web requests that are traced are not recorded as service entry spans, but are instead parented to a different trace. This causes several problems:
We can also reproduce this issue by setting "Tracing type: None" in the application settings in NR (usually set to Distributed Tracing).
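To look for the concatenation pattern in exported span data rather than in the Datadog UI, one option is to mirror the search offline and also group request spans by trace. This sketch assumes each span record is a dict with `trace_id`, `span_id`, `name`, `service`, and an optional `top_level` flag standing in for the `_top_level` tag. That record shape and the helper names are hypothetical, not Datadog's export format.

```python
from collections import Counter
from typing import Dict, List

REQUEST = "django.request"


def non_top_level_requests(spans: List[dict]) -> List[dict]:
    """Offline mirror of `operation_name:django.request -@_top_level:*`:
    request spans that lack a truthy top_level flag."""
    return [s for s in spans if s["name"] == REQUEST and not s.get("top_level")]


def concatenated_traces(spans: List[dict], service: str) -> Dict[int, int]:
    """Map trace_id -> number of request spans, for traces in `service`
    that contain more than one django.request span."""
    counts = Counter(
        s["trace_id"] for s in spans if s["name"] == REQUEST and s["service"] == service
    )
    return {trace_id: n for trace_id, n in counts.items() if n > 1}


# Hypothetical export: trace 1 has absorbed two extra request spans.
spans = [
    {"trace_id": 1, "span_id": 10, "name": REQUEST, "service": "edx-edxapp-lms", "top_level": True},
    {"trace_id": 1, "span_id": 11, "name": REQUEST, "service": "edx-edxapp-lms"},
    {"trace_id": 1, "span_id": 12, "name": REQUEST, "service": "edx-edxapp-lms"},
    {"trace_id": 2, "span_id": 20, "name": REQUEST, "service": "edx-edxapp-lms", "top_level": True},
]

print([s["span_id"] for s in non_top_level_requests(spans)])  # [11, 12]
print(concatenated_traces(spans, "edx-edxapp-lms"))  # {1: 3}
```

A healthy trace should hold exactly one `django.request` span per service, so any trace with a count above one is a candidate for the concatenation described above.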
Links