Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for querying deadlocks in extended events #18781

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nenadnoveljic
Copy link
Contributor

@nenadnoveljic nenadnoveljic commented Oct 7, 2024

What does this PR do?

Implements the workaround for querying deadlocks in extended events (XE) views.

Motivation

SQL Server limitation: Querying the extended events ring buffer may not show the latest events.

Additional Notes

The ring buffer in sys.dm_xe_session_targets.target_data is externalized as one large XML, limited to 4MB, even though it's NVARCHAR(MAX). If the XML exceeds 4MB, the latest events are not shown.

To avoid this, we're creating a dedicated datadog XE session with a 1MB buffer, which prevents XML overflow. Since the XML typically uses half the allocated buffer space, a 1MB buffer gives us an 8x safety margin.

select t.name,  t.xml_size / r.value('@memoryUsed', 'float') xml_size_to_buffer_size_ratio from 
(SELECT  
  s.name, len(t.target_data) xml_size, cast(t.target_data as xml) as target_data_xml
FROM    sys.dm_xe_sessions s
JOIN 
    sys.dm_xe_session_targets t ON s.address = t.event_session_address
WHERE t.target_name = 'ring_buffer' and name in ('system_health', 'datadog')) as t
    CROSS APPLY target_data_xml.nodes('RingBufferTarget') AS x(r)

name	xml_size_to_buffer_size_ratio
system_health	1,081729263927248
datadog	0,5115150146759991

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Changelog entries must be created for modifications to shipped code
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Copy link

codecov bot commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 14 lines in your changes missing coverage. Please review.

Project coverage is 88.27%. Comparing base (50589cb) to head (cefeb94).
Report is 1 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
solr ?
sqlserver 89.23% <66.66%> (+9.48%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link

github-actions bot commented Oct 7, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

1 similar comment
Copy link

github-actions bot commented Oct 7, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

@nenadnoveljic nenadnoveljic changed the title Nenadnoveljic/deadlocks xe Workaround for querying deadlocks in extended events Oct 7, 2024
@nenadnoveljic nenadnoveljic marked this pull request as ready for review October 7, 2024 12:55
@nenadnoveljic nenadnoveljic requested review from a team as code owners October 7, 2024 12:55
Copy link

github-actions bot commented Oct 7, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

cursor.execute(XE_SESSIONS_QUERY)
rows = cursor.fetchall()
if not rows:
raise Exception(NO_XE_SESSION_ERROR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to create a custom exception class NoXeSessionException(Exception) so that we can catch the specific exception when needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw i also don't see error handling of in _set_xe_session_name and _query_deadlocks. if an exception is raised here, it will fail the thread. should we catch the exception and log an error message?

Copy link
Contributor Author

@nenadnoveljic nenadnoveljic Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The custom exception NoXeSessionException is now raised in _set_xe_session_name and caught in _query_deadlocks.

Copy link

github-actions bot commented Oct 8, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

1 similar comment
Copy link

github-actions bot commented Oct 8, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

Copy link

github-actions bot commented Oct 8, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

Copy link

github-actions bot commented Oct 8, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants