
Include stack trace on deadlock detection exception #595

Closed

Conversation

twin-drill (Contributor)

Use the workflow tid to capture the workflow thread's frame at the time of deadlock.

Added current_thread_id: Optional[int] and get_worker_ident(self) -> Optional[int] to interfaces/implementations.

Currently I only set the tid in _run_once for the standard implementation; if it needs to be set elsewhere, please let me know. (This is the main reason the PR is marked as a draft.)

Also, the readability of the code in _workflow.py is (imo) suboptimal, so I'd appreciate feedback or ideas on that.
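The core idea described above — using the workflow thread's tid to snapshot that thread's stack at the moment a deadlock is detected — can be sketched roughly as follows. This is an illustrative standalone example, not the SDK's actual code; names like capture_thread_stack and busy_worker are made up for the sketch.

```python
# Sketch: capture another thread's stack by its thread ident, the way a
# deadlock detector could snapshot a stuck workflow thread. Illustrative only.
import sys
import threading
import time
import traceback


def capture_thread_stack(tid):
    """Return a formatted stack for the thread with the given ident, or None."""
    # sys._current_frames() maps thread idents to their current top frames.
    frame = sys._current_frames().get(tid)
    if frame is None:
        return None
    return "".join(traceback.format_stack(frame))


def busy_worker(stop):
    # Stand-in for a workflow thread that is stuck in a loop.
    while not stop.is_set():
        time.sleep(0.01)


stop = threading.Event()
t = threading.Thread(target=busy_worker, args=(stop,))
t.start()
time.sleep(0.05)  # let the worker get going
stack = capture_thread_stack(t.ident)
stop.set()
t.join()
```

After this runs, `stack` contains the worker thread's formatted frames (including `busy_worker`), which is the kind of text a deadlock exception could carry.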

@twin-drill twin-drill marked this pull request as ready for review July 29, 2024 18:43
@twin-drill twin-drill requested a review from a team as a code owner July 29, 2024 18:43
twin-drill (Contributor, Author)

This needs at least a cleanup before we merge, but I want to make sure the tid is set everywhere it needs to be set, and, secondarily, to find out whether there are ways to make building the traceback easier and more readable.

Sushisource (Member) left a comment:


This makes sense to me. We should change the tests to make additional assertions about the newly upgraded stacks though 🙂

The comment below is on this hunk:

    tb_frames.pop()

    tb = top_tb
    for f in tb_frames:
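For context, a synthetic traceback chain like the one this hunk builds can be assembled from raw frames with types.TracebackType (constructible since Python 3.7). The sketch below is standalone and not the PR's exact code; gather_frames, build_traceback, and inner are illustrative names.

```python
# Sketch: stitch a list of frames into a synthetic traceback object so an
# exception can carry a stack captured from another thread. Illustrative only.
import sys
import traceback
import types


def gather_frames(frame):
    """Walk f_back and return frames innermost-first."""
    frames = []
    while frame is not None:
        frames.append(frame)
        frame = frame.f_back
    return frames


def build_traceback(frames):
    """Build a traceback chain; input is innermost-first, output starts at the
    outermost frame, matching normal 'most recent call last' order."""
    tb = None
    for frame in frames:
        tb = types.TracebackType(tb, frame, frame.f_lasti, frame.f_lineno)
    return tb


def inner():
    # sys._getframe() with no argument returns the caller's current frame.
    return gather_frames(sys._getframe())


tb = build_traceback(inner())
text = "".join(traceback.format_tb(tb))
```

`text` now formats like an ordinary traceback, with `inner` as the most recent call; attaching `tb` to an exception via `exc.with_traceback(tb)` would make it render in failure output.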
A Contributor commented:

Suppose the workflow is timing out because two coroutines are playing some sort of infinite ping-pong (e.g. the workflow in #527): will the stack trace be huge, so that we need to limit the number of frames? Or does asyncio's coroutine switching not contribute any stack frames?

A Member replied:

We probably should set a size limit on this traceback. There is server-side truncation for failure stack traces that are too large, but it's technically possible we could exceed the 4 MB limit even to send the message. I'd truncate at something extreme, like 200,000 bytes (i.e. ~200 KB), maybe?
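The truncation suggested above could look something like this sketch. The byte budget and the helper name are illustrative, not an SDK API; it keeps the tail of the text because the innermost frames are usually the most useful ones.

```python
# Sketch: cap a formatted stack trace at a byte budget before attaching it to
# a failure payload. Illustrative only; the limit mirrors the review suggestion.
MAX_TRACEBACK_BYTES = 200_000


def truncate_traceback(text: str, limit: int = MAX_TRACEBACK_BYTES) -> str:
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # Keep the tail so the innermost (most relevant) frames survive; decode
    # leniently in case the cut lands mid-character.
    return "... (truncated) ...\n" + data[-limit:].decode("utf-8", "replace")


short = truncate_traceback("File ...\n" * 3)        # under budget: unchanged
long_text = truncate_traceback("x" * 500_000)       # over budget: tail kept
```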

cretz (Member) left a comment:

Just minor code organization comments at this point; otherwise LGTM.

cretz (Member) left a comment:

Looks good and much cleaner. Minor/optional suggestions added. May want to wait and see if @dandavison has anything else to add.

You may need to update with main, which has an updated sdk-core (and rebuild to get the lock file right).

Successfully merging this pull request may close these issues.

[Feature Request] Include stack trace on deadlock detection exception
4 participants