Skip to content

Conversation

@benediktstroebl
Copy link

Add a 10-second timeout to results_queue.join() to prevent indefinite hangs when lingering results aren't properly consumed. If a timeout occurs, drain any remaining items from the queue to allow training to continue.

This fixes an issue where training could deadlock between steps if results from a previous step remained unprocessed in the queue.

Add a 10-second timeout to results_queue.join() to prevent indefinite
hangs when lingering results aren't properly consumed. If a timeout
occurs, drain any remaining items from the queue to allow training to
continue.

This fixes an issue where training could deadlock between steps if
results from a previous step remained unprocessed in the queue.
@benediktstroebl benediktstroebl force-pushed the fix-results-queue-deadlock branch from bc5bc97 to 5229cf4 Compare October 7, 2025 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant