🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers. #8996

Open
wants to merge 3 commits into
base: main

fcourtial

Fix JSON serialization error in retire_workers API endpoint

When workers are retired through the HTTP API endpoint /api/v1/retire_workers, the response includes worker metrics that contain tuple-keyed entries (e.g., in digests_total_since_heartbeat). Tuple keys cannot be JSON serialized, so the request fails with a 500 error, which breaks clients such as the Dask Kubernetes Operator.
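
The underlying failure can be reproduced with the standard library alone. This is a minimal sketch, assuming the endpoint ultimately serializes its response with Python's `json` module; the payload is illustrative and is not actual scheduler output.

```python
import json

# Illustrative payload: a metrics dict with a tuple key, similar in shape
# to the entries described above (not real scheduler output).
payload = {"metrics": {("execute", "thread-cpu"): 1}}

try:
    json.dumps(payload)
    error = None
except TypeError as exc:
    # The json module only accepts str, int, float, bool or None as dict keys.
    error = exc

print(type(error).__name__)
```

An unhandled exception like this inside a request handler would plausibly surface to the client as a 500 response.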

This PR:

  • Adds a clean_dict function that drops tuple-keyed entries before the response is serialized
  • Preserves the rest of the dictionary structure, so the response becomes JSON-serializable

Example:

# Before - causes 500 error
{
    "metrics": {
        ("execute", "thread-cpu"): 1
    }
}

# After - properly serialized
{
    "metrics": {}
}
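
For illustration, a function with the behavior described above might look like the following. This is a sketch under assumptions, not the code in this PR: the name `clean_dict` comes from the description, but the signature, the recursion into nested dicts and lists, and the sample `workers_info` payload are hypothetical.

```python
import json
from typing import Any


def clean_dict(obj: Any) -> Any:
    """Return a copy of ``obj`` with tuple-keyed dict entries removed.

    Hypothetical sketch: recurses into nested dicts and lists and
    returns every other value unchanged.
    """
    if isinstance(obj, dict):
        return {
            key: clean_dict(value)
            for key, value in obj.items()
            if not isinstance(key, tuple)
        }
    if isinstance(obj, list):
        return [clean_dict(item) for item in obj]
    return obj


# Illustrative input; the address and the "memory" entry are made up.
workers_info = {
    "tcp://127.0.0.1:1234": {
        "metrics": {("execute", "thread-cpu"): 1, "memory": 100},
    }
}

cleaned = clean_dict(workers_info)
serialized = json.dumps(cleaned)  # succeeds once tuple keys are gone
```

Because the sketch builds new containers instead of mutating its input, the original worker state keeps its tuple-keyed entries; only the copy that is serialized loses them.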

@fcourtial fcourtial requested a review from fjetter as a code owner January 28, 2025 17:31
@fcourtial fcourtial changed the title Fcourtial/fix retire workers 500 🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers. Jan 28, 2025
Member

@jacobtomlinson jacobtomlinson left a comment

Thanks for catching this

Review comment on distributed/http/scheduler/api.py (outdated, resolved)
@fcourtial
Author

This should partly resolve issue #8370.

Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  +    1      27 suites  +1   11h 46m 9s ⏱️ + 36m 8s
 4 117 tests +    1   4 000 ✅  -     1    111 💤  -  1  5 ❌ +2  1 🔥 +1 
51 629 runs  +1 438  49 322 ✅ +1 372  2 301 💤 +63  5 ❌ +2  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 200dde6. ± Comparison against base commit fd3722d.

@jacobtomlinson
Member

I would appreciate @fjetter or @hendrikmakait taking a look at this.

@fcourtial
Author

One open question: are we supposed to retire a worker that still has entries in digests_total_since_heartbeat? I don't want to fix only the symptom.
