-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix inconsistent breaker state #16083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix inconsistent breaker state #16083
Conversation
Hi @elijah-rou. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: elijah-rou The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
6a11d8a
to
d006bd9
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16083 +/- ##
==========================================
- Coverage 80.20% 79.99% -0.21%
==========================================
Files 214 214
Lines 16887 17111 +224
==========================================
+ Hits 13544 13688 +144
- Misses 2987 3054 +67
- Partials 356 369 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
/retest |
@elijah-rou: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
d15000e
to
117fa87
Compare
/ok-to-test |
117fa87
to
f2c2e98
Compare
There are empty aliases in OWNER_ALIASES, cleanup is advised. |
ff00a58
to
43edb91
Compare
/retest |
@dprotaso what's happening with build-tests_serving_main? I am a bit confused what is broken |
diff in the protobuf formatting it looks like - the build tests run this script https://github.com/knative/serving/blob/main/hack/verify-codegen.sh We're using this version of protoc - https://github.com/knative/infra/blob/8acbaddaa605095b52df9f569e97a631a412ecb8/images/prow-tests/Dockerfile#L25 If you're messing with proto files we have a flag in our serving/hack/update-codegen.sh Line 35 in e85e3f6
|
Signed-off-by: Knative Automation <[email protected]>
bumping knative.dev/hack f88b7db...af735b2: > af735b2 Fix dot releases (# 434) Signed-off-by: Knative Automation <[email protected]>
…on handling Implement comprehensive pod state management system in the activator to enable graceful handling of pod lifecycle transitions and connection management. Pod State Management: - Add pod state tracking with states: podHealthy, podPending, podDraining, podRemoved - Implement graceful draining with reference counting for in-flight requests - Add proper synchronization for concurrent pod state transitions - Ensure atomic updates to pod tracker assignments Connection Management Improvements: - Add connection-level metrics for monitoring pod health - Track transport failures and timeouts per pod - Implement proper error attribution for healthy vs unhealthy targets - Add pending request tracking metrics Breaker Integration: - Update queue breaker to track pending vs in-flight requests - Add metrics for requests waiting to acquire pod trackers - Implement proper capacity management during state transitions Testing: - Add comprehensive tests for pod state transitions - Add race condition tests for concurrent state updates - Verify graceful draining behavior with in-flight requests - Test proper capacity calculations with multiple activators Performance: - Optimize pod assignment using consistent hashing - Reduce lock contention in hot paths - Improve capacity update calculations This change improves reliability by ensuring connections are properly managed during pod lifecycle events, preventing request failures during scaling operations and pod terminations. chore: go mod vendor after rebase chore: update codegen
… management Add extensive race condition tests to verify thread-safe behavior in: - Pod state transitions (healthy, draining, pending states) - Concurrent Reserve operations and state changes - Reference counting under concurrent access - Weight modifications across goroutines - RevisionThrottler state updates - Tracker acquisition with concurrent modifications These tests help ensure atomic operations properly prevent race conditions in the activator's pod tracking and throttling logic.
43edb91
to
a92824b
Compare
@dprotaso any thoughts? |
Implement comprehensive pod state management system in the activator to enable
graceful handling of pod lifecycle transitions and connection management.
Pod State Management:
Connection Management Improvements:
Breaker Integration:
Testing:
Performance:
This change improves reliability by ensuring connections are properly managed
during pod lifecycle events, preventing request failures during scaling operations
and pod terminations.
This succeeds work initially done in #15811