-
Notifications
You must be signed in to change notification settings - Fork 4.6k
xds: check priority policy restarts timer on same CONNECTING -> CONNECTING transition #8628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #8628 +/- ##
==========================================
- Coverage 82.13% 81.97% -0.17%
==========================================
Files 415 415
Lines 40711 40712 +1
==========================================
- Hits 33437 33372 -65
- Misses 5897 5948 +51
- Partials 1377 1392 +15
🚀 New features to boost your workflow:
|
defer func(old time.Duration) { | ||
DefaultPriorityInitTimeout = old | ||
}(DefaultPriorityInitTimeout) | ||
DefaultPriorityInitTimeout = defaultTestShortTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for existing art like this.
Would you mind adding a helper function instead:
func overrideInitTimeout(t *testing.T, val time.Duration) {
orig := DefaultPriorityInitTimeout
DefaultPriorityInitTimeout = val
t.Cleanup(func() { DefaultPriorityInitTimeout = orig })
}
And all existing tests and this new can have a one-liner to override the init timeout.
// child-0 will be started, and will create a SubConn. | ||
addrs0 := <-cc.NewSubConnAddrsCh | ||
if got, want := addrs0[0].Addr, testBackendAddrStrs[0]; got != want { | ||
t.Fatalf("got unexpected new subconn addr: %v, want %v", got, want) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: How about a more readable error message like New subchannel created for address: %q, want: %q
?
|
||
// handleChildStateUpdate start/close priorities based on the connectivity | ||
// state. | ||
func (b *priorityBalancer) handleChildStateUpdate(childName string, s balancer.State) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, would you mind renaming s
to newState
? And then you can rename the new local variable to origState
or oldState
. That way, it is more explicit which state we are dealing with when
}, | ||
BalancerConfig: &LBConfig{ | ||
Children: map[string]*Child{ | ||
"child-0": {Config: &internalserviceconfig.BalancerConfig{Name: roundrobin.Name}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we use pick_first
instead of round_robin
, since the latter anyways delegates to the former.
} | ||
sc0 := <-cc.NewSubConnCh | ||
|
||
// Send CONNECTING for child-0 - this should start the init timer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init timer is initially started when the child policy for that priority is created, which will happen when we call pb.UpdateClientConnState
above. And as part of that in newChildBalancer
, we also set the state in the childBalancer
to Connecting
.
Instead of relying on keeping track of the time elapsed between these events, what do you think about using the following state transitions which is possible in the real world:
- Move the subchannel corresponding to priority 0 to
Connecting
- Move the subchannel corresponding to priority 0 to
Ready
- Move the subchannel corresponding to priority 0 to
Idle
- Move the subchannel corresponding to priority 0 to
Connecting
And create a local variable for named timeAfterFunc
and set it to what is happening currently:
// As a package global
var timeAfterFunc time.AfterFunc
func (cb *childBalancer) startInitTimer() {
...
// Instead of directly using time.AfterFunc, use the variable timeAfterFunc
timerW.timer = timeAfterFunc(DefaultPriorityInitTimeout, func() {
...
})
}
From the test, change timeAfterFunc
such that you write to a channel controlled by the test, and delegate to the actual afterFunc
that was passed in. So, something like:
initTimerStarted := make(chan struct{}, 1)
origTimeAfterFunc := timeAfterFunc
timeAfterFunc = func(d time.Duration, f func()) *time.Timer {
initTimerStarted <- struct{}{}
time.AfterFunc(d, f)
}
Then, you can verify that a new timer is not created by reading from the channel.
Fixes: #8516
The priority balancer's init timer was restarting when a child balancer received multiple CONNECTING state updates.
This caused unnecessary delays in failover to lower priority children.
RELEASE NOTES: