-
Notifications
You must be signed in to change notification settings - Fork 4.5k
xdsclient: add an e2e style test for fallback involving more than 2 servers #7817 #8427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
xdsclient: add an e2e style test for fallback involving more than 2 servers #7817 #8427
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8427 +/- ##
==========================================
+ Coverage 82.31% 82.35% +0.03%
==========================================
Files 414 414
Lines 40464 40531 +67
==========================================
+ Hits 33308 33378 +70
+ Misses 5791 5785 -6
- Partials 1365 1368 +3 🚀 New features to boost your workflow:
|
|
||
// Secondary2 fallback, RPCs reach backend3. | ||
secondary2Lis.Restart() | ||
if err := waitForRPCsToReachBackend(ctx, client, backend3.Address); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test to verify if connections are attempted to primary and secondary1 backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if conn, err := secondary2WrappedLis.NewConnCh.Receive(ctx); err == nil { | ||
cw := conn.(*testutils.ConnWrapper) | ||
if _, err := cw.CloseCh.Receive(ctx); err != nil { | ||
t.Fatalf("Failed to get connection to secondary2: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is wrong. Here we are checking if connection to secondary2 is closed or not because we not have secondary1 running. Change the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
})) | ||
if conn, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err == nil { | ||
cw := conn.(*testutils.ConnWrapper) | ||
cw.CloseCh.Receive(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we do not have a similar check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out! I’ve updated the code.
@@ -728,3 +728,169 @@ func (s) TestFallback_OnStartup_RPCSuccess(t *testing.T) { | |||
t.Fatal(err) | |||
} | |||
} | |||
|
|||
// TestXDSFallback_ThreeServerPromotion verifies fallback and promotion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be changed to something like :
The test verifies that when the primary management server is unavailable, the system attempts to connect to the first fallback server, and if that is also down, to the second fallback server. It also ensures that the system switches back to the first fallback server once it becomes available again, and eventually returns to the primary server when it comes back online, closing connections to the fallback servers accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
t.Fatalf("Failed to create gRPC client: %v", err) | ||
} | ||
defer cc.Close() | ||
cc.Connect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this is needed since we will automatically connect when trying to make the first RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Verify that connection attempts were made to primaryWrappedLis and | ||
// secondary1WrappedLis, even though the fallback succeeded using | ||
// secondary2WrappedLis and RPCs reached backend3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Verify that connection attempts were made to primaryWrappedLis and | |
// secondary1WrappedLis, even though the fallback succeeded using | |
// secondary2WrappedLis and RPCs reached backend3. | |
// Verify that connection attempts were made to primaryWrappedLis and | |
// secondary1WrappedLis, before using secondary2WrappedLis to | |
// make succesful RPCs to backend3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo one comment. Adding @easwars as a second reviewer.
} | ||
if _, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err != nil { | ||
t.Fatalf("Expected connection attempt to secondary1 but got error: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this block to before waitForRPCsToReachBackend(ctx, client, backend3.Address)
so that it is consistent with the comment and the order of events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should verify that RPCs are successful to backend3 only after we verify that connections to the primary and secondary were attempted.
secondary1WrappedLis := testutils.NewListenerWrapper(t, nil) | ||
secondary1Lis := testutils.NewRestartableListener(secondary1WrappedLis) | ||
|
||
secondary2WrappedLis := testutils.NewListenerWrapper(t, nil) | ||
secondary2Lis := testutils.NewRestartableListener(secondary2WrappedLis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: instead of secondary1Xxx
and secondary2Xxx
, you could name them as secondaryXxx
and tertiaryXxx
.
secondary1ManagementServer.Update(ctx, e2e.UpdateOptions{ | ||
NodeID: nodeID, | ||
Listeners: []*v3listenerpb.Listener{e2e.DefaultClientListener(serviceName, "route-s1")}, | ||
Routes: []*v3routepb.RouteConfiguration{e2e.DefaultRouteConfig("route-s1", serviceName, "cluster-s1")}, | ||
Clusters: []*v3clusterpb.Cluster{e2e.DefaultCluster("cluster-s1", "endpoints-s1", e2e.SecurityLevelNone)}, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we only have the listener resource here as well (just like in the primary server)?
secondary2ManagementServer.Update(ctx, e2e.UpdateOptions{ | ||
NodeID: nodeID, | ||
Listeners: []*v3listenerpb.Listener{e2e.DefaultClientListener(serviceName, "route-s2")}, | ||
Routes: []*v3routepb.RouteConfiguration{e2e.DefaultRouteConfig("route-s2", serviceName, "cluster-s2")}, | ||
Clusters: []*v3clusterpb.Cluster{e2e.DefaultCluster("cluster-s2", "endpoints-s2", e2e.SecurityLevelNone)}, | ||
Endpoints: []*v3endpointpb.ClusterLoadAssignment{ | ||
e2e.DefaultEndpoint("endpoints-s2", "localhost", []uint32{testutils.ParsePort(t, backend3.Address)}), | ||
}, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use e2e.DefaultClientResources
to simplify this code?
bootstrapContents, err := bootstrap.NewContentsForTesting(bootstrap.ConfigOptionsForTesting{ | ||
Servers: []byte(fmt.Sprintf(`[ | ||
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]}, | ||
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]}, | ||
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]} | ||
]`, primaryManagementServer.Address, secondary1ManagementServer.Address, secondary2ManagementServer.Address)), | ||
Node: []byte(fmt.Sprintf(`{"id":%q}`, nodeID)), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we have this block indented nicely for improved readability. Either like the one in https://github.com/grpc/grpc-go/blob/master/test/xds/xds_client_federation_test.go#L163 or like the ones in the existing tests in this file.
t.Fatal(err) | ||
} | ||
|
||
// Verify that connection attempts were made to primaryWrappedLis and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of stopping all the three listeners up front, would it not be easy to leave them open and then verify that initially a connection is attempted to the primary server. And because the primary does not respond with all resources, once the test stops the primary listener (and thereby shutting down the existing connection), the xDS client should attempt a connection to the secondary server. That can then be verified by the test. Again, after verifying that, the test would stop the secondary listener, thereby terminating that connection. And again, since the secondary also did not respond with all requested resources, the xDS client should attempt a connection to the tertiary server. Now, that one will return all resources, and RPCs should succeed.
Then we configure the secondary server with all resources, and bring it back to life. Then, we expect a connection to the secondary and eventually for RPCs to succeed.
Then we configure the primary server with all resources, and bring it back to life. Then, we expect a connection to the primary and eventually for RPCs to succeed.
Maybe you tried that and there was a problem with that approach? If not, that seems like a cleaner way to go about things. Thoughts?
} | ||
if _, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err != nil { | ||
t.Fatalf("Expected connection attempt to secondary1 but got error: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should verify that RPCs are successful to backend3 only after we verify that connections to the primary and secondary were attempted.
Fixes: #7817
RELEASE NOTES: N/A