Skip to content

xdsclient: add an e2e style test for fallback involving more than 2 servers #7817 #8427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

vinothkumarr227
Copy link
Contributor

@vinothkumarr227 vinothkumarr227 commented Jul 3, 2025

Fixes: #7817

RELEASE NOTES: N/A

Copy link

codecov bot commented Jul 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.35%. Comparing base (8adcc94) to head (07ec5df).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8427      +/-   ##
==========================================
+ Coverage   82.31%   82.35%   +0.03%     
==========================================
  Files         414      414              
  Lines       40464    40531      +67     
==========================================
+ Hits        33308    33378      +70     
+ Misses       5791     5785       -6     
- Partials     1365     1368       +3     

see 26 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


// Secondary2 fallback, RPCs reach backend3.
secondary2Lis.Restart()
if err := waitForRPCsToReachBackend(ctx, client, backend3.Address); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test to verify if connections are attempted to primary and secondary1 backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if conn, err := secondary2WrappedLis.NewConnCh.Receive(ctx); err == nil {
cw := conn.(*testutils.ConnWrapper)
if _, err := cw.CloseCh.Receive(ctx); err != nil {
t.Fatalf("Failed to get connection to secondary2: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is wrong. Here we are checking if connection to secondary2 is closed or not because we not have secondary1 running. Change the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}))
if conn, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err == nil {
cw := conn.(*testutils.ConnWrapper)
cw.CloseCh.Receive(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we do not have a similar check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! I’ve updated the code.

@@ -728,3 +728,169 @@ func (s) TestFallback_OnStartup_RPCSuccess(t *testing.T) {
t.Fatal(err)
}
}

// TestXDSFallback_ThreeServerPromotion verifies fallback and promotion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be changed to something like :
The test verifies that when the primary management server is unavailable, the system attempts to connect to the first fallback server, and if that is also down, to the second fallback server. It also ensures that the system switches back to the first fallback server once it becomes available again, and eventually returns to the primary server when it comes back online, closing connections to the fallback servers accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

t.Fatalf("Failed to create gRPC client: %v", err)
}
defer cc.Close()
cc.Connect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this is needed since we will automatically connect when trying to make the first RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 845 to 847
// Verify that connection attempts were made to primaryWrappedLis and
// secondary1WrappedLis, even though the fallback succeeded using
// secondary2WrappedLis and RPCs reached backend3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Verify that connection attempts were made to primaryWrappedLis and
// secondary1WrappedLis, even though the fallback succeeded using
// secondary2WrappedLis and RPCs reached backend3.
// Verify that connection attempts were made to primaryWrappedLis and
// secondary1WrappedLis, before using secondary2WrappedLis to
// make succesful RPCs to backend3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vinothkumarr227 vinothkumarr227 removed their assignment Jul 25, 2025
@eshitachandwani eshitachandwani self-assigned this Jul 28, 2025
@eshitachandwani eshitachandwani added this to the 1.75 Release milestone Jul 29, 2025
Copy link
Member

@eshitachandwani eshitachandwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one comment. Adding @easwars as a second reviewer.

}
if _, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err != nil {
t.Fatalf("Expected connection attempt to secondary1 but got error: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this block to before waitForRPCsToReachBackend(ctx, client, backend3.Address) so that it is consistent with the comment and the order of events.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should verify that RPCs are successful to backend3 only after we verify that connections to the primary and secondary were attempted.

Comment on lines +747 to +751
secondary1WrappedLis := testutils.NewListenerWrapper(t, nil)
secondary1Lis := testutils.NewRestartableListener(secondary1WrappedLis)

secondary2WrappedLis := testutils.NewListenerWrapper(t, nil)
secondary2Lis := testutils.NewRestartableListener(secondary2WrappedLis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: instead of secondary1Xxx and secondary2Xxx, you could name them as secondaryXxx and tertiaryXxx.

Comment on lines +775 to +780
secondary1ManagementServer.Update(ctx, e2e.UpdateOptions{
NodeID: nodeID,
Listeners: []*v3listenerpb.Listener{e2e.DefaultClientListener(serviceName, "route-s1")},
Routes: []*v3routepb.RouteConfiguration{e2e.DefaultRouteConfig("route-s1", serviceName, "cluster-s1")},
Clusters: []*v3clusterpb.Cluster{e2e.DefaultCluster("cluster-s1", "endpoints-s1", e2e.SecurityLevelNone)},
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we only have the listener resource here as well (just like in the primary server)?

Comment on lines +783 to +791
secondary2ManagementServer.Update(ctx, e2e.UpdateOptions{
NodeID: nodeID,
Listeners: []*v3listenerpb.Listener{e2e.DefaultClientListener(serviceName, "route-s2")},
Routes: []*v3routepb.RouteConfiguration{e2e.DefaultRouteConfig("route-s2", serviceName, "cluster-s2")},
Clusters: []*v3clusterpb.Cluster{e2e.DefaultCluster("cluster-s2", "endpoints-s2", e2e.SecurityLevelNone)},
Endpoints: []*v3endpointpb.ClusterLoadAssignment{
e2e.DefaultEndpoint("endpoints-s2", "localhost", []uint32{testutils.ParsePort(t, backend3.Address)}),
},
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use e2e.DefaultClientResources to simplify this code?

Comment on lines +799 to +806
bootstrapContents, err := bootstrap.NewContentsForTesting(bootstrap.ConfigOptionsForTesting{
Servers: []byte(fmt.Sprintf(`[
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]},
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]},
{"server_uri": %q, "channel_creds":[{"type":"insecure"}]}
]`, primaryManagementServer.Address, secondary1ManagementServer.Address, secondary2ManagementServer.Address)),
Node: []byte(fmt.Sprintf(`{"id":%q}`, nodeID)),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we have this block indented nicely for improved readability. Either like the one in https://github.com/grpc/grpc-go/blob/master/test/xds/xds_client_federation_test.go#L163 or like the ones in the existing tests in this file.

t.Fatal(err)
}

// Verify that connection attempts were made to primaryWrappedLis and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of stopping all the three listeners up front, would it not be easy to leave them open and then verify that initially a connection is attempted to the primary server. And because the primary does not respond with all resources, once the test stops the primary listener (and thereby shutting down the existing connection), the xDS client should attempt a connection to the secondary server. That can then be verified by the test. Again, after verifying that, the test would stop the secondary listener, thereby terminating that connection. And again, since the secondary also did not respond with all requested resources, the xDS client should attempt a connection to the tertiary server. Now, that one will return all resources, and RPCs should succeed.

Then we configure the secondary server with all resources, and bring it back to life. Then, we expect a connection to the secondary and eventually for RPCs to succeed.

Then we configure the primary server with all resources, and bring it back to life. Then, we expect a connection to the primary and eventually for RPCs to succeed.

Maybe you tried that and there was a problem with that approach? If not, that seems like a cleaner way to go about things. Thoughts?

}
if _, err := secondary1WrappedLis.NewConnCh.Receive(ctx); err != nil {
t.Fatalf("Expected connection attempt to secondary1 but got error: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should verify that RPCs are successful to backend3 only after we verify that connections to the primary and secondary were attempted.

@easwars easwars removed their assignment Jul 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xdsclient: add an e2e style test for fallback involving more than 2 servers
3 participants