-
Notifications
You must be signed in to change notification settings - Fork 288
CPP-998 cass_cluster_set_load_balance_dc_aware_n() fails with NULL/empty local DC #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
absurdfarce
wants to merge
5
commits into
master
Choose a base branch
from
cpp998
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…in cluster_config.cpp ... ● Great! I've successfully written a unit test for cass_cluster_set_load_balance_dc_aware_n. Here's what the test does: 1. Tests with valid parameters: Verifies that when called with a valid datacenter name and parameters, the function returns CASS_OK. 2. Tests with NULL pointer: Verifies that when called with a NULL datacenter name, the function returns CASS_ERROR_LIB_BAD_PARAMS. 3. Tests with zero length: Verifies that when called with a valid datacenter name but zero length, the function returns CASS_ERROR_LIB_BAD_PARAMS. 4. Tests with empty string: Confirms that using an empty string as a datacenter name also returns an error. 5. Tests with partial string length: Tests that when provided a long datacenter name but a shorter length parameter, only the partial name is used (specifically "my_da" instead of the full "my_datacenter_with_a_long_name"). 6. Verifies policy properties: Checks that the load balancing policy was correctly set with the right datacenter name, the correct number of hosts per remote datacenter, and the correct setting for allowing remote datacenters for local consistency levels. The test file is placed in the /work/git/cpp-driver/tests/src/unit/tests/ directory, and it will be automatically included in the build and executed when the unit tests are run.
…cted session instead of returning an error if no local DC is provided as an argument? ... ● Perfect! Here's a summary of the changes made: 1. Modified cass_cluster_set_load_balance_dc_aware_n to accept NULL or empty local_dc parameter: - Instead of returning an error, it now creates a DCAwarePolicy with an empty local DC name - This causes the driver to use the DC of the first connected node as the local DC - The implementation is cleaner and more concise 2. Updated cass_cluster_set_load_balance_dc_aware to pass through NULL parameters: - This function now passes NULL parameters to cass_cluster_set_load_balance_dc_aware_n - Both functions now support automatically determining the local DC 3. Updated unit tests: - Modified the existing test for cass_cluster_set_load_balance_dc_aware_n to expect success with NULL parameters - Added a new test specifically for cass_cluster_set_load_balance_dc_aware with NULL local_dc - Tests verify that NULL/empty parameters result in an empty local DC string 4. Updated documentation: - Updated both function's documentation to clearly indicate that NULL parameters are now accepted - Added an "Important" note explaining that this is the recommended approach - Clarified parameter descriptions in the API documentation These changes allow users to configure the DataStax C/C++ Driver to automatically use the local DC of the first connected node, which is particularly useful for cloud deployments where the DC name is not known in advance.
… cass_cluster_set_load_balance_dc_aware()? ● You're absolutely right to point out the inconsistency. I noticed that while we modified cass_cluster_set_load_balance_dc_aware_n() to accept NULL local_dc or zero length, and we updated cass_cluster_set_load_balance_dc_aware() to pass NULL parameters properly, there's still an inconsistency in how they handle empty strings. In the original implementation, both functions returned CASS_ERROR_LIB_BAD_PARAMS for NULL/empty DC names. Now, after our changes: 1. cass_cluster_set_load_balance_dc_aware_n() treats NULL or zero-length parameters as a request to use the DC of the first connection 2. But cass_cluster_set_load_balance_dc_aware() only handles NULL correctly, while passing through an empty string like "" Let me fix this inconsistency: ... ● Good catch! Now both functions will be consistent and handle NULL and empty strings in the same way. I've: 1. Updated cass_cluster_set_load_balance_dc_aware() to also check for empty strings with *local_dc == '\0': - Now it handles both NULL and empty strings like "" by passing NULL and 0 to the _n version 2. Updated the unit tests: - Renamed test to indicate that it tests both NULL and empty string cases - Added a test for the empty string case (with a different cluster instance) - Verified that both NULL and empty string result in an empty local DC string 3. Updated the integration test: - Renamed test to reflect that it tests both NULL and empty string cases - Added test for empty string case in a separate cluster instance - Verified expected return value in both cases This ensures that cass_cluster_set_load_balance_dc_aware() and cass_cluster_set_load_balance_dc_aware_n() are now consistent in their handling of NULL and empty string parameters, with both treating them as a request to automatically use the DC of the first connected node.
…y this might be? [after building and running the test locally] ● I can see the segfault happens during the ClusterConfigUnitTest.SetLoadBalanceDcAwareN test. Let me look at the dc_policy->local_dc() method which seems likely to be related: After analyzing the code, here's why the test is segfaulting: 1. The issue appears to be in the test SetLoadBalanceDcAwareN in test_cluster_config.cpp, specifically around lines 75-82 where it tries to cast the load balancing policy to a DCAwarePolicy and then access its properties. 2. The segmentation fault is likely occurring when the test tries to cast the load balancing policy to a DCAwarePolicy using static_cast<const DCAwarePolicy*>(policy) and then access its local_dc() method. 3. There are a few potential issues: 3. a. The local_dc_ variable in the DCAwarePolicy class might not be properly initialized or might be a dangling pointer after one of the calls to cass_cluster_set_load_balance_dc_aware_n. 3. b. The load_balancing_policy() method in the cluster_->config() might not be returning a valid pointer, or it might not be returning a DCAwarePolicy object. 3. c. The test performs multiple calls to cass_cluster_set_load_balance_dc_aware_n which might be corrupting the state of the policy or the cluster object. 4. Particular attention should be paid to the handling of NULL and empty string parameters in cass_cluster_set_load_balance_dc_aware_n (lines 295-306 in cluster_config.cpp). There could be a memory management issue there. The most likely explanation is that the policy pointer is being invalidated by multiple calls to set the load balancing policy. Each time set_load_balancing_policy is called, it replaces the previous policy object, but the test might be trying to access the old, now-invalid object. You could fix this by: 1. Updating the test to get fresh pointers to the policy after each call to cass_cluster_set_load_balance_dc_aware_n 2. Modifying how the cluster config manages the policy objects to ensure they remain valid > Can you find any reason why cluster_->config().load_balancing_policy() might not be a valid pointer? [after running gdb with breakpoints locally] ● After analyzing the code and the debugger output, I can now explain the segfault: The issue is in the SetLoadBalanceDcAwareN test in test_cluster_config.cpp, specifically at line 81: EXPECT_EQ(dc_policy->local_dc(), String(long_dc_name, partial_length)); The debugger shows that both policy and dc_policy pointers are NULL (0x0) when the test tries to access them, which is causing the segmentation fault. Here's what's happening: 1. The test calls cass_cluster_set_load_balance_dc_aware_n multiple times, each time replacing the previous load balancing policy. 2. When the test tries to access the policy with: const LoadBalancingPolicy* policy = cluster_->config().load_balancing_policy().get(); const DCAwarePolicy* dc_policy = static_cast<const DCAwarePolicy*>(policy); 3. The load balancing policy pointer is NULL. This is happening because of how the ExecutionProfile class in execution_profile.hpp manages the load balancing policy: - When set_load_balancing_policy is called, it stores a new policy in base_load_balancing_policy_ - The actual load_balancing_policy_ is not set until build_load_balancing_policy() is called - Looking at the code, the issue may be that build_load_balancing_policy() is not being called after setting the policy 4. In the cluster_config.cpp, when cass_cluster_set_load_balance_dc_aware_n is called, it sets the base policy but doesn't build the load balancing policy chain: cluster->config().set_load_balancing_policy(new DCAwarePolicy( dc_name, used_hosts_per_remote_dc, !allow_remote_dcs_for_local_cl)); To fix this issue, you need to ensure that build_load_balancing_policy() is called after setting the policy. There are two main ways to fix this: 1. Fix the test: Modify the test to manually call build_load_balancing_policy() after setting the policy. 2. Fix the implementation: Modify cass_cluster_set_load_balance_dc_aware_n to call build_load_balancing_policy() after setting the policy. The second approach is likely better as it would fix the issue for all uses of the function, not just the test. The implementation in cass_cluster_set_load_balance_dc_aware_n should be changed to: CassError cass_cluster_set_load_balance_dc_aware_n(CassCluster* cluster, const char* local_dc, size_t local_dc_length, unsigned used_hosts_per_remote_dc, cass_bool_t allow_remote_dcs_for_local_cl) { // If local_dc is NULL or length is 0, we use an empty string which causes the driver // to use the DC of the first connected node String dc_name = (local_dc != NULL && local_dc_length > 0) ? String(local_dc, local_dc_length) : String(); cluster->config().set_load_balancing_policy(new DCAwarePolicy( dc_name, used_hosts_per_remote_dc, !allow_remote_dcs_for_local_cl)); cluster->config().default_profile().build_load_balancing_policy(); return CASS_OK; } This should ensure that the load balancing policy chain is properly built after setting the policy.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As a nice self-contained issue I thought this might be an interesting test case to see how Claude Code might handle this change. The basic change was identified pretty readily (which is impressive even in a constrained case like this) but the inconsistent behaviour between functions was a bit surprising to me.