verify_dt_converge #30
Comments
The issue with update_2a appears to be with the PB change to the GSET (there is a change to a PB object using the PB client, then a change to an HTTP object using the HTTP client). The PB change returns {error, timeout}, which is why check2a fails. It is not obvious why this update should time out, given that updates to other objects have not timed out. It is consistently this update that times out, although only intermittently: on some runs the same update does not time out. The positive here is that the update did fail, so this does not indicate a fundamental problem with the data type, i.e. it was not failing to merge updates which had succeeded.
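To illustrate why a timed-out update is not in itself a convergence problem, here is a minimal Python sketch of a grow-only set merge. It is not the riak_test code. The element names are hypothetical, and it assumes that an update which returns {error, timeout} was not applied on that side, which a real cluster does not guarantee.

```python
def merge(a: frozenset, b: frozenset) -> frozenset:
    """Merge two G-Set replicas; for a grow-only set the merge is set union."""
    return a | b


def try_add(replica: frozenset, element: bytes, succeeded: bool) -> frozenset:
    """Return the replica after an add.

    Assumption: a failed (timed-out) add leaves the replica unchanged.
    """
    return replica | {element} if succeeded else replica


# Hypothetical elements, not the values used by the test.
side_a = frozenset({b"base"})
side_b = frozenset({b"base"})

side_a = try_add(side_a, b"x", succeeded=False)  # modelled as {error, timeout}
side_b = try_add(side_b, b"y", succeeded=True)

healed = merge(side_a, side_b)
```

Under that assumption the healed set contains only elements whose updates succeeded, so a missing element points at the failed update rather than at the merge.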
Pausing for 1s after the partition in the cluster, before starting the updates, stops the timeout from occurring and leads to the test passing more consistently. Note that the HTTP client can also fail. When it does, the test fails immediately, because unlike the PB client the HTTP client does not handle the request timeout (it crashes instead).
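The difference between the two clients can be sketched in Python as the difference between raising on a timeout and returning it as a value. This is only an illustration of the pattern. The function names are hypothetical and neither function is part of the PB or HTTP client.

```python
def update_returning_error(do_update):
    """Run do_update and report a timeout as a value instead of raising.

    This mirrors the behaviour described for the PB client, which returns
    an error rather than crashing.
    """
    try:
        return ("ok", do_update())
    except TimeoutError:
        return ("error", "timeout")


def timing_out_update():
    """Hypothetical update that always times out."""
    raise TimeoutError("request timed out")


result = update_returning_error(timing_out_update)
```

Calling timing_out_update() directly raises, which is the analogue of the immediate test failure seen with the HTTP client, while the wrapped call lets the test carry on and fail later at a check.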
This appears to be an issue with the test harness not waiting for the cluster to be stable after it is built. By default the rt:build_cluster function doesn't wait for transfers to complete. In this test, the vnode to be used as the PUT coordinator is initiated almost immediately before the PUT_FSM starts to use it. It appears that if the vnode is initiated more than 5ms before, the test works, and if it is less than 5ms before, the test fails due to the update timeout: a coordinator is selected, but the message is never received by the coordinator. This is potentially a bug: a vnode can seem up and be selected as a coordinator while something in the path for routing a request to that vnode isn't ready. Hence the PUT_FSM sends the local request to coordinate the PUT, and the vnode never receives it. Adding
The test fails intermittently, although the failure appears to occur consistently when testing gset (and not other datatypes).
Test seems to most often fail at check 3a, with only one of the expected values being returned:
So the values [<<"DANG">>,<<"Z^2">>] do not appear to get merged, i.e. the update in 2a on this side of the partition does not happen.
Note that the test doesn't assert that the check is ok until the final check. So the test may appear to fail on the final check, but it has already "failed" on the previous check; it is just that the wait_until timed out and returned an error which was ignored.
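The masking effect described above can be sketched in Python as follows. This is a simplified model and not the riak_test implementation: wait_until here is a hypothetical helper that returns an error value on timeout, and the check names are placeholders.

```python
import time


def wait_until(predicate, retries=5, delay=0.01):
    """Poll predicate; return "ok" once it holds, else an error tuple."""
    for _ in range(retries):
        if predicate():
            return "ok"
        time.sleep(delay)
    return ("error", "timeout")


def first_reported_failure(checks, assert_each):
    """Return the name of the first check whose failure is acted upon.

    With assert_each=False only the last check's result is examined,
    so earlier timeouts are silently ignored.
    """
    for index, (name, predicate) in enumerate(checks):
        result = wait_until(predicate, retries=2, delay=0.0)
        is_last = index == len(checks) - 1
        if result != "ok" and (assert_each or is_last):
            return name
    return None


# Placeholder checks: both fail because the earlier update never happened.
checks = [("check_3a", lambda: False), ("final_check", lambda: False)]

masked = first_reported_failure(checks, assert_each=False)
direct = first_reported_failure(checks, assert_each=True)
```

In this model, asserting on each wait_until result reports the failure at the check where it first occurs, rather than at the final check.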