test_checkBigStorage randomly fails with an AssertionError #85

Open
JustAnotherArchivist opened this issue Jul 31, 2018 · 4 comments

@JustAnotherArchivist
Contributor

test_checkBigStorage sometimes fails with an AssertionError in line 595 or 596. The failures are intermittent, which suggests a timing issue: one of the stopFuncs might need an additional condition (e.g. the one after the log compaction, to ensure that the data was indeed dumped to disk?).
I've seen failures on both of those lines, i.e. there are cases where o1 has the correct value but o2 doesn't, and vice versa. In all cases I've seen, getValue('test') returns None, i.e. the value is missing entirely, not corrupted.

My platform is a Debian machine with Python 3.6.

Bash command to run this test repeatedly:

declare -i i=0; while [[ $i -lt 10 ]]; do pytest -k 'test_checkBigStorage' test_syncobj.py; i+=1; done
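
To put a number on how often the test fails, the loop above can also be sketched in Python. This is only an illustration: `count_failures` is a hypothetical helper, not part of PySyncObj, and it treats any non-zero exit status as a failure, which for pytest also covers cases such as collection errors.

```python
import subprocess
import sys


def count_failures(cmd, runs=10):
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is inspected.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Assumed invocation, mirroring the bash command above:
# cmd = [sys.executable, "-m", "pytest", "-k", "test_checkBigStorage",
#        "test_syncobj.py"]
# print(count_failures(cmd), "of 10 runs failed")
```

Running it with and without artificial load on the machine would give a rough failure rate for each case.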
@JustAnotherArchivist
Contributor Author

JustAnotherArchivist commented Jul 31, 2018

I just noticed that some files dump1.bin.1.tmp etc. are left over in the PySyncObj directory after these test failures. The tmp extension again suggests that the issue is related to the log compaction/serialiser.

@JustAnotherArchivist
Contributor Author

JustAnotherArchivist commented Jul 31, 2018

This leftover temporary file is the one produced by Serializer.setTransmissionData. The filename corresponds to the object that fails, i.e. if the assert for o1 fails, dump1.bin.1.tmp remains in the directory.
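
A small helper can make the leftover files easier to spot after a failed run. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the only thing taken from this thread is the naming pattern (dump1.bin, dump1.bin.1.tmp, and so on).

```python
import glob
import os


def find_leftover_dumps(directory, prefix="dump"):
    """Return the sorted names of dump files left in `directory`.

    Matches names such as dump1.bin, dump1.bin.tmp and dump1.bin.1.tmp.
    """
    pattern = os.path.join(glob.escape(directory), prefix + "*.bin*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def split_leftovers(names):
    """Separate temporary files (ending in .tmp) from finished dumps."""
    tmp = [name for name in names if name.endswith(".tmp")]
    done = [name for name in names if not name.endswith(".tmp")]
    return tmp, done
```

Calling `split_leftovers(find_leftover_dumps("."))` in the PySyncObj directory right after a failure would show whether a temporary file, a finished dump, or both were left behind.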

@JustAnotherArchivist
Contributor Author

JustAnotherArchivist commented Aug 1, 2018

In the meantime, I've also seen cases where no .tmp files were left over but only dump1.bin and/or dump2.bin. I've also had a case where there were both a dump1.bin and a dump1.bin.1.tmp (and a dump2.bin, in this particular case).

The failures happen especially when the machine is under high load, so it really looks timing-based. My theory is that the serialisation and/or deserialisation doesn't finish before the doTicks timeout is reached. Perhaps it would be wise to add a method or two to SyncObj to wait for the (de)serialisation to complete.

Also, I think it might be a good idea to let a test fail entirely if doTicks is stopped because of the timeout rather than stopFunc, at least in most cases.
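
The idea of failing on timeout could look roughly like the sketch below. It is not the existing doTicks helper from the test suite: `do_ticks_strict`, `TickTimeout` and the `tick` callable are hypothetical names, and how a real test would advance its SyncObj instances on each iteration is left as an assumption.

```python
import time


class TickTimeout(AssertionError):
    """Raised when the tick loop ends because the timeout was reached."""


def do_ticks_strict(tick, timeout, stop_func, interval=0.05):
    """Call tick(interval) until stop_func() is true; raise on timeout.

    `tick` is a placeholder for whatever advances the objects under test.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        tick(interval)
        if stop_func():
            return
    raise TickTimeout(
        "stop condition not reached within %.1f s" % timeout
    )
```

Because `TickTimeout` subclasses AssertionError, pytest would report the timeout as an ordinary test failure at the tick loop instead of at a later assert on the stored value.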

@JustAnotherArchivist
Contributor Author

JustAnotherArchivist commented Aug 1, 2018

Correction: I meant the dumpN.bin.1.tmp file, not dumpN.bin.tmp (which would be produced by Serializer.serialize). I've corrected the comments above.
