test_checkBigStorage randomly fails with an AssertionError #85

JustAnotherArchivist · 2018-07-31T01:26:14Z

test_checkBigStorage intermittently fails with an AssertionError in line 595 or 596. Because the failures are intermittent, this looks like a timing issue, and one of the stopFuncs might need another condition (e.g. the one after the log compaction, to ensure that the data was indeed dumped to disk?).
I've seen failures on both lines mentioned above, i.e. there are cases where o1 has the correct value but o2 doesn't and vice versa. In all cases I've seen, getValue('test') returns None, i.e. the value is missing entirely, not corrupted.

My platform is a Debian machine with Python 3.6.

Bash command to run this test repeatedly:

declare -i i=0; while [[ $i -lt 10 ]]; do pytest -k 'test_checkBigStorage' test_syncobj.py; i+=1; done

JustAnotherArchivist · 2018-07-31T01:34:38Z

I just noticed that some files dump1.bin.1.tmp etc. are left over in the PySyncObj directory after these test failures. The tmp extension again suggests that the issue is related to the log compaction/serialiser.

JustAnotherArchivist · 2018-07-31T01:45:02Z

This leftover temporary file is the one produced by Serializer.setTransmissionData. The filename corresponds to the object that fails, i.e. if the assert for o1 fails, dump1.bin.1.tmp remains in the directory.

JustAnotherArchivist · 2018-08-01T11:55:29Z

In the meantime, I've also seen cases where no .tmp files were left over but only dump1.bin and/or dump2.bin. I've also had a case where there was both a dump1.bin and a dump1.bin.1.tmp (and a dump2.bin, in this particular case).
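
To make it easier to record which files remain after a failed run, a small helper along these lines could be called after each test. This is only a sketch: the function name find_leftover_dumps and the regular expression are assumptions for illustration, based solely on the filenames mentioned in this thread (dumpN.bin, dumpN.bin.1.tmp, dumpN.bin.tmp), and are not part of PySyncObj.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the filenames seen in this issue:
#   dumpN.bin        finished dump
#   dumpN.bin.tmp    temporary file
#   dumpN.bin.M.tmp  temporary file with a numeric suffix
_DUMP_RE = re.compile(r"^dump(\d+)\.bin(?P<tmp>(?:\.\d+)?\.tmp)?$")


def find_leftover_dumps(directory):
    """Group leftover dump files in `directory` by object number.

    Returns a dict mapping N to a list of (kind, filename) tuples,
    where kind is "dump" or "tmp". Hypothetical helper, not PySyncObj API.
    """
    leftovers = {}
    for path in sorted(Path(directory).iterdir()):
        match = _DUMP_RE.match(path.name)
        if match is None:
            continue  # unrelated file
        kind = "tmp" if match.group("tmp") else "dump"
        leftovers.setdefault(int(match.group(1)), []).append((kind, path.name))
    return leftovers
```

Printing find_leftover_dumps('.') right after a failing run would show whether the object whose assert failed is also the one with a .tmp file left behind.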

The failures happen especially when the machine is under high load, so it really looks timing-based. My theory is that the serialisation and/or deserialisation doesn't finish before the doTicks timeout is reached. Perhaps it would be wise to add a method or two to SyncObj to wait for the (de)serialisation to complete.

Also, I think it might be a good idea to let a test fail entirely if doTicks is stopped because of the timeout rather than stopFunc, at least in most cases.
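
A rough sketch of what I mean by failing on timeout is below. It is not the doTicks from test_syncobj.py: the names do_ticks_strict, TickTimeout and FakeNode, and the tick() method on the objects, are all made up for illustration. The point is only that the loop raises when the deadline passes without the stop condition becoming true, instead of returning silently.

```python
import time


class TickTimeout(AssertionError):
    """Raised when the stop condition was not met before the deadline."""


def do_ticks_strict(objects, time_to_tick, stop_func, interval=0.05):
    """Tick all objects until stop_func() is true; raise on timeout.

    Hypothetical helper: `objects` only need a tick() method here.
    """
    deadline = time.monotonic() + time_to_tick
    while time.monotonic() < deadline:
        for obj in objects:
            obj.tick()
        if stop_func():
            return  # condition met, the test may continue
        time.sleep(interval)
    raise TickTimeout(
        "stop condition not met within %.2f s" % time_to_tick
    )


class FakeNode:
    """Stand-in for a sync object, used only to demonstrate the helper."""

    def __init__(self):
        self.ticks = 0

    def tick(self):
        self.ticks += 1
```

With something like this, a run where the dump isn't finished in time would fail at the tick loop with a clear message rather than later at the assert on getValue('test').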

JustAnotherArchivist · 2018-08-01T12:26:48Z

Correction: I meant the dumpN.bin.1.tmp file, not dumpN.bin.tmp (which would be produced by Serializer.serialize). I've corrected the comments above.

JustAnotherArchivist mentioned this issue Jul 31, 2018

test_logCompactionRegressionTest1 randomly sees a silent EOFError #86

Open

JustAnotherArchivist mentioned this issue Oct 23, 2018

Network separation #92

Merged
