-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSANDRA-20253: Forward port randomized testing improvements from cep-15-accord to trunk #3835
base: trunk
Are you sure you want to change the base?
Conversation
@@ -554,7 +554,7 @@ public enum CassandraRelevantProperties | |||
TCM_USE_NO_OP_REPLICATOR("cassandra.test.use_no_op_replicator", "false"), | |||
|
|||
TEST_BBFAILHELPER_ENABLED("test.bbfailhelper.enabled"), | |||
TEST_BLOB_SHARED_SEED("cassandra.test.blob.shared.seed"), | |||
TEST_BLOB_SHARED_SEED("cassandra.test.blob.shared.seed", "42"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current logic used current time, which meant that tests were non-deterministic.... the seed must be set in order to be able to reproduce issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. Is the problem that we just didn't output the seed in whatever failure message was produced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, the tests use qt()
and controls the seed for the test. The blob logic has a giant blob that is shared by all tests, and uses a seed to populate the bytes... this seed is decoupled from the test seed, and would use currentTimeMillis
; this caused the tests to be non-deterministic and not repeatable when errors were found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not something I'd expect to unify in this patch.
@@ -132,21 +132,21 @@ private Pair<SnapshotsHolder, Map<String, String>> generateParams() | |||
case 1: | |||
if (!state.truncatedSnapshots.isEmpty()) | |||
{ | |||
keyspace = state.rs.pick(state.truncatedSnapshots.keySet()).split("\\.")[0]; | |||
keyspace = state.rs.pickUnorderedSet(state.truncatedSnapshots.keySet()).split("\\.")[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in pulling all the RandomSource
in this method got renamed to be 2 different methods
pickUnorderedSet
and pickOrderedSet
... unorded needs the sub type to be Comparable
and will do a sort so the result is deterministic... pickOrderedSet
doesn't sort as the set order should be deterministic already... this test uses HashMap
which is non-determiistic so we need pickUnordedSet
} | ||
|
||
/** | ||
* This is a change from accord as that uses {@link accord.utils.Utils#reverse}, which doesn't exist in this forward port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only change... i didn't want to bring in Utils
and add more conflict with cep-15-accord... so in-lined... when accord merges this whole file will go away...
} | ||
} | ||
|
||
public static class TestableNetworkTopologyStrategy extends NetworkTopologyStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NetworkTopologyStrategy
validates the datacenters match a config, but that makes it a bit harder to use in some contexts... so this one avoids validating datacenters!
* allowed in a primary key. If a test needs to filter out those cases, can just | ||
* {@code .map(CassandraGenerators::simplify)} to resolve. | ||
*/ | ||
public static IPartitioner simplify(IPartitioner partitioner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a follow up patch ill prob remove this... LocalPartitioner
has caused flakyness for several tests in cep-15-accord so plan to refactor that logic to be less problematic, which lets us drop this function... keeping for now just to be in-sync
@@ -466,7 +535,7 @@ private static final class LazySharedBlob | |||
|
|||
static | |||
{ | |||
long blobSeed = TEST_BLOB_SHARED_SEED.getLong(System.currentTimeMillis()); | |||
long blobSeed = TEST_BLOB_SHARED_SEED.getLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this default was never correct, as it made tests non-reproducable
* limitations under the License. | ||
*/ | ||
|
||
package accord.utils.random; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole general accord.utils
package naming is something I suppose we should address in the long term if the things in it are really not Accord-specific. No sense in changing before cep-15-accord
merges though, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there are a few things...
- we have several testing utils that should have a different life cycle than accord, so likely should be moved out for better isolation
- when cep-15-accord rebases this forward part has caused issues... the cep-15-accord history hasn't been squashed so all changes to these packages exist in cep-15-accord history, causing conflicts on rebase to trunk... to avoid this i added a commit on top of trunk last time i did the rebase to rename this package; i then removed after the rebase completed...
{ | ||
public static float[] randomWeights(RandomSource random, int length) | ||
{ | ||
float[] weights = new float[length - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a length
of 1, is the method supposed to just return an empty array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess so... most of the higher up APIs that end up calling this check for 0/1 and optimize around those (or block)...
|
||
static float[] randomWeights(RandomSource random, float[] bias) | ||
{ | ||
float[] weights = new float[bias.length - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ignore one of the bias
floats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accord.utils.random.Picker.Weighted#pickIndex
so, we use binary search to find what weight to select, and if the selected float
is larger than max weight, we choose the last offset in bias
i believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, like these methods don't explode, I guess, because they use the lengths of the newly created arrays in their loop conditions, but it's a little confusing that we arbitrarily memory hole the last bias in the context of this method itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its very leaky... you must use it "correctly" and only once place in the code knows what that means...
@@ -88,6 +91,63 @@ public static <T> Gen<T> oneOf(Map<Gen<T>, Integer> values) | |||
return rs -> gen.next(rs).next(rs); | |||
} | |||
|
|||
public static <T> OneOfBuilder<T> oneOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally not going to make comments about unused items, given I don't know what bits will be used later in some capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its only used in org.apache.cassandra.tcm.log.LogStateTestBase
on cep-15-accord, but not in trunk.
I am asking Alex about this, as the usage is for TCM reconstruct of older epochs. This feature was added specifically for Accord, but we had a lot of issues with it so we migrated off it... but seems we still have the API/Tests for it.
throw new AssertionError(); | ||
} | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some things in this class that we might able to slim down with some tweaking of types. This method might be able to go away with this patch, for instance:
Subject: [PATCH] DRY up mixedDistribution()
---
Index: test/unit/accord/utils/Gens.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/unit/accord/utils/Gens.java b/test/unit/accord/utils/Gens.java
--- a/test/unit/accord/utils/Gens.java (revision 7bc06a322c211f52fbb6df60f2058a32cd57dafc)
+++ b/test/unit/accord/utils/Gens.java (date 1738086893079)
@@ -331,15 +331,16 @@
throw new IllegalArgumentException("Range is too large; min=" + minInclusive + ", max=" + maxExclusive);
if (numBuckets <= 0 || numBuckets > domainSize)
throw new IllegalArgumentException("Num buckets must be between 1 and " + domainSize + "; given " + numBuckets);
- int[] bucket, indexes;
- bucket = new int[numBuckets];
+ Integer[] bucket, indexes;
+ bucket = new Integer[numBuckets];
int delta = domainSize / numBuckets;
for (int i = 0; i < numBuckets; i++)
bucket[i] = minInclusive + i * delta;
- indexes = IntStream.range(0, bucket.length).toArray();
- Gen<Gen.IntGen> indexDistro = mixedDistribution(indexes);
+
+ indexes = (Integer[]) IntStream.range(0, bucket.length).boxed().toArray();
+ Gen<Gen<Integer>> indexDistro = mixedDistribution(indexes);
return rs -> {
- Gen.IntGen indexGen = indexDistro.next(rs);
+ Gen<Integer> indexGen = indexDistro.next(rs);
switch (rs.nextInt(0, 2))
{
case 0: // uniform
@@ -530,30 +531,9 @@
};
}
- public static <T> Gen<Gen.IntGen> mixedDistribution(int[] list)
+ public static Gen<Gen<Integer>> mixedDistribution(Integer[] list)
{
- return rs -> {
- switch (rs.nextInt(0, 4))
- {
- case 0: // uniform
- return r -> list[rs.nextInt(0, list.length)];
- case 1: // median biased
- int median = rs.nextInt(0, list.length);
- return r -> list[r.nextBiasedInt(0, median, list.length)];
- case 2: // zipf
- int[] array = list;
- if (rs.nextBoolean())
- {
- array = Arrays.copyOf(array, array.length);
- reverse(array);
- }
- return pickZipf(array);
- case 3: // random weight
- return randomWeights(list).next(rs);
- default:
- throw new AssertionError();
- }
- };
+ return mixedDistribution(Arrays.asList(list));
}
/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a pattern of generic and primitive versions, this helps avoid extra allocations. Given that this change would force boxing I would prefer not to unify.
@@ -555,7 +555,7 @@ default void checkPostconditions(State state, Result expected, | |||
default void process(State state, SystemUnderTest sut) throws Throwable | |||
{ | |||
checkPostconditions(state, apply(state), | |||
sut, run(sut)); | |||
sut, run(sut)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Might as well one-liner this one if we're going to reformat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a personal style choice. state, apply(state)
is everything going on for the state
, and sut, run(sut)
is the same for sut
... i felt this was easier to read as 2 lines rather than 1.
If you do not agree, i can normalize by making one line; just giving my argument
weights[i] = sum += random.nextFloat() * bias[i]; | ||
sum += random.nextFloat() * bias[weights.length]; | ||
for (int i = 0 ; i < weights.length ; ++i) | ||
weights[i] /= sum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Also, this method seems to normalize the random weights, but that's not reflected in the name.
…p-15-accord to trunk
edc2c78
to
188d42a
Compare
No description provided.