Get rid of test front-loading in the duration_based_chunks algo #27

Closed
wants to merge 2 commits

Conversation

sondrelg
Contributor

Should hopefully resolve #26

@alexforencich

Nope, still fails. And if I change the test durations in that test case to all 1 except the last one which gets 100, all of the tests end up in the first group and I get 6 empty groups.

@sondrelg
Contributor Author

How would you suggest changing the implementation @alexforencich? I'm not seeing the issue 🙂

@alexforencich

So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:

    durations = {}
    durations["a"] = 1
    durations["b"] = 1
    durations["c"] = 1
    durations["d"] = 1
    durations["e"] = 1
    durations["f"] = 1
    durations["g"] = 1
    durations["h"] = 100

In this case, only the first two groups are utilized and the rest are empty.

@alexforencich

It seems to me like the goal of this algorithm is to preserve the test ordering. So perhaps a better method would be to compute the cumulative sum into a new array, and then partition that on total_duration*index/slices. Then adjust the indices so they each correspond to at least one test.
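
For illustration, a rough sketch of that cumulative-sum idea might look like this (a hypothetical helper, not the plugin's code; the names test_ids, durations, and splits are assumptions):

    from itertools import accumulate

    def chunk_by_cumulative_sum(test_ids, durations, splits):
        """Split tests into `splits` ordered groups using cumulative durations."""
        cumsum = list(accumulate(durations[t] for t in test_ids))
        total = cumsum[-1]
        # Find the first index whose cumulative duration reaches each
        # boundary total * i / splits.
        boundaries = []
        idx = 0
        for i in range(1, splits):
            target = total * i / splits
            while idx < len(cumsum) and cumsum[idx] < target:
                idx += 1
            boundaries.append(idx)
        # Nudge the indices so every group gets at least one test
        # (assumes there are at least as many tests as groups).
        for j in range(len(boundaries)):
            prev = boundaries[j - 1] if j else 0
            boundaries[j] = max(boundaries[j], prev + 1)
        for j in reversed(range(len(boundaries))):
            nxt = boundaries[j + 1] if j + 1 < len(boundaries) else len(test_ids)
            boundaries[j] = min(boundaries[j], nxt - 1)
        edges = [0, *boundaries, len(test_ids)]
        return [test_ids[edges[k]:edges[k + 1]] for k in range(splits)]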

Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.

However, the best option might be to do something like this: assign each test an ID number based on the original order, then sort based on duration, use the algorithm in least_duration to group the tests, then sort each group based on the ID. This will preserve the test order, but produce more optimal grouping, and there will be no issue with empty groups unless there are more groups than tests (and in that case, what you can do is enable some random test in all of the otherwise-empty groups).
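
A minimal sketch of that ID-based variant (again hypothetical; it borrows the "put the next-longest test into the currently-lightest group" idea rather than the plugin's actual least_duration code):

    import heapq

    def balanced_chunks_preserving_order(test_ids, durations, splits):
        """Balance groups by duration, then restore the original test order."""
        # Tag each test with its original position, then process longest first.
        indexed = sorted(enumerate(test_ids),
                         key=lambda pair: durations[pair[1]], reverse=True)
        # Min-heap of (accumulated duration, group index): always fill the
        # currently-lightest group.
        heap = [(0.0, g) for g in range(splits)]
        heapq.heapify(heap)
        groups = [[] for _ in range(splits)]
        for original_index, test in indexed:
            group_time, g = heapq.heappop(heap)
            groups[g].append((original_index, test))
            heapq.heappush(heap, (group_time + durations[test], g))
        # Sort each group by the original ID so relative order is preserved.
        return [[test for _, test in sorted(group)] for group in groups]

As noted above, a group can only end up empty here if there are more groups than tests.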

@sondrelg
Contributor Author

So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:

    durations = {}
    durations["a"] = 1
    durations["b"] = 1
    durations["c"] = 1
    durations["d"] = 1
    durations["e"] = 1
    durations["f"] = 1
    durations["g"] = 1
    durations["h"] = 100

In this case, only the first two groups are utilized and the rest are empty.

That almost seems like a feature, not a bug; but more importantly, is that something that was changed in #14? We discussed fixing the change in behavior and the possibility of guaranteeing that every group has a test in #26, but to be clear - this PR just tries to fix the first point.

Would the test above behave differently in previous versions?

@alexforencich

Well, if it wasn't causing GitHub Actions to crash, I wouldn't care as much. But it is, so it's a bug. Looks like yes, the splitting algorithm was rewritten in #14. Looks like the previous version was maybe duplicating tests in this case (not explicitly, but due to a side effect of how it was written), so there weren't any empty groups, but this wasn't noticed since it didn't cause pytest to exit prematurely.

@mbkroese

Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.

@alexforencich you're right, that would be better. I just stuck with the current solution because that preserved relative order, which I thought was important for some. I'm happy to add this at a later stage.

@sondrelg
Contributor Author

Out of curiosity, why do people care about ordering? I really only want my test groups to be as even as possible, because the longest group sets the total time of my pipeline.

@mbkroese

Just imagine you work for a company that has a large suite of tests and the corresponding technical debt. The tests already have order dependency (by accident) but it's too big a problem to fix all at once.

@sondrelg
Contributor Author

Yep lack of test isolation is the only thing I can think of 🙂

@alexforencich

Well, you still might have ordering dependency issues as you're not going to run all of the tests.

As for relative order, you can restore it by assigning monotonically increasing IDs before doing the partitioning, then sorting each group by this ID before running the tests.

@sondrelg
Contributor Author

Updated the PR now @alexforencich @jerry-git

The duration_based_chunks algo should now have the following properties:

  • Guarantees that every group is assigned a test (as long as test count > splits)
  • Maintains order
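
For illustration, an order-preserving greedy chunker with those two properties could look roughly like this (a sketch only, not necessarily the code in this PR; names are assumptions):

    def duration_based_chunks_sketch(test_ids, durations, splits):
        """Order-preserving greedy chunking; no group is left empty as long
        as there are at least `splits` tests."""
        target = sum(durations[t] for t in test_ids) / splits
        groups = [[] for _ in range(splits)]
        group_times = [0.0] * splits
        g = 0
        for i, test in enumerate(test_ids):
            remaining_tests = len(test_ids) - i
            remaining_groups = splits - g - 1
            # Leave room: if only as many tests remain as there are later
            # groups, each remaining test must start a new group.
            must_advance = bool(groups[g]) and remaining_tests <= remaining_groups
            would_overflow = bool(groups[g]) and group_times[g] + durations[test] > target
            if (must_advance or would_overflow) and g < splits - 1:
                g += 1
            groups[g].append(test)
            group_times[g] += durations[test]
        return groups

For example, with the seven 1-second tests plus one 100-second test from earlier in the thread and 8 splits, every group ends up with exactly one test.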

@alexforencich

Definitely looks a lot better. Still greedy I think, but not sure if there is a nice way of improving that without making it a lot more complicated. At any rate, if you want the most even split, least_duration is probably a better choice.

Next step is to add some "safety" code to ensure that there is at least one test enabled in each group, even if there are fewer tests than groups, by enabling a test in empty groups. But that's not for this PR.

@sondrelg
Contributor Author

sondrelg commented Jul 9, 2021

@jerry-git this is ready to review if you get a chance 🙂

@jerry-git
Owner

Ahoy! Sorry this took a while, I was on vacation mode 🍹 I'll have a look.

tests/test_algorithms.py (outdated)
durations["g"] = 1
durations["h"] = 1

items = [item(x) for x in ["a", "b", "c", "d", "e", "f", "g", "h"]]
Owner

Would it be the same if the loop were over sorted(durations.keys())?

Contributor Author

What do you mean, sorry?

I think he's saying you can do items = sorted(durations.keys()) instead of duplicating all of the items.

Owner

yeah I'm basically asking if this would be the same:

items = [item(x) for x in sorted(durations.keys())]

IMO it would be better for readability, as then it's explicit that there's one item for each entry in durations

Contributor Author

I just copied in the example test from here and made sure we had one example that was extremely front-loaded and one that was extremely back-loaded. I can make it less explicit if you prefer.

tests/test_algorithms.py (outdated)
Comment on lines +74 to +77
(4, 1, "duration_based_chunks", ["test_1", "test_2", "test_3"]),
(4, 2, "duration_based_chunks", ["test_4", "test_5"]),
(4, 3, "duration_based_chunks", ["test_6"]),
(4, 4, "duration_based_chunks", ["test_7", "test_8", "test_9", "test_10"]),
Owner

Doesn't this mean that the expected durations for each group will be:

  • 1: 3s (1+1+1)
  • 2: 2s (1+1)
  • 3: 2s (2)
  • 4: 8s (2+2+2+2)

If so, it doesn't feel right 🤔

Contributor Author

Tests 1-5 are 1 second tests, while 6-10 are 2 second tests, so I believe the timings are

  • 3 seconds
  • 3 seconds
  • 2 seconds
  • 8 seconds

And yes this is weird, but it's also what you should expect if you want to stop adding tests before going over the threshold of 15/4 = 3.75 seconds while keeping absolute order 🤷

It's a very simple greedy algorithm; it's pretty much unavoidable that you're going to get strange far-from-optimal results like this. The least_durations algorithm is much better and IMO should be the default and this algorithm should simply be removed. But this algorithm does have the benefit of maintaining the sequence of tests, which I suppose could potentially be beneficial in some cases if there are ordering dependencies between tests.

alexforencich commented Jul 21, 2021

Actually, here's something to consider: exceed the threshold, but keep track of a running total overage and use that when determining whether you're over the threshold. (If you never exceed the threshold, then you're basically guaranteed to end up with the last group being larger than the rest.) There are probably a few other heuristics you could apply here to try to improve things a little bit. For instance, something like this: ignore the overage if the slot is empty and unconditionally assign; if it isn't empty, check whether the resulting overage is less than half of the time required for that test; if so, pack it, otherwise bump it to the next group.
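
A rough sketch of that overage idea (illustrative only, not code from this PR; it only covers the packing heuristic and would still need the "leave room for later groups" guard to rule out empty groups):

    def chunk_with_carried_overage(test_ids, durations, splits):
        """Order-preserving chunking that carries each group's overshoot
        (or surplus) forward into the next group's budget."""
        target = sum(durations[t] for t in test_ids) / splits
        groups = [[] for _ in range(splits)]
        group_time = 0.0
        overage = 0.0  # cumulative deviation of earlier groups from the ideal schedule
        g = 0
        for test in test_ids:
            d = durations[test]
            budget = target - overage
            overshoot = group_time + d - budget
            # Empty groups accept unconditionally; otherwise pack the test only
            # if the overshoot stays below half of the test's own duration.
            if groups[g] and overshoot > d / 2 and g < splits - 1:
                overage = group_time - budget  # carry the deviation forward
                g += 1
                group_time = 0.0
            groups[g].append(test)
            group_time += d
        return groups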

Owner

The least_durations algorithm is much better and IMO should be the default and this algorithm should simply be removed

I have a use case which requires this plugin to support maintaining the order of the tests, which is why we need to keep duration_based_chunks or something similar. However, I can consider changing the default but that'd probably bite some other users of this plugin as well so need to consider that carefully 🙂

Contributor Author

I'll close this for now then 👍

Well, without these changes, is duration_based_chunks still producing empty groups? Or did that get fixed in a separate PR?

Contributor Author

Yes, probably. I noticed there are a few conflicts, so perhaps there's been another fix merged, but that's what this PR was meant to fix.

OK, well my point was that even if the splitting isn't as perfect as it could be, at least this fix doesn't result in empty groups, so it should be merged and then possibly improved, instead of being closed.

Contributor Author

Sure thing. I've reopened it - feel free to make any additional changes you wish (cc @jerry-git) and merge. I'm on holiday now myself so likely won't be able to apply any code changes for a while.

Comment on lines +94 to +96
    groups = duration_based_chunks(8, items, durations)

    for i in range(7):
Owner

Why 8 above and range(7) here in the loop below? Could we somehow get rid of the magic numbers and use e.g. variables whose names would help the reader understand what's going on?

Maybe use for g in groups: instead?
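
Something like this, combining both suggestions (a sketch; the exact names are only illustrative):

    splits = 8
    groups = duration_based_chunks(splits, items, durations)

    assert len(groups) == splits
    for group in groups:
        assert group.selected != []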

Comment on lines +108 to +109
    for i in range(7):
        assert groups[i].selected != []
Owner

Is this the same as assert all(group.selected for group in groups)?

That looks like it would probably work; maybe just add a comment to indicate that we're making sure there aren't any empty groups. Or maybe explicitly check that the size of each group is nonzero.

Owner

we're making sure there aren't any empty groups ... check that the size of each group is nonzero

IMO all(group.selected for group in groups) does both and is explicit enough so no comments or further explanations would be needed. Just my opinion ofc 🙂

@jerry-git
Owner

Closing as stale, feel free to reopen 🙂

jerry-git closed this Apr 12, 2023
Successfully merging this pull request may close these issues:
duration_based_chunks algorithm produces empty groups, causing pytest to fail when collecting tests