mtest: add option to slice tests #14092

pks-t · 2025-01-08T10:45:49Z

Executing tests can take a very long time. As an example, the Git test suite on Windows takes around 4 hours to execute. The Git project has been working around the issue by splitting up CI jobs into multiple slices: one job creates the build artifacts, and then we spawn N test jobs with those artifacts, where each test job executes 1/Nth of the tests.

This can be scripted rather easily by using meson test --list, selecting every Nth line, but there may be other projects that have a similar need. Wire up a new option "--slice i/n" to meson test that does implements this logic.

Note: I haven't yet added any tests and didn't update documentation yet because I first want to double check whether this is something that you think makes sense. If so I'll iterate on the MR.

dcbaker · 2025-01-08T18:11:27Z

The implementation looks fine, I don't have any objections to the functionality (slicing or sharding is pretty common in testing scenarios).

This does need some documentation updates, and a release snippet in docs/markdown/snippets

eli-schwartz · 2025-01-08T19:05:52Z

mesonbuild/mtest.py

@@ -1976,6 +2000,9 @@ def get_tests(self, errorfile: T.Optional[T.IO] = None) -> T.List[TestSerialisat
        tests = [t for t in self.tests if self.test_suitable(t)]
        if self.options.args:
            tests = list(self.tests_from_args(tests))
+        if self.options.slice:
+            our_slice, nslices = self.options.slice
+            tests = tests[our_slice::nslices]


We require slices in the format 1/10 through 10/10 to select each slice (nslices >= subslice > 0). Indexing into that naively will skip the first test and slice 10 will repeat all tests from slice 0 other than the first test.

Indexing into that naively will skip the first test and slice 10 will repeat all tests from slice 0 other than the first test.

Indeed, I also found that a couple minutes ago while writing tests for this change. Fixed now.

pks-t · 2025-01-08T19:10:01Z

I've backfilled docs and tests now.

pks-t · 2025-01-08T19:10:28Z

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

eli-schwartz · 2025-01-08T19:14:20Z

unittests/allplatformstests.py

+        for arg, expectation in {
+            '1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+            '1/2': [1, 3, 5, 7, 9],
+            '2/2': [2, 4, 6, 8, 10],
+            '1/10': [1],
+            '2/20': [2],
+            '10/20': [10],
+            '11/20': [ ],
+        }.items():


Inconsistently indented (the later block indents more, to allow distinguishing between the dictionary values and the indented for: block contents).

Thanks, fixed.

I'm... confused? You changed the other instance instead of this one?

It felt more in line with the rest of the file, which also tends to indent by 4 spaces. Also happy to switch to 8 spaces though in case that's the preferred coding style.

In "programs: favor version numbers with dots", you did the same thing, I assumed deliberately. 8-space indents specifically when defining a dictionary inline inside of a "for" loop, 4-space indents everywhere else. I was surprised to see you not do the same here. ;)

The coding style uses the basic python style of 4-space indents by default. The issue is when you do this:

def my_important_func(arg1: str, arg2: str, arg3: str, arg4: str, arg5: str, arg6: str, arg7: str, arg8: str, arg9: str, arg10: str, arg11: str, arg12: str, arg13: str, arg14: str, arg15: str): for x, y in { 'key1': 'val1', 'key2': 'val2', }.items(): func(x, y)

Indents are of course expected and mandatory for declaring block nesting, given function def -> func-body and for -> loop-body. Indents are not required for foo(<args>), but you can: the current one is 193 columns, which overflows the 80 columns people often like to stick with, and at least one autoformatter converts this to an unreadable mess:

def my_important_func( arg1: str, arg2: str, arg3: str, arg4: str, arg5: str, arg6: str, arg7: str, arg8: str, arg9: str, arg10: str, arg11: str, arg12: str, arg13: str, arg14: str, arg15: str, ): for x, y in { "key1": "val1", "key2": "val2", }.items(): func(x, y)

Note how it's difficult to visually distinguish between the function parameters and the function body, since they share an indent level. The correct way to do this is of course:

def my_important_func(arg1: str, arg2: str, arg3: str, arg4: str, arg5: str, arg6: str, arg7: str, arg8: str, arg9: str, arg10: str, arg11: str, arg12: str, arg13: str, arg14: str, arg15: str): for x, y in some_inline_dict_probably: func(x, y)

Or in some cases,

def my_important_func(long_func_argument: T.Dict[str, T.List[str]], arg2: T.Optional[MyComplexAnnotation] = None, arg3: str = '', ) -> ReturnedAnnotation: for x, y in some_inline_dict_probably: func(x, y)

The same concern manifests with inline dictionaries. Indenting them 4-spaces means they have the same look and feel as the body of the for loop, but we want it to be visually distinct. One way of doing that is to have an 8-space indent instead, and walk back to a 4-space indent in the actual body:

for arg, expectation in { '1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], '1/2': [1, 3, 5, 7, 9], }.items(): func(arg, expectation)

Another approach is:

for arg, expectation in {'1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], '1/2': [1, 3, 5, 7, 9], '2/2': [2, 4, 6, 8, 10], '1/10': [1], '2/20': [2], '10/20': [10], '11/20': [ ], }.items(): func(arg, expectation)

I have a mild preference for this latter style (though if dictionary elements are long I can understand why people would want to avoid it) but can also tolerate the former. What I really do not like at all is having the dictionary keys line up with the block contents.

Fair, thanks for the explanation! I've adopted the latter style.

And I am more than happy to talk about the pros and cons that go into evaluating this! :) We are a bit quirky after all, compared to the average python project. (In particular, the fact that we don't just use an autoformatter and accept whatever we are told to accept.)

review has been addressed

eli-schwartz · 2025-01-09T06:24:49Z

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

I think there are two choices about what to do here:

raise an error
cap it at the number of tests, and slice one test per distributed test job

But I can't make up my mind which one is preferable.

eli-schwartz · 2025-01-09T06:33:58Z

I think that what ends up happening is we produce len(tests) slices, each with one test, plus NUM_SLICES - len(tests) empty slices, so we will end up running all tests across the the first set of jobs (one test per job!), but eventually some jobs will run nothing at all. That's effectively what "max capping" would actually mean.

Do we want this? Do we want to force people to get the number of slices correct? I think the practical distinction would mean to e.g. not spin up as many CI runners which you don't actually need.

pks-t · 2025-01-09T06:35:48Z

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

I think there are two choices about what to do here:
* raise an error

* cap it at the number of tests, and slice one test per distributed test job
But I can't make up my mind which one is preferable.

The former one is a bit awkward because we cannot do it when parsing arguments -- we ain't got the number of tests available yet. But the latter one has its problems, as well, as the question is what happens in the case where you've got 10 tests, but ask for 11/20. Even if we cap 20 to 10 we'd now select test 11 of that range, which is nonsensical. It's also unclear what that one should be corrected to.

So I think I lean into the direction of raising an error. It's also easier to relax in the future if we ever think an alternate behaviour is more sensible.

Do we want this? Do we want to force people to get the number of slices correct? I think the practical distinction would mean to e.g. not spin up as many CI runners which you don't actually need.

Yeah, indeed, and I would call that a win. I'd rather want to give people a hint that their assumptions are wrong than to silently do nothing.

eli-schwartz · 2025-01-09T06:43:13Z

The former one is a bit awkward because we cannot do it when parsing arguments -- we ain't got the number of tests available yet.

Not all validation needs to be done inside argument parsing. :D This sort of mirrors the handling we do in interpreter.py -- where indeed it's preferred if the logic fits, to do argument validation inside the function entry decorator (@typed_kwargs using a list of KwargInfo validators) which has no access to state. But there are plenty of cases where the need is a lot more complex and we push it off until the function body.

Yeah, indeed, and I would call that a win. I'd rather want to give people a hint that their assumptions are wrong than to silently do nothing.

Fair enough, let's do that.

Executing tests can take a very long time. As an example, the Git test suite on Windows takes around 4 hours to execute. The Git project has been working around the issue by splitting up CI jobs into multiple slices: one job creates the build artifacts, and then we spawn N test jobs with those artifacts, where each test job executes 1/Nth of the tests. This can be scripted rather easily by using `meson test --list`, selecting every Nth line, but there may be other projects that have a similar need. Wire up a new option "--slice i/n" to `meson test` that does implements this logic. Signed-off-by: Patrick Steinhardt <[email protected]>

pks-t · 2025-01-09T06:45:59Z

Okay, added in that validation now. Let me know what you think!

Add a new job that builds and tests Meson-based builds with Visual Studio. While the build job is mandatory, the test job is marked as "manual" so that it doesn't run by default. We already have a bunch of Windows-based jobs, and the computational overhead that these cause is simply out of proportion to run the test suite twice. A couple notes: - We disable Perl. This is because we pick up Perl from Git for Windows, which outputs different paths ("/c/" instead of "C:\") than what we expect in our tests. - We don't use the Git for Windows SDK. Instead, the build only depends on Visual Studio, Meson and Git for Windows. All the other dependencies like curl, pcre2 and zlib get pulled in and compiled automatically by Meson and thus do not have to be provided by the system. - We open-code "ci/run-test-slice.sh". This is because we only have direct access to PowerShell, so we manually implement the logic. There is an upstream pull request for the Meson build system [1] to implement test slicing in Meson directly. All tests are passing. [1]: mesonbuild/meson#14092 Signed-off-by: Patrick Steinhardt <[email protected]>

Add a new job to GitHub Actions and GitLab CI that builds and tests Meson-based builds with Visual Studio. While the build job is mandatory, the test job is marked as "manual" so that it doesn't run by default. We already have a bunch of Windows-based jobs, and the computational overhead that these cause is simply out of proportion to run the test suite twice. A couple notes: - We disable Perl. This is because we pick up Perl from Git for Windows, which outputs different paths ("/c/" instead of "C:\") than what we expect in our tests. - We don't use the Git for Windows SDK. Instead, the build only depends on Visual Studio, Meson and Git for Windows. All the other dependencies like curl, pcre2 and zlib get pulled in and compiled automatically by Meson and thus do not have to be provided by the system. - We open-code "ci/run-test-slice.sh". This is because we only have direct access to PowerShell, so we manually implement the logic. There is an upstream pull request for the Meson build system [1] to implement test slicing in Meson directly. All tests are passing. [1]: mesonbuild/meson#14092 Signed-off-by: Patrick Steinhardt <[email protected]>

Add a new job to GitHub Actions and GitLab CI that builds and tests Meson-based builds with Visual Studio. A couple notes: - While the build job is mandatory, the test job is marked as "manual" on GitLab so that it doesn't run by default. We already have a bunch of Windows-based jobs, and the computational overhead that these cause is simply out of proportion to run the test suite twice. The same isn't true for GitHub as I could not find a way to make a subset of jobs manually triggered. - We disable Perl. This is because we pick up Perl from Git for Windows, which outputs different paths ("/c/" instead of "C:\") than what we expect in our tests. - We don't use the Git for Windows SDK. Instead, the build only depends on Visual Studio, Meson and Git for Windows. All the other dependencies like curl, pcre2 and zlib get pulled in and compiled automatically by Meson and thus do not have to be provided by the system. - We open-code "ci/run-test-slice.sh". This is because we only have direct access to PowerShell, so we manually implement the logic. There is an upstream pull request for the Meson build system [1] to implement test slicing in Meson directly. - We don't process test artifacts for failed CI jobs. This is done to keep down prerequisites to a minimum. All tests are passing. [1]: mesonbuild/meson#14092 Signed-off-by: Patrick Steinhardt <[email protected]>

pks-t force-pushed the pks-test-slices branch from 57093eb to e2158ec Compare January 8, 2025 11:07

dcbaker added the meson tests label Jan 8, 2025

pks-t force-pushed the pks-test-slices branch from e2158ec to 3798b81 Compare January 8, 2025 18:29

pks-t requested a review from jpakkane as a code owner January 8, 2025 18:29

eli-schwartz requested changes Jan 8, 2025

View reviewed changes

pks-t force-pushed the pks-test-slices branch from 3798b81 to c69b686 Compare January 8, 2025 19:09

pks-t force-pushed the pks-test-slices branch from c69b686 to 4383258 Compare January 8, 2025 19:11

eli-schwartz previously requested changes Jan 8, 2025

View reviewed changes

pks-t force-pushed the pks-test-slices branch 2 times, most recently from 4d79335 to 6250d1b Compare January 9, 2025 05:53

pks-t force-pushed the pks-test-slices branch from 6250d1b to 623605c Compare January 9, 2025 06:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mtest: add option to slice tests #14092

mtest: add option to slice tests #14092

pks-t commented Jan 8, 2025

dcbaker commented Jan 8, 2025

eli-schwartz Jan 8, 2025

pks-t Jan 8, 2025

pks-t commented Jan 8, 2025

pks-t commented Jan 8, 2025

eli-schwartz Jan 8, 2025

pks-t Jan 8, 2025

eli-schwartz Jan 8, 2025

pks-t Jan 8, 2025

eli-schwartz Jan 8, 2025 •

edited

Loading

pks-t Jan 9, 2025

eli-schwartz Jan 9, 2025

eli-schwartz commented Jan 9, 2025

eli-schwartz commented Jan 9, 2025

pks-t commented Jan 9, 2025

eli-schwartz commented Jan 9, 2025 •

edited

Loading

pks-t commented Jan 9, 2025

mtest: add option to slice tests #14092

Are you sure you want to change the base?

mtest: add option to slice tests #14092

Conversation

pks-t commented Jan 8, 2025

dcbaker commented Jan 8, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pks-t commented Jan 8, 2025

pks-t commented Jan 8, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eli-schwartz Jan 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eli-schwartz commented Jan 9, 2025

eli-schwartz commented Jan 9, 2025

pks-t commented Jan 9, 2025

eli-schwartz commented Jan 9, 2025 • edited Loading

pks-t commented Jan 9, 2025

eli-schwartz Jan 8, 2025 •

edited

Loading

eli-schwartz commented Jan 9, 2025 •

edited

Loading