Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtest: add option to slice tests #14092

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pks-t
Copy link
Contributor

@pks-t pks-t commented Jan 8, 2025

Executing tests can take a very long time. As an example, the Git test suite on Windows takes around 4 hours to execute. The Git project has been working around the issue by splitting up CI jobs into multiple slices: one job creates the build artifacts, and then we spawn N test jobs with those artifacts, where each test job executes 1/Nth of the tests.

This can be scripted rather easily by using meson test --list, selecting every Nth line, but there may be other projects that have a similar need. Wire up a new option "--slice i/n" to meson test that does implements this logic.

Note: I haven't yet added any tests and didn't update documentation yet because I first want to double check whether this is something that you think makes sense. If so I'll iterate on the MR.

@dcbaker
Copy link
Member

dcbaker commented Jan 8, 2025

The implementation looks fine, I don't have any objections to the functionality (slicing or sharding is pretty common in testing scenarios).

This does need some documentation updates, and a release snippet in docs/markdown/snippets

@@ -1976,6 +2000,9 @@ def get_tests(self, errorfile: T.Optional[T.IO] = None) -> T.List[TestSerialisat
tests = [t for t in self.tests if self.test_suitable(t)]
if self.options.args:
tests = list(self.tests_from_args(tests))
if self.options.slice:
our_slice, nslices = self.options.slice
tests = tests[our_slice::nslices]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require slices in the format 1/10 through 10/10 to select each slice (nslices >= subslice > 0). Indexing into that naively will skip the first test and slice 10 will repeat all tests from slice 0 other than the first test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing into that naively will skip the first test and slice 10 will repeat all tests from slice 0 other than the first test.

Indeed, I also found that a couple minutes ago while writing tests for this change. Fixed now.

@pks-t
Copy link
Contributor Author

pks-t commented Jan 8, 2025

I've backfilled docs and tests now.

@pks-t
Copy link
Contributor Author

pks-t commented Jan 8, 2025

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

Comment on lines 5084 to 5092
for arg, expectation in {
'1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'1/2': [1, 3, 5, 7, 9],
'2/2': [2, 4, 6, 8, 10],
'1/10': [1],
'2/20': [2],
'10/20': [10],
'11/20': [ ],
}.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistently indented (the later block indents more, to allow distinguishing between the dictionary values and the indented for: block contents).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm... confused? You changed the other instance instead of this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt more in line with the rest of the file, which also tends to indent by 4 spaces. Also happy to switch to 8 spaces though in case that's the preferred coding style.

Copy link
Member

@eli-schwartz eli-schwartz Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "programs: favor version numbers with dots", you did the same thing, I assumed deliberately. 8-space indents specifically when defining a dictionary inline inside of a "for" loop, 4-space indents everywhere else. I was surprised to see you not do the same here. ;)

The coding style uses the basic python style of 4-space indents by default. The issue is when you do this:

def my_important_func(arg1: str, arg2: str, arg3: str, arg4: str, arg5: str, arg6: str, arg7: str, arg8: str, arg9: str, arg10: str, arg11: str, arg12: str, arg13: str, arg14: str, arg15: str):
    for x, y in {
        'key1': 'val1',
        'key2': 'val2',
    }.items():
        func(x, y)

Indents are of course expected and mandatory for declaring block nesting, given function def -> func-body and for -> loop-body. Indents are not required for foo(<args>), but you can: the current one is 193 columns, which overflows the 80 columns people often like to stick with, and at least one autoformatter converts this to an unreadable mess:

def my_important_func(
    arg1: str,
    arg2: str,
    arg3: str,
    arg4: str,
    arg5: str,
    arg6: str,
    arg7: str,
    arg8: str,
    arg9: str,
    arg10: str,
    arg11: str,
    arg12: str,
    arg13: str,
    arg14: str,
    arg15: str,
):
    for x, y in {
        "key1": "val1",
        "key2": "val2",
    }.items():
        func(x, y)

Note how it's difficult to visually distinguish between the function parameters and the function body, since they share an indent level. The correct way to do this is of course:

def my_important_func(arg1: str, arg2: str, arg3: str, arg4: str, arg5: str,
                      arg6: str, arg7: str, arg8: str, arg9: str, arg10: str,
                      arg11: str, arg12: str, arg13: str, arg14: str, arg15: str):
    for x, y in some_inline_dict_probably:
        func(x, y)

Or in some cases,

def my_important_func(long_func_argument: T.Dict[str, T.List[str]],
                      arg2: T.Optional[MyComplexAnnotation] = None,
                      arg3: str = '',
                      ) -> ReturnedAnnotation:
    for x, y in some_inline_dict_probably:
        func(x, y)

The same concern manifests with inline dictionaries. Indenting them 4-spaces means they have the same look and feel as the body of the for loop, but we want it to be visually distinct. One way of doing that is to have an 8-space indent instead, and walk back to a 4-space indent in the actual body:

for arg, expectation in {
        '1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        '1/2': [1, 3, 5, 7, 9],
}.items():
    func(arg, expectation)

Another approach is:

for arg, expectation in {'1/1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                         '1/2': [1, 3, 5, 7, 9],
                         '2/2': [2, 4, 6, 8, 10],
                         '1/10': [1],
                         '2/20': [2],
                         '10/20': [10],
                         '11/20': [ ],
                         }.items():
    func(arg, expectation)

I have a mild preference for this latter style (though if dictionary elements are long I can understand why people would want to avoid it) but can also tolerate the former. What I really do not like at all is having the dictionary keys line up with the block contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, thanks for the explanation! I've adopted the latter style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I am more than happy to talk about the pros and cons that go into evaluating this! :) We are a bit quirky after all, compared to the average python project. (In particular, the fact that we don't just use an autoformatter and accept whatever we are told to accept.)

@pks-t pks-t force-pushed the pks-test-slices branch 2 times, most recently from 4d79335 to 6250d1b Compare January 9, 2025 05:53
@eli-schwartz eli-schwartz dismissed their stale review January 9, 2025 06:21

review has been addressed

@eli-schwartz
Copy link
Member

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

I think there are two choices about what to do here:

  • raise an error
  • cap it at the number of tests, and slice one test per distributed test job

But I can't make up my mind which one is preferable.

@eli-schwartz
Copy link
Member

I think that what ends up happening is we produce len(tests) slices, each with one test, plus NUM_SLICES - len(tests) empty slices, so we will end up running all tests across the the first set of jobs (one test per job!), but eventually some jobs will run nothing at all. That's effectively what "max capping" would actually mean.

Do we want this? Do we want to force people to get the number of slices correct? I think the practical distinction would mean to e.g. not spin up as many CI runners which you don't actually need.

@pks-t
Copy link
Contributor Author

pks-t commented Jan 9, 2025

One question is what to do when we have more slices than tests. Should we raise an error? Right now we don't run any tests in that case.

I think there are two choices about what to do here:

* raise an error

* cap it at the number of tests, and slice one test per distributed test job

But I can't make up my mind which one is preferable.

The former one is a bit awkward because we cannot do it when parsing arguments -- we ain't got the number of tests available yet. But the latter one has its problems, as well, as the question is what happens in the case where you've got 10 tests, but ask for 11/20. Even if we cap 20 to 10 we'd now select test 11 of that range, which is nonsensical. It's also unclear what that one should be corrected to.

So I think I lean into the direction of raising an error. It's also easier to relax in the future if we ever think an alternate behaviour is more sensible.

Do we want this? Do we want to force people to get the number of slices correct? I think the practical distinction would mean to e.g. not spin up as many CI runners which you don't actually need.

Yeah, indeed, and I would call that a win. I'd rather want to give people a hint that their assumptions are wrong than to silently do nothing.

@eli-schwartz
Copy link
Member

eli-schwartz commented Jan 9, 2025

The former one is a bit awkward because we cannot do it when parsing arguments -- we ain't got the number of tests available yet.

Not all validation needs to be done inside argument parsing. :D This sort of mirrors the handling we do in interpreter.py -- where indeed it's preferred if the logic fits, to do argument validation inside the function entry decorator (@typed_kwargs using a list of KwargInfo validators) which has no access to state. But there are plenty of cases where the need is a lot more complex and we push it off until the function body.

Yeah, indeed, and I would call that a win. I'd rather want to give people a hint that their assumptions are wrong than to silently do nothing.

Fair enough, let's do that.

Executing tests can take a very long time. As an example, the Git test
suite on Windows takes around 4 hours to execute. The Git project has
been working around the issue by splitting up CI jobs into multiple
slices: one job creates the build artifacts, and then we spawn N test
jobs with those artifacts, where each test job executes 1/Nth of the
tests.

This can be scripted rather easily by using `meson test --list`,
selecting every Nth line, but there may be other projects that have a
similar need. Wire up a new option "--slice i/n" to `meson test` that
does implements this logic.

Signed-off-by: Patrick Steinhardt <[email protected]>
@pks-t
Copy link
Contributor Author

pks-t commented Jan 9, 2025

Okay, added in that validation now. Let me know what you think!

pks-t added a commit to pks-t/git that referenced this pull request Jan 9, 2025
Add a new job that builds and tests Meson-based builds with Visual
Studio. While the build job is mandatory, the test job is marked as
"manual" so that it doesn't run by default. We already have a bunch of
Windows-based jobs, and the computational overhead that these cause is
simply out of proportion to run the test suite twice.

A couple notes:

  - We disable Perl. This is because we pick up Perl from Git for
    Windows, which outputs different paths ("/c/" instead of "C:\") than
    what we expect in our tests.

  - We don't use the Git for Windows SDK. Instead, the build only
    depends on Visual Studio, Meson and Git for Windows. All the other
    dependencies like curl, pcre2 and zlib get pulled in and compiled
    automatically by Meson and thus do not have to be provided by the
    system.

  - We open-code "ci/run-test-slice.sh". This is because we only have
    direct access to PowerShell, so we manually implement the logic.
    There is an upstream pull request for the Meson build system [1] to
    implement test slicing in Meson directly.

All tests are passing.

[1]: mesonbuild/meson#14092

Signed-off-by: Patrick Steinhardt <[email protected]>
pks-t added a commit to pks-t/git that referenced this pull request Jan 9, 2025
Add a new job to GitHub Actions and GitLab CI that builds and tests
Meson-based builds with Visual Studio. While the build job is mandatory,
the test job is marked as "manual" so that it doesn't run by default. We
already have a bunch of Windows-based jobs, and the computational
overhead that these cause is simply out of proportion to run the test
suite twice.

A couple notes:

  - We disable Perl. This is because we pick up Perl from Git for
    Windows, which outputs different paths ("/c/" instead of "C:\") than
    what we expect in our tests.

  - We don't use the Git for Windows SDK. Instead, the build only
    depends on Visual Studio, Meson and Git for Windows. All the other
    dependencies like curl, pcre2 and zlib get pulled in and compiled
    automatically by Meson and thus do not have to be provided by the
    system.

  - We open-code "ci/run-test-slice.sh". This is because we only have
    direct access to PowerShell, so we manually implement the logic.
    There is an upstream pull request for the Meson build system [1] to
    implement test slicing in Meson directly.

All tests are passing.

[1]: mesonbuild/meson#14092

Signed-off-by: Patrick Steinhardt <[email protected]>
pks-t added a commit to pks-t/git that referenced this pull request Jan 9, 2025
Add a new job to GitHub Actions and GitLab CI that builds and tests
Meson-based builds with Visual Studio. While the build job is mandatory,
the test job is marked as "manual" so that it doesn't run by default. We
already have a bunch of Windows-based jobs, and the computational
overhead that these cause is simply out of proportion to run the test
suite twice.

A couple notes:

  - We disable Perl. This is because we pick up Perl from Git for
    Windows, which outputs different paths ("/c/" instead of "C:\") than
    what we expect in our tests.

  - We don't use the Git for Windows SDK. Instead, the build only
    depends on Visual Studio, Meson and Git for Windows. All the other
    dependencies like curl, pcre2 and zlib get pulled in and compiled
    automatically by Meson and thus do not have to be provided by the
    system.

  - We open-code "ci/run-test-slice.sh". This is because we only have
    direct access to PowerShell, so we manually implement the logic.
    There is an upstream pull request for the Meson build system [1] to
    implement test slicing in Meson directly.

All tests are passing.

[1]: mesonbuild/meson#14092

Signed-off-by: Patrick Steinhardt <[email protected]>
pks-t added a commit to pks-t/git that referenced this pull request Jan 13, 2025
Add a new job to GitHub Actions and GitLab CI that builds and tests
Meson-based builds with Visual Studio.

A couple notes:

  - While the build job is mandatory, the test job is marked as "manual"
    on GitLab so that it doesn't run by default. We already have a bunch
    of Windows-based jobs, and the computational overhead that these
    cause is simply out of proportion to run the test suite twice.

    The same isn't true for GitHub as I could not find a way to make a
    subset of jobs manually triggered.

  - We disable Perl. This is because we pick up Perl from Git for
    Windows, which outputs different paths ("/c/" instead of "C:\") than
    what we expect in our tests.

  - We don't use the Git for Windows SDK. Instead, the build only
    depends on Visual Studio, Meson and Git for Windows. All the other
    dependencies like curl, pcre2 and zlib get pulled in and compiled
    automatically by Meson and thus do not have to be provided by the
    system.

  - We open-code "ci/run-test-slice.sh". This is because we only have
    direct access to PowerShell, so we manually implement the logic.
    There is an upstream pull request for the Meson build system [1] to
    implement test slicing in Meson directly.

  - We don't process test artifacts for failed CI jobs. This is done to
    keep down prerequisites to a minimum.

All tests are passing.

[1]: mesonbuild/meson#14092

Signed-off-by: Patrick Steinhardt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants