Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++/STL: recover from aborted tests #105

Merged
merged 3 commits into from
Mar 22, 2024
Merged

Conversation

generalmimon
Copy link
Member

@generalmimon generalmimon commented Mar 18, 2024

Currently, when a single C++ test aborts at runtime (typically due to a memory access violation), the entire test suite is terminated. See Class test_observer > test_aborted():

Called when a critical error is detected.

The critical errors are mainly the signals sent by the system and caught by the Boost.Test framework. Since the running binary may be in incoherent/instable state, the test execution is aborted and all remaining tests are discarded.

This PR allows the CI to recover from that by detecting the aborted test and running the test executable again excluding the previously aborted tests (using the --run_test=!<absolute_spec> option, see Test unit filtering > Relative specification).

@generalmimon generalmimon force-pushed the cpp-recover-from-aborted branch from 26ebddd to 805c811 Compare March 19, 2024 12:30
@generalmimon
Copy link
Member Author

generalmimon commented Mar 19, 2024

The recovery works as expected on target cpp_stl_11/gcc11-linux-x86_64 (./docker-ci -t cpp_stl -u _11 -i gcc11). However, it does not work on cpp_stl_11/clang3.4-linux-x86_64 (./docker-ci -t cpp_stl -u _11 -i clang3.4):

#### CppBuilder: running tests
#### CppBuilder: run attempt 1 (log: /tests/test_out/cpp_stl_11/results-1.xml)
#### CppBuilder: running command: ["/tests/compiled/cpp_stl_11/bin/ks_tests",
  "--log_format=XML", "--log_sink=/tests/test_out/cpp_stl_11/results-1.xml",
  "--log_level=all", "--report_level=detailed"],
  log: "/tests/test_out/cpp_stl_11/test_run-1.stdout"

Test suite "C++/STL test units for Kaitai Struct" aborted with:
  160 assertions out of 161 passed
  1 assertion out of 161 failed
  32 test cases out of 226 passed
  1 test case out of 226 failed
  193 test cases out of 226 skipped
  1 test case out of 226 aborted

  ...

  Test case "test_instance_std" passed with:
    1 assertion out of 1 passed

  Test case "test_debug_array_user" aborted with:
    1 assertion out of 1 failed

  Test case "test_imports_abs_abs" skipped due to test aborting
  ...
  Test case "test_instance_std_array" skipped due to test aborting
#### CppBuilder: process_status: #<Process::Status: pid 2095 exit 201>
#### CppBuilder: run attempt 1 aborted (1 tests aborted)
#### CppBuilder: run attempt 2 (log: /tests/test_out/cpp_stl_11/results-2.xml)
#### CppBuilder: running command: ["/tests/compiled/cpp_stl_11/bin/ks_tests",
  "--log_format=XML", "--log_sink=/tests/test_out/cpp_stl_11/results-2.xml",
  "--log_level=all", "--report_level=detailed", "--run_test=!test_debug_array_user"],
  log: "/tests/test_out/cpp_stl_11/test_run-2.stdout"
Test setup error: no test cases matching filter
#### CppBuilder: process_status: #<Process::Status: pid 2098 exit 200>

As you can see, as soon as the --run_test=!test_debug_array_user option is added, the Boost.Test executable fails with:

Test setup error: no test cases matching filter

It seems that this is because the {clang3.4,gcc4.8}-linux-x86_64 targets use Boost 1.54.0 (as I mention here), which is a version from 2013 and apparently doesn't support test exclusion yet (according to the docs for Boost Test 1.54.0, it supports the --run_test option, but only with positive specifications).

I see two ways to solve this problem:

  1. Don't use --run_test=!<test_name> to exclude tests, instead use another approach compatible with Boost 1.54.0.
  2. Install some newer version of Boost in clang3.4 and gcc4.8 that supports --run_test=!<test_name>.

I'd prefer option 2, but of course that will be a little more complicated than just apt-get install libboost-test-dev. We'll probably have to build Boost.Test from source (as we already do with CMake). A good tip from SO is to build only the libraries we need (in our case only the Boost.Test library, probably only for static linking) instead of building everything in the Boost suite, which might take a long time.

@generalmimon generalmimon marked this pull request as ready for review March 19, 2024 13:44
generalmimon added a commit to kaitai-io/kaitai_struct_docker_images that referenced this pull request Mar 22, 2024
See
kaitai-io/kaitai_struct_tests#105 (comment)
- Boost 1.54 is too old for our purposes. Fortunately, it's quite easy
to build Boost from source (according to the documentation at
https://www.boost.org/doc/libs/1_84_0/more/getting_started/unix-variants.html)
and it doesn't take long if we only need to build the Boost.Test
library. It's much faster than building CMake which is already being
done (on my machine, downloading and building Boost.Test takes 2
minutes, whereas building CMake takes 21 minutes).
The conditions for considering a test as aborted are purely empirical,
so it may turn out in the future that there can be an aborted test for
which some condition doesn't apply (at the moment I'm not aware of any
such case).

I just know that simply the presence of the `<Exception>` tag isn't
enough to reliably detect an aborted test, because some `<Exception>`s
are non-fatal (for example
https://github.com/kaitai-io/ci_artifacts/blob/64438fbd11da816bcba374c23cd0e31c01fe4abe/test_out/cpp_stl_11/results.xml
contains two `<TestCase>` tags with `<Exception>`, one `abort() has been
called` is non-fatal and the other one `memory access violation occurred
at address 0x00000010` is fatal).

This is why we also check the presence of the `Test is aborted` message
by `boost/test/impl/unit_test_log.ipp`, as it seems to always be
printed from this source file exactly in the case of an aborted test
(see https://www.boost.org/doc/libs/1_84_0/boost/test/impl/unit_test_log.ipp):

```cpp
void
unit_test_log_t::test_aborted()
{
    BOOST_TEST_LOG_ENTRY( log_messages ) << "Test is aborted";
}
```
Although this generates numbered `valgrind-*.xml` files, at the moment
only the last (successful, with no aborted tests) attempt ends up being
copied to `valgrind.xml` and used by `aggregate/convert_to_json`. This
can be improved later.
@generalmimon generalmimon force-pushed the cpp-recover-from-aborted branch from 805c811 to 25b2182 Compare March 22, 2024 13:16
@generalmimon generalmimon merged commit 3b650da into master Mar 22, 2024
@generalmimon generalmimon deleted the cpp-recover-from-aborted branch March 22, 2024 13:18
@@ -304,33 +305,58 @@ def file_to_test(filename)
end

def run_tests
xml_log = "#{@abs_cpp_test_out_dir}/results.xml"
attempt = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution adds yet another trying loop with its own attempts counter, etc.

As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?

On one hand:

  • this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned
  • this requires test parsing to determine offending test, but you did that already

On the other hand, I see that such a feedback mechanism from run_tests does not exist now and inventing it on whole partial_builder.rb might be trickier...

Copy link
Member Author

@generalmimon generalmimon Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution adds yet another trying loop with its own attempts counter, etc.

Yeah. I don't think you can eliminate it completely in principle, because you don't know in advance which tests will abort, and as soon one test aborts, no more tests are run (this is by design, Boost has no "continue running the suite despite an aborted test" option, that's not a thing). If you want them to run, you have no option but to run the test executable again without that offending test.

As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?

I briefly considered this approach as well, but don't see any benefits. In my view, it would only be more complicated and less flexible. As I explain above, you have to run the test suite N + 1 times in any case (where N is the number of aborted tests), only now with the unnecessary extra step of repeating the linking the object files into the ks_tests executable.

  • this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned

This is no longer a problem because since kaitai-io/kaitai_struct_docker_images@1ca846d the {clang3.4,gcc4.8}-linux-x86_64 targets build the latest Boost 1.84 from source (instead of using ancient the Boost 1.54), which works like a charm. None of Boost versions we use at the moment have a problem with --run_test=!<test_name> anymore (as evidenced by https://ci.kaitai.io/ - coincidentally, all cpp_stl_{98,11} targets currently have at least one aborted test NavParentRecursive, yet they all recovered successfully), so there's no need to avoid it.


An actual disadvantage of the approach you're suggesting is that with the "stable" test binary at the end, you won't be able to run the aborted tests via Valgrind after CppBuilder has finished. This means that the only information about an aborted will be the message from when the OS signal was caught by Boost, i.e.:

memory access violation at address: 0x00000008: no mapping at fault address

... which isn't really helpful to diagnose the error - there's no file name, no line number, no stack trace. Valgrind can give you all that, if you only let it run the aborted test (cpp_stl_11/gcc11-linux-x86_64 > test_out/cpp_stl_11/valgrind-1.xml:36-57):

<error>
  <unique>0x0</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x54235A</ip>
      <obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj>
      <fn>debug_array_user_t::cat_t::_read()</fn>
      <dir>/tests/compiled/cpp_stl_11</dir>
      <file>debug_array_user.cpp</file>
      <line>37</line>
    </frame>
    <frame>
      <ip>0x542205</ip>
      <obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj>
      <fn>debug_array_user_t::_read()</fn>
      <dir>/tests/compiled/cpp_stl_11</dir>
      <file>debug_array_user.cpp</file>
      <line>20</line>
    </frame>

(These Valgrind logs of aborted tests are currently produced, but not processed by aggregate/convert_to_json - I left this as a future improvement.)

'--log_level=all',
'--report_level=detailed',
]
excluded_tests.each { |test| tests_cli << "--run_test=!#{test}" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is also a bit dangerous on Windows: I recall that command line was limited to 32KiB. If we'll concatenate all test names we have now ~7KiB worth of such strings, but it potentially plants an unpleasant surprise for the future.

The only way to work that around as far as I remember is for executable to provide alternative means to source command line args (e.g. read them from file), but boost::test does not seem to support that.

Copy link
Member Author

@generalmimon generalmimon Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, this is a valid concern (I thought of that too, it just didn't feel that important to me) and not only on Windows, Unix-like OSes have a limit too, though probably higher than 32 KiB. In practice, I believe we are relatively far from reaching that limit (as you mentioned, even if every single test aborted, which already sounds extraordinary, we would still be well below that limit).

As you noted, by default make Boost.Test doesn't seem to support reading arguments from file. But I see two ways we could solve this problem:

  1. Even if Boost.Test itself doesn't support reading CLI args from file, we could probably implement this functionality ourselves by customizing the module's entry point. The new entry point would simply read the arguments from a file and pass them to the actual main function provided by Boost.Test.

  2. We could take inspiration from how xargs works, i.e. splitting the long list of arguments into multiple batches so each batch is below the maximum limit. We cannot reliably do this with the exclusion list, but we can with an inclusion list - so we would use the absolute specification, positively listing each test to run.

    This approach has a number of downsides compared to approach 1 - the cpp_builder would have to keep track of the entire set of test names, find out the command line length limit (or use xargs to take care of that, although that introduces a dependency) and the number of tests that we can run at once (i.e. in one batch) would be limited by how many we can list in the command line, so the test executable would have to be run multiple times just to execute a single snapshot of the test set.

Approach 1 seems much better and it should be pretty easy to implement, probably just a few lines of code.

But I don't see it as a high priority right now because it's not currently causing a problem. My view of our testing system is that it must be only as good as it proves necessary, and no better, because it's an internal tool, not meant be used by end users. We don't need to support C++ compilers / Boost versions that we don't run, or situations that have never happened or cannot happen with the current state of the project.

You know that I'm a perfectionist and I'd rather see everything working in the general case, but when it comes to the testing system, even I realize that solving all potential problems "in advance" doesn't make any difference (it's better to focus our efforts to pretty much any other KS component), only solving real problems does.

read_next=1
set -- "$@" "--run_test=!${line}"
done < "${CPP_TEST_OUT_DIR}/aborted_tests-${run_attempt}.txt"
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, one more argument towards re-doing the builds excluding offending file after run: we won't need to maintain such complex magic around valgrind runs.

Copy link
Member Author

@generalmimon generalmimon Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do that now either - as far as the contents of ci.json is concerned at the moment, only the log from the last Valgrind run (excluding any aborted tests) is copied to valgrind.xml and used by aggregate/convert_to_json. This is explained at the bottom of valgrind-cpp_stl (valgrind-cpp_stl:45-47):

# Currently, aggregate/convert_to_json only processes "valgrind.xml", so we
# make a copy with that name of the last (successful) attempt's log
cp -v "$CPP_TEST_OUT_DIR/valgrind-${run_attempt}.xml" "$CPP_TEST_OUT_DIR/valgrind.xml"

However, we'll likely eventually want to aggregate all valgrind-*.xml logs that this new version of valgrind-cpp_stl produces, because these Valgrind logs from running aborted tests contain all the details of the memory error causing the abort, as I mentioned above in #105 (comment).

So although the generated valgrind-*.xml logs are not used at the moment, they can be already helpful when manually inspecting the CI artifacts, and it doesn't hurt to have them available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants