C++/STL: recover from aborted tests #105

generalmimon · 2024-03-18T22:55:36Z

Currently, when a single C++ test aborts at runtime (typically due to a memory access violation), the entire test suite is terminated. See Class test_observer > test_aborted():

Called when a critical error is detected.

The critical errors are mainly the signals sent by the system and caught by the Boost.Test framework. Since the running binary may be in incoherent/instable state, the test execution is aborted and all remaining tests are discarded.

This PR allows the CI to recover from that by detecting the aborted test and running the test executable again excluding the previously aborted tests (using the --run_test=!<absolute_spec> option, see Test unit filtering > Relative specification).

generalmimon · 2024-03-19T13:43:53Z

The recovery works as expected on target cpp_stl_11/gcc11-linux-x86_64 (./docker-ci -t cpp_stl -u _11 -i gcc11). However, it does not work on cpp_stl_11/clang3.4-linux-x86_64 (./docker-ci -t cpp_stl -u _11 -i clang3.4):

#### CppBuilder: running tests
#### CppBuilder: run attempt 1 (log: /tests/test_out/cpp_stl_11/results-1.xml)
#### CppBuilder: running command: ["/tests/compiled/cpp_stl_11/bin/ks_tests",
  "--log_format=XML", "--log_sink=/tests/test_out/cpp_stl_11/results-1.xml",
  "--log_level=all", "--report_level=detailed"],
  log: "/tests/test_out/cpp_stl_11/test_run-1.stdout"

Test suite "C++/STL test units for Kaitai Struct" aborted with:
  160 assertions out of 161 passed
  1 assertion out of 161 failed
  32 test cases out of 226 passed
  1 test case out of 226 failed
  193 test cases out of 226 skipped
  1 test case out of 226 aborted

  ...

  Test case "test_instance_std" passed with:
    1 assertion out of 1 passed

  Test case "test_debug_array_user" aborted with:
    1 assertion out of 1 failed

  Test case "test_imports_abs_abs" skipped due to test aborting
  ...
  Test case "test_instance_std_array" skipped due to test aborting
#### CppBuilder: process_status: #<Process::Status: pid 2095 exit 201>
#### CppBuilder: run attempt 1 aborted (1 tests aborted)
#### CppBuilder: run attempt 2 (log: /tests/test_out/cpp_stl_11/results-2.xml)
#### CppBuilder: running command: ["/tests/compiled/cpp_stl_11/bin/ks_tests",
  "--log_format=XML", "--log_sink=/tests/test_out/cpp_stl_11/results-2.xml",
  "--log_level=all", "--report_level=detailed", "--run_test=!test_debug_array_user"],
  log: "/tests/test_out/cpp_stl_11/test_run-2.stdout"
Test setup error: no test cases matching filter
#### CppBuilder: process_status: #<Process::Status: pid 2098 exit 200>

As you can see, as soon as the --run_test=!test_debug_array_user option is added, the Boost.Test executable fails with:

Test setup error: no test cases matching filter

It seems that this is because the {clang3.4,gcc4.8}-linux-x86_64 targets use Boost 1.54.0 (as I mention here), which is a version from 2013 and apparently doesn't support test exclusion yet (according to the docs for Boost Test 1.54.0, it supports the --run_test option, but only with positive specifications).

I see two ways to solve this problem:

Don't use --run_test=!<test_name> to exclude tests, instead use another approach compatible with Boost 1.54.0.
Install some newer version of Boost in clang3.4 and gcc4.8 that supports --run_test=!<test_name>.

I'd prefer option 2, but of course that will be a little more complicated than just apt-get install libboost-test-dev. We'll probably have to build Boost.Test from source (as we already do with CMake). A good tip from SO is to build only the libraries we need (in our case only the Boost.Test library, probably only for static linking) instead of building everything in the Boost suite, which might take a long time.

See kaitai-io/kaitai_struct_tests#105 (comment) - Boost 1.54 is too old for our purposes. Fortunately, it's quite easy to build Boost from source (according to the documentation at https://www.boost.org/doc/libs/1_84_0/more/getting_started/unix-variants.html) and it doesn't take long if we only need to build the Boost.Test library. It's much faster than building CMake which is already being done (on my machine, downloading and building Boost.Test takes 2 minutes, whereas building CMake takes 21 minutes).

The conditions for considering a test as aborted are purely empirical, so it may turn out in the future that there can be an aborted test for which some condition doesn't apply (at the moment I'm not aware of any such case). I just know that simply the presence of the `<Exception>` tag isn't enough to reliably detect an aborted test, because some `<Exception>`s are non-fatal (for example https://github.com/kaitai-io/ci_artifacts/blob/64438fbd11da816bcba374c23cd0e31c01fe4abe/test_out/cpp_stl_11/results.xml contains two `<TestCase>` tags with `<Exception>`, one `abort() has been called` is non-fatal and the other one `memory access violation occurred at address 0x00000010` is fatal). This is why we also check the presence of the `Test is aborted` message by `boost/test/impl/unit_test_log.ipp`, as it seems to always be printed from this source file exactly in the case of an aborted test (see https://www.boost.org/doc/libs/1_84_0/boost/test/impl/unit_test_log.ipp): ```cpp void unit_test_log_t::test_aborted() { BOOST_TEST_LOG_ENTRY( log_messages ) << "Test is aborted"; } ```

Although this generates numbered `valgrind-*.xml` files, at the moment only the last (successful, with no aborted tests) attempt ends up being copied to `valgrind.xml` and used by `aggregate/convert_to_json`. This can be improved later.

aggregate/boost_test_parser.rb

GreyCat · 2024-03-23T09:30:13Z

builder/cpp_builder.rb

@@ -304,33 +305,58 @@ def file_to_test(filename)
  end

  def run_tests
-    xml_log = "#{@abs_cpp_test_out_dir}/results.xml"
+    attempt = 1


This solution adds yet another trying loop with its own attempts counter, etc.

As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?

On one hand:

this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned

this requires test parsing to determine offending test, but you did that already

On the other hand, I see that such a feedback mechanism from run_tests does not exist now and inventing it on whole partial_builder.rb might be trickier...

This solution adds yet another trying loop with its own attempts counter, etc.

Yeah. I don't think you can eliminate it completely in principle, because you don't know in advance which tests will abort, and as soon one test aborts, no more tests are run (this is by design, Boost has no "continue running the suite despite an aborted test" option, that's not a thing). If you want them to run, you have no option but to run the test executable again without that offending test.

As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?

I briefly considered this approach as well, but don't see any benefits. In my view, it would only be more complicated and less flexible. As I explain above, you have to run the test suite N + 1 times in any case (where N is the number of aborted tests), only now with the unnecessary extra step of repeating the linking the object files into the ks_tests executable.

this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned

This is no longer a problem because since kaitai-io/kaitai_struct_docker_images@1ca846d the {clang3.4,gcc4.8}-linux-x86_64 targets build the latest Boost 1.84 from source (instead of using ancient the Boost 1.54), which works like a charm. None of Boost versions we use at the moment have a problem with --run_test=!<test_name> anymore (as evidenced by https://ci.kaitai.io/ - coincidentally, all cpp_stl_{98,11} targets currently have at least one aborted test NavParentRecursive, yet they all recovered successfully), so there's no need to avoid it.

An actual disadvantage of the approach you're suggesting is that with the "stable" test binary at the end, you won't be able to run the aborted tests via Valgrind after CppBuilder has finished. This means that the only information about an aborted will be the message from when the OS signal was caught by Boost, i.e.:

memory access violation at address: 0x00000008: no mapping at fault address

... which isn't really helpful to diagnose the error - there's no file name, no line number, no stack trace. Valgrind can give you all that, if you only let it run the aborted test (cpp_stl_11/gcc11-linux-x86_64 > test_out/cpp_stl_11/valgrind-1.xml:36-57):

<error> <unique>0x0</unique> <tid>1</tid> <kind>InvalidRead</kind> <what>Invalid read of size 8</what> <stack> <frame> <ip>0x54235A</ip> <obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj> <fn>debug_array_user_t::cat_t::_read()</fn> <dir>/tests/compiled/cpp_stl_11</dir> <file>debug_array_user.cpp</file> <line>37</line> </frame> <frame> <ip>0x542205</ip> <obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj> <fn>debug_array_user_t::_read()</fn> <dir>/tests/compiled/cpp_stl_11</dir> <file>debug_array_user.cpp</file> <line>20</line> </frame>

(These Valgrind logs of aborted tests are currently produced, but not processed by aggregate/convert_to_json - I left this as a future improvement.)

GreyCat · 2024-03-23T09:40:13Z

builder/cpp_builder.rb

+        '--log_level=all',
+        '--report_level=detailed',
+      ]
+      excluded_tests.each { |test| tests_cli << "--run_test=!#{test}" }


This one is also a bit dangerous on Windows: I recall that command line was limited to 32KiB. If we'll concatenate all test names we have now ~7KiB worth of such strings, but it potentially plants an unpleasant surprise for the future.

The only way to work that around as far as I remember is for executable to provide alternative means to source command line args (e.g. read them from file), but boost::test does not seem to support that.

In theory, this is a valid concern (I thought of that too, it just didn't feel that important to me) and not only on Windows, Unix-like OSes have a limit too, though probably higher than 32 KiB. In practice, I believe we are relatively far from reaching that limit (as you mentioned, even if every single test aborted, which already sounds extraordinary, we would still be well below that limit).

As you noted, by default make Boost.Test doesn't seem to support reading arguments from file. But I see two ways we could solve this problem:

Even if Boost.Test itself doesn't support reading CLI args from file, we could probably implement this functionality ourselves by customizing the module's entry point. The new entry point would simply read the arguments from a file and pass them to the actual main function provided by Boost.Test.

We could take inspiration from how xargs works, i.e. splitting the long list of arguments into multiple batches so each batch is below the maximum limit. We cannot reliably do this with the exclusion list, but we can with an inclusion list - so we would use the absolute specification, positively listing each test to run.

This approach has a number of downsides compared to approach 1 - the cpp_builder would have to keep track of the entire set of test names, find out the command line length limit (or use xargs to take care of that, although that introduces a dependency) and the number of tests that we can run at once (i.e. in one batch) would be limited by how many we can list in the command line, so the test executable would have to be run multiple times just to execute a single snapshot of the test set.

Approach 1 seems much better and it should be pretty easy to implement, probably just a few lines of code.

But I don't see it as a high priority right now because it's not currently causing a problem. My view of our testing system is that it must be only as good as it proves necessary, and no better, because it's an internal tool, not meant be used by end users. We don't need to support C++ compilers / Boost versions that we don't run, or situations that have never happened or cannot happen with the current state of the project.

You know that I'm a perfectionist and I'd rather see everything working in the general case, but when it comes to the testing system, even I realize that solving all potential problems "in advance" doesn't make any difference (it's better to focus our efforts to pretty much any other KS component), only solving real problems does.

builder/spec/cpp/cpp_builder_spec.rb

GreyCat · 2024-03-23T09:47:09Z

valgrind-cpp_stl

+		read_next=1
+		set -- "$@" "--run_test=!${line}"
+	done < "${CPP_TEST_OUT_DIR}/aborted_tests-${run_attempt}.txt"
+done


Again, one more argument towards re-doing the builds excluding offending file after run: we won't need to maintain such complex magic around valgrind runs.

We don't need to do that now either - as far as the contents of ci.json is concerned at the moment, only the log from the last Valgrind run (excluding any aborted tests) is copied to valgrind.xml and used by aggregate/convert_to_json. This is explained at the bottom of valgrind-cpp_stl (valgrind-cpp_stl:45-47):

# Currently, aggregate/convert_to_json only processes "valgrind.xml", so we # make a copy with that name of the last (successful) attempt's log cp -v "$CPP_TEST_OUT_DIR/valgrind-${run_attempt}.xml" "$CPP_TEST_OUT_DIR/valgrind.xml"

However, we'll likely eventually want to aggregate all valgrind-*.xml logs that this new version of valgrind-cpp_stl produces, because these Valgrind logs from running aborted tests contain all the details of the memory error causing the abort, as I mentioned above in #105 (comment).

So although the generated valgrind-*.xml logs are not used at the moment, they can be already helpful when manually inspecting the CI artifacts, and it doesn't hurt to have them available.

generalmimon force-pushed the cpp-recover-from-aborted branch from 26ebddd to 805c811 Compare March 19, 2024 12:30

generalmimon marked this pull request as ready for review March 19, 2024 13:44

generalmimon added 3 commits March 22, 2024 14:12

C++/STL: recover from aborted tests

dc9b6fc

generalmimon force-pushed the cpp-recover-from-aborted branch from 805c811 to 25b2182 Compare March 22, 2024 13:16

generalmimon merged commit 3b650da into master Mar 22, 2024

generalmimon deleted the cpp-recover-from-aborted branch March 22, 2024 13:18

GreyCat reviewed Mar 22, 2024

View reviewed changes

aggregate/boost_test_parser.rb Show resolved Hide resolved

GreyCat reviewed Mar 23, 2024

View reviewed changes

builder/spec/cpp/cpp_builder_spec.rb Show resolved Hide resolved

GreyCat reviewed Mar 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++/STL: recover from aborted tests #105

C++/STL: recover from aborted tests #105

generalmimon commented Mar 18, 2024 •

edited

Loading

generalmimon commented Mar 19, 2024 •

edited

Loading

GreyCat Mar 23, 2024

generalmimon Mar 23, 2024 •

edited

Loading

GreyCat Mar 23, 2024

generalmimon Mar 23, 2024 •

edited

Loading

GreyCat Mar 23, 2024

generalmimon Mar 23, 2024 •

edited

Loading

C++/STL: recover from aborted tests #105

C++/STL: recover from aborted tests #105

Conversation

generalmimon commented Mar 18, 2024 • edited Loading

generalmimon commented Mar 19, 2024 • edited Loading

GreyCat Mar 23, 2024

Choose a reason for hiding this comment

generalmimon Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

GreyCat Mar 23, 2024

Choose a reason for hiding this comment

generalmimon Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

GreyCat Mar 23, 2024

Choose a reason for hiding this comment

generalmimon Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

generalmimon commented Mar 18, 2024 •

edited

Loading

generalmimon commented Mar 19, 2024 •

edited

Loading

generalmimon Mar 23, 2024 •

edited

Loading

generalmimon Mar 23, 2024 •

edited

Loading

generalmimon Mar 23, 2024 •

edited

Loading