-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++/STL: recover from aborted tests #105
Conversation
26ebddd
to
805c811
Compare
The recovery works as expected on target
As you can see, as soon as the
It seems that this is because the I see two ways to solve this problem:
I'd prefer option 2, but of course that will be a little more complicated than just |
See kaitai-io/kaitai_struct_tests#105 (comment) - Boost 1.54 is too old for our purposes. Fortunately, it's quite easy to build Boost from source (according to the documentation at https://www.boost.org/doc/libs/1_84_0/more/getting_started/unix-variants.html) and it doesn't take long if we only need to build the Boost.Test library. It's much faster than building CMake which is already being done (on my machine, downloading and building Boost.Test takes 2 minutes, whereas building CMake takes 21 minutes).
The conditions for considering a test as aborted are purely empirical, so it may turn out in the future that there can be an aborted test for which some condition doesn't apply (at the moment I'm not aware of any such case). I just know that simply the presence of the `<Exception>` tag isn't enough to reliably detect an aborted test, because some `<Exception>`s are non-fatal (for example https://github.com/kaitai-io/ci_artifacts/blob/64438fbd11da816bcba374c23cd0e31c01fe4abe/test_out/cpp_stl_11/results.xml contains two `<TestCase>` tags with `<Exception>`, one `abort() has been called` is non-fatal and the other one `memory access violation occurred at address 0x00000010` is fatal). This is why we also check the presence of the `Test is aborted` message by `boost/test/impl/unit_test_log.ipp`, as it seems to always be printed from this source file exactly in the case of an aborted test (see https://www.boost.org/doc/libs/1_84_0/boost/test/impl/unit_test_log.ipp): ```cpp void unit_test_log_t::test_aborted() { BOOST_TEST_LOG_ENTRY( log_messages ) << "Test is aborted"; } ```
Although this generates numbered `valgrind-*.xml` files, at the moment only the last (successful, with no aborted tests) attempt ends up being copied to `valgrind.xml` and used by `aggregate/convert_to_json`. This can be improved later.
805c811
to
25b2182
Compare
@@ -304,33 +305,58 @@ def file_to_test(filename) | |||
end | |||
|
|||
def run_tests | |||
xml_log = "#{@abs_cpp_test_out_dir}/results.xml" | |||
attempt = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution adds yet another trying loop with its own attempts counter, etc.
As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?
On one hand:
- this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned
- this requires test parsing to determine offending test, but you did that already
On the other hand, I see that such a feedback mechanism from run_tests
does not exist now and inventing it on whole partial_builder.rb
might be trickier...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution adds yet another trying loop with its own attempts counter, etc.
Yeah. I don't think you can eliminate it completely in principle, because you don't know in advance which tests will abort, and as soon one test aborts, no more tests are run (this is by design, Boost has no "continue running the suite despite an aborted test" option, that's not a thing). If you want them to run, you have no option but to run the test executable again without that offending test.
As alternative idea: will it bring any benefits we detect offending (aborted) test and then feed that information back to normal partial builder mechanism to exclude offending files out of the binary to get us the stable set of tests in one binary in the end?
I briefly considered this approach as well, but don't see any benefits. In my view, it would only be more complicated and less flexible. As I explain above, you have to run the test suite N + 1
times in any case (where N
is the number of aborted tests), only now with the unnecessary extra step of repeating the linking the object files into the ks_tests
executable.
- this avoids looking into boost::test invocation flags which might be not compatible with boost versions you've mentioned
This is no longer a problem because since kaitai-io/kaitai_struct_docker_images@1ca846d the {clang3.4,gcc4.8}-linux-x86_64
targets build the latest Boost 1.84 from source (instead of using ancient the Boost 1.54), which works like a charm. None of Boost versions we use at the moment have a problem with --run_test=!<test_name>
anymore (as evidenced by https://ci.kaitai.io/ - coincidentally, all cpp_stl_{98,11}
targets currently have at least one aborted test NavParentRecursive
, yet they all recovered successfully), so there's no need to avoid it.
An actual disadvantage of the approach you're suggesting is that with the "stable" test binary at the end, you won't be able to run the aborted tests via Valgrind after CppBuilder
has finished. This means that the only information about an aborted will be the message from when the OS signal was caught by Boost, i.e.:
memory access violation at address: 0x00000008: no mapping at fault address
... which isn't really helpful to diagnose the error - there's no file name, no line number, no stack trace. Valgrind can give you all that, if you only let it run the aborted test (cpp_stl_11/gcc11-linux-x86_64
> test_out/cpp_stl_11/valgrind-1.xml:36-57):
<error>
<unique>0x0</unique>
<tid>1</tid>
<kind>InvalidRead</kind>
<what>Invalid read of size 8</what>
<stack>
<frame>
<ip>0x54235A</ip>
<obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj>
<fn>debug_array_user_t::cat_t::_read()</fn>
<dir>/tests/compiled/cpp_stl_11</dir>
<file>debug_array_user.cpp</file>
<line>37</line>
</frame>
<frame>
<ip>0x542205</ip>
<obj>/tests/compiled/cpp_stl_11/bin/ks_tests</obj>
<fn>debug_array_user_t::_read()</fn>
<dir>/tests/compiled/cpp_stl_11</dir>
<file>debug_array_user.cpp</file>
<line>20</line>
</frame>
(These Valgrind logs of aborted tests are currently produced, but not processed by aggregate/convert_to_json
- I left this as a future improvement.)
'--log_level=all', | ||
'--report_level=detailed', | ||
] | ||
excluded_tests.each { |test| tests_cli << "--run_test=!#{test}" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is also a bit dangerous on Windows: I recall that command line was limited to 32KiB. If we'll concatenate all test names we have now ~7KiB worth of such strings, but it potentially plants an unpleasant surprise for the future.
The only way to work that around as far as I remember is for executable to provide alternative means to source command line args (e.g. read them from file), but boost::test does not seem to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, this is a valid concern (I thought of that too, it just didn't feel that important to me) and not only on Windows, Unix-like OSes have a limit too, though probably higher than 32 KiB. In practice, I believe we are relatively far from reaching that limit (as you mentioned, even if every single test aborted, which already sounds extraordinary, we would still be well below that limit).
As you noted, by default make Boost.Test doesn't seem to support reading arguments from file. But I see two ways we could solve this problem:
-
Even if Boost.Test itself doesn't support reading CLI args from file, we could probably implement this functionality ourselves by customizing the module's entry point. The new entry point would simply read the arguments from a file and pass them to the actual
main
function provided by Boost.Test. -
We could take inspiration from how
xargs
works, i.e. splitting the long list of arguments into multiple batches so each batch is below the maximum limit. We cannot reliably do this with the exclusion list, but we can with an inclusion list - so we would use the absolute specification, positively listing each test to run.This approach has a number of downsides compared to approach 1 - the
cpp_builder
would have to keep track of the entire set of test names, find out the command line length limit (or usexargs
to take care of that, although that introduces a dependency) and the number of tests that we can run at once (i.e. in one batch) would be limited by how many we can list in the command line, so the test executable would have to be run multiple times just to execute a single snapshot of the test set.
Approach 1 seems much better and it should be pretty easy to implement, probably just a few lines of code.
But I don't see it as a high priority right now because it's not currently causing a problem. My view of our testing system is that it must be only as good as it proves necessary, and no better, because it's an internal tool, not meant be used by end users. We don't need to support C++ compilers / Boost versions that we don't run, or situations that have never happened or cannot happen with the current state of the project.
You know that I'm a perfectionist and I'd rather see everything working in the general case, but when it comes to the testing system, even I realize that solving all potential problems "in advance" doesn't make any difference (it's better to focus our efforts to pretty much any other KS component), only solving real problems does.
read_next=1 | ||
set -- "$@" "--run_test=!${line}" | ||
done < "${CPP_TEST_OUT_DIR}/aborted_tests-${run_attempt}.txt" | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, one more argument towards re-doing the builds excluding offending file after run: we won't need to maintain such complex magic around valgrind runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to do that now either - as far as the contents of ci.json
is concerned at the moment, only the log from the last Valgrind run (excluding any aborted tests) is copied to valgrind.xml
and used by aggregate/convert_to_json
. This is explained at the bottom of valgrind-cpp_stl
(valgrind-cpp_stl:45-47
):
# Currently, aggregate/convert_to_json only processes "valgrind.xml", so we
# make a copy with that name of the last (successful) attempt's log
cp -v "$CPP_TEST_OUT_DIR/valgrind-${run_attempt}.xml" "$CPP_TEST_OUT_DIR/valgrind.xml"
However, we'll likely eventually want to aggregate all valgrind-*.xml
logs that this new version of valgrind-cpp_stl
produces, because these Valgrind logs from running aborted tests contain all the details of the memory error causing the abort, as I mentioned above in #105 (comment).
So although the generated valgrind-*.xml
logs are not used at the moment, they can be already helpful when manually inspecting the CI artifacts, and it doesn't hurt to have them available.
Currently, when a single C++ test aborts at runtime (typically due to a
memory access violation
), the entire test suite is terminated. See Classtest_observer
>test_aborted()
:This PR allows the CI to recover from that by detecting the aborted test and running the test executable again excluding the previously aborted tests (using the
--run_test=!<absolute_spec>
option, see Test unit filtering > Relative specification).