
Support Maven Surefire Plugin rerunning flaky tests #539

Open

fibbers opened this issue Dec 14, 2023 · 21 comments

@fibbers

fibbers commented Dec 14, 2023

Hi,

First of all, I think this plugin works great and provides many GitHub reporting options. 👌

I'm currently in the process of running my tests in parallel to speed up my test run. Some of my tests now fail because of concurrency issues, which is why I also configured the Maven Surefire Plugin to rerun failed tests.

However, I notice a few things:

  1. my regular mvn test command ultimately succeeds (i.e. after rerunning some tests, all my tests pass):
    [screenshot]

  2. however, publish-unit-test-result-action creates a check with a failed status (❌ Maven Test Results)

  3. the annotations show failed tests, since the first attempt they did indeed fail (however, succeeding attempts pass):
    [screenshot]
    Note that my mvn test command completes with the following output (Flakes signifies the reruns):

    2023-12-14T17:23:48.1537837Z [WARNING] Tests run: 5587, Failures: 0, Errors: 0, Skipped: 116, Flakes: 15
    2023-12-14T17:23:48.1538689Z [INFO] 
    2023-12-14T17:23:48.1539320Z [INFO] ------------------------------------------------------------------------
    2023-12-14T17:23:48.1540050Z [INFO] BUILD SUCCESS
    2023-12-14T17:23:48.1540749Z [INFO] ------------------------------------------------------------------------
    2023-12-14T17:23:48.1541494Z [INFO] Total time:  21:03 min
    2023-12-14T17:23:48.1542240Z [INFO] Finished at: 2023-12-14T18:23:48+01:00
    2023-12-14T17:23:48.1543112Z [INFO] ------------------------------------------------------------------------
    
  4. the duration shown in the summary (57m) seems to be the sum of the parallel runs instead of the wall clock time


My simplified workflow:

- name: Run tests with Maven
  run: mvn --file pom.xml test
- name: Publish Test Results
  uses: EnricoMi/publish-unit-test-result-action@v2
  if: always()
  with:
    files: |
      example/target/surefire-reports/*.xml
    check_name: "Maven Test Results"
    report_individual_runs: "true"
    action_fail_on_inconclusive: "true"

Log output of the Publish Test Results step (sensitive parts removed):

2023-12-14 18:23:52 +0100 - publish -  INFO - Available memory to read files: 14.5 GiB
2023-12-14 18:23:52 +0100 - publish -  INFO - Reading files example/target/surefire-reports/*.xml (448 files, 40.5 MiB)
2023-12-14 18:23:53 +0100 - publish -  INFO - Detected 448 JUnit XML files (40.5 MiB)
2023-12-14 18:23:53 +0100 - publish -  INFO - Finished reading 448 files in 0.64 seconds
2023-12-14 18:23:54 +0100 - publish -  INFO - Publishing failure results for commit abcdef123
2023-12-14 18:23:55 +0100 - publish -  INFO - Created check https://github.com/example/runs/12649858973
2023-12-14 18:23:55 +0100 - publish -  INFO - Created job summary
2023-12-14 18:24:01 +0100 - publish -  INFO - Edited comment for pull request #...: https://github.com/example/pull/123#issuecomment-123

Example of annotations of 2 different tests where in both cases only the first attempt failed:
[screenshot]

Just for completeness, but probably not necessary, relevant parts of pom.xml:

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-surefire-plugin</artifactId>
		<version>3.2.3</version>
		<configuration>
			<rerunFailingTestsCount>3</rerunFailingTestsCount>
		</configuration>
	</plugin>

Example of (part of) an xml report file that has a passing test after a rerun:

<testcase name="testSomething" classname="example.SomeTest" time="0.001">
    <flakyFailure message="[someAssertionHere] &#10;Expecting actual not to be empty" type="java.lang.AssertionError">
        <stackTrace><![CDATA[java.lang.AssertionError: 
[someAssertionHere] 
Expecting actual not to be empty
at ...
]]></stackTrace>
        <system-out><![CDATA[... 
]]></system-out>
        <system-err><![CDATA[... 
]]></system-err>
    </flakyFailure>
</testcase>

Is there a configuration parameter I missed that would create a succeeding check (✅ Maven Test Results) and cause the report to show only succeeding tests (optionally after retries)? Or is rerunning simply not supported?

The repository I'm working on is private, so unfortunately I can't give any direct links, but let me know if more information is needed.

@EnricoMi
Owner

Thanks for the detailed description of your use case. There is no option available currently. Rather than changing the logic, I would specifically handle the "flaky" annotations in the test result files: https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html
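
Roughly what I have in mind, as a minimal sketch (in Python, the action's implementation language; the helper and its logic are illustrative, not the action's actual code):

import xml.etree.ElementTree as ET

FLAKY_TAGS = {"flakyFailure", "flakyError"}

def flaky_test_cases(report_path):
    # Yield (classname, name) of test cases that failed at least one
    # attempt but eventually passed, i.e. carry only <flaky*> children.
    # <failure>/<error> are only written when all attempts failed.
    root = ET.parse(report_path).getroot()
    for case in root.iter("testcase"):
        tags = {child.tag for child in case}
        if tags & FLAKY_TAGS and not tags & {"failure", "error"}:
            yield case.get("classname"), case.get("name")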

Can you provide an example test result with one failure and success reruns, as well as one result file with all rerun failures?

@fibbers
Author

fibbers commented Dec 15, 2023

Sure, for completeness, I made 4 tests fail/error by either throwing an exception or doing a false assertion:

  • throwsOnlyFirstAttempt()
  • throwsAlways()
  • failsAssertionOnlyFirstAttempt()
  • failsAssertionAlways()

Output of mvn test command with rerunFailingTestsCount=3:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] example.ExampleTests.failsAssertionAlways
[ERROR]   Run 1: ExampleTests.failsAssertionAlways:40 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 2: ExampleTests.failsAssertionAlways:40 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 3: ExampleTests.failsAssertionAlways:40 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 4: ExampleTests.failsAssertionAlways:40 Always fails assertion ==> expected: <1> but was: <2>
[INFO]
[ERROR] Errors:
[ERROR] example.ExampleTests.throwsAlways
[ERROR]   Run 1: ExampleTests.throwsAlways:26 Runtime Always throws exception
[ERROR]   Run 2: ExampleTests.throwsAlways:26 Runtime Always throws exception
[ERROR]   Run 3: ExampleTests.throwsAlways:26 Runtime Always throws exception
[ERROR]   Run 4: ExampleTests.throwsAlways:26 Runtime Always throws exception
[INFO]
[WARNING] Flakes:
[WARNING] example.ExampleTests.failsAssertionOnlyFirstAttempt
[ERROR]   Run 1: ExampleTests.failsAssertionOnlyFirstAttempt:32 First attempt fails assertion ==> expected: <1> but was: <2>
[INFO]   Run 2: PASS
[INFO]
[WARNING] example.ExampleTests.throwsOnlyFirstAttempt
[ERROR]   Run 1: ExampleTests.throwsOnlyFirstAttempt:18 Runtime First attempt throws exception
[INFO]   Run 2: PASS
[INFO]
[INFO]
[ERROR] Tests run: 4, Failures: 1, Errors: 1, Skipped: 0, Flakes: 2
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  24.788 s
[INFO] Finished at: 2023-12-15T10:22:37+01:00
[INFO] ------------------------------------------------------------------------

TEST-example.ExampleTests.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report-3.0.xsd" version="3.0" name="example.ExampleTests" time="0.002" tests="2" errors="1" skipped="0" failures="1">
  <testcase name="throwsAlways" classname="example.ExampleTests" time="0.01">
    <error message="Always throws exception" type="java.lang.RuntimeException"><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:26)
]]></error>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:26)
]]></stackTrace>
    </rerunError>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:26)
]]></stackTrace>
    </rerunError>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:26)
]]></stackTrace>
    </rerunError>
  </testcase>
  <testcase name="failsAssertionAlways" classname="example.ExampleTests" time="0.003">
    <failure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError"><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:40)
]]></failure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:40)
]]></stackTrace>
    </rerunFailure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:40)
]]></stackTrace>
    </rerunFailure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:40)
]]></stackTrace>
    </rerunFailure>
  </testcase>
  <testcase name="failsAssertionOnlyFirstAttempt" classname="example.ExampleTests" time="0.0">
    <flakyFailure message="First attempt fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: First attempt fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionOnlyFirstAttempt(ExampleTests.java:32)
]]></stackTrace>
    </flakyFailure>
  </testcase>
  <testcase name="throwsOnlyFirstAttempt" classname="example.ExampleTests" time="0.001">
    <flakyError message="First attempt throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: First attempt throws exception
	at example.ExampleTests.throwsOnlyFirstAttempt(ExampleTests.java:18)
]]></stackTrace>
    </flakyError>
  </testcase>
</testsuite>

example.ExampleTests.txt:

-------------------------------------------------------------------------------
Test set: example.ExampleTests
-------------------------------------------------------------------------------
Tests run: 2, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 0.002 s <<< FAILURE! -- in example.ExampleTests
example.ExampleTests.failsAssertionAlways -- Time elapsed: 0.001 s <<< FAILURE!
org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:40)

example.ExampleTests.throwsAlways -- Time elapsed: 0.001 s <<< ERROR!
java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:26)

@EnricoMi EnricoMi changed the title Maven Surefire Plugin reruns tests and completes successfully, but the plugin marks the test results as failed Support Maven Surefire Plugin rerunning flaky tests Dec 15, 2023
@EnricoMi
Owner

Thanks for the input. With the provided XML the action reports one test as error, one as failure and the last two as pass.

Can you add another two tests that fail / error twice and pass on third attempt?

@fibbers
Author

fibbers commented Dec 15, 2023

Sure, see below for these 7 tests so far:

  • throwsOnlyFirstAttempt()
  • throwsFirstTwoAttempts() <-- added
  • throwsAlways()
  • failsAssertionOnlyFirstAttempt()
  • failsAssertionFirstTwoAttempts() <-- added
  • failsAssertionAlways()
  • alwaysPasses() <-- added, to be sure
Output of mvn test:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] example.ExampleTests.failsAssertionAlways
[ERROR]   Run 1: ExampleTests.failsAssertionAlways:60 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 2: ExampleTests.failsAssertionAlways:60 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 3: ExampleTests.failsAssertionAlways:60 Always fails assertion ==> expected: <1> but was: <2>
[ERROR]   Run 4: ExampleTests.failsAssertionAlways:60 Always fails assertion ==> expected: <1> but was: <2>
[INFO]
[ERROR] Errors:
[ERROR] example.ExampleTests.throwsAlways
[ERROR]   Run 1: ExampleTests.throwsAlways:37 Runtime Always throws exception
[ERROR]   Run 2: ExampleTests.throwsAlways:37 Runtime Always throws exception
[ERROR]   Run 3: ExampleTests.throwsAlways:37 Runtime Always throws exception
[ERROR]   Run 4: ExampleTests.throwsAlways:37 Runtime Always throws exception
[INFO]
[WARNING] Flakes:
[WARNING] example.ExampleTests.failsAssertionFirstTwoAttempts
[ERROR]   Run 1: ExampleTests.failsAssertionFirstTwoAttempts:52 First two attempts fail assertion ==> expected: <1> but was: <2>
[ERROR]   Run 2: ExampleTests.failsAssertionFirstTwoAttempts:52 First two attempts fail assertion ==> expected: <1> but was: <2>
[INFO]   Run 3: PASS
[INFO]
[WARNING] example.ExampleTests.failsAssertionOnlyFirstAttempt
[ERROR]   Run 1: ExampleTests.failsAssertionOnlyFirstAttempt:43 First attempt fails assertion ==> expected: <1> but was: <2>
[INFO]   Run 2: PASS
[INFO]
[WARNING] example.ExampleTests.throwsFirstTwoAttempts
[ERROR]   Run 1: ExampleTests.throwsFirstTwoAttempts:29 Runtime First two attempts throw exception
[ERROR]   Run 2: ExampleTests.throwsFirstTwoAttempts:29 Runtime First two attempts throw exception
[INFO]   Run 3: PASS
[INFO]
[WARNING] example.ExampleTests.throwsOnlyFirstAttempt
[ERROR]   Run 1: ExampleTests.throwsOnlyFirstAttempt:20 Runtime First attempt throws exception
[INFO]   Run 2: PASS
[INFO]
[INFO]
[ERROR] Tests run: 7, Failures: 1, Errors: 1, Skipped: 0, Flakes: 4
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  23.284 s
[INFO] Finished at: 2023-12-15T14:58:05+01:00
[INFO] ------------------------------------------------------------------------

TEST-example.ExampleTests.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report-3.0.xsd" version="3.0" name="example.ExampleTests" time="0.001" tests="2" errors="1" skipped="0" failures="1">
  <testcase name="throwsAlways" classname="example.ExampleTests" time="0.01">
    <error message="Always throws exception" type="java.lang.RuntimeException"><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:37)
]]></error>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:37)
]]></stackTrace>
    </rerunError>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:37)
]]></stackTrace>
    </rerunError>
    <rerunError message="Always throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:37)
]]></stackTrace>
    </rerunError>
  </testcase>
  <testcase name="throwsFirstTwoAttempts" classname="example.ExampleTests" time="0.0">
    <flakyError message="First two attempts throw exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: First two attempts throw exception
	at example.ExampleTests.throwsFirstTwoAttempts(ExampleTests.java:29)
]]></stackTrace>
    </flakyError>
    <flakyError message="First two attempts throw exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: First two attempts throw exception
	at example.ExampleTests.throwsFirstTwoAttempts(ExampleTests.java:29)
]]></stackTrace>
    </flakyError>
  </testcase>
  <testcase name="failsAssertionAlways" classname="example.ExampleTests" time="0.003">
    <failure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError"><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:60)
]]></failure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:60)
]]></stackTrace>
    </rerunFailure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:60)
]]></stackTrace>
    </rerunFailure>
    <rerunFailure message="Always fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:60)
]]></stackTrace>
    </rerunFailure>
  </testcase>
  <testcase name="failsAssertionOnlyFirstAttempt" classname="example.ExampleTests" time="0.001">
    <flakyFailure message="First attempt fails assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: First attempt fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionOnlyFirstAttempt(ExampleTests.java:43)
]]></stackTrace>
    </flakyFailure>
  </testcase>
  <testcase name="alwaysPasses" classname="example.ExampleTests" time="0.001"/>
  <testcase name="throwsOnlyFirstAttempt" classname="example.ExampleTests" time="0.001">
    <flakyError message="First attempt throws exception" type="java.lang.RuntimeException">
      <stackTrace><![CDATA[java.lang.RuntimeException: First attempt throws exception
	at example.ExampleTests.throwsOnlyFirstAttempt(ExampleTests.java:20)
]]></stackTrace>
    </flakyError>
  </testcase>
  <testcase name="failsAssertionFirstTwoAttempts" classname="example.ExampleTests" time="0.0">
    <flakyFailure message="First two attempts fail assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: First two attempts fail assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionFirstTwoAttempts(ExampleTests.java:52)
]]></stackTrace>
    </flakyFailure>
    <flakyFailure message="First two attempts fail assertion ==&gt; expected: &lt;1&gt; but was: &lt;2&gt;" type="org.opentest4j.AssertionFailedError">
      <stackTrace><![CDATA[org.opentest4j.AssertionFailedError: First two attempts fail assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionFirstTwoAttempts(ExampleTests.java:52)
]]></stackTrace>
    </flakyFailure>
  </testcase>
</testsuite>

example.ExampleTests.txt:

(this shows Tests run: 2, I'm not sure why, though this file is probably insignificant)

-------------------------------------------------------------------------------
Test set: example.ExampleTests
-------------------------------------------------------------------------------
Tests run: 2, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! -- in example.ExampleTests
example.ExampleTests.failsAssertionAlways -- Time elapsed: 0.001 s <<< FAILURE!
org.opentest4j.AssertionFailedError: Always fails assertion ==> expected: <1> but was: <2>
	at example.ExampleTests.failsAssertionAlways(ExampleTests.java:60)

example.ExampleTests.throwsAlways -- Time elapsed: 0 s <<< ERROR!
java.lang.RuntimeException: Always throws exception
	at example.ExampleTests.throwsAlways(ExampleTests.java:37)

@EnricoMi
Owner

EnricoMi commented Dec 15, 2023

The action only considers known tags like <failure> and <error>. Given these only exist when all attempts fail / error, the reporting behaviour is correct: a test passes if not all attempts fail / error. Do you agree this is the current behaviour?

The 1 out of 2 here is because you ran your test suite twice: once the test failed all attempts, once it succeeded on at least one attempt. The reruns are in the same file, but your report includes tests from two files.
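
For reference, per the Surefire rerun-failing-tests docs, the tag combinations map to outcomes roughly like this:

  • <failure> or <error> (plus <rerunFailure> / <rerunError> children): every attempt failed, a real failure / error
  • only <flakyFailure> / <flakyError> children: a later attempt passed, a flaky but ultimately successful test
  • no such children: passed on the first attempt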

@fibbers
Author

fibbers commented Dec 18, 2023

I think you're right and I agree with the current behaviour.

After reading your comment, I've dug a bit deeper and found that I had not accurately reproduced the actual problem.

Some context: we run ~5000 tests spread over many classes, and each class gets its own TEST-fqdnClassNameHere.xml report generated (i.e. the 2 TEST-*.xml files in my report are from 2 different classes; I'm assuming each class is considered a test suite).

Though our mvn test command exited successfully and reported that it reran some tests, the resulting XML reports from GitHub Actions still contained <failure>/<error> tags (even without any <flaky*> sibling tags 🤔). This probably means my problem lies in generating the XML reports rather than in how publish-unit-test-result-action analyzes and reports the failed tests. This action indeed seems to behave correctly.

I'll focus my efforts now on how the XML reports are generated; it might be related to the fact that we're running our tests in parallel.
But that's all unrelated to this action 🙂 so I'll close this issue.

Thanks for helping along and the quick responses.

@fibbers fibbers closed this as completed Dec 18, 2023
@EnricoMi
Owner

Thanks for the clarification and for raising this!

@EnricoMi
Owner

Let's reopen this to track supporting the flaky annotation for eventually successful tests. That would be a great addition to the summary.

@EnricoMi EnricoMi reopened this Aug 18, 2024
@LorenzoBettini

So, currently, the only way is to use "fail_on: nothing" and rely on the failure of the build job, right?

@EnricoMi
Owner

Depends on what you want. Do you want to fail on flakiness? Or do you want to fail on consistent test failure? This action fails on the latter (with default settings).

@LorenzoBettini

We don't want it to fail on flakiness, but it does. It looks like it interprets the first failed attempts of flaky tests as failures, and the corresponding job is marked as failed. We don't want that.

@LorenzoBettini

@EnricoMi I've also tried with fail_on: nothing (eclipse-xtext/xtext#3164), but the job for publishing test results is still marked with failure (due to flakes). Am I missing something?

@EnricoMi
Owner

We don't want it to fail on flakiness, but it does. It looks like it interprets the first failed attempts of flaky tests as failures

Can you point me to an example workflow run?

@HannesWell

We don't want it to fail on flakiness, but it does. It looks like it interprets the first failed attempts of flaky tests as failures

Can you point me to an example workflow run?

Many of the latest runs in https://github.com/eclipse/xtext/ are affected.
One specific example is https://github.com/eclipse/xtext/actions/runs/10538099413
where the test results are collected in https://github.com/eclipse/xtext/actions/runs/10538099413/job/29201077926 which leads to this check result:
https://github.com/eclipse/xtext/pull/3154/checks?check_run_id=29201077926

@EnricoMi
Owner

In file org.eclipse.xtext.ui.tests/target/surefire-reports/TEST-org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJdtImplTest.xml in artifact test-results-Linux-21 of workflow run https://github.com/eclipse/xtext/actions/runs/10538099413 the suite claims 11 tests and 1 failure, but there are only 10 <testcase>s. Looks like the last test is flaky and was rerun (which should not increment the tests counter).

Looking at [1], a <flakyFailure> indicates the first run failed, but the second succeeded. So this test case should not be considered a failure. If all runs of the test had failed, we would see a <failure> tag for the first run and a <rerunFailure> for each rerun.

The XML files are inconsistent with [1].

Footnotes

  [1]: https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html

@LorenzoBettini

We use maven-surefire, and for Eclipse tests (the flaky ones) we use tycho-surefire, which I think uses maven-surefire under the hood. So we don't do anything specific...

@HannesWell

HannesWell commented Aug 26, 2024

In file org.eclipse.xtext.ui.tests/target/surefire-reports/TEST-org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJdtImplTest.xml in artifact test-results-Linux-21 of workflow run https://github.com/eclipse/xtext/actions/runs/10538099413 the suite claims 11 tests and 1 failure, but there are only 10 <testcase>s. Looks like the last test is flaky and was rerun (which should not increment the tests counter).

Looking at [1], a <flakyFailure> indicates the first run failed, but the second succeeded. So this test case should not be considered a failure. If all runs of the test had failed, we would see a <failure> tag for the first run and a <rerunFailure> for each rerun.

Thank you for investigating this. And although it is correct that Xtext uses the tycho-surefire-plugin, that plugin 'only' calls the functionality of the maven-surefire-plugin from an Eclipse/OSGi runtime.

And indeed I can reproduce the described behavior with a very simple Maven project that only has the following test class:

package bar;
import static org.junit.Assert.assertEquals;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.Test;

public class FailOnFirstExecutionTest {
	private static final AtomicInteger EXECUTIONS = new AtomicInteger(0);
	@Test
	public void testName() throws Exception {
		assertEquals(2, EXECUTIONS.incrementAndGet());
	}
}
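
(Presumably the rerun passes because Surefire reruns the test in the same forked JVM, so the static AtomicInteger keeps its value across attempts.)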

and a very simple pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>foo</groupId>
	<artifactId>bar</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<maven.compiler.release>17</maven.compiler.release>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.13.2</version>
		</dependency>
	</dependencies>
	<build>
		<pluginManagement>
			<plugins>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-surefire-plugin</artifactId>
					<version>3.4.0</version>
					<configuration>
						<rerunFailingTestsCount>3</rerunFailingTestsCount>
					</configuration>
				</plugin>
			</plugins>
		</pluginManagement>
	</build>
</project>

The resulting test report XML has the described content:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report.xsd" version="3.0.2" name="bar.FailOnFirstExecutionTest" time="0.042" tests="2" errors="0" skipped="0" failures="1">
  <properties>
...
  </properties>
  <testcase name="testName" classname="bar.FailOnFirstExecutionTest" time="0.0">
    <flakyFailure message="expected:&lt;2&gt; but was:&lt;1&gt;" type="java.lang.AssertionError">
      <stackTrace><![CDATA[java.lang.AssertionError: expected:<2> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at bar.FailOnFirstExecutionTest.testName(FailOnFirstExecutionTest.java:14)
...
]]></stackTrace>
    </flakyFailure>
  </testcase>
</testsuite>

The XML files are inconsistent with [1].

I cannot see that this page makes any statements regarding the summary in the headline. But I agree with you that the content is indeed surprising. I would also expect the tests attribute to count the number of actual tests and not the number of runs, and I would expect failures/errors to only count eventual failures/errors.

I'll ask the Maven devs if this is intended. I don't see anything regarding flakes in JUnit so I assume it's a Maven-Surefire feature.

@michael-o

https://issues.apache.org/jira/browse/SUREFIRE-1751

@EnricoMi
Owner

EnricoMi commented Aug 28, 2024

I think the <testsuite> counts all flaky failures / errors in addition to test case outcomes:

<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report.xsd"
           version="3.0"
           name="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest"
           time="8.454" tests="8" errors="0" skipped="1" failures="1">
  <properties>
    ...
  </properties>
  <testcase name="testOnRemoveTwoProjects" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="0.177"/>
  <testcase name="testBug574908_Performance" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="3.674"/>
  <testcase name="testBug574908_Performance_500" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="0.0">
    <skipped message="Can be enabled for performance test purposes"/>
  </testcase>
  <testcase name="testGetStorages_Performance" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="3.771"/>
  <testcase name="testOnClasspathChange" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="0.452"/>
  <testcase name="testBug574908" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="0.107"/>
  <testcase name="testOnCloseOpenRemoveProject" classname="org.eclipse.xtext.ui.tests.core.resource.Storage2UriMapperJavaImplTest" time="0.152">
    <flakyFailure message="{} expected:&lt;1&gt; but was:&lt;0&gt;" type="java.lang.AssertionError">
      <stackTrace><![CDATA[...]]></stackTrace>
    </flakyFailure>
  </testcase>
</testsuite>

We have 7 <testcase>s, so you would expect <testsuite tests="7" ...>. Further, the failure is a flaky one, so the test case is considered successful. Still, the suite counts a failure. It looks like all attempts / runs of a test and their outcomes are aggregated into the <testsuite> counters, which is not what you would expect. The test result report / this action has to ignore the <testsuite> attributes and process the individual <testcase>s to distinguish successful from flaky tests.

I have a workaround to derive suite counts from test cases, which will make flaky tests count as successful tests and not over-count the number of tests.
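
As a rough sketch of that workaround (hypothetical Python helper, not the actual implementation), the counters can be recomputed from the <testcase> children, ignoring the attributes on <testsuite>:

import xml.etree.ElementTree as ET

def recomputed_suite_counts(report_path):
    # Derive tests/failures/errors/skipped from the <testcase> elements,
    # counting flaky tests (only <flaky*> children) as successes.
    root = ET.parse(report_path).getroot()
    tests = failures = errors = skipped = 0
    for case in root.iter("testcase"):
        tags = {child.tag for child in case}
        tests += 1
        if "failure" in tags:      # all attempts failed an assertion
            failures += 1
        elif "error" in tags:      # all attempts raised an exception
            errors += 1
        elif "skipped" in tags:
            skipped += 1
        # only <flakyFailure>/<flakyError> children: eventually passed
    return tests, failures, errors, skipped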

Supporting flaky tests natively will require extending JunitParser. Then this action will mark flaky tests as ❄️ instead of ✅.

@HannesWell

https://issues.apache.org/jira/browse/SUREFIRE-1751

Thanks for the pointer! But from the report I'm not sure if it's just about processing a given test-report XML or if it's about the creation of that file. I have the impression it's about the former.

I think the <testsuite> counts all flaky failures / errors in addition to test case outcomes:

We have 7 testcases, so you would expect <suite tests="7" ...>. Further, the failure is a flaky one, so the test case is considered successful. Still the suite counts a failure. Looks like all attempts / runs of a test and their outcomes are provided in the <suite>, which is not what you would expect. The test result report / this action has to ignore the <suite> and process the individual <testcase>s to distinguish successful from flaky tests.

Yes, I think that's exactly what's happening.

I have a workaround to derive suite counts from test cases, which will make flaky tests count as successful tests and not over-count the number of tests.

Yes, I wanted to suggest that as well, but first wanted to know whether the current content is intended or not.
Actually, the individual testcase elements only have to be inspected for real error or failure children if the errors or failures attribute of the testsuite element is non-zero. If both are zero, everything passed on the first run, the tests attribute is exactly as expected, and you could just rely on the summary.
Or is the entire XML parsed in any case anyway?
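
A sketch of that short-circuit (assuming a helper like the recomputed_suite_counts sketched above, and relying on the observed behaviour that flaky runs bump the failures / errors attributes):

import xml.etree.ElementTree as ET

def suite_counts(report_path):
    suite = ET.parse(report_path).getroot()  # <testsuite> is the root element
    if suite.get("failures") == "0" and suite.get("errors") == "0":
        # nothing failed on any attempt, so the summary attributes are reliable
        return int(suite.get("tests")), 0, 0, int(suite.get("skipped", "0"))
    return recomputed_suite_counts(report_path)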

Supporting flaky tests natively will require extending JunitParser. Then this action will mark flaky test as ❄️ instead of ✅.

That would be nice :) Thank you!

@michael-o

https://issues.apache.org/jira/browse/SUREFIRE-1751

Thanks for the pointer! But from the report I'm not sure if it's just about processing a given test-report XML or if it's about the creation of that file. I have the impression it's about the former.

Just wanted to share that this issue exists already in some form.
