This is the experimental package of the paper "What is the Vocabulary of Flaky Tests? An Extended Replication", submitted for publication at ICPC 2021 - Replications and Negative Results (RENE).


Bruno Henrique Pachulski Camara 1, 2,
Marco Aurélio Graciotto Silva 3,
André T. Endo 4,
Silvia Regina Vergilio 2.

1 Centro Universitário Integrado, Campo Mourão, PR, Brazil
2 Department of Computer Science, Federal University of Paraná, Curitiba, PR, Brazil
      [email protected], [email protected]
3 Department of Computing, Federal University of Technology - Paraná, Campo Mourão, PR, Brazil
      [email protected]
4 Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
      [email protected]

This paper has been submitted for publication in ICPC 2021 - Replications and Negative Results (RENE).

This experimental package is organized by research question. For each question, files can be executed to reproduce the data presented in the paper.

Abstract

Software systems have continuously evolved and been delivered with high quality due to the widespread adoption of automated tests. A recurring issue that hurts this scenario is the presence of flaky tests: test cases that may pass or fail non-deterministically. A promising approach, though one still lacking empirical evidence, is to collect static data from automated tests and use it to predict their flakiness. In this paper, we conducted an empirical study to assess the use of code identifiers to predict test flakiness. To do so, we first replicated most parts of the previous study by Pinto et al. (MSR 2020). We extended this replication by using a different ML Python platform (Scikit-learn) and adding different learning algorithms to the analyses. Then, we validated the performance of the trained models using datasets with other flaky tests and from different projects. We successfully replicated the results of Pinto et al. (2020), with minor differences using Scikit-learn; the additional algorithms performed similarly to the ones used previously. Concerning the validation, we noticed that the recall of the trained models was smaller, and the classifiers presented a varying range of decreases. This was observed in both intra-project and inter-project test flakiness prediction.
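To illustrate the kind of pipeline the study evaluates, the sketch below trains a vocabulary-based flakiness classifier with Scikit-learn. It is a minimal illustration only: the toy token streams, the bag-of-words vectorizer settings, and the choice of random forest are placeholder assumptions, not the exact configuration or datasets used in the paper.

```python
# Minimal sketch of vocabulary-based flakiness prediction, assuming a
# dataset of (test source tokens, flaky/non-flaky label) pairs.
# All data and parameter choices here are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical toy data: identifier streams extracted from test bodies.
test_bodies = [
    "thread sleep await timeout assert response",
    "assert equals parse json fixture",
    "network socket connect retry assert status",
    "assert true list size add element",
]
labels = [1, 0, 1, 0]  # 1 = flaky, 0 = non-flaky

# Treat the code identifiers as a bag-of-words vocabulary.
X = CountVectorizer().fit_transform(test_bodies)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Report precision/recall, the metrics the validation discussion refers to.
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```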

Keywords: test flakiness, regression testing, replication studies, machine learning

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
