
Automated Code Repair Data

'Redemption' Automated Code Repair Tool Copyright 2023, 2024 Carnegie Mellon University. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN 'AS-IS' BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. Licensed under a MIT (SEI)-style license, please see License.txt or contact [email protected] for full terms. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This Software includes and/or makes use of Third-Party Software each subject to its own license. DM23-2165

This is the data used by this project. Much of this data is referenced in our paper "Using Automated Code Repair to Fight Back the Deluge of False Positives". See the README.dataset.md file for details about that data.

About the Dataset

The dataset will be a Zip file with the following directory contents:

- data.publication
  - README.md: use data/README.dataset.md
  - Dockerfile.rosecheckers: for conveniently constructing a container to use with codebases and tools (NOTE: Do not use clang-tidy from this container; it is an older version than the one we used.)
  - Dockerfile.redemption: for conveniently constructing a container to use with clang-tidy version 15
  - codebases.yml
  - data: redemption/data directory minus some files noted below
  - paper/oss_frequency.csv: table redemption/paper/oss_frequency.csv
  - paper/tables: redemption/paper/tables directory minus some redemption/paper files noted below
  - code/analysis: from the redemption/code/analysis directory, only include the following files: cert_rules.2016.tsv, checkers.csv, my-gcc.sh, my-g++.sh, {clang_tidy,cppcheck,rosecheckers}2tsv.py
  - LICENSE.txt: license redemption/License.dataset.txt
  - ABOUT: per-file markings redemption/ABOUT.dataset
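The layout above can be written down as a copy plan. The following is only a sketch, not the script used to build the release: the names `DATASET_SOURCES`, `ANALYSIS_FILES`, and `plan_copies` are hypothetical, and the default root names `redemption` and `data.publication` are taken from the paths in the list. It only computes source and destination pairs; it copies nothing, and it does not apply the exclusions described under Other Files.

```python
from pathlib import PurePosixPath

# Dataset path -> source path inside the redemption repository, per the list above.
# Dockerfile.rosecheckers, Dockerfile.redemption, and codebases.yml are omitted
# because the list does not say where they are copied from.
# "data" and "paper/tables" are directories; their exclusions are not handled here.
DATASET_SOURCES = {
    "README.md": "data/README.dataset.md",
    "data": "data",
    "paper/oss_frequency.csv": "paper/oss_frequency.csv",
    "paper/tables": "paper/tables",
    "LICENSE.txt": "License.dataset.txt",
    "ABOUT": "ABOUT.dataset",
}

# The only code/analysis files to include, with the brace pattern expanded.
ANALYSIS_FILES = [
    "cert_rules.2016.tsv",
    "checkers.csv",
    "my-gcc.sh",
    "my-g++.sh",
] + [f"{tool}2tsv.py" for tool in ("clang_tidy", "cppcheck", "rosecheckers")]


def plan_copies(repo_root="redemption", out_root="data.publication"):
    """Return (source, destination) path pairs for the publication dataset."""
    repo, out = PurePosixPath(repo_root), PurePosixPath(out_root)
    pairs = [(str(repo / src), str(out / dst)) for dst, src in DATASET_SOURCES.items()]
    for name in ANALYSIS_FILES:
        pairs.append((str(repo / "code/analysis" / name),
                      str(out / "code/analysis" / name)))
    return pairs
```

Calling `plan_copies()` and printing the pairs gives a quick checklist to compare against the contents of the Zip file.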

Other Files

These are files and directories not included in the published dataset.

`accolade.zeek4.csv`

The accolade.zeek4.csv file contains information relevant to Zeek v4, which came from our collaborator. It is CUI-derived data, so it should not be published. The writeup said the following:

header: "CERT Guidelines Ranked by Effort Worthiness for This Project"

The raw table lives in accolade.csv. The tables in the paper contain this data reformatted to fit the page.

This table was generated manually from the "Excerpt of Per-CERT-Rule Alert Counts and Related Data for Tools and Codebases Used" table. Each rule that had a non-empty rank column was added to this table. This table also coalesces information about zeek4 from our collaborator's data (which is CUI and not provided).

`zeek4` directory

Data related to zeek4. CUI...comes from Brandon.

`zeek5` directory

Data related to zeek5. CUI...comes from Brandon.

`scan-build` directory

Data related to the scan-build SA tool. Format similar to clang-tidy, rosecheckers, cppcheck. Scan-build turned out to be less useful than clang-tidy.

`test` directory

Data and scripts related to testing our ACR tool on git and zeek. Not useful for our paper.

`join_pivot.sql`

IIRC, I used this script to join some of the pivot tables when creating all_alerts.csv.

LaTeX and IEEE-specific files

We exclude the LaTeX, IEEE, and figure files that were used for the paper from the dataset release. From the redemption/paper directory, the dataset excludes the files accolade.org, IEEEtran.bst, IEEEtran.cls, makefile, mathmode-spacing.tex, paper.md, paper.tex, and refs.bib. It also excludes all files from the redemption/figs directory.
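The exclusion rule in this section can be expressed as a small predicate. This is a sketch under assumptions: the names `EXCLUDED_PAPER_FILES` and `is_excluded` are hypothetical, paths are taken to be relative to the redemption directory with forward slashes, and only files directly inside paper/ are matched against the list.

```python
from pathlib import PurePosixPath

# File names in redemption/paper that this section lists as excluded.
EXCLUDED_PAPER_FILES = frozenset({
    "accolade.org",
    "IEEEtran.bst",
    "IEEEtran.cls",
    "makefile",
    "mathmode-spacing.tex",
    "paper.md",
    "paper.tex",
    "refs.bib",
})


def is_excluded(rel_path):
    """Return True if a path relative to redemption/ is left out of the dataset."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    # Everything under redemption/figs is excluded.
    if parts[0] == "figs":
        return True
    # Only the listed files directly inside redemption/paper are excluded.
    return len(parts) == 2 and parts[0] == "paper" and parts[1] in EXCLUDED_PAPER_FILES
```

For example, `is_excluded("paper/paper.tex")` is true, while `is_excluded("paper/oss_frequency.csv")` is false, which matches the dataset contents listed in About the Dataset.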

Delete these files from the dataset for publication

Since the one README file needed is in the top-level directory of the publication dataset, you should make sure these files are deleted from the publication dataset: data/README.md and data/README.dataset.md
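A minimal sketch of that cleanup step, assuming the publication dataset has been staged in a local directory before zipping. The function name `remove_redundant_readmes` and the `dataset_root` parameter are hypothetical; only the two file paths come from the text above.

```python
from pathlib import Path

# The two README files that should not remain in the publication dataset.
REDUNDANT_READMES = ("data/README.md", "data/README.dataset.md")


def remove_redundant_readmes(dataset_root):
    """Delete the redundant README files and return the relative paths removed."""
    removed = []
    for rel in REDUNDANT_READMES:
        target = Path(dataset_root) / rel
        if target.is_file():  # skip files that are already gone
            target.unlink()
            removed.append(rel)
    return removed
```

Running it a second time returns an empty list, so it is safe to call as a final check before creating the Zip file.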
