rocket-3/cam: Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories

This is a dataset of open-source Java classes and some metrics on them. Every now and then I build a new version of it using the scripts in this repository. You are welcome to use it in your research. Each release has a fixed version, so by referring to it in your research you avoid ambiguity and guarantee the repeatability of your experiments.

The latest ZIP archive with the dataset is here: cam-2022-02-17.zip (532 MB). It is the result of analyzing the Java classes in 1000 GitHub repositories against 15 metrics: lines of code, lines of comments, and blank lines (reported by cloc); NCSS; cyclomatic complexity; number of attributes; number of static attributes; number of constructors; number of methods; number of static methods; total, maximum, minimum, and average cognitive complexity (based on PMD); and number of committers.
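The four cognitive-complexity metrics are class-level summaries of per-method values. As a minimal sketch, assuming PMD reports one cognitive complexity score per method and that a class with no methods gets zeros (both assumptions, not taken from the CaM scripts), the aggregation could look like this; `summarize_cognitive` is a hypothetical helper.

```python
from __future__ import annotations


def summarize_cognitive(method_scores: list[int]) -> dict[str, float]:
    """Aggregate per-method cognitive complexity into class-level metrics.

    Hypothetical helper: assumes one score per method (as reported by PMD)
    and, by assumption, zeros for a class that has no methods.
    """
    if not method_scores:
        return {"total": 0, "max": 0, "min": 0, "avg": 0.0}
    return {
        "total": sum(method_scores),
        "max": max(method_scores),
        "min": min(method_scores),
        "avg": sum(method_scores) / len(method_scores),
    }


# A class with three methods whose cognitive complexity is 1, 4 and 7
print(summarize_cognitive([1, 4, 7]))
# prints {'total': 12, 'max': 7, 'min': 1, 'avg': 4.0}
```

The sketch only shows how total, maximum, minimum, and average relate to each other; the real dataset may treat edge cases such as method-less classes differently.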

Previous archives:

cam-2021-08-04.zip (692 MB): 1000 repos, 15 metrics
cam-2021-07-08.zip (387 MB): 1000 repos, 11 metrics

To build a new dataset yourself, run the commands below (Docker must be installed). Here, TOTAL=1000 is the number of repositories to fetch from GitHub and XXX is your GitHub personal access token:

$ docker build --tag=cam .
$ docker run -d --rm -v "$(pwd):/w" -w /w \
  -e "TOKEN=XXX" -e "TOTAL=1000" -e "TARGET=/w/dataset" \
  cam "make -e"

The dataset will be created in the ./dataset directory, together with a .zip archive of it. This can take a long time, possibly a few days. The Docker container runs in the background (-d), so you can safely close the console and come back later; once the dataset is ready, the container is removed automatically (--rm).
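The TOKEN and TOTAL variables suggest that repositories are discovered through the GitHub API. As a rough illustration only, the sketch below builds (without sending) the GitHub REST search requests needed to fetch TOTAL Java repositories at 100 per page. The query string, the sort order, and the helper name `search_requests` are assumptions; the actual discovery logic in this repository may differ. GitHub's search API returns at most 1,000 results per query, so the sketch caps the page count accordingly.

```python
from __future__ import annotations

import math
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"
PER_PAGE = 100       # largest page size the GitHub search API accepts
SEARCH_LIMIT = 1000  # GitHub search returns at most 1,000 results per query


def search_requests(
    token: str, total: int, query: str = "language:java"
) -> list[tuple[str, dict[str, str]]]:
    """Build (url, headers) pairs for fetching `total` repositories.

    Hypothetical helper: the default query and the sort by stars are
    assumptions for illustration. No network calls are made here.
    """
    wanted = min(total, SEARCH_LIMIT)
    pages = math.ceil(wanted / PER_PAGE)
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    pairs = []
    for page in range(1, pages + 1):
        params = {
            "q": query,
            "sort": "stars",
            "order": "desc",
            "per_page": PER_PAGE,
            "page": page,
        }
        pairs.append((f"{SEARCH_URL}?{urlencode(params)}", headers))
    return pairs


for url, _ in search_requests("XXX", 250):
    print(url)
```

Each URL could then be fetched with the accompanying headers (for example via urllib.request); the sketch stops short of that so it runs offline.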

You can also run it without Docker:

$ make TOTAL=100

This should work as long as all the dependencies listed in the Dockerfile are installed.
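Before running make outside Docker, it may help to check that the required command-line tools are on your PATH. A minimal sketch follows; the tool list is an assumption for illustration (the authoritative list is in the Dockerfile), and `missing_tools` is a hypothetical helper built on the standard `shutil.which`.

```python
from __future__ import annotations

import shutil

# Assumed tool list for illustration only; consult the Dockerfile for the real one.
REQUIRED_TOOLS = ["git", "make", "cloc", "ruby", "python3"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the assumed binaries exist; it does not check their versions or any Python or Ruby packages the scripts need.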

