Exploring Hyperparameter Usage and Tuning in Machine Learning Research

This repository provides additional material for our paper: "Exploring Hyperparameter Usage and Tuning in Machine Learning Research", including metadata for the research papers and the statistics of the analyzed code repositories.

API Crawler

We developed an API crawler for three widely used ML libraries: scikit-learn, TensorFlow, and PyTorch. The API crawler can be found here.

Code Repository Analysis

We developed a plugin for each ML library. The plugins apply static code analysis as well as control- and data-flow analysis to locate API calls from the corresponding library and extract their configuration settings. The plugins are integrated into the CfgNet; its implementation can be found here. Our analysis script relies on the CfgNet and assumes that it is running on our Slurm cluster if the hostname is tesla or starts with brown. You can find our evaluation script in analysis/.
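The hostname rule described above can be sketched as follows. This is a minimal illustration, not the code of our analysis script: the function name is_cluster_host is hypothetical, and only the two conditions (hostname equals tesla, or starts with brown) come from this README.

```python
import socket


def is_cluster_host(hostname: str) -> bool:
    """Return True if the hostname matches the Slurm cluster rule from the README.

    Hypothetical helper: the hostname is exactly "tesla" or starts with "brown".
    """
    return hostname == "tesla" or hostname.startswith("brown")


# Check the machine this snippet runs on.
print(is_cluster_host(socket.gethostname()))
```

On any other machine the check returns False, so you can use it to see in advance whether the analysis would assume a Slurm environment.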

You can start the analysis by running run.sh. It takes one optional parameter, a Git tree-ish (e.g. main), which selects the version of the CfgNet to use.

This analysis requires the ml branch of the CfgNet, because only this branch contains the ML library plugins that extract the API calls.
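As a rough sketch of how the optional parameter is passed, the helper below builds the argument list for run.sh. The name build_run_command is hypothetical, and the snippet assumes that run.sh is in the current working directory; it only prints the command instead of executing it.

```python
from typing import List, Optional


def build_run_command(tree_ish: Optional[str] = None) -> List[str]:
    """Build the argument list for run.sh (hypothetical helper).

    The Git tree-ish is appended only when one is given, mirroring the
    optional parameter described in this README.
    """
    command = ["./run.sh"]  # assumption: run.sh is in the current directory
    if tree_ish:
        command.append(tree_ish)
    return command


# Request the ml branch of the CfgNet, which this analysis requires.
print(" ".join(build_run_command("ml")))
```

The resulting list can be passed to subprocess.run(..., check=True) to launch the analysis from Python.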

The result files will be in results/. You can find the modified repositories in out/.

Data and Scripts

The data/ directory contains all the data used in this paper, while the src/ directory contains all scripts used to process the data.

data/dblp/: contains the data crawled from the DBLP digital bibliography
data/library_data/: contains the API data of the ML libraries
data/paper_analysis/: contains the metadata for each paper
data/statistics/: contains the statistics for the analyzed code repositories
src/cross_validation/: contains the script to calculate the inter-annotator agreement
src/dblp_results/: contains the script to calculate the number of papers dealing with hyperparameter importance and tuning
src/library_stats/: contains the script to calculate the total number of API calls and parameters for each library
src/repos/: contains the script to identify suitable code repositories from the Papers with Code corpus
src/: contains the scripts to process the data extracted from the code repositories and the respective research papers
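
To verify that a local checkout matches the layout listed above, a small check along these lines may help. It is only a sketch: the names DOCUMENTED_DIRS and missing_directories are hypothetical, and the directory list is copied from this README rather than read from the repository.

```python
from pathlib import Path
from typing import List

# Directories documented in this README (copied by hand, not discovered).
DOCUMENTED_DIRS = [
    "data/dblp",
    "data/library_data",
    "data/paper_analysis",
    "data/statistics",
    "src/cross_validation",
    "src/dblp_results",
    "src/library_stats",
    "src/repos",
]


def missing_directories(root: Path) -> List[str]:
    """Return the documented directories that do not exist under root."""
    return [name for name in DOCUMENTED_DIRS if not (root / name).is_dir()]


# Assumption: the snippet is run from the repository root.
for name in missing_directories(Path(".")):
    print(f"missing: {name}")
```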
