Frugal algorithm selection is an active learning approach that aims to reduce labelling cost by using only a subset of the training data, combined with a timeout predictor and a dynamic timeout configuration.
Below is an overview of the main folders in this repository and their contents:
- /DATASETS: Contains all datasets that were used in the research.
- /EXPERIMENT_LOGS: Includes Slurm output files of each implemented approach under various configurations.
- /EXPERIMENT_OUTPUTS: Includes output files of each implemented approach under various configurations.
- /PLOTS: Holds all plots that are included in the published paper and its appendix.
- Appendix.pdf: This file is supplementary material accompanying the paper. It provides additional data, analyses, and explanations that support the main text of the research.
Ensure you have Python 3.12 or higher installed on your machine. The dependencies are listed below and can also be found in the requirements.txt file:
- liac-arff==2.5.0
- matplotlib==3.8.0
- modAL-python==0.4.2.1
- numpy==1.26.1
- pandas==2.1.1
- scikit-learn==1.3.1
To install the required packages with the specified versions, run the following command:
pip install -r requirements.txt
This ensures you have the exact environment used in our studies, promoting consistency and reproducibility.
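After installing, it can be useful to confirm that the installed versions match the pins listed above. The following is a minimal sketch using only the Python standard library; the helper name find_mismatches is hypothetical and is not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above.
PINNED = {
    "liac-arff": "2.5.0",
    "matplotlib": "3.8.0",
    "modAL-python": "0.4.2.1",
    "numpy": "1.26.1",
    "pandas": "2.1.1",
    "scikit-learn": "1.3.1",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every unmet pin.

    Hypothetical helper for illustration; not part of the repository.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If the script prints nothing, every pinned package is installed at the expected version.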
Timeout Predictor On/Off: The timeout predictor can be activated or deactivated with the --timeout_predictor_usage parameter. When active, it is used in the voting mechanism and in instance selection.
Adjustable Timeout: The dynamic timeout increases during runtime based on performance on the validation set and can be adjusted through parameters to suit different experimental setups.
Customizable Ratio: Adjust the query size ratio through parameters to optimize the querying process in learning algorithms.
Uncertainty-based and Random: Supports different methods of instance selection including uncertainty-based (leveraging modAL) and random approaches.
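The features above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the repository relies on modAL for uncertainty-based selection, and the function names, the least-confidence measure, the rule that raises the timeout when the validation score does not improve, and the growth factor are all assumptions made for illustration.

```python
import random


def query_count(pool_size, query_size_ratio):
    """Number of instances to label per round (assumed: ratio of the pool, at least 1)."""
    if pool_size <= 0:
        return 0
    return min(pool_size, max(1, int(pool_size * query_size_ratio)))


def least_confident(probabilities, k):
    """Indices of the k instances whose most likely class has the lowest probability."""
    uncertainty = [1.0 - max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:k]


def random_query(pool_size, k, seed):
    """Indices of k instances drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)


def next_timeout(current, max_timeout, val_score, best_val_score, factor=2.0):
    """Assumed rule: keep the timeout if validation improved, otherwise grow it up to a cap."""
    if val_score > best_val_score:
        return current
    return min(max_timeout, current * factor)


if __name__ == "__main__":
    # Predicted class probabilities for four unlabelled instances (made-up values).
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
    k = query_count(len(probs), 0.5)
    print("uncertainty-based:", least_confident(probs, k))
    print("random:", random_query(len(probs), k, seed=0))
    print("timeout:", next_timeout(100, 3600, val_score=0.5, best_val_score=0.6))
```

The two selection functions share the same interface, so switching between uncertainty-based and random querying only changes which function picks the indices to label.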
To run uncertainty-based instance selection, which also covers the passive learning approaches, use the following command:
python Algorithm_Selection.py --dataset {dataset} --prefix {prefix} --weight {Weighted, No_Weight} --timeout_predictor_usage {Yes, No} --query_size {query_size} --seed {seed} --split {split}
To run random instance selection with the same configurable parameters, use the following command:
python Algorithm_Selection_Random_Query.py --dataset {dataset} --prefix {prefix} --weight {Weighted, No_Weight} --timeout_predictor_usage {Yes, No} --query_size {query_size} --seed {seed} --split {split}
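When many configurations need to be run, the commands above can be generated programmatically. The sketch below only assembles argument lists from the flags shown above; the helper name build_commands and the example values for dataset, prefix, split, query size, and seeds are hypothetical placeholders, not values taken from the repository.

```python
import itertools
import shlex


def build_commands(script, dataset, prefix, split, seeds, query_sizes,
                   weights=("Weighted", "No_Weight"), tp_usages=("Yes", "No")):
    """Build one argument list per combination of weight, predictor usage, query size, and seed."""
    commands = []
    for weight, tp_usage, query_size, seed in itertools.product(
            weights, tp_usages, query_sizes, seeds):
        commands.append([
            "python", script,
            "--dataset", str(dataset),
            "--prefix", str(prefix),
            "--weight", weight,
            "--timeout_predictor_usage", tp_usage,
            "--query_size", str(query_size),
            "--seed", str(seed),
            "--split", str(split),
        ])
    return commands


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    for cmd in build_commands("Algorithm_Selection.py", dataset="EXAMPLE_DATASET",
                              prefix="example_run", split=0,
                              seeds=[1, 2], query_sizes=[1]):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run directly or, as here, printed as a shell line for use in a Slurm job script.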
Further parameter details and their descriptions can be found within the code comments or the supplementary documentation provided in the repository.