ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems

The scripts in this repo fall into two parts: dataset preprocessing and running feature selection (FS).

You can also download the preprocessed datasets from Hugging Face: ERASE_Dataset.

Please note that all of the following scripts must be run from the root directory of the project.

Package Requirements

torch
pandas
numpy
nni
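
A quick way to confirm that the packages above are available before running anything is sketched below. This is only an illustrative check, not part of the repository; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Package names taken from the requirements list above.
REQUIRED = ["torch", "pandas", "numpy", "nni"]


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```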

File Structure

- checkpoints
- checkpoints_for_retrain
- data
    - avazu
        - preprocessed_avazu.csv # put your data here
    - criteo
        - preprocessed_criteo.csv # put your data here
    - movielens-1m
    - aliccp
    - preprocess.py # preprocess script
- nni
    - search_spaces
        - fs
            - specific-method.json # the hyperparameter search space for each method in fs
        - config.json # hyperparameters for general training, e.g., number of selected fields, learning rate
- notebooks # some test notebooks
- utils
    - datasets.py # read datasets
    - fs_trainer.py # trainer for feature selection
    - utils # some functions
- fs_run.py # main script to run feature selection
- nni_tune.py # run the nni tune
- requirements.txt # Python libraries needed for this repository
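
The layout above implies where each preprocessed CSV is expected to live. The sketch below derives that path and reports whether the file exists. It assumes the `data/<dataset>/preprocessed_<dataset>.csv` pattern shown for avazu and criteo; the helper name `expected_data_path` is hypothetical and not part of the repository.

```python
from pathlib import Path


def expected_data_path(dataset, data_path="data/"):
    """Build the expected location of a preprocessed CSV.

    Assumes the pattern shown in the file structure above:
    <data_path>/<dataset>/preprocessed_<dataset>.csv
    """
    return Path(data_path) / dataset / f"preprocessed_{dataset}.csv"


if __name__ == "__main__":
    # Run from the project root so the relative paths resolve correctly.
    for name in ("avazu", "criteo"):
        path = expected_data_path(name)
        status = "found" if path.is_file() else "missing"
        print(f"{path.as_posix()}: {status}")
```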

Dataset Preprocess

python data/preprocess.py --dataset=[avazu/criteo] --data_path=[default is data/]

Run FS & ES

Parameters in fs_run.py

dataset: (avazu/criteo)
model: backbone model (mlp)
fs: feature selection method (no_selecion/autofield/adafs/optfs/gbdt/lasso/gbr/pca)
seed: random seed (a specific number, or 0 for a random seed)
device: cuda or cpu
data_path: your data path (default is data/)
batch_size
dataset_shuffle: (True or False)
embedding_dim: embedding size (default is 8)
train_or_search: whether to run the train/search stage (True/False)
retrain: whether to run the retrain stage (True/False)
k: number of selected fields (specific number)
learning_rate
epoch: training epoch (default 100)
patience: patience of the early stopper (default 3)
num_workers: number of workers in the dataloader (default 32)
nni: whether to use nni to tune hyperparameters (default False)
rank_path: if you only want to retrain, specify the path of the feature rank file
read_feature_rank: whether to use a pre-saved feature rank
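
To make the parameter list concrete, here is a minimal `argparse` sketch that mirrors a subset of the options above. It is not the parser used in fs_run.py. Defaults that the list states (data_path, embedding_dim, epoch, patience, num_workers, nni) are copied from it; every other default, and the `str2bool` helper, is an assumption made for illustration.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' strings, since flags are passed as --flag=True."""
    lowered = str(value).lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative mirror of the listed options")
    parser.add_argument("--dataset", choices=["avazu", "criteo"])
    parser.add_argument("--model", default="mlp")
    parser.add_argument("--fs", help="feature selection method")
    parser.add_argument("--seed", type=int, default=0)  # assumed default; 0 means random
    parser.add_argument("--data_path", default="data/")
    parser.add_argument("--embedding_dim", type=int, default=8)
    parser.add_argument("--train_or_search", type=str2bool, default=True)  # assumed default
    parser.add_argument("--retrain", type=str2bool, default=True)  # assumed default
    parser.add_argument("--k", type=int, help="number of selected fields")
    parser.add_argument("--epoch", type=int, default=100)
    parser.add_argument("--patience", type=int, default=3)
    parser.add_argument("--num_workers", type=int, default=32)
    parser.add_argument("--nni", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--dataset=avazu", "--fs=autofield", "--train_or_search=True", "--retrain=False"]
    )
    print(args)
```

Parsing booleans through a converter rather than `action="store_true"` matches the `--flag=True` style used in the commands in this README.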

Feature Selection

python fs_run.py --model=[model_name] --fs=[feature_selection_method] --train_or_search=True --retrain=True
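
When comparing several methods, it can help to generate one command per method instead of typing each by hand. The sketch below only builds and prints the commands. The method names come from the parameter list above, the flags are the ones shown in this README, and the helper name `fs_command` and the default dataset are assumptions for illustration.

```python
import shlex
import sys

# Feature selection methods taken from the parameter list above.
FS_METHODS = ["autofield", "adafs", "optfs", "gbdt", "lasso", "gbr", "pca"]


def fs_command(fs, model="mlp", dataset="avazu", python=sys.executable):
    """Build the argument list for one fs_run.py invocation."""
    return [
        python,
        "fs_run.py",
        f"--dataset={dataset}",
        f"--model={model}",
        f"--fs={fs}",
        "--train_or_search=True",
        "--retrain=True",
    ]


# One command per method; nothing is executed here.
commands = [fs_command(fs) for fs in FS_METHODS]

if __name__ == "__main__":
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually launch a run, each list can be passed to `subprocess.run(cmd, check=True)` from the project root.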

More experimental results

Overall experimental results of feature selection for deep recommender systems.
Experimental results on more backbone models with different numbers of selected features on Avazu.

Experimental results on more backbone models with different numbers of selected features on Criteo.
