rajatsen91/mimic_classify

mimic_classify

Mimic & Classify CI Test

This is an implementation of the paper https://arxiv.org/abs/1806.09708. It also uses the code base of the CCIT paper: https://arxiv.org/abs/1709.06138

Dependencies:

  1. PyTorch, with CUDA support if you have a GPU (follow the instructions on their website)
  2. scikit-learn
  3. CCIT (mentioned above)
  4. pandas
  5. numpy

Please cite the above papers if this package is used in any publication.

There are two CI testers: one uses a CGAN as the mimic function, and the other uses a regression-based mimic function. The parameters to be specified are as follows:

Base class for CI testing. Not all of the parameters are used by both the GAN and the regression tester.

X,Y,Z: Arrays for input random variables

max_depths: max_depth parameter choices for xgboost e.g [6,10,13]

n_estimators: n_estimator parameter choices for xgboost e.g [100,200,300]

colsample_bytrees: colsample_bytree parameter choices for xgboost, each a fraction in (0, 1], e.g. [0.6,0.8,1.0]

nfold: cross validation number of folds

train_samp: percentage of samples to be used for training; -1 selects the default (recommended)

nthread: number of parallel threads for xgboost, recommended as number of processors in the machine

max_epoch: number of epochs when the mimic function is a GAN

bsize: batch size when mimic function is GAN or when using a deep regressor for mimifyREG

dim_N: dimension of the noise when the mimic function is a GAN. If None, it is set to dim_z + 1; it can also be set to a moderate value like 20

noise: type of noise for the regression mimic function: 'Laplace', 'Normal' or 'Mixture'

perc: percentage of mixture Normal for noise type 'Mixture'

normalized: whether to normalize the data-set. The recommended setting is True for MIMIFY_REG; either setting works for the GAN.

deep: bool argument for mimifyREG. If True, it uses a deep network for regression; otherwise it uses xgb.

deep_classifier: whether the classifier is a deep model or xgboost. Set this argument to True to use a deep model.

params: parameters for deep classifier. Example: {'nhid':20,'nlayers':5,'dropout':0.2} means 5 layers each with 20 neurons and train dropout of 0.2.
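The noise and perc options above can be pictured with a small sampler. This is only a sketch using the standard library, not the package's code: the helper name sample_noise and the scale argument are hypothetical, and it assumes that 'Mixture' draws from the Normal component with probability perc and from a Laplace component otherwise.

```python
import random

def sample_noise(noise="Normal", perc=0.5, scale=1.0, rng=None):
    """Draw one zero-mean noise value (hypothetical helper, for illustration only)."""
    rng = rng or random.Random()

    def laplace():
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    if noise == "Normal":
        return rng.gauss(0.0, scale)
    if noise == "Laplace":
        return laplace()
    if noise == "Mixture":
        # Assumption: `perc` is the probability of the Normal component.
        return rng.gauss(0.0, scale) if rng.random() < perc else laplace()
    raise ValueError("noise must be 'Laplace', 'Normal' or 'Mixture'")

rng = random.Random(0)
draws = [sample_noise("Mixture", perc=0.3, rng=rng) for _ in range(5)]
print(draws)
```

Passing an explicit random.Random instance keeps the draws reproducible across runs.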

For regular use we recommend deep=False and deep_classifier=False. The deep-model options are still being prototyped.
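As a rough illustration of the parameters described above, the settings can be collected in one place before a test is run. The dictionary below is a sketch, not the package's API: the values for nfold, max_epoch and bsize are placeholders, and how the settings are passed to the testers should be taken from the examples notebook. The three xgboost choice lists define a grid of candidate configurations, which is enumerated here only to show its size.

```python
import os
from itertools import product

# Hypothetical settings that follow the parameter names in this README.
settings = {
    "max_depths": [6, 10, 13],
    "n_estimators": [100, 200, 300],
    "colsample_bytrees": [0.6, 0.8, 1.0],  # xgboost expects fractions in (0, 1]
    "nfold": 5,                            # placeholder value
    "train_samp": -1,                      # default, as recommended
    "nthread": os.cpu_count() or 1,        # number of processors in the machine
    "max_epoch": 100,                      # placeholder, GAN only
    "bsize": 50,                           # placeholder
    "dim_N": None,                         # GAN only; None means dim_z + 1
    "noise": "Laplace",
    "perc": 0.3,                           # only used when noise is 'Mixture'
    "normalized": True,
    "deep": False,                         # recommended for regular use
    "deep_classifier": False,              # recommended for regular use
}

# Every combination of the three xgboost choice lists.
grid = list(product(settings["max_depths"],
                    settings["n_estimators"],
                    settings["colsample_bytrees"]))
print(len(grid), "candidate xgboost configurations")
```

With three choices per list the grid has 3 x 3 x 3 = 27 entries, so shorter lists keep cross validation cheap.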

The usage of both testers on synthetic data-sets can be seen in the IPython notebook named examples. The file run_mimify_reg.py provides command-line functionality to run mimify_reg from a structured folder. One such folder, with data files in .npy format, is provided with the repository. An example of running this command is provided in example.sh. For mimifyGAN the same functionality is provided by run_mimify_GAN.py.

The default setting is use_cuda = False in all relevant files, which means that no GPU speed-up is used. If you have PyTorch with CUDA support, you need to set use_cuda = True. To do this, go to the src folder and run the following:

python change_use_cuda.py -dr 0

To change back to use_cuda = False, go to the src folder again and run the following:

python change_use_cuda.py -dr 1
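Before switching use_cuda to True, it can help to confirm that the installed PyTorch actually detects a CUDA device. The check below is a small sketch that is independent of this repository; the function name cuda_available is hypothetical, and it simply reports False when PyTorch is not installed.

```python
def cuda_available():
    """Return True only if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()

print("use_cuda can be enabled:", cuda_available())
```

If this prints False, leaving use_cuda = False is the safer choice.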

The file datagen.py in the /src folder has functions to generate the synthetic data-sets used in the paper.
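To give a feel for the kind of X, Y, Z inputs a CI tester receives, the sketch below generates a toy data-set in which X and Y are either conditionally independent given Z or not. It is not taken from datagen.py and does not reproduce the data-sets of the paper; the function name make_dataset, the cosine links and the noise level are all assumptions made for illustration.

```python
import math
import random

def make_dataset(n, dependent=False, seed=0):
    """Toy one-dimensional X, Y, Z samples (hypothetical, not from datagen.py)."""
    rng = random.Random(seed)
    X, Y, Z = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # X and Y are noisy functions of Z with independent noise,
        # so X and Y are conditionally independent given Z.
        x = math.cos(z) + 0.25 * rng.gauss(0.0, 1.0)
        y = math.cos(2.0 * z) + 0.25 * rng.gauss(0.0, 1.0)
        if dependent:
            # Adding a term in x makes Y depend on X even after conditioning on Z.
            y += 0.5 * x
        X.append(x)
        Y.append(y)
        Z.append(z)
    return X, Y, Z

X_ci, Y_ci, Z_ci = make_dataset(1000, dependent=False)
X_dep, Y_dep, Z_dep = make_dataset(1000, dependent=True)
print(len(X_ci), len(Y_dep))
```

The lists can be converted to arrays with numpy before being handed to a tester.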
