
Wide and Deep Large Dataset training

This document has instructions for training Wide and Deep with a large dataset using Intel-optimized TensorFlow.

Dataset

The Large Kaggle Display Advertising Challenge Dataset will be used for training Wide and Deep. The data is from Criteo and has a field indicating if an ad was clicked (1) or not (0), along with integer and categorical features.

Download the Large Kaggle Display Advertising Challenge Dataset from Criteo Labs to $DATASET_DIR. If the evaluation/training datasets are not available at the above link, they can be downloaded as follows:

 export DATASET_DIR=<location where dataset files will be saved>
 mkdir $DATASET_DIR && cd $DATASET_DIR
 wget https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/eval.csv
 wget https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/train.csv

The DATASET_DIR environment variable will be used as the dataset directory when running quickstart scripts.
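Before training, it can help to confirm the download succeeded. Below is a minimal sketch (the `check_criteo_files` helper is hypothetical, not part of the quickstart scripts) that verifies both CSV files exist and are non-empty:

```shell
# Hypothetical helper (not part of the Model Zoo): verify the Criteo CSVs
# exist and are non-empty before starting training.
check_criteo_files() {
  dir="$1"
  for f in train.csv eval.csv; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${f}"
      return 1
    fi
  done
  echo "dataset files look OK"
}

# Only run the check if DATASET_DIR has been set.
if [ -n "${DATASET_DIR:-}" ]; then
  check_criteo_files "$DATASET_DIR"
fi
```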

Quick Start Scripts

| Script name | Description |
| ----------- | ----------- |
| training_check_accuracy.sh | Trains the model for a specified number of steps (default is 500) and then compares the accuracy against the accuracy specified in the TARGET_ACCURACY env var (for example, export TARGET_ACCURACY=0.75). If the accuracy is not met, the script exits with error code 1. The CHECKPOINT_DIR environment variable can optionally be defined to resume training from a previous set of checkpoints. |
| training.sh | Trains the model for 10 epochs. The CHECKPOINT_DIR environment variable can optionally be defined to resume training from a previous set of checkpoints. |
| training_demo.sh | A short demo run that trains the model for 100 steps. |
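For example, the accuracy-check script is configured through environment variables. The values below are illustrative, and the `if` guard at the end is only there so the snippet is safe to source outside a Model Zoo checkout:

```shell
# Illustrative values; both variables are read by training_check_accuracy.sh.
export TARGET_ACCURACY=0.75   # script exits with code 1 if accuracy is below this
export STEPS=500              # optional; 500 is the default number of steps

SCRIPT=./quickstart/recommendation/tensorflow/wide_deep_large_ds/training/cpu/training_check_accuracy.sh
if [ -x "$SCRIPT" ]; then
  "$SCRIPT"
fi
```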

Run the model

Set up your environment using the instructions below, depending on whether you are using AI Kit:


To run using AI Kit you will need:

  • numactl
  • wget
  • Activate the `tensorflow` conda environment
    conda activate tensorflow

To run without AI Kit you will need:

  • Python 3
  • intel-tensorflow>=2.5.0
  • numactl
  • git
  • wget
  • A clone of the Model Zoo repo
    git clone https://github.com/IntelAI/models.git
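A quick way to confirm the prerequisites above are present (this check is a sketch, not part of the official setup; the pip install hint is an assumption based on the package list above):

```shell
# Check Python 3 and TensorFlow; report any missing command-line tools.
python3 --version
python3 -c "import tensorflow as tf; print(tf.__version__)" 2>/dev/null \
  || echo "TensorFlow not found; try: pip install 'intel-tensorflow>=2.5.0'"
for tool in numactl git wget; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
```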

After the setup is complete, set environment variables for the path to your dataset directory and an output directory where logs will be written. You can optionally provide a directory where checkpoint files will be read and written. Navigate to your Model Zoo directory, then select a quickstart script to run. Note that some quickstart scripts use other environment variables in addition to the ones below, such as STEPS and TARGET_ACCURACY for the training_check_accuracy.sh script.

# cd to your model zoo directory
cd models

export DATASET_DIR=<path to the dataset directory>
export PRECISION=fp32
export OUTPUT_DIR=<path to the directory where the logs and the saved model will be written>
export CHECKPOINT_DIR=<Optional directory where checkpoint files will be read and written>
# For a custom batch size, set env var `BATCH_SIZE` or it will run with a default value.
export BATCH_SIZE=<customized batch size value>

./quickstart/recommendation/tensorflow/wide_deep_large_ds/training/cpu/<script name>.sh
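Putting it together, a full invocation might look like the following. The paths and batch size are placeholders you must adjust, and the `if` guard at the end only keeps the snippet safe to run outside a Model Zoo checkout:

```shell
export DATASET_DIR=$HOME/criteo_large      # placeholder path
export PRECISION=fp32
export OUTPUT_DIR=$HOME/wide_deep_output   # placeholder path
export BATCH_SIZE=512                      # optional; omit to use the default

SCRIPT=./quickstart/recommendation/tensorflow/wide_deep_large_ds/training/cpu/training_demo.sh
if [ -x "$SCRIPT" ]; then
  "$SCRIPT"
fi
```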

Additional Resources