The PreCog data set used for training is hosted by Stanford.
To host the files on Google Cloud Storage:
TRAIN_FILE=train.csv
EVAL_FILE=eval.csv
GCS_TRAIN_FILE=gs://gs_bucket/train.csv
GCS_EVAL_FILE=gs://gs_bucket/eval.csv
gsutil cp $TRAIN_FILE $GCS_TRAIN_FILE
gsutil cp $EVAL_FILE $GCS_EVAL_FILE
The training file has the following layout (N rows, p expression features):
- [N, 0:p] expression matrix
- [N, p+1] survival information
- [N, p+2] censor information
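As a concrete illustration, a training file with this layout could be split into its three parts as follows (a minimal sketch assuming comma-separated values with no header row; the file name is the one set above):

import numpy as np

# Load the full training matrix: p expression columns followed by
# the survival and censor columns (layout described above).
data = np.loadtxt("train.csv", delimiter=",")
p = data.shape[1] - 2           # number of expression features
expression = data[:, :p]        # [N, 0:p]  expression matrix
survival = data[:, p]           # [N, p+1]  survival information
censor = data[:, p + 1]         # [N, p+2]  censor information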
The preprocessing script selects the features listed in a features file from the TCGA expression data, and creates the training and evaluation files:
python preprocess.py --expression-file "experiment/data/tcga_sample/expression.tsv" \
                     --survival-file "experiment/data/tcga_sample/survival.tsv" \
                     --features-file "experiment/data/genes.tide.txt"
The files should be in .tsv format (tab-separated values).
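Conceptually, the feature-selection step amounts to something like the sketch below (a hypothetical illustration using pandas; preprocess.py is the authoritative implementation, and the genes-as-rows orientation of the expression matrix is an assumption):

import pandas as pd

# Read the tab-separated expression matrix; genes are assumed to be the row index.
expression = pd.read_csv("experiment/data/tcga_sample/expression.tsv",
                         sep="\t", index_col=0)

# Keep only the genes listed in the features file.
with open("experiment/data/genes.tide.txt") as f:
    features = [line.strip() for line in f if line.strip()]
selected = expression.loc[expression.index.intersection(features)]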
Virtual environments are strongly suggested, but not required. Installing this sample's dependencies in a new virtual environment allows you to run the sample without changing the global Python packages on your system.
There are two options for the virtual environments:
- Install virtualenv
  - Create a virtual environment: virtualenv census_keras
  - Activate the environment: source census_keras/bin/activate
- Install Miniconda
  - Create a conda environment: conda create --name census_keras python=2.7
  - Activate the environment: source activate census_keras
- Install gcloud
- Install the Python dependencies:
pip install --upgrade -r requirements.txt
You can run the Keras code locally. A sample local run, assuming $DATA_DIR points at your data directory:
TRAIN_STEPS=35
BATCH_SIZE=256
TRAIN_FILE="$DATA_DIR/TrainingData.txt"
EVAL_FILE="$DATA_DIR/EvalData.txt"
VALIDATION_FILE="$DATA_DIR/TestData.txt"
LEARNING_RATE=0.0003
JOB_DIR=surv_keras
python -m trainer.task --train-files $TRAIN_FILE \
--eval-files $EVAL_FILE \
--validation-files $VALIDATION_FILE \
--job-dir $JOB_DIR \
--train-steps $TRAIN_STEPS \
--learning-rate $LEARNING_RATE \
--num-epochs 100 \
--early-stop 10 \
--train-batch-size $BATCH_SIZE
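The --early-stop flag presumably maps onto a Keras EarlyStopping callback with the given patience. A minimal sketch of the equivalent call, where model, x_train, y_train, x_val, and y_val stand in for the trainer's own objects:

from keras.callbacks import EarlyStopping

# Stop training once validation loss fails to improve for 10 consecutive
# epochs, mirroring --early-stop 10 above (assumed behavior).
early_stop = EarlyStopping(monitor="val_loss", patience=10)
model.fit(x_train, y_train,
          batch_size=256,                  # --train-batch-size
          epochs=100,                      # --num-epochs
          validation_data=(x_val, y_val),
          callbacks=[early_stop])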
You can also run Keras training locally using gcloud:
JOB_DIR=surv_keras
TRAIN_STEPS=200
gcloud ml-engine local train --package-path trainer \
--module-name trainer.task \
-- \
--train-files $TRAIN_FILE \
--eval-files $EVAL_FILE \
--job-dir $JOB_DIR \
--train-steps $TRAIN_STEPS
You can run prediction locally on the SavedModel exported from the Keras HDF5 model. First, create a processed sample from the data:
python preprocess.py sample.json
Then run local prediction:
gcloud ml-engine local predict --model-dir=$JOB_DIR/export \
--json-instances sample.json
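preprocess.py produces that file; conceptually each line of sample.json is one JSON instance, here assumed to be a bare expression vector (a hedged sketch, not the script's actual code):

import json
import numpy as np

# Take one row of the evaluation data, drop the trailing survival and
# censor columns, and write it out as a single JSON instance.
row = np.loadtxt("eval.csv", delimiter=",")[0]
with open("sample.json", "w") as f:
    f.write(json.dumps(row[:-2].tolist()) + "\n")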
You can train the model on Cloud ML Engine, where $JOB_NAME is a unique name for the job and $JOB_DIR points to a Cloud Storage location:
gcloud ml-engine jobs submit training $JOB_NAME \
--stream-logs \
--runtime-version 1.4 \
--job-dir $JOB_DIR \
--package-path trainer \
--module-name trainer.task \
--region us-central1 \
-- \
--train-files $GCS_TRAIN_FILE \
--eval-files $GCS_EVAL_FILE \
--train-steps $TRAIN_STEPS
You can perform prediction on Cloud ML Engine by following the steps below.
Create a model on Cloud ML Engine:
gcloud ml-engine models create keras_model --regions us-central1
Point to the exported model binaries:
MODEL_BINARIES=$JOB_DIR/export
Deploy the model to the prediction service (the runtime version should match the one used for training):
gcloud ml-engine versions create v1 --model keras_model --origin $MODEL_BINARIES --runtime-version 1.4
Create a processed sample from the data:
python preprocess.py sample.json
Run the online prediction:
gcloud ml-engine predict --model keras_model --version v1 --json-instances sample.json
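The same online prediction can also be issued programmatically through the Cloud ML Engine REST API. A sketch using the google-api-python-client library, where "PROJECT" and the instances list are placeholders you supply:

from googleapiclient import discovery

# Build a client for the Cloud ML Engine v1 API and call the deployed version.
service = discovery.build("ml", "v1")
name = "projects/{}/models/{}/versions/{}".format("PROJECT", "keras_model", "v1")
instances = [[0.0] * 10]  # placeholder: one expression vector per instance
response = service.projects().predict(name=name,
                                      body={"instances": instances}).execute()
print(response)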
Run TensorBoard to inspect the details of the training runs:
tensorboard --logdir=path/to/log-directory --host=127.0.0.1