The PreCog data set used for training is hosted by Stanford.
To host the files on Google Cloud Storage:
TRAIN_FILE=train.csv
EVAL_FILE=eval.csv
GCS_TRAIN_FILE=gs://gs_bucket/train.csv
GCS_EVAL_FILE=gs://gs_bucket/eval.csv
gsutil cp $TRAIN_FILE $GCS_TRAIN_FILE
gsutil cp $EVAL_FILE $GCS_EVAL_FILE
The training file has the following layout (N rows, p expression features):
- [N, 0:p] expression matrix
- [N, p+1] survival information
- [N, p+2] censor information
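As a concrete illustration, a training file with this layout could be split into its three parts as follows (a minimal sketch assuming comma-separated values with no header row; the file name is the one set above):

import numpy as np

# Load the full training matrix: p expression columns followed by
# the survival and censor columns (layout described above).
data = np.loadtxt("train.csv", delimiter=",")
p = data.shape[1] - 2           # number of expression features
expression = data[:, :p]        # [N, 0:p]  expression matrix
survival = data[:, p]           # [N, p+1]  survival information
censor = data[:, p + 1]         # [N, p+2]  censor information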
The preprocessing script selects the features listed in a features file from the TCGA expression data, and creates the training and evaluation files:
python preprocess.py --expression-file "experiment/data/tcga_sample/expression.tsv" \
                     --survival-file "experiment/data/tcga_sample/survival.tsv" \
                     --features-file "experiment/data/genes.tide.txt"
The files should be in .tsv format (tab-separated values).
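Conceptually, the feature-selection step amounts to something like the sketch below (a hypothetical illustration using pandas; preprocess.py is the authoritative implementation, and the genes-as-rows orientation of the expression matrix is an assumption):

import pandas as pd

# Read the tab-separated expression matrix; genes are assumed to be the row index.
expression = pd.read_csv("experiment/data/tcga_sample/expression.tsv",
                         sep="\t", index_col=0)

# Keep only the genes listed in the features file.
with open("experiment/data/genes.tide.txt") as f:
    features = [line.strip() for line in f if line.strip()]
selected = expression.loc[expression.index.intersection(features)]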
Virtual environments are strongly suggested, but not required. Installing this sample's dependencies in a new virtual environment allows you to run the sample without changing the global Python packages on your system.
There are two options for the virtual environments:
- Install virtualenv
  - Create a virtual environment: virtualenv census_keras
  - Activate the environment: source census_keras/bin/activate
- Install Miniconda
  - Create a conda environment: conda create --name census_keras python=2.7
  - Activate the environment: source activate census_keras
- Install gcloud
- Install the Python dependencies:
pip install --upgrade -r requirements.txt
You can run the Keras code locally. A sample local run, assuming $DATA_DIR points at your data directory:
TRAIN_STEPS=35
BATCH_SIZE=256
TRAIN_FILE="$DATA_DIR/TrainingData.txt"
EVAL_FILE="$DATA_DIR/EvalData.txt"
VALIDATION_FILE="$DATA_DIR/TestData.txt"
LEARNING_RATE=0.0003
JOB_DIR=surv_keras
python -m trainer.task --train-files $TRAIN_FILE \
--eval-files $EVAL_FILE \
--validation-files $VALIDATION_FILE \
--job-dir $JOB_DIR \
--train-steps $TRAIN_STEPS \
--learning-rate $LEARNING_RATE \
--num-epochs 100 \
--early-stop 10 \
--train-batch-size $BATCH_SIZE
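The --early-stop flag presumably maps onto a Keras EarlyStopping callback with the given patience. A minimal sketch of the equivalent call, where model, x_train, y_train, x_val, and y_val stand in for the trainer's own objects:

from keras.callbacks import EarlyStopping

# Stop training once validation loss fails to improve for 10 consecutive
# epochs, mirroring --early-stop 10 above (assumed behavior).
early_stop = EarlyStopping(monitor="val_loss", patience=10)
model.fit(x_train, y_train,
          batch_size=256,                  # --train-batch-size
          epochs=100,                      # --num-epochs
          validation_data=(x_val, y_val),
          callbacks=[early_stop])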
You can also run Keras training locally using gcloud:
JOB_DIR=surv_keras
TRAIN_STEPS=200
gcloud ml-engine local train --package-path trainer \
--module-name trainer.task \
-- \
--train-files $TRAIN_FILE \
--eval-files $EVAL_FILE \
--job-dir $JOB_DIR \
--train-steps $TRAIN_STEPS
You can run prediction locally on the SavedModel exported from the Keras HDF5 model. First, create a processed sample from the data:
python preprocess.py sample.json
Then run local prediction:
gcloud ml-engine local predict --model-dir=$JOB_DIR/export \
--json-instances sample.json
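preprocess.py produces that file; conceptually each line of sample.json is one JSON instance, here assumed to be a bare expression vector (a hedged sketch, not the script's actual code):

import json
import numpy as np

# Take one row of the evaluation data, drop the trailing survival and
# censor columns, and write it out as a single JSON instance.
row = np.loadtxt("eval.csv", delimiter=",")[0]
with open("sample.json", "w") as f:
    f.write(json.dumps(row[:-2].tolist()) + "\n")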
You can train the model on Cloud ML Engine, where $JOB_NAME is a unique name for the job and $JOB_DIR points to a Cloud Storage location:
gcloud ml-engine jobs submit training $JOB_NAME \
--stream-logs \
--runtime-version 1.4 \
--job-dir $JOB_DIR \
--package-path trainer \
--module-name trainer.task \
--region us-central1 \
-- \
--train-files $GCS_TRAIN_FILE \
--eval-files $GCS_EVAL_FILE \
--train-steps $TRAIN_STEPS
You can perform prediction on Cloud ML Engine by following the steps below.
Create a model on Cloud ML Engine:
gcloud ml-engine models create keras_model --regions us-central1
Point to the exported model binaries:
MODEL_BINARIES=$JOB_DIR/export
Deploy the model to the prediction service (the runtime version should match the one used for training):
gcloud ml-engine versions create v1 --model keras_model --origin $MODEL_BINARIES --runtime-version 1.4
Create a processed sample from the data:
python preprocess.py sample.json
Run the online prediction:
gcloud ml-engine predict --model keras_model --version v1 --json-instances sample.json
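The same online prediction can also be issued programmatically through the Cloud ML Engine REST API. A sketch using the google-api-python-client library, where "PROJECT" and the instances list are placeholders you supply:

from googleapiclient import discovery

# Build a client for the Cloud ML Engine v1 API and call the deployed version.
service = discovery.build("ml", "v1")
name = "projects/{}/models/{}/versions/{}".format("PROJECT", "keras_model", "v1")
instances = [[0.0] * 10]  # placeholder: one expression vector per instance
response = service.projects().predict(name=name,
                                      body={"instances": instances}).execute()
print(response)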
Run TensorBoard to inspect the details of the training runs:
tensorboard --logdir=path/to/log-directory --host=127.0.0.1