Intel® Extension for TensorFlow* is compatible with stock TensorFlow*. This example shows 3D-UNet training for medical image segmentation. It contains single-tile training scripts and multi-tile training scripts with Horovod.
Install Intel® Extension for TensorFlow* in the legacy running environment; TensorFlow will then execute the training on Intel GPU.
Verified Hardware Platforms:
- Intel® Data Center GPU Max Series
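Optionally, you can confirm that the GPU is visible before setting up the software stack. This is a minimal check, not part of the original instructions; it assumes the GPU driver and the oneAPI Base Toolkit are already installed at the default location.
source /opt/intel/oneapi/setvars.sh   # assumed default oneAPI install path
sycl-ls                               # the Max Series GPU should appear among the listed devices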
To get better performance, instead of using the official repository as-is, apply one of the patches and install as shown here. Choose either the single-tile patch 3dunet_itex.patch or the multi-tile patch 3dunet_itex_with_horovod.patch.
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/
git checkout 88eb3cff2f03dad85035621d041e23a14345999e
git apply <patch_file> # Copy the chosen patch (3dunet_itex.patch or 3dunet_itex_with_horovod.patch) into the UNet_3D_Medical directory above before applying it.
Refer to Prepare.
You can use ./pip_set_env.sh
to set up the environment for GPU. It performs the following two steps: creating a virtual environment and installing the Python packages.
- Create Virtual Environment
python -m venv env_itex
source env_itex/bin/activate
- Install
pip install --upgrade pip
pip install --upgrade intel-extension-for-tensorflow[xpu]
pip install intel-optimization-for-horovod
pip install tfa-nightly
pip install git+https://github.com/NVIDIA/dllogger.git
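As a quick sanity check (not part of the original setup script), you can confirm the key packages landed in the virtual environment:
pip list | grep -iE "intel-extension-for-tensorflow|intel-optimization-for-horovod|tensorflow"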
Enable the oneAPI running environment (GPU only) and the virtual running environment.
- For GPU, refer to Running
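The exact commands depend on your system; as a sketch, assuming the default oneAPI install location and the virtual environment created above:
source /opt/intel/oneapi/setvars.sh   # enables the compiler runtime, oneMKL, oneCCL, etc.
source env_itex/bin/activate          # re-activate the virtual environment
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('XPU'))"   # optional: the Intel GPU should show up as an XPU device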
We use the Brain Tumor Segmentation 2019 dataset for 3D-UNet training. Upon registration, the challenge's data is made available through the https://ipp.cbica.upenn.edu
service.
The training and test datasets are given as 3D NIfTI volumes that can be read using the Nibabel library and NumPy. They can be converted from NIfTI to TFRecord format using the ./dataset/preprocess_data.py
script.
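For example, run the conversion from the UNet_3D_Medical directory. This is only a sketch: the -i/-o arguments and paths below are assumptions, so check python dataset/preprocess_data.py --help for the exact interface.
python dataset/preprocess_data.py \
  -i /path/to/MICCAI_BraTS_2019_Data_Training \
  -o /path/to/preprocessed_tfrecords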
Assume current_dir is examples/train_maskrcnn/DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/.
Here we provide single-tile training scripts and multi-tile training scripts with Horovod. The data type can be float32 or bfloat16.
DATASET_DIR=/the/path/to/dataset
OUTPUT_DIR=/the/path/to/output_dir
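The training commands below also reference $BATCH_SIZE, and the multi-tile Horovod commands use $MODEL_DIR for the model/output directory; set them before running. The values here are only examples:
BATCH_SIZE=1
MODEL_DIR=$OUTPUT_DIR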
First, apply the single-tile patch.
git apply 3dunet_itex.patch
- float32
python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --batch_size $BATCH_SIZE --warmup_steps 150 --max_steps 1000 --log_every 1
- bfloat16
python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --log_every 1 --amp
First, apply the multi-tile Horovod patch.
git apply 3dunet_itex_with_horovod.patch
- float32
mpirun -np 2 -prepend-rank -ppn 2 \
python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$MODEL_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE
- bfloat16
mpirun -np 2 -prepend-rank -ppn 2 \
python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$MODEL_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --amp
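The two-rank command above launches two Horovod processes on one node. To use more tiles (for example, two two-tile Max Series cards on a single node; an assumption about your system), scale the rank counts accordingly:
mpirun -np 4 -prepend-rank -ppn 4 \
python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$MODEL_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --amp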
- If you get the following error log, refer to Enable Running Environment to enable the oneAPI running environment.
tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.2: cannot open shared object file: No such file or directory
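This error usually means the oneAPI libraries are not on the loader path in the current shell. A minimal fix, assuming the default oneAPI install location, is to source the environment again before rerunning:
source /opt/intel/oneapi/setvars.sh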