NDPipe is a deep learning (DL) system designed to enhance both training and inference performance by embracing the concept of near-data processing (NDP) within storage servers. At its core, NDPipe utilizes an innovative architecture that distributes storage servers equipped with cost-effective commodity GPUs across a data center.
NDPipe is composed of two main components: PipeStore (a storage server equipped with a low-end GPU for near-data training and inference) and Tuner (a training server that manages the distributed PipeStores).
The original paper introducing NDPipe is currently under revision for ACM ASPLOS 2024.
The archived snapshot, including a detailed screencast used in the paper, is available at .
NDPipe requires hardware equipped with an NVIDIA GPU. Below, we detail the AWS instance types (or alternative NVIDIA GPU-equipped machines) that are suitable for NDPipe.
To run the PipeStore component, one or more machines with the following specifications are required:
- Amazon Machine Image (AMI): Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) version 20240228.
- Instance Type: g4dn.2xlarge
- 8 vCPUs
- 32 GiB Memory
- 1 NVIDIA T4 GPU
- 125 GiB gp3 Storage
For the Tuner component, a single machine with the following specifications is required:
- Amazon Machine Image (AMI): Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) version 20240201.
- Instance Type: p3.2xlarge
- 8 vCPUs
- 61 GiB Memory
- 1 NVIDIA V100 GPU
- 125 GiB gp3 Storage
Alternatively, NDPipe can be configured on real machines equipped with CUDA-enabled NVIDIA GPUs for both PipeStore and Tuner.
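Whichever machines you use, it is worth confirming that the NVIDIA driver can see the GPU before continuing. A minimal check (the reported driver and CUDA versions will vary with your setup):
# PipeStore / Tuner
~$ nvidia-smi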
- Clone the required repository (NDPipe) onto the machine.
# PipeStore
~$ git clone https://github.com/dgist-datalab/NDPipe.git
- Generate an SSH key.
# PipeStore
~$ ssh-keygen -t rsa
- Display the public SSH key and append it to the Tuner's authorized_keys.
# PipeStore
~$ cat ~/.ssh/id_rsa.pub
- On the Tuner machine, run:
# Tuner
~$ echo [PublicKeyContent] >> ~/.ssh/authorized_keys
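- (Optional) To confirm that key-based login works before continuing, try a one-off SSH command from the PipeStore host; the placeholders are the same Tuner address and username used later in the container setup:
# PipeStore
~$ ssh [Tuner username]@[Tuner IP] hostname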
- Install the NVIDIA Container Toolkit.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
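- (Optional) Verify that Docker can access the GPU through the toolkit by running a throwaway container with the same TensorRT image used in the next step (this pulls the image if it is not cached yet):
# PipeStore
~$ docker run --rm --gpus all nvcr.io/nvidia/tensorrt:20.09-py3 nvidia-smi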
- Run a Docker container with NVIDIA TensorRT.
# PipeStore
~$ docker run --gpus all -it -v ~:/DataLab --name PipeStore nvcr.io/nvidia/tensorrt:20.09-py3
- Set the environment variables for Tuner IP and username (replace placeholders with actual values):
# PipeStore
/workspace# export TUNER_IP=[Tuner IP]
/workspace# export TUNER_USERNAME=[Tuner username]
- Add the Tuner IP to known hosts for SSH:
# PipeStore
/workspace# ssh-keyscan -H $TUNER_IP >> /DataLab/.ssh/known_hosts
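- (Optional) Check that the container can reach the Tuner over SSH without a password prompt. This sketch assumes an SSH client is available in the container and that the key generated earlier is mounted under /DataLab/.ssh:
# PipeStore
/workspace# ssh -i /DataLab/.ssh/id_rsa -o UserKnownHostsFile=/DataLab/.ssh/known_hosts $TUNER_USERNAME@$TUNER_IP hostname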
- Update the package lists and upgrade installed packages:
# PipeStore
/workspace# apt update && apt upgrade
- Update pip and install the required Python packages from requirements.txt.
# PipeStore
/workspace# cd /DataLab/NDPipe/Fine_tuning/PipeStore
.../PipeStore# pip install -r requirements.txt
- Prepare the dataset directory and download the dataset:
# PipeStore
.../PipeStore# mkdir dataset
.../PipeStore# python download_dataset.py
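- (Optional) To sanity-check that the download completed, inspect the dataset directory; the exact file count and size depend on the dataset fetched by download_dataset.py:
# PipeStore
.../PipeStore# ls dataset | wc -l
.../PipeStore# du -sh dataset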
- (Optional) If you are not using a T4 GPU, compile the model specifically for your GPU:
# PipeStore
trtexec --onnx=resnet50.onnx --workspace=8192 --saveEngine=resnet50.engine --buildOnly --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
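- (Optional) To sanity-check the rebuilt engine, trtexec can also load the serialized engine and run a short benchmark on random inputs; the timings it reports will differ from the end-to-end numbers shown later:
# PipeStore
trtexec --loadEngine=resnet50.engine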
- Clone the required repository (NDPipe) onto the machine.
# Tuner
~$ git clone https://github.com/dgist-datalab/NDPipe.git
- Update pip and install the required Python packages from requirements.txt.
# Tuner
~$ cd NDPipe/Fine_tuning/Tuner
.../Tuner$ pip install -r requirements.txt
- On the Tuner, start the server script with optional parameters:
# Tuner
.../Tuner$ python3.9 server.py --num_of_run [value] --num_of_client [value] --port [value]
- --num_of_run or -r: Sets the pipelining strength. Default is 1.
- --num_of_client or -n: The number of PipeStores. Default is 1.
- --port or -p: Socket connection port. Default is 25258.
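- For example, to serve two PipeStores with a pipelining strength of 2 on the default port (adjust the values to your setup):
# Tuner
.../Tuner$ python3.9 server.py -r 2 -n 2 -p 25258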
- Simultaneously execute the main script on each PipeStore server (specifying the port if needed).
# PipeStore
.../PipeStore# python main.py [port]
- The script uses a command-line argument for the port if provided; otherwise, it defaults to 25258.
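- For example, if the Tuner was started on the default port, each PipeStore would be launched as:
# PipeStore
.../PipeStore# python main.py 25258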
- Review the fine-tuning results. In this test, we provide two pieces of information: the feature extraction time and throughput (measured on the PipeStore side) and the overall fine-tuning time.
# In AWS setting
Feature extraction time (sec): 31.360074814000654
Feature extraction throughput (image/sec): 1913.2607417509444
Overall fine-tuning time (sec): 75.19124622900017
For offline inference evaluation, we provide a simple test code. This code can be executed on the PipeStore side.
The prerequisites for offline inference are almost identical to those required for fine-tuning.
- Follow steps 1-9 of the fine-tuning guide (only the PipeStore setup is needed).
- Install the deflate module by following the instructions below:
/workspace# cd /DataLab/NDPipe/Offline_inference
.../Offline_inference# pip3 install deflate
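- (Optional) A simple import check is enough to confirm the module installed correctly; it only verifies that the extension loads and processes no data:
# PipeStore
.../Offline_inference# python3 -c "import deflate"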
- Download the dataset file and unzip it. This dataset is based on CIFAR-100 and contains 10,000 compressed binaries of preprocessed images.
.../Offline_inference# wget https://zenodo.org/record/10796922/files/inference_dataset.zip
.../Offline_inference# sha256sum inference_dataset.zip
86e1b0452abb9aa164d8be8d6a9577f78251502f7b68fff24ce8595218969fdf
.../Offline_inference# unzip inference_dataset.zip
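- (Optional) Rather than comparing the checksum by eye, you can have sha256sum verify it against the value shown above:
# PipeStore
.../Offline_inference# echo "86e1b0452abb9aa164d8be8d6a9577f78251502f7b68fff24ce8595218969fdf  inference_dataset.zip" | sha256sum -c -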
- Compile the model specifically for your GPU:
trtexec --onnx=resnet50.onnx --workspace=8192 --saveEngine=resnet50.engine --buildOnly --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
- Execute the following command: the first argument specifies the path to the model's engine, and the second argument specifies the dataset path.
python3 inference_test.py resnet50.engine image/
- Review the inference results. In this test, we provide two pieces of information: the elapsed time for inference and the inference throughput.
[NDPipe] inference time: 14.78sec
[NDPipe] inference throughput : 2417.53IPS