OUTRE: An Out-of-core De-Redundancy Framework for GNN Training on Massive Graphs within A Single Server

shengzeang/OUTRE

Latest commit

870459a · Oct 21, 2023


OUTRE

This is the code repository for "OUTRE: An Out-of-core De-Redundancy Framework for GNN Training on Massive Graphs within A Single Server". OUTRE is built on Ginex, an existing GNN training framework. The Bloom Filter implementation in OUTRE is from here.

Setup:

  1. Disable read_ahead on Linux.

    sudo -s
    echo 0 > /sys/block/$block_device_name/queue/read_ahead_kb
  2. Install necessary Linux packages.

    1. sudo apt-get install -y build-essential
    2. sudo apt-get install -y cgroup-tools
    3. sudo apt-get install -y unzip
    4. sudo apt-get install -y python3-pip, then pip3 install --upgrade pip
    5. Compatible NVIDIA CUDA driver and toolkit.
  3. Install Python packages.

    1. PyTorch
    2. ogb
    3. PyG
    4. DGL (version >= 1.0)
    5. Any other packages that are required
  4. Install ninja.

    sudo wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
    sudo unzip ninja-linux.zip -d /usr/local/bin/
    sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
  5. Use cgroup to limit the host memory available to training. For example, the commands below limit it to 64 GB.

    sudo -s
    cgcreate -g memory:64gb
    echo 64000000000 > /sys/fs/cgroup/memory/64gb/memory.limit_in_bytes
  6. Allocate enough swap area.
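
Before moving on, it can help to confirm that steps 1 and 5 actually took effect. The following is a minimal sketch, not part of OUTRE: the helper names `read_int` and `check_setup` are hypothetical, and the paths assume the cgroup v1 layout used in step 5 (on cgroup v2 systems the memory limit lives in a different file, `memory.max`, so this check would report the limit as not set).

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs-style file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return int(p.read_text().strip())


def check_setup(block_device, cgroup_name="64gb", expected_limit=64_000_000_000,
                sys_block="/sys/block", cgroup_root="/sys/fs/cgroup/memory"):
    """Check the read-ahead setting (step 1) and the cgroup memory limit (step 5).

    The default roots mirror the paths in the setup steps; they are parameters
    only so the check can be pointed at another directory tree.
    """
    read_ahead = read_int(Path(sys_block) / block_device / "queue" / "read_ahead_kb")
    limit = read_int(Path(cgroup_root) / cgroup_name / "memory.limit_in_bytes")
    return {
        "read_ahead_disabled": read_ahead == 0,    # False if the file is missing
        "memory_limit_ok": limit == expected_limit,  # False if the cgroup is missing
    }
```

Calling `check_setup("nvme0n1")` (the device name is a placeholder for your own `$block_device_name`) returns a dictionary with two booleans; both should be `True` after a successful setup.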

Run on mag240m-cite:

  1. Prepare dataset

    python3 prepare_dataset_mag.py --dataset mag240m
  2. Partition the original graph

    python3 partition_fennel_twolevel.py --dataset mag240m
  3. Create neighbor cache

    python3 create_neigh_cache.py --neigh-cache-size 10000000000
  4. Get PYTHONPATH

    python3 get_pythonpath.py
  5. Run OUTRE on mag240m-cite. Replace xxx in PYTHONPATH=xxx with the output of step 4.

    sudo PYTHONPATH=xxx cgexec -g memory:64gb python3 -W ignore run_profiling.py --neigh-cache-size 10000000000 --feature-cache-size 30000000000 --dataset mag240m
    
    sudo PYTHONPATH=xxx cgexec -g memory:64gb python3 -W ignore run_main.py --neigh-cache-size 10000000000 --feature-cache-size 30000000000 --num-epochs 1 --dataset mag240m
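
The profiling and training commands share the same cache sizes, cgroup, and dataset, so one way to keep them consistent is to generate both from a single place. The sketch below only assembles the command strings shown in step 5; the function name `outre_command` and the constants are hypothetical and not part of the repository, and it uses only the flags that appear above.

```python
import shlex

# Cache sizes in bytes, taken from the commands in steps 3 and 5.
NEIGH_CACHE_SIZE = 10_000_000_000
FEATURE_CACHE_SIZE = 30_000_000_000


def outre_command(script, pythonpath, dataset="mag240m", cgroup="64gb", extra_args=()):
    """Build the argv list for running an OUTRE script inside the memory cgroup."""
    return [
        "sudo", f"PYTHONPATH={pythonpath}",
        "cgexec", "-g", f"memory:{cgroup}",
        "python3", "-W", "ignore", script,
        "--neigh-cache-size", str(NEIGH_CACHE_SIZE),
        "--feature-cache-size", str(FEATURE_CACHE_SIZE),
        *extra_args,
        "--dataset", dataset,
    ]


if __name__ == "__main__":
    pythonpath = "xxx"  # placeholder: use the output of get_pythonpath.py (step 4)
    print(shlex.join(outre_command("run_profiling.py", pythonpath)))
    print(shlex.join(outre_command("run_main.py", pythonpath,
                                   extra_args=["--num-epochs", "1"])))
```

The printed lines can be pasted into a shell as-is. Changing a cache size or the cgroup name in one place then updates both commands.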
