This repository contains the official implementation of TensorGRaD, a memory-efficient gradient optimization framework for training large-scale neural operators. TensorGRaD compresses gradient updates with a robust combination of low-rank tensor decomposition and unstructured sparsification, achieving significant memory savings while maintaining or even improving model performance.
Start from a clean conda environment:
# Create and activate a new conda environment with Python 3.10
conda create -n tensorgrad python=3.10
conda activate tensorgrad
# Install PyTorch with CUDA support (depending on your platform, you may need to add pytorch-cuda=<version> -c nvidia)
conda install pytorch torchvision -c pytorch
# Clone the repository
git clone https://github.com/neuraloperator/tensorgrad.git
cd tensorgrad
# Install dependencies
pip install -r requirements.txt
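Before launching training, it can help to confirm that PyTorch sees a GPU. A minimal, generic check (plain PyTorch, not specific to this repository):

```python
# Generic sanity check that PyTorch and CUDA are available.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```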
TensorGRaD is a drop-in optimizer that replaces standard optimizers like AdamW. It applies compression at the gradient level through:
- Low-rank compression via a Tucker (higher-order) decomposition of the gradient tensor
- Gradient sparsification using structured or unstructured sparsity (top-k, random-k, or probabilistic)
- Composite projectors that combine low-rank and sparse compression: TensorGRaD first applies either a low-rank or a sparse decomposition to the gradient, then compresses the residual with the other method. This sequential scheme lets the low-rank and sparse components complement each other for more effective compression.
TensorGRaD supports mixed-precision training and is built for scientific ML workloads whose weights and gradients are higher-order tensors.
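To make the composition concrete, here is a minimal sketch of the sparse-first variant (top-k sparsification, then a Tucker approximation of the residual). It uses TensorLy for the Tucker step; the function name, ratios, and library choice are illustrative assumptions, not the repository's API.

```python
# Minimal sketch of the "sparse first, then low-rank on the residual" composition.
# Names, ratios, and the use of TensorLy are illustrative, not the repo's API.
import torch
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend("pytorch")

def compress_gradient(grad, sparse_ratio=0.05, rank_ratio=0.20):
    # 1) Unstructured top-k sparsification: keep the largest-magnitude entries.
    k = max(1, int(sparse_ratio * grad.numel()))
    flat = grad.reshape(-1)
    idx = flat.abs().topk(k).indices
    sparse_part = torch.zeros_like(flat)
    sparse_part[idx] = flat[idx]
    sparse_part = sparse_part.reshape(grad.shape)

    # 2) Tucker low-rank approximation of the residual the sparse part missed.
    residual = grad - sparse_part
    ranks = [max(1, int(rank_ratio * s)) for s in residual.shape]
    core, factors = tucker(residual, rank=ranks)
    low_rank_part = tl.tucker_to_tensor((core, factors))

    # The compressed update is the sum of the two complementary components.
    return sparse_part + low_rank_part

# Toy 4-way "gradient" tensor (e.g., the shape of an FNO weight gradient).
g = torch.randn(16, 16, 8, 8)
g_hat = compress_gradient(g)
print(torch.linalg.norm(g - g_hat) / torch.linalg.norm(g))
```

The repository also supports random-k and probabilistic selection for the sparse step, as well as the low-rank-first ordering.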
- tensorgrad/: Optimizer implementations
  - adamw.py: Single projector optimizers
  - tensorgrad.py: Composite projector variant (TensorGRaD)
  - projectors/: Includes all projector logic (tensor/matrix, sparse/low-rank)
- scripts/experiments/: Runs for ablation studies and benchmarks (low-rank, sparse, mixed)
- scripts/profiling/: Memory profiling tools for different architectures
- train_ns_repro_tensorgrad.py: Main training script on Navier–Stokes
Use YAML-based configs and command-line overrides for training:
python train_ns_repro_tensorgrad.py --config_file ns_tensorgrad_repro_config.yaml
Or use the prepared bash scripts in scripts/experiments/.
--opt.tensorgrad True # Enable TensorGRaD
--opt.tensorgrad False # Use AdamW
# Composite projector (TensorGRaD): sparse decomposition first, then low-rank on the residual
--opt.proj_type unstructured_sparse \
--opt.sparse_ratio 0.05 \
--opt.sparse_type randk \
--opt.second_proj_type low_rank \
--opt.second_rank 0.20

# Single projector: low-rank only
--opt.proj_type low_rank \
--opt.rank 0.25

# Or single projector: structured sparse only
--opt.proj_type structured_sparse \
--opt.sparse_ratio 0.25 \
--opt.sparse_type randk
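The sparse_type option controls how the kept entries are chosen. A short illustration of top-k versus rand-k index selection in plain PyTorch (not the repository's code):

```python
# Illustrative index selection for the sparse projector (generic PyTorch).
import torch

g = torch.randn(64, 64)                         # stand-in gradient
k = int(0.25 * g.numel())                       # sparse_ratio = 0.25
flat = g.reshape(-1)

topk_idx = flat.abs().topk(k).indices           # topk: largest-magnitude entries
randk_idx = torch.randperm(flat.numel())[:k]    # randk: uniformly random entries
```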
--opt.update_proj_gap 1000 # Projection update interval
--fno.fno_block_precision mixed # Activations: mixed precision
--fno.fno_block_weights_precision half # Weights: half precision
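The projection update interval means the projector (for example, a sparse index set or low-rank factors) is refit only every update_proj_gap steps and reused in between, which amortizes the cost of the decomposition. A self-contained sketch of that control flow, with a hypothetical TopKProjector class (not the repository's implementation):

```python
# Sketch of reusing a cached projector between refits (hypothetical class,
# not the repository's implementation).
import torch

class TopKProjector:
    def __init__(self, sparse_ratio=0.25, gap=1000):
        self.sparse_ratio, self.gap, self.idx = sparse_ratio, gap, None

    def project(self, grad, step):
        flat = grad.reshape(-1)
        if self.idx is None or step % self.gap == 0:
            k = max(1, int(self.sparse_ratio * flat.numel()))
            self.idx = flat.abs().topk(k).indices     # refit the projector
        out = torch.zeros_like(flat)
        out[self.idx] = flat[self.idx]                # otherwise reuse cached indices
        return out.reshape(grad.shape)

proj = TopKProjector(gap=1000)
for step in range(3000):
    grad = torch.randn(32, 32)                        # stand-in for a real gradient
    compressed = proj.project(grad, step)
```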
- Navier–Stokes ($Re=1000$)
  - Resolutions: 128×128 and 1024×1024
  - Automatically downloaded via neuraloperator
- Navier–Stokes ($Re=10^5$)
  - High-resolution (1024×1024)
  - Download manually from Hugging Face
  - Requires nsforcing_test_1024.hdf5 and nsforcing_train_1024.hdf5 (a quick way to inspect these files is shown below)
  - See paper for pretraining and dataset details
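After a manual download, you can confirm the files' contents by listing their HDF5 datasets. This uses generic h5py calls; the key names inside the files depend on the dataset itself.

```python
# List the datasets stored in a downloaded HDF5 file (generic h5py usage;
# the key names depend on the dataset).
import h5py

with h5py.File("nsforcing_train_1024.hdf5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
```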
To prepare your own data:
- Follow the structure used in neuraloperator
- Review the FullSizeNavierStokes class in tensorgrad/navier_stokes.py
- Utilities are available in dataset_creation/
Use the bash scripts under scripts/profiling/ for benchmarking:
bash scripts/profiling/128modes_256channels_4layers/US_LR_025.sh
Profiling outputs are written to memstats/ and profiler_outputs/.
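For a quick standalone check outside those scripts, PyTorch's built-in CUDA memory counters can report peak usage around a training step (generic PyTorch, not the repository's profiling tooling):

```python
# Generic peak-GPU-memory check with PyTorch's built-in counters
# (not the repository's profiling tooling).
import torch

assert torch.cuda.is_available()
torch.cuda.reset_peak_memory_stats()

x = torch.randn(4096, 4096, device="cuda")
y = x @ x                      # stand-in for a training step
torch.cuda.synchronize()

print(f"Peak allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```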
If you use this code, please cite:
@misc{tensorgrad,
title={TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training},
author={Sebastian Loeschcke and David Pitt and Robert Joseph George and Jiawei Zhao and Cheng Luo and Yuandong Tian and Jean Kossaifi and Anima Anandkumar},
year={2025},
eprint={2501.02379},
archivePrefix={arXiv},
primaryClass={cs.LG}
}