Replies: 1 comment
- Use a smaller batch size (the easiest and most obvious fix).
- Use a different activation checkpointing configuration (try setting the `contiguous_memory_optimization` parameter to `false` to see if that helps; see the sketch below).
- Use a different optimizer (some optimizers, such as AdamW, use more memory than others).
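As a rough illustration of these suggestions, here is a minimal sketch of the relevant parts of a DeepSpeed config expressed as a Python dict passed to `deepspeed.initialize`. The model, batch sizes, learning rate, and other values are placeholders, not taken from the original post, and the `activation_checkpointing` section only takes effect if the model uses DeepSpeed's activation checkpointing API.

```python
import deepspeed
import torch

# Illustrative sketch only; all values are placeholders, not a known-good
# configuration for any particular model.
ds_config = {
    # A smaller per-GPU micro-batch size directly reduces activation memory;
    # gradient accumulation can keep the effective batch size the same.
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    # Disabling contiguous_memory_optimization avoids pre-allocating a large
    # contiguous buffer for checkpointed activations, which can help when
    # GPU memory is tight or fragmented.
    "activation_checkpointing": {
        "partition_activations": True,
        "contiguous_memory_optimization": False,
        "cpu_checkpointing": False,
    },
    # Optimizer choice also affects memory: AdamW keeps two extra states per
    # parameter, while plain SGD keeps at most one.
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4},
    },
}

# Placeholder model so the sketch is self-contained; run this under the
# DeepSpeed launcher (e.g. `deepspeed train.py`) so distributed init works.
model = torch.nn.Linear(16, 16)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```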
Here is my config:
I'm using LoRA (https://arxiv.org/abs/2106.09685), so the gradients and optimizer states shouldn't take much memory, since the number of trainable parameters is very small. But I still get OOM during the forward pass, even though I'm already using ZeRO-3 with offload. Are there ways to reduce forward-pass memory usage? Thanks.
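For context (the actual config was not captured above), a ZeRO-3 setup with parameter and optimizer offload typically looks roughly like the following. This is an illustrative sketch of that kind of configuration, not the poster's config; all values, including the stage-3 memory budgets, are placeholder assumptions.

```python
# Illustrative sketch of a ZeRO-3 config with CPU offload, NOT the poster's
# actual config. All values are placeholders.
zero3_offload_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,                        # partition params, grads, and optimizer states
        "offload_param": {                 # keep partitioned parameters on CPU
            "device": "cpu",
            "pin_memory": True,
        },
        "offload_optimizer": {             # keep optimizer states on CPU
            "device": "cpu",
            "pin_memory": True,
        },
        # Smaller live-parameter and prefetch budgets trade speed for less GPU
        # memory held during the forward pass.
        "stage3_max_live_parameters": 1e8,
        "stage3_prefetch_bucket_size": 1e7,
    },
    "bf16": {"enabled": True},
}
```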