DeepSpeed initializes the entire model on each GPU at the beginning #3154
-
I find that DeepSpeed puts the entire model onto each GPU at the very beginning and only partitions it afterwards. Is this the intended behavior? I can't train a large model (10B+) with pure stage 3 on V100s (16GB), no matter how many GPUs I use, because it always OOMs at the beginning. I also profiled the same code (only `deepspeed.initialize`) on GPUs with more memory, and the results show that the initial per-GPU peak usage stays the same regardless of the number of GPUs; the peak usage exactly matches the memory required for my model in fp16 format.
Replies: 1 comment 3 replies
-
You will need to init your model with `deepspeed.zero.Init()`, so that parameters are partitioned across the data-parallel group as they are created instead of the full model being materialized on every rank first.
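A minimal sketch of what this looks like, assuming a plain PyTorch model and an illustrative `ds_config` (the model, sizes, and config values here are placeholders, not from the thread; some `zero.Init` keyword arguments vary slightly across DeepSpeed versions):

```python
import deepspeed
import torch.nn as nn

# Illustrative ZeRO stage 3 config; adjust batch size, fp16, etc. for your setup.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

# Construct the (hypothetical) large model inside zero.Init so each rank only
# allocates its own shard of every parameter, rather than the whole fp16 model.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(48)])

# deepspeed.initialize then wraps the already-partitioned model.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Without the `zero.Init` context, every rank builds the full model first, which is why the initial peak usage you measured matches the full fp16 model size and does not shrink as you add GPUs.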