
Memory and time requirements for Mistral-7B #68

Open
NamburiSrinath opened this issue Sep 12, 2024 · 1 comment

Comments


NamburiSrinath commented Sep 12, 2024

Hi,

I am trying to prune Mistral-7B (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). While the magnitude-pruning commands ran successfully, I ran into issues with SparseGPT and Wanda:

  • SparseGPT: took more than an hour and then threw a CUDA OOM error. I'm working on a g5.24xlarge (4 × 24 GB GPUs), which I believe should definitely be enough.
  • Wanda: ran for ~2 hours and also failed with a CUDA OOM.

Commands used:
python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method sparsegpt --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/sparsegpt/0.1/ --save_model out/mistral_7b/unstructured/sparsegpt/0.1/

python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method wanda --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/wanda/0.1/ --save_model out/mistral_7b/unstructured/wanda/0.1/

Any help here would be greatly appreciated :) Tagging the authors: @liuzhuang13, @Eric-mingjie and @eltociear.


NamburiSrinath commented Sep 12, 2024

Update: the error comes from initializing torch.zeros(); the traceback is below.

Traceback (most recent call last):
  File "/home/ubuntu/Compress_Align/wanda/main.py", line 113, in <module>
    main()
  File "/home/ubuntu/Compress_Align/wanda/main.py", line 73, in main
    prune_sparsegpt(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "/home/ubuntu/anaconda3/envs/compress_align/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/Compress_Align/wanda/lib/prune.py", line 230, in prune_sparsegpt
    inps = torch.zeros(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB. GPU 1 has a total capacity of 21.99 GiB of which 16.77 GiB is free. Including non-PyTorch memory, this process has 5.21 GiB memory in use. Of the allocated memory 4.88 GiB is allocated by PyTorch, and 89.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Upon debugging further, here are the values of print(args.nsamples, model.seqlen, model.config.hidden_size):

Mistral-7B:  128, 32768, 4096
Llama-2-7B: 128, 4096, 4096

So Mistral's sequence length is very large, which makes the calibration tensor too big to allocate.
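
As a quick back-of-the-envelope check (my own sketch, not code from the repo), those numbers exactly account for the 32 GiB allocation in the traceback, assuming the buffer is float16 with shape (nsamples, seqlen, hidden_size):

nsamples, seqlen, hidden_size = 128, 32768, 4096   # values printed above for Mistral-7B
n_bytes = nsamples * seqlen * hidden_size * 2      # 2 bytes per float16 element
print(n_bytes / 2**30)                             # 32.0 GiB, matching the OOM message
# For Llama-2-7B (seqlen 4096) the same buffer is only 4 GiB, which is why it fits.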

Are there any suggestions to overcome this error?
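
One workaround I'm considering (a minimal sketch; model.seqlen is the attribute from the debug print above, not something I've verified against the rest of the repo) is to cap the calibration sequence length right after the model is loaded in main.py:

# Hypothetical patch after the model is loaded in main.py; model.seqlen is 32768
# for Mistral-7B and 4096 for Llama-2-7B, per the debug print above.
model.seqlen = min(model.seqlen, 4096)
# inps in prune_sparsegpt would then need 128 * 4096 * 4096 * 2 bytes ≈ 4 GiB in fp16,
# which fits comfortably on a single 24 GB GPU.

I'm not sure whether capping the calibration length like this would hurt pruning quality, though.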

P.S.: I think this issue is similar to #51, i.e. support for Mistral models.
