[BUG] how to eval large scale model use 1dp+8pp? #447

Closed
mxjmtxrm opened this issue Dec 13, 2024 · 4 comments · Fixed by #481
Labels
bug Something isn't working


@mxjmtxrm

Describe the bug

I tried to evaluate a large-scale model using 1dp+8pp with accelerate. I used a command like the following:

accelerate launch --multi_gpu --num_processes=1 run_evals_accelerate.py \
    --model_args="pretrained=<path to model on the hub>" \
    --model_parallel \
    --tasks <task parameters> \
    --output_dir output_dir

It fails with the error: ValueError: You need to use at least 2 processes to use --multi_gpu

How can I solve this problem?

Version info

lighteval-0.3.0

@mxjmtxrm mxjmtxrm added the bug Something isn't working label Dec 13, 2024
@NathanHB
Member

Hi! To use DP = 1 you need to run without accelerate. In your case the command would be:

lighteval accelerate "pretrained=gpt2,model_parallel=True" "leaderboard|truthfulqa:mc|0|0"

The command you described above is deprecated. Where did you find it?
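
For readers unfamiliar with the "1dp+8pp" shorthand, here is a minimal sketch of the arithmetic behind the error above. It assumes one launched process per data-parallel replica, with each replica sharded across `pp` GPUs; `Layout` and `plan_layout` are hypothetical names for illustration and are not part of lighteval or accelerate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    processes: int             # assumption: one process per data-parallel replica
    gpus_per_process: int      # GPUs a single replica is sharded across
    total_gpus: int
    needs_multi_gpu_flag: bool


def plan_layout(dp: int, pp: int) -> Layout:
    """Hypothetical helper: derive the process/GPU layout for dp replicas of pp stages."""
    if dp < 1 or pp < 1:
        raise ValueError("dp and pp must be positive integers")
    return Layout(
        processes=dp,
        gpus_per_process=pp,
        total_gpus=dp * pp,
        # Per the error message in this issue, --multi_gpu needs at least 2 processes.
        needs_multi_gpu_flag=dp >= 2,
    )


if __name__ == "__main__":
    print(plan_layout(dp=1, pp=8))
```

Under these assumptions, 1dp+8pp is a single process spanning 8 GPUs, which is why `--multi_gpu --num_processes=1` is rejected and the model has to be sharded inside one process instead.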

@mxjmtxrm
Author

I updated lighteval to V0.6.0 and tried again.
I used the following command:

export CUDA_VISIBLE_DEVICES=0,1,2,3 
lighteval accelerate "pretrained=<model_path>,model_parallel=True" "leaderboard|truthfulqa:mc|0|0"

It failed to load the model with MP, and only GPU 0 was occupied.

[2024-12-25 03:29:43,045] [    INFO]: Model parallel was set to False, max memory set to None and device map to None (base_model.py:389)

I found that 'LOCAL_WORLD_SIZE' and 'WORLD_SIZE' were both None.
How can I solve this problem?
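
For anyone hitting the same symptom, a small diagnostic along these lines can confirm what the process actually sees before the model is loaded. This is only a sketch: `describe_launch_env` is a hypothetical helper, and it does not reproduce how lighteval decides whether to enable model parallelism. It just reports the variables mentioned above.

```python
import os
from typing import Mapping, Optional


def describe_launch_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: summarize launcher-related variables visible to this process."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    # Count comma-separated device entries; None means the variable is unset.
    if visible is None:
        num_visible = None
    else:
        num_visible = len([d for d in visible.split(",") if d.strip()])
    return {
        "WORLD_SIZE": env.get("WORLD_SIZE"),
        "LOCAL_WORLD_SIZE": env.get("LOCAL_WORLD_SIZE"),
        "num_visible_devices": num_visible,
    }


if __name__ == "__main__":
    print(describe_launch_env())
```

When the script is started directly rather than through a distributed launcher, `WORLD_SIZE` and `LOCAL_WORLD_SIZE` are typically unset, so both come back as `None` even though several devices are visible.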

@NathanHB
Member

NathanHB commented Dec 30, 2024

Hi! You are right. I opened a PR that you can check out; it should solve the issue.

NathanHB added a commit that referenced this issue Jan 2, 2025