Torch dp support (WIP) #3207

Draft: wants to merge 86 commits into main

@grimoire (Collaborator) commented Mar 4, 2025

Add data parallel (dp) support.

  • Manually setting CUDA_VISIBLE_DEVICES inside torchrun is disabled because of MMEngine.
  • We could start a Ray cluster to allocate devices, but the memory allocated by check_env would land on device 0 and could not be released.
  • We need to update MASTER_PORT inside torchrun to create the distributed process group.
  • The engine should not be destroyed until all data has been processed.
  • W8A8/AWQ will not be supported; blocked_fp8 is WIP.
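The fourth note means every rank must keep its engine alive until all dp ranks are done; the example script does this with an endless `time.sleep` loop. A bounded alternative, sketched here as an assumption rather than how this PR handles it, is a marker-file barrier: each rank touches `rank{N}.done` in a shared directory and then waits until all `world_size` markers exist. `wait_for_all_ranks`, `done_dir`, and the marker naming are hypothetical, and the sketch assumes all ranks can see the same directory (e.g. they run on one node).

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_all_ranks(done_dir: str, rank: int, world_size: int,
                       poll_interval: float = 1.0,
                       timeout: Optional[float] = None) -> None:
    """Mark `rank` as finished, then block until all `world_size` ranks are.

    Hypothetical helper, not part of lmdeploy. Assumes every rank can see
    `done_dir` (e.g. a local directory when all ranks share one node).
    """
    marker_dir = Path(done_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    # Announce that this rank has finished its own prompts.
    (marker_dir / f'rank{rank}.done').touch()

    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # Count how many ranks have announced completion so far.
        finished = sum((marker_dir / f'rank{r}.done').exists()
                       for r in range(world_size))
        if finished == world_size:
            return
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f'only {finished}/{world_size} ranks finished')
        time.sleep(poll_interval)
```

Inside the `with pipeline(...)` block, a call such as `wait_for_all_ranks('/tmp/dp_done', rank, world_size=2)` (hypothetical path) could stand in for the `while True` loop, so the engine is released only after every rank has finished. Markers left over from a previous run would let ranks exit early, so the directory should be cleared before each launch.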
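```python
import os
import time

from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.messages import GenerationConfig


def main(rank: int):
    model_path = 'meta-llama/Meta-Llama-3.1-8B-Instruct'

    log_level = 'WARNING'
    prompts = [
        'hakuna matata.',
        'fast fox jump over the lazy dog.',
    ]
    # Each dp rank handles its own slice of the prompts (one prompt per rank here).
    prompts = prompts[rank:rank + 1]

    backend_config = PytorchEngineConfig(
        tp=2,
        dp=2,
        dp_rank=rank,
        # eager_mode=True,
    )
    gen_config = GenerationConfig(
        temperature=1.0,
        top_k=1,
        do_sample=True,
    )

    with pipeline(model_path, backend_config=backend_config, log_level=log_level) as pipe:
        outputs = pipe(prompts, gen_config=gen_config)
        print(outputs)

        # Keep this engine alive while the other dp ranks finish: the engine must
        # not be destroyed until all data has been processed. This loop never
        # exits; stop the job manually (e.g. Ctrl-C) once every rank has printed.
        while True:
            time.sleep(1)


if __name__ == '__main__':
    # torchrun sets RANK for every worker; it is used as the dp rank here.
    rank = int(os.environ['RANK'])
    # Update MASTER_PORT inside torchrun so the engine can create its own
    # distributed process group.
    os.environ['MASTER_PORT'] = str(29555)
    main(rank)
```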
```shell
torchrun --nproc-per-node=2 test.py
```
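The script gives each rank one prompt via `prompts[rank:rank+1]`, so any prompts beyond the first `dp` are never processed. A small generalization, sketched here with the hypothetical helpers `dp_settings_from_env` and `shard_prompts`, reads `RANK` and `WORLD_SIZE` (both exported by torchrun to every worker) and gives each dp rank a strided share of the prompts. Treating `WORLD_SIZE` as the dp size is an assumption that holds for the launch above (`--nproc-per-node=2` with `dp=2`), not something this PR defines.

```python
import os
from typing import List, Mapping, Sequence, Tuple


def dp_settings_from_env(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (dp, dp_rank), assuming one torchrun worker per dp rank.

    Hypothetical helper: torchrun exports RANK and WORLD_SIZE, and using
    WORLD_SIZE as the dp size is an assumption matching the example launch.
    """
    return int(env['WORLD_SIZE']), int(env['RANK'])


def shard_prompts(prompts: Sequence[str], dp: int, dp_rank: int) -> List[str]:
    """Give dp rank `dp_rank` every `dp`-th prompt, starting at its own index.

    Unlike prompts[rank:rank+1], no prompt is dropped when there are more
    prompts than ranks.
    """
    if not 0 <= dp_rank < dp:
        raise ValueError(f'dp_rank {dp_rank} out of range for dp={dp}')
    return list(prompts[dp_rank::dp])
```

In `main`, the slicing line could become `prompts = shard_prompts(prompts, dp, rank)` with `dp, rank = dp_settings_from_env()`, and the same `dp` value could be passed to `PytorchEngineConfig`. Strided slicing keeps the per-rank prompt counts within one of each other.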

@grimoire changed the title from "Torch dp support (PART 1)" to "Torch dp support (WIP)" on Mar 6, 2025
@grimoire marked this pull request as draft on March 6, 2025 11:42