Add multi-gpu inference option #125

natolambert · 2024-05-11T03:36:01Z

Currently run_rm.py only uses one RM because RMs are not well supported generally for inference.
Current implementation is a separate run_rm_mpgu.py script. We can delete this and improve the base script if more use cases emerge.

Closes #95

natolambert · 2024-05-11T04:24:38Z

Inference works distributed, but I couldn't get the results gather working correctly with things like

    state.wait_for_everyone()
    
    # flatten results list of lists if is list of lists
    if state.is_main_process:
        logger.info("Gathering results")
        results = gather_object(results) # gather() is for tensors

chrisliu298 · 2024-09-07T03:44:21Z

I have a distributed inference script for my own use case here, which might help.

The code assumes ds is a Hugging Face dataset of the following structure:

Dataset({
    features: ['chosen', 'rejected'],
    num_rows: 1000
})

and each of chosen and rejected is a list of messages in the standard chat format.

The core part of the code is shown below:

accelerator.wait_for_everyone()
with accelerator.split_between_processes(ds) as ds_shard:
    rewards = []
    for sample in tqdm(ds_shard):
        chosen_score, rejected_score = rm.get_score(sample["chosen"], sample["rejected"])
        rewards.append({"chosen_reward": chosen_score, "rejected_reward": rejected_score})
rewards_gathered = accelerate.utils.gather_object([{"rewards": rewards}])

if accelerator.is_main_process:
    all_rewards = [row for result in rewards_gathered for row in result["rewards"]]

The key steps are 1) iterate through the assigned ds_shard for the current GPU, 2) gather rewards outside the context (but for all processes), and 3) collect all rewards only in the main process.

The final all_rewards will be a list of dicts, like {"chosen_reward": 18.35, "rejected_reward": -7.89}, each corresponding to the same order of the samples in ds.

natolambert added 2 commits May 11, 2024 03:11

init

c94988a

docs fix

2a492a4

natolambert marked this pull request as draft May 11, 2024 04:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add multi-gpu inference option #125

Add multi-gpu inference option #125

natolambert commented May 11, 2024

natolambert commented May 11, 2024

chrisliu298 commented Sep 7, 2024

Add multi-gpu inference option #125

Are you sure you want to change the base?

Add multi-gpu inference option #125

Conversation

natolambert commented May 11, 2024

natolambert commented May 11, 2024

chrisliu298 commented Sep 7, 2024