
Commit e8db3e7

finbarrtimbers, gemini-code-assist[bot], and claude authored
Silences NCCL warning (#1055)
* Loads model on device

* Update open_instruct/grpo_fast.py

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Set conditional device map

* Set CUDA device before init_process_group to silence NCCL warning

  Adds torch.cuda.set_device(self.local_rank) immediately before init_process_group() calls in PolicyTrainerRayProcess.setup_model_update_group() to silence the NCCL warning about unknown device mapping.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Set CUDA device immediately before DeepSpeed init to silence NCCL warning

  The NCCL warning was coming from deepspeed.init_distributed(), not from the model_update_group initialization. Added torch.cuda.set_device(self.local_rank) immediately before deepspeed.init_distributed() to ensure the device is properly set when DeepSpeed creates its distributed process group.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* trying to fix it

* Removed unneeded code.

* Cleaned up PR.

* Cleaned PR

* Cleaned PR

* Undid changes to open_instruct/utils.py

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>
1 parent 0913deb commit e8db3e7
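
For reference, a minimal standalone sketch (not part of the commit) of the pattern the diff below applies: passing device_id to torch.distributed.init_process_group so NCCL knows which GPU each rank owns, which avoids the device-mapping warning. The LOCAL_RANK/RANK/WORLD_SIZE environment-variable plumbing and the 30-minute timeout here are illustrative assumptions; device_id is only accepted on recent PyTorch versions.

```python
import os
from datetime import timedelta

import torch
import torch.distributed as dist

# Illustrative torchrun-style rendezvous; the env-var plumbing is hypothetical.
local_rank = int(os.environ["LOCAL_RANK"])
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

# Passing device_id binds this rank to a specific GPU up front, so NCCL does
# not have to guess the rank-to-device mapping (the source of the warning).
dist.init_process_group(
    backend="nccl",
    init_method="env://",
    world_size=world_size,
    rank=rank,
    timeout=timedelta(minutes=30),  # illustrative; the commit uses args.backend_timeout
    device_id=torch.device("cuda", local_rank),
)
```
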

1 file changed: +9 -0 lines changed

open_instruct/grpo_fast.py

Lines changed: 9 additions & 0 deletions
@@ -625,6 +625,15 @@ def load(self, path: str, map_location=None):
     np.random.seed(worker_seed)
     random.seed(worker_seed)

+    torch.distributed.init_process_group(
+        backend="nccl",
+        init_method="env://",
+        world_size=self.world_size,
+        rank=self.rank,
+        timeout=timedelta(minutes=args.backend_timeout),
+        device_id=torch.device("cuda", self.local_rank),
+    )
+
     deepspeed.init_distributed(timeout=timedelta(minutes=args.backend_timeout))

     ds_config = get_train_ds_config(offload=False, adam_offload=False, stage=args.deepspeed_stage, bf16=True)

0 commit comments
