When I run the following commands:
# run aggregator server
bash scripts/run_fedml_server.sh "$RUN_ID"
# run client(s)
bash scripts/run_fedml_client.sh 1 "$RUN_ID"
bash scripts/run_fedml_client.sh 2 "$RUN_ID"
bash scripts/run_fedml_client.sh 3 "$RUN_ID"
I get the following error:
  File "/home//anaconda3/envs/fedllm/lib/python3.10/site-packages/fedml/cross_silo/client/fedml_trainer.py", line 83, in train
    weights = self.trainer.get_model_params()
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 325, in get_model_params
    peft_state_dict = load_checkpoint(self.latest_checkpoint_dir)
  File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 238, in load_checkpoint
    raise FileNotFoundError(
FileNotFoundError: Could not find either PEFT checkpoint in "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/adapter_model.bin" nor full checkpoint in /gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/pytorch_model.bin.
[2024-07-30 15:00:46,590] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 3673085
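The error message names two files that the loader looks for in the checkpoint directory, adapter_model.bin and pytorch_model.bin. A minimal diagnostic sketch for checking what is actually present in that directory could look like the following. The helper find_checkpoint_files is hypothetical and not part of FedML, and the two .safetensors file names are only an assumption about what some library versions may write instead of the .bin files; this has not been confirmed for this setup.

```python
import sys
from pathlib import Path

# Hypothetical helper, not part of FedML: reports which candidate
# checkpoint files exist in a directory.
CANDIDATE_NAMES = (
    "adapter_model.bin",          # PEFT checkpoint named in the error message
    "pytorch_model.bin",          # full checkpoint named in the error message
    "adapter_model.safetensors",  # assumption: possible alternative file name
    "model.safetensors",          # assumption: possible alternative file name
)


def find_checkpoint_files(checkpoint_dir):
    """Return a dict mapping each candidate file name to True if it exists."""
    directory = Path(checkpoint_dir)
    return {name: (directory / name).is_file() for name in CANDIDATE_NAMES}


if __name__ == "__main__":
    # Pass the directory from the error message as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, present in find_checkpoint_files(target).items():
        status = "found" if present else "missing"
        print(f"{status:8s}{name}")
```

Running it against the round_0_before_agg directory from the traceback would show whether the directory is empty or whether a checkpoint was saved under a different file name than the loader expects.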
Could someone help me with the issue? Thanks!