-
I ran into the same problem. Have you managed to solve it yet?
-
Command used: FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml
Result: the run hangs at the "Converting format of dataset" stage. Log output:
[2024-07-28 05:54:58,072] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
07/28/2024 05:55:00 - INFO - llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:20782
W0728 05:55:02.054000 140692203471232 torch/distributed/run.py:757]
W0728 05:55:02.054000 140692203471232 torch/distributed/run.py:757] *****************************************
W0728 05:55:02.054000 140692203471232 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0728 05:55:02.054000 140692203471232 torch/distributed/run.py:757] *****************************************
[2024-07-28 05:55:06,359] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-28 05:55:06,421] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-28 05:55:06,425] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-28 05:55:06,425] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2024-07-28 05:55:09,143] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-28 05:55:09,154] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-28 05:55:09,154] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-07-28 05:55:09,156] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-28 05:55:09,285] [INFO] [comm.py:637:init_distributed] cdb=None
07/28/2024 05:55:09 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2287] 2024-07-28 05:55:09,352 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-07-28 05:55:09,352 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-07-28 05:55:09,352 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-07-28 05:55:09,352 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-07-28 05:55:09,674 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/28/2024 05:55:09 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
07/28/2024 05:55:09 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
07/28/2024 05:55:09 - INFO - llamafactory.data.loader - Loading dataset identity.json...
07/28/2024 05:55:09 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/28/2024 05:55:09 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/28/2024 05:55:10 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
07/28/2024 05:55:10 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
Converting format of dataset (num_proc=16): 100%|███████████████████████████████████████████| 91/91 [00:00<00:00, 416.29 examples/s]
07/28/2024 05:55:10 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|██████████████████████████████████████| 1000/1000 [00:00<00:00, 4747.58 examples/s]