System Info
root@autodl-container-40b74f9912-1ab26877:~# llamafactory-cli env
[2024-11-23 13:16:23,920] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
llamafactory version: 0.9.1.dev0
Reproduction
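The launch command/config was not attached; reconstructed from the log below, the setup was roughly the following LLaMA-Factory YAML (values are read off the logged output; the output_dir and lora_target lines are assumptions):

model_name_or_path: /root/autodl-tmp/Qwen2-VL-7B-Instruct   # from the log
stage: sft
do_train: true
finetuning_type: lora                  # log: Fine-tuning method: LoRA
lora_target: all                       # assumed; log shows all linear projections found
dataset: deepseek                      # log: Loading dataset deepseek.json...
template: qwen2_vl
output_dir: saves/qwen2vl-7b/lora/sft  # assumed
per_device_train_batch_size: 2         # log: Instantaneous batch size per device = 2
gradient_accumulation_steps: 8         # log: Gradient Accumulation steps = 8
num_train_epochs: 3.0                  # log: Num Epochs = 3
bf16: true                             # log: compute dtype: torch.bfloat16

The full log from that run follows.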
[INFO|2024-11-23 13:17:00] llamafactory.cli:157 >> Initializing distributed tasks at: 127.0.0.1:26797
[2024-11-23 13:17:04,905] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-23 13:17:04,980] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-23 13:17:04,994] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING|2024-11-23 13:17:06] llamafactory.hparams.parser:162 >> ddp_find_unused_parameters needs to be set as False for LoRA in DDP training.
[INFO|2024-11-23 13:17:06] llamafactory.hparams.parser:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|configuration_utils.py:677] 2024-11-23 13:17:06,228 >> loading configuration file /root/autodl-tmp/Qwen2-VL-7B-Instruct/config.json
[INFO|configuration_utils.py:746] 2024-11-23 13:17:06,230 >> Model config Qwen2VLConfig {
"_name_or_path": "/root/autodl-tmp/Qwen2-VL-7B-Instruct",
"architectures": [
"Qwen2VLForConditionalGeneration"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"image_token_id": 151655,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2_vl",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
24,
24
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"video_token_id": 151656,
"vision_config": {
"in_chans": 3,
"model_type": "qwen2_vl",
"spatial_patch_size": 14
},
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vision_token_id": 151654,
"vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,231 >> loading file tokenizer_config.json
[INFO|2024-11-23 13:17:06] llamafactory.hparams.parser:355 >> Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|2024-11-23 13:17:06] llamafactory.hparams.parser:355 >> Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2475] 2024-11-23 13:17:06,472 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:373] 2024-11-23 13:17:06,473 >> loading configuration file /root/autodl-tmp/Qwen2-VL-7B-Instruct/preprocessor_config.json
[INFO|image_processing_base.py:373] 2024-11-23 13:17:06,475 >> loading configuration file /root/autodl-tmp/Qwen2-VL-7B-Instruct/preprocessor_config.json
[INFO|image_processing_base.py:429] 2024-11-23 13:17:06,475 >> Image processor Qwen2VLImageProcessor {
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"patch_size": 14,
"processor_class": "Qwen2VLProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"max_pixels": 12845056,
"min_pixels": 3136
},
"temporal_patch_size": 2
}
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,475 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,475 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,475 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,475 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,476 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2209] 2024-11-23 13:17:06,476 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2475] 2024-11-23 13:17:06,705 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:755] 2024-11-23 13:17:07,088 >> Processor Qwen2VLProcessor:
image_processor: Qwen2VLImageProcessor {
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"patch_size": 14,
"processor_class": "Qwen2VLProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"max_pixels": 12845056,
"min_pixels": 3136
},
"temporal_patch_size": 2
}
tokenizer: Qwen2TokenizerFast(name_or_path='/root/autodl-tmp/Qwen2-VL-7B-Instruct', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
{
"processor_class": "Qwen2VLProcessor"
}
[INFO|2024-11-23 13:17:07] llamafactory.data.loader:157 >> Loading dataset deepseek.json...
my-dataset-is-secert
<|im_end|>
[INFO|configuration_utils.py:677] 2024-11-23 13:17:09,864 >> loading configuration file /root/autodl-tmp/Qwen2-VL-7B-Instruct/config.json
[INFO|configuration_utils.py:746] 2024-11-23 13:17:09,865 >> Model config Qwen2VLConfig {
"_name_or_path": "/root/autodl-tmp/Qwen2-VL-7B-Instruct",
"architectures": [
"Qwen2VLForConditionalGeneration"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"image_token_id": 151655,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2_vl",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
24,
24
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"video_token_id": 151656,
"vision_config": {
"in_chans": 3,
"model_type": "qwen2_vl",
"spatial_patch_size": 14
},
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vision_token_id": 151654,
"vocab_size": 152064
}
[INFO|modeling_utils.py:3934] 2024-11-23 13:17:09,875 >> loading weights file /root/autodl-tmp/Qwen2-VL-7B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1670] 2024-11-23 13:17:09,876 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1096] 2024-11-23 13:17:09,877 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|modeling_utils.py:1670] 2024-11-23 13:17:09,877 >> Instantiating Qwen2VisionTransformerPretrainedModel model under default dtype torch.bfloat16.
[WARNING|logging.py:168] 2024-11-23 13:17:09,890 >> Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.32it/s]
[INFO|modeling_utils.py:4800] 2024-11-23 13:17:13,974 >> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
[INFO|modeling_utils.py:4808] 2024-11-23 13:17:13,974 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at /root/autodl-tmp/Qwen2-VL-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:1049] 2024-11-23 13:17:13,977 >> loading configuration file /root/autodl-tmp/Qwen2-VL-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1096] 2024-11-23 13:17:13,978 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.01,
"top_k": 1,
"top_p": 0.001
}
[INFO|2024-11-23 13:17:13] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
[INFO|2024-11-23 13:17:13] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-11-23 13:17:13] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
[INFO|2024-11-23 13:17:13] llamafactory.model.adapter:157 >> Fine-tuning method: LoRA
[INFO|2024-11-23 13:17:13] llamafactory.model.model_utils.misc:157 >> Found linear modules: k_proj,q_proj,o_proj,up_proj,v_proj,down_proj,gate_proj
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.32it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.28it/s]
/root/LLaMA-Factory/src/llamafactory/train/sft/trainer.py:54: FutureWarning: tokenizer is deprecated and will be removed in version 5.0.0 for CustomSeq2SeqTrainer.__init__. Use processing_class instead.
  super().__init__(**kwargs)
[INFO|2024-11-23 13:17:15] llamafactory.model.loader:157 >> trainable params: 20,185,088 || all params: 8,311,560,704 || trainable%: 0.2429
/root/LLaMA-Factory/src/llamafactory/train/sft/trainer.py:54: FutureWarning: tokenizer is deprecated and will be removed in version 5.0.0 for CustomSeq2SeqTrainer.__init__. Use processing_class instead.
  super().__init__(**kwargs)
[INFO|trainer.py:698] 2024-11-23 13:17:15,186 >> Using auto half precision backend
/root/LLaMA-Factory/src/llamafactory/train/sft/trainer.py:54: FutureWarning: tokenizer is deprecated and will be removed in version 5.0.0 for CustomSeq2SeqTrainer.__init__. Use processing_class instead.
  super().__init__(**kwargs)
[INFO|trainer.py:2313] 2024-11-23 13:17:15,653 >> ***** Running training *****
[INFO|trainer.py:2314] 2024-11-23 13:17:15,653 >> Num examples = 46
[INFO|trainer.py:2315] 2024-11-23 13:17:15,653 >> Num Epochs = 3
[INFO|trainer.py:2316] 2024-11-23 13:17:15,653 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2319] 2024-11-23 13:17:15,654 >> Total train batch size (w. parallel, distributed & accumulation) = 48
[INFO|trainer.py:2320] 2024-11-23 13:17:15,654 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2321] 2024-11-23 13:17:15,654 >> Total optimization steps = 3
[INFO|trainer.py:2322] 2024-11-23 13:17:15,657 >> Number of trainable parameters = 20,185,088
0%| | 0/3 [00:00<?, ?it/s]E1123 13:17:21.332000 140454999893184 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: -8) local_rank: 0 (pid: 5090) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/bin/torchrun", line 8, in
sys.exit(main())
^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 347, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132, in call
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/root/LLaMA-Factory/src/llamafactory/launcher.py FAILED
Failures:
[1]:
time : 2024-11-23_13:17:21
host : autodl-container-40b74f9912-1ab26877
rank : 1 (local_rank: 1)
exitcode : -8 (pid: 5091)
error_file: <N/A>
traceback : Signal 8 (SIGFPE) received by PID 5091
[2]:
time : 2024-11-23_13:17:21
host : autodl-container-40b74f9912-1ab26877
rank : 2 (local_rank: 2)
exitcode : -8 (pid: 5092)
error_file: <N/A>
traceback : Signal 8 (SIGFPE) received by PID 5092
Root Cause (first observed failure):
[0]:
time : 2024-11-23_13:17:21
host : autodl-container-40b74f9912-1ab26877
rank : 0 (local_rank: 0)
exitcode : -8 (pid: 5090)
error_file: <N/A>
traceback : Signal 8 (SIGFPE) received by PID 5090
Expected behavior
Training ran fine before; the failure appeared when I relaunched it, and I don't know why. Any help would be greatly appreciated!
@hiyouga
Others
I tried matching the torch version to the CUDA version, but that didn't help.
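For reference, two isolation steps I have not yet tried (hypothetical suggestions; the config filename is a placeholder):

# 1. Rule out the distributed stack: run the same training on a single GPU.
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train qwen2vl_lora_sft.yaml
# 2. If single-GPU training runs, relaunch multi-GPU with NCCL logging on,
#    to see whether the SIGFPE is raised during NCCL initialization.
NCCL_DEBUG=INFO FORCE_TORCHRUN=1 llamafactory-cli train qwen2vl_lora_sft.yaml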