Error when pretraining the llama-7b model with the seed_prompts_en.jsonl dataset from InstructionWild #4387

13416157913 · 2023-08-07T05:58:13Z


The command I ran was: torchrun --standalone --nproc_per_node=1 train_sft.py --pretrain "/home/llm-deploy/ColossalAI/llama-7b" --model 'llama' --strategy colossalai_zero2 --log_interval 10 --save_path /home/llm-deploy/ColossalAI/output/llama-7B --dataset /home/llm-deploy/ColossalAI/InstructionWild/data/seed_prompts_en.jsonl --batch_size 1 --lr 2e-5 --max_epochs 1

The error message is as follows:

Answered by flybird11111

Aug 9, 2023


CWHer · 2023-08-09T05:40:10Z


It seems that you used the wrong dataset. 🤔


CWHer · 2023-08-09T05:42:49Z


The dataset should use the following format:

[
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
    },
...
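To check a file against the format above before starting a run, a small validator along these lines may help. This is only a sketch: the function names `find_format_problems` and `check_file` are hypothetical, and it assumes the only requirements are the ones visible in the example (a top-level list of objects, each with string-valued `instruction`, `input`, and `output` keys). The actual loader in `train_sft.py` may be stricter or more lenient.

```python
import json

# Keys taken from the example record above; assumed to be the full requirement.
REQUIRED_KEYS = ("instruction", "input", "output")


def find_format_problems(records):
    """Return a list of problems; an empty list means the data matches the format."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for key in REQUIRED_KEYS:
            if key not in rec:
                problems.append(f"record {i}: missing key '{key}'")
            elif not isinstance(rec[key], str):
                problems.append(f"record {i}: '{key}' must be a string")
    return problems


def check_file(path):
    """Parse a file as one JSON document and validate it (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        try:
            records = json.load(f)
        except json.JSONDecodeError as err:
            # A file that is not a single JSON array ends up here.
            return [f"not a single JSON document: {err}"]
    return find_format_problems(records)
```

Calling `check_file` on the path passed to `--dataset` returns a list of problems, which is empty when every record has the three expected keys.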


flybird11111 · 2023-08-09T06:47:26Z

Collaborator

The command I ran was: torchrun --standalone --nproc_per_node=1 train_sft.py --pretrain "/home/llm-deploy/ColossalAI/llama-7b" --model 'llama' --strategy colossalai_zero2 --log_interval 10 --save_path /home/llm-deploy/ColossalAI/output/llama-7B --dataset /home/llm-deploy/ColossalAI/InstructionWild/data/seed_prompts_en.jsonl --batch_size 1 --lr 2e-5 --max_epochs 1

The error message is as follows:

The training data is contained in these two files.
