I used your open-source 50,000-sample dataset, and running data_process.py to preprocess it fails with the following error:
The dataset will save in HuatuoGPT2_sft_instruct_GPT4_HuatuoGPT2-7B_4096_dataset
Traceback (most recent call last):
  File "/home/ly/test/HuatuoGPT-II-main/adaption/one_stage_training/data_process.py", line 274, in <module>
    preprocess(args)
  File "/home/ly/test/HuatuoGPT-II-main/adaption/one_stage_training/data_process.py", line 201, in preprocess
    train_dataset = HuatuoGPT_data(args, tokenizer)
  File "/home/ly/test/HuatuoGPT-II-main/adaption/one_stage_training/data_process.py", line 90, in __init__
    self.data_dict = json.load(f)
  File "/home/ly/anaconda3/envs/huatuo/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/ly/anaconda3/envs/huatuo/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/ly/anaconda3/envs/huatuo/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 598)

Process finished with exit code 1
The dataset looks like this:
How should I change the dataset format? Or should I modify the code somewhere?
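For reference, "Extra data: line 2 column 1" usually means the first line parsed as a complete JSON value and more content followed it, which is typical of a JSON Lines file (one object per line) being passed to `json.load`. The sketch below is one possible workaround under that assumption, not the repository's own fix. The helper name `load_json_or_jsonl` is hypothetical, and the code in data_process.py that consumes `self.data_dict` may expect a different structure than a list of records.

```python
import json


def load_json_or_jsonl(path):
    """Load a file that holds either one JSON document or JSON Lines.

    Hypothetical helper for illustration; not part of data_process.py.
    """
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    try:
        # Case 1: the whole file is a single JSON value (list or dict).
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: assume one JSON object per line and skip blank lines.
        # Split on "\n" only, so unusual Unicode line separators inside
        # string values do not break a record apart.
        return [json.loads(line) for line in text.split("\n") if line.strip()]
```

If the assumption holds, the `json.load(f)` call shown at line 90 of the traceback could be replaced with a line-by-line load like this, or the dataset could be converted once into a single JSON array with `json.dump(records, out_file, ensure_ascii=False)`.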