E:\ChatGLM\ChatGLM3\ChatGLM-LoRA>python tokenize_dataset_rows.py ^
More? --jsonl_path data/alpaca_data.jsonl ^
More? --save_path data/alpaca ^
More? --max_seq_length 200
0%| | 0/52002 [00:00<?, ?it/s]
Generating train split: 0 examples [00:02, ? examples/s] | 0/52002 [00:00<?, ?it/s]
Traceback (most recent call last):
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1676, in _prepare_split_single
for key, record in generator:
File "e:\anaconda3\Lib\site-packages\datasets\packaged_modules\generator\generator.py", line 30, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 31, in read_jsonl
feature = preprocess(tokenizer, config, example, max_seq_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 10, in preprocess
prompt = example["text"]
~~~~~~~^^^^^^^^
KeyError: 'text'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 53, in <module>
main()
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 46, in main
dataset = datasets.Dataset.from_generator(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "e:\anaconda3\Lib\site-packages\datasets\arrow_dataset.py", line 1072, in from_generator
).read()
^^^^^^
File "e:\anaconda3\Lib\site-packages\datasets\io\generator.py", line 47, in read
self.builder.download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1717, in _download_and_prepare
super()._download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1555, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1712, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
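I generated the jsonl file by following the steps.
Then I ran the tokenize_dataset_rows.py command shown above.
It failed with KeyError: 'text', raised at prompt = example["text"] in preprocess.
I searched around but did not find a fix that works.
Could anyone help?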
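The KeyError means the records yielded from data/alpaca_data.jsonl have no "text" field, so the first thing to check is which keys the file actually contains. Below is a minimal sketch for that check, plus a hypothetical fallback helper. The field names "context" and "instruction"/"input" are assumptions based on common Alpaca-style layouts and are not confirmed by this traceback, and first_record_keys and get_prompt are illustrative names that do not exist in tokenize_dataset_rows.py.

```python
import json


def first_record_keys(jsonl_path):
    """Return the sorted key names of the first non-empty line in a JSONL file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return sorted(json.loads(line).keys())
    return []


def get_prompt(example):
    """Hypothetical helper: take the prompt from whichever field is present.

    The field names checked here are assumptions; adjust them to whatever
    first_record_keys() reports for your file.
    """
    if "text" in example:
        return example["text"]
    if "context" in example:  # assumed name, not confirmed by the traceback
        return example["context"]
    if "instruction" in example:  # assumed Alpaca-style record
        prompt = example["instruction"]
        if example.get("input"):
            prompt += "\n" + example["input"]
        return prompt
    raise KeyError(f"no prompt field found; available keys: {sorted(example)}")
```

Running first_record_keys("data/alpaca_data.jsonl") shows the real field names. If "text" is not among them, either regenerate the jsonl so that it matches what preprocess expects, or change the example["text"] lookup in preprocess to read the field that is actually there.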