[Question] How to solve [datasets.builder.DatasetGenerationError: An error occurred while generating the dataset] #35
Comments
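The environment was set up automatically from the requirements.txt file.
I ran into the same problem. It looks like AdaSeq's dataset-loading logic may be at fault; the format of the dataset being loaded (···text) may be the issue.
This happens because the dataset cannot be found or is not in a standard parseable format. You can rewrite the data loading by following the toy msra loading code.
@PPPP-kaqiu Did you rewrite it? Could you share it?
@Shawnzheng011019 Did you manage to solve this?
Write the data-loading script strictly in the HF dataset format, and in the YAML dataset config just give the folder that contains the data.
@PPPP-kaqiu Could we connect on WeChat? I would appreciate your help. WX: Xugeyuan923
Hello, could you share how you solved this?
@PPPP-kaqiu Could you share your contact details and give me some guidance? vx: j626850567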
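One commenter above attributes the error to a dataset that is either missing or not in a standard parseable format. A quick way to test that diagnosis before handing the file to AdaSeq is to parse it yourself. The sketch below assumes a CoNLL-style column file (one token and its tag per line, blank lines between sentences); that layout, the function name `read_conll`, and the file name `train.txt` are illustrative assumptions, not something confirmed in this thread.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# One sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(path: str, sep: Optional[str] = None) -> List[Sentence]:
    """Parse a CoNLL-style file and fail loudly on missing or malformed data.

    Hypothetical helper: the column layout (token first, tag last) is an
    assumption about the dataset, not AdaSeq's documented format.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Dataset file not found: {file}")

    sentences: List[Sentence] = []
    current: Sentence = []
    with file.open(encoding="utf-8") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                # A blank line closes the current sentence, if any.
                if current:
                    sentences.append(current)
                    current = []
                continue
            parts = line.split(sep)  # sep=None splits on any whitespace
            if len(parts) < 2:
                raise ValueError(
                    f"{file}:{line_no}: expected 'token tag', got {line!r}"
                )
            current.append((parts[0], parts[-1]))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences
```

Calling `read_conll("train.txt")` on each split either returns the parsed sentences or points at the first missing file or malformed line, which narrows down whether the data or the loader is at fault.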
What is your question?
Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1618, in _prepare_split_single
writer = writer_class(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\arrow_writer.py", line 334, in __init__
self.stream = self._fs.open(fs_token_paths[2][0], "wb")
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\spec.py", line 1309, in open
f = self._open(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 298, in __init__
self._open()
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 303, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/shawn/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-c270794ce0d23d06/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\shawn\anaconda3\envs\pytorch\Scripts\adaseq.exe\__main__.py", line 7, in <module>
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\main.py", line 13, in run
main(prog='adaseq')
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\__init__.py", line 29, in main
args.func(args)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
train_model(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
trainer = build_trainer_from_partial_objects(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
dm = DatasetManager.from_config(task=config.task, **config.dataset)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
hfdataset = hf_load_dataset(path, name=name, **kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 909, in download_and_prepare
self._download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1670, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1004, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1508, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1665, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
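A possibility not raised in this thread: the cache path in the first traceback is very long, and on Windows a path at or beyond the legacy 260-character limit can produce "No such file or directory" when long-path support is not enabled. This is only a hypothesis about this report. The sketch below checks a path against that limit; the function name and the sample path are hypothetical.

```python
# Classic Windows MAX_PATH; the count includes the terminating NUL character.
LEGACY_WINDOWS_MAX_PATH = 260


def exceeds_legacy_windows_limit(
    path: str, limit: int = LEGACY_WINDOWS_MAX_PATH
) -> bool:
    """Return True if `path` is too long for the legacy Windows path limit."""
    # A path of `limit - 1` characters is the longest that still fits.
    return len(path) >= limit


if __name__ == "__main__":
    # Hypothetical sample; paste the path from your own traceback instead.
    sample = "C:/Users/example/.cache/huggingface/datasets/" + "x" * 230
    print(len(sample), exceeds_legacy_windows_limit(sample))
```

If the check returns True for the path in your traceback, two things that may help are enabling long-path support in Windows, or moving the cache to a shorter location, for example by setting the `HF_DATASETS_CACHE` environment variable before `datasets` is imported.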
What have you tried?
Set an HTTP proxy and successfully connected to YouTube.
Code (if necessary)
No response
What's your environment?