Dataset not found #11

Open
remylouisew opened this issue Jan 9, 2024 · 0 comments

The code provided to prepare the wmt_t2t_translate dataset fails with this error:

File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/translate/wmt.py", line 1000, in _parse_parallel_sentences
assert f1_files and f2_files, "No matching files found: %s, %s." % (f1, f2)
AssertionError: No matching files found: gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.de, gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.en.

The files mentioned do exist at that location, and both I and the service account used by the Vertex Workbench can access them. Perhaps there is a problem with the config?
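One way to double-check this from the same environment is to run the two paths through a glob call directly. The sketch below rests on an assumption: that the `f1_files` and `f2_files` values in the failing assertion come from globbing the two paths, for example with `tf.io.gfile.glob`. The helper names `check_parallel_files` and `check_with_tf` are hypothetical and not part of tensorflow_datasets.

```python
import glob
from typing import Callable, List, Tuple


def check_parallel_files(
    f1: str,
    f2: str,
    glob_fn: Callable[[str], List[str]] = glob.glob,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return the files that each pattern matches.

    glob_fn defaults to the standard-library glob, which only sees local
    paths; pass a GCS-aware glob function to check gs:// paths.
    """
    f1_files = sorted(glob_fn(f1))
    f2_files = sorted(glob_fn(f2))
    return f1_files, f2_files


def check_with_tf(f1: str, f2: str) -> Tuple[List[str], List[str]]:
    """Same check via tf.io.gfile.glob (assumes TensorFlow is installed)."""
    import tensorflow as tf  # imported lazily so the helper above has no TF dependency

    return check_parallel_files(f1, f2, glob_fn=tf.io.gfile.glob)
```

Calling `check_with_tf` with the two `gs://` paths from the error message and getting an empty list for either one would suggest the glob itself is not matching, even though the objects are listed in the bucket.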

In addition, the code to download the xsum dataset that is linked in this tutorial fails due to a syntax error in their creation script. Not exactly your problem, but I wanted to let you know. I was able to build the squad and cnn_dailymail datasets successfully.
