Dataset not found #11

remylouisew · 2024-01-09T18:43:17Z

The code provided to prepare the wmt_t2t_translate dataset fails with this error:

File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/translate/wmt.py", line 1000, in _parse_parallel_sentences
assert f1_files and f2_files, "No matching files found: %s, %s." % (f1, f2)
AssertionError: No matching files found: gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.de, gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.en.

The files mentioned do exist at that location, and both I and the service account used by the Vertex Workbench are able to access them. Perhaps there is a problem with the config?
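For anyone trying to reproduce this, here is a minimal sketch of the check I would run on the two paths. It assumes the assertion in wmt.py fires when a glob over each path returns nothing; I have not confirmed that this is the only way to hit it. The helper name `find_unmatched` and its `glob_fn` parameter are my own, not part of tensorflow_datasets.

```python
import glob
from typing import Callable, List, Sequence


def find_unmatched(
    paths: Sequence[str],
    glob_fn: Callable[[str], List[str]] = glob.glob,
) -> List[str]:
    """Return the paths for which glob_fn finds no matching file.

    glob_fn defaults to the standard-library glob, which only sees the
    local filesystem. For gs:// paths, pass a GCS-aware glob such as
    tf.io.gfile.glob (assumption: TensorFlow is installed with GCS support).
    """
    unmatched = []
    for path in paths:
        # An empty result here is what I assume triggers
        # "No matching files found" in _parse_parallel_sentences.
        if not glob_fn(path):
            unmatched.append(path)
    return unmatched
```

To run it against the bucket, call `find_unmatched([f1, f2], glob_fn=tf.io.gfile.glob)` with the two paths from the error message. An empty list would mean the glob itself sees both files and the problem lies elsewhere, for example in the config.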

In addition, the code to download the xsum dataset that is linked in this tutorial fails because of a syntax error in its creation script. Not exactly your problem, but I wanted to let you know. I was able to build the squad and cnn_dailymail datasets successfully.
