Connection reset by peer error when dataset:pull #40

Open
gifdog97 opened this issue Aug 6, 2024 · 5 comments

gifdog97 commented Aug 6, 2024

I tried to download a dataset with the command poetry run zrc datasets:pull <dataset-name>, but it failed with the following error. The same error occurred for every dataset I tried.

Traceback (most recent call last):
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 748, in _error_catcher
    yield
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 873, in _raw_read
    data = self._fp_read(amt, read1=read1) if not fp_closed else b""
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 856, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/http/client.py", line 459, in read
    n = self.readinto(b)
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/http/client.py", line 503, in readinto
    n = self.fp.readinto(b)
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/ssl.py", line 1132, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 54] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 1060, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 949, in read
    data = self._raw_read(amt)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 902, in _raw_read
    self._fp.close()
  File "/Users/skando/.pyenv/versions/3.8.18/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/urllib3/response.py", line 775, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/skando/MyProjects/DAU_HAS/.venv/bin/zrc", line 8, in <module>
    sys.exit(main())
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/startup.py", line 39, in main
    cli.run()
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/cli_lib.py", line 258, in run
    cmd.run_cmd(argv=sys.argv[2:])
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/cli_lib.py", line 91, in run_cmd
    self.run(args)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/datasets.py", line 72, in run
    dataset.pull(quiet=argv.quiet, show_progress=True, verify=not argv.skip_verification)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/datasets/_model.py", line 86, in pull
    download_extract_archive(self.origin.zip_url, self.location, int(self.origin.total_size),
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/misc.py", line 209, in download_extract_archive
    for chunk in response.iter_content(chunk_size=1024):
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

My environment:

  • macOS 14.5
  • Python version: 3.8.18 (managed with pyenv)
  • Environment Manager: Poetry (version 1.8.3)
  • zerospeech-benchmarks version: 0.9.4

gifdog97 commented Aug 6, 2024

Also, is there another way to download the datasets without using the zerospeech-benchmarks library? Thank you in advance! 🙇

@nhamilakis nhamilakis self-assigned this Aug 6, 2024
@nhamilakis nhamilakis added the bug Something isn't working label Aug 6, 2024
nhamilakis (Contributor) commented

Hello, could you specify which dataset you were trying to download so I can try to debug the issue?
In the meantime, you can find the index with links to all the datasets here: https://download.zerospeech.com
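
For a manual download from that index, a resumable fetch is one way to cope with connection resets on multi-gigabyte archives. The following is a minimal sketch and not part of zerospeech-benchmarks: it assumes the server honors HTTP Range requests (not confirmed in this thread), and the names open_from_offset and download_with_resume are hypothetical helpers introduced only for illustration.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def open_from_offset(url, offset, timeout=30):
    """Open `url`, asking the server to start at byte `offset` (HTTP Range)."""
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    response = urllib.request.urlopen(request, timeout=timeout)
    # 206 means the Range header was honored. A plain 200 means the server is
    # sending the whole file again, so appending would corrupt the output.
    if offset > 0 and response.status != 206:
        response.close()
        raise RuntimeError("server ignored the Range header; cannot resume")
    return response


def download_with_resume(url, dest, opener=open_from_offset, max_retries=5,
                         chunk_size=1024 * 1024, backoff=2.0):
    """Download `url` to `dest`, resuming from the bytes already on disk
    whenever the connection drops. Returns the final file size in bytes."""
    failures = 0
    while True:
        # Resume from whatever a previous attempt managed to write.
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        try:
            with opener(url, offset) as response, open(dest, "ab") as out:
                for chunk in iter(lambda: response.read(chunk_size), b""):
                    out.write(chunk)
            return os.path.getsize(dest)
        except urllib.error.HTTPError:
            # HTTP status errors (404, 416, ...) will not fix themselves.
            raise
        except (OSError, http.client.HTTPException):
            # Covers ConnectionResetError, timeouts, and incomplete reads.
            failures += 1
            if failures > max_retries:
                raise
            time.sleep(backoff * failures)
```

Usage would be a single call such as download_with_resume(zip_url, "dataset.zip"), where zip_url is a link copied from the index above. Since a dropped connection can also look like a normal end of stream, it is worth comparing the returned size with the size listed for the dataset before extracting.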

gifdog97 commented Aug 6, 2024

> could you specify which dataset were you trying to download so i can try and debug the issue ?

Actually, I tried every dataset I could find, and all of them failed. Below is the output of zrc datasets.

┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓
┃ Name                  ┃ Origin                  ┃ Size   ┃ Installed ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩
│ abxLS-dataset         │ download.zerospeech.com │ 2.2GB  │ False     │
│ sLM21-dataset         │ download.zerospeech.com │ 30.6GB │ False     │
│ zrc2017-test-dataset  │ download.zerospeech.com │ 9.6GB  │ False     │
│ zrc2017-train-dataset │ download.zerospeech.com │ 11.0GB │ False     │
│ zr2015-buckeye        │ external                │ 4.2GB  │ False     │
│ zr2015-nchlt          │ external                │ 6.2GB  │ False     │
│ prosaudit-dataset     │ download.zerospeech.com │ 1.0GB  │ False     │
└───────────────────────┴─────────────────────────┴────────┴───────────┘

> In the meantime, you can find the index with links for all the datasets here https://download.zerospeech.com

Thank you! I will try it.

gifdog97 commented Aug 6, 2024

Sorry, my reply was inaccurate.
I got the Connection reset by peer error for every dataset except zr2015-buckeye and zr2015-nchlt. For these two datasets, I got a different error, shown below:

Traceback (most recent call last):
  File "/Users/skando/MyProjects/DAU_HAS/.venv/bin/zrc", line 8, in <module>
    sys.exit(main())
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/startup.py", line 39, in main
    cli.run()
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/cli_lib.py", line 258, in run
    cmd.run_cmd(argv=sys.argv[2:])
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/cli_lib.py", line 91, in run_cmd
    self.run(args)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/cmd/datasets.py", line 72, in run
    dataset.pull(quiet=argv.quiet, show_progress=True, verify=not argv.skip_verification)
  File "/Users/skando/MyProjects/DAU_HAS/.venv/lib/python3.8/site-packages/zerospeech/datasets/_model.py", line 79, in pull
    raise ValueError("External datasets cannot be pulled from the repository !!")
ValueError: External datasets cannot be pulled from the repository !!

nhamilakis (Contributor) commented

Thank you for your response, I will investigate the issue.

As for zr2015-buckeye and zr2015-nchlt, that error is expected. I need to remove them from the index, as they are not supposed to be included in the tool.

They are datasets for the 2015 benchmark, but we do not have permission to redistribute them. In addition, the 2015 benchmark has some technical issues that prevent it from being packaged in this tool. So they need to be removed from the index.
