Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeouts (short dns timeout value?) #2

Open
wildegnux opened this issue Jan 18, 2024 · 15 comments
Open

Timeouts (short dns timeout value?) #2

wildegnux opened this issue Jan 18, 2024 · 15 comments

Comments

@wildegnux
Copy link

I'm repeatedly getting the following errors running the downloader:

Process Process-11:
Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_sockets.py", line 190, in connect_tcp
    addr_obj = ip_address(remote_host)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'api.pwnedpasswords.com' does not appear to be an IPv4 or IPv6 address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_sockets.py", line 193, in connect_tcp
    gai_res = await getaddrinfo(
              ^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fbb37227890

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 113, in connect_tcp
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 99, in handle_async_request
    raise exc
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 124, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 112, in connect_tcp
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 178, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 182, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 228, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(prefix, hash_type=hash_type, etag=etag, encoding=encoding_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 269, in pwnedpasswords_get
    response = await httpx_binary_response(url=url, etag=etag, encoding=encoding, debug=httpx_debug)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 53, in httpx_binary_response
    response = await client.send(request=request, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 365, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectTimeout

The thread managing also does not handle the threads crashing, making those threads stall. Eventually every thread crashes and the main process gets stuck waiting for the now crashed threads to close.

@gregsadetsky
Copy link

Same here i.e. ValueError: 'api.pwnedpasswords.com' does not appear to be an IPv4 or IPv6 address (funny coincidence that I just got the same error as this issue created a few hours ago..!)

Thanks @threatpatrols for creating this tool! Cheers

@ndejong
Copy link
Contributor

ndejong commented Jan 18, 2024

Thanks for the report(s) @gregsadetsky and @wildegnux

I've not had a chance to look into this yet, however I've been able to compare Python 3.10 vs 3.11 and somewhere in between something has broken :(

My current limited check is that I've just used hibp-downloader under Python 3.10 to update my own local --data-path to see if the latest reported Naz.API dataset is included yet - the tool works fine under 3.10

Not entirely sure the root cause is a DNS lookup issue yet, as I get slightly different messages in my 3.11 test environment

2024-01-19T01:17:44+1000 | INFO | hibp-downloader | HIBP Downloader: v0.2.1
2024-01-19T01:17:44+1000 | INFO | hibp-downloader | data-path '/home/ndejong/test/data'
2024-01-19T01:17:44+1000 | INFO | hibp-downloader | metadata-path '/home/ndejong/test/data'
2024-01-19T01:17:45+1000 | INFO | hibp-downloader | Created 4 worker processes to consume a queue of prefix-hash values.
2024-01-19T01:17:52+1000 | INFO | hibp-downloader | prefix=000ef source=[lc:0 et:0 rc:249 ro:0 xx:1] processed=[4.4MB ~24824H/s] api=[55req/s 4.4MB] runtime=0.2min
2024-01-19T01:18:01+1000 | INFO | hibp-downloader | prefix=001f3 source=[lc:0 et:0 rc:499 ro:0 xx:1] processed=[8.8MB ~26053H/s] api=[58req/s 8.8MB] runtime=0.3min
Process Process-4:
Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 140, in _call_sslobject_method
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 921, in read
    v = self._sslobj.read(len)
        ^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2580)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1123, in receive
    await self._protocol.read_event.wait()
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7905d01b4790

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 185, in handle_async_request
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 148, in handle_async_request
    status, headers = await self._receive_response(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 292, in _receive_response
    event = await self._receive_stream_event(request, stream_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
    await self._receive_events(request, stream_id)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
    events = await self._read_incoming_data(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 438, in _read_incoming_data
    data = await self._network_stream.read(self.READ_NUM_BYTES, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 31, in read
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 178, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 182, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 228, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(prefix, hash_type=hash_type, etag=etag, encoding=encoding_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 269, in pwnedpasswords_get
    response = await httpx_binary_response(url=url, etag=etag, encoding=encoding, debug=httpx_debug)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 53, in httpx_binary_response
    response = await client.send(request=request, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 365, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout
^CSIGINT received, exiting.
SIGINT received, exiting.
SIGINT received, exiting.
SIGINT received, exiting.
^CSIGINT received, exiting.
Exception ignored in atexit callback: <function _exit_function at 0x7905d137ce00>
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 199, in _finalize_join
    thread.join()
  File "/usr/lib/python3.11/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/main.py", line 20, in sigint_handler
    exit()
  File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: None

@ndejong
Copy link
Contributor

ndejong commented Jan 18, 2024

Note to self

  • write tests to catch this, we really don't have much in the way of tests at the moment :(
  • figure out why the project badges are claiming 3.12 since we don't even test for 3.12
  • handle threads that blow up so we don't fall into an infinite wait condition
  • the issue here appears to be caused by httpx
  • tickets that looks related - anyio update to version 3.7.0 cause http not working encode/httpx#2726

@ndejong
Copy link
Contributor

ndejong commented Jan 21, 2024

New release v0.3.0 that might resolve this issue, unfortunately it's been really hard to replicate the errors on this to chase it down - https://github.com/threatpatrols/hibp-downloader/releases/tag/0.3.0

There are two key changes that I'm expecting should move the needle on this issue though -

  • the loosening of the dependencies to allow upgrades beyond patch releases because there is "something" funky in and around anyio where previously we were probably too opinionated about httpx dependency version that imports anyio
  • the increase in HTTPX request timeout from 5 seconds to 30 seconds; this seems the more likely culprit

The code is more robust around handling worker threads when they have HTTPX exceptions so we should see less infinite loop scenarios, however clean thread exiting (eg SIGINT with ^C) is still not a thing :(

Please do send feedback so we can gauge if this resolves.

@wildegnux
Copy link
Author

Occasional errors still occur on Python 3.11.6 (on EndeavourOS):

2024-01-24T14:50:16+0100 | DEBUG | hibp-downloader | worker_index=14 prefix='0413a' hash_type='sha1' encoding_type='gzip' ignore_etag=False local_cache_ttl=43200 start_timestamp=2024-01-24 14:50:16.201230+01:00
2024-01-24T14:50:16+0100 | DEBUG | hibp-downloader | No existing metadata file 'data2/sha1/04/13/0413a.meta' using an empty entry
2024-01-24T14:50:46+0100 | ERROR | hibp-downloader | Unable to establish connection https://api.pwnedpasswords.com/range/0413a

Debug log doesn't seem to tell me anything else about those threads that might be useful.

But the failing threads do exit cleanly now so there's no more infinite wait. And to workaround the errors once could just run it in a loop until it stops giving errors.

@wildegnux
Copy link
Author

wildegnux commented Jan 24, 2024

I can also reproduce the same errors on a different machine (in a different network) with Python 3.10.12 on Ubuntu 22.04.3 LTS (running under WSL)

@ndejong
Copy link
Contributor

ndejong commented Jan 25, 2024

Okay, that helps - are you able to provide a list of the installed Python packages in each virtual environment you have there?

@wildegnux
Copy link
Author

Python 3.11.6:

aiofiles==23.2.1
anyio==4.2.0
certifi==2023.11.17
click==8.1.7
colorama==0.4.6
h11==0.14.0
h2==4.1.0
hibp-downloader==0.3.0
hpack==4.0.0
httpcore==1.0.2
httpx==0.26.0
hyperframe==6.0.1
idna==3.6
markdown-it-py==3.0.0
mdurl==0.1.2
Pygments==2.17.2
rich==13.7.0
shellingham==1.5.4
sniffio==1.3.0
typer==0.9.0
typing_extensions==4.9.0

Python 3.10.12:

aiofiles==23.2.1
anyio==4.2.0
certifi==2023.11.17
click==8.1.7
colorama==0.4.6
exceptiongroup==1.2.0
h11==0.14.0
h2==4.1.0
hibp-downloader==0.3.0
hpack==4.0.0
httpcore==1.0.2
httpx==0.26.0
hyperframe==6.0.1
idna==3.6
markdown-it-py==3.0.0
mdurl==0.1.2
Pygments==2.17.2
rich==13.7.0
shellingham==1.5.4
sniffio==1.3.0
typer==0.9.0
typing_extensions==4.9.0

@FlorianLoch
Copy link

I am also experiencing the Unable to establish connection errors - and, from reading the logs, I assume they are not retried? 🤔

@ndejong
Copy link
Contributor

ndejong commented Jan 29, 2024

Yes, retries are not a thing in the current code - would surely help

Current thinking is that the timeouts would make sense if you consider this thing creates a lot connections which may in-turn cause an upstream network component to grok - for example a small home router may not have enough NAT state table and then intermittently show connection problems.

This line of thinking might explain why I've rarely been able to replicate this most of the time.

Could you give it another spin and reduce the --processes and --chunk-size values - if this is the underlaying issue then a feature to dynamically scale back to suit the local network conditions might be the thing needed here

@ndejong
Copy link
Contributor

ndejong commented Feb 9, 2024

New release 0.3.1 implements retries and provides CLI options to adjust the default values
https://pypi.org/project/hibp-downloader/0.3.1/

pip install --upgrade hibp-downloader

Feedback would be great

@wildegnux
Copy link
Author

Sorry for the lack of feedback! Will run some test now!

@wildegnux
Copy link
Author

I still run into some threads crashing:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 140, in _call_sslobject_method
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 921, in read
    v = self._sslobj.read(len)
        ^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2580)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1123, in receive
    await self._protocol.read_event.wait()
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 72b64bf7c550

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
    yield
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 361, in __aiter__
    async for part in self._stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 582, in __aiter__
    raise exc
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 572, in __aiter__
    async for chunk in self._connection._receive_response_body(
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 313, in _receive_response_body
    event = await self._receive_stream_event(request, stream_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
    await self._receive_events(request, stream_id)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
    events = await self._read_incoming_data(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
    raise exc
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 438, in _read_incoming_data
    data = await self._network_stream.read(self.READ_NUM_BYTES, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 31, in read
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 193, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
2024-03-08T14:55:04+0100 | ESC[1;37mDEBUGESC[0m | hibp-downloader | Request attempt 1 of 5 for 'https://api.pwnedpasswords.com/range/0ec91'
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 197, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 254, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 306, in pwnedpasswords_get
    response = await httpx_binary_response(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 74, in httpx_binary_response
    response.binary = b"".join([part async for part in response.aiter_raw()])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 74, in <listcomp>
    response.binary = b"".join([part async for part in response.aiter_raw()])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_models.py", line 991, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_client.py", line 147, in __aiter__
    async for chunk in self._stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 251, in __aiter__
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

And now some retries (nice 👍 )

2024-03-08T14:53:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/05cbc'
2024-03-08T14:53:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/05cbf'
2024-03-08T14:53:51+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/075ea'
2024-03-08T14:54:14+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0922a'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0ccec'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0cced'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0ccf2'
2024-03-08T15:11:25+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/4e3ac'
2024-03-08T15:12:56+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/540c3'
2024-03-08T15:19:08+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/6bb9d'
2024-03-08T15:19:08+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/6bba1'
2024-03-08T15:39:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/b9611'
2024-03-08T15:39:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/b9615'
2024-03-08T15:40:41+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/bdb53'
2024-03-08T15:47:58+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/da102'

Unclear to me if that crashed thread is one of the retries...

That test was run on my laptop in my office (wifi) network using the default options. Running the same right now on my home network. Will also run a test on my home desktop (WSL).

@wildegnux
Copy link
Author

Test from home network: No crashed threads, but more of these failed downloads:

2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | worker_index=6 prefix='d3215' hash_type='sha1' encoding_type='gzip' http_timeout=30 http_max_retries=5 http_debug=Falseignore_etag=False local_cache_ttl=43200 start_timestamp=2024-03-09 18:54:56.486975+01:00
2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | No existing metadata file 'test_home/sha1/d3/21/d3215.meta' using an empty entry
2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | Request attempt 1 of 5 for 'https://api.pwnedpasswords.com/range/d3215'
2024-03-09T18:55:06+0100 | WARNING | hibp-downloader | Request [1 of 5] failed for 'GET' 'https://api.pwnedpasswords.com/range/d3215'
2024-03-09T18:55:06+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/d3215'

But all failures succeeded on attempt 2. 👍

@wildegnux
Copy link
Author

Test in WSL (home network): With default options (which on this machine means 32 processes) it runs full boar for ~30-60 sec, then completely stops for ~30 sec (matching the http-timeout value). Then all the active download fails and it continues going at full speed again for another minute and then stops again and so on. Reducing processes / chunks size does improve things, but I had to go down to 8 processes and 1 chunk for getting it to not occasionally lock up. Might be some funky stuff in the WSL network stack I guess, since it worked mostly fine one the linux laptop in the same network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants