Description
Describe the bug
My deploy process runs a pip install during image build time on AWS. This process has now failed a couple times in a row with the following error:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
torch==2.1.1 from https://files.pythonhosted.org/packages/72/d0/8e7157fe416f657e38736a42d9b0b82ef7f7af00398516200b59ebb5995e/torch-2.1.1-cp311-cp311-manylinux2014_aarch64.whl (from -r requirements.txt (line 84)):
Expected sha256 61b51b33c61737c287058b0c3061e6a9d3c363863e4a094f804bc486888a188a
Got db7f567ef4ee64ffdb28fe1cc71206584bdddc70e1e4a92e26b3671de6f9e32b
I tried fetching the file locally and got the correct SHA. So, I spun up an instance in EC2, connected to it, and fetched the file. That fetch failed after precisely 39MiB (a bit under half the expected size) with a 503 error. The resulting file had the hash db7f567ef4ee64ffdb28fe1cc71206584bdddc70e1e4a92e26b3671de6f9e32b.
All subsequent attempts to fetch the file from that instance succeeded.
I don't have enough samples to be conclusive about it, but it appears to be an oddly deterministic failure in which the first attempt to fetch this file from an EC2 node results in the same incomplete file being produced consistently.
I did get a successful download on the third image build attempt, so it's not perfectly consistent (thankfully). Honestly, I wouldn't bother reporting it except that it happened 3 times and produced the same incorrect hash every time.
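The check described above (fetch the file, hash it, compare against the expected value) can be sketched roughly as follows. This is only an illustration: `digest_and_size` and `diagnose` are hypothetical helper names, and the caller is assumed to supply the response body as an iterable of byte chunks along with the advertised content length. The point is to tell a truncated download apart from a full-length file with the wrong contents.

```python
import hashlib
from typing import Iterable, Optional

# Expected digest taken from the pip error message in this report.
EXPECTED_SHA256 = "61b51b33c61737c287058b0c3061e6a9d3c363863e4a094f804bc486888a188a"


def digest_and_size(chunks: Iterable[bytes]) -> tuple[str, int]:
    """Return (sha256 hex digest, total byte count) for a stream of chunks."""
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


def diagnose(chunks: Iterable[bytes], expected_sha256: str,
             content_length: Optional[int]) -> str:
    """Classify a download as ok, truncated, or a full-length hash mismatch."""
    digest, total = digest_and_size(chunks)
    # A short body means the transfer was cut off; the hash is then
    # simply the hash of the partial file.
    if content_length is not None and total < content_length:
        return f"truncated: got {total} of {content_length} bytes (sha256 {digest})"
    if digest != expected_sha256:
        return f"hash mismatch at full length: sha256 {digest}"
    return "ok"
```

The chunks could come from any HTTP client that reads the body in fixed-size pieces, with `content_length` taken from the response's Content-Length header. If the transfer is always cut at the same offset, the "truncated" result would carry the same digest every time, which would match the repeated db7f567e... hash seen here.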
Expected behavior
I would expect the server not to send just under half the expected file on the first attempt. Alternatively, I'd expect pip to retry properly (the help screen says it should make 5 attempts by default, and I'm not overriding that).
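To make the second expectation concrete, here is a minimal sketch of a "retry until the hash matches" loop. It does not describe how pip works internally; `fetch_verified` is a hypothetical wrapper, `fetch` stands in for whatever performs the download, and the default of 5 attempts is only chosen to echo the number quoted above.

```python
import hashlib
from typing import Callable


def fetch_verified(fetch: Callable[[], bytes], expected_sha256: str,
                   attempts: int = 5) -> bytes:
    """Call fetch() until its payload matches expected_sha256.

    Raises ValueError if none of the attempts produce a matching payload.
    """
    last_digest = None
    for _ in range(attempts):
        data = fetch()
        last_digest = hashlib.sha256(data).hexdigest()
        if last_digest == expected_sha256:
            return data
        # Mismatch: discard the payload and try again.
    raise ValueError(f"no attempt matched; last sha256 was {last_digest}")
```

With a download that is truncated only on the first attempt, as observed from the EC2 instance, a loop like this would succeed on the second call.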
To Reproduce
pip3.11 install --requirement requirements.txt
Where requirements.txt contains the line torch==2.1.1, and the command is run from a Linux instance on EC2 in Amazon's us-east-1 region.
My Platform
We're using Debian 11, with Python 3.11.4 (SHA 85c37a265e5c9dd9f75b35f954e31fbfc10383162417285e30ad25cc073a0d63) built from source.
Additional context