
Hash mismatch on resumed files #57

Open
shadesmore opened this issue Jun 1, 2015 · 10 comments

@shadesmore

I get these errors constantly on 2 different installs. I'm trying to download my data off of ACD; all of my files are 10235 MB split 7zip archives. Machine 1 is running Ubuntu 14.04 with 250 Mbps symmetrical bandwidth; Machine 2 is running the latest Debian stable with gigabit symmetrical. A file will be downloading, then the speed drops to 0.0 KBps, and it starts downloading again after 15-30 seconds. Then, once it reaches 60-90% downloaded, it fails with [ERROR] [acd_cli] - Hash mismatch between local and remote file for "File_Name".

If I set max connections to anything other than 1, everything is guaranteed to fail; setting max retries doesn't seem to affect the hash error rate. If I queue 25 files to download I'll get 3-4 without errors; then, if I delete the errored files and redownload, I can get another 3-4 files, with the rest being hash errors. So it's wasting a large amount of bandwidth and time, because once a hash mismatch happens it stops downloading that file.

I'll see if I can get some verbose logs of the errors. Anybody have any ideas why I'm getting constant errors?
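
For reference, the download command I'm running looks roughly like this (the remote and local paths are just examples; -x is max connections, -r is max retries):

# single connection, a few retries; with -x above 1 every file fails for me
acd_cli download -x 1 -r 4 /test/BD-000-097.7z.001 /data/USER/test1/test/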

@yadayada
Owner

yadayada commented Jun 1, 2015

Please append '.__incomplete' to a failed file, retry and see if the remaining part gets downloaded and the hash is correct.
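
Something along these lines (the file name is a placeholder; downloads that acd_cli interrupts are left with the '.__incomplete' suffix, and a retry resumes them):

# mark the failed file as incomplete so the next attempt resumes it
mv BD-000-097.7z.003 BD-000-097.7z.003.__incomplete
# retry; the remaining part should be appended to the renamed file
acd_cli download /test/BD-000-097.7z.003 .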

@shadesmore
Author

I tried that on 5 files; it completed them, but still claimed hash errors for all 5. I checked file sizes and they are complete/correct, so I'll try extracting a file from inside to make sure they are intact.

@yadayada
Owner

yadayada commented Jun 2, 2015

It would be nice if you could

  1. md5sum the completed files and ascertain whether the hashing is correct and there really is a download error. If so, acd_cli find-md5 should not find the independently computed hash.
  2. Do a binary diff on the files and see where the errors occur, provided that you still have or can recreate the original files (a sketch follows below).
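
For step 2, something like cmp will show where the bytes start to differ (directory names are placeholders):

# report the first mismatching byte offset, if any
cmp originals/BD-000-097.7z.001 downloaded/BD-000-097.7z.001
# list the offsets (1-based) and values of the first few mismatching bytes
cmp -l originals/BD-000-097.7z.001 downloaded/BD-000-097.7z.001 | head -n 5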

@shadesmore
Author

  1. I used md5sum on 10 different files, a few of which were downloaded without error; all were correctly matched via find-md5, even the resumed ones from above.
USER@HO$T:~$ md5sum /data/USER/test1/test/BD-000-097.7z.001
b36de352cb66b7641f736adb847ffc11  /data/USER/test1/test/BD-000-097.7z.001
USER@HO$T:~$ sudo acd_cli find-md5 b36de352cb66b7641f736adb847ffc11
[OK9rN6nzRBCYj4ZC1mXyRw] [A] /test/BD-000-097.7z.001
USER@HO$T:~$ md5sum /data/USER/test1/test/BD-000-097.7z.005
4fd6810060d41c153e719b4236ac4ba9  /data/USER/test1/test/BD-000-097.7z.005
USER@HO$T:~$ sudo acd_cli find-md5 4fd6810060d41c153e719b4236ac4ba9
[4ls8JVeZRAyfxVEji_9Cfw] [A] /test/BD-000-097.7z.005
  2. I do not have the original files, but I did extract one archive set completely and had no errors/issues.

edit: and now I'm getting Code: 1000, msg: ('Connection aborted.', ResponseNotReady('Request-sent',)). I wonder if something is down.

@yadayada
Owner

yadayada commented Jun 4, 2015

I will add a check that suppresses the hash error messages when a file download is incomplete for some reason. Why a hashing error occurs for resumed files, I don't know.

Regarding the connection error, Amazon has disabled downloads of large files again.

yadayada added a commit that referenced this issue Jun 6, 2015
This greatly improves FUSE read speeds by streaming chunks on read operations
compared to the experimental FUSE release. Reduces the read timeout to 5 seconds.
Moving and renaming should now work.

misc:
- raise RequestError on incomplete download (#57)
- moving nodes now done using add-child and remove-child (fixes #62)
- cache.query: new resolve method that returns a (node, parent) tuple
yadayada added a commit that referenced this issue Jun 14, 2015
This greatly improves FUSE read speeds by streaming chunks on read operations
compared to the experimental FUSE release. Reduces the read timeout to 5 seconds.
Moving and renaming should now work.

misc:
- raise RequestError on incomplete download (#57)
- moving nodes now done using add-child and remove-child (fixes #62)
- cache.query: new resolve method that returns a (node, parent) tuple
@Sunako

Sunako commented Jun 30, 2015

I also have this issue, with more or less the same pattern of how it happens. However, I may have more info, since I have the originals of the files that went into my archive files.

Anyway, I have a 4 GiB archive file; I download it via acd_cli and it has a wrong hash when completed. The download usually stalls at 0 B/s for about 30 seconds at some random point, like above, and then resumes itself, saying it dropped the connection. After a while it says it failed, so I start the download again to resume the file to completion, but it still reports a failure. Its size in bytes matches the source file, yet the hash is wrong compared to my own hash and to the one Amazon's metadata reports (which is the same as mine). I then extracted the files inside the archive, which I also have the original hashes for, ran an md5sum check on them, and got this:
md5sum: WARNING: 468 computed checksums did NOT match

More than half of the files in the archive do not match my originals; the archive is the same size in bytes as my original source file, but its hash is a mismatch. As a further check, I downloaded the same archive file from Amazon's Cloud Drive website, and that way I do get the correct file and hash.

I can only assume there may be some bug in the API or acd_cli, but I don't really know enough to say. Hopefully some of this is helpful. If you want, I can test things for you, since I've already got everything set up for that.
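
That WARNING line is the summary md5sum prints in checklist mode; the check was essentially this (the checksum list name is a placeholder):

# verify the extracted files against the checksums recorded for the originals
md5sum -c original_checksums.md5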

@yadayada yadayada changed the title from "Constant Hash mismatch" to "Hash mismatch on resumed files" Jul 9, 2015
@yadayada yadayada added the bug label Jul 11, 2015
@yadayada yadayada added the API label Sep 20, 2015
@Rufflewind

I'm also running into this problem.

I have a file that's about 8.38 GiB (8998338695 B) in size. I made 5 attempts to download this file:

  1. Success (no special flags).
  2. Bytes 4204527168 to 8479162056 are corrupted (no special flags).
  3. Bytes 5772410432 to 8479162056 are corrupted (no special flags).
  4. Bytes 7937719872 to 8467119236 are corrupted (with -r 2 -x 8).
  5. Success (with -r 4 -x 8).

Some of these attempts failed midway (or I manually interrupted them) so they had to be resumed, though I don't quite remember which ones did. I know for certain that attempt 5 did not fail at all and that attempt 4 did fail midway. So I suspect the corruption is a result of resuming a download. FYI, I am downloading onto an SSD.

(Here, "corruption" means the majority of the bytes in that range do not match the original file at all.)

@yadayada
Owner

(Here, "corruption" means the majority of the bytes in that range do not match the original file at all.)

Since you were able to identify the offending byte ranges, could you provide some further information on the corruption?

yadayada added a commit that referenced this issue Jun 11, 2016
Removes downloaded files that do not match the originals' MD5 sums and
adds HASH_MISMATCH to the list of return values that trigger a download retry.

Also adds an optional acd_cli.ini config file. Having the line
"keep_corrupt=True" in the [download] section will safely rename
the corrupt file for later inspection.

Concerns #57 and #336.
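
The option described in that commit amounts to an acd_cli.ini along these lines (where acd_cli looks for the file depends on its settings directory, which isn't spelled out here):

[download]
keep_corrupt=True
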
@yadayada
Owner

While trying to reproduce #336, I was only able to reproduce this issue.

It turns out that the faulty byte ranges already appear in the incompletely downloaded files. In one file, a chunk of approximately 500 MB is missing at a 1500 MB offset.

The resuming itself seems to work fine.
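
If the gap is left as a zero-filled hole (an assumption, not something I verified), a quick check on the partial file would be (file name is a placeholder):

# count non-zero bytes in the ~500 MB region at the 1500 MB offset; 0 means the chunk was never written
dd if=partial.7z.001.__incomplete bs=1M skip=1500 count=500 2>/dev/null | tr -d '\0' | wc -c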

@yadayada
Owner

I had an inkling about this. Sorry I took so long. Please test whether the latest commit fixes the issues.
