HTTP: get more precise remote file size from content-length header instead of http.group.file.size #50

abretaud · 2024-08-23T07:39:58Z

For example, on https://ftp.ncbi.nlm.nih.gov/blast/db/ the file sizes in the listing are rounded to KB/MB/GB, so the size parsed from the listing is not precise.

The problem is that this size, together with the last modification date, is used to check whether files are already present in the offline dir (for example after a failed bank update attempt). Since the rounded size does not match the size on disk, all files get redownloaded on each bank update attempt, even if some are already present locally.
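To illustrate the failure mode, here is a minimal sketch of such a size-plus-date check. The `FileInfo` class, the `already_downloaded` function, the file name, and the numbers are all made up for the example; this is not the actual biomaj-download code.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size: int   # size in bytes
    mtime: int  # last modification time, as a Unix timestamp


def already_downloaded(remote: FileInfo, local: FileInfo) -> bool:
    """Hypothetical check: reuse the local file only on an exact size and date match."""
    return remote.size == local.size and remote.mtime == local.mtime


# Remote size derived from a rounded listing value (e.g. "1.2M"),
# local size is the exact number of bytes on disk (illustrative values).
remote = FileInfo("example.tar.gz", size=1258291, mtime=1724398798)
local = FileInfo("example.tar.gz", size=1261004, mtime=1724398798)

# The dates match, but the rounded size does not, so the file is fetched again.
needs_download = not already_downloaded(remote, local)
```

With an exact remote size, the same comparison would succeed and the file in the offline dir could be reused.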

Probably needs to be done around https://github.com/genouest/biomaj-download/blob/master/biomaj_download/download/curl.py#L375

It's already implemented in the direct http downloader https://github.com/genouest/biomaj-download/blob/master/biomaj_download/download/direct.py#L233

Not sure if we want it to be configurable. Maybe just do it when http.group.file.size=-1 is set in the properties file?
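A rough sketch of what that could look like, using only the standard library rather than the pycurl-based code in curl.py. The function names, the `group_file_size` parameter, and the fallback to the listing size are assumptions for illustration, not the existing implementation.

```python
import urllib.request
from typing import Mapping, Optional


def size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value as an int, or None if missing or invalid."""
    for key, value in headers.items():
        if key.lower() == "content-length":
            value = value.strip()
            return int(value) if value.isdigit() else None
    return None


def remote_size(url: str, group_file_size: int, listing_size: int,
                timeout: float = 30.0) -> int:
    """Hypothetical helper: keep the listing size unless group_file_size is -1."""
    if group_file_size != -1:
        return listing_size
    # Ask the server for headers only, without downloading the body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        size = size_from_headers(dict(response.headers.items()))
    # Some servers omit Content-Length; fall back to the listing value then.
    return listing_size if size is None else size
```

One HEAD request per file adds latency on large listings, which is an argument for making it opt-in through the -1 value rather than the default.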


mboudet · 2024-09-13T07:20:04Z

Should be closed by #51

osallou · 2024-09-13T07:23:28Z

Group file size is (also) obtained during the listing step, and used together with the date to decide whether a file should be downloaded (the data provided depends on the protocol and the server config...).

Using content-length will be exact, but it may differ from the listing info, in which case the file will be downloaded again on the next check.

mboudet · 2024-09-13T07:26:14Z

Hmm. The listing step uses unix file information I think? For the files already on disk.
In any case, parsing the HTML info directly will break every time, since the size is not usually printed in bytes.

Hopefully content-length will work a bit better, but we'll see
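For reference, a small sketch of why the HTML value cannot match the size on disk. The parser below is hypothetical and assumes 1024-based units, which may not hold for every server.

```python
import re

# Assumed multipliers for the unit suffixes shown in directory listings.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_listing_size(text: str) -> int:
    """Convert a human-readable listing size such as '1.2M' to approximate bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


# "1.2M" stands for any file of roughly 1.15M to 1.25M, so the parsed value
# can be off by tens of kilobytes compared with the real byte count.
approx = parse_listing_size("1.2M")
```

Any exact comparison against `approx` will fail for almost every file, which is why an exact value such as content-length is needed.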

abretaud mentioned this issue Aug 23, 2024

Files already in offline dir are not uncompressed/copied/published genouest/biomaj#141

Closed

mboudet mentioned this issue Aug 26, 2024

Trying some stuff.. #51

Merged
