Download only the package header, not complete RPMs. #2
Currently, rpmdeplint downloads the complete RPMs just to load
their headers to check for file conflicts. This is a waste of time
and resources.
This commit changes this to download only RPM headers.
The Python rpm module provides no function that returns the size of
an RPM header. The code therefore downloads the first N bytes of the
RPM file and uses the hdrFromFdno RPM function to check whether the
header is complete.
As the header size varies widely from package to package, the code
first downloads 100KB and, if the header is not complete, falls back
to 1MB and then 5MB. If that is still not enough, the final fallback
downloads the whole RPM file.
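The staged fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: load_header, fetch_prefix, parse_header and IncompleteHeader are hypothetical names, and the sizes assume that KB and MB mean multiples of 1024 bytes. In real code, parse_header would presumably wrap hdrFromFdno and fetch_prefix would issue a ranged download; here both are replaced by fakes so the sketch runs without the rpm module or network access.

```python
# Prefix sizes taken from the description: 100KB, 1MB, 5MB, then the
# whole file (None). Treating KB/MB as multiples of 1024 is an assumption.
PREFIX_SIZES = [100 * 1024, 1024 * 1024, 5 * 1024 * 1024, None]


class IncompleteHeader(Exception):
    """Hypothetical error: the data ended before the header did."""


def load_header(fetch_prefix, parse_header, sizes=PREFIX_SIZES):
    """Try growing prefixes until parse_header accepts the data.

    fetch_prefix(size) returns the first `size` bytes (everything if None).
    parse_header(data) returns a header or raises IncompleteHeader.
    """
    last_error = None
    for size in sizes:
        data = fetch_prefix(size)
        try:
            return parse_header(data)
        except IncompleteHeader as exc:
            last_error = exc  # prefix too short, try the next size
    raise IncompleteHeader("header incomplete after full download") from last_error


# Fake package for demonstration: a 300 KiB "header" inside an 8 MiB file.
HEADER_LEN = 300 * 1024
blob = b"\0" * (8 * 1024 * 1024)
requested = []


def fake_fetch(size):
    requested.append(size)
    return blob if size is None else blob[:size]


def fake_parse(data):
    if len(data) < HEADER_LEN:
        raise IncompleteHeader("need more bytes")
    return data[:HEADER_LEN]


header = load_header(fake_fetch, fake_parse)
```

With this fake package, the 100KB attempt fails and the 1MB attempt succeeds, so the 5MB and full-file downloads are never requested.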
This strategy still wastes some bandwidth, because the first N bytes
are downloaded again on each retry. However, the header of a typical
RPM usually fits into the first 100KB, and the RPM payload is much
bigger than what is downloaded repeatedly, so this saves a lot of
time and bandwidth overall.
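To put rough numbers on the repeated downloads, the short calculation below adds up the bytes transferred by the time each prefix stage has been tried. It is only an illustration and assumes that KB and MB mean multiples of 1024 bytes; the name cumulative_bytes is made up for this example.

```python
from itertools import accumulate

# Prefix sizes from the description (100KB, 1MB, 5MB), assuming 1024-based units.
STAGES = [100 * 1024, 1024 * 1024, 5 * 1024 * 1024]


def cumulative_bytes(stages=STAGES):
    """Total bytes transferred once each successive stage has been attempted."""
    return list(accumulate(stages))


totals = cumulative_bytes()
for size, total in zip(STAGES, totals):
    print(f"after the {size}-byte attempt: {total} bytes transferred in total")
```

Even when all three prefix attempts are needed, the total stays a little above 6MB, and only the earlier, smaller attempts count as repeated data.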
Checksums cannot be verified with this method, because a checksum can
be computed only when the complete RPM file is downloaded.
Signed-off-by: Jan Kaluza [email protected]