Download only the package header, not complete RPMs. #2

jankaluza · 2022-02-28T08:02:57Z

Currently, rpmdeplint downloads complete RPMs just to load
their headers and check for file conflicts. This is a waste of time
and resources.

This commit changes this to download only RPM headers.

The Python rpm module provides no function that returns the size of
an RPM header. The code therefore downloads the first N bytes of the
RPM file and uses the hdrFromFdno RPM function to check whether the
header is complete.
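
One way to fetch only the first N bytes is an HTTP Range request. The sketch below is illustrative and is not taken from this commit: the helper names `build_prefix_request` and `fetch_prefix` are hypothetical, it uses only the standard library, and it assumes the repository server honours `Range` headers.

```python
import urllib.request


def build_prefix_request(url: str, num_bytes: int) -> urllib.request.Request:
    """Build a GET request asking only for the first num_bytes of url."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    request = urllib.request.Request(url)
    # HTTP byte ranges are inclusive, so the last byte index is num_bytes - 1.
    request.add_header("Range", f"bytes=0-{num_bytes - 1}")
    return request


def fetch_prefix(url: str, num_bytes: int, timeout: float = 30.0) -> bytes:
    """Return at most the first num_bytes of the resource at url."""
    request = build_prefix_request(url, num_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A server that ignores Range answers 200 with the full body,
        # so cap the read instead of trusting the response length.
        return response.read(num_bytes)
```

The returned bytes could then be written to a temporary file whose file descriptor is handed to hdrFromFdno for the completeness check.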

Because the header size varies widely from package to package, the
code first downloads the first 100KB and, if the header is not
complete, falls back to 1MB and then 5MB. If that is still not enough,
the final fallback downloads the whole RPM file.

This strategy still wastes some bandwidth, because the first N bytes
are downloaded repeatedly. However, the header of a typical RPM
usually fits into the first 100KB, and the RPM payload is much bigger
than what is downloaded repeatedly, so it saves a lot of time and
bandwidth overall.
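
To make the trade-off concrete, the following sketch counts the bytes transferred for a given package. It is an illustration only: `bytes_transferred` is a hypothetical helper, `header_end` stands for the byte offset at which the header ends within the file, and the 1024-based sizes are an assumption.

```python
from typing import Sequence

# Prefix sizes tried in order: 100KB, 1MB, 5MB (1024-based units assumed).
PREFIX_SIZES = (100 * 1024, 1024 * 1024, 5 * 1024 * 1024)


def bytes_transferred(
    header_end: int, rpm_size: int, sizes: Sequence[int] = PREFIX_SIZES
) -> int:
    """Total bytes downloaded until the header is complete."""
    total = 0
    for size in sizes:
        # A prefix request never returns more than the file contains.
        total += min(size, rpm_size)
        if header_end <= size:
            return total
    # All prefixes were too short, so the whole file is downloaded too.
    return total + rpm_size
```

For a 50,000,000-byte RPM whose header ends at offset 50,000, this gives 102,400 bytes instead of the full file. If the header ends at offset 300,000, it gives 1,150,976 bytes, of which only the first 102,400 are downloaded twice.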

Checksums cannot be verified with this method, because a checksum
can only be computed over the complete RPM file.

Signed-off-by: Jan Kaluza [email protected]

