COMPRESS-514: SevenZFile header buffers over 2G #98
src/main/java/org/apache/commons/compress/archivers/sevenz/HeaderBuffer.java
src/main/java/org/apache/commons/compress/archivers/sevenz/HeaderChannelBuffer.java
src/main/java/org/apache/commons/compress/archivers/sevenz/HeaderInMemoryBuffer.java
What's preventing us from implementing the CRC part in
I'll assume that is the preferred option and will have a look later then.
I believe so. The 7z header is fully parsed in the constructor of
What do you mean by acting? From my point of view, a CRC check is a good way (and maybe the only way) to detect corrupted 7z headers - especially for large 7z archives like yours.
Just that branching is decided by the content of the data (e.g. checking header ID bytes, then deciding what to do). That implies something quite unexpected could occur long before you reach the CRC value. I suppose I'm saying that, from a code safety point of view, you need to confirm the CRC before you use the data.
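To illustrate the point about confirming the CRC before branching on header contents, here is a minimal sketch. It is not code from this PR or from Commons Compress: the class `CrcBeforeParse` and its methods are hypothetical names, and it assumes the whole header fits in a `byte[]`, which is exactly the case the in-memory path already covers.

```java
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only (not part of Commons Compress).
final class CrcBeforeParse {

    /** Throws if the CRC-32 of the header bytes does not match the stored value. */
    public static void verifyCrc(final byte[] header, final long expectedCrc) throws IOException {
        final CRC32 crc = new CRC32();
        crc.update(header, 0, header.length);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("Header CRC mismatch");
        }
    }

    /** Returns the first (ID) byte of the header, but only after the CRC has been confirmed. */
    public static int firstIdByte(final byte[] header, final long expectedCrc) throws IOException {
        if (header.length == 0) {
            throw new IOException("Empty header");
        }
        // Verify first, so no decision is ever based on unverified bytes.
        verifyCrc(header, expectedCrc);
        return header[0] & 0xFF;
    }
}
```

The ordering is the whole point: any decision that depends on header bytes happens after `verifyCrc` returns, never before.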
So you are talking about whether or not to read the encoded header by the
A little confused here. From my understanding, you are talking about when reading
I've run out of time for now so I'll be brief (will be back in some hours)!
Yes, that's one possible place a decision is made based on the data of the header. There may be more; I haven't checked all possibilities.
Actually it applies to a "normal" header too, because that's also swapped for a
I think there are more places than that (again, needs checking).
I see. You are talking about that in some cases the
Not really, no. After digging a little deeper into the 7z format it seems that particular CRC only applies to the
If I understand correctly, we could retain existing capability (a CRC check for the end header) by simply enforcing a
I'm going to come back to this again later so I can think about it (and get some sleep). Comments welcome in the meantime of course!
Edit: That's done. The CRC check is now exactly as before, but we still keep large header support.
Hey @akelday
95f8e3d to 54be6c4
Hi @PeterAlfredLee - squashed to a single commit from latest master, I hope that's OK. Not sure what's going on with coverage, I'll have to look later because it's late now...
Edit: possibly a resource (input stream) is no longer closed correctly, so I will need to investigate!
Update: there is a definite problem with that (will put more detail in the Jira), so this is not OK to merge.
Take it easy. The coverage tool sometimes reports a coverage rate that is not accurate enough.
Please rebase on master. Recent changes should allow all builds to be green including Java 16 and 17-EA.
private boolean refilled(final int remainingBytes) throws IOException {
    if (remainingBytes <= 0) {
        return false;
    }
Fix formatting please.
Hi, do we think this PR is still appropriate? I currently have no files large enough to test the issue it resolved, and haven't found time to create one yet. No problem with closing it, but if it's still useful I'll try to make time for it again.
This is a simple way to enable reading of SevenZFile headers over 2GiB. It also allows a much smaller memory footprint for even the largest headers. The CRC handling needs work/input, because it's only supported for headers held entirely in memory.
When the header is not held entirely in memory, that leaves several options: read the header in two passes, read/parse the header fully while computing the CRC (which sort of defeats the purpose), or simply ignore the CRC.
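As a rough sketch of the two-pass option, the first pass could stream the header region through a small fixed-size window and compute its CRC-32, so the whole header never has to be in memory at once. The second pass would then parse the header only if the CRC matches. This is an assumption-laden illustration, not the PR's implementation: `StreamingHeaderCrc`, `crcOfRegion`, and the `windowSize` parameter are hypothetical names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only (not part of Commons Compress).
final class StreamingHeaderCrc {

    /**
     * First pass: computes the CRC-32 of `length` bytes starting at `offset`,
     * holding at most `windowSize` bytes in memory at any time.
     */
    public static long crcOfRegion(final SeekableByteChannel channel, final long offset,
                                   final long length, final int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        final CRC32 crc = new CRC32();
        final ByteBuffer window = ByteBuffer.allocate(windowSize); // heap buffer, array-backed
        channel.position(offset);
        long remaining = length; // a long, so regions over 2 GiB are fine
        while (remaining > 0) {
            window.clear();
            // Never read past the end of the header region.
            window.limit((int) Math.min(window.capacity(), remaining));
            final int read = channel.read(window);
            if (read < 0) {
                throw new EOFException("Channel ended before the header region did");
            }
            crc.update(window.array(), 0, read);
            remaining -= read;
        }
        return crc.getValue();
    }
}
```

A caller would compare the returned value with the stored CRC, then reposition the channel to `offset` for the second (parsing) pass. The trade-off is that the header is read from disk twice, in exchange for verifying it before any parsing decision is made.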