-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added test for delete from in-memory zip. #333
base: master
Are you sure you want to change the base?
Conversation
This required making some updates that are kind of hacky. The path of least edits to miniz necessitated updating miniz mz_zip_reader_init_mem to initialize the writer function and additionally zip_stream_open to handle the case where a user passes a stream and the mode 'r' or 'd'.
I'm sure that modifying miniz is something that we only want to ever do as a last result to fix critical errors, but what are your thoughts on these modifications to enable more rich support of in memory zips? |
Also, I have no idea why ubuntu is failing on the runner when it works for me on ubuntu when I do:
|
Also, it has been a nightmare trying to update the CMake config to properly toggle output on failure for the tests. The ideal solution would be to add:
to the CMakeLists.txt, but that was only introduced in cmake 3.17 (current cmake minimum in your project is 3.14). Another way to achieve it is to create a new test target called check where you add the option to toggle output on failure like:
Problem is that the pipeline currently does a |
I've added more verbose output for
|
Yes, we should not change Regarding new features - in my personal opinion the library is in maintenance mode (I do not see any need for new features), but people use it in many different ways, so I'm always open for PRs. |
Thank you for the output on error update! I see now that what I thought would prevent the stream from being freed didn't actually do that. I'll look into it now. |
So before we merge this, I should rebase the fixup and add in an additional commit that updates zip.h to document the weird behavior that zip_stream_close frees the original stream passed in if it's not NULL. Is that too disgusting of a way to do this? Because of the reallocs that are performed on the stream in miniz, I think it's safest if the user assumes that buffer is just going to be managed by the zip library at that point. |
2dfe454
to
93ee771
Compare
Went ahead and rebased and documented the wonky behavior. |
93ee771
to
d519e89
Compare
src/zip.h
Outdated
@@ -494,6 +495,9 @@ extern ZIP_EXPORT ssize_t zip_stream_copy(struct zip_t *zip, void **buf, | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not deallocate memory allocated by other library/thread. if we need to free pre-allocated memory we either should add allocator/deallocator functionality or let people pass deallocator which will be called by this function.
It can be problematic, e.g. if people allocate memory in C++ by new
and we deallocate memory with free
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it felt kind of hacky to me too. Maybe what I do is just create a copy if stream is not NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not copy. Just push the responsibility for the user and take free_cb
as function pointer, which will be called (if not NULL) to deallocate memory. Then we are pretty sure that user's functions allocated/deallocated memory (not zip library).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if I don't make a copy of the stream, then internally, reallocs may happen that result in the original stream pointer getting freed anyway. So even if I did push changes that made the API breaking change of updating zip_stream_open to take a free callback function, I don't think the double free problem would be completely solved.
Given that the provided stream may mutate as a result of realloc, I think the safest and simplest change here is to copy the stream into another buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if I don't make a copy of the stream, then internally, reallocs may happen that result in the original stream pointer getting freed anyway. So even if I did push changes that made the API breaking change of updating zip_stream_open to take a free callback function, I don't think the double free problem would be completely solved.
Given that the provided stream may mutate as a result of realloc, I think the safest and simplest change here is to copy the stream into another buffer.
So far, I do not see the point to copy the stream (it can be huge, so it will consume 2x memory). Moreover we already have zip_stream_copy
function (where user is obligated to free memory).
Do you have any workflow scenario (with realloc) when the double free problem happens?
In terms of tests, I think we can use two zip_t
pointers and close
/free
them at the end (in correct order), to avoid memory leaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The realloc may happen any time someone passes a stream to zip_open_stream
and then adds a file to the zip that was big enough that the realloc moved the memory that stream originally pointed to to another address.
I agree that copying the stream is not ideal if the stream may be gigabytes. I think part of fixing this issue would require passing the stream pointer by reference (e.g. int zip_stream_open(uint8_t **ppstream, ...)
) so that the realloc memory shift won't matter as much. The user would then almost not have to call zip_stream_copy at that point to get the final zip, but there's some zip finalization that has to take place that isn't exposed to the API user (zip_archive_finalize
). The other part of addressing this is to create a boolean that reflects whether the memory that the zip API is operating on was originally allocated by the user and to not free it if that was the case.
It may take me a few to do the above (some life and work things are coming up that will leave me a little low energy), but let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The realloc may happen any time someone passes a stream to
zip_open_stream
and then adds a file to the zip that was big enough that the realloc moved the memory that stream originally pointed to to another address.
Ok, I understand what you mean. That's why to kept workflows safe and separated delete
mode from write/append
.
Frankly speaking, I'm not super convinced that it's worth to redesign the API, or copy the stream, ...
Personally, I do not think it's a big deal to open the stream twice. First time for delete, the second time for write/append.
ead371d
to
7c1646b
Compare
This is more of a difference between read and write though. Like if you're
only opening with read permissions, then the stream pointer will definitely
never be modified. But for write, it'll obviously be mutated AND you could add a file big enough that
necessitates moving all of the data to a new chunk of memory. Delete would also mutate the stream,
but it wouldn't do so in a way that would result in needing a realloc.
To me, it still doesn't make as much sense to separate write and delete
because they're both mutating the stream. I know there's a little more
complexity required to properly shift the entries, but it's very cumbersome
to have to redo the same workflow to toggle between write and delete. At a minimum,
that could result about 7 API function calls with error checking for each workflow.
Also, zip_stream_open taking a `const char *stream` for the workflow where
you're doing writes (even though the pointer itself isn't changed)
definitely doesn't make sense given the realloc involved.
After being forced to dig under the hood of this library, it's become clear
to me how naive (and potentially incomplete) my changes are in this MR. I
plan on making more comprehensive changes in my fork when I have time. If
you decide that it's easier to limit actions to read/write/delete per open
for simplicity, I do understand though.
On another note, as I've been looking more at the miniz part of this library, I've found several code quality issues and
things that look a little broken. Outside of all of the monolith functions that make me wonder how they
properly tested any of the units, I have found things like in `mz_zip_writer_add_mem_ex_v2` where at https://github.com/kuba--/zip/blob/4696e964a6c66b46a6469566d81145993d4a7c5e/src/miniz.h#L8749 they
only ever use the `MZ_ZIP_DATA_DESCRIPTER_SIZE64` regardless of whether the `m_zip64` is true.
…On Mon, Jan 22, 2024, 6:19 AM Kuba Podgórski ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/zip.h
<https://github.com/kuba--/zip/pull/333#discussion_r1461716537>:
> @@ -494,6 +495,9 @@ extern ZIP_EXPORT ssize_t zip_stream_copy(struct zip_t *zip, void **buf,
/**
The realloc may happen any time someone passes a stream to zip_open_stream
and then adds a file to the zip that was big enough that the realloc moved
the memory that stream originally pointed to to another address.
Ok, I understand what you mean. That's why to kept workflows safe and
separated delete mode from write/append.
Frankly speaking, I'm not super convinced that it's worth to redesign the
API, or copy the stream, ...
Personally, I do not think it's a big deal to open the stream twice. First
time for delete, the second time for write/append.
—
Reply to this email directly, view it on GitHub
<https://github.com/kuba--/zip/pull/333#discussion_r1461716537>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGZBJBEQQNJKS57B5C6E6TYPZDKPAVCNFSM6AAAAABB32JXT6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMZWGIZTGNJXGY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Yes, both operations mutate the same stream, but mutate in totally different way (especially from zip structure perspective). In other words, they mutate the same critical section, and that's why it can be risky (that's why I mentioned that separated operations are more predictable from API perspective).
|
This required making some updates that are kind of hacky. The path of least edits to miniz necessitated updating miniz mz_zip_reader_init_mem to initialize the writer function and additionally zip_stream_open to handle the case where a user passes a stream and the mode 'r' or 'd'.