
compression: fix decomp buffer mgt #51

Merged

merged 3 commits into parcio:main on Jan 26, 2024
Conversation

@fia0 commented Jan 25, 2024

Due to the use of `AlignedStorage` and its layered wrapper `Buf` when storing and
serializing buffers, length information is lost and the total length is simply
set to the capacity, which is incorrect. While serialization libraries seem to
handle this well enough, the decompression structure used from `zstd` did not
(this could be a bug in `zstd`, though). This patch reworks the de/compression
part to do five things:

- store the length of the original data alongside the compressed data to ensure
  a minimal number of reallocations on decompression (1); previously, a
  reallocation happened for each block because of how the `zstd` decompression
  writer works internally (a sketch of the idea follows this list)

- minimize the number of copies during compression and decompression; the
  `zstd` writer holds an internal buffer which, to my understanding, always
  buffers the intermediate result and then copies it to the final buffer. We
  now avoid this structure entirely to reduce the number of copies

- estimate the size of the compressed result beforehand to reduce reallocations

- rework the compression interface into a one-and-done process; compressors and
  decompressors might be reused later on (which requires an additional reinit),
  but the append-append-compress workflow was never used and comes with
  inefficient reallocations

- remove the redundant memcpy when using the `None` compression algorithm
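The following is a minimal sketch of the first point, assuming the Rust `zstd` crate's bulk API; the helpers `compress_block` and `decompress_block` are hypothetical names, not the types used in this patch:

```rust
// Sketch only: store the uncompressed length next to the compressed bytes so
// decompression can allocate its output buffer once, up front.
use std::io;

/// Hypothetical helper: prefix the compressed block with the original size.
fn compress_block(data: &[u8], level: i32) -> io::Result<Vec<u8>> {
    let compressed = zstd::bulk::compress(data, level)?;
    // 8-byte little-endian length prefix followed by the compressed payload.
    let mut out = Vec::with_capacity(8 + compressed.len());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(&compressed);
    Ok(out)
}

/// Hypothetical helper: read the stored size and hand it to the decompressor,
/// so the output reaches its final size in a single allocation instead of
/// being reallocated once per block.
fn decompress_block(block: &[u8]) -> io::Result<Vec<u8>> {
    let (len_bytes, payload) = block.split_at(8);
    let original_len = u64::from_le_bytes(len_bytes.try_into().unwrap()) as usize;
    zstd::bulk::decompress(payload, original_len)
}
```

The 8-byte prefix costs a little space per block, but it lets the decompressor skip the grow-and-copy cycle of a streaming writer.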

@fia0 marked this pull request as ready for review on January 26, 2024, 16:20
@fia0 (Author) commented Jan 26, 2024

This PR also adds some basic tests to the `zstd` module in `compression`.

Johannes Wünsche added 2 commits on January 26, 2024, 17:41

This commit fixes the redundant copy that was done when calling `decompress`
with the `None` compression. We should see fewer memcpys now.
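A rough sketch of that change, with made-up type names (`CompressionAlgorithm` and this `decompress` signature are illustrative, not the crate's actual API): when the algorithm is `None`, the buffer can be handed through unchanged instead of being copied into a new allocation.

```rust
// Sketch only: pass the buffer through for `None` instead of memcpy'ing it.
// The enum and function signature are illustrative, not the patch's actual API.
enum CompressionAlgorithm {
    None,
    Zstd { level: i32 },
}

fn decompress(algo: &CompressionAlgorithm, buf: Vec<u8>) -> std::io::Result<Vec<u8>> {
    match algo {
        // Data was stored uncompressed: return the buffer as-is.
        CompressionAlgorithm::None => Ok(buf),
        // Compressed data takes the real zstd path (elided in this sketch).
        CompressionAlgorithm::Zstd { .. } => unimplemented!("zstd path elided"),
    }
}
```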
@fia0 (Author) commented Jan 26, 2024

Tests still seem to work fine; will merge now.

@fia0 merged commit f559bda into parcio:main on Jan 26, 2024
7 checks passed
@fia0 deleted the fix/compression branch on February 17, 2024, 16:00