
Add compress kwarg to CQM.to_file() #1296

Merged: 1 commit, merged on Nov 22, 2022


arcondello (Member)

Address the CQM part of #1235.

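Because our CQM file format is built on zipfile, we get backwards compatibility for free: zipfile decompresses members transparently on read, so existing loading code handles compressed files without changes.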

We use zipfile.ZIP_DEFLATED to be consistent with NumPy, whose savez_compressed uses the same method.
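A minimal sketch of the backwards-compatibility argument, using plain zipfile rather than dimod's actual loader. The member name data.bin, the payload, and the helpers write_archive and read_archive are hypothetical. The reader never mentions compression, yet it opens both a stored and a deflated archive, because zipfile records the method per member and decompresses on read.

```python
import io
import zipfile

payload = b"0123456789" * 1000  # highly redundant, illustrative data


def write_archive(compress: bool) -> bytes:
    """Write one member, deflated only if compress is True (hypothetical helper)."""
    buf = io.BytesIO()
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(buf, mode="w", compression=method) as zf:
        zf.writestr("data.bin", payload)
    return buf.getvalue()


def read_archive(data: bytes) -> bytes:
    """A reader that knows nothing about compression (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read("data.bin")


stored = write_archive(compress=False)
deflated = write_archive(compress=True)

# The same read path works for both archives.
assert read_archive(stored) == read_archive(deflated) == payload
print(len(stored), len(deflated))  # the deflated archive is much smaller
```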
randomir (Member) left a comment


LGTM!

Comment on lines +1792 to +1793
```python
kwargs = dict(compression=zipfile.ZIP_DEFLATED) if compress else dict()
with zipfile.ZipFile(file, mode='a', **kwargs) as zf:
```
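The conditional-kwargs pattern in these two lines can be exercised on its own. This is a sketch under stated assumptions: HEADER is a hypothetical prefix standing in for whatever the real file format writes before the archive, and write_file is a made-up name, not dimod's API. Opening with mode='a' on content that is not already a zip archive appends a new archive after the existing bytes, and ZipFile can still read it back because it locates the archive from the end of the file.

```python
import io
import zipfile

HEADER = b"HYPOTHETICAL-HEADER\n"  # hypothetical prefix, not the real CQM header


def write_file(file, members: dict, compress: bool = False) -> None:
    """Write a header, then append a zip archive (hypothetical helper)."""
    file.write(HEADER)
    # Only pass compression when requested; otherwise zipfile's default, ZIP_STORED, applies.
    kwargs = dict(compression=zipfile.ZIP_DEFLATED) if compress else dict()
    # mode='a' on non-zip content appends a new archive after the existing bytes.
    with zipfile.ZipFile(file, mode="a", **kwargs) as zf:
        for name, data in members.items():
            zf.writestr(name, data)


buf = io.BytesIO()
write_file(buf, {"biases": b"\x00" * 4096}, compress=True)

buf.seek(0)
assert buf.read(len(HEADER)) == HEADER
# The prefix is tolerated when reading the archive back.
with zipfile.ZipFile(buf) as zf:
    assert zf.read("biases") == b"\x00" * 4096
    assert zf.getinfo("biases").compress_type == zipfile.ZIP_DEFLATED
```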

My only doubt in this PR was about the compression method and compression level to use.

Besides DEFLATE (zipfile.ZIP_DEFLATED, the same algorithm gzip uses), zipfile also offers bzip2 (ZIP_BZIP2) and LZMA (ZIP_LZMA).

bzip2 appears to compress about 15% better than gzip, but it is roughly 8x slower at both compression and decompression.

LZMA roughly halves the archive size compared with gzip, but it compresses much more slowly than even bzip2; decompression, however, is about as fast as gzip.

gzip/DEFLATE has the lowest memory overhead of the three.

In addition, the bz2 and lzma modules may not be available on all systems, especially if the user compiled Python themselves, because the required system libraries must be installed before building.
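To reproduce this kind of comparison on one's own data, a rough sketch could look like the following. The sample data, the CANDIDATES table, and the helpers available_methods and measure are all hypothetical, and timings from a single run are noisy; the 15% and 8x figures above are the reviewer's measurements, not outputs of this code. Availability is checked by actually importing each module, since bz2 and lzma fail to import when Python was built without their system libraries.

```python
import importlib
import io
import time
import zipfile

# (label, module zipfile needs, zipfile method constant)
CANDIDATES = [
    ("deflate", "zlib", zipfile.ZIP_DEFLATED),
    ("bzip2", "bz2", zipfile.ZIP_BZIP2),
    ("lzma", "lzma", zipfile.ZIP_LZMA),
]


def available_methods() -> dict:
    """Return {label: method} for compression methods whose module imports."""
    found = {}
    for label, module, method in CANDIDATES:
        try:
            importlib.import_module(module)
        except ImportError:
            continue  # e.g. Python built without the bz2 or lzma system library
        found[label] = method
    return found


def measure(data: bytes, method: int) -> tuple:
    """Return (archive size in bytes, write seconds, read seconds) for one method."""
    buf = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buf, mode="w", compression=method) as zf:
        zf.writestr("member", data)
    written = time.perf_counter()
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        restored = zf.read("member")
    done = time.perf_counter()
    assert restored == data  # the round trip must be lossless
    return len(buf.getvalue()), written - start, done - written


# Illustrative, highly redundant sample; real CQM payloads will behave differently.
data = b"".join(b"%d," % (i % 97) for i in range(100_000))

for label, method in available_methods().items():
    size, write_s, read_s = measure(data, method)
    print(f"{label:8} size={size:>7} write={write_s:.4f}s read={read_s:.4f}s")
```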

Taking all that into consideration, ZIP_DEFLATED with the default compression level (6, a balance between speed and size) looks like the optimal choice here: it is widely supported, fast, and has minimal memory overhead, and it provides decent compression (at least enough to handle obviously redundant sequences).
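A sketch of how the level could be varied if the default ever needs revisiting, assuming Python 3.7+, where ZipFile accepts a compresslevel argument. With ZIP_DEFLATED, compresslevel=None falls back to zlib's Z_DEFAULT_COMPRESSION, which zlib documents as currently equivalent to level 6. The sample data and the helper deflated_size are invented, so the printed sizes only illustrate the trend.

```python
import io
import zipfile

# Illustrative, highly redundant sample data.
data = b"".join(b"%d," % (i % 251) for i in range(100_000))


def deflated_size(level) -> int:
    """Size of a one-member ZIP_DEFLATED archive; level=None means zlib's default."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        zf.writestr("member", data)
    return len(buf.getvalue())


for level in (1, 6, 9, None):
    print(f"compresslevel={level}: {deflated_size(level)} bytes")
```

Higher levels trade CPU time for size; for data this redundant the differences between 1, 6, and 9 tend to be small.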

Labels: enhancement (New feature or request)