Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core-clp): Add LZMA Compressor implementation and LZMA dependency. #614

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

Bill-hbrhbr
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Nov 27, 2024

Description

  1. Port the working version of LZMA compression code into open-source
  2. Refactor the compressor-related files as a continuation of refactor(core): Fix clang-tidy warnings in the streaming compressor interface and its implementations; Modernize and refactor test-StreamingCompression for conciseness. #599
  3. Add scripts to install LZMA dependencies for all builds

Validation performed

Updated test-StreamingCompression.cpp to include LZMA compressors. Tested that compressing workflow finishes without errors.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced LZMA compression support, enhancing data compression options.
    • Added constants for LZMA compression levels and dictionary size.
  • Bug Fixes

    • Improved error handling in the LZMA compression implementation.
  • Documentation

    • Updated installation scripts to include liblzma and liblzma-dev packages.
  • Tests

    • Expanded the StreamingCompression test suite to include LZMA compression tests.
  • Chores

    • Added a script for automating the installation of the liblzma library.
    • Included installation of xz-devel package in CentOS prebuilt packages script.
    • Added xz package to the Homebrew installation script for macOS.

Copy link
Contributor

coderabbitai bot commented Nov 27, 2024

Walkthrough

The pull request introduces support for LZMA compression within the project. It updates the .clang-format configuration to include the lzma library in the header inclusion regex. The CMakeLists.txt is modified to handle LZMA compression, including new source files and library checks. A new Compressor class for LZMA is implemented, along with constants for compression levels and testing infrastructure. Additionally, scripts for installing the LZMA library are updated to ensure proper setup across different environments.

Changes

File Path Change Summary
components/core/.clang-format Updated IncludeCategories regex to include lzma library.
components/core/CMakeLists.txt Added support for LZMA compression in GENERAL_COMPRESSOR, defined USE_LZMA_COMPRESSION, and included library setup.
components/core/src/clp/streaming_compression/Constants.hpp Added LZMA enumerator to CompressorType enum.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp Implemented Compressor class for LZMA compression with various methods for handling compression.
components/core/src/clp/streaming_compression/lzma/Compressor.hpp Defined Compressor class and methods, including error handling via OperationFailed.
components/core/src/clp/streaming_compression/lzma/Constants.hpp Introduced constants for LZMA compression levels and dictionary size.
components/core/tests/test-StreamingCompression.cpp Added tests for LZMA compression functionality.
components/core/tools/scripts/lib_install/liblzma.sh New script for automated installation of liblzma from source.
components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh Added command to install liblzma version 5.4.6.
components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh Added liblzma-dev to the installation command.
components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh Added command to install liblzma version 5.4.6.
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh Added liblzma-dev to the installation command.
components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh Added xz-devel to the installation command.
components/core/tools/scripts/lib_install/macos/install-all.sh Added xz to the installation list for Homebrew.

Possibly related PRs

Suggested reviewers

  • LinZhihao-723

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Bill-hbrhbr
Copy link
Contributor Author

I guess two comments:

  1. Let me know if you want me to remove the decompressor files. I did a direct copy-paste so that I can include unittest for LZMA in this PR.
  2. I added lib install scripts for Ubuntu machines. Do I need to also look into CentOS and MacOS?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Outside diff range and nitpick comments (18)
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)

23-23: Consider pinning the package version for reproducibility.

To ensure consistent builds across different environments and times, consider pinning the package version.

-  liblzma-dev \
+  liblzma-dev=5.2.5-2ubuntu1 \
components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (2)

Line range hint 1-20: Consider adding version compatibility comments

The script structure is good, but it would be helpful to document any version interdependencies between the libraries.

Add a comment block at the top of the file:

 #!/usr/bin/env bash

+# Library versions and compatibility notes:
+# - liblzma 5.4.6: Compatible with libarchive 3.5.1
+
 # Exit on any error
 set -e

14-14: Consider error handling for LZMA installation

While the script has basic error handling with set -e, consider adding specific error handling for the LZMA installation.

-"$lib_install_scripts_dir"/liblzma.sh 5.4.6
+if ! "$lib_install_scripts_dir"/liblzma.sh 5.4.6; then
+    echo "Error: LZMA installation failed. Please check system requirements."
+    exit 1
+fi
components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)

17-17: Consider installation order implications

The LZMA library is installed after libarchive. If libarchive is built with LZMA support, it might need to be rebuilt after LZMA installation.

Consider either:

  1. Moving the LZMA installation before libarchive
  2. Adding a rebuild step for libarchive after LZMA installation if necessary
components/core/tools/scripts/lib_install/liblzma.sh (1)

1-66: Consider adding logging, dry-run mode, and uninstall capability

To improve the script's maintainability and user experience, consider these architectural improvements:

  1. Add logging functionality with different verbosity levels
  2. Implement a dry-run mode for testing
  3. Add uninstall capability
  4. Consider adding a progress indicator for long-running operations

These improvements would make the script more robust and user-friendly for both development and production use.

Would you like me to provide an implementation for any of these features?

components/core/tests/test-StreamingCompression.cpp (2)

60-64: Consider enhancing test coverage for LZMA-specific scenarios.

While the basic compression/decompression test is good, consider adding LZMA-specific test cases to validate:

  • Different compression levels
  • Dictionary size variations
  • Error handling for corrupted LZMA streams

Would you like me to provide example test cases for these scenarios?


Line range hint 1-113: Consider refactoring to use parameterized tests.

The test structure is duplicated across compression types. Consider using Catch2's test case templates to reduce duplication and make it easier to add new compression methods.

Here's a suggested approach:

TEMPLATE_TEST_CASE("StreamingCompression", "[StreamingCompression]",
    std::pair<clp::streaming_compression::zstd::Compressor, clp::streaming_compression::zstd::Decompressor>,
    std::pair<clp::streaming_compression::passthrough::Compressor, clp::streaming_compression::passthrough::Decompressor>,
    std::pair<clp::streaming_compression::lzma::Compressor, clp::streaming_compression::lzma::Decompressor>
) {
    auto compressor = std::make_unique<typename TestType::first_type>();
    auto decompressor = std::make_unique<typename TestType::second_type>();
    // ... rest of the test ...
}
components/core/CMakeLists.txt (4)

21-23: Standardize the status message format

The status message format differs from other compressors. Consider using a consistent format:

-    message(STATUS "Using Lempel–Ziv–Markov chain Algorithm compression")
+    message(STATUS "Using LZMA compression")

241-241: Use STATUS for LZMA include directory message

The include directory message should use the STATUS level for consistency with other similar messages:

-message("LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}")
+message(STATUS "LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Address the static/shared library support TODO

The TODO comment indicates missing functionality for handling static vs. shared libraries.

Would you like me to help create a CMake module to handle static/shared library selection for LZMA? This would align with the project's existing pattern for other libraries.


573-573: Consider grouping compression-related libraries together

For better organization, consider grouping LZMA with other compression-related libraries:

-        ${LIBLZMA_LIBRARIES}
-        ZStd::ZStd
+        ZStd::ZStd
+        ${LIBLZMA_LIBRARIES}
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)

120-120: Use type alias LzmaAction for consistency.

In the compress method, consider using the type alias LzmaAction instead of lzma_action to maintain consistency with the defined type aliases and improve readability.

Apply this diff for consistency:

-auto compress(lzma_action action) -> void;
+auto compress(LzmaAction action) -> void;
components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (3)

9-9: Remove unnecessary inclusion of <zlib.h>.

The <zlib.h> header included at line 9 may be unnecessary since this decompressor uses LZMA compression. Eliminating unused headers can reduce dependencies and improve compile times.

Apply this diff to remove the unnecessary include:

-#include <zlib.h>

61-62: Complete or remove the incomplete Doxygen comment.

There is an incomplete Doxygen comment starting at line 61 without any description. To maintain clear documentation, please complete the comment or remove it if it's not needed.

Apply this diff to complete the comment:

 /**
+ * Reads an exact number of bytes from the decompressor into the buffer.
+ * @param buf The buffer to read data into.
+ * @param num_bytes_to_read The exact number of bytes to read.
+ * @param num_bytes_read The actual number of bytes read.
 */
 void exact_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read);

89-90: Correct the reference to the appropriate method in the comment.

The comment at line 89 incorrectly references streaming_compression::zstd::Decompressor::try_seek_from_begin, which is not relevant to the LZMA decompressor. It should refer to the appropriate LZMA method or be updated to accurately reflect the functionality.

Apply this diff to correct the comment:

- * @return Same as streaming_compression::zstd::Decompressor::try_seek_from_begin
+ * @return Same as Decompressor::try_seek_from_begin
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

86-86: Use LZMA_STREAM_INIT for proper initialization of m_compression_stream

Instead of using memset to initialize m_compression_stream, it's recommended to use the LZMA_STREAM_INIT macro provided by the LZMA library. This ensures all fields are correctly initialized, including any internal pointers, and is the preferred method according to the LZMA documentation.

Apply this diff to initialize m_compression_stream correctly:

-    memset(m_compression_stream.get(), 0, sizeof(LzmaStream));
+    *m_compression_stream = LZMA_STREAM_INIT;
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp (2)

113-113: Correct error messages to reference LZMA functions

The error messages incorrectly mention inflate(), which is associated with zlib. Since we're using LZMA, please update the messages to reference lzma_code() to accurately reflect the context.

Apply this diff to fix the error messages:

-                    SPDLOG_ERROR("streaming_compression::lzma::Decompressor inflate() ran out of memory");
+                    SPDLOG_ERROR("streaming_compression::lzma::Decompressor lzma_code() ran out of memory");

...

-                    SPDLOG_ERROR("inflate() returned an unexpected value - {}.", int(return_value));
+                    SPDLOG_ERROR("lzma_code() returned an unexpected value - {}.", int(return_value));

Also applies to: 117-117


130-130: Update comment to reflect LZMA instead of ZStd

The comment references 'ZStd', but this is an LZMA decompressor. Please correct the comment to avoid confusion.

Apply this diff to fix the comment:

-            // ZStd has no way for us to seek back to the desired position, so just reset the stream
+            // LZMA has no way for us to seek back to the desired position, so just reset the stream
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7aea626 and 707c412.

📒 Files selected for processing (14)
  • components/core/.clang-format (1 hunks)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Decompressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (1 hunks)
  • components/core/tests/test-StreamingCompression.cpp (2 hunks)
  • components/core/tools/scripts/lib_install/liblzma.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • components/core/.clang-format
  • components/core/src/clp/streaming_compression/lzma/Constants.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp

[error] 41-41: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std

(rethrowNoCurrentException)


[error] 45-45: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std

(rethrowNoCurrentException)

🔇 Additional comments (11)
components/core/src/clp/streaming_compression/Constants.hpp (2)

10-10: Well-structured enum value addition!

The chosen value (0x20) follows good practices:

  • Maintains consistent power-of-2 spacing with existing values
  • Leaves room for future compression types between values
  • Follows the established naming convention

10-10: Verify enum value handling across the codebase

Let's ensure the new LZMA enum value is properly handled in all relevant code paths.

✅ Verification successful

Let me gather more information about the CompressorType enum usage and switch statements.


Based on the comprehensive analysis of the codebase, I can now provide a final response:

LZMA enum value is properly integrated and handled

The new LZMA enum value (0x20) is correctly integrated into the codebase with proper handling:

  • The CompressorType enum is consistently defined across namespaces (clp::streaming_compression)
  • LZMA implementation follows the same pattern as existing compressors (ZSTD, Passthrough)
  • No switch statements directly use CompressorType enum, as the type is handled through class inheritance
  • Proper initialization of LZMA compressor/decompressor classes using CompressorType::LZMA
  • Test cases include LZMA compression alongside other compression types
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find potential switch statements and enum handling code that might need updates

# Search for switch statements handling CompressorType
echo "Checking for switch statements handling CompressorType..."
rg -l "switch.*CompressorType" 

# Search for serialization/deserialization code
echo "Checking for serialization/deserialization code..."
rg -l "serialize.*CompressorType|deserialize.*CompressorType"

# Look for direct CompressorType comparisons
echo "Checking for CompressorType comparisons..."
rg -l "CompressorType::(ZSTD|Passthrough)"

Length of output: 989


Script:

#!/bin/bash
# Search for CompressorType enum usage with context
echo "Checking CompressorType enum usage with context..."
rg -A 5 "CompressorType" 

# Search for switch statements in compression-related files
echo "Checking switch statements in compression files..."
fd -e cpp -e hpp | grep -i "compress" | xargs rg "switch.*\{" -A 5

# Look for factory or creation methods that might use CompressorType
echo "Checking factory methods..."
rg "create.*Compressor|make.*Compressor" -A 5

Length of output: 26592

components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)

23-23: LGTM! Verify package availability in Ubuntu Jammy.

The addition of liblzma-dev is correct and properly placed in alphabetical order.

✅ Verification successful

Package addition is properly integrated and verified

The liblzma-dev package addition is well-integrated into the existing LZMA compression workflow:

  • Package exists in Ubuntu Jammy repositories
  • Consistent with Ubuntu Focal configuration
  • Properly supports the LZMA compression option in CMakeLists.txt
  • Aligns with existing LZMA build infrastructure (liblzma.sh)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify liblzma-dev package availability and its dependencies in Ubuntu Jammy
# Expected: Package should be available and not have conflicting dependencies

# Check if the package exists in Ubuntu Jammy repositories
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -q "Page not found" && echo "Package not found" || echo "Package exists"

# Get package details including version and dependencies
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -A 5 "Package:"

Length of output: 300


Script:

#!/bin/bash
# Let's verify the package dependencies and version availability more thoroughly
apt-cache policy liblzma-dev 2>/dev/null || echo "Package info not available in sandbox"

# Check if the package is already used elsewhere in the codebase
rg -l "liblzma-dev" --type sh

# Look for any LZMA related build dependencies or configurations
rg -l "lzma" --type cmake --type make --type sh

Length of output: 787


Script:

#!/bin/bash
# Let's check the CMakeLists.txt to understand LZMA integration
cat components/core/CMakeLists.txt | grep -A 5 -B 5 "lzma"

# Check the content of liblzma.sh to understand existing LZMA setup
cat components/core/tools/scripts/lib_install/liblzma.sh

# Check if the package is also used in Focal
cat components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh | grep "liblzma-dev"

Length of output: 3455

components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (1)

14-14: Verify LZMA installation script and dependencies

The addition of LZMA library installation looks good, but let's verify a few important aspects:

✅ Verification successful

Let me gather more information about the liblzma.sh script content and its error handling:


LZMA installation script verified and properly configured

The installation script for LZMA 5.4.6 has been thoroughly verified and shows:

  • Proper error handling with set -e and set -u
  • Secure download from the official tukaani-project repository
  • Appropriate privilege management
  • CMake configuration with position-independent code
  • Parallel build support
  • Clean installation process

The script follows the project's established patterns and integrates well with the existing build system. The CMake configuration properly requires and validates LZMA presence.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify LZMA installation script and its dependencies

# Check if liblzma.sh exists and is executable
echo "Checking liblzma.sh existence and permissions..."
fd --type f "liblzma.sh" --exec ls -l {}

# Look for version constraints or dependencies in build files
echo -e "\nChecking for version constraints..."
rg -g "CMakeLists.txt" --multiline "find_package\(.*LZMA.*\)" -A 5

Length of output: 872


Script:

#!/bin/bash
# Check the content of liblzma.sh for error handling and version validation
echo "Checking liblzma.sh content..."
cat ./components/core/tools/scripts/lib_install/liblzma.sh

# Check if there are any version-specific dependencies or constraints in the codebase
echo -e "\nChecking for LZMA version references..."
rg "LZMA.*5\.4" --type cpp --type cmake

Length of output: 1777

components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)

17-17: Verify LZMA version compatibility and security

Please ensure that LZMA version 5.4.6 meets the project's requirements and has no known vulnerabilities.

✅ Verification successful

Let me gather more information about LZMA usage and dependencies in the project.


LZMA 5.4.6 is compatible with the project requirements

Based on the codebase analysis:

  • LZMA is used as an optional general-purpose compressor in the project
  • The project uses CMake's find_package(LibLZMA REQUIRED) without version constraints
  • No known vulnerabilities were found for LZMA 5.4.6
  • The installation script location appears to be incorrect as liblzma.sh was not found at the expected path

The version 5.4.6 is suitable for use, but please ensure the installation script is properly placed at components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh before proceeding.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security advisories and compatibility issues with LZMA 5.4.6

# Check for CVEs in the National Vulnerability Database
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch=liblzma%205.4.6" | jq '.vulnerabilities[].cve.descriptions[].value'

# Check if any other components in the project specify LZMA version requirements
rg -i "lzma.*version|version.*lzma" --type cmake --type=txt

Length of output: 302


Script:

#!/bin/bash
# Check for LZMA dependencies and usage
rg -i "lzma" --type cmake -A 3 -B 3

# Check the liblzma installation script
cat components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh

# Check if there are any specific LZMA version requirements in package configs
fd -e pc -e cmake -x cat {} \; | rg -i "lzma"

Length of output: 4152

components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (2)

23-23: LGTM! The LZMA development package is correctly added.

The addition of liblzma-dev is properly positioned in alphabetical order and aligns with the script's existing package installation pattern.


23-23: Consider documenting Ubuntu version compatibility.

Since this script is specifically for Ubuntu Focal (20.04), we should verify and document LZMA package compatibility for other Ubuntu versions that might be supported by the project.

components/core/tests/test-StreamingCompression.cpp (1)

18-19: LGTM! Headers are properly organized.

The LZMA compression headers follow the established pattern and are correctly placed with other compression-related includes.

components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (2)

69-69: Verify the correctness of the method reference in the comment.

The comment at line 69 mentions ReaderInterface::try_read_exact_length, but it's unclear if this is the correct method reference. Please ensure that this comment accurately describes the return behaviour and references the appropriate method.


146-146: ⚠️ Potential issue

Ensure proper initialisation and cleanup of m_decompression_stream.

The member variable m_decompression_stream is a raw pointer initialised to nullptr. To prevent potential memory leaks, ensure that it is properly allocated during initialisation and that all resources are released in the destructor.

Would you like assistance in reviewing the resource management for m_decompression_stream?

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

20-75: Initialization of the LZMA encoder is correctly implemented

The init_lzma_encoder function properly initializes the LZMA encoder with the specified compression level and dictionary size. Error handling for initialization failures is thorough and follows best practices.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/core/CMakeLists.txt (3)

234-235: Fix inconsistent variable name in status message

The status message uses Lzma instead of the package name LibLZMA:

-    message(STATUS "Found Lzma ${LIBLZMA_VERSION_STRING}")
+    message(STATUS "Found LibLZMA ${LIBLZMA_VERSION_STRING}")

241-241: Remove redundant debug message

The include directory message is redundant since CMake already logs this information at a more appropriate debug level.

-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Add support for static/shared LZMA library selection

The TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other library configurations in the project.

Would you like me to help create a FindLZMA.cmake module that supports static/shared library selection?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 89b5707 and c646cea.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
  • components/core/tools/scripts/lib_install/liblzma.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp
  • components/core/tools/scripts/lib_install/liblzma.sh
🔇 Additional comments (3)
components/core/CMakeLists.txt (1)

481-485: Missing test coverage for LZMA compression implementation

The LZMA source files have been added to the test suite in CMakeLists.txt, but there are no corresponding test cases in the test-StreamingCompression.cpp file. This needs to be addressed to ensure the reliability of the LZMA compression functionality.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

173-183: ⚠️ Potential issue

Prevent potential infinite loops by handling LZMA_BUF_ERROR appropriately

In the compress function, when lzma_code returns LZMA_BUF_ERROR, the current implementation continues the loop without taking corrective action. This could lead to an infinite loop, as LZMA_BUF_ERROR indicates that no progress can be made due to insufficient input or output buffer space.

Consider handling LZMA_BUF_ERROR explicitly to prevent potential infinite loops.

Apply this diff to handle LZMA_BUF_ERROR:

             switch (rc) {
                 case LZMA_OK:
                     break;
-                case LZMA_BUF_ERROR:
-                    break;
+                case LZMA_BUF_ERROR:
+                    // No progress can be made; handle or report the error
+                    SPDLOG_ERROR("Compression stalled: LZMA_BUF_ERROR encountered.");
+                    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
                 case LZMA_STREAM_END:
                     hit_stream_end = true;
                     break;

207-213: 🛠️ Refactor suggestion

Handle potential exceptions from m_compressed_stream_file_writer->write

Currently, the code assumes that the write operation always succeeds. To enhance robustness, consider handling any exceptions or errors that may occur during the write operation to ensure that failures are properly reported and managed.

Apply this diff to handle possible write errors:

 auto Compressor::pipe_data() -> void {
+    try {
         m_compressed_stream_file_writer->write(
             size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
             cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }
     m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
     m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
 }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (2)

59-61: Add documentation for intentionally omitted decompressor

While the LZMA test section follows the established pattern, it's worth documenting why the decompressor is intentionally not initialized here, especially since it differs from other test sections.

Consider adding a comment like:

 SECTION("LZMA compression") {
+    // Note: Decompressor intentionally not initialized as it's handled separately
     compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
 }

82-85: Consider adding test reporting for skipped decompression

While the null check correctly handles missing decompressors, it silently skips the decompression test. Consider using Catch2's INFO or WARN to report when decompression is skipped.

 if (nullptr == decompressor) {
+    WARN("Skipping decompression test - decompressor not initialized");
     boost::filesystem::remove(compressed_file_path);
     return;
 }
components/core/CMakeLists.txt (2)

241-241: Remove redundant debug message

The include directory message is redundant since CMake already logs this information when the package is found.

-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Add support for static/shared LZMA library selection

The TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other compression libraries in the project.

Would you like me to help create a FindLibLZMA.cmake module that supports static/shared library selection?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c646cea and c91e5fb.

📒 Files selected for processing (2)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/tests/test-StreamingCompression.cpp (3 hunks)
🔇 Additional comments (4)
components/core/tests/test-StreamingCompression.cpp (2)

18-18: LGTM! Header inclusion follows project conventions

The LZMA compressor header is properly included alongside other compression-related headers.


59-61: Verify LZMA-specific test scenarios

While the test covers basic compression functionality, consider whether LZMA-specific scenarios need testing:

  • Different compression levels
  • Dictionary size variations
  • Error handling for invalid LZMA streams
✅ Verification successful

Let me gather more information about the LZMA implementation and test coverage.


Test coverage appears adequate for LZMA compression

The implementation already includes essential LZMA-specific configurations and error handling:

  • Compression level configuration via init_lzma_encoder
  • Dictionary size customization
  • Error handling for LZMA stream operations
  • CRC64 checksum verification

The basic test case is sufficient as it exercises the core compression path, which internally handles these configurations. The implementation's error handling is covered through the LZMA stream workflow.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing LZMA-specific test cases
rg -l "LZMA" --type cpp --glob "test-*.cpp"

# Check for LZMA configuration options in source
ast-grep --pattern 'class $_ {
  $$$
  // LZMA specific configuration
  $$$
}'

Length of output: 176


Script:

#!/bin/bash
# Check the LZMA compressor implementation and available configuration options
rg -A 10 "class.*[Ll]zma.*Compressor" --type cpp

# Look for LZMA-specific test cases in more detail
rg -B 2 -A 10 "SECTION.*LZMA" components/core/tests/test-StreamingCompression.cpp

# Search for any LZMA configuration or parameter settings
rg "lzma.*\(" --type cpp

Length of output: 2117

components/core/CMakeLists.txt (2)

235-235: Fix incorrect variable name in status message

The status message uses Lzma instead of the package name LibLZMA.


481-483: LGTM! Source files and library linking look good

The LZMA source files are properly organized in the streaming_compression directory, and the library is correctly linked in the target_link_libraries command.

Also applies to: 571-571

@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 November 27, 2024 20:48
@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 November 30, 2024 07:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)

31-35: Consider simplifying the comment block.

The comment about XZ Embedded could be more concise while maintaining its informative value.

-    // Initialize the encoder using a preset. Set the integrity to check
-    // to CRC64, which is the default in the xz command line tool. If
-    // the .xz file needs to be decompressed with XZ Embedded, use
-    // LZMA_CHECK_CRC32 instead.
+    // Initialize encoder with CRC64 integrity check (use LZMA_CHECK_CRC32 for XZ Embedded compatibility)

171-174: Document LZMA_BUF_ERROR handling strategy.

The code handles LZMA_BUF_ERROR but should document why it's treated the same as LZMA_OK.

             case LZMA_OK:
             case LZMA_BUF_ERROR:
+                // LZMA_BUF_ERROR indicates temporary buffer condition, safe to continue
                 break;

1-207: Consider adding performance optimizations for large datasets.

The current implementation could benefit from:

  1. Buffer size tuning based on input characteristics
  2. Parallel compression for large datasets
  3. Memory-mapped I/O for large files
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c91e5fb and 26b0663.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🔇 Additional comments (5)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (5)

1-17: LGTM! Header organization and includes are well-structured.

The includes are logically organized into standard libraries, external dependencies, and project-specific headers.


70-93: LGTM! Well-structured initialization with proper validation.

The method properly validates the initialization state and compression level before setting up the compression stream and buffers.


95-108: LGTM! Proper resource cleanup and finalization.

The method ensures proper cleanup of resources and follows RAII principles.


110-131: ⚠️ Potential issue

Add error handling for the compression operation.

The write operation should handle potential compression failures.

     run_lzma(LZMA_RUN);
+    if (!m_compression_stream_is_flushed) {
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

199-206: ⚠️ Potential issue

Add exception handling for file write operations.

The write operation could fail and should be wrapped in a try-catch block.

 auto Compressor::pipe_data() -> void {
+    try {
         m_compressed_stream_file_writer->write(
                 clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
                 cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }
     m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
     m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
 }

@haiqi96
Copy link
Contributor

haiqi96 commented Dec 2, 2024

  • I added lib install scripts for Ubuntu machines.

Sorry for missing this question. Yes I think you need to look into CentOS and MacOS. It looks like you have already added the dependency for centos.

Actually I would expect the build of MacOS to fail because LZMA library is not installed, and the updated Unittest should require the lzma dependency. Maybe it means the MacOS already comes with LZMA dependencies?

@haiqi96
Copy link
Contributor

haiqi96 commented Dec 5, 2024

Added my last batch of comments. After you fix them, can you ask David to do a final check?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (4)

20-30: Add documentation for the exception class.

The OperationFailed exception class should include documentation explaining when this exception is thrown and what the error codes represent.

Add this documentation above the class declaration:

+    /**
+     * Exception thrown when LZMA compression operations fail
+     * @throws When LZMA operations return unexpected error codes or invalid states
+     */
     class OperationFailed : public TraceableException {

47-52: Enhance write method documentation.

The documentation should specify potential exceptions and behaviour when data is null or length is zero.

Add these details to the documentation:

     /**
      * Writes the given data to the compressor
-     * @param data
-     * @param data_length
+     * @param data Pointer to the data to compress
+     * @param data_length Length of the data in bytes
+     * @throw OperationFailed if LZMA compression fails
+     * @throw std::invalid_argument if data is null and length is non-zero
      */

91-91: Document the buffer size constant choice.

Add a comment explaining why 4KiB was chosen as the buffer size and any performance implications.

-    static constexpr size_t cCompressedStreamBlockBufferSize{4096};  // 4KiB
+    // 4KiB buffer size chosen to match typical page size for optimal I/O performance
+    static constexpr size_t cCompressedStreamBlockBufferSize{4096};

126-126: Document the default dictionary size.

Add a comment explaining the choice of default dictionary size and its impact on compression ratio vs memory usage.

-    size_t m_dict_size{cDefaultDictionarySize};
+    // Default dictionary size balancing compression ratio and memory usage
+    size_t m_dict_size{cDefaultDictionarySize};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8ab0653 and c436f21.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp
🔇 Additional comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (2)

1-15: LGTM! Header organization follows best practices.

The includes are well-organized, following the convention of grouping system headers, external dependencies, and project headers with appropriate spacing.


125-125: ⚠️ Potential issue

Initialize LZMA stream properly.

The m_compression_stream member variable should be initialized with LZMA_STREAM_INIT to ensure proper initialization of all fields.

-    lzma_stream m_compression_stream;
+    lzma_stream m_compression_stream = LZMA_STREAM_INIT;

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (3)

27-29: Enhance error message specificity

The error message could be more descriptive by including the specific operation that failed.

-            return "streaming_compression::lzma::Compressor operation failed";
+            return "LZMA compression operation failed: " + this->get_error_code_string();

48-52: Enhance method documentation

The write method's documentation is missing parameter descriptions and throw specifications.

     /**
      * Writes the given data to the compressor
-     * @param data
-     * @param data_length
+     * @param data Pointer to the data to be compressed
+     * @param data_length Length of the data in bytes
+     * @throw OperationFailed if the compression operation fails
      */

94-101: Enhance error handling documentation

The documentation should specify which LZMA error codes can trigger the OperationFailed exception.

     /**
      * Invoke lzma_code() repeatedly with LZMA_RUN until the input is exhausted
      *
      * At the end of the workflow, the last bytes of encoded data may still be buffered in the LZMA
      * stream and thus not immediately available at the output block buffer.
      *
      * Assumes input stream and output block buffer are both in valid states.
-     * @throw `OperationFailed` if LZMA returns an unexpected error value
+     * @throw OperationFailed with ErrorCode_Compression if LZMA returns LZMA_MEM_ERROR
+     * @throw OperationFailed with ErrorCode_Compression if LZMA returns LZMA_DATA_ERROR
+     * @throw OperationFailed with ErrorCode_Compression if LZMA returns LZMA_OPTIONS_ERROR
      */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c436f21 and 7bd34d2.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp
🔇 Additional comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (3)

32-44: LGTM! Special member functions are well-defined

The class correctly implements the rule of five with appropriate move semantics and deleted copy operations.


125-125: ⚠️ Potential issue

Initialize LZMA stream properly

The LZMA stream should be initialized with LZMA_STREAM_INIT to ensure proper initialization of all fields.

-    lzma_stream m_compression_stream;
+    lzma_stream m_compression_stream = LZMA_STREAM_INIT;

Likely invalid or redundant comment.


7-7: Consider platform-specific LZMA header inclusion

The direct inclusion of <lzma.h> might need platform-specific handling or fallback mechanisms.

Consider:

  1. Adding platform-specific include guards
  2. Using a common header that handles platform differences
  3. Adding CMake configuration checks for LZMA availability

@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 December 6, 2024 14:11
@Bill-hbrhbr
Copy link
Contributor Author

@haiqi96 Fixed the comments
@davidlion Can you do a round of review when you get the chance? Thank you

Copy link
Member

@davidlion davidlion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets do a pass to try and:

  1. Remove excessive function body comments
  2. Add periods to complete sentences.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
components/core/CMakeLists.txt (3)

14-14: Maintain alphabetical ordering of options.

The options in the STRINGS property should be alphabetically ordered for consistency.

-set_property(CACHE GENERAL_COMPRESSOR PROPERTY STRINGS lzma passthrough zstd)
+set_property(CACHE GENERAL_COMPRESSOR PROPERTY STRINGS lzma passthrough zstd)

233-233: Add version requirement for LibLZMA.

For consistency with other package requirements and to ensure compatibility, specify a minimum version requirement for LibLZMA.

-find_package(LibLZMA REQUIRED)
+find_package(LibLZMA 5.2.5 REQUIRED)

231-232: Consider implementing static linking support for LZMA.

The TODO comment indicates that static linking support for LZMA is needed. This should be addressed to maintain consistency with other libraries that support static linking.

Would you like me to help implement the static linking support for LZMA or create a GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c530f92 and 1c5efcd.

📒 Files selected for processing (2)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp
🔇 Additional comments (1)
components/core/CMakeLists.txt (1)

234-235: Fix incorrect variable name in status message.

The status message uses LIBLZMA_FOUND instead of the package name LibLZMA.

-if(LIBLZMA_FOUND)
-    message(STATUS "Found Lzma ${LIBLZMA_VERSION_STRING}")
+if(LIBLZMA_FOUND)
+    message(STATUS "Found LibLZMA ${LIBLZMA_VERSION_STRING}")

components/core/CMakeLists.txt Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

25-25: Correct grammatical error in comment

In the comment on line 25, change "resets" to "reset" to agree with the plural subject "calls".

components/core/CMakeLists.txt (1)

237-237: Use consistent library naming in status messages

For consistency with other libraries, consider updating the status message to 'Found LibLZMA' and ensure the correct variable is used for the version string.

Apply this diff to fix the message:

-        message(STATUS "Found Lzma ${LIBLZMA_VERSION_STRING}")
+        message(STATUS "Found LibLZMA ${LIBLZMA_VERSION_STRING}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c5efcd and 829a6b2.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
  • components/core/tests/test-StreamingCompression.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
components/core/tests/test-StreamingCompression.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
🔇 Additional comments (5)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

268-271: ⚠️ Potential issue

Handle potential exceptions from m_compressed_stream_file_writer->write

Currently, the write operation assumes success without error handling. To enhance robustness, consider wrapping the write call in a try-catch block to handle any exceptions that may occur, ensuring failures are properly reported and managed.

Apply this diff to handle possible write errors:

+    try {
        m_compressed_stream_file_writer->write(
            clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
            cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
        );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

246-251: 🛠️ Refactor suggestion

Review the handling of LZMA_BUF_ERROR in flush_lzma

The LZMA_BUF_ERROR return code can occur during flushing actions and doesn't always signify a fatal error. Throwing an exception here might prematurely terminate the compression process. Consider revising the error handling to properly manage LZMA_BUF_ERROR, possibly allowing the loop to continue or exit gracefully when appropriate.

components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)

130-130: ⚠️ Potential issue

Initialize m_compression_stream with LZMA_STREAM_INIT for proper setup

Currently, m_compression_stream is default-initialized, which may leave its fields uninitialized and can lead to undefined behaviour. Initializing it with LZMA_STREAM_INIT ensures that all fields are properly set.

Apply this diff to initialize m_compression_stream correctly:

-        lzma_stream m_compression_stream;
+        lzma_stream m_compression_stream = LZMA_STREAM_INIT;

Likely invalid or redundant comment.

components/core/tests/test-StreamingCompression.cpp (1)

132-135: Clarify the LZMA test section to explain the absence of decompression

To enhance understanding, consider updating the section name and adding a comment to indicate that this is a compression-only test for LZMA.

Apply this diff to improve the test:

-        SECTION("LZMA compression") {
+        SECTION("LZMA compression (compression-only test)") {
+            // Note: This is a compression-only test as decompression is not implemented yet for LZMA.
components/core/CMakeLists.txt (1)

483-485: ⚠️ Potential issue

Implement LZMA decompressor to complete compression support

Currently, only the LZMA compressor is implemented. To fully support LZMA compression, please implement the corresponding decompressor.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

111-114: Consider handling LZMA_PROG_ERROR separately.

LZMA_PROG_ERROR indicates programming errors that should never occur in production. Consider adding additional logging or assertions for debugging purposes.


196-219: Consider refactoring encode_lzma for better maintainability.

The encode_lzma method could be split into smaller, more focused functions to improve readability and maintainability. Consider extracting the error handling logic into a separate function.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 829a6b2 and 81e1807.

📒 Files selected for processing (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)

97-98: 🛠️ Refactor suggestion

Initialize the error message string to prevent potential undefined behavior.

The message pointer should be initialized with a default value.

-    char const* msg{nullptr};
+    char const* msg{"Unknown error"};

Likely invalid or redundant comment.


264-273: ⚠️ Potential issue

Add error handling for write operations.

The write operation to the file writer should be wrapped in a try-catch block to handle potential I/O errors.

 auto Compressor::flush_stream_output_block_buffer() -> void {
     if (cCompressedStreamBlockBufferSize == m_compression_stream.avail_out) {
         return;
     }
+    try {
         m_compressed_stream_file_writer->write(
                 clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
                 cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }
     attach_stream_output_buffer(&m_compression_stream, m_compressed_stream_block_buffer);
 }

Likely invalid or redundant comment.


206-211: ⚠️ Potential issue

Enhance LZMA_BUF_ERROR handling to consider both input and output buffer states.

The current error handling for LZMA_BUF_ERROR should check both input and output buffer availability to accurately identify corruption.

             case LZMA_BUF_ERROR:
+                if (m_compression_stream.avail_in > 0 && m_compression_stream.avail_out > 0) {
                     SPDLOG_ERROR(
                             "LZMA compressor input stream is corrupt. No encoding progress can be made."
                     );
                     throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+                }
+                break;
-                throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);

Likely invalid or redundant comment.

@davidlion davidlion changed the title feat(core): Add Lzma compression workflow feat(core-clp): Add LZMA Compressor implementation and LZMA dependency. Dec 16, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

95-99: ⚠️ Potential issue

Handle LZMA_BUF_ERROR appropriately to prevent infinite loops.

In encode_lzma(), when lzma_code() returns LZMA_BUF_ERROR, check both avail_in and avail_out to decide the next action, avoiding potential infinite loops.

Apply this diff to enhance error handling:

 case LZMA_BUF_ERROR:
-    SPDLOG_ERROR(
-        "LZMA compressor input stream is corrupt. No encoding progress can be made."
-    );
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    if (m_compression_stream.avail_in == 0 && m_compression_stream.avail_out > 0) {
+        // No more input data; exit the loop
+        break;
+    } else if (m_compression_stream.avail_out == 0) {
+        // Output buffer is full; flush it
+        flush_stream_output_block_buffer();
+    } else {
+        SPDLOG_ERROR("LZMA_BUF_ERROR encountered. No progress can be made.");
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

151-160: ⚠️ Potential issue

Handle exceptions from the write operation.

In flush_stream_output_block_buffer(), wrap the write call in a try-catch block to handle any potential exceptions from the file writer.

Apply this diff to add exception handling:

 auto Compressor::flush_stream_output_block_buffer() -> void {
     if (cCompressedStreamBlockBufferSize == m_compression_stream.avail_out) {
         return;
     }
+    try {
     m_compressed_stream_file_writer->write(
         clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
         cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
     );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_WriteFailed, __FILENAME__, __LINE__);
+    }
     m_lzma_ops.attach_output_buffer();
 }
🧹 Nitpick comments (8)
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (6)

23-24: Provide a more descriptive error message for unsupported compression levels.

When the compression level is out of the supported range, consider including the valid range in the exception message to guide the user.

Apply this diff to enhance the error message:

-    throw OperationFailed(ErrorCode_Unsupported, __FILENAME__, __LINE__);
+    SPDLOG_ERROR("Unsupported compression level {}. Valid range is {} to {}.", compression_level, cMinCompressionLevel, cMaxCompressionLevel);
+    throw OperationFailed(ErrorCode_Unsupported, __FILENAME__, __LINE__);

47-51: Document all parameters in the method description.

The open method's documentation lacks a description for the compression_level parameter. Including it enhances clarity for users of the API.

Apply this diff to add the missing parameter description:

 /**
  * Initializes the compression stream with the given compression level.
  *
  * @param file_writer The file writer to output compressed data.
+ * @param compression_level Compression level for the LZMA encoder (valid range is cMinCompressionLevel to cMaxCompressionLevel).
  */

82-89: Group overloaded methods together for better readability.

The open methods are overloaded but are not placed consecutively. Consider moving the default open method to be adjacent to its overloaded counterpart.


134-135: Provide context for the buffer size constant.

The cCompressedStreamBlockBufferSize is set to 4096. Adding a comment to explain the choice of this buffer size can help maintainability.

Apply this diff to add explanatory comments:

-static constexpr size_t cCompressedStreamBlockBufferSize{4096};  // 4KiB
+/**
+ * Size of the compressed stream block buffer. Chosen as 4KiB to balance memory usage and I/O performance.
+ */
+static constexpr size_t cCompressedStreamBlockBufferSize{4096};  // 4KiB

143-144: Include exception details in method documentation.

The encode_lzma method may throw OperationFailed. Update the documentation to reflect this for better clarity.

Apply this diff to add the exception information:

  /**
   * Invokes lzma_code() repeatedly with LZMA_RUN until the input is exhausted.
   *
+  * @throws OperationFailed if lzma_code() returns an unexpected error.
   */

157-158: Use consistent naming conventions for parameters.

In the method parameters, use consistent naming conventions, such as data_length instead of mixing length and size.

Apply this diff to rename the parameter:

-    auto write(char const* data, size_t data_length) -> void override;
+    auto write(char const* data, size_t length) -> void override;

And update all references to data_length in the method implementations.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

125-146: Simplify the flushing logic in flush_lzma().

The variable flushed may not be necessary since the loop can be controlled by the return value of lzma_code(). This can improve readability.

Apply this diff to simplify the loop:

-bool flushed{false};
-while (false == flushed) {
+while (true) {
     if (0 == m_compression_stream.avail_out) {
         flush_stream_output_block_buffer();
     }
     auto const rc = lzma_code(&m_compression_stream, flush_action);
     switch (rc) {
         case LZMA_OK:
             break;
         case LZMA_STREAM_END:
             // Flushing completed
-            flushed = true;
+            flush_stream_output_block_buffer();
+            return;
         case LZMA_BUF_ERROR:
             SPDLOG_ERROR(
                 "LZMA compressor input stream is corrupt. No encoding progress can be made."
             );
             throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
         default:
             SPDLOG_ERROR(
                 "lzma_code() returned an unexpected value - {}.",
                 static_cast<int>(rc)
             );
             throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
     }
 }
-flush_stream_output_block_buffer();

202-221: Improve clarity in error handling by reducing code duplication.

In init_lzma_encoder(), the error handling code can be simplified by mapping error codes to messages using a map or function, reducing the switch-case boilerplate.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81e1807 and 09b73c7.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

Comment on lines +65 to +75
* Forces all the encoded data buffered by LZMA to be available at output
*/
auto flush() -> void override;

/**
* Tries to get the current position of the write head
* @param pos Position of the write head
* @return ErrorCode_NotInit if the compressor is not open
* @return ErrorCode_Success on success
*/
auto try_get_pos(size_t& pos) const -> ErrorCode override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Ensure consistency between method documentation and implementation.

The try_get_pos method is documented to return ErrorCode, but the implementation throws exceptions. Consider aligning the implementation with the documentation or vice versa.

Apply this diff to modify the method signature and implementation:

-    auto try_get_pos(size_t& pos) const -> ErrorCode override;
+    auto try_get_pos(size_t& pos) const -> void override;

Update the implementation to throw OperationFailed when not initialized, and adjust the documentation accordingly.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +169 to +170
m_p->m_compression_stream.avail_out = m_p->m_compressed_stream_block_buffer.size();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Check pointers before dereferencing to prevent null pointer exceptions.

In attach_output_buffer(), ensure that m_p and the buffer pointers are valid before use.

Apply this diff to add pointer checks:

 auto Compressor::LzmaStreamOperations::attach_output_buffer() -> void {
+    if (nullptr == m_p || nullptr == m_p->m_compression_stream.next_out) {
+        throw OperationFailed(ErrorCode_NotInit, __FILENAME__, __LINE__);
+    }
     m_p->m_compression_stream.next_out = m_p->m_compressed_stream_block_buffer.data();
     m_p->m_compression_stream.avail_out = m_p->m_compressed_stream_block_buffer.size();
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +39 to +42
if (m_compression_stream.avail_in > 0) {
SPDLOG_ERROR("Tried to close LZMA compressor with unprocessed input data.");
throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Process any remaining input data before closing the compressor.

In close(), if there is unprocessed input data, consider processing it rather than throwing an exception to ensure all data is compressed.

Apply this diff to process remaining data:

 if (m_compression_stream.avail_in > 0) {
-    SPDLOG_ERROR("Tried to close LZMA compressor with unprocessed input data.");
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    SPDLOG_WARN("Closing compressor with unprocessed input data. Processing remaining data.");
+    encode_lzma();
 }

Committable suggestion skipped: line range outside the PR's diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants