Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(puffin): not to stage uncompressed blob #4333

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

zhongzc
Copy link
Contributor

@zhongzc zhongzc commented Jul 10, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

  • Remove compress support for inverted index
  • Directly read uncompressed blob within puffin file instead of staging it first

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • Bug Fixes

    • Adjusted test assertion in test_region_usage for improved accuracy.
  • Documentation

    • Updated integration tests to remove unnecessary compress = true configuration.
  • Refactor

    • Simplified and streamlined blob handling logic in test functions for better readability and efficiency.
    • Removed explicit compression parameters and fields, defaulting to compression where applicable.

Note: These changes enhance the performance and maintainability of the application without altering the user interface or user experience.

@zhongzc zhongzc requested review from evenyag, v0y4g3r, waynexia and a team as code owners July 10, 2024 09:30
Copy link
Contributor

coderabbitai bot commented Jul 10, 2024

Walkthrough

This update focuses on enhancing configurations and refactoring components to improve code clarity, maintainability, and performance. Major changes include the removal of the compress field from several structs, adjustments to various method signatures, and updates to trait implementations, specifically adding asynchronous functionality.

Changes

File(s) Change Summary
src/mito2/src/config.rs Removed compress field from InvertedIndexConfig and defaulted it to true
src/mito2/src/engine/basic_test.rs Updated test_region_usage assertion value
src/mito2/src/sst/index.rs, src/mito2/src/sst/index/inverted_index/creator.rs Removed compress parameter and field from various structs and methods
src/mito2/src/sst/index/puffin_manager.rs Reorganized imports and adjusted type aliases
src/puffin/src/file_format/reader/file.rs Added into_blob_reader method to PuffinFileReader
src/puffin/src/puffin_manager.rs Replaced BoxFuture with async fn in BlobGuard trait
src/puffin/src/puffin_manager/file_accessor.rs PuffinFileAccessor trait now requires Reader type to implement Sync
src/puffin/src/puffin_manager/fs_puffin_manager.rs Added 'static lifetime bound to type parameter S in FsPuffinManager implementation
src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs Introduced RandomReadBlob struct and adjusted blob retrieval logic
src/puffin/src/puffin_manager/stager.rs, src/puffin/src/puffin_manager/stager/bounded_stager.rs Updated trait definitions and method signatures, added #[async_trait] to BlobGuard implementation
src/puffin/src/puffin_manager/tests.rs Refactored and simplified test functions
tests-integration/tests/http.rs Removed compress = true configuration setting

In the code's deep, where logic flows,
A rabbit hops where changes grow.
Compress is gone, async appears,
Efficiency brought, we all cheer!
Refactored lines, a cleaner view,
Code renewed, like morning dew.
Celebrate, with joy and glee,
For the code is now clutter-free!


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jul 10, 2024
@zhongzc zhongzc force-pushed the zhongzc/puffin-not-stage-blob branch from c3b2f3b to 9bb3a40 Compare July 10, 2024 09:32
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (4)
src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (2)

62-62: Update documentation to reflect the new RandomReadBlob type.

The type Blob in FsPuffinReader was changed to RandomReadBlob<F>. Ensure the documentation is updated to reflect this change.

- /// `FsPuffinReader` is a `PuffinReader` that provides fs readers for puffin files.
+ /// `FsPuffinReader` is a `PuffinReader` that provides fs readers for puffin files, using `RandomReadBlob` for blob reading.

225-255: Add documentation for the RandomReadBlob struct.

The RandomReadBlob struct is a new addition. Ensure there is sufficient documentation explaining its purpose and usage.

+ /// `RandomReadBlob` is a `BlobGuard` that directly reads the blob from the puffin file.
+ /// This struct is used for performing random reads on blobs within the puffin file.
src/puffin/src/puffin_manager/tests.rs (2)

195-199: Add documentation for the put_blob helper function.

The put_blob helper function is a new addition. Ensure there is sufficient documentation explaining its purpose and usage.

+ /// Puts a blob into the puffin writer.
+ /// This function is used to add a blob to the puffin file.

202-210: Add documentation for the check_blob helper function.

The check_blob helper function is a new addition. Ensure there is sufficient documentation explaining its purpose and usage.

+ /// Checks the contents of a blob in the puffin reader.
+ /// This function is used to verify the contents of a blob in the puffin file.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between da0c840 and 9bb3a40.

Files selected for processing (9)
  • src/mito2/src/config.rs (2 hunks)
  • src/mito2/src/engine/basic_test.rs (1 hunks)
  • src/mito2/src/sst/index.rs (1 hunks)
  • src/mito2/src/sst/index/inverted_index/creator.rs (6 hunks)
  • src/mito2/src/sst/index/puffin_manager.rs (2 hunks)
  • src/puffin/src/file_format/reader/file.rs (1 hunks)
  • src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (6 hunks)
  • src/puffin/src/puffin_manager/tests.rs (4 hunks)
  • tests-integration/tests/http.rs (1 hunks)
Files skipped from review due to trivial changes (2)
  • src/mito2/src/engine/basic_test.rs
  • tests-integration/tests/http.rs
Additional comments not posted (15)
src/mito2/src/sst/index/puffin_manager.rs (2)

24-25: Remove unused imports if they are unnecessary.

Ensure the imports use puffin::puffin_manager::stager::BoundedStager; and use puffin::puffin_manager::{BlobGuard, PuffinManager, PuffinReader}; are necessary. If not, remove them to keep the code clean.

Verification successful

The imports are necessary and should not be removed.

The following imported entities are used in the code:

  • BoundedStager: Used in the type alias SstPuffinManager, in the PuffinManagerFactory struct, and within the PuffinManagerFactory implementation.
  • BlobGuard: Used in the type alias BlobReader.
  • PuffinManager: Used in the type aliases SstPuffinReader and SstPuffinWriter.
  • PuffinReader: Used in the type alias BlobReader.

Therefore, these imports are necessary for the functionality of the code.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the imports are used in the codebase.

# Test: Search for the usage of the imported entities. Expect: No unused imports.
rg --type rust -A 5 $'use puffin::puffin_manager::stager::BoundedStager;'
rg --type rust -A 5 $'use puffin::puffin_manager::{BlobGuard, PuffinManager, PuffinReader};'

Length of output: 818


Script:

#!/bin/bash
# Description: Verify if the imports are used in the codebase.

# Search for the usage of BoundedStager, BlobGuard, PuffinManager, and PuffinReader in the file.
rg 'BoundedStager|BlobGuard|PuffinManager|PuffinReader' src/mito2/src/sst/index/puffin_manager.rs

Length of output: 1284


40-42: Ensure type aliases are used consistently across the codebase.

The type aliases SstPuffinManager, SstPuffinReader, SstPuffinWriter, and BlobReader were defined. Ensure they are used consistently across the codebase.

Verification successful

Type aliases are used consistently across the codebase.

The type aliases SstPuffinManager, SstPuffinReader, SstPuffinWriter, and BlobReader are utilized in multiple files, ensuring their consistent usage throughout the codebase.

  • SstPuffinManager: used in src/mito2/src/sst/index.rs and src/mito2/src/sst/index/puffin_manager.rs.
  • SstPuffinReader: used in src/mito2/src/sst/index/puffin_manager.rs.
  • SstPuffinWriter: used in src/mito2/src/sst/index/inverted_index/creator.rs, src/mito2/src/sst/index/fulltext_index/creator.rs, src/mito2/src/sst/index/indexer/finish.rs, and src/mito2/src/sst/index/puffin_manager.rs.
  • BlobReader: used in src/mito2/src/sst/index/inverted_index/applier.rs and src/mito2/src/sst/index/puffin_manager.rs.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the consistent usage of type aliases across the codebase.

# Test: Search for the usage of the type aliases. Expect: Consistent usage.
rg --type rust -A 5 $'type SstPuffinManager ='
rg --type rust -A 5 $'type SstPuffinReader ='
rg --type rust -A 5 $'type SstPuffinWriter ='
rg --type rust -A 5 $'type BlobReader ='

Length of output: 2314


Script:

#!/bin/bash
# Description: Verify the consistent usage of type aliases across the codebase.

# Test: Search for the usage of the type aliases. Expect: Consistent usage.
rg --type rust 'SstPuffinManager'
rg --type rust 'SstPuffinReader'
rg --type rust 'SstPuffinWriter'
rg --type rust 'BlobReader'

Length of output: 5372

src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (1)

66-81: Ensure proper error handling for blob metadata retrieval.

The method blob retrieves blob metadata and ensures that the blob is not compressed. Ensure proper error handling and logging for this process.

src/puffin/src/puffin_manager/tests.rs (3)

44-59: Ensure comprehensive test coverage for FsPuffinManager.

The test function test_put_get_file ensures basic functionality of FsPuffinManager. Verify that it covers all edge cases and scenarios.

Verification successful

Comprehensive test coverage for FsPuffinManager verified.

The FsPuffinManager is well-tested with multiple scenarios, including various blob data and directory management. The test cases cover a broad range of functionalities, ensuring robustness.

  • File: src/puffin/src/puffin_manager/tests.rs
  • Lines: Multiple instances of FsPuffinManager instantiation and testing.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify comprehensive test coverage for `FsPuffinManager`.

# Test: Search for additional test cases. Expect: Comprehensive test coverage.
rg --type rust -A 5 $'#[tokio::test]'

Length of output: 15682


Script:

#!/bin/bash
# Description: Verify test coverage for `FsPuffinManager` specifically.

# Search for test functions related to `FsPuffinManager` within the `src/puffin/src/puffin_manager/tests.rs` file.
rg 'FsPuffinManager' -A 10 --type rust

Length of output: 10324


Line range hint 169-188:
Ensure comprehensive test coverage for mixed files and directories.

The test function test_put_get_mix_file_dir ensures functionality for mixed files and directories. Verify that it covers all edge cases and scenarios.

Verification successful

Comprehensive Test Coverage Verified

The test_put_get_mix_file_dir function, along with its helper functions, ensures comprehensive test coverage for mixed files and directories. It handles various capacities, compression codecs, and nested directories, ensuring data integrity through read and write operations.

  • test_put_get_mix_file_dir function covers different capacities and compression codecs.
  • Helper functions like put_blob, check_blob, put_dir, and check_dir ensure thorough testing of blob and directory operations.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify comprehensive test coverage for mixed files and directories.

# Test: Search for additional test cases. Expect: Comprehensive test coverage.
rg --type rust -A 5 $'#[tokio::test]'

Length of output: 15682


Script:

#!/bin/bash
# Locate the implementation of the `test_put_get_mix_file_dir` function and review its content.
rg 'async fn test_put_get_mix_file_dir' -A 50

# Check for other tests in the same file to review related test cases.
rg --type rust '#[tokio::test]' -A 5 src/puffin/src/puffin_manager/tests.rs

Length of output: 4305


Script:

#!/bin/bash
# Capture the complete implementation of the `test_put_get_mix_file_dir` function.
rg 'async fn test_put_get_mix_file_dir' -A 200

# Check for other tests in the same file to review related test cases.
rg --type rust '#[tokio::test]' -A 10 src/puffin/src/puffin_manager/tests.rs

Length of output: 12359


64-88: Ensure comprehensive test coverage for multiple blobs.

The test function test_put_get_files ensures functionality for multiple blobs. Verify that it covers all edge cases and scenarios.

src/mito2/src/config.rs (4)

Line range hint 187-216: Removal of compress field from InvertedIndexConfig is appropriate.

The removal simplifies the configuration and the default implementation reflects this change.


Line range hint 225-250: Verify the necessity of the compress field in FulltextIndexConfig.

While the compress field was removed from InvertedIndexConfig, it is still present in FulltextIndexConfig. Ensure that this is consistent with the overall design and necessary for the configuration.


Line range hint 37-143: Verify the integration of InvertedIndexConfig within MitoConfig.

Ensure that the removal of the compress field from InvertedIndexConfig does not affect the overall configuration and functionality of MitoConfig.


Line range hint 149-195: Ensure sanitize method handles updated InvertedIndexConfig.

Verify that the sanitize method in MitoConfig correctly handles the updated InvertedIndexConfig without the compress field.

src/mito2/src/sst/index.rs (3)

Line range hint 95-128: Removal of compress field from IndexerBuilder is appropriate.

The removal simplifies the configuration and the build method reflects this change.


Line range hint 134-161: Ensure build_inverted_indexer handles updated InvertedIndexConfig.

Verify that the build_inverted_indexer method correctly handles the updated InvertedIndexConfig without the compress field.


Line range hint 163-196: Ensure build_fulltext_indexer handles compress field in FulltextIndexConfig.

Verify that the build_fulltext_indexer method correctly handles the presence of the compress field in FulltextIndexConfig.

src/mito2/src/sst/index/inverted_index/creator.rs (2)

Line range hint 77-116: Removal of compress field from SstIndexCreator is appropriate.

The removal simplifies the configuration and the constructor reflects this change.


237-244: Ensure do_finish method handles updated InvertedIndexConfig.

Verify that the do_finish method correctly handles the updated InvertedIndexConfig without the compress field.

src/puffin/src/file_format/reader/file.rs Show resolved Hide resolved
src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 96.11650% with 8 lines in your changes missing coverage. Please review.

Project coverage is 84.91%. Comparing base (52a9a74) to head (a764f27).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4333      +/-   ##
==========================================
- Coverage   85.19%   84.91%   -0.29%     
==========================================
  Files        1060     1060              
  Lines      189048   189165     +117     
==========================================
- Hits       161054   160621     -433     
- Misses      27994    28544     +550     

@zhongzc zhongzc changed the title feat(puffin): not to stage blob perf(puffin): not to stage uncompressed blob Jul 10, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9bb3a40 and a764f27.

Files selected for processing (7)
  • src/puffin/src/puffin_manager.rs (2 hunks)
  • src/puffin/src/puffin_manager/file_accessor.rs (1 hunks)
  • src/puffin/src/puffin_manager/fs_puffin_manager.rs (1 hunks)
  • src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (5 hunks)
  • src/puffin/src/puffin_manager/stager.rs (1 hunks)
  • src/puffin/src/puffin_manager/stager/bounded_stager.rs (6 hunks)
  • src/puffin/src/puffin_manager/tests.rs (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs
  • src/puffin/src/puffin_manager/tests.rs
Additional comments not posted (11)
src/puffin/src/puffin_manager/file_accessor.rs (1)

24-24: Ensure Reader type is thread-safe.

The Sync trait has been added to the Reader type, which ensures that the Reader can be safely shared between threads.

src/puffin/src/puffin_manager/fs_puffin_manager.rs (1)

49-49: Ensure type safety with 'static lifetime bound.

The 'static lifetime bound has been added to the S type parameter. This ensures that S does not contain any non-static references, enhancing the safety and stability of the code.

src/puffin/src/puffin_manager/stager.rs (3)

45-45: Changed InitBlobFn to FnOnce.

The InitBlobFn trait has been changed from Fn to FnOnce, indicating that the initialization function is expected to be called only once.


51-51: Changed InitDirFn to FnOnce.

The InitDirFn trait has been changed from Fn to FnOnce, indicating that the initialization function is expected to be called only once.


57-57: Ensure Blob type is thread-safe.

The Blob type in the Stager trait now requires the Sync trait, ensuring that it can be safely shared between threads.

src/puffin/src/puffin_manager.rs (1)

98-98: Modernize async method declaration.

The reader method in the BlobGuard trait now uses an async fn instead of returning a BoxFuture, improving readability and aligning with modern async programming practices.

src/puffin/src/puffin_manager/stager/bounded_stager.rs (5)

130-132: Change Approved: Enhanced flexibility with boxed trait object.

The change to use Box<dyn InitBlobFn + Send + Sync + '_> for the init_fn parameter increases flexibility and maintainability.


165-167: Change Approved: Enhanced flexibility with boxed trait object.

The change to use Box<dyn InitDirFn + Send + Sync + '_> for the init_fn parameter increases flexibility and maintainability.


227-229: Change Approved: Enhanced flexibility with boxed trait object.

The change to use Box<dyn InitBlobFn + Send + Sync + '_> for the init_fn parameter increases flexibility and maintainability.


249-251: Change Approved: Enhanced flexibility with boxed trait object.

The change to use Box<dyn InitDirFn + Send + Sync + '_> for the init_fn parameter increases flexibility and maintainability.


431-433: Change Approved: Converted to async function.

The conversion of the reader function to async improves efficiency in asynchronous contexts.

Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DLJB

Copy link
Contributor

@evenyag evenyag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫣 💨

@evenyag evenyag added this pull request to the merge queue Jul 11, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 11, 2024
@zhongzc zhongzc added this pull request to the merge queue Jul 11, 2024
Merged via the queue into GreptimeTeam:main with commit da1ea25 Jul 11, 2024
59 checks passed
@zhongzc zhongzc deleted the zhongzc/puffin-not-stage-blob branch July 11, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants