
feat(chunkv5): Chunk V5 structure, encoding and decoding #14674

Draft: wants to merge 14 commits into main


@shantanualsi (Contributor) commented Oct 30, 2024

Add New Organized Chunk Format (V5)

Overview

This PR introduces a new chunk format (V5) for Loki that organizes log data into distinct sections: log lines, timestamps, and structured metadata. This organization enables more efficient querying by allowing selective decompression of only the required sections.
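To illustrate the idea, here is a minimal sketch that splits entries column-wise and compresses each section independently, so a timestamp-only read decompresses just one section. The names `Entry`, `organize`, `sections` and `readTimestamps`, and the choice of flate compression, are hypothetical and are not taken from the PR's implementation.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/binary"
	"fmt"
	"io"
)

// Entry is a hypothetical, simplified log entry.
type Entry struct {
	Timestamp int64
	Line      string
	Metadata  string // structured metadata, flattened to a string for brevity
}

// sections holds each kind of data, compressed independently.
type sections struct {
	Timestamps []byte
	Lines      []byte
	Metadata   []byte
}

// compress deflates b. Writes to a bytes.Buffer cannot fail and
// flate.BestSpeed is a valid level, so errors are ignored here.
func compress(b []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.BestSpeed)
	w.Write(b)
	w.Close()
	return buf.Bytes()
}

func decompress(b []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(b))
	defer r.Close()
	return io.ReadAll(r)
}

// organize writes timestamps as varints and lines/metadata as
// length-prefixed strings, each into its own compressed section.
func organize(entries []Entry) sections {
	var ts, lines, meta []byte
	for _, e := range entries {
		ts = binary.AppendVarint(ts, e.Timestamp)
		lines = binary.AppendUvarint(lines, uint64(len(e.Line)))
		lines = append(lines, e.Line...)
		meta = binary.AppendUvarint(meta, uint64(len(e.Metadata)))
		meta = append(meta, e.Metadata...)
	}
	return sections{compress(ts), compress(lines), compress(meta)}
}

// readTimestamps decompresses only the timestamps section; the lines
// and metadata sections are never touched.
func readTimestamps(s sections) ([]int64, error) {
	raw, err := decompress(s.Timestamps)
	if err != nil {
		return nil, err
	}
	var out []int64
	for len(raw) > 0 {
		v, n := binary.Varint(raw)
		if n <= 0 {
			return nil, fmt.Errorf("corrupt timestamps section")
		}
		out = append(out, v)
		raw = raw[n:]
	}
	return out, nil
}

func main() {
	s := organize([]Entry{{100, "GET /", "app=web"}, {105, "POST /login", "app=auth"}})
	ts, err := readTimestamps(s)
	fmt.Println(ts, err)
}
```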

Changes

  • Added new ChunkFormatV5 constant
  • Implemented organisedHeadBlock for organizing data during writes
  • Added organizedBufferedIterator for efficient reading of organized chunks
  • Implemented serialization/deserialization for the new format
  • Added comprehensive block organization and section management
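One way a sectioned chunk can be serialized so that a reader can locate a single section without decoding the others is a small header of section lengths followed by the section bodies. The sketch below assumes a layout of version byte, section count, per-section uvarint lengths, then bodies; the `formatV5` value, `encodeSections` and `sectionAt` are hypothetical and do not describe the PR's actual on-disk layout.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// formatV5 is a hypothetical version byte; the real ChunkFormatV5
// value is defined in the PR and may differ.
const formatV5 byte = 5

// encodeSections writes: version | count | len(sec0)..len(secN) | bodies.
func encodeSections(secs [][]byte) []byte {
	out := []byte{formatV5}
	out = binary.AppendUvarint(out, uint64(len(secs)))
	for _, s := range secs {
		out = binary.AppendUvarint(out, uint64(len(s)))
	}
	for _, s := range secs {
		out = append(out, s...)
	}
	return out
}

// sectionAt returns section i as a sub-slice of buf, skipping over the
// bodies of earlier sections without decoding them.
func sectionAt(buf []byte, i int) ([]byte, error) {
	if len(buf) == 0 || buf[0] != formatV5 {
		return nil, errors.New("unsupported chunk format")
	}
	pos := 1
	count, n := binary.Uvarint(buf[pos:])
	if n <= 0 || count > uint64(len(buf)) {
		return nil, errors.New("corrupt header")
	}
	pos += n
	if i < 0 || uint64(i) >= count {
		return nil, fmt.Errorf("section %d out of range", i)
	}
	lengths := make([]int, count)
	for k := range lengths {
		l, m := binary.Uvarint(buf[pos:])
		if m <= 0 || l > uint64(len(buf)) {
			return nil, errors.New("corrupt header")
		}
		lengths[k] = int(l)
		pos += m
	}
	// Skip the bodies of the sections that precede section i.
	for _, l := range lengths[:i] {
		pos += l
	}
	end := pos + lengths[i]
	if end > len(buf) {
		return nil, errors.New("truncated chunk")
	}
	return buf[pos:end], nil
}

func main() {
	chunk := encodeSections([][]byte{[]byte("ts"), []byte("lines"), []byte("meta")})
	sec, err := sectionAt(chunk, 1)
	fmt.Println(string(sec), err)
}
```

Keeping all lengths in the header means a reader only has to parse a few bytes before jumping to the one section a query needs.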

Current Status

This PR focuses on the storage format implementation and core functionality. The following aspects will be addressed in subsequent PRs:

Coming in Future PRs

  1. Query Path Implementation

    • Integration with the query engine
    • Optimization of query patterns for the new format
    • Implementation of selective section reading based on query type
  2. Performance Benchmarking

    • Comprehensive benchmarks comparing V4 vs V5 formats
    • Memory usage analysis
    • Query performance metrics
    • Write performance impact
  3. Migration Support

    • Tools for migrating existing chunks to V5 format
    • Backward compatibility handling
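Selective section reading, planned for the query-path work, could amount to mapping what a query needs onto the set of sections to decompress. The sketch below is a hypothetical illustration of that mapping; the `Query` fields, the `Section` flags and `requiredSections` are assumptions, not the planned Loki query-engine API.

```go
package main

import "fmt"

// Section is a bit flag identifying one section of a hypothetical V5 chunk.
type Section uint8

const (
	Timestamps Section = 1 << iota
	Lines
	Metadata
)

// Query captures only what a query needs from a chunk (hypothetical shape).
type Query struct {
	LineFilter     bool // e.g. |= "error"
	MetadataFilter bool // a filter on structured metadata
	ReturnEntries  bool // log queries return lines and their metadata
}

// requiredSections picks the minimal set of sections to decompress.
// Timestamps are always read, for time-range pruning and ordering.
func requiredSections(q Query) Section {
	s := Timestamps
	if q.LineFilter || q.ReturnEntries {
		s |= Lines
	}
	if q.MetadataFilter || q.ReturnEntries {
		s |= Metadata
	}
	return s
}

func main() {
	// A metric query such as count_over_time({app="web"}[5m]) with no
	// filters needs only timestamps.
	fmt.Println(requiredSections(Query{}) == Timestamps)
	// A line-filtered metric query also needs the lines section.
	fmt.Println(requiredSections(Query{LineFilter: true}) == Timestamps|Lines)
}
```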

Design Benefits

  • Enables selective decompression of chunk sections
  • Improves query performance for label and timestamp-based queries
  • Better compression ratios due to grouping similar data types
  • Cleaner separation of concerns in the codebase
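One reason grouping similar data can help compression: once timestamps sit in their own section, consecutive values can be stored as small deltas instead of full 64-bit integers. The PR does not say whether V5 uses delta encoding, so the sketch below (with hypothetical `encodeRaw`, `encodeDelta` and `decodeDelta` helpers) only illustrates the general effect.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRaw stores each timestamp as a fixed 8-byte big-endian value.
func encodeRaw(ts []int64) []byte {
	out := make([]byte, 0, 8*len(ts))
	for _, t := range ts {
		out = binary.BigEndian.AppendUint64(out, uint64(t))
	}
	return out
}

// encodeDelta stores the first timestamp, then each difference from the
// previous one, all as zig-zag varints (small deltas take few bytes).
func encodeDelta(ts []int64) []byte {
	var out []byte
	prev := int64(0)
	for _, t := range ts {
		out = binary.AppendVarint(out, t-prev)
		prev = t
	}
	return out
}

// decodeDelta reverses encodeDelta by accumulating the deltas.
func decodeDelta(b []byte) ([]int64, error) {
	var out []int64
	prev := int64(0)
	for len(b) > 0 {
		d, n := binary.Varint(b)
		if n <= 0 {
			return nil, fmt.Errorf("corrupt varint")
		}
		prev += d
		out = append(out, prev)
		b = b[n:]
	}
	return out, nil
}

func main() {
	base := int64(1730246400000000000) // a Unix timestamp in nanoseconds
	ts := []int64{base, base + 1_000_000, base + 2_500_000, base + 3_000_000}
	fmt.Println("raw bytes:", len(encodeRaw(ts)), "delta bytes:", len(encodeDelta(ts)))
}
```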

Testing

This PR includes:

  • Unit tests for serialization/deserialization
  • Basic functionality tests for organized blocks
  • Iterator tests


Which issue(s) this PR fixes:
https://github.com/grafana/loki-private/issues/1134


Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@shantanualsi changed the title "chunkv5" to "[feat]: Chunk V5 structure, encoding and decoding" on Nov 4, 2024
@shantanualsi changed the title "[feat]: Chunk V5 structure, encoding and decoding" to "feat(chunkv5): Chunk V5 structure, encoding and decoding" on Nov 4, 2024