This project is an experiment to find out. So far it's looking good. At the 48-commit mark, there were only two commits I had to create manually. The rest are entirely AI-generated.
Follow me on Twitter / X to see how this project develops and to benefit from everything I learn along the way.
Everything below this line is AI-generated.
Vibe-Col is a high-performance column-oriented storage engine designed for efficient data storage and retrieval.
- Column-oriented storage: Optimized for analytical workloads with efficient column-wise data access
- Multi-block support: Store large datasets across multiple blocks
- Flexible encoding options:
  - Raw encoding (fixed-width)
  - Delta encoding for IDs and values
  - Variable-length (VarInt) encoding for IDs and values
  - Combined Delta + VarInt encoding for maximum compression
- Metadata caching: Pre-calculated statistics for fast aggregation queries
- Direct data access: Option to bypass cached metadata for verification
- Support for 64-bit unsigned integers (uint64) for IDs
- Support for 64-bit signed integers (int64) for values
- Significant space savings with variable-length encoding:
  - Up to 8x compression ratio for sequential data
  - 4-5x compression ratio for real-world data with gaps and variability
- Delta encoding for further compression of sequential or closely related values
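To make the compression figures above concrete, here is a minimal sketch of combined delta + varint encoding for sorted `uint64` IDs. It is not Vibe-Col's actual encoder: the function names `encodeDeltaVarint` and `decodeDeltaVarint` are hypothetical, and the sketch assumes IDs arrive in ascending order so every delta is non-negative. It relies only on Go's standard `encoding/binary` package (Go 1.19 or later for `AppendUvarint`).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeDeltaVarint writes each ID as the difference from its predecessor,
// encoded as an unsigned varint. The first ID is a delta from zero, so it is
// stored as-is. Assumption: ids are sorted ascending.
func encodeDeltaVarint(ids []uint64) []byte {
	buf := make([]byte, 0, len(ids))
	var prev uint64
	for _, id := range ids {
		buf = binary.AppendUvarint(buf, id-prev)
		prev = id
	}
	return buf
}

// decodeDeltaVarint reverses encodeDeltaVarint by reading count varints
// and accumulating the deltas back into absolute IDs.
func decodeDeltaVarint(buf []byte, count int) ([]uint64, error) {
	ids := make([]uint64, 0, count)
	var prev uint64
	for i := 0; i < count; i++ {
		delta, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, fmt.Errorf("malformed varint at value %d", i)
		}
		buf = buf[n:]
		prev += delta
		ids = append(ids, prev)
	}
	return ids, nil
}

func main() {
	// 1,000 sequential IDs starting at 1, so every delta is exactly 1.
	ids := make([]uint64, 1000)
	for i := range ids {
		ids[i] = uint64(i + 1)
	}
	encoded := encodeDeltaVarint(ids)
	rawSize := len(ids) * 8 // fixed-width uint64 encoding
	fmt.Printf("raw: %d bytes, delta+varint: %d bytes\n", rawSize, len(encoded))
}
```

For this sequential input each delta fits in a single byte, so the sketch produces 1,000 bytes against 8,000 bytes of raw encoding, which is where an 8x ratio comes from. Data with larger gaps needs two or more bytes per delta, so the ratio drops accordingly.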
- Fast aggregation operations:
  - Count
  - Min
  - Max
  - Sum
  - Average
- Block-level data access for targeted queries
- Direct key-value pair retrieval
- Efficient encoding and decoding of variable-length integers
- Optimized block layout for fast data access
- Metadata-based aggregation for near-instant results on large datasets
- Option to verify aggregation results by reading all values directly
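The idea behind metadata-based aggregation can be sketched as follows: if each block stores its own count, min, max, and sum at write time, a file-wide aggregate only has to merge one small record per block instead of scanning every value. The types `blockStats` and `aggregate` and the functions below are hypothetical illustrations, not Vibe-Col's internal structures.

```go
package main

import "fmt"

// blockStats is a hypothetical per-block metadata record computed at write time.
type blockStats struct {
	Count         uint64
	Min, Max, Sum int64
}

// aggregate is a hypothetical file-wide result.
type aggregate struct {
	Count         uint64
	Min, Max, Sum int64
	Avg           float64
}

// statsFor scans one block's values once and summarizes them.
func statsFor(values []int64) blockStats {
	var s blockStats
	for i, v := range values {
		if i == 0 || v < s.Min {
			s.Min = v
		}
		if i == 0 || v > s.Max {
			s.Max = v
		}
		s.Count++
		s.Sum += v
	}
	return s
}

// aggregateFromStats merges per-block summaries without touching the values.
// Its cost depends on the number of blocks, not the number of values.
func aggregateFromStats(blocks []blockStats) aggregate {
	var agg aggregate
	for _, b := range blocks {
		if b.Count == 0 {
			continue // an empty block has no meaningful min or max
		}
		if agg.Count == 0 || b.Min < agg.Min {
			agg.Min = b.Min
		}
		if agg.Count == 0 || b.Max > agg.Max {
			agg.Max = b.Max
		}
		agg.Count += b.Count
		agg.Sum += b.Sum
	}
	if agg.Count > 0 {
		agg.Avg = float64(agg.Sum) / float64(agg.Count)
	}
	return agg
}

// aggregateDirect ignores stored summaries and rescans every value,
// which is the slow path used to verify the cached result.
func aggregateDirect(blocks [][]int64) aggregate {
	var all []int64
	for _, b := range blocks {
		all = append(all, b...)
	}
	return aggregateFromStats([]blockStats{statsFor(all)})
}

func main() {
	blocks := [][]int64{{10, 20, 30}, {-5, 40}}
	stats := make([]blockStats, 0, len(blocks))
	for _, b := range blocks {
		stats = append(stats, statsFor(b))
	}
	cached := aggregateFromStats(stats)
	direct := aggregateDirect(blocks)
	fmt.Printf("Count: %d, Min: %d, Max: %d, Sum: %d, Avg: %.2f\n",
		cached.Count, cached.Min, cached.Max, cached.Sum, cached.Avg)
	fmt.Println("cached result matches direct scan:", cached == direct)
}
```

The sketch ignores integer overflow in `Sum`; a real engine would have to decide how to handle sums that exceed the `int64` range.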
- Compact binary file format
  - Header with file metadata
  - Multiple data blocks
  - Footer with block index for fast random access
- Checksum support for data integrity
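As a rough illustration of how a header, data blocks, a footer index, and checksums can fit together, the sketch below builds a tiny in-memory file and reads one block back through the index. Everything about the layout is an assumption made for the example: the `DEMO` header, the 20-byte index entries, the trailing 8-byte footer offset, and the use of CRC-32 are all hypothetical and are not Vibe-Col's documented on-disk format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Hypothetical index entry: offset (8 bytes) + length (8) + CRC-32 (4).
const entrySize = 20

// buildFile lays out: header | blocks | footer index | footer offset (8 bytes).
func buildFile(blocks [][]byte) []byte {
	file := []byte("DEMO") // placeholder header
	var footer []byte
	for _, b := range blocks {
		footer = binary.LittleEndian.AppendUint64(footer, uint64(len(file)))
		footer = binary.LittleEndian.AppendUint64(footer, uint64(len(b)))
		footer = binary.LittleEndian.AppendUint32(footer, crc32.ChecksumIEEE(b))
		file = append(file, b...)
	}
	footerStart := uint64(len(file))
	file = append(file, footer...)
	return binary.LittleEndian.AppendUint64(file, footerStart)
}

// readBlock finds block n through the footer index, without scanning
// earlier blocks, and verifies its checksum before returning it.
func readBlock(file []byte, n int) ([]byte, error) {
	if len(file) < 8 {
		return nil, errors.New("file too short")
	}
	tail := len(file) - 8
	footerStart := binary.LittleEndian.Uint64(file[tail:])
	if footerStart > uint64(tail) {
		return nil, errors.New("footer offset out of range")
	}
	footer := file[int(footerStart):tail]
	if n < 0 || (n+1)*entrySize > len(footer) {
		return nil, fmt.Errorf("block %d not in index", n)
	}
	entry := footer[n*entrySize:]
	offset := binary.LittleEndian.Uint64(entry[0:8])
	length := binary.LittleEndian.Uint64(entry[8:16])
	sum := binary.LittleEndian.Uint32(entry[16:20])
	if offset > footerStart || length > footerStart-offset {
		return nil, errors.New("index entry out of range")
	}
	block := file[int(offset):int(offset+length)]
	if crc32.ChecksumIEEE(block) != sum {
		return nil, errors.New("checksum mismatch")
	}
	return block, nil
}

func main() {
	file := buildFile([][]byte{[]byte("alpha"), []byte("bravo-block")})
	block, err := readBlock(file, 1)
	fmt.Printf("block 1: %q, err: %v\n", block, err)

	file[4] ^= 0xFF // corrupt the first byte of block 0
	_, err = readBlock(file, 0)
	fmt.Println("after corruption:", err)
}
```

Placing the index in the footer lets a writer append blocks in a single pass and record their offsets at the end, while a reader can still jump straight to any block after reading the last few bytes of the file.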
- Writer API for creating and populating column files
- Reader API for querying and analyzing data
- Command-line tools for data inspection
The library provides simple APIs for writing and reading column files:
// Writing data (errors are checked here rather than discarded)
writer, err := col.NewWriter("data.col", col.WithEncoding(col.EncodingVarIntBoth))
if err != nil {
	log.Fatal(err)
}
writer.WriteBlock(ids, values)
writer.FinalizeAndClose()

// Reading data: fetch the ID/value pairs stored in block 0
reader, err := col.NewReader("data.col")
if err != nil {
	log.Fatal(err)
}
readIDs, readValues, err := reader.GetPairs(0)
if err != nil {
	log.Fatal(err)
}

// Fast aggregation using the pre-calculated metadata
result := reader.Aggregate()
fmt.Printf("Count: %d, Min: %d, Max: %d, Sum: %d, Avg: %.2f\n",
	result.Count, result.Min, result.Max, result.Sum, result.Avg)

// Verification by reading all values instead of the cached metadata
directResult := reader.AggregateWithOptions(col.AggregateOptions{SkipPreCalculated: true})