Skip to content

etiennedi/vibe-col

Repository files navigation

Is it possible to vibe-code a production-grade database?

This project is an experiment to find out. So far it's looking good. At the 48 commit mark there were only two commits, I had to create manually. The rest is entirely AI-generated.

Follow me on Twitter / X to see how this project develops and benefit from all my learnings.

Everything below this line is AI-generated.


Vibe-Col

Vibe-Col is a high-performance column-oriented storage engine designed for efficient data storage and retrieval.

Features

Storage Capabilities

  • Column-oriented storage: Optimized for analytical workloads with efficient column-wise data access
  • Multi-block support: Store large datasets across multiple blocks
  • Flexible encoding options:
    • Raw encoding (fixed-width)
    • Delta encoding for IDs and values
    • Variable-length (VarInt) encoding for IDs and values
    • Combined Delta + VarInt encoding for maximum compression
  • Metadata caching: Pre-calculated statistics for fast aggregation queries
  • Direct data access: Option to bypass cached metadata for verification

Data Types

  • Support for 64-bit unsigned integers (uint64) for IDs
  • Support for 64-bit signed integers (int64) for values

Compression

  • Significant space savings with variable-length encoding:
    • Up to 8x compression ratio for sequential data
    • 4-5x compression ratio for real-world data with gaps and variability
  • Delta encoding for further compression of sequential or closely related values

Query Capabilities

  • Fast aggregation operations:
    • Count
    • Min
    • Max
    • Sum
    • Average
  • Block-level data access for targeted queries
  • Direct key-value pair retrieval

Performance

  • Efficient encoding and decoding of variable-length integers
  • Optimized block layout for fast data access
  • Metadata-based aggregation for near-instant results on large datasets
  • Option to verify aggregation results by reading all values directly

File Format

  • Compact binary file format
  • Header with file metadata
  • Multiple data blocks
  • Footer with block index for fast random access
  • Checksum support for data integrity

Tools

  • Writer API for creating and populating column files
  • Reader API for querying and analyzing data
  • Command-line tools for data inspection

Usage

The library provides simple APIs for writing and reading column files:

// Writing data
writer, _ := col.NewWriter("data.col", col.WithEncoding(col.EncodingVarIntBoth))
writer.WriteBlock(ids, values)
writer.FinalizeAndClose()

// Reading data
reader, _ := col.NewReader("data.col")
ids, values, _ := reader.GetPairs(0)

// Fast aggregation
result := reader.Aggregate()
fmt.Printf("Count: %d, Min: %d, Max: %d, Sum: %d, Avg: %.2f\n",
    result.Count, result.Min, result.Max, result.Sum, result.Avg)

// Verification by reading all values
directResult := reader.AggregateWithOptions(col.AggregateOptions{SkipPreCalculated: true})

About

A columnar index that is 100% vibe-coded

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages