Reference: CS265 Systems Project: An LSM-tree based key-value store
Similar to the Word Dictionary implementation, but easier in Python.
An LSM (Log-Structured Merge) Tree is a type of key-value store architecture designed for high write throughput and efficient storage management. It is widely used in modern databases such as LevelDB, RocksDB, and Cassandra. LSM trees optimize for write-heavy workloads by buffering writes in memory and periodically merging them into sorted files on disk.
Code Optimization TODO: range queries and compaction could use min-heaps to merge sorted runs.
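As a sketch of that TODO: Python's heapq.merge k-way-merges already-sorted runs with a min-heap, which is the building block for both compaction and range queries (the run data here is illustrative):

```python
import heapq

# Two already-sorted runs of (key, value) pairs, as they would come
# from sorted SSTables; the data is illustrative.
run_a = [("a", 1), ("c", 3), ("e", 5)]
run_b = [("b", 2), ("c", 30), ("d", 4)]

# heapq.merge lazily k-way-merges the runs using a min-heap,
# keeping the output in ascending key order.
merged = list(heapq.merge(run_a, run_b))
```

Because the merge is lazy, it streams over SSTables of any size without loading them all into memory at once.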
MemTable (In-Memory Storage):
- All writes (put, delete) are first written to an in-memory data structure, called the MemTable.
- This structure allows fast writes and reads because it resides in memory.
- The MemTable accumulates key-value pairs until it reaches a size threshold.
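The MemTable behavior above can be sketched as follows (the class, threshold, and tombstone convention are illustrative, not the CS265 reference code):

```python
# Minimal MemTable sketch. A dict gives O(1) put/get; deletes are
# recorded as tombstones so older values in SSTables can be masked later.
TOMBSTONE = object()  # sentinel marking a deleted key

class MemTable:
    def __init__(self, threshold=4):
        self.data = {}
        self.threshold = threshold  # flush once this many entries accumulate

    def put(self, key, value):
        self.data[key] = value

    def delete(self, key):
        # Deletes are writes too: record a tombstone instead of removing.
        self.data[key] = TOMBSTONE

    def is_full(self):
        return len(self.data) >= self.threshold
```

Note that delete does not remove the key: the tombstone must be flushed so that older copies of the key in SSTables are known to be dead.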
Flush to Disk (SSTables):
- When the MemTable becomes full, it is sorted and written to disk as an SSTable (Sorted String Table).
- SSTables are immutable, ordered key-value stores, typically stored as serialized files on disk.
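A flush can be sketched like this, assuming a simplified one-JSON-record-per-line file format (real SSTables use block-based binary layouts with indexes; the function and file names are illustrative):

```python
import json
import os
import tempfile

def flush_to_sstable(memtable_dict, path):
    """Write the MemTable's entries to disk in sorted key order.

    Simplified sketch: one JSON record per line. The file is written
    once and never modified afterwards (immutability).
    """
    with open(path, "w") as f:
        for key in sorted(memtable_dict):
            f.write(json.dumps({"key": key, "value": memtable_dict[key]}) + "\n")

# Example: flush a small in-memory table to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "sstable_0.jsonl")
flush_to_sstable({"b": 2, "a": 1, "c": 3}, path)
```

Sorting at flush time is what makes later merges and range scans cheap.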
Compaction:
- Over time, multiple SSTables accumulate, potentially leading to redundant data or fragmentation.
- A compaction process merges and reorganizes SSTables, eliminating duplicate or deleted keys and maintaining sorted order for efficient range queries.
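A two-way compaction step can be sketched as below; the tombstone string and run data are illustrative, and a dict is used for brevity where a streaming min-heap merge would scale better:

```python
# Compaction sketch: merge an older and a newer sorted run, keeping the
# newer value for duplicate keys and dropping deleted (tombstoned) keys.
TOMBSTONE = "<del>"

def compact(older, newer):
    merged = dict(older)   # start from the older run
    merged.update(newer)   # newer values overwrite older ones
    # Drop tombstones and restore sorted order for the output SSTable.
    return sorted((k, v) for k, v in merged.items() if v != TOMBSTONE)

older = [("a", 1), ("b", 2), ("c", 3)]
newer = [("b", 20), ("c", TOMBSTONE)]
```

Here "b" keeps its newer value and "c" disappears entirely, which is how compaction reclaims space from overwritten and deleted keys.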
Reads (MemTable + SSTables):
- For get operations, the MemTable is checked first since it holds the most recent data.
- If the key is not found, older SSTables are searched in reverse order of creation (newer data is checked first).
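The read path above can be sketched as follows, modeling SSTables as dicts ordered oldest to newest (the tombstone convention and sample data are illustrative):

```python
# Read path sketch: check the MemTable first, then SSTables newest-first.
TOMBSTONE = "<del>"

def get(key, memtable, sstables):
    if key in memtable:
        value = memtable[key]
        return None if value == TOMBSTONE else value
    for table in reversed(sstables):  # newest SSTable first
        if key in table:
            value = table[key]
            return None if value == TOMBSTONE else value
    return None  # key never written

memtable = {"a": 10}
sstables = [{"a": 1, "b": 2}, {"b": TOMBSTONE, "c": 3}]  # oldest -> newest
```

Stopping at the first match is what makes newest-first order correct: a newer tombstone hides an older value.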
Range Queries:
- Range queries scan across all SSTables and the MemTable in ascending key order, with newer values overwriting older ones, and aggregate the matching key-value pairs within the specified range.
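A range query can be sketched like this, reusing the oldest-to-newest SSTable model from the read path (tombstone convention and data are illustrative):

```python
# Range query sketch: fold all SSTables oldest-first, then the MemTable,
# so newer values overwrite older ones; return matches in key order.
TOMBSTONE = "<del>"

def range_query(lo, hi, memtable, sstables):
    merged = {}
    for table in sstables:   # oldest -> newest
        merged.update(table)
    merged.update(memtable)  # MemTable holds the newest data
    return sorted((k, v) for k, v in merged.items()
                  if lo <= k <= hi and v != TOMBSTONE)

memtable = {"b": 20}
sstables = [{"a": 1, "b": 2}, {"c": TOMBSTONE, "d": 4}]  # oldest -> newest
```

As with compaction, a production store would stream this with a min-heap merge rather than materializing one dict.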
Write Optimization:
- Writes are buffered in memory, reducing frequent disk I/O.
Efficient Compaction:
- Periodic compaction ensures disk space is reclaimed, and data remains sorted for quick access.
Sequential Disk Writes:
- Writing SSTables is sequential, avoiding the overhead of random I/O.
Read Trade-offs:
- Reads may involve checking multiple SSTables, but techniques like Bloom filters can mitigate this overhead.

Summary:
- High write throughput due to sequential writes.
- Compact storage with minimal fragmentation.
- Good for workloads with frequent writes and infrequent reads.
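The Bloom-filter mitigation mentioned above can be sketched as follows; the bit-array size and hash count are illustrative and untuned:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch for skipping SSTables on reads.

    Before searching an SSTable, membership is tested here first:
    a negative answer means the key is definitely absent, so the
    table can be skipped. False positives are possible (causing a
    wasted lookup); false negatives are not.
    """

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting a hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))
```

One filter per SSTable, built at flush time, lets the read path skip most tables without touching disk.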