TrieIndexLogDB is a key/value datastore using log-structured files with trie structured indexes, allowing very fast write access and lowering RAM requirements compared to other log-structured datastores which use Hash Table Indices
- Clone repo
git clone https://github.com/yozaam/TrieIndexLogDB.git
- Go inside directory
cd TrieIndexLogDB
- Compile the program
g++ main.cpp -o main.o
- Run the program
./main.o
"Example of Hash Index: Bitcask is an Erlang application that provides an API for storing and retrieving key/value data using log-structured hash tables that provide very fast access."
-
General Architecture for Log Structured DB
- Every create/update appends new key value pair into the file
- maintain the pointer to the latest occurance of a key in the hash table / index
-
Bitcask stores every key in memory, along with the byte offset for that key in disk
-
This limits the no. of keys that may be stored because they all have to fit in RAM
-
My lightwieght storage engine will reduce the space requirements by using a 'Trie' index
(A string Hash (Bitcask) & Trie traversal both have the same asymptotic time complexity but Trie saves a lot of space by avoiding duplication!)
- Divide into separate classfiles
- Implement RestAPI class as well as CLI
- Implement segmentation, by splitting the files (will need to convert offset into (file_segment_no, offset) pair)
- Range Updates (by simultaneous Trie Traversal and sorted Trie Children)