Sarthak Makhija SarthakMakhija

Hey there

👨‍💻 About Me

🔭 Principal Architect at Caizin.
✍️ Passionate about distributed systems and storage engines. In my free time, I share my learnings on my blog.
⭐ Contributed to the validation of distributed system patterns in the book Patterns of Distributed Systems by Unmesh Joshi.
⭐ Authored articles on persistent memory for the renowned author Marcin Moskala.
⚡ Personal project: I am building picodb, a relational database inspired by the book Database Design and Implementation by Edward Sciore.
📘 I love reading books; currently I am reading Database Design and Implementation by Edward Sciore.
📫 Let's connect:

⚙️ Open-source projects

Some of my open-source projects include:

Go-LSM

LSM-based key-value store in Go for educational purposes.

Rewrite of the existing workshop code.

Inspired by LSM in a Week.

Exploring LSM with go-lsm

Learn LSM from the ground up: Dive deep into the core concepts of Log-Structured Merge-Trees (LSM) through a practical, well-documented implementation.
Benefit from clean code: Analyze a meticulously crafted codebase that prioritizes simplicity and readability.
Gain confidence with robust tests: Verify the correctness and reliability of the storage engine through comprehensive tests.
Experiment and extend: Customize the code to explore different LSM variations or integrate it into your own projects.
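To make the core LSM idea concrete, here is a minimal in-memory sketch of the write and read paths. It is written in Rust to match the other examples on this page and is not code from go-lsm; the `MiniLsm` type and its methods are hypothetical. Writes land in a sorted memtable, a full memtable is frozen into an immutable sorted run (standing in for an on-disk SSTable), and reads check the memtable first and then the runs from newest to oldest. There is no WAL, compaction, or disk I/O here.

```rust
use std::collections::BTreeMap;

// Hypothetical minimal LSM sketch (not go-lsm's code).
// `None` values are tombstones that mark deletions.
pub struct MiniLsm {
    memtable: BTreeMap<String, Option<String>>,
    runs: Vec<Vec<(String, Option<String>)>>, // sorted runs, oldest first
    memtable_limit: usize,
}

impl MiniLsm {
    pub fn new(memtable_limit: usize) -> Self {
        MiniLsm { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.write(key, Some(value.to_string()));
    }

    pub fn delete(&mut self, key: &str) {
        self.write(key, None);
    }

    fn write(&mut self, key: &str, value: Option<String>) {
        self.memtable.insert(key.to_string(), value);
        if self.memtable.len() >= self.memtable_limit {
            self.flush();
        }
    }

    // Freeze the memtable into a sorted run; BTreeMap iterates in key order.
    fn flush(&mut self) {
        let run: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
        self.runs.push(run);
    }

    // Read path: memtable first, then runs from newest to oldest.
    // The first entry found wins, so newer writes (and tombstones) shadow older ones.
    pub fn get(&self, key: &str) -> Option<String> {
        if let Some(entry) = self.memtable.get(key) {
            return entry.clone();
        }
        for run in self.runs.iter().rev() {
            if let Ok(index) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return run[index].1.clone();
            }
        }
        None
    }
}
```

The design choice worth noticing is that writes never modify existing runs: updates and deletes are just newer entries, which is why the read path must search newest-first.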

Clearcheck

Write expressive and elegant assertions with ease!

clearcheck is designed to make assertion statements in Rust as clear and concise as possible.

It allows chaining multiple assertions together for a fluent and intuitive syntax, leading to more self-documenting test cases.

let pass_phrase = "P@@sw0rd1 zebra alpha";
pass_phrase.should_not_be_empty()
    .should_have_at_least_length(10)
    .should_contain_all_characters(vec!['@', ' '])
    .should_contain_a_digit()
    .should_not_contain_ignoring_case("pass")
    .should_not_contain_ignoring_case("word");

It has close to 1K downloads.
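The fluent chaining in the example above can be achieved in Rust with an extension trait whose methods check a condition and return `self`. The sketch below is only an illustration of that technique: the `StrAssertions` trait is hypothetical, and while its method names echo the example, it is not clearcheck's implementation.

```rust
// Hypothetical extension trait illustrating fluent assertion chaining
// (not clearcheck's code).
pub trait StrAssertions {
    fn should_not_be_empty(&self) -> &Self;
    fn should_have_at_least_length(&self, length: usize) -> &Self;
    fn should_contain_a_digit(&self) -> &Self;
    fn should_not_contain_ignoring_case(&self, needle: &str) -> &Self;
}

// Each method panics (failing the test) when its check does not hold and
// otherwise returns `self`, so the next assertion can be chained onto it.
impl StrAssertions for str {
    fn should_not_be_empty(&self) -> &Self {
        assert!(!self.is_empty(), "expected {:?} to be non-empty", self);
        self
    }

    fn should_have_at_least_length(&self, length: usize) -> &Self {
        // Length is counted in characters (Unicode scalar values), not bytes.
        let actual = self.chars().count();
        assert!(actual >= length, "expected {:?} to have at least {} characters, found {}", self, length, actual);
        self
    }

    fn should_contain_a_digit(&self) -> &Self {
        assert!(self.chars().any(|c| c.is_ascii_digit()), "expected {:?} to contain a digit", self);
        self
    }

    fn should_not_contain_ignoring_case(&self, needle: &str) -> &Self {
        let found = self.to_lowercase().contains(&needle.to_lowercase());
        assert!(!found, "expected {:?} not to contain {:?} (ignoring case)", self, needle);
        self
    }
}
```

Because the trait is implemented for `str`, any `&str` (such as `pass_phrase`) picks up the methods once the trait is in scope.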

blast

blast is a load generator for TCP servers, especially servers that maintain persistent connections. It is implemented in Go and is used in my current project to load test the distributed key/value storage engine that we are building.
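The core idea of load-testing a server that keeps connections open can be sketched in a few lines. This is a hypothetical Rust illustration, not blast's code: `run_load` opens one persistent connection per worker thread and sends requests over it, waiting for each response before sending the next. It assumes the target server echoes each payload back unchanged; `spawn_echo_server` is a stand-in target added only so the sketch can be exercised.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

// Hypothetical load generator sketch: each worker reuses ONE connection for
// all of its requests, which is what persistent-connection servers expect.
// Returns the number of request/response round trips that completed correctly.
pub fn run_load(
    addr: SocketAddr,
    connections: usize,
    requests_per_connection: usize,
    payload: &[u8],
) -> io::Result<usize> {
    let workers: Vec<_> = (0..connections)
        .map(|_| {
            let payload = payload.to_vec();
            thread::spawn(move || -> io::Result<usize> {
                let mut stream = TcpStream::connect(addr)?;
                stream.set_nodelay(true)?; // avoid Nagle delays on small requests
                let mut response = vec![0u8; payload.len()];
                let mut completed = 0;
                for _ in 0..requests_per_connection {
                    stream.write_all(&payload)?;
                    stream.read_exact(&mut response)?; // assumes an echo-style reply
                    if response == payload {
                        completed += 1;
                    }
                }
                Ok(completed)
            })
        })
        .collect();

    let mut total = 0;
    for worker in workers {
        total += worker.join().expect("worker thread panicked")?;
    }
    Ok(total)
}

// Stand-in target: a tiny thread-per-connection echo server on a random local port.
pub fn spawn_echo_server() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            thread::spawn(move || {
                let _ = stream.set_nodelay(true);
                if let Ok(mut reader) = stream.try_clone() {
                    let mut writer = stream;
                    // Copy bytes back until the client closes its side of the connection.
                    let _ = io::copy(&mut reader, &mut writer);
                }
            });
        }
    });
    Ok(addr)
}
```

A real load generator would also record per-request latencies and pace requests at a target rate; this sketch only counts completed round trips.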

CacheD

CacheD is a high-performance, LFU-based in-memory cache in Rust, inspired by Ristretto.

#[tokio::test]
async fn put_a_key_value() {
    // COUNTERS, CAPACITY and CACHE_WEIGHT are configuration constants not shown in this snippet.
    let cached = CacheD::new(ConfigBuilder::new(COUNTERS, CAPACITY, CACHE_WEIGHT).build());
    let acknowledgement = cached.put("topic", "LFU cache").unwrap();

    // Await the acknowledgement to learn whether the put command was accepted.
    let status = acknowledgement.handle().await;
    assert_eq!(CommandStatus::Accepted, status);

    let value = cached.get(&"topic");
    assert_eq!(Some("LFU cache"), value);
}

It has close to 1.9K downloads.
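Ristretto's design, as its authors describe it, uses a TinyLFU admission policy backed by a count-min sketch to estimate how often keys are accessed. The sketch below illustrates that idea in Rust; it is not CacheD's code, and `CountMinSketch` and `should_admit` are hypothetical names. A count-min sketch keeps several rows of counters, each indexed by a different hash; the estimate is the minimum counter across rows, so it may over-estimate because of collisions but never under-estimates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical count-min sketch for approximate access frequencies.
pub struct CountMinSketch {
    width: usize,
    rows: Vec<Vec<u32>>,
}

impl CountMinSketch {
    pub fn new(depth: usize, width: usize) -> Self {
        CountMinSketch { width, rows: vec![vec![0; width]; depth] }
    }

    // Hash the row number together with the key so each row behaves like a different hash function.
    fn index<K: Hash>(&self, row: usize, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.width
    }

    pub fn increment<K: Hash>(&mut self, key: &K) {
        for row in 0..self.rows.len() {
            let i = self.index(row, key);
            self.rows[row][i] = self.rows[row][i].saturating_add(1);
        }
    }

    // The estimate is the smallest counter across all rows.
    pub fn estimate<K: Hash>(&self, key: &K) -> u32 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.index(row, key)])
            .min()
            .unwrap_or(0)
    }
}

// TinyLFU-style admission: when the cache is full, a new key replaces the
// eviction candidate only if it appears to be accessed more often.
pub fn should_admit<K: Hash>(sketch: &CountMinSketch, candidate: &K, victim: &K) -> bool {
    sketch.estimate(candidate) > sketch.estimate(victim)
}
```

The appeal of this design is that frequencies for many keys are tracked in fixed memory, so a burst of one-off keys cannot push frequently used entries out of the cache.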

The complete list of my side projects is available on my blog.

⚙️ Talks

Questioning database claims: Design patterns of storage engines

I gave a talk on “Questioning database claims: Design patterns of storage engines” at GoConIndia24 on 2nd December.

The talk walked through design patterns of storage engines (key-value storage engines in particular): persistence (WAL, fsync), efficient retrieval (B+tree, Bloom filters, data layouts), and efficient ingestion (sequential IO, LSM, WiscKey). It then used these patterns to question a variety of database claims, such as durability, read optimization, and write optimization, and to pick the right database(s) for a given use case. The recording is available here.
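The persistence pattern from the talk (WAL plus fsync) can be sketched as follows. This is a hypothetical Rust illustration, not code from the talk: the `Wal` type and its record format (a 4-byte little-endian length followed by the payload) are assumptions. The point it shows is that a durability claim hinges on acknowledging a write only after `sync_all` (Rust's fsync equivalent) returns; before that, the data may still sit in the OS page cache.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufReader, Read, Write};
use std::path::Path;

// Hypothetical write-ahead log: each record is [len: u32 LE][payload].
pub struct Wal {
    file: File,
}

impl Wal {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    // Append one record and fsync before returning, so a successful return means the
    // OS was asked to persist the data to the storage device.
    pub fn append(&mut self, record: &[u8]) -> io::Result<()> {
        let len = u32::try_from(record.len())
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record too large"))?;
        let mut frame = Vec::with_capacity(4 + record.len());
        frame.extend_from_slice(&len.to_le_bytes());
        frame.extend_from_slice(record);
        self.file.write_all(&frame)?;
        self.file.sync_all()
    }

    // Replay all complete records; a torn record at the tail (e.g. a crash mid-write) is ignored.
    pub fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
        let mut reader = BufReader::new(File::open(path)?);
        let mut records = Vec::new();
        loop {
            let mut len_bytes = [0u8; 4];
            match reader.read_exact(&mut len_bytes) {
                Ok(()) => {}
                Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
                Err(e) => return Err(e),
            }
            let mut record = vec![0u8; u32::from_le_bytes(len_bytes) as usize];
            match reader.read_exact(&mut record) {
                Ok(()) => records.push(record),
                Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
                Err(e) => return Err(e),
            }
        }
        Ok(records)
    }
}
```

Production WALs usually add a checksum per record to detect corruption, which this sketch omits. On POSIX systems, a newly created log file also needs its parent directory synced for the file's directory entry to survive a crash.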

🎤 Workshops that I conduct

Gamifying Refactoring

I created Gamifying Refactoring, a game (or mini workshop) that is run at ThoughtWorks. Teams identify code smells, justify each one with reasons that go beyond the usual "-ilities", finish all of this within a fixed time, and win points for their team.

Storage Engine

This hands-on workshop focuses on building a tiny LSM-tree-based storage engine. It covers the basics, including hard disks, blocks, the OS page cache, encoding, decoding, endianness, and B+Trees, along with the detailed internals of LSM-trees. The LSM-based storage engine code is available here.
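Encoding, decoding, and endianness come together as soon as a key/value pair has to be written as bytes. The sketch below uses a hypothetical entry layout, `[key_len: u16 LE][key][value_len: u16 LE][value]`, which is an assumption for illustration and not the workshop's format. Fixing the byte order (little-endian here) makes the encoded bytes readable on any machine, regardless of the CPU's native endianness.

```rust
// Hypothetical entry layout: [key_len: u16 LE][key bytes][value_len: u16 LE][value bytes]
pub fn encode(key: &[u8], value: &[u8]) -> Vec<u8> {
    assert!(key.len() <= u16::MAX as usize && value.len() <= u16::MAX as usize);
    let mut buf = Vec::with_capacity(4 + key.len() + value.len());
    buf.extend_from_slice(&(key.len() as u16).to_le_bytes());
    buf.extend_from_slice(key);
    buf.extend_from_slice(&(value.len() as u16).to_le_bytes());
    buf.extend_from_slice(value);
    buf
}

// Read a little-endian u16 at `at`, or None if the buffer is too short.
fn read_u16_le(buf: &[u8], at: usize) -> Option<u16> {
    let bytes = buf.get(at..at + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

// Decode one entry, returning (key, value, bytes consumed); None if `buf` is truncated.
pub fn decode(buf: &[u8]) -> Option<(&[u8], &[u8], usize)> {
    let key_len = read_u16_le(buf, 0)? as usize;
    let key = buf.get(2..2 + key_len)?;
    let value_start = 2 + key_len;
    let value_len = read_u16_le(buf, value_start)? as usize;
    let value = buf.get(value_start + 2..value_start + 2 + value_len)?;
    Some((key, value, value_start + 2 + value_len))
}
```

Returning the number of bytes consumed lets a caller decode entries one after another from a block of concatenated entries.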

✍️ Blog Posts

Some of my latest blog posts include:

Many flavors of Networking IO

Every networked application depends on its ability to exchange data efficiently. Beneath the surface, though, lies a hidden world of techniques for managing this communication. This article dives into various “flavors” of networking I/O and the trade-offs of each approach. To illustrate how applications can handle network traffic, we’ll build a TCP server using four distinct approaches: blocking I/O with a single thread, blocking I/O with multiple threads, non-blocking I/O with busy waiting, and a single-threaded event loop. Each approach has its own advantages and drawbacks, and by building a server with each one, we’ll gain a deeper understanding of their strengths and weaknesses.
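As a taste of one of these flavors, here is a hypothetical Rust sketch of an echo server using non-blocking I/O with busy waiting; it is not the article's code. The listener and every accepted connection are switched to non-blocking mode, and a single thread keeps polling all of them. When nothing is ready, the loop simply runs again, which is exactly why this approach burns CPU. The `stop` flag and the names `serve_busy_wait` and `write_all_busy` are assumptions added so the sketch can be shut down and exercised.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical busy-waiting echo server: one thread polls the listener and all connections.
pub fn serve_busy_wait(listener: TcpListener, stop: &AtomicBool) -> io::Result<()> {
    listener.set_nonblocking(true)?;
    let mut connections: Vec<TcpStream> = Vec::new();
    let mut buf = [0u8; 1024];

    while !stop.load(Ordering::Relaxed) {
        // Poll for a new connection without blocking.
        match listener.accept() {
            Ok((stream, _)) => {
                stream.set_nonblocking(true)?;
                connections.push(stream);
            }
            Err(e) if e.kind() == ErrorKind::WouldBlock => {}
            Err(e) => return Err(e),
        }
        // Poll every connection for data; WouldBlock means "nothing yet, try again later".
        let mut i = 0;
        while i < connections.len() {
            match connections[i].read(&mut buf) {
                Ok(0) => {
                    connections.swap_remove(i); // peer closed the connection
                    continue;
                }
                Ok(n) => {
                    if write_all_busy(&mut connections[i], &buf[..n]).is_err() {
                        connections.swap_remove(i);
                        continue;
                    }
                }
                Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) => {}
                Err(_) => {
                    connections.swap_remove(i);
                    continue;
                }
            }
            i += 1;
        }
        std::hint::spin_loop();
    }
    Ok(())
}

// On a non-blocking socket, `write` may accept only part of the buffer or return
// WouldBlock, so keep retrying (busy waiting) until everything is written.
fn write_all_busy(stream: &mut TcpStream, mut data: &[u8]) -> io::Result<()> {
    while !data.is_empty() {
        match stream.write(data) {
            Ok(0) => return Err(ErrorKind::WriteZero.into()),
            Ok(n) => data = &data[n..],
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) => {
                std::hint::spin_loop()
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

An event loop improves on this by asking the OS (for example via epoll or kqueue) which sockets are ready, instead of polling all of them.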

Serializable Snapshot Isolation

Ensuring data consistency in the face of concurrent transactions is a critical challenge in database management. Traditional serializable isolation guarantees data integrity but often suffers from performance bottlenecks due to extensive locking. This article explores Serializable Snapshot Isolation (SSI), which promises the best of both worlds: strong data consistency without sacrificing performance. The article delves into the inner workings of SSI and explores its implementation for a key/value storage engine. It refers to the research paper titled A critique of snapshot isolation.
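The conflict check at the heart of this approach can be sketched in a few lines. The Rust below is a hypothetical illustration in the spirit of the read-write conflict detection proposed in A critique of snapshot isolation (which the paper calls write-snapshot isolation), not the article's implementation; `Oracle` and `Transaction` are assumed names and only track keys, not values. A transaction aborts if a key it read was written by another transaction that committed after it began; read-only transactions never abort.

```rust
use std::collections::HashSet;

// Hypothetical transaction bookkeeping: only the keys read and written are tracked.
pub struct Transaction {
    begin_ts: u64,
    reads: HashSet<String>,
    writes: HashSet<String>,
}

impl Transaction {
    pub fn read(&mut self, key: &str) {
        self.reads.insert(key.to_string());
    }
    pub fn write(&mut self, key: &str) {
        self.writes.insert(key.to_string());
    }
}

#[derive(Default)]
pub struct Oracle {
    last_commit_ts: u64,
    committed: Vec<(u64, HashSet<String>)>, // (commit timestamp, write set)
}

impl Oracle {
    // A transaction reads from the snapshot as of the latest commit timestamp.
    pub fn begin(&self) -> Transaction {
        Transaction { begin_ts: self.last_commit_ts, reads: HashSet::new(), writes: HashSet::new() }
    }

    pub fn commit(&mut self, txn: Transaction) -> Result<u64, String> {
        if txn.writes.is_empty() {
            return Ok(txn.begin_ts); // read-only: its reads came from one consistent snapshot
        }
        // Read-write conflict: someone committed a write to a key we read after we began.
        for (commit_ts, write_set) in &self.committed {
            if *commit_ts > txn.begin_ts {
                if let Some(key) = write_set.intersection(&txn.reads).next() {
                    return Err(format!("read-write conflict on key {}", key));
                }
            }
        }
        self.last_commit_ts += 1;
        self.committed.push((self.last_commit_ts, txn.writes));
        Ok(self.last_commit_ts)
    }
}
```

Under plain snapshot isolation, two transactions that each read the key the other writes (disjoint write sets) would both commit, producing write skew; with the read-write check, the second one to commit is aborted. A real implementation would also discard committed write sets older than the oldest active transaction.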

Cache-Line Hash Table

In the world of multi-core processors, managing concurrent access to data structures is crucial for performance. But frequent updates can trigger a hidden bottleneck: cache coherence traffic. This traffic arises when one core modifies data that another core has cached, forcing that cache line to be invalidated and re-fetched across cores. This article dives into a clever solution: the Cache-Line Hash Table (CLHT). CLHTs are specifically designed to minimize cache coherence traffic, speeding up concurrent data access. We’ll explore the core ideas behind CLHTs, including:

One Bucket Per CPU Cache-Line: By sizing and aligning each bucket to a single CPU cache line, CLHTs minimize the number of cache lines written during an update.
In-Place Updates: Instead of shuffling data around, CLHTs update key-value pairs directly within the bucket, reducing memory movement.
Lock-Free Reads: Reads are designed to be lock-free, meaning they can proceed without acquiring locks, further enhancing performance.
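The three ideas above can be seen together in a single bucket. This is a simplified, hypothetical Rust sketch, not the article's or the CLHT paper's code: a lock word, three keys, three values, and an overflow pointer fill exactly one 64-byte cache line (assuming a 64-byte line). Key 0 is reserved to mark empty slots, and overflow chaining and deletion are left out.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicU64, Ordering};

// Hypothetical CLHT-style bucket: 8 (lock) + 24 (keys) + 24 (values) + 8 (next) = 64 bytes.
#[repr(C, align(64))]
pub struct Bucket {
    lock: AtomicU64,         // 0 = unlocked, 1 = locked
    keys: [AtomicU64; 3],    // 0 marks an empty slot
    values: [AtomicU64; 3],
    next: AtomicPtr<Bucket>, // overflow bucket; unused in this sketch
}

impl Bucket {
    pub fn new() -> Self {
        Bucket {
            lock: AtomicU64::new(0),
            keys: std::array::from_fn(|_| AtomicU64::new(0)),
            values: std::array::from_fn(|_| AtomicU64::new(0)),
            next: AtomicPtr::new(ptr::null_mut()),
        }
    }

    // Lock-free read: load the key, load the value, then re-check the key.
    // The re-check matters once deletions can reuse slots.
    pub fn get(&self, key: u64) -> Option<u64> {
        if key == 0 {
            return None; // 0 is reserved for empty slots
        }
        for i in 0..3 {
            if self.keys[i].load(Ordering::Acquire) == key {
                let value = self.values[i].load(Ordering::Acquire);
                if self.keys[i].load(Ordering::Acquire) == key {
                    return Some(value);
                }
            }
        }
        None
    }

    // Writers take the per-bucket spin lock and update in place.
    // Returns false if the key is new and all three slots are taken.
    pub fn put(&self, key: u64, value: u64) -> bool {
        assert_ne!(key, 0, "key 0 is reserved for empty slots");
        while self.lock.compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed).is_err() {
            std::hint::spin_loop();
        }
        let mut stored = false;
        let mut empty_slot = None;
        for i in 0..3 {
            let current = self.keys[i].load(Ordering::Relaxed);
            if current == key {
                self.values[i].store(value, Ordering::Release); // in-place update
                stored = true;
                break;
            }
            if current == 0 && empty_slot.is_none() {
                empty_slot = Some(i);
            }
        }
        if !stored {
            if let Some(i) = empty_slot {
                // Publish the value before the key so readers never see a key without its value.
                self.values[i].store(value, Ordering::Release);
                self.keys[i].store(key, Ordering::Release);
                stored = true;
            }
        }
        self.lock.store(0, Ordering::Release);
        stored
    }
}
```

Because the lock, keys, and values share one cache line, an update touches only that line, and readers never write to shared memory at all.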
