
Implement local disk storage #13

Open
dgrnbrg opened this issue May 8, 2016 · 7 comments

Comments

@dgrnbrg
Collaborator

dgrnbrg commented May 8, 2016

We could build a fast local KV storage for the hitchhiker tree, so that it can run as an embedded persistent Datomic-like DB for local applications.

@cddr
Contributor

cddr commented Aug 15, 2016

Something like this?

https://gist.github.com/cddr/7eec70360439b4a83fa75ddb5cf94d08

There are definitely some subtleties to garbage collection that I've missed, but I'm not sure you'd want to do it in precisely the same way anyway.

@dgrnbrg
Collaborator Author

dgrnbrg commented Aug 15, 2016

Wow, this is awesome! I am going to add a few comments to the code--I'm really excited for this :)

The next step will be adding this to bench, which will enable stress tests against this implementation :)

@dgrnbrg
Collaborator Author

dgrnbrg commented Aug 15, 2016

Actually, I can't comment on the code, so I'll write my comments here:

Does ON DELETE CASCADE do refcounting? I was thinking that refcounting would be unique to Redis (since in-memory updates make it cheap). If SQLite has cheap refcounting--awesome! Otherwise, I think a tracing GC (initially using a naive mark & sweep strategy) would work well, because it gives the user control over when to spend resources on GC. Such a GC could be implemented so that it plugs into not only the local backend, but even distributed backends like Cassandra.
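To make the pluggable-GC idea concrete, here's a minimal sketch of the contract a backend might need to expose. None of these names exist in the codebase; they're purely hypothetical, just illustrating that a tracing GC only needs three capabilities from any backend, local or distributed:

```clojure
;; Hypothetical protocol sketch -- none of these names exist in the
;; codebase. A backend-agnostic tracing GC needs only these three
;; capabilities.
(defprotocol IGCBackend
  (live-roots [backend]
    "The set of root addresses that must be kept alive.")
  (node-children [backend addr]
    "The addresses directly referenced by the node at addr.")
  (delete-nodes! [backend addrs]
    "Physically remove the given addresses from storage."))
```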

Besides thinking about ways to implement GC, I think that benchmarking is the next step, as that'll find any serialization edge cases and performance bottlenecks.

Again, this is so cool! I can't wait to merge it!

@cddr
Contributor

cddr commented Aug 15, 2016

Actually, I added the "on delete cascade" before I fully understood what was going on with ref-counting. For it to work the same as ref-counting, I think I'd need to add a trigger to maintain the :rc and then use a recursive common table expression to delete refs when the rc hits 0. But yeah, a tracing GC sounds like it would be more applicable in the general case.
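For reference, the trigger-maintained refcount idea might look roughly like this. This is a hypothetical sketch, not the gist's actual schema: table and column names are made up, it assumes clojure.java.jdbc with a SQLite driver on the classpath, and it replaces the recursive CTE with a fixpoint loop in Clojure so that a child still referenced by a live parent is never deleted:

```clojure
;; Hypothetical sketch of trigger-based refcounting over SQLite;
;; table/column names are invented and may not match the gist.
(require '[clojure.java.jdbc :as jdbc])

(def db-spec {:dbtype "sqlite" :dbname "hitchhiker.db"})

(defn create-refcount-schema! []
  (jdbc/db-do-commands db-spec
    ["CREATE TABLE IF NOT EXISTS nodes (
        addr TEXT PRIMARY KEY, data BLOB,
        rc INTEGER NOT NULL DEFAULT 0)"
     "CREATE TABLE IF NOT EXISTS refs (
        parent TEXT NOT NULL, child TEXT NOT NULL)"
     ;; Triggers keep rc in sync as edges come and go.
     "CREATE TRIGGER IF NOT EXISTS inc_rc AFTER INSERT ON refs
        BEGIN UPDATE nodes SET rc = rc + 1 WHERE addr = NEW.child; END"
     "CREATE TRIGGER IF NOT EXISTS dec_rc AFTER DELETE ON refs
        BEGIN UPDATE nodes SET rc = rc - 1 WHERE addr = OLD.child; END"]))

(defn sweep! []
  ;; Assumes live roots are pinned (e.g. by a synthetic ref from a
  ;; sentinel parent) so their rc stays >= 1. Each pass deletes rc=0
  ;; nodes plus their outgoing edges; the dec_rc trigger then exposes
  ;; the next layer of dead nodes, so we loop to a fixpoint.
  (loop []
    (let [dead (map :addr (jdbc/query db-spec
                            ["SELECT addr FROM nodes WHERE rc = 0"]))]
      (when (seq dead)
        (doseq [a dead]
          (jdbc/delete! db-spec :refs ["parent = ?" a])
          (jdbc/delete! db-spec :nodes ["addr = ?" a]))
        (recur)))))
```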

Thanks for taking a look.

@dgrnbrg
Collaborator Author

dgrnbrg commented Aug 15, 2016

The way I imagine implementing a tracing GC in the simplest case is this:

anchor-root will add the current time + the root's ID to a table.

When the GC runs, it will:

1. Delete all the roots older than 1 hour (or whatever).
2. Add the remaining roots to the scan set.
3. While the scan set is nonempty: choose a key in the scan set, move it from the scan set to the keep set, and add all of its children that are in neither the scan nor the keep set to the scan set.
4. Finally, enumerate all keys in the DB, filter out anything in the keep set, and delete the rest.

This is essentially the tricolor tracing GC algorithm.
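In Clojure terms, the mark and sweep phases might look something like this. A hypothetical sketch only: the backend accessors (`children-of`, `all-keys`, `delete-key!`) are passed in as plain functions standing in for whatever the backend protocol ends up exposing:

```clojure
;; Hypothetical sketch of the mark & sweep described above.
(defn mark
  "Tricolor marking: `scan` is the gray set, `kept` the black set.
  `children-of` maps an address to the addresses it references."
  [roots children-of]
  (loop [scan (set roots)
         kept #{}]
    (if-let [addr (first scan)]
      (let [kept' (conj kept addr)
            kids  (remove #(or (contains? scan %) (contains? kept' %))
                          (children-of addr))]
        (recur (into (disj scan addr) kids) kept'))
      kept)))

(defn sweep!
  "Deletes every key not reachable from a live root. `all-keys` is a
  no-arg fn returning every key in the DB; `delete-key!` removes one."
  [all-keys delete-key! kept]
  (doseq [addr (all-keys)
          :when (not (contains? kept addr))]
    (delete-key! addr)))
```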

@cddr
Contributor

cddr commented Sep 5, 2016

Another thing I'm wondering about is whether the backend should support any of the operations required by the outboard (e.g. named snapshots). On the one hand, it's nice not to have to implement them; on the other, leaving them out seems like a recipe for operational confusion.

btw: the most recent commit adds a test like the one in redis_test, so it seems to be working correctly.

@dgrnbrg
Collaborator Author

dgrnbrg commented Sep 6, 2016

You know, I thought about whether the backend should support storing/naming roots, and at the time, I decided it wasn't necessary. The reason for this is that a named root simply needs to store the addr of the actual data, and the method for doing this could be nearly anything--from simple key/value storage to a key with a history of several roots.

Ultimately, I thought of the backend protocol as being what's necessary for the data structure, and the naming/storage of the data structures is an application-level API concern. Put another way, I imagine there usually being an intermediate API between the HH tree and the application.
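A minimal sketch of what such an intermediate layer could look like, assuming nothing more than an atom as the key/value store; all names here are hypothetical:

```clojure
;; Hypothetical sketch of the naming layer above the backend: a
;; named root is just an entry mapping a name to the addr of the
;; current root, here with a small history of prior roots.
(defn save-root!
  "Records addr as the newest root for root-name, keeping the last 10."
  [store root-name addr]
  (swap! store update root-name #(take 10 (cons addr %))))

(defn resolve-root
  "Returns the addr of the current root for root-name, or nil."
  [store root-name]
  (first (get @store root-name)))

;; Usage:
;;   (def store (atom {}))
;;   (save-root! store "app-db" "addr-123")
;;   (resolve-root store "app-db")  ;=> "addr-123"
```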
