How does the distributed part work? #32

Open
sbrl opened this issue Jan 6, 2021 · 2 comments

sbrl commented Jan 6, 2021

When a project mentions 'distributed', I think of things like wesher, Consul, and GlusterFS, which are distributed across multiple hosts.

I'm somewhat confused as to how the distributed nature of this project works. Could you explain this a little better, please? How does it handle storing the data? Does each host have to keep a copy of the data, or is it mounted via e.g. NFS? Does the system support fault tolerance, i.e. if 1 out of 3 hosts is down?

geohot (Owner) commented Jan 6, 2021

The Go master server sends 302 redirects to the nginx volume servers, which hold the data.

It's fault-tolerant against volume-server or drive failure on read, but not on write, and it's not tolerant to the master being down. A read is a redirect, so there's no single point the data goes through; a write doesn't have that property. We do 100x more reads than writes.
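For illustration, here's a minimal sketch of that read path in Go. This is not the actual minikeyvalue code: `volumesFor`, the hostnames, and the ports are made up, and the real master consults its on-disk index rather than a stub.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// volumesFor stands in for the master's index lookup: which volume
// servers hold replicas of this key. Hypothetical stub.
func volumesFor(key string) []string {
	return []string{"volume1:3001", "volume2:3001", "volume3:3001"}
}

func handleGet(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Path
	volumes := volumesFor(key)
	if len(volumes) == 0 {
		http.NotFound(w, r)
		return
	}
	// Pick any replica: a read survives volume-server failures, and
	// load spreads across every copy of the data.
	target := volumes[rand.Intn(len(volumes))]
	// 302: the client fetches the bytes straight from nginx on the
	// volume server, so the data never passes through the master.
	http.Redirect(w, r, fmt.Sprintf("http://%s%s", target, key), http.StatusFound)
}

func main() {
	http.HandleFunc("/", handleGet)
	http.ListenAndServe(":3000", nil)
}
```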

The replication is mainly about not losing data, but it gives the free benefits of read fault tolerance and a multiplier on read speed.
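And a companion sketch of the write side, under the same made-up names: the value goes to every replica, and the write only succeeds if all of them accept it. That's where the durability comes from, and why a dead volume server blocks writes but not reads.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// Same hypothetical index lookup as in the read sketch.
func volumesFor(key string) []string {
	return []string{"volume1:3001", "volume2:3001", "volume3:3001"}
}

// replicatePut pushes the value to every replica and only reports
// success if all of them take it: one unreachable volume server
// fails the whole write, but all surviving copies stay identical.
func replicatePut(key string, body io.Reader) error {
	data, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	for _, volume := range volumesFor(key) {
		req, err := http.NewRequest(http.MethodPut,
			fmt.Sprintf("http://%s%s", volume, key), bytes.NewReader(data))
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err // unreachable replica: no write fault tolerance
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			return fmt.Errorf("replica %s rejected write: %d", volume, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	err := replicatePut("/hello.txt", bytes.NewReader([]byte("world")))
	fmt.Println("write result:", err)
}
```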

minikeyvalue has three main purposes:

  • Huge drive sizes
  • No data loss
  • Insane speeds

99% uptime is plenty good for our use case; this is where we keep all the training data. If it has to go down for an hour, it's not a big deal, since it's not user-facing.

We are running:

  • A 2 PB array of spinning disks: 400 drives across 40 machines, with 3 replicas.
  • A 320 TB array of SSDs: 80 drives across 20 machines, with no replicas. Crazy bandwidth!

sbrl (Author) commented Jan 7, 2021

Cool! That does help explain it a bit more. The README was rather light on these details and assumed I knew what a volume server was.

Wow, those are some insane figures there!
