When a project mentions 'distributed', I think of things like wesher, Consul, and GlusterFS, which are distributed across multiple hosts.
I'm somewhat confused as to how the distributed nature of this project works. Could you explain it a little better, please? How does it store the data? Does each host have to keep a copy of the data, or is it mounted via e.g. NFS? Does the system support fault tolerance, i.e. if 1 out of 3 hosts is down?
The Go master server sends 302 redirects to nginx volume servers, which hold the data.
It's fault tolerant to volume server or drive failure on read, but not on write. It's not tolerant to the master being down. A read is a redirect, so there's no single point the data goes through; a write is not a redirect. We do 100x more reads than writes.
The replication is mainly for not losing data, but it gives free benefits of read fault tolerance and a multiplier on read speed.
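To make the read path concrete, here is a minimal sketch in Go of a master that answers a GET with a 302 to a volume server holding the key, skipping replicas that are down. This is not minikeyvalue's actual code: the names `pickReplicas`, `firstHealthy`, and `master`, the hash-based placement, the `alive` health check, and the host names are all assumptions made up for illustration.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"net/http"
)

// pickReplicas returns the volume servers assumed to hold key.
// Hypothetical placement: hash the key to a starting index, then take
// `replicas` consecutive servers (wrapping around the list).
func pickReplicas(key string, volumes []string, replicas int) []string {
	if len(volumes) == 0 {
		return nil
	}
	sum := md5.Sum([]byte(key))
	start := int(binary.BigEndian.Uint32(sum[:4]) % uint32(len(volumes)))
	out := make([]string, 0, replicas)
	for i := 0; i < replicas && i < len(volumes); i++ {
		out = append(out, volumes[(start+i)%len(volumes)])
	}
	return out
}

// firstHealthy returns the first candidate that alive reports as up.
func firstHealthy(candidates []string, alive func(string) bool) (string, bool) {
	for _, v := range candidates {
		if alive(v) {
			return v, true
		}
	}
	return "", false
}

// master sketches only the read side: it never touches the data itself.
type master struct {
	volumes  []string
	replicas int
	alive    func(string) bool // hypothetical health check
}

func (m *master) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "only reads are sketched here", http.StatusMethodNotAllowed)
		return
	}
	candidates := pickReplicas(r.URL.Path, m.volumes, m.replicas)
	target, ok := firstHealthy(candidates, m.alive)
	if !ok {
		http.Error(w, "no replica available", http.StatusServiceUnavailable)
		return
	}
	// 302: the client fetches the bytes directly from the volume server.
	http.Redirect(w, r, "http://"+target+r.URL.Path, http.StatusFound)
}

func main() {
	m := &master{
		volumes:  []string{"vol1:3001", "vol2:3001", "vol3:3001", "vol4:3001"},
		replicas: 3,
		alive:    func(string) bool { return true },
	}
	fmt.Println(pickReplicas("/some-key", m.volumes, m.replicas))
	// To serve: http.ListenAndServe("localhost:3000", m)
}
```

With three replicas, a read still succeeds when one or two of the chosen volume servers are down, and different clients can be sent to different replicas, which is where the read-speed multiplier would come from. The master itself remains a single point of failure in this sketch, matching the description above.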
minikeyvalue has three main purposes:
Huge drive sizes
No data loss
Insane speeds
99% uptime is plenty good for our use case; this is where we keep all the training data. If it has to go down for an hour, it's not a big deal, since it's not user facing.
We are running:
A 2 PB array of spinning disks across 40 machines with 400 drives with 3 replicas.
A 320 TB array of SSDs across 20 machines with 80 drives and no replicas. Crazy bandwidth!