High availability? #5147
-
Not yet, but I would propose to create something based on Apache Ratis.
-
I would like to start another thread here to talk about implementing support for Jena's RDF Delta Patch Log Server instead of developing an HA system from scratch. What makes the Jena setup HA is that the patch log server itself runs in HA across multiple servers.

We could create a Sail that interfaces with the patch log server to track which patch version we are on, fetch any newer patches and apply them to the underlying store, and sync changes back to the patch log server before a transaction is allowed to finish committing. The NotifyingSail would allow us to track changes in a transaction, and we can override the begin, prepare and commit methods to add the interaction with the patch log server.

An important simplification that the patch log server allows for is that a transaction starts at a patch version and needs to be committed at that same version. This means that a heavy write load spread across replicas would end up with a lot of cancelled transactions.
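To make the version rule concrete, here is a minimal sketch in plain Java. It does not use the RDF4J or RDF Delta APIs: `PatchLog`, `Replica`, and their methods are hypothetical stand-ins, the log is an in-memory list, and patches are plain strings. It only illustrates the "begin at a version, commit at the same version" check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: none of these types come from RDF4J or RDF Delta.
final class PatchLogSketch {

    /** In-memory stand-in for the patch log server. */
    static final class PatchLog {
        private final List<String> patches = new ArrayList<>();

        /** Current head version, i.e. the number of patches accepted so far. */
        synchronized int head() {
            return patches.size();
        }

        /** Returns a copy of the patches that come after the given version. */
        synchronized List<String> fetchSince(int version) {
            return new ArrayList<>(patches.subList(version, patches.size()));
        }

        /** Accepts the patch only if the caller's base version is still the head. */
        synchronized boolean append(int baseVersion, String patch) {
            if (baseVersion != patches.size()) {
                return false; // another writer got there first
            }
            patches.add(patch);
            return true;
        }
    }

    /** Stand-in for a store wrapped by the proposed Sail. */
    static final class Replica {
        private final PatchLog log;
        final List<String> applied = new ArrayList<>(); // patches applied locally, in order
        private int version = 0;  // patch version the local store is at
        private int txBase = -1;  // version the open transaction started from

        Replica(PatchLog log) {
            this.log = log;
        }

        /** Catches up with the log, then pins the transaction to that version. */
        void begin() {
            for (String patch : log.fetchSince(version)) {
                applied.add(patch);
                version++;
            }
            txBase = version;
        }

        /** Pushes the patch to the log; returns false if the transaction is cancelled. */
        boolean commit(String patch) {
            if (txBase < 0) {
                throw new IllegalStateException("no active transaction");
            }
            boolean accepted = log.append(txBase, patch);
            if (accepted) {
                applied.add(patch);
                version = txBase + 1;
            }
            txBase = -1;
            return accepted;
        }
    }

    public static void main(String[] args) {
        PatchLog log = new PatchLog();
        Replica a = new Replica(log);
        Replica b = new Replica(log);

        a.begin();
        b.begin(); // both transactions start at version 0
        System.out.println(a.commit("patch from a")); // true
        System.out.println(b.commit("patch from b")); // false: the log moved on
        b.begin(); // retry: b first catches up with a's patch
        System.out.println(b.commit("patch from b")); // true
    }
}
```

In this sketch a cancelled transaction is simply retried after catching up, which is where the cost under concurrent writes shows up: every writer that loses the race has to redo its work.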
-
(Sorry if I'm butting in here)

Just to be clear: RDF Delta isn't part of Apache Jena. There's no technical barrier, but RDF Delta is significantly more user-support-intensive. Not surprising - it is affected by the deployment environment.

Delta could do with a refresh, even a V2. Apache Zookeeper takes quite a lot of effort to deploy and operate. It cannot store the patches (size limits).

Apache Ratis for the system lock and metadata would be a good choice. I'm not clear whether storing bulk data in Ratis for a long-lived deployment is a good idea or not. This might depend on whether the design is to keep patches "for a long time" (cf. incremental backup, rebuilding new triplestores from a long-term data snapshot) or just until the triplestores have all taken the updates. RDF Delta does not track the state of the front ends.

If patches are stored outside Ratis, there are options:

There is a lot to be said in favour of small deployments using a "safe" filesystem for patch storage. The advantage for users is that the deployment is simple to operate. SPARQL is continuously available for query; yes, there is a pause on updates if the patch server needs to be bumped. It starts up very fast (1-2s). Only losing the patches is catastrophic; a blank patch server can discover log state (once) and that takes a little longer to start up.

For a more complex deployment, Apache Pulsar looks interesting. It is designed to be a distributed log. It supports migrating stored data to different storage hierarchies and also ageing off patches.

Apache Kafka nowadays can be used in a log-like manner if old patches are migrated away from expensive broker storage. This still needs Ratis to coordinate writing the log consistently.

Current RDF Delta can use blob stores - that is another operational cost. Pulsar should be a more integrated solution as well as being cloud-neutral.
-
A machine in the Ratis membership group could be writing out to the long-term log. The members don't have to have the same functionality. Delta minimised implementation on the triplestore process because Zookeeper isn't a library.

@kenwenzel What are your thoughts on recovery after a failure?
-
Has there been any further work done in this space? I could be/get interested in chipping in here.

What would be the desired feature set for HA in RDF4J? What kind of transaction support is desired? Is read-committed isolation okay for interested parties? Are there use cases that would require higher levels of isolation? Are we only looking at HA here? Or perhaps also distribution? Could we marry the two?

As for the use of Ratis in this area: I think the data/metadata separation in Ratis built for Hadoop HDDS/Ozone can be used here. This stream feature allows users to distribute patches quite efficiently without them ending up in the Raft logs. See also this blog from Cloudera.
-
For the combination of Raft / Ratis and RDF4J stores (probably any HA solution), it makes sense to create/distribute snapshots. Otherwise the state of a node is essentially a big WAL from which a store needs to be rebuilt on start. Perhaps with some trickery this can be worked around; but e.g. Ratis has a concept of snapshots and it would be nice to "go with its flow". And for nodes to be added, catch up, be replaced, etc. it's a lot more efficient if they can start from a snapshot, instead of fetching the log and applying that locally.

I've worked with a lot of LSM-based databases. For such systems creating a snapshot is often just a matter of creating (hard) links to the files (segments or whatever they're called in the particular system). Is there any such (general) mechanism imaginable for RDF4J stores? I'm not sure about the mechanics of the native store; I would guess it updates pages in place. I don't believe LMDB has such snapshot capability.

An alternative design is to (briefly) interrupt processing on a node, or at least go into read-only mode, and create a copy of the RDF4J on-disk data. I believe this is also done in Apache Ozone (for which Ratis was initially developed) with their RocksDB (metadata) databases. Some downsides for that to work:
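For reference, the hard-link style of snapshot mentioned above for LSM-based systems can be sketched with the JDK's `Files.createLink`. This is only an illustration under stated assumptions: the class and method names are made up, the data directory is assumed to be flat, and it is assumed that files in it are never modified after being written. It says nothing about how the RDF4J native store or LMDB lay out their files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of RDF4J or Ratis.
final class HardLinkSnapshot {

    /**
     * Hard-links every regular file directly inside dataDir into snapshotDir
     * and returns the number of files linked. Both directories must be on the
     * same filesystem, and that filesystem must support hard links.
     */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new name points at the same data.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempDirectory("segments");
        Files.writeString(data.resolve("segment-1.dat"), "immutable segment contents");
        Path snap = data.resolveSibling(data.getFileName() + "-snapshot");
        System.out.println(snapshot(data, snap) + " file(s) linked into " + snap);
    }
}
```

Because a hard link shares the underlying data with the original file, this only yields a stable snapshot when files are replaced rather than rewritten. A store that updates pages in place would change the "snapshot" as well, which is why the question about the native store matters.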
-
There is an HA solution for Jena. Is there anything comparable for RDF4J?
(Alternatively, is there appetite for creating it?)