-
Notifications
You must be signed in to change notification settings - Fork 107
Getting Started
Here we explain all the steps needed to get a transactional HBase working assuming you already have a (pseudo-)distributed HBase cluster.
To install Omid, just download an unpack the repository. Under the main directory, run this
$ mvn install
Now you can add a dependency on Omid-0.0.1-SNAPSHOT
on your project if you are using maven, otherwise just point to the jar created in target
from your application's classpath.
You must run the Status Oracle in one server of your cluster. You can either use the script provided in the repository under bin/omid.sh
or run directly the class TSOServer:
$ java -cp ./target/Omid-0.0.1-SNAPSHOT.jar com.yahoo.omid.tso.TSOServer -port <port>
Omid's clients use the usual hbase-site.xml
for configuration, it needs two additional parameters
<configuration>
<property>
...
Hbase configuration properties
....
</property>
<property>
<name>tso.host</name>
<value> the.host.you.run.the.status.oracle.on </value>
</property>
<property>
<name>tso.port</name>
<value> the.port.you.chose.for.the.status.oracle </value>
</property>
</configuration>
To use HBase transactional support the relevant interfaces are TransactionManager
and TransactionalTable
, both in
com.yahoo.omid.client
. The common use case is to start a new transaction with TransactionState txn1 = transactionManager.beginTransaction()
, then use this transaction to perform different HBase operations, like transactionalTable.put(txn1, putOperation)
and finally commit it transactionManager.tryCommit(txn1)
.
Configuration conf = HBaseConfiguration.create();
TransactionManager tm = new TransactionManager(conf);
TransactionalTable tt = new TransactionalTable(conf, TEST_TABLE);
TransactionState t1 = tm.beginTransaction();
Put put = new Put(row);
putt.add(fam, col, data);
tt.put(t1, p);
ResultScanner rs = tt.getScanner(t1, new Scan().setStartRow(startrow).setStopRow(stoprow));
Result r = rs.next();
while (r != null) {
...
r = rs.next();
}
tm.tryCommit(t1);
If you know the maximum number of concurrent transactions that could be modifying a data item at the same time, you could set the maximum number of versions HBase will store to that number (plus some threshold), because Omid needs to have access to that many versions to guarantee Snapshot Isolation.
If this number is not bounded or you don't want risk it, we provide a custom compacter that uses HBase's Coprocessors. You should set the maximum number of versions to infinity (Long.MAX_VALUE) and our compactertakes care of deleting the versions not needed anymore.
You must make sure the Omid jar is available to all region servers and add this to your hbase-site.xml (the one the HBase servers use):
<property>
<name>hbase.coprocessor.region.classes</name>
<value>com.yahoo.omid.client.regionserver.Compacter</value>
</property>
The Status Oracle writes to a Write Ahead Log (WAL) in order to be able to recover from a failure. Currently we only support BookKeeper for this.
Please refer to the BookKeeper documentation on how to set it up.
To run a Highly Available Status Oracle you need to specify the ZooKeeper server in which the Bookies get registered and also that you want to be HA, like this:
$ java -cp ./target/Omid-0.0.1-SNAPSHOT.jar com.yahoo.omid.tso.TSOServer -port <port> -ha -zk <zk connection string>
There are a couple of parameters you can change, like the ensemble size (-ensemble) and the quorum (-quorum).
- Ensemble: number of Bookies the Status Oracle is going to use. It has to be <= the total number of bookies started.
- Quorum: number of replicas for each entry, it has to be <= the ensemble size.
For example you could run 5 Bookies, have an ensemble size of 5 and a quorum of 3, so each write to the WAL is replicated to 3 bookies (which is faster than replicating to all of them).
Omid
Copyright 2011-2015 Yahoo Inc. Licensed under the Apache License, Version 2.0