Accumulo as Primary Data Provider
This page is meant to be a guide to the Accumulo Data Provider interface for MrGeo.
The first step in working with Accumulo is to make it the primary data store for MrGeo. This document describes how to do that.
MrGeo finds its configuration through the MRGEO_HOME environment variable. If you already had MrGeo working with HDFS as the primary location of images, the variable may be set already. If not, make sure it is set on the system used to launch jobs. For this example, assume the MrGeo configuration lives under /opt/mrgeo. Then, assuming Linux with bash as the shell, add the following to the $HOME/.bash_profile file:
export MRGEO_HOME=/opt/mrgeo
Save the file and source it:
#> source ~/.bash_profile
Then check to see that the variable got set:
#> echo $MRGEO_HOME
The result should be "/opt/mrgeo".
Next, edit the /opt/mrgeo/conf/mrgeo.conf file. Find:
datasource = hdfs
Change the line so it reads:
datasource = accumulo
That now sets Accumulo as the primary data provider for MrGeo.
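The same edit can be scripted. The following is a minimal sketch, not part of MrGeo: `set_datasource` is a hypothetical helper name, and it assumes the setting appears on its own line in the form `datasource = <value>`, as shown above.

```shell
# Hypothetical helper (not shipped with MrGeo): rewrite the
# "datasource = ..." line of a MrGeo config file in place.
# Assumes the key sits on its own line as "datasource = <value>".
set_datasource() {
    conf_file="$1"
    new_value="$2"
    tmp_file="$(mktemp)" || return 1
    # Write the edited copy to a temp file, then copy the contents back
    # so the original file keeps its ownership and permissions.
    sed "s/^[[:space:]]*datasource[[:space:]]*=.*/datasource = ${new_value}/" \
        "$conf_file" > "$tmp_file" && cat "$tmp_file" > "$conf_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Example call (commented out so nothing is modified by accident):
# set_datasource "$MRGEO_HOME/conf/mrgeo.conf" accumulo
```

Going through a temporary file avoids `sed -i`, whose syntax differs between GNU and BSD systems.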
Next, MrGeo needs an Accumulo configuration file. Create the file $MRGEO_HOME/conf/mrgeo-accumulo.conf and add the following to it. The Accumulo connection information is mandatory.
accumulo.user = root
accumulo.password = secret
accumulo.instance = accumulo
accumulo.zookeepers = localhost:2181
accumulo.viz = A|B
accumulo.auths = A,B,C,D,E,F,G,U
accumulo.queryauths = U
accumulo.root.auths = U
accumulo.default.write.viz = null
accumulo.default.read.auths = U
accumulo.bulkthreshold = 500
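Because the connection information is mandatory, it can help to check the file before launching a job. The sketch below is an illustration only: `check_accumulo_conf` is a hypothetical helper, and treating `accumulo.user`, `accumulo.password`, `accumulo.instance`, and `accumulo.zookeepers` as the mandatory connection keys is an assumption based on the listing above, not something MrGeo documents here.

```shell
# Hypothetical helper (not shipped with MrGeo): report any connection
# key that is missing or has an empty value in mrgeo-accumulo.conf.
# ASSUMPTION: these four keys make up the "connection information".
check_accumulo_conf() {
    conf_file="$1"
    missing=0
    for key in accumulo.user accumulo.password accumulo.instance accumulo.zookeepers; do
        # Escape the dots so they match literally in the regex.
        pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf_file"; then
            echo "missing or empty: $key" >&2
            missing=1
        fi
    done
    return $missing
}

# Example call (commented out):
# check_accumulo_conf "$MRGEO_HOME/conf/mrgeo-accumulo.conf" && echo "connection settings present"
```

The function returns a non-zero status when any key is absent, so it can gate a launch script.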
The Accumulo Data Provider can do bulk ingest on jobs. The number of output tiles is a factor in how the data provider pushes data into Accumulo. The "accumulo.bulkthreshold" value sets the threshold that decides between bulk ingest and a plain push with a batch writer. Choose this value based on the size of the cloud in use. During experimentation and development, small clouds were in use and 500 was a good number. Your mileage may vary.
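As an illustration of the decision the threshold controls, here is a small sketch. It is not MrGeo's code: `choose_ingest_mode` is a hypothetical name, and both the direction of the comparison (more tiles than the threshold means bulk ingest) and its strictness are assumptions, since the text above does not specify them.

```shell
# Hypothetical illustration (not MrGeo code) of a threshold-based choice.
# ASSUMPTION: tile counts strictly above the threshold use bulk ingest;
# everything else is pushed with a batch writer.
choose_ingest_mode() {
    tile_count="$1"
    threshold="$2"
    if [ "$tile_count" -gt "$threshold" ]; then
        echo "bulk"
    else
        echo "batch-writer"
    fi
}

# Example calls (commented out):
# choose_ingest_mode 12000 500
# choose_ingest_mode 40 500
```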
When using the Accumulo Data Provider, make sure the destination table exists before creating data. For example, if you are ingesting data and want the name to show up in GetCapabilities as "kathmandu", make sure the "kathmandu" table exists first. Creating the table is a manual process. The Accumulo APIs do allow a program to create a table, but the provider does not implement this. If it is requested, the provider can be changed to do so.
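One way to create the table by hand is through the Accumulo shell. The sketch below wraps that call in a hypothetical helper, `create_mrgeo_table`; it assumes the `accumulo` launcher is on the PATH, that the table name matches the name you want the data to appear under, and that the user defaults to `root` as in the sample configuration. With no password on the command line, the Accumulo shell is expected to prompt for it.

```shell
# Launcher to invoke; can be overridden (for example, for a dry run).
ACCUMULO_BIN="${ACCUMULO_BIN:-accumulo}"

# Hypothetical helper (not shipped with MrGeo): create a table through
# the Accumulo shell using its "createtable" command.
# ASSUMPTION: the user defaults to "root", as in the sample config.
create_mrgeo_table() {
    table_name="$1"
    accumulo_user="${2:-root}"
    "$ACCUMULO_BIN" shell -u "$accumulo_user" -e "createtable $table_name"
}

# Example call (commented out):
# create_mrgeo_table kathmandu
```

Setting `ACCUMULO_BIN=echo` before calling the function prints the arguments instead of contacting Accumulo, which is a cheap way to check what would be run. Note that `createtable` reports an error if the table already exists.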