
AccumuloIngest

Andrew Levine edited this page Apr 14, 2015 · 13 revisions


MrGeo Ingest into Accumulo

Ingesting into Accumulo is straightforward. For the most part, the command line is no different from the one used for MrGeo ingest into HDFS. The first requirement is that Accumulo must be configured as the primary data provider for MrGeo. A separate help page covers that preparation.

Setting up Accumulo

For the Accumulo Data Provider to work with an instance of Accumulo, nothing special needs to be installed. The one recommended step is to create a dedicated user in the Accumulo instance for geospatial images. The main benefit is that this separates the geospatial images from other data for access and processing. Inside the Accumulo shell as root:

  root@accumulo> createuser mrgeo
  Enter new password for 'mrgeo': *****
  Please confirm new password for 'mrgeo': *****

Now, give the mrgeo user the right to create and delete tables:

  root@accumulo> grant System.CREATE_TABLE -s -u mrgeo
  root@accumulo> grant System.DROP_TABLE -s -u mrgeo

Next, give the mrgeo user the authorizations it will need for reading data. These must include every authorization that will be applied to data written into the tables. Set them via a comma-separated list:

  root@accumulo> addauths -s A,B,C,FOO,BAR -u mrgeo
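The grant and addauths steps above are easy to get wrong when the list of authorizations is long. The Python sketch below builds those shell lines from a list of labels. `mrgeo_user_commands` is a hypothetical helper and is not part of MrGeo or Accumulo. The sketch assumes Accumulo's usual rule that authorization labels contain only letters, digits and the characters `_ - : . /`. Check that rule against the documentation for your Accumulo version. `createuser` is left out because it prompts for a password interactively.

```python
import re

# ASSUMPTION: Accumulo authorization labels may only use letters, digits
# and the characters _ - : . /  (verify for your Accumulo version).
VALID_AUTH = re.compile(r"[A-Za-z0-9_\-:./]+")


def mrgeo_user_commands(user, auths):
    """Return Accumulo shell lines (run as root) that grant `user` the
    system permissions and read authorizations described on this page.
    Hypothetical helper; createuser is omitted because it is interactive."""
    if not auths:
        raise ValueError("at least one authorization is required")
    bad = [a for a in auths if not VALID_AUTH.fullmatch(a)]
    if bad:
        raise ValueError(f"invalid authorization label(s): {bad}")
    return [
        f"grant System.CREATE_TABLE -s -u {user}",
        f"grant System.DROP_TABLE -s -u {user}",
        f"addauths -s {','.join(auths)} -u {user}",
    ]


for line in mrgeo_user_commands("mrgeo", ["A", "B", "C", "FOO", "BAR"]):
    print(line)
```

The printed lines can be pasted into the Accumulo shell as root. They match the commands shown above.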

Check your work:

  root@accumulo> getauths -u mrgeo
  FOO,C,A,B,BAR
  root@accumulo> userpermissions -u mrgeo
  System permissions: System.CREATE_TABLE, System.DROP_TABLE
  Table permissions (!METADATA): Table.READ
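As the `getauths` output above shows (FOO,C,A,B,BAR), the labels do not necessarily come back in the order they were added. A check should therefore compare sets rather than strings. The following is a minimal sketch. `diff_auths` is a hypothetical helper that takes the line printed by `getauths` and returns which expected labels are missing and which extra ones are present.

```python
def diff_auths(getauths_output, expected):
    """Compare the comma-separated line printed by `getauths` with the
    labels that were meant to be granted. Returns (missing, unexpected),
    both sorted, so an empty pair means the user is set up as intended."""
    actual = {a.strip() for a in getauths_output.split(",") if a.strip()}
    wanted = set(expected)
    return sorted(wanted - actual), sorted(actual - wanted)


missing, unexpected = diff_auths("FOO,C,A,B,BAR", ["A", "B", "C", "FOO", "BAR"])
print(missing, unexpected)  # [] []
```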

If everything looks good at this point, log out of the root account and test logging in with the mrgeo user account. Next, try creating a table:

  mrgeo@accumulo> createtable junk
  mrgeo@accumulo junk> 

A prompt that includes the table name means the table was created successfully. Now confirm that the user has the expected permissions on it:

  mrgeo@accumulo junk> tablepermissions
  Table.READ
  Table.WRITE
  Table.BULK_IMPORT
  Table.ALTER_TABLE
  Table.GRANT
  Table.DROP_TABLE
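If this check is repeated on several machines, it can be automated. The sketch below copies the expected permission set from the `tablepermissions` output above and reports any permission that is missing. `missing_table_perms` and `EXPECTED_TABLE_PERMS` are hypothetical names. The sketch also assumes the command prints one permission per line, as shown here.

```python
# Permissions the mrgeo user should hold on a table it created,
# copied from the tablepermissions output shown on this page.
EXPECTED_TABLE_PERMS = {
    "Table.READ", "Table.WRITE", "Table.BULK_IMPORT",
    "Table.ALTER_TABLE", "Table.GRANT", "Table.DROP_TABLE",
}


def missing_table_perms(tablepermissions_output):
    """Return, sorted, the expected permissions absent from the text
    printed by `tablepermissions` (one permission per line; surrounding
    whitespace is ignored)."""
    granted = {line.strip() for line in tablepermissions_output.splitlines()}
    return sorted(EXPECTED_TABLE_PERMS - granted)


sample = """Table.READ
Table.WRITE
Table.BULK_IMPORT
Table.ALTER_TABLE
Table.GRANT
Table.DROP_TABLE"""
print(missing_table_perms(sample))  # []
```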

This is the expected output. If yours differs, check the user's setup. Assuming everything is working correctly, delete the table:

  mrgeo@accumulo junk> deletetable -t junk
  deletetable { junk } (yes|no)? yes
  Table: [junk] has been deleted. 
  mrgeo@accumulo> 

If deleting the table created by this user fails, make sure the user has the correct system permissions (System.DROP_TABLE). If the problem persists after that, something else is wrong in the Accumulo setup, and fixing it is outside the scope of this page.

Preparing a Table for Ingest

At this time, the Accumulo Data Provider will not create a table on its own. The table must exist before data can be ingested into it. The same is true for the mapalgebra capabilities of MrGeo. Create a table:

  mrgeo@accumulo> createtable rome
  mrgeo@accumulo rome>

The table rome is now available for ingest. Once real MrGeo image data has been put into the table, the GetCapabilities response of the MrGeo WMS servlet will report rome as a layer.

The Accumulo Data Provider can also take advantage of locality groups. Inside Accumulo, a locality group stores data together based on column family. The MrGeo Accumulo Data Provider stores the zoom level in the column family, so the locality groups for image data are defined per zoom level. Knowing the maximum zoom level of the image to be ingested helps. However, this step can also be done without that knowledge by defining groups for a generous range of zoom levels:

  mrgeo@accumulo rome> setgroups 18=18 17=17 16=16 15=15 14=14 13=13 12=12 11=11 10=10 9=9 8=8 7=7 6=6 5=5 4=4 3=3 2=2 1=1 -t rome
  mrgeo@accumulo rome> getgroups -t rome
  17=17
  18=18
  15=15
  16=16
  13=13
  14=14
  11=11
  12=12
  3=3
  2=2
  1=1
  10=10
  7=7
  6=6
  5=5
  4=4
  9=9
  8=8
  mrgeo@accumulo rome> 

NOTE: The most important locality group is the one for the maximum zoom level, because the base (maximum) zoom level is used for analytics and derived layers. Display can take advantage of locality groups as well, though. So a possible protective step is to define groups for every zoom level from 24 down to 1.
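Typing the `setgroups` line by hand is error-prone, especially for the full 24-to-1 range. The Python sketch below generates it. Each group is named after its zoom level and holds the single column family of the same name, following the `18=18 ... 1=1` pattern used above. `setgroups_command` is a hypothetical helper.

```python
def setgroups_command(table, max_zoom, min_zoom=1):
    """Build an Accumulo shell `setgroups` line with one locality group per
    zoom level, from max_zoom down to min_zoom. Each group is named after
    the zoom level and contains the column family with that same name."""
    if not 1 <= min_zoom <= max_zoom:
        raise ValueError("need 1 <= min_zoom <= max_zoom")
    groups = " ".join(f"{z}={z}" for z in range(max_zoom, min_zoom - 1, -1))
    return f"setgroups {groups} -t {table}"


# The protective variant from the note: every zoom level from 24 down to 1.
print(setgroups_command("rome", 24))
```

Calling `setgroups_command("rome", 18)` reproduces the command in the example above exactly.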

The table is now ready for ingest.

Running Ingest

For Ingest to work properly, Accumulo must be the preferred data provider for the MrGeo environment. There is a separate document on setting this up for the environment.

Straight Ingest

Here is an example of a simple ingest into the table "rome" from the source imagery in /data/sourceImagery/Rome:

  mrgeo ingest -v -o rome /data/sourceImagery/Rome

If everything is set up correctly, the command will start a MapReduce job. The same command also builds the "pyramid" (the lower zoom levels) for the image in Accumulo. To skip the pyramid step, add "-sp" to the command line.
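When ingest is driven from a script, assembling the arguments as a list avoids shell-quoting mistakes. This matters most for values such as `A|B|C`, where `|` is a shell metacharacter. The sketch below uses only the flags shown on this page: `-v`, `-o`, `-sp` and `-pl`. The flag order is illustrative, and the sketch assumes `-sp` takes no argument. `ingest_argv` is a hypothetical helper, not part of the MrGeo CLI.

```python
import shlex


def ingest_argv(source, output, skip_pyramid=False, protection_level=None,
                verbose=True):
    """Assemble an `mrgeo ingest` argument list from the flags on this page:
    -v (verbose), -sp (skip pyramid), -pl (protection level), -o (table)."""
    argv = ["mrgeo", "ingest"]
    if verbose:
        argv.append("-v")
    if skip_pyramid:
        argv.append("-sp")
    if protection_level is not None:
        argv += ["-pl", protection_level]
    argv += ["-o", output, source]
    return argv


print(shlex.join(ingest_argv("/data/sourceImagery/Rome", "rome")))
# mrgeo ingest -v -o rome /data/sourceImagery/Rome
print(shlex.join(ingest_argv("/data/sourceImagery/Rome", "rome",
                             protection_level="A|B|C")))
# mrgeo ingest -v -pl 'A|B|C' -o rome /data/sourceImagery/Rome
```

To launch the job, pass the list to `subprocess.run(argv, check=True)`. No shell is involved, so the protection level needs no quoting.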

Note: If there are errors, try to resolve them. For initial installs or new users, one issue that can come up is a permissions problem inside HDFS. The Accumulo Data Provider uses HDFS when the job is large and an Accumulo bulk ingest is chosen. This is easy to solve: make sure that the user running the job can write to that user's home directory in HDFS, which is usually /user/[username].
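A minimal sketch of the fix, assuming the standard `hdfs dfs` command-line client is available: the helper below works out the usual `/user/[username]` home directory and returns the commands that create it and hand it to the user. `hdfs_home_setup` is hypothetical. It also assumes the `-chown` step is run by the HDFS superuser, which changing ownership normally requires.

```python
import getpass


def hdfs_home_setup(user=None):
    """Return the HDFS home directory for `user` (default: the current login
    name) and the `hdfs dfs` commands that create it and give it to the user.
    ASSUMPTION: the chown command is run as the HDFS superuser."""
    user = user or getpass.getuser()
    home = f"/user/{user}"
    commands = [
        ["hdfs", "dfs", "-mkdir", "-p", home],
        ["hdfs", "dfs", "-chown", user, home],
    ]
    return home, commands


home, commands = hdfs_home_setup("mrgeo")
for cmd in commands:
    print(" ".join(cmd))
```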

Ingest with Protections

Within the MrGeo core, there is a mechanism for passing "Protection Levels" to the data providers. A provider can ignore them or, as the Accumulo Data Provider does, use them to apply protections to the data. What MrGeo passes down to the data provider is a string, with no interpretation or alteration.

  mrgeo ingest -v -pl 'A|B|C' -o rome /data/sourceImagery/Rome

Here, "-pl" sets the Protection Level for the data. Once the MapReduce ingest job has finished for the base layer, take a quick look inside Accumulo, for example by scanning the table from the Accumulo shell. Confirm that the data was ingested correctly and carries the expected protections.
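To reason about who will be able to read the ingested tiles, it helps to evaluate the protection string the way Accumulo evaluates a column visibility. The sketch below assumes the Accumulo Data Provider writes the `-pl` string as each cell's column visibility. This page only says the string is passed through unchanged, so treat that as an assumption. The hypothetical `can_read` implements a simplified version of Accumulo's visibility rules: `|` means any one label suffices, `&` means all labels are required, parentheses group terms, and mixing the two operators without parentheses is rejected. Quoted labels are not supported.

```python
import re

# Labels (same character set assumed earlier) or single-character operators.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


def can_read(expr, auths):
    """Return True if a user holding `auths` satisfies visibility `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # an empty visibility is readable by everyone
    pos = 0

    def parse_expr():
        nonlocal pos
        values = [parse_term()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if op == "&" else any(values)

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label: does the user hold it?

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

Consider the authorizations granted to mrgeo earlier. `can_read("A|B|C", {"A", "B", "C", "FOO", "BAR"})` returns True, so that user can read tiles ingested with `-pl 'A|B|C'`. A user holding only FOO could not.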
