AccumuloIngest
Ingesting into Accumulo is straightforward. For the most part, the command line is no different than the one used to ingest into MrGeo on HDFS. First, Accumulo must be set as the primary data provider for MrGeo. A separate help page covers that setup.
The Accumulo Data Provider needs nothing special installed to work with an instance of Accumulo. It is recommended, though, to set up a dedicated user in the Accumulo instance for geospatial images. The main benefit is that access to and processing of the geospatial images stay separate from everything else in the instance. Inside the Accumulo shell as root:
root@accumulo> createuser mrgeo
Enter new password for 'mrgeo': *****
Please confirm new password for 'mrgeo': *****
Now, give the mrgeo user the right to create and delete tables:
root@accumulo> grant System.CREATE_TABLE -s -u mrgeo
root@accumulo> grant System.DROP_TABLE -s -u mrgeo
Next, give the mrgeo user the authorizations needed for reading data. These need to be all the possible authorizations that will be used for data that goes into the tables. Set them with a comma-separated list:
root@accumulo> addauths -s A,B,C,FOO,BAR -u mrgeo
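As a rough sketch, the comma-separated list can be derived from the protection levels that are planned for ingest (the "-pl" strings described later on this page). This assumes those strings are Accumulo visibility expressions built only from labels, "|", "&" and parentheses; the function name auths_from_levels is made up for illustration, and the example expressions are not taken from a real deployment.

```shell
#!/bin/sh
# Hypothetical helper: collect the unique labels found in one or more
# visibility expressions and join them with commas.
# Assumption: expressions contain only labels, '|', '&', '(' and ')'.
auths_from_levels() {
    printf '%s\n' "$@" |
        tr '|&()' '\n\n\n\n' |   # put every label on its own line
        sed '/^$/d' |            # drop the empty lines left by operators
        LC_ALL=C sort -u |       # de-duplicate in a stable order
        paste -sd, -             # join the labels with commas
}

# Print the addauths line to paste into the Accumulo shell as root.
echo "addauths -s $(auths_from_levels 'A|B|C' '(FOO&BAR)|A') -u mrgeo"
# prints: addauths -s A,B,BAR,C,FOO -u mrgeo
```

The script only prints the command. It does not talk to Accumulo, so the result can be reviewed before it is run.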
Check your work:
root@accumulo> getauths -u mrgeo
FOO,C,A,B,BAR
root@accumulo> userpermissions -u mrgeo
System permissions: System.CREATE_TABLE, System.DROP_TABLE
Table permissions (!METADATA): Table.READ
If everything looks good at this point, log out of the root account and log in with the mrgeo user account to test it. Next, try creating a table:
mrgeo@accumulo> createtable junk
mrgeo@accumulo junk>
A prompt that includes the table name means the table was created successfully. Now confirm the table permissions:
mrgeo@accumulo junk> tablepermissions
Table.READ
Table.WRITE
Table.BULK_IMPORT
Table.ALTER_TABLE
Table.GRANT
Table.DROP_TABLE
This is the expected output. If it differs, make sure the user is set up correctly. Assuming everything works, delete the table:
mrgeo@accumulo junk> deletetable -t junk
deletetable { junk } (yes|no)? yes
Table: [junk] has been deleted.
mrgeo@accumulo>
If deleting the table fails, make sure the user has the correct system privileges. If the problem persists, something else is wrong in the Accumulo setup, and that is beyond the scope of this page.
At this time, the Accumulo Data Provider will not create a table on its own. The table must exist before data can be ingested into it. The same is true for the mapalgebra capabilities of MrGeo. Create a table:
mrgeo@accumulo> createtable rome
mrgeo@accumulo rome>
The table rome is now available for ingest. Once real MrGeo image data is in the table, the GetCapabilities response of the MrGeo WMS servlet will report rome as a layer.
The Accumulo Data Provider works with locality groups. Inside Accumulo, a locality group stores data together based on column family. The MrGeo Accumulo Data Provider stores the zoom level in the column family, so the locality groups for image data are based on zoom level. It is best to know the maximum zoom level of the image to be ingested, but this step can also be done without that knowledge.
mrgeo@accumulo rome> setgroups 18=18 17=17 16=16 15=15 14=14 13=13 12=12 11=11 10=10 9=9 8=8 7=7 6=6 5=5 4=4 3=3 2=2 1=1 -t rome
mrgeo@accumulo rome> getgroups -t rome
17=17 18=18 15=15 16=16 13=13 14=14 11=11 12=12 3=3 2=2 1=1 10=10 7=7 6=6 5=5 4=4 9=9 8=8
mrgeo@accumulo rome>
NOTE: The most important locality group is the one for the maximum zoom level, because that base zoom level is used for analytics and derived layers. Display can take advantage of locality groups as well, so a possible protective step is to define groups for all possible zoom levels, from 24 down to 1.
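Typing 24 groups by hand is error prone, so here is a minimal sketch that prints the setgroups line for any table and maximum zoom level. It follows the one-group-per-zoom-level naming used in the example above; the function name setgroups_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a setgroups command with one locality group
# per zoom level, from the given maximum zoom level down to 1.
setgroups_command() {
    table="$1"
    max_zoom="$2"
    spec=""
    z="$max_zoom"
    while [ "$z" -ge 1 ]; do
        spec="${spec}${z}=${z} "   # group name = column family = zoom level
        z=$((z - 1))
    done
    printf 'setgroups %s-t %s\n' "$spec" "$table"
}

# Cover every possible zoom level for the table rome.
setgroups_command rome 24
```

The printed line can be pasted into the Accumulo shell while logged in as the mrgeo user.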
The table is now ready for ingest.
For Ingest to work properly, Accumulo must be the preferred data provider for the MrGeo environment. There is a separate document on setting this up for the environment.
Here is an example of a simple ingest into the table "rome" from the source image "Rome":
mrgeo ingest -v -o rome /data/sourceImagery/Rome
If everything is set up correctly, the command will start a MapReduce job. It will also build the "pyramid" for the image in Accumulo. The pyramid step can be skipped by adding "-sp" to the command line.
Note: If there are errors, try to resolve them. On initial installs, or for new users, one issue that can come up is a permissions problem inside HDFS. The Accumulo Data Provider uses HDFS when the job is large and an Accumulo bulk ingest is chosen. This is easily solved: make sure that the user running the job can write to that user's home directory inside HDFS. That will usually be: /user/[username]
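One way to check and, if needed, create that home directory is with the standard "hdfs dfs" commands. The sketch below only prints the commands so they can be reviewed first. The function name hdfs_home_commands is hypothetical, and on most clusters the mkdir and chown steps have to be run by an HDFS administrator rather than by the user running the job.

```shell
#!/bin/sh
# Hypothetical helper: print the HDFS commands that inspect and, if needed,
# create the home directory of the user who will run the ingest.
# Nothing is executed against HDFS here.
hdfs_home_commands() {
    user="$1"
    home="/user/${user}"
    echo "hdfs dfs -ls -d ${home}"            # show owner and permissions
    echo "hdfs dfs -mkdir -p ${home}"         # create it if it is missing
    echo "hdfs dfs -chown ${user} ${home}"    # hand it over to the user
}

# Print the commands for the current user.
hdfs_home_commands "$(id -un)"
```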
Within the MrGeo core, there is a mechanism for passing "Protection Levels" to the data providers. A data provider can ignore it or, as the Accumulo Data Provider does, use it to make sure that data is protected. MrGeo passes the protection level down to the data provider as a string, with no interpretation or alteration.
mrgeo ingest -v -pl 'A|B|C' -o rome /data/sourceImagery/Rome
Here, "-pl" sets the Protection Level for the data. Once the MapReduce ingest job is finished for the base layer, it is recommended to take a quick look inside Accumulo to make sure that the data was ingested correctly and the protections are in place.
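A quick look could be a scan of one zoom level with the authorizations that should be able to see the data. The sketch below prints such a command, assuming the standard Accumulo shell options ("-u" and "-e" on the shell, "-t", "-c" and "-s" on scan). The zoom level 18 is only an assumed value for the image's maximum zoom, and the function name verify_scan_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an Accumulo shell command that scans one zoom
# level (column family) of a table with a given set of authorizations.
verify_scan_command() {
    table="$1"
    zoom="$2"     # column family holding the zoom level to inspect
    auths="$3"    # comma-separated scan authorizations
    printf "accumulo shell -u mrgeo -e 'scan -t %s -c %s -s %s'\n" \
        "$table" "$zoom" "$auths"
}

# Assumed values: table rome, zoom level 18, authorizations A,B,C.
verify_scan_command rome 18 A,B,C
```

In the scan output, each entry should show the protection level in square brackets after the column. Running the same scan without "-s", or with authorizations that do not satisfy the protection level, should not return those entries.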