Issue with Aggregation Sample #35

Open
jpsphar1 opened this issue Dec 2, 2015 · 13 comments

Comments


jpsphar1 commented Dec 2, 2015

I am looking for a little help. I am working through the Taxi Demo aggregation sample on the Hortonworks Sandbox with Ambari and have gotten as far as step 8, although step 7 did not work right away. I originally used the command line to clone the GIS tools for later use, but have since switched to the Ambari GUI to load data and run SQL queries through the Hive editor. If I navigate to the local files I can see the Esri tools for Hadoop directory, as the tutorial suggests, but the data I loaded only worked in step 7 after I put it on HDFS. Anyway, I am running into an error when just trying to create the temporary function: create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin'; The error comes back as follows: H110 Unable to submit statement. Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask [ERROR_STATUS].

Not really understanding the architecture of local vs. hdfs since this is all new to me but would like to finish the tutorial. Any ideas???

Thanks in advance!
Jason

Member

climbage commented Dec 3, 2015

Unfortunately, that error message doesn't give us much to go on. Did you have any issues with the add jar step of the tutorial?

Author

jpsphar1 commented Dec 3, 2015

Yeah, that step would not work either. Originally I followed the steps in https://github.com/Esri/gis-tools-for-hadoop/wiki/GIS-Tools-for-Hadoop-for-Beginners where it directed me to set up a new directory and then clone the gis-tools-for-hadoop GitHub repository. When I log onto the Ambari I can see the cloned directory in the following location on the local files: /root/esri-git/gis-tools-for-hadoop but these tools do not exist on the HDFS, not sure if that is a problem. Once I saw that the jars existed within the cloned repository, I moved on to the next step, assuming I would be OK. If I try to run the add jar commands from the tutorial I just get the following error: H110 Unable to submit statement. Error while processing statement: gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar does not exist [ERROR_STATUS]
It is kind of odd, too, because I have tried to do all of this through the Ambari Hive SQL interface, since many of the commands in the tutorial are not working from the command line within the Hortonworks Sandbox (Oracle VM VirtualBox). I can't even access Hive from the VirtualBox command line. I tried Cygwin too and could not connect from there, so I abandoned that workflow and have just been using the Ambari Hive SQL interface.

I have considered scrapping it all and reinstalling from scratch, but since I am very new to this I also feel that I am just doing something wrong or not understanding the architecture. Maybe something I did in the first tutorial messed something up for the aggregation tutorial. One last note: I was able to do steps 5, 6, and 7 in the aggregation tutorial, but had a ton of errors loading the data in step 7 until I moved the taxi data CSV to the following location on HDFS: user/hive. Then it worked. I then thought that maybe gis-tools-for-hadoop with the needed jars should be in that location too, but I don't think that's right?

Hope this gives a little more detail into the problem I am having and thanks again for any help you might be able to offer.

Jason

Member

climbage commented Dec 3, 2015

> When I log onto the Ambari I can see the cloned directory in the following location on the local files: /root/esri-git/gis-tools-for-hadoop but these tools do not exist on the HDFS, not sure if that is a problem

Shouldn't be a problem. The tutorial doesn't expect you to have anything in HDFS yet.
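
For readers unsure about the local versus HDFS distinction raised above: the sandbox VM has its own local filesystem (where the git clone lives) and a separate HDFS namespace (where the taxi data ended up), and the same path string can exist in one and not the other. A minimal shell sketch for checking which side a path exists on is shown here. The name where_is is a hypothetical helper, and the sketch assumes the standard hdfs dfs -test -e command is available when the hdfs client is installed.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists locally, in HDFS, or both.
where_is() {
    path="$1"
    if [ -e "$path" ]; then
        echo "local: yes"
    else
        echo "local: no"
    fi
    # Only query HDFS when the hdfs client is installed on this machine.
    if command -v hdfs >/dev/null 2>&1; then
        if hdfs dfs -test -e "$path" 2>/dev/null; then
            echo "hdfs: yes"
        else
            echo "hdfs: no"
        fi
    else
        echo "hdfs: client not found"
    fi
}

# Usage on the sandbox (paths taken from this thread):
# where_is /root/esri-git/gis-tools-for-hadoop
# where_is /user/hive
```

On the setup described in this thread, one would expect the first path to exist only locally, which matches the reply above that nothing needs to be in HDFS at that point in the tutorial.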

> If I try to run the add jar commands from the tutorial I just get the following error: H110 Unable to submit statement. Error while processing statement: gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar does not exist [ERROR_STATUS]

Ok, let's try this. Since you cloned the repo into /root/esri-git/gis-tools-for-hadoop, make sure you start Hive while you are in /root/esri-git. All of the paths in the tutorial are relative to that folder.

cd /root/esri-git
hive
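
Because the tutorial's add jar paths are relative, it can help to confirm that the jar resolves from the intended start directory before launching Hive. The sketch below does only that check. The name jar_visible_from is a hypothetical helper, and the jar path is the one quoted in the error message above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a relative jar path resolves
# from the given start directory.
jar_visible_from() {
    start_dir="$1"
    rel_jar="$2"
    # Run in a subshell so the caller's working directory is not changed.
    ( cd "$start_dir" 2>/dev/null && [ -f "$rel_jar" ] )
}

# Usage on the sandbox (paths taken from this thread):
# if jar_visible_from /root/esri-git gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar; then
#     echo "start hive from /root/esri-git"
# else
#     echo "jar not found relative to /root/esri-git"
# fi
```

If the check fails from /root/esri-git, the "does not exist" error is likely to appear again no matter how Hive is started.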

As for the Ambari-HIVE SQL interface... I've not used it, but @smambrose might be able to help you with that.

Author

jpsphar1 commented Dec 3, 2015

Thanks for the tip. Unfortunately, once I change to the /root/esri-git directory and type hive, it runs through a bunch of output and then ends up back at the [root@sandbox esri-git]# prompt, whereas I was expecting to see hive as the new prompt. I am not finding it easy to review the output after I type hive to see whether there are any warnings explaining why it might not start. There is a YARN jar warning, but I don't think that is related.

Member

climbage commented Dec 3, 2015

Yeah... it's tough if you are going through the VM window and you can't SSH in. If you can capture the output in any way, I'd like to see what it's saying.

Author

jpsphar1 commented Dec 3, 2015

OK, for some reason the SSH portion was not set up properly when I installed Cygwin, so I fixed that and then accessed the VM per the instructions using ssh [email protected] -p 2222. I was able to cd to the esri-git folder, and when I ran hive I got the same issue as in the Hortonworks Sandbox window, but this time I was able to copy the messages from Cygwin and paste them into a text file, which I am attaching to this post. Let me know what you think. Thanks!
hive_error.txt

Member

climbage commented Dec 3, 2015

Ah it's a permissions issue. The folder in HDFS /user/root is owned by the user hdfs.

This should change the owner of /user/root to root.

sudo -u hdfs hdfs dfs -chown -R root /user/root

Contributor

ddkaiser commented Dec 3, 2015

jpsphar1, I just ran the simple earthquakes demo on HDP Sandbox v2.3 and it works in the CLI. Note that the demo instructions in this GitHub repository assume that HDFS contains a directory /user/root, and in the sandbox that directory does not exist. We traditionally don't run apps as root in HDP, so the "user directory" for the root user is never needed.

In your case, to run the demos in this github tutorial, you will need to do the following:
sudo -u hdfs hdfs dfs -mkdir /user/root
and then as climbage has mentioned, force the permissions to be correct by running:
sudo -u hdfs hdfs dfs -chown -R root /user/root
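
The two commands above can be wrapped so that they are safe to re-run. The following is a minimal sketch, not part of the tutorial. The names ensure_hdfs_home and run_or_print and the DRY_RUN variable are hypothetical and introduced only for illustration, and the sketch assumes the -p flag of hdfs dfs -mkdir so that an existing directory is not treated as an error.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: make sure /user/<name> exists in HDFS and is owned by <name>.
ensure_hdfs_home() {
    user_name="$1"
    home_dir="/user/$user_name"
    # -p makes mkdir succeed even when the directory already exists.
    run_or_print sudo -u hdfs hdfs dfs -mkdir -p "$home_dir" &&
    run_or_print sudo -u hdfs hdfs dfs -chown -R "$user_name" "$home_dir"
}

# Usage on the sandbox:
# DRY_RUN=1; ensure_hdfs_home root    # print the commands first
# DRY_RUN=0; ensure_hdfs_home root    # then run them
# hdfs dfs -ls /user                  # check the owner column for /user/root
```

Printing the commands first is a design choice for a shared cluster, where a recursive chown on the wrong path would be hard to undo.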

Author

jpsphar1 commented Dec 3, 2015

Awesome!! Thanks ddkaiser and climbage! That worked and now I can access the hive prompt from cygwin. Just tested adding the jar files and it worked too. I will continue with the tutorial and write back if I get stuck. I was about to give up but wanted to learn more about this stuff so thanks again!

Contributor

ddkaiser commented Dec 3, 2015

jpsphar1, note that at this point in time the Ambari Hive View is 'mostly' complete, but is lacking a few things such as registering certain class objects.

You can register UDFs through the 'UDF' tab in the view. This lets you perform the equivalent of the 'create temporary function' command combined with the 'add jar' command. It works for UDFs, but it does not work for making SerDes and InputFormat classes available.
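
For comparison, the command-line equivalent of that 'add jar' plus 'create temporary function' pair can be sketched as a small init script for the Hive CLI. The name write_st_bin_setup is a hypothetical helper. The second jar, spatial-sdk-hadoop.jar, is an assumption based on the tutorial's layout, since this thread only names esri-geometry-api.jar. The function class name is the one quoted in the original question.

```shell
#!/bin/sh
# Hypothetical helper: write the Hive statements that register ST_Bin to a file.
# Assumption: spatial-sdk-hadoop.jar sits next to esri-geometry-api.jar in lib_dir.
write_st_bin_setup() {
    lib_dir="$1"
    out_file="$2"
    cat > "$out_file" <<EOF
add jar ${lib_dir}/esri-geometry-api.jar;
add jar ${lib_dir}/spatial-sdk-hadoop.jar;
create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin';
EOF
}

# Usage on the sandbox; absolute paths avoid depending on the start directory:
# write_st_bin_setup /root/esri-git/gis-tools-for-hadoop/samples/lib /tmp/st_bin_setup.hql
# hive -i /tmp/st_bin_setup.hql    # -i runs the file, then opens the interactive prompt
```

Using hive -i rather than hive -f keeps the session open afterwards, which matters because a temporary function only lasts for the session that created it.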

At a minimum, you will need to load the jars in the path pointed to by the configuration property hive.aux.jars.path.

Google will find lots of results, and I won't paste the entire solution in here, but I did find it covered in this blog post: http://hadoopwrangler.tumblr.com/post/75477568787/update-iicaptain-install-instructions and various other online documentation sites. (You may want to do your own searching as well as refer to docs.hortonworks.com or hive.apache.org docs/wiki).

Note that you will need to configure that property in multiple places: HCatalog (Hive metastore service), HiveServer2 (if using JDBC/ODBC), and hive-site.xml (for the Hive CLI). For the Ambari View, however, you can supply that property in the 'gear tab' of the Hive view.

For now, if you want to try the Hive view approach, since sandbox already has a copy of the jars on all nodes (easy) you can use the gear tab and add this property:
hive.aux.jars.path=/root/esri-git/gis-tools-for-hadoop/samples/lib

You might also want to click the "+ Save Default Settings" button to have the Ambari Hive View retain the path var.
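
Some Hive setups expect hive.aux.jars.path to be a comma-separated list of jar files rather than a directory. Whether the directory form above is accepted in a given version is not confirmed in this thread, so treat that as an assumption. A minimal sketch for building such a list from the lib directory follows, with jars_as_csv as a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print every .jar in a directory as one comma-separated list.
jars_as_csv() {
    dir="$1"
    csv=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jars.
        [ -f "$jar" ] || continue
        csv="${csv:+$csv,}$jar"
    done
    echo "$csv"
}

# Usage on the sandbox (directory taken from this thread):
# jars_as_csv /root/esri-git/gis-tools-for-hadoop/samples/lib
# The printed list can be pasted as the value of hive.aux.jars.path,
# or passed to the Hive CLI:
# hive --auxpath "$(jars_as_csv /root/esri-git/gis-tools-for-hadoop/samples/lib)"
```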

Author

jpsphar1 commented Dec 3, 2015

Thanks ddkaiser! I really appreciate the detailed explanation and may try out that gear tab of the hive view as you have mentioned. A work in progress learning this stuff but glad to see the support is out there.

Author

jpsphar1 commented Dec 8, 2015

Sorry if this is the wrong place to post this question. I had success using some of the tools in the Hadoop Toolbox to convert to JSON and move data onto my HDFS, and now want to try the Execute Workflow tool. I understand that there will be some setup needed to get a job created via Oozie, but I am not sure I see that capability in the Hortonworks Sandbox. Do I need to install Hue, or are there any documentation or tutorials that might help me understand this part of the setup so I can test the tool successfully?

Thanks in advance, Jason

@randallwhitman
Contributor

I do not have a Sandbox, but a couple brief general notes about using Oozie:
