Issue with Aggregation Sample #35
Unfortunately that error message doesn't really give us much to go on. Did you have any issues with the
Yeah, that step would not work either. Originally I followed the steps in https://github.com/Esri/gis-tools-for-hadoop/wiki/GIS-Tools-for-Hadoop-for-Beginners, which directed me to set up a new directory and then clone the gis-tools-for-hadoop GitHub repository. When I log onto Ambari I can see the cloned directory on the local filesystem at /root/esri-git/gis-tools-for-hadoop, but these tools do not exist on HDFS; I'm not sure if that is a problem. Once I saw that the jars existed within the cloned repository, I moved on to the next step assuming I would be OK.

If I try to run the add jar commands from the tutorial, I just get the following error:

H110 Unable to submit statement. Error while processing statement: gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar does not exist [ERROR_STATUS]

I have considered wiping everything and reinstalling from scratch, but since I am very new to this stuff, I suspect I am just doing something wrong or not understanding the architecture. Maybe something I did in the first tutorial broke something for the aggregation tutorial.

One last note: I was able to do steps 5, 6, and 7 in the aggregation tutorial, but had a ton of errors loading the data in step 7 until I moved the taxi data CSV to user/hive on HDFS, after which it worked. I then thought maybe gis-tools-for-hadoop with the needed jars should be in that location too, but I don't think that's right. Hope this gives a little more detail about the problem I am having, and thanks again for any help you might be able to offer.

Jason
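A "does not exist" error from ADD JAR usually means the relative path doesn't resolve from the directory Hive was started in. A minimal sketch of building absolute ADD JAR statements instead, assuming the clone location mentioned above (/root/esri-git) and the jar names used by the aggregation sample at the time:

```shell
# Assumed repo location (from the clone path mentioned in this thread).
REPO="${REPO:-/root/esri-git/gis-tools-for-hadoop}"

# Emit ADD JAR statements with absolute paths; paste these into the
# Hive session (or save them to a .sql file for hive -f).
for jar in esri-geometry-api.jar spatial-sdk-hadoop.jar; do
  echo "ADD JAR $REPO/samples/lib/$jar;"
done
```

An absolute path works regardless of which directory hive was launched from, which is why the tutorial's relative paths only work if you start hive from the clone's parent directory.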
Shouldn't be a problem. The tutorial doesn't expect you to have anything in HDFS yet.
Ok, let's try this. Since you cloned the repo into /root/esri-git, change into that directory first and start Hive from there: run `cd /root/esri-git`, then `hive`.

As for the Ambari-Hive SQL interface... I've not used it, but @smambrose might be able to help you with that.
Thanks for the tip, but unfortunately once I change to the /root/esri-git directory and type in hive, it runs through a bunch of output and then just ends back up at the [root@sandbox esri-git]# prompt, whereas I was expecting to see hive as the new prompt. Unfortunately I am not finding it easy to review the messages that scroll by when I type hive, to see if there are any warnings about why it might not start up. There is a YARN jar warning, but I don't think that is related.
Yeah... it's tough if you are going through the VM window and you can't SSH in. If you can capture the output in any way, I'd like to see what it's saying.
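One way to capture what the hive launcher prints (a sketch, not from the original thread): redirect both stdout and stderr through tee so the startup messages land in a file you can share. The log path is an arbitrary choice, and the guard just degrades gracefully on a machine without Hive installed.

```shell
# Capture hive's startup output (stdout + stderr) to a shareable file.
if command -v hive >/dev/null 2>&1; then
  hive 2>&1 | tee /tmp/hive-startup.log
else
  # Fallback so the sketch still does something off the sandbox.
  echo "hive not found on PATH; run this on the sandbox" | tee /tmp/hive-startup.log
fi
```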
Ok, for some reason the SSH portion was not set up properly during the Cygwin install, so I fixed that and then accessed the VM per the instructions, using ssh [email protected] -p 2222. I was able to cd to the esri-git folder, and when I ran hive I got the same issue I got in the Hortonworks Sandbox window, but this time I was able to copy the messages from Cygwin and paste them into a notepad file, which I am attaching to this post. Let me know what you think. Thanks!
Ah, it's a permissions issue with the folder in HDFS. This should change the owner of /user/root:

sudo -u hdfs hdfs dfs -chown -R root /user/root
jpsphar1, I just ran the simple earthquakes demo on HDP Sandbox v2.3 and it works in the CLI. Note that the demo instructions in this GitHub repo assume that HDFS contains a directory /user/root, and in the sandbox that directory does not exist. We traditionally don't run apps as root in HDP, so the "user directory" for the root user is never needed. In your case, to run the demos in this tutorial, you will need to do the following:
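The list of steps did not survive the scrape, but from the surrounding comments the fix amounts to creating /user/root and handing it to the root user. A sketch, shown as a dry run that prints the commands (assumption: an HDP sandbox where the hdfs user is the HDFS superuser); run them verbatim on the sandbox:

```shell
# Commands to create root's HDFS home directory and give root ownership.
# Printed rather than executed here, since they require a Hadoop cluster.
FIX_CMDS='sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown -R root /user/root'
printf '%s\n' "$FIX_CMDS"
```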
Awesome!! Thanks ddkaiser and climbage! That worked, and now I can access the hive prompt from Cygwin. Just tested adding the jar files, and it worked too. I will continue with the tutorial and write back if I get stuck. I was about to give up, but I want to learn more about this stuff, so thanks again!
jpsphar1, note that at this point in time the Ambari Hive View is 'mostly' complete, but it is lacking a few things, such as registering certain class objects. You can register UDF functions through the 'UDF' tab in the view; this allows you to perform the equivalent of the 'create temporary function' command combined with the 'add jar' command. It works for UDFs, but it does not work for making SerDe and InputFormat classes available.

At a minimum, you will need to load the jars from the path pointed to by the property hive.aux.jars.path. Google will find lots of results, and I won't paste the entire solution here, but I did find it covered in this blog post: http://hadoopwrangler.tumblr.com/post/75477568787/update-iicaptain-install-instructions and various other online documentation sites. (You may want to do your own searching as well as refer to docs.hortonworks.com or the hive.apache.org docs/wiki.) Note that you will need to configure that property in multiple places: HCatalog (the Hive metastore service), HiveServer2 (if using JDBC/ODBC), and hive-site.xml (for the Hive CLI). For the Ambari View, however, you can supply that property in the 'gear tab' of the Hive view.

For now, if you want to try the Hive view approach, since the sandbox already has a copy of the jars on all nodes (easy), you can use the gear tab and add this property:

You might also want to click the "+ Save Default Settings" button to have the Ambari Hive View retain the path setting.
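The property value itself did not survive the scrape. For illustration only, a hive-site.xml fragment of the shape being described, where the jar paths are an assumption based on the clone location mentioned earlier in the thread (the actual sandbox paths may differ):

```xml
<!-- Sketch for hive-site.xml or the Hive View 'gear tab':
     a comma-separated list of auxiliary jars to put on Hive's classpath.
     Paths below are assumed, not taken from the original comment. -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///root/esri-git/gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar,file:///root/esri-git/gis-tools-for-hadoop/samples/lib/spatial-sdk-hadoop.jar</value>
</property>
```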
Thanks ddkaiser! I really appreciate the detailed explanation and may try out the gear tab of the Hive view as you mentioned. Learning this stuff is a work in progress, but I'm glad to see the support is out there.
Sorry if this is the wrong area to post this question, but I had success using some of the tools in the Hadoop Toolbox to convert to JSON and move data onto my HDFS, and now I want to try the Execute Workflow tool. I understand there is some setup I have to do in order to get a job created via Oozie, but I don't see that capability in the Hortonworks Sandbox. Do I need to install Hue, or is there documentation or a tutorial that might help me understand this part of the setup so I can test the tool successfully? Thanks in advance, Jason
I do not have a Sandbox, but here are a couple of brief general notes about using Oozie:
Was looking for a little bit of help here: I am working on the taxi demo aggregation sample using the Hortonworks Sandbox with Ambari and have been able to get up to step 8, even though step 7 didn't go as smoothly as expected. I was originally using the command line to clone the GIS tools for later use, but have now switched to the GUI interface in Ambari to load data and run SQL queries through the Hive editor. If I navigate to local files I can see the esri tools for hadoop directory, as the tutorial suggests, but the data that I loaded only worked in step 7 after I put it up on HDFS.

Anyway... I am running into the following error when just trying to create the temporary function:

create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin';

The error comes back with the following:

H110 Unable to submit statement. Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask [ERROR_STATUS]

Not really understanding the architecture of local vs. HDFS, since this is all new to me, but I would like to finish the tutorial. Any ideas???

Thanks in advance!

Jason
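A FunctionTask error on CREATE TEMPORARY FUNCTION usually means the class isn't on Hive's classpath, i.e. the jars were not added in the same session. A sketch that generates a script registering the jars before the function, under the assumptions already noted in this thread (clone at /root/esri-git, and the jar names used by the sample at the time):

```shell
# Assumed jar directory from the clone location mentioned above.
LIB=/root/esri-git/gis-tools-for-hadoop/samples/lib

# ADD JAR must run in the same Hive session, *before* the
# CREATE TEMPORARY FUNCTION statement that references the class.
cat <<SQL > /tmp/register-st-bin.sql
ADD JAR $LIB/esri-geometry-api.jar;
ADD JAR $LIB/spatial-sdk-hadoop.jar;
CREATE TEMPORARY FUNCTION ST_Bin AS 'com.esri.hadoop.hive.ST_Bin';
SQL

# On the sandbox, run it with: hive -f /tmp/register-st-bin.sql
cat /tmp/register-st-bin.sql
```

Note that session-scoped ADD JAR does not help the Ambari Hive View, which is why the hive.aux.jars.path approach was suggested earlier in the thread for that interface.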