Reduce job runs forever when attempting to process earthquake sample against US Zip5 shapefile #13
Just so you can see it, the Zip5 shapefile in JSON form appears as shown below (truncated mid-geometry):

```json
{
  "fields": [
    {"name": "ZIP",        "alias": "ZIP",        "type": "esriFieldTypeString", "length": 5},
    {"name": "NAME",       "alias": "NAME",       "type": "esriFieldTypeString", "length": 40},
    {"name": "ZIPTYPE",    "alias": "ZIPTYPE",    "type": "esriFieldTypeString", "length": 20},
    {"name": "STATE",      "alias": "STATE",      "type": "esriFieldTypeString", "length": 2},
    {"name": "STATEFIPS",  "alias": "STATEFIPS",  "type": "esriFieldTypeString", "length": 2},
    {"name": "COUNTYFIPS", "alias": "COUNTYFIPS", "type": "esriFieldTypeString", "length": 5},
    {"name": "COUNTYNAME", "alias": "COUNTYNAME", "type": "esriFieldTypeString", "length": 60},
    {"name": "S3DZIP",     "alias": "S3DZIP",     "type": "esriFieldTypeString", "length": 3},
    {"name": "LAT",        "alias": "LAT",        "type": "esriFieldTypeDouble"},
    {"name": "LON",        "alias": "LON",        "type": "esriFieldTypeDouble"},
    {"name": "EMPTYCOL",   "alias": "EMPTYCOL",   "type": "esriFieldTypeString", "length": 5},
    {"name": "TOTRESCNT",  "alias": "TOTRESCNT",  "type": "esriFieldTypeDouble"},
    {"name": "MFDU",       "alias": "MFDU",       "type": "esriFieldTypeDouble"},
    {"name": "SFDU",       "alias": "SFDU",       "type": "esriFieldTypeDouble"},
    {"name": "BOXCNT",     "alias": "BOXCNT",     "type": "esriFieldTypeDouble"},
    {"name": "BIZCNT",     "alias": "BIZCNT",     "type": "esriFieldTypeDouble"},
    {"name": "RELVER",     "alias": "RELVER",     "type": "esriFieldTypeString", "length": 8},
    {"name": "COLOR",      "alias": "COLOR",      "type": "esriFieldTypeDouble"}
  ],
  "hasZ": false,
  "hasM": false,
  "spatialReference": {"wkid": 4326},
  "features": [
    {
      "attributes": {
        "BOXCNT": 0.0,
        "COUNTYFIPS": "56001",
        "NAME": " ",
        "ZIP": "820MX",
        "COLOR": 99.0,
        "COUNTYNAME": "ALBANY",
        "STATEFIPS": "56",
        "TOTRESCNT": 0.0,
        "LON": -105.947315216,
        "RELVER": "1.12.4",
        "EMPTYCOL": " ",
        "BIZCNT": 0.0,
        "STATE": "WY",
        "S3DZIP": "820",
        "LAT": 41.6488456726074,
        "MFDU": 0.0,
        "ZIPTYPE": "FILLER",
        "SFDU": 0.0
      },
      "geometry": {
        "rings": [
          [
            [-105.952858, 41.646918],
            [-105.954499, 41.646716999999995],
            ...
```

---
Hi @jmadison222. A couple of questions...

Getting stuck at 90% seems odd. There isn't any extra verbose logging. Have you looked at the MapReduce logs of your cluster?

---
Thanks for the quick reply! Pulled on 7/3. The query is:

```sql
SELECT zip5_CA.CountyName, count(*) cnt FROM zip5_CA
JOIN earthquakes
WHERE ST_Contains(zip5_CA.boundaryshape, ST_Point(earthquakes.longitude, earthquakes.latitude))
GROUP BY zip5_CA.CountyName
ORDER BY cnt desc;
```

where the zip5_CA table is:

```sql
create table zip5_CA as
```

and the zip5 table is:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS zip5 (
  State string,
  CountyName string,
  Zip string,
  BoundaryShape binary
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/jmxxxx/data/zip5/no_format';
```

The earthquakes table is the one provided in the sample. I did check the logs. The only questionable message is:

This message occurs quite a few times. I'm on CDH4 in case that matters. James

---
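(Aside on what that query is doing: with no `ON` clause, Hive effectively evaluates the join as a cross product and runs `ST_Contains` — a point-in-polygon test — for every polygon/point pair. A toy sketch of that test using ray casting, in standalone Python; this is an illustration, not the Esri geometry library's actual implementation.)

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting point-in-polygon test: count how many polygon
    edges a horizontal ray from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the ray's latitude?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Unit square around the origin
square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
print(point_in_ring(0.0, 0.0, square))   # True
print(point_in_ring(2.0, 0.0, square))   # False
```

Running this once is cheap; the cost in the Hive job comes from running it for every point against every ZIP polygon.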
OK, so basically the same query. The deprecation warning can be ignored. I'm more interested in the logs you see by going through the resource manager, not what you're getting through the Hive command line. What version of Hadoop are you running?

---
Hadoop version is: Hadoop 2.0.0-cdh4.7.0. Forgive my ignorance, but where would I find the resource manager log? Is it at the location associated with hadoop.log.dir when I do "ps -ef | grep jobtracker"? In that location I see these files of interest:

Sorry if I'm in the wrong logging location. Where should I be? Thanks! James

---
That's not exactly what I'm looking for, but no worries. When you run the Hive job, you should be given a URL you can use to track the MapReduce job. Mine looks like this...

and you're looking specifically for this line...

You're using CDH 4.7, so it may actually be running MapReduce v1. Either way, you should see something like this.

---
Found it! Thanks for the detailed instructions. BTW, it finished in 10 hours, but that can't be right. How do I get the log to you in some sane form on this forum? If I paste it, it's a nightmare.

---
Yeah, 10 hours is crazy. When did you pull the spatial-framework library, and how many machines are you running? Surround your log info with ```, like this...

```
Hive> select count(*) from earthquakes;
Total MapReduce jobs = 1
Launching Job 1 out of 1
```

I just updated your comments with the same.

---
Got the libraries on 7/3. Happy to re-pull just to get things off the plate. Thanks for helping the new guy with all these basics too! The log doesn't work well as text, so I'll send it with markup:

---
Oh, sorry, and: 4 nodes, each with 4 cores and 128 GB of memory.

---
Ah, just noticed you already told me when you pulled. I updated the libraries in this repository on 7/7, referencing this blog post. My gut feeling is that you will see things running much faster if you pull the latest.

---
Excellent. I'll get the latest. Stay tuned. BTW, we're seeking to replicate a reverse geocoding process that runs in SAS in 10 hours, processing our telematics data against the USPS Zip5 shapefile. Thus, we have a benchmark to beat. We're hoping for 2 hours. Point is, though: if you have any performance enhancements in the queue and were looking for a customer to run your stuff so you can have one of those "one customer took a job running in X and brought it down to Y" stories that vendors love, I'd be happy to be on the bleeding edge if it gives us speed. But let me try the new code first and let you know.

---
Very cool - thank you. Those are exactly the stories we're looking for.

---
I'd be interested to see how it works with thousands of polygons. You might look into developing a custom MapReduce job like this sample, where you'll have the added benefit of a spatial index on top of the polygons.

---
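(To illustrate why a spatial index helps here: instead of testing every point against every polygon, you first index polygon bounding boxes so each point is only tested against nearby candidates. A toy grid-based index in Python follows; the class and IDs are hypothetical, and this is a simplification of what the sample's QuadTree-style index does.)

```python
from collections import defaultdict

class GridIndex:
    """Toy spatial index: buckets polygon bounding boxes into
    fixed-size grid cells so point lookups only scan nearby polygons."""
    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, poly_id, bbox):
        # Register the polygon in every grid cell its bbox overlaps
        (xmin, ymin, xmax, ymax) = bbox
        cx1, cy1 = self._cell(xmin, ymin)
        cx2, cy2 = self._cell(xmax, ymax)
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                self.cells[(cx, cy)].append(poly_id)

    def candidates(self, x, y):
        # Only polygons whose bbox touches this cell need the
        # expensive point-in-polygon test
        return self.cells.get(self._cell(x, y), [])

index = GridIndex(cell_size=1.0)
index.insert("zip_82070", (-106.0, 41.0, -105.5, 42.0))   # Wyoming-ish bbox
index.insert("zip_90210", (-118.5, 34.0, -118.3, 34.2))   # California-ish bbox
print(index.candidates(-105.9, 41.6))  # ['zip_82070']
```

With ~40,000 ZIP polygons nationwide, each point then needs the full `ST_Contains` test against only a handful of candidates instead of all of them.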
hadoop-0.20, so definitely MR1. Generally, if a reduce stage zooms up to 90% and then stalls for a long time, there could be a horrific shuffle operation in there. There may be potential to implement a combiner (I don't know your use case, and it's not always an option). The combiner would reduce the amount of data to be shuffled prior to reduction, so it generally helps with long-running reduces.

Also, I noticed your spilled records number... Reduce input records are 1:1 with Map output records: 92,634 (good, they should match). Might look at information like this: or this: (or do other similar internet searching about how to manage memory and lower record spillage)

---
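(To illustrate the combiner idea above: a combiner runs the reduce logic locally on each mapper's output before the shuffle, so far fewer records cross the network. A minimal word-count-style sketch in plain Python; the county values are made up and this is not tied to the Hive job itself.)

```python
from collections import Counter

def mapper(records):
    """Emit (key, 1) for every record, like a map phase would."""
    return [(county, 1) for county in records]

def combiner(pairs):
    """Locally pre-aggregate one mapper's output before the shuffle."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

# One mapper's output: 6 records, but only 2 distinct keys
map_output = mapper(["ALBANY", "ALBANY", "KERN", "ALBANY", "KERN", "KERN"])
combined = combiner(map_output)

print(len(map_output))  # 6 records would hit the shuffle without a combiner
print(len(combined))    # 2 records hit the shuffle with one
```

For a GROUP BY with few distinct keys (58 CA counties) and many input rows, this kind of pre-aggregation can shrink the shuffle dramatically.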
@ddkaiser It is MRv1 (from the logs he posted), but he's running Hadoop 2.0 (cdh4.7). cdh4 ships with YARN disabled by default. Also, he's running Hive queries, so I don't know what options he has for tuning the query.

---
Great thoughts. A few things:

Thanks!

---
Not that I know of. We have YARN running on our cluster and it has been fine. I vaguely remember running into some issues upgrading from MRv1 to YARN using CDH 4.1, but they were only configuration issues. Hive is becoming much faster, so if you were to go cutting edge (Hive .13), you may get even better results.

Good strategy. I know that HUE will work (it might need some extra configuration), so hopefully the performance is acceptable. If it isn't, there may also be room for improvement on our side, so you don't have to go with a more complicated route.

---
@climbage |
I'm using Hive to do reverse geocoding (RG). I have the earthquake sample working. I then attempt to do the RG on the earthquake data using the Zip5 shape file for the entire U.S. The mapping step runs in seconds. Then the reduce step runs to over 90% completion in a few seconds, but never finishes from there. Hoping to find out why.
I converted the shapefile using ArcMap, per the instructions here. That worked fine. I also selected just state = 'CA' to make it somewhat manageable, so the RG I'm doing is just against the CA table (but the same problem happens with the whole country).
At a minimum, is there some type of log setting I can set to verbose and then check a log to see what the code is doing?
All assistance appreciated!