linear regression #9

Open
achaubal opened this issue Feb 3, 2012 · 3 comments

Comments

@achaubal

achaubal commented Feb 3, 2012

Hi Harish,

I am trying linear regression. This works:
from <select acct_no, mydt, x, y
from mytab>
partition by acct_no
order by acct_no,mydt
with
linearRegSlope(x,y) as slope,
linearRegIntercept(x,y) as intercept
select acct_no, slope,intercept
into path='ma_9_2011_08';

But this does not. It tries to run the regression over ALL the records, using a constant column as the only partition:
from <select 1 as blah,acct_no, mydt, x, y
from mytab>
partition by blah
order by blah
with
linearRegSlope(x,y) as slope,
linearRegIntercept(x,y) as intercept
select blah, slope,intercept
into path='ma_9_2011_08';

Error:

12/02/03 12:24:55 INFO mapred.JobClient: Task Id : attempt_201201151510_0220_r_000002_3, Status : FAILED
java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1139)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.security.PrivilegedActionException: com.sap.hadoop.ds.list.ByteBasedList$ListFullException
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
... 1 more
Caused by: com.sap.hadoop.ds.list.ByteBasedList$ListFullException
at com.sap.hadoop.ds.list.ByteBasedList.ensureCapacity(ByteBasedList.java:80)
at com.sap.hadoop.ds.list.ByteBasedList.write(ByteBasedList.java:108)
at com.sap.hadoop.ds.list.ByteBasedList.append(ByteBasedList.java:158)
at com.sap.hadoop.ds.list.ByteBasedList$append.call(Unknown Source)
at com.sap.hadoop.windowing.runtime.Partition.leftShift(Partition.groovy:100)
at com.sap.hadoop.windowing.runtime.Partit
attempt_201201151510_0220_r_000002_3: Query:
attempt_201201151510_0220_r_000002_3: tableInput=(hiveTable=WindowingTempTable_1328289790395)
attempt_201201151510_0220_r_000002_3: partitionColumns=all_in_one
attempt_201201151510_0220_r_000002_3: orderColumns=all_in_one ASC
attempt_201201151510_0220_r_000002_3: funcSpecs=[linearregslope(alias=slope, param=[id=bur_fico_scor_no, id=cv_score], type=null, window=null),
attempt_201201151510_0220_r_000002_3: linearregintercept(alias=intercept, param=[id=bur_fico_scor_no, id=cv_score], type=null, window=null)]
attempt_201201151510_0220_r_000002_3: select=slope, intercept
attempt_201201151510_0220_r_000002_3: whereExpr=null

@hbutani
Owner

hbutani commented Feb 3, 2012

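You have hit the default partition size limit. :) The relevant line in the trace is:
Caused by: java.security.PrivilegedActionException: com.sap.hadoop.ds.list.ByteBasedList$ListFullException

With a constant partition column, every row lands in the same partition, so it outgrows the default. Set the 'com.sap.hadoop.windowing.partition.memory.size' property to a larger value, as we discussed in the earlier issue today.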

@achaubal
Author

achaubal commented Feb 3, 2012

OK, any pointers on how to determine that size?

@hbutani
Owner

hbutani commented Feb 3, 2012

A rough way to calculate it:
partition size = number of rows in the partition * size of a row

size of a row = 8 * number of double columns in the query + 4 * number of int columns + ...

The numerics may take up more space than this, because they are held as Writables. It also depends on the table's SerDe: if variable-length datatypes (VLong, VInt, etc.) are used, they could take considerably less space.
String columns are hard to estimate. You will have to run some queries to get their average size. As far as I know, Hive (at least 0.7.1) doesn't collect statistics on tables.
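
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the windowing library: the function names, the `overhead_factor` knob (a stand-in for Writable overhead), and the example row count are all assumptions, and the average string size has to come from your own queries against the table.

```python
# Rough estimate of the bytes one partition needs: rows * bytes per row.
# Per-type sizes follow the rule of thumb above (8 per double, 4 per int).
# All names here are hypothetical helpers, not part of any library.

def estimate_row_bytes(num_doubles, num_ints, num_strings=0, avg_string_bytes=0):
    """Approximate serialized size of one row, in bytes."""
    return 8 * num_doubles + 4 * num_ints + num_strings * avg_string_bytes


def estimate_partition_bytes(num_rows, row_bytes, overhead_factor=1.0):
    """Approximate size of a whole partition, in bytes.

    overhead_factor is an assumed fudge factor for per-value overhead;
    1.0 means no extra allowance.
    """
    return int(num_rows * row_bytes * overhead_factor)


if __name__ == "__main__":
    # Made-up example: 10 million rows with two double columns and one int.
    row_bytes = estimate_row_bytes(num_doubles=2, num_ints=1)
    print(row_bytes)
    print(estimate_partition_bytes(10_000_000, row_bytes))
```

Treat the result as a lower bound and round up generously when picking the property value, since the actual footprint depends on the SerDe and on how the values are held in memory.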
