How to pass an object to RUDF in hive #73

Open
rajasekhariitbbs opened this issue Nov 4, 2014 · 5 comments

Comments

@rajasekhariitbbs

Working Function

Minimum = function(column1,column2){
min(column1,column2)
}

Not Working

a=-10000
Minimum = function(column1,column2){
min(column1,column2,a)
}

How can I pass an object, data frame, or function into an RHive UDF?

Thanks
Raja Sekhar

@ssshow16
Contributor

ssshow16 commented Nov 4, 2014

If you only need to pass a value into the RHive UDF, add it as an extra argument:

Minimum = function(column1, column2, a){
min(column1, column2, a)
}

rhive.query("select R('Minimum',col1,col2,-10000,0.0) from table_name")
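If the constant lives in an R variable on the client side, one way to get it into the query is to build the query string from it. This is only a sketch: `build_minimum_query` is a hypothetical helper, not part of RHive, and the trailing `0.0` is copied unchanged from the query above.

```r
# Hypothetical helper: splice a numeric constant into the R() call
# as an extra UDF argument. Not part of RHive.
build_minimum_query <- function(a, table = "table_name") {
  stopifnot(is.numeric(a), length(a) == 1)
  paste0(
    "select R('Minimum', col1, col2, ",
    format(a, scientific = FALSE),  # avoid forms like 1e+05 in the SQL text
    ", 0.0) from ", table
  )
}

a <- -10000
query <- build_minimum_query(a)
# rhive.query(query)  # needs a live RHive connection
```

The value travels inside the query text, so the UDF does not depend on anything in the local R session.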

However, RHive only exports the UDF itself, and the UDF is executed on
each DataNode, so it cannot reference functions, objects, or data frames
from your local session. If the UDF needs another function, define it as
an inner function, like this:

Minimum = function(column1, column2, a){
min(column1, column2, a)
sub_func <- function(a,b){
....
}
sub_func(column1, column2)
}
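
Before exporting, it can help to check whether a function still depends on anything outside itself. The sketch below uses `findGlobals` from the `codetools` package (a recommended package shipped with standard R installs); `free_variables` and the two function names are hypothetical, chosen only for illustration.

```r
library(codetools)

# Variant from the question: `a` is neither an argument nor defined in the body.
minimum_with_global <- function(column1, column2) {
  min(column1, column2, a)
}

# Variant from the reply: `a` arrives as an argument.
minimum_with_arg <- function(column1, column2, a) {
  min(column1, column2, a)
}

# Hypothetical helper: names of variables the function expects to find
# outside its own arguments and body.
free_variables <- function(f) {
  findGlobals(f, merge = FALSE)$variables
}

free_variables(minimum_with_global)  # reports "a"
free_variables(minimum_with_arg)     # reports nothing
```

A non-empty result means the function relies on something that will not be available unless it is passed in, defined inside the function, or exported together with it.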

Thanks.


@rajasekhariitbbs
Author

This is the output of rhive.env():
rhive.env()
hadoop home: /home/training/hadoop-2.4.0/
hadoop conf: /home/training/hadoop-2.4.0/etc/hadoop
fs: hdfs://localhost:9000
hive home: /home/training/hive/
user name: training
user home: /home/training
temp dir: /tmp/training

Select queries work fine.

######### User Defined Functions ########
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient',coefficient)
rhive.assign('scoring',scoring)
rhive.export('scoring')
rhive.export('coefficient')

The rhive.export calls above save the files under /rhive/udf/training. Is that correct?

@ssshow16
Contributor

ssshow16 commented Nov 4, 2014

You cannot assign and export the variable 'coefficient'.

Exported files are saved in HDFS under /rhive/udf/{user}.


@rajasekhariitbbs
Author

In my case the file is saved to the local file system, and I don't know why.
For now, if I manually copy the .RDA file from the local file system to HDFS, the code works.

The example code at the URL below also exports the coefficient:
https://github.com/nexr/RHive/wiki/RHive-example-code

@ssshow16
Contributor

ssshow16 commented Nov 5, 2014

An RHive UDF references the R object (*.RData) named by the first parameter of R() in the query:

rhive.query("select R('scoring',col_sal,0.0) from emp")

In your case, two R objects are saved in HDFS, because you called
rhive.export() once for each object:

  • coefficient.RData
  • scoring.RData

So the scoring function in scoring.RData cannot reference the coefficient
value in coefficient.RData.
In this case, call rhive.exportAll('scoring') instead.
rhive.exportAll() saves all R objects into scoring.RData.

Please try again.
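
The difference between one file per object and one combined file can be reproduced locally with base R's save() and load(), with no cluster involved. This is a simulation under assumptions, not RHive's actual mechanism: `load_as_worker` is a hypothetical helper that loads a file into an isolated environment standing in for the R session on a DataNode.

```r
# Client-side definitions, as in the thread.
coefficient <- 1.1
scoring <- function(sal) {
  coefficient * sal
}

# Hypothetical helper: load an .RData file into a fresh environment that
# stands in for a worker's R session. Loaded functions are re-homed there,
# so variable lookups cannot reach this session's globals.
load_as_worker <- function(path) {
  worker <- new.env(parent = baseenv())
  for (name in load(path, envir = worker)) {
    obj <- get(name, envir = worker)
    if (is.function(obj)) {
      environment(obj) <- worker
      assign(name, obj, envir = worker)
    }
  }
  worker
}

# Case 1: the function is saved on its own.
only_function <- tempfile(fileext = ".RData")
save(scoring, file = only_function)
w1 <- load_as_worker(only_function)
alone <- tryCatch(w1$scoring(1000), error = function(e) conditionMessage(e))

# Case 2: the function and the value it needs are saved into one file.
together <- tempfile(fileext = ".RData")
save(scoring, coefficient, file = together)
w2 <- load_as_worker(together)
bundled <- w2$scoring(1000)
```

In the first case `alone` holds an error message about `coefficient` not being found. In the second case `scoring` can see `coefficient` because both were loaded from the same file.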

