RFTrainer causes error about random number generator #1
Comments
thanks for the report, i'll look into it, but it will take some time. sorry for the trouble. i didn't even know anyone was using this package ;)
Thanks. I have been thinking about this some more. I expect it will be very difficult to use the RNG from R for a parallel program. In Writing R Extensions it says:
I am not sure if the RNGs are thread-safe. However, I have also been wondering where the requirement to use R's RNG comes from. The best I could find is the following:
(https://cran.rstudio.com/doc/manuals/r-release/R-exts.html#Writing-portable-packages, emphasis is mine) In addition, there exist packages like https://cran.r-project.org/package=RcppZiggurat that include different RNGs. I am not sure what to make of this.
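If R's RNG does turn out not to be thread-safe, one possible workaround is to draw every random number serially on the calling thread and let the parallel loop only read from that pool. The following is a minimal sketch of that pattern, not code from RcppShark or Shark: `predraw` and `parallel_slice_sums` are hypothetical names, and the `draw` callback is a stand-in for whatever serial source would be used in practice (for example a wrapper around R's `unif_rand()`, which is an assumption about how this would be wired up).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Draw `count` values serially from a single source that may not be
// thread-safe. `draw` is only ever called from the calling thread.
std::vector<double> predraw(std::size_t count,
                            const std::function<double()>& draw) {
    std::vector<double> pool(count);
    for (double& x : pool) x = draw();
    return pool;
}

// Hypothetical parallel task: worker w sums its own disjoint slice of the
// pre-drawn pool. No worker touches the random source itself.
std::vector<double> parallel_slice_sums(const std::vector<double>& pool,
                                        std::size_t n_workers) {
    assert(n_workers > 0 && pool.size() % n_workers == 0);
    const std::size_t per_worker = pool.size() / n_workers;
    std::vector<double> sums(n_workers, 0.0);

    // The pragma is ignored (the loop runs serially) when the code is
    // compiled without OpenMP support.
    #pragma omp parallel for
    for (long w = 0; w < static_cast<long>(n_workers); ++w) {
        const std::size_t begin = static_cast<std::size_t>(w) * per_worker;
        double s = 0.0;
        for (std::size_t i = 0; i < per_worker; ++i) s += pool[begin + i];
        sums[static_cast<std::size_t>(w)] = s;
    }
    return sums;
}
```

The obvious cost is that the number of draws must be known, or at least bounded, before the parallel region starts, which may not hold for a random forest trainer.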
hard to tell. i'd first focus on just replacing all the RNGs; then at least it would run and we can test it. is the random forest feature important to you?
For me it is/was a way of learning about extending R. I will try to get the OP from stackoverflow into the loop.
I asked the question on stackoverflow that started this thread. I needed this RCppShark random forest implementation because of a benchmark of random forest C++ implementations that I did. I am a software developer who works with our AI team. They develop their algorithm in R, and I need to implement it in C++. I benchmarked several C++ implementations against the R Ranger package implementation. It seems that the Shark random forest C++ implementation is very accurate but takes too much time to run. So we wanted to check the R interface of Shark to see if I did something wrong in wrapping the Shark C++ implementation.
thanks for the info. i have not used the RFTrainer in shark yet, and from the above-mentioned issue i think your observations are accurate. so is there still a need from your side to get it wrapped? as said, it will take some time, as i am currently not able to work on it. maybe some time next month. if not, i'd rather just wait for the faster RFTrainer implementation that is mentioned in the issue (though this might take much longer?)
This issue is a result of investigating https://stackoverflow.com/questions/45455318/rcppshark-random-forest-example-throws-exception-about-the-random-number-generat
When one tries to follow the example in https://www.2021.ai/randsharkmachinelearning/, one ends up with the error message:
This error message is produced here: https://github.com/aydindemircioglu/RcppShark/blob/master/src/shark/Rng/Runif.h#L71
If I run R in gdb, I get the following backtrace at that point:
I have tried to go from `SHARK_PARALLEL_FOR` to a normal `for` and use `shark::Rng::globalRng` inside of `RFTrainer::train`, but that only changed the point where the error was thrown.
AFAIK, R extensions must not produce their own random numbers but must use the random number generator configured in R. I don't know how to interface this properly with parallel processing, where independent streams of random numbers are needed. Setting consecutive seeds is probably not the best way to do this. I know that in R one would normally use an approach like the one described in section 6 of https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf.
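To make the "independent streams" idea concrete, here is a minimal sketch in standard C++ of giving each worker its own engine derived from one master seed, instead of seeding workers with consecutive integers. It is an illustration under assumptions, not Shark's or R's mechanism: `make_worker_rngs` and `draw_per_worker` are hypothetical names, and `std::seed_seq` only mixes the seed material so that overlapping streams are unlikely. It does not give the separation guarantees of a dedicated multi-stream generator such as the one described in the `parallel` vignette, and it does not use R's RNG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build one engine per worker from a single master seed. std::seed_seq mixes
// (master_seed, worker index), so neighbouring workers do not simply start
// from neighbouring seeds.
std::vector<std::mt19937_64> make_worker_rngs(std::uint32_t master_seed,
                                              std::size_t n_workers) {
    assert(n_workers > 0);
    std::vector<std::mt19937_64> rngs;
    rngs.reserve(n_workers);
    for (std::size_t w = 0; w < n_workers; ++w) {
        std::seed_seq seq{master_seed, static_cast<std::uint32_t>(w)};
        rngs.emplace_back(seq);
    }
    return rngs;
}

// Each worker draws one uniform number from its own engine. Engines are
// never shared between workers, so no locking is needed.
std::vector<double> draw_per_worker(std::uint32_t master_seed,
                                    std::size_t n_workers) {
    std::vector<std::mt19937_64> rngs = make_worker_rngs(master_seed, n_workers);
    std::vector<double> out(n_workers);

    // The pragma is ignored (the loop runs serially) when the code is
    // compiled without OpenMP support.
    #pragma omp parallel for
    for (long w = 0; w < static_cast<long>(n_workers); ++w) {
        const std::size_t i = static_cast<std::size_t>(w);
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        out[i] = unif(rngs[i]);
    }
    return out;
}
```

Because every engine is derived deterministically from `master_seed` and the worker index, results are reproducible for a fixed seed and worker count, regardless of how the loop iterations are scheduled across threads.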