Rewrite the SVD #30
Comments
@parkr The more I read about this, the more I think a pure Ruby implementation is always going to give us poor performance. It might make sense to bundle a C solution into the gem, since there are a lot of great existing solutions. Then we just check that it builds correctly on various platforms. I'm not married to this approach, but it could be the right way forward. Thoughts? |
That seems like a good plan. |
Ok. I'll devise a plan tonight and scout out implementations. It'll be good to wrap this one up. |
Sounds terrific. Thanks, Chase! |
Not a problem. I want to wrap up a few things in a row and get a release out in the next week(ish). |
As long as the license is compatible and GitHub can use it for commercial purposes, it's fine. I usually lean toward MIT. Can MIT projects contain GNU code? I'm not sure. |
I'll read up |
Ok, I've tried a few things and have it sort of working, but it segfaults on rare occasions. Still a WIP. |
Sweet! When is it segfaulting? Is the C still optional? |
It segfaults while transforming the matrix for some inputs. It could be optional. |
Any news here? :) |
@jayniz I had to take a break from this for a bit, but I'll try to get back to it soon. Basically I found 1 or 2 promising implementations, but they both segfault for some input. I'm not sure why yet so I haven't gotten past that. I'll do my best to figure it out soon, but I'm open to people helping out. |
Hey @Ch4s3 no problemo :-) The bayes classifier is working and goes into production today. I played around with LSI locally, and I got the above errors for my inputs. Not with the GSL lib and gem though, that worked. But where training the bayes with 6k inputs took ~8 sec, LSI took a couple of minutes for 600 inputs (and with 6k inputs it's still running after 30h) on a recent 13" MBP with 16 GB of RAM. In my benchmarks the bayes classifier performed quite well at detecting comment spam though: trained with 6k comments, I had it classify ~80k comments and it classified correctly in 94.5% of the cases, with 4.75% false negatives and 0.75% false positives. Let's see how it behaves in production! |
@jayniz Awesome to hear that the bayes classifier is working! I'll keep working on the SVD. |
@jayniz If you can provide me with sample data that worked with the GSL but broke with the Ruby implementation, that would be super helpful! |
@Ch4s3 not straight away, I'd have to run it on some data to see if it crashes first :-) if you let me know which implementation I should try to crash, I can try to find some time this week to make it crash and then send you the data that crashed it? |
@jayniz Awesome, see if you can blow up the native Ruby implementation. I found an LGPL implementation of the SVD in C and I'm writing a wrapper for a C extension, but it won't be done straight away. Any input that crashes other implementations will be good for testing. |
Ah, IIRC it was the LSI that I had issues with. Hmm....
|
ahh sorry, the LSI feature uses the SVD under the hood, and the ruby svd is what makes it slow vs using GSL. See this line. |
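A rough way to see how the cost of a pure Ruby decomposition grows with input size is to time it on a few matrix sizes. The sketch below is only an illustration under stated assumptions: it times the eigendecomposition from Ruby's stdlib `Matrix` as a stand-in workload, not the gem's own SVD code, and `time_decomposition` is a hypothetical helper name.

```ruby
require 'matrix'
require 'benchmark'

# Hypothetical helper: times a stand-in workload (stdlib eigendecomposition
# of a random symmetric n x n matrix). This is NOT the gem's SVD code.
def time_decomposition(n, seed: 42)
  rng = Random.new(seed)
  a = Matrix.build(n, n) { rng.rand }
  symmetric = a.transpose * a # A^T A is symmetric by construction
  Benchmark.realtime { symmetric.eigen }
end

# Sizes are arbitrary and kept small so the sketch runs quickly.
timings = [10, 20, 40].map { |n| [n, time_decomposition(n)] }
timings.each { |n, seconds| puts format('n=%-3d %.4fs', n, seconds) }
```

Absolute numbers depend on the machine and Ruby version, so only the trend across sizes is meaningful.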
Oops, I misread SVD as the bayes classifier in my head. I don't know how, but I did. Will try to make it crash and let you know. If I don't get back to you this week, I have forgotten and will appreciate a gentle knock on the head.
|
No worries. I probably won't have the c extension done by the weekend anyway. |
So after a few tries, I can't get the C extension right. If anyone else wants to try, I can push up what I tried. Maybe it would make sense to use Helix to wrap a Rust SVD lib. That way we could distribute the binary and have some minimal guarantees about safety. |
@Ch4s3 Hey, I love that you brought Rust in on this. I'd like to provide some resources that may be helpful in getting this done. I'm the author of faster_path, which rewrites Ruby's Pathname library in Rust to improve the performance of Rails by 30%+. I see you're using Thermite. That's next on my agenda for faster_path, as it will allow binary builds of the Rust dynamic library to be served from a host, so Ruby users won't need to have Rust installed. My current focus is to have my test suite prove that cross-platform Rust compilation works, since Mac OS and a few Linux distros skip the Rust library build process. As for the helpful resources: the code base for faster_path has plenty of working Rust code running under Ruby. The next two are an article I wrote, "Coming to Rust from Ruby", and the documentation from my first week learning Rust, "Getting started in Rust". Also, when looking up Rust methods, searching the internet is actually a more difficult way to find answers; the best answers come directly from the documentation provided in the Dash app. Searching for methods in Dash has been the quickest and most accurate tool for the job. An alternative to Dash is Zeal, which is the open source version of it. |
@danielpclark Have you moved to thermite yet? I'm looking to pick this back up. |
@Ch4s3 No… The author updated his PR to be current with the code base, but the CI test results are intermittent/flaky. |
Yeah, I saw that. I need to get back to the Rust book and try this again. |
Hello,
but after a specific number of added items I am getting an error. I am not sure why this error is raised for a string rather than for other input. Any solution? |
@Ch4s3 I've got Thermite integrated now. I've submitted a bunch of PRs to ruru that give you most of Ruby's native features (they've been sitting unnoticed for a while) and I've merged them all into my own fork https://github.com/danielpclark/ruru/tree/playground if you want to try them out. I'm building FasterPath directly from it so that repo branch is here to stay. I have pretty much mastered Rust to Ruby integration. Ruru doesn't support splat operators for parameters yet so I wrote code that's a little more bare metal here if you'd like to see it working with any number of parameters of input. Other than that the rest of how to do it should be somewhat easy to see from FasterPath. |
@danielpclark I'll take a look as soon as I can, but it probably won't be until early Summer. Thanks for giving me the heads up! |
As discussed in #27 we need to rewrite the SVD method here. This could also be used as an opportunity to remove the monkey patch on matrix, and provide a method like svd(matrix). This will clear up #5 as well. Here's a resource I found on SVD. I'll definitely need help on this one.
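
To make the svd(matrix) idea concrete, here is a minimal sketch of a standalone function that takes a matrix as an argument instead of patching `Matrix`. Everything in it is an assumption for illustration: the module name `SVDSketch`, the `epsilon` parameter, and the return shape are hypothetical, and it derives the SVD from the stdlib eigendecomposition of A^T A. That approach is simple but less numerically stable than a dedicated SVD routine, so treat it as a picture of the API shape rather than a proposed implementation.

```ruby
require 'matrix'

# Hypothetical module name; not part of the gem's actual API.
module SVDSketch
  module_function

  # Returns [u, s, v] with matrix ~= u * Matrix.diagonal(*s) * v.transpose.
  # Assumption: uses the eigendecomposition of A^T A from Ruby's stdlib,
  # which squares the condition number and is shown only for illustration.
  def svd(matrix, epsilon = 1e-10)
    a = matrix.map(&:to_f)
    eig = (a.transpose * a).eigen

    # Pair each eigenvalue with its eigenvector, largest eigenvalue first.
    pairs = eig.eigenvalues.zip(eig.eigenvectors).sort_by { |value, _| -value }

    # Singular values are the square roots of the eigenvalues of A^T A.
    s = pairs.map { |value, _| Math.sqrt([value, 0.0].max) }
    v_cols = pairs.map { |_, vec| vec }

    # Each left singular vector is A * v_i / s_i; zero-fill when s_i ~ 0.
    u_cols = v_cols.each_with_index.map do |vec, i|
      s[i] > epsilon ? (a * vec) / s[i] : Vector.zero(a.row_count)
    end

    [Matrix.columns(u_cols.map(&:to_a)), s, Matrix.columns(v_cols.map(&:to_a))]
  end
end

# Usage: no monkey patch needed, the matrix is just an argument.
a = Matrix[[1, 2], [3, 4], [5, 6]]
u, s, v = SVDSketch.svd(a)
reconstructed = u * Matrix.diagonal(*s) * v.transpose
```

Keeping the function in its own module means callers opt in explicitly, and a faster backend (GSL, a C extension, or Rust) could be swapped in behind the same signature.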