LSI#build_index is very slow #14
Comments
I forgot to mention: I don't have rb-gsl installed, which is why I'm going down this code path.
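A minimal sketch of how one might check which code path is in play, assuming the optional native bindings are loaded with `require 'gsl'`. The helper name `gsl_available?` is hypothetical and not part of the gem discussed here:

```ruby
# Hypothetical helper: reports whether the optional GSL bindings can be loaded.
# Assumption: the rb-gsl bindings are required under the name 'gsl'.
def gsl_available?
  require 'gsl'
  true
rescue LoadError
  # The bindings are not installed, so a pure-Ruby fallback would be used.
  false
end

puts(gsl_available? ? 'GSL found' : 'GSL not found: pure-Ruby fallback likely')
```

If this prints the second message, the slow pure-Ruby decomposition described in this issue is probably the one being exercised.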
These are valuable findings! The code you mention has been there for a while, and was taken with us when we forked the project. I'll check out the Ruby SVD gem and see if it's good for our purposes – if you get to it before I do, let me know what your findings are!
The Ruby SVD gem seems to be a bit outdated (even more so than the original Classifier), so I don't know if we'd want to use it. It also depends on Then again, the gem is Public Domain, so we might be able to learn a bit from it (though I do see loads of loops in Update: I found SciRuby/nmatrix, which seems to be actively maintained and has a |
nmatrix looks like a great option, and I really like the SciRuby project, but the install process for nmatrix is pretty tricky. This bug on gesvd also concerns me a bit. There's also mdarray, but it's a JRuby project. Otherwise I can't find any active gems with a Singular Value Decomposition method. Thoughts?
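For reference, a rough sketch of what a decomposition routine has to produce, using only Ruby's bundled `matrix` library. It assumes the singular values of a real matrix A are taken as the square roots of the eigenvalues of AᵀA; this is less numerically robust than a dedicated SVD routine, it is not how the gem or any library mentioned here does it, and the helper name `singular_values` is hypothetical:

```ruby
require 'matrix'

# Hypothetical helper: singular values of a real matrix `a`, largest first.
# Technique (an assumption for illustration, not the gem's method):
# square roots of the eigenvalues of a^T * a.
def singular_values(a)
  gram = a.transpose * a # symmetric, so its eigenvalues are real
  gram.eigen.eigenvalues
      .map { |ev| Math.sqrt([ev, 0.0].max) } # clamp tiny negative rounding errors
      .sort
      .reverse
end

a = Matrix[[3.0, 0.0], [0.0, 4.0]]
p singular_values(a)
```

This only yields the singular values, not the U and V matrices that LSI also needs, so it is a way to sanity-check results rather than a replacement.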
If possible, we like to keep gem dependencies at a minimum. If they're easy to use and install and they solve the problem perfectly (these gems we might add to the codebase), then I think it's fine. Otherwise, I'd be skeptical about adding another dependency.
It seems I have a blocking issue with LSI. I wrote about it on Stack Overflow: http://stackoverflow.com/questions/30038899/jekyll-build-stuck-in-rebuilding-index-stage A friend told me to try without LSI, and indeed I no longer have the issue. Any progress on this since November?
@nhoizey Sadly no. We still need to rewrite the SVD, and I haven't had time to relearn that corner of linear algebra to rewrite it. If you know anyone who is both a Rubyist and handy at matrix manipulation, they could be super helpful here! Otherwise I'll try my best to work on it, but I'm a bit swamped right now.
@Ch4s3 OK, thanks for the quick answer! I will do without LSI for now, and watch the issue to know when any progress is made. I will try to find someone who can help: https://twitter.com/nhoizey/status/595618088369938432
Awesome, thanks! If I decide I absolutely can't do it, I'll consider putting a bounty on it down the line.
+1 Thanks, I've been wondering why LSI seems so crazy slow once I get above 100 documents...
Been bitten by this too :)
I just did a bunch of performance improvements for the GSL variant (some for pure Ruby as well) and I thought I'd add some benchmarks here: I index 130 documents in 4 seconds on my 2013 MacBook Pro. If you are having performance issues, you might want to check out the GSL variant, especially once #46 gets merged.
I have code something like this:
...and build_index runs very slowly on lots of items.
I tracked it down to #build_index by disabling auto_rebuild. From there, I tracked it through LSI#build_reduced_matrix, to the monkey-patched extension Matrix#SV_decomp, inside the 3-level nested loop:
Based on the name SV_decomp, I'll hazard a guess that this is supposed to be a Singular Value Decomposition (which I just discovered). A quick search turned up the Ruby-SVD gem, which could be an option.
I don't understand any of the math, or much of this gem's layout yet, but wanted to record my findings.
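To illustrate why a 3-level nested loop gets slow quickly as the number of items grows, here is a small sketch that only counts iterations. It assumes each of the three loops runs roughly n times; it does not reproduce the actual body of SV_decomp, and `triple_loop_steps` is a hypothetical name:

```ruby
# Hypothetical helper: counts the innermost iterations of a generic
# 3-level nested loop over n items (not the real SV_decomp code).
def triple_loop_steps(n)
  steps = 0
  n.times do
    n.times do
      n.times { steps += 1 }
    end
  end
  steps
end

[10, 20, 40].each do |n|
  puts "n=#{n}: #{triple_loop_steps(n)} innermost steps"
end
```

Under that assumption the work grows with the cube of n, so doubling the number of items multiplies the innermost work by eight.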