LSI#build_index is very slow #14

Closed
danbernier opened this issue Sep 13, 2014 · 13 comments

@danbernier
Contributor

I have code something like this:

lsi = ClassifierReborn::LSI.new(auto_rebuild: false)
data.each do |row|
  lsi.add_item(row['foo'], row['bar'])
end
lsi.build_index

...and build_index runs very slowly on lots of items.

  • With ~10 items, it runs in <1 second
  • With ~20 items, it runs in ~15 seconds
  • With ~30 items, it runs in ~130 seconds
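As a rough sanity check on those numbers, the apparent growth rate can be estimated from the two larger measurements. This is only a sketch: the timings are approximate, two data points are a thin basis, and the helper name `growth_exponent` is made up for illustration. It assumes the run time behaves like t ≈ c·n^k and solves for k.

```ruby
# Timings reported above: item count => seconds.
# The ~10-item run is only given as "<1 second", so it is left out.
timings = { 20 => 15.0, 30 => 130.0 }

# Hypothetical helper: if t ~ c * n**k, then k = log(t2 / t1) / log(n2 / n1).
def growth_exponent(n1, t1, n2, t2)
  Math.log(t2 / t1) / Math.log(n2.to_f / n1)
end

(n1, t1), (n2, t2) = timings.to_a
k = growth_exponent(n1, t1, n2, t2)
puts format('apparent exponent: %.1f', k)
```

The result comes out at a little over 5, which is much steeper than a single O(n^3) pass would suggest. That would be consistent with an outer loop whose iteration count also grows with the input, but these two points cannot confirm it.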

I tracked it down to #build_index by disabling auto_rebuild. From there, I tracked it through LSI#build_reduced_matrix, to the monkey-patched extension Matrix#SV_decomp, inside the 3-level nested loop:

     while true do
       for row in (0...qrot.row_size-1) do
         for col in (1..qrot.row_size-1) do

Based on the name SV_decomp, I'll hazard a guess that this is supposed to be a Singular Value Decomposition (which I just discovered). A quick search turned up the Ruby-SVD gem, which could be an option.

I don't understand any of the math, or much of this gem's layout yet, but wanted to record my findings.
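For readers who are also new to SVD, here is a minimal sketch of what the decomposition produces, using only Ruby's standard `matrix` library. It is not how `Matrix#SV_decomp` in this gem works, and `singular_values` is a hypothetical helper. It relies on the fact that the singular values of A are the square roots of the eigenvalues of AᵀA. Forming AᵀA loses numerical accuracy, so treat this as an illustration rather than a replacement.

```ruby
require 'matrix'

# Hypothetical helper: singular values of `a`, largest first,
# computed from the eigenvalues of a^T * a (illustration only).
def singular_values(a)
  gram = a.transpose * a # symmetric, positive semi-definite
  gram.eigen.eigenvalues
      .map { |ev| Math.sqrt([ev.real, 0.0].max) } # clamp rounding noise below zero
      .sort
      .reverse
end

a = Matrix[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sv = singular_values(a)
puts sv.inspect
```

One property worth checking by hand: the squares of the singular values add up to the sum of the squared entries of the matrix (91 for the example above).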

@danbernier
Contributor Author

I forgot to mention: I don't have rb-gsl installed, which is why I'm going down this code-path.
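To check which code path applies on a given machine, something like the following can be run. It is a generic sketch of the usual Ruby pattern for optional dependencies; `backend_for` is a hypothetical helper, and the assumption that the gem picks its backend by whether `require 'gsl'` succeeds is mine, not something confirmed in this thread.

```ruby
# Hypothetical helper: :native if `feature` can be required, :pure_ruby otherwise.
def backend_for(feature)
  require feature
  :native
rescue LoadError
  :pure_ruby
end

# 'gsl' is assumed to be the feature name that rb-gsl provides.
puts backend_for('gsl')
```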

@parkr
Member

parkr commented Sep 13, 2014

These are valuable findings! The code you mention has been there for a while, and was taken with us when we forked the project. I'll check out the Ruby SVD gem and see if it's good for our purposes – if you get to it before I do, let me know what your findings are!

@alfredxing
Member

The Ruby SVD gem seems to be a bit outdated (even more so than the original Classifier), so I don't know if we'd want to use it. It also depends on mathn, which was what broke Kramdown in the first place...

Then again, the gem is Public Domain, so we might be able to learn a bit from it (though I do see loads of loops in ext/svd.c).

Update: I found SciRuby/nmatrix, which seems to be actively maintained and has a gesvd function.

@parkr parkr added the bug label Nov 8, 2014
@Ch4s3
Member

Ch4s3 commented Nov 14, 2014

nmatrix looks like a great option, and I really like the SciRuby project, but the install process for nmatrix is pretty tricky. This bug on gesvd also concerns me a bit. There's also mdarray, but it's a JRuby project. Otherwise I can't find any active gems with a Singular Value Decomposition method.

Thoughts?

@parkr
Member

parkr commented Nov 14, 2014

If possible, we like to keep gem dependencies to a minimum. If a gem is easy to use and install and solves the problem perfectly, then I think it's fine to add it to the codebase. Otherwise, I'd be skeptical about adding another dependency.

@nhoizey

nhoizey commented May 4, 2015

It seems I have a blocking issue with LSI.

I wrote about it on Stack Overflow: http://stackoverflow.com/questions/30038899/jekyll-build-stuck-in-rebuilding-index-stage

A friend told me to try without LSI, and indeed I no longer have the issue.

Any progress on this since November?

@Ch4s3
Member

Ch4s3 commented May 5, 2015

@nhoizey Sadly no. We still need to rewrite the SVD, and I haven't had the time to relearn that corner of linear algebra to rewrite it. If you know anyone who is both a Rubyist and handy at matrix manipulation, they could be super helpful here!

Otherwise I'll try my best to work on it, but I'm a bit swamped right now.

@nhoizey

nhoizey commented May 5, 2015

@Ch4s3 OK, thanks for the quick answer! I will do without LSI for now, and watch the issue to know when any progress is made. I will try to find someone who can help: https://twitter.com/nhoizey/status/595618088369938432

@Ch4s3
Member

Ch4s3 commented May 5, 2015

Awesome, thanks! If I decide I absolutely can't do it, I'll consider putting a bounty on it down the line.

@inspire22

+1 Thanks, I've been wondering why LSI seems so crazy slow once I get above 100 documents...

@jayniz

jayniz commented Sep 9, 2015

Been bitten by this too :)

@kreynolds
Contributor

I just did a bunch of performance improvements for the GSL variant (some for pure Ruby as well) and I thought I'd add some benchmarks here: I index 130 documents in 4 seconds on my 2013 MacBook Pro. If you are having performance issues, you might want to check out the GSL variant, especially once #46 gets merged.
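For anyone who wants to compare numbers on their own data, a small timing harness along these lines may help. It is a sketch, not the benchmark used for the figure above: `time_by_size` is a hypothetical helper, and the workload in the block is a stand-in so the snippet runs without the gem installed.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block once per size and
# returns { size => elapsed seconds }.
def time_by_size(sizes)
  sizes.to_h { |n| [n, Benchmark.realtime { yield n }] }
end

# Stand-in workload. With the gem installed, the block would instead
# build an index over the first n documents, for example:
#   lsi = ClassifierReborn::LSI.new(auto_rebuild: false)
#   docs.first(n).each { |doc| lsi.add_item(doc) }
#   lsi.build_index
results = time_by_size([10, 20, 40]) { |n| (1..n).map { |i| i * i }.sum }
results.each { |n, secs| puts format('%4d items: %.4fs', n, secs) }
```

Running it with and without rb-gsl installed should show whether the GSL path helps for a particular corpus.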

@Ch4s3
Member

Ch4s3 commented Dec 30, 2016

Based on #46 and the need to do the rewrite #30 that may or may not happen, I'm calling this closed.

@Ch4s3 Ch4s3 closed this as completed Dec 30, 2016
8 participants