LSI#build_index is very slow #14

Closed
@danbernier

Description

I have code something like this:

lsi = ClassifierReborn::LSI.new(auto_rebuild: false)
data.each do |row|
  lsi.add_item(row['foo'], row['bar'])
end
lsi.build_index

...and build_index gets dramatically slower as the number of items grows:

  • With ~10 items, it runs in <1 second
  • With ~20 items, it runs in ~15 seconds
  • With ~30 items, it runs in ~130 seconds
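
For anyone who wants to reproduce timings like the ones above, here is a minimal sketch of a timing helper. The name `time_by_size` is hypothetical (it is not part of classifier-reborn), and the commented usage assumes the same `data` rows as in the snippet at the top, which are not shown in this issue.

```ruby
# Times a block once per input size and returns { size => elapsed_seconds }.
# `time_by_size` is a hypothetical helper, not part of classifier-reborn.
def time_by_size(sizes)
  sizes.each_with_object({}) do |n, timings|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield n
    timings[n] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Possible usage against the gem (needs classifier-reborn and the reporter's
# `data`; note the measurement also includes the add_item calls):
#
#   timings = time_by_size([10, 20, 30]) do |n|
#     lsi = ClassifierReborn::LSI.new(auto_rebuild: false)
#     data.first(n).each { |row| lsi.add_item(row['foo'], row['bar']) }
#     lsi.build_index
#   end
#   timings.each { |n, secs| puts format('%3d items: %.2fs', n, secs) }
```

Running it at a few sizes should make it easy to see whether the time really grows as steeply as the numbers above suggest.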

I narrowed it down to #build_index by disabling auto_rebuild. From there, I traced it through LSI#build_reduced_matrix to the monkey-patched extension Matrix#SV_decomp, and specifically to its 3-level nested loop:

     while true do
       for row in (0...qrot.row_size-1) do
         for col in (1..qrot.row_size-1) do

Based on the name SV_decomp, I'll hazard a guess that this is supposed to be a Singular Value Decomposition (which I only just learned about). A quick search turned up the Ruby-SVD gem, which could be an option for replacing it.
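
To give a sense of what an SVD routine produces, here is a minimal sketch that computes only the singular values of a small matrix using Ruby's standard `matrix` library. It is not how Matrix#SV_decomp or the Ruby-SVD gem work internally. It relies on the textbook fact that the singular values of A are the square roots of the eigenvalues of AᵀA, and `singular_values` is a hypothetical helper name.

```ruby
require 'matrix'

# Singular values of `a`, largest first, taken as the square roots of the
# eigenvalues of A^T * A. A hypothetical helper for illustration only;
# a real SVD routine would also return the U and V matrices.
def singular_values(a)
  gram = a.transpose * a # symmetric, so its eigenvalues are real
  gram.eigen.eigenvalues
      .map { |ev| Math.sqrt([ev, 0.0].max) } # clamp tiny negative round-off
      .sort
      .reverse
end

a = Matrix[[3.0, 0.0], [0.0, 4.0]]
values = singular_values(a)
puts values.inspect
```

Forming AᵀA loses numerical precision compared with a dedicated SVD algorithm, so this is only suitable as a sanity check on small inputs, not as a replacement for the slow code path.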

I don't understand any of the math, or much of this gem's layout yet, but I wanted to record my findings.
