I have code something like this:
lsi = ClassifierReborn::LSI.new(auto_rebuild: false)
data.each do |row|
lsi.add_item(row['foo'], row['bar'])
end
lsi.build_index
...and build_index runs very slowly as the number of items grows:
- With ~10 items, it runs in <1 second
- With ~20 items, it runs in ~15 seconds
- With ~30 items, it runs in ~130 seconds
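To put a rough number on how fast that grows, here is a small sketch that fits a power law to the last two timings. It assumes the run time behaves like n**k, which is only a guess, and the timings themselves are approximate, so treat the result as a ballpark figure. The helper name growth_exponent is made up for this illustration.

```ruby
# Approximate timings from the list above: item count => seconds in build_index.
timings = { 20 => 15.0, 30 => 130.0 }

# Assumption: time grows like n**k. Then t2 / t1 == (n2 / n1)**k,
# so k = log(t2 / t1) / log(n2 / n1).
def growth_exponent(n1, t1, n2, t2)
  Math.log(t2 / t1) / Math.log(n2.to_f / n1)
end

(n1, t1), (n2, t2) = timings.to_a
exponent = growth_exponent(n1, t1, n2, t2)

# With the numbers above this comes out a little over 5.
puts format('apparent growth exponent: %.1f', exponent)
```

An exponent that high, if it holds, means a few dozen more items would push the run time from minutes into hours, which matches how quickly things got unusable for me.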
I tracked it down to #build_index by disabling auto_rebuild. From there, I traced it through LSI#build_reduced_matrix to the monkey-patched extension Matrix#SV_decomp, inside the 3-level nested loop:
while true do
for row in (0...qrot.row_size-1) do
for col in (1..qrot.row_size-1) do
Based on the name SV_decomp, I'll hazard a guess that this is supposed to be a Singular Value Decomposition (which I just discovered). A quick search turned up the Ruby-SVD gem, which could be an option.
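For anyone else new to SVD, here is a minimal sketch of what the singular values of a matrix are, using only Ruby's standard matrix library. It is not how this gem's Matrix#SV_decomp works and it does not use the Ruby-SVD gem. It relies on the textbook fact that the singular values of A are the square roots of the eigenvalues of A-transpose times A. The helper name singular_values is hypothetical.

```ruby
require 'matrix'

# Illustration only: singular values of A are the square roots of the
# eigenvalues of (A^T * A). Not numerically robust for real workloads.
def singular_values(matrix)
  a = matrix.map(&:to_f)          # work in floats
  gram = a.transpose * a          # symmetric, so its eigenvalues are real
  gram.eigen.eigenvalues
      .map { |ev| Math.sqrt([ev.real, 0.0].max) } # clamp tiny negative rounding errors
      .sort
      .reverse                    # largest singular value first
end

# A diagonal matrix, whose singular values are simply 4 and 3.
p singular_values(Matrix[[3, 0], [0, 4]])
```

A dedicated SVD routine would also return the left and right singular vectors, which I assume LSI needs to build its reduced matrix; this sketch only shows the values.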
I don't understand any of the math, or much of this gem's layout yet, but wanted to record my findings.