No hits with search threshold 0 on documents containing words with common root #911

fturmel · 2025-03-12T16:03:12Z

Describe the bug

When doing full text search with threshold 0 on a document that contains a few words with common roots, we don't get a hit until we've typed enough characters to disambiguate them.

To Reproduce

Run the following searches with threshold 0:

On the indexed value "Phone, phonogram":

search for "p", "ph", "pho" or "phon" -> no hits (we should get a hit obviously)
search for "phone" or "phono" -> 1 hit (as expected)

On the indexed value "Bet, better":

search for "b", "be" or "bet" -> no hits (we should get a hit; it's even worse than the previous case because "bet" is actually a full-word match)
search for "bett", "bette" or "better" -> 1 hit (as expected)
search for "bet hi" -> 1 hit (searching for an additional word now gives us a hit for "bet", puzzling...)

On the indexed value "Some random sentence"

search for "s" -> no hits (we have 2 words that start with s, should be getting a hit)
search for "r" -> 1 hit
search for "se" or "so" -> 1 hit

Expected behavior

See the reproduction steps above: each search marked "no hits" should return 1 hit.
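
To make the expectation concrete, here is a minimal reference model of what I'd expect threshold 0 to mean. It is a sketch, not Orama's implementation: `tokenize` and `expectedHitCount` are hypothetical helpers, and the rule they encode (every query token must be a prefix of at least one token in the indexed value) is my assumption about the intended semantics.

```typescript
// Hypothetical reference model, NOT Orama's actual algorithm.
// Assumption: with threshold 0, a document is a hit only when every
// query token is a prefix of at least one token in the indexed value.

// Lowercase the text and split it on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
}

// Count the documents in which all query tokens find a prefix match.
function expectedHitCount(docs: string[], term: string): number {
  const queryTokens = tokenize(term)
  if (queryTokens.length === 0) return 0
  return docs.filter((doc) => {
    const docTokens = tokenize(doc)
    return queryTokens.every((q) => docTokens.some((d) => d.startsWith(q)))
  }).length
}

const docs = ['Phone, phonogram', 'Bet, better', 'Some random sentence']
```

Under this model, `expectedHitCount(docs, 'bet')` is 1 and `expectedHitCount(docs, 'bet hi')` is 0, because "hi" does not prefix any token in any document.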

Environment Info

OS: macOS 15.3.2
Node: 22.14.0
Orama: 3.1.2

Affected areas

Search

Additional context

No response

fturmel · 2025-03-12T17:56:16Z

@micheleriva here are the unit tests to add to packages/orama/tests/threshold.test.ts. 8 out of 14 are failing at the moment.

t.test('should return results for words with same root if threshold is 0', async t => {
  // related issue: https://github.com/oramasearch/orama/issues/911

  const db = create({
    schema: {
      title: 'string'
    }
  })

  await insert(db, { title: 'Phone, phonogram' })
  await insert(db, { title: 'Bet, better' })
  await insert(db, { title: 'Some random sentence' })

  const testCases: [string, number][] = [
    ['p', 1],
    ['ph', 1],
    ['pho', 1],
    ['phone', 1],
    ['phono', 1],

    ['b', 1],
    ['be', 1],
    ['bet', 1],
    ['bett', 1],
    ['bet hi', 0], // the term "hi" is not in any document, there should be no hits with threshold 0

    ['s', 1],
    ['r', 1],
    ['se', 1],
    ['so', 1]
  ]

  t.plan(testCases.length)

  for (const [term, expectedCount] of testCases) {
    const result = await search(db, { term, threshold: 0 })
    t.same(
      result.count,
      expectedCount,
      `Search term "${term}" with threshold 0 should match ${expectedCount} record(s), but matched ${result.count}`
    )
  }
})

fturmel · 2025-03-19T19:52:09Z

I'll just add that as far as I can tell, this is a regression from Orama v2.

@micheleriva Is there any way you could confirm this is a bug and not a usage/comprehension issue on my end? I have to solve this for a project, which will require either going back to v2 or dropping Orama altogether. I don't think I have the time or sufficient understanding of the internals to work on a PR myself at the moment.

Let me know if any additional info would be helpful here. Thanks!

gaurav21r · 2025-03-24T10:19:20Z

@fturmel I can confirm this as well. Thanks for the suggestion! Going back to 2.0.24 makes this work, but that has issues too.

I have a feeling this error might be due to some mismatch between tolerance and threshold, though I'm not a search algorithm expert, so I won't comment further without proper investigation.

I am using Orama for a large food dataset, and 3.x is basically unusable for me because of the same issue that @fturmel mentioned. @micheleriva I think it's imperative to add the cases he's listed to the unit tests. I'll also try to contribute more. Since I have a proprietary database right out of a PhD lab, I'll need to process the data and do some paperwork before I can present a small test case here.

micheleriva · 2025-03-24T15:48:11Z

Looking at this. Thanks for noticing the issue.
