🚸 Improve search #2135

sunnyosun · 2024-11-06T11:22:23Z

Key improvements:

Prioritize names that start with the query or contain it as an isolated phrase (e.g. "naive B cell" and "B cell, ..." rank above "club cell" when searching "b cell")
Sort shorter names first

Note: centrocyte appears in the "b cell" search because its description contains a perfect match of "B cell"; the same applies to the "t cell" results.
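The ranking idea described above can be illustrated with a minimal sketch. This is not the code from this PR: the function name `rank_matches`, the tier numbering, and the toy records (including the centrocyte description) are assumptions made purely for illustration.

```python
import re


def rank_matches(query: str, records: list[dict]) -> list[str]:
    """Rank record names by match tier, then by name length (shorter first)."""
    q = query.lower()
    # "Isolated phrase": the query is not glued to word characters on either side.
    phrase = re.compile(rf"(?<!\w){re.escape(q)}(?!\w)")

    def tier(record: dict):
        name = record["name"].lower()
        description = record.get("description", "").lower()
        if name == q:
            return 0  # exact name match
        if phrase.match(name):
            return 1  # name starts with the query as a phrase
        if phrase.search(name):
            return 2  # query is an isolated phrase inside the name
        if phrase.search(description):
            return 3  # query is an isolated phrase in the description
        if q in name:
            return 4  # plain substring, e.g. "club cell" for "b cell"
        return None  # no match at all

    scored = [(tier(r), r["name"]) for r in records]
    kept = [(t, n) for t, n in scored if t is not None]
    kept.sort(key=lambda item: (item[0], len(item[1]), item[1]))
    return [n for _, n in kept]


# Toy records, made up for this sketch (not real ontology text).
records = [
    {"name": "club cell", "description": "An epithelial cell of the bronchioles."},
    {"name": "memory B cell"},
    {"name": "naive B cell"},
    {"name": "centrocyte", "description": "A B cell found in a germinal center."},
    {"name": "B cell, CD19-positive"},
    {"name": "B cell"},
    {"name": "hepatocyte", "description": "A liver cell."},
]
ranked = rank_matches("b cell", records)
```

With these toy records, `ranked` puts "B cell" first and "club cell" last, and "centrocyte" is kept only because of its description.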

[Screenshots: LaminDB before, LaminDB after, Hub]

codecov · 2024-11-06T11:30:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.63%. Comparing base (57fbd29) to head (2ee90cb).
Report is 29 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2135      +/-   ##
==========================================
+ Coverage   92.53%   92.63%   +0.10%     
==========================================
  Files          55       55              
  Lines        6467     6508      +41     
==========================================
+ Hits         5984     6029      +45     
+ Misses        483      479       -4

github-actions · 2024-11-06T11:34:10Z

🚀 Deployed on https://672d2c122dca6b5b514d2906--lamindb-qnwk.netlify.app

falexwolf · 2024-11-06T13:27:11Z

This single example looks fantastic!

I'm just worried that it will deteriorate other cases.

Can you run this against @Koncopd's benchmarking framework?

Then we'd have a before/after comparison that covers a wider array of search cases, and a report that underlies the decision, published to LaminHub.

sunnyosun · 2024-11-07T12:01:41Z

Updated screenshots with the examples @Koncopd had, let me know if there's anything else I can test.

Koncopd · 2024-11-07T13:14:30Z

@sunnyosun @falexwolf here is the benchmark for this PR: #2141
https://672cb784f7316d7437f2f69d--lamindb-qnwk.netlify.app/faq/benchmark-search

falexwolf · 2024-11-07T14:27:22Z

These are great improvements!

I wish there were a good way in LaminHub to document such changes, but I guess there isn't for now.

falexwolf · 2024-11-07T14:28:10Z

@fredericenard, please also take a look here.

And then we'll discuss in the benchmarking PR how we proceed with organizing code across lamindb, bionty and laminhub.

falexwolf

One thing that's totally not covered is longer search phrases along the lines of "CD8-positie cytokine T cell" as highlighted at the very top of the original issue: #1708

So, I believe we should add such a case.

I also know that @sunnyosun had many more cases in her original benchmark: https://github.com/laminlabs/lamindb-benchmarks/blob/main/docs/2023/010-rapidfuzz-search.ipynb

Koncopd · 2024-11-07T17:01:44Z

Added more searches to the benchmark.
test_search_synonyms fails now, so it is commented out in the benchmark.

sunnyosun · 2024-11-07T20:46:04Z

Fixed synonyms. The only downside is that search is getting a bit slower: 0.7-0.8s per search now (before 0.2-0.3s) because of all the different layers. (us-west-2 instance)

Zethson

This is really cool!

I know that no review from me was requested, but I thought I'd leave a couple of comments anyway. Feel free to ignore them!

Would it be possible to add 1-3 sentences of high level motivation and explanation of this new layered algorithm to the docstring or where appropriate, please?
We have the nice search benchmark now. Is there a way to add slightly more sophisticated tests to ensure that future changes to this code don't cause regressions in search performance? @Koncopd's benchmarking notebook is great, but it's not run regularly, is it?

These things can also be added later if you think that they're useful...
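A regression check along these lines could look like the following minimal sketch. The case format, the `find_regressions` helper, and the `toy_search` stand-in are all hypothetical; a real test would call the actual search function against a fixed set of records.

```python
from typing import Callable, Iterable

# Hypothetical case format: (query, name expected in the results, maximum allowed rank).
SearchCase = tuple[str, str, int]


def find_regressions(
    search: Callable[[str], list[str]], cases: Iterable[SearchCase]
) -> list[str]:
    """Return one message per case whose expected name is not ranked high enough."""
    failures = []
    for query, expected, max_rank in cases:
        results = search(query)
        if expected not in results[:max_rank]:
            failures.append(f"{query!r}: {expected!r} not in top {max_rank}")
    return failures


def toy_search(query: str) -> list[str]:
    """Stand-in search over a fixed list: substring match, startswith first, then shorter names."""
    names = ["club cell", "naive B cell", "B cell", "T cell"]
    q = query.lower()
    hits = [n for n in names if q in n.lower()]
    return sorted(hits, key=lambda n: (not n.lower().startswith(q), len(n)))


cases = [("b cell", "B cell", 1), ("b cell", "naive B cell", 3), ("t cell", "T cell", 1)]
failures = find_regressions(toy_search, cases)
```

Running such cases in the regular test suite, with thresholds taken from the benchmark, would flag a ranking change before it is merged.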

Zethson · 2024-11-08T21:21:37Z

lamindb/_record.py

+        )
+
+    def tokenize_search_string(search_str: str) -> list[str]:
+        # Split the string


Suggested change

# Split the string

The code is self-documenting here.

Zethson · 2024-11-08T22:10:59Z

Ohh and can we get rid of this now?
https://docs.lamin.ai/query-search

🚀

falexwolf · 2024-11-09T08:23:42Z

Yes, removing will be the goal.

@Koncopd is making a push to consolidate all 3 search algorithms into one clearly documented and benchmarked solution.

Sunny's PR here has good ideas, but it's not suitable for the hub due to performance. So, the hope is that Sergei can replicate the UX with server-side code (essentially by correctly using Postgres plugins). We can still use Sunny's code for dataframes and SQLite; it'll give the same results but just run slower, which is OK in that context.

⚡️ Improve search

177b2f8

sunnyosun force-pushed the improve-search branch from 9b5a18d to 177b2f8 Compare November 6, 2024 13:28

sunnyosun added 2 commits November 6, 2024 16:49

⚡️ 4 layers

68be421

Merge branch 'main' into improve-search

f8f2540

🎨 5 layers

1cd3a8c

🎨 Prioritize startswith

33cad29

sunnyosun requested review from falexwolf and Koncopd November 7, 2024 11:20

🎨 Sort by length

bb370ae

Merge branch 'main' into improve-search

f815064

Koncopd mentioned this pull request Nov 7, 2024

⚗️ Benchmark search #2141

Closed

Koncopd approved these changes Nov 7, 2024

sunnyosun linked an issue Nov 7, 2024 that may be closed by this pull request

Search is much better on the UI than in the open-source package #1708

Open

falexwolf changed the title ~~⚡️ Improve search~~ 🚸 Improve search Nov 7, 2024

falexwolf approved these changes Nov 7, 2024

falexwolf self-requested a review November 7, 2024 14:28

falexwolf requested changes Nov 7, 2024

sunnyosun added 2 commits November 7, 2024 16:06

🎨 Improve

eff6776

Merge branch 'main' into improve-search

e0cde46

🎨 Add synonym match

705dbd3

Merge branch 'main' into improve-search

9be3403

🎨 Clean up imports

2ee90cb

Zethson reviewed Nov 8, 2024

falexwolf closed this Nov 14, 2024

falexwolf mentioned this pull request Nov 14, 2024

⚡️ Improve speed and relevance of search #2163

Merged
