Unable to search for more than 38 ingredients #114
Comments
Naively it seems like it might be possible to
(I think that support for multiple scores per document might provide a cleaner way to address those limitations, but that ability remains open as a pending feature request, and is a non-blocker)
Although the number of ingredients in recipes shouldn't directly determine the upper limit on the number of ingredients that people can search for, it could be useful supporting context. The most frequent ingredient counts in our database seem to be between 6 and 10 unique named ingredients per recipe:
api=> with ingredient_counts (recipe_id, ingredient_count) as (select recipe_id, count(distinct product_name_id) from recipe_ingredients group by recipe_id) select ingredient_count, count(*) from ingredient_counts group by ingredient_count order by count(*) desc limit 10;
ingredient_count | count
------------------+-------
8 | 7585
7 | 7273
9 | 7235
6 | 6786
10 | 6547
11 | 5679
5 | 5661
12 | 4656
4 | 4041
13 | 3461
(10 rows)
And in terms of recipes that contain the most unique-named ingredients: our upper limit (and this probably includes a few outliers) at the moment seems to be around thirty ingredients; these recipes generally involve a few sauces/condiments and therefore require a sizable list of herbs and spices:
api=> with ingredient_counts (recipe_id, ingredient_count) as (select recipe_id, count(distinct product_name_id) from recipe_ingredients group by recipe_id) select ingredient_count, count(*) from ingredient_counts group by ingredient_count order by ingredient_count desc limit 10;
ingredient_count | count
------------------+-------
30 | 1
29 | 1
28 | 3
27 | 3
26 | 6
25 | 14
24 | 18
23 | 36
22 | 45
21 | 93
(10 rows)
We can't guarantee that a user who searches for thirty ingredients will find a recipe that matches all of them, but as long as at least a few of the ingredients are popular ones, we should be able to return some results. Next up I'll have a think about how we could analyze our search result logs -- that include a
I think I'd propose a five-step approach to resolving this:
I doubt many people will search for fifty ingredients. Currently the client interface isn't really optimal for entering that many items. But it may be valid in some situations, and it's good to provide functionality above-and-beyond what we think is usually expected.
Longer-term resolutions
Perhaps we could file a request for a
Alternatively, we could refactor our code to avoid the bitset pattern entirely. I wasn't able to think of a way to do that using the existing functionality of either Elasticsearch or OpenSearch; however, it is possible that something like opensearch-project/OpenSearch#3715 could help in future (there is an equivalent ticket open for Elasticsearch, too).
Edit: add
Mmm.. not quite. We would need to deploy updated translation resources for the
Instead of multiplexing multiple results into a single
This means that the query would adjust dynamically based on the input query terms (the list of ingredients to include). That's OK, and that's already the case: the boolean match clauses in each query already vary based on the user's query terms, for example. This approach should also maintain the property that the sort-method script is static -- it shouldn't require a dynamic script for each query. I think that's likely to be a performance benefit, because it may mean that the JVM that evaluates the scoring can re-use the same already-compiled code module when evaluating each query's results. This may allow simplifying the
The following
It's not performance-optimized, and even with tuning it's possible that it will perform worse than the existing
The results of the above appear as:
Describe the bug
Searching for more than thirty-eight ingredients using the recipe search APIs (either `search` or `explore`) currently fails.
The reason is that we construct an ingredient match `boost` based on a power of ten and the numbered position of each ingredient within the query (and in fact we double that during exact-match searching).
The search engine (currently `opensearch`, was `elasticsearch`) is written in Java and expects the `boost` to be a floating-point value, with a supported value range up to approximately 3.4e38 (roughly 10^38).
Strictly speaking we don't need to use base-10 here. We need enough numeral-distance between each `boost` value to disambiguate between exact and partial matches as documented here, but I think we could achieve that in base-4 or something. The question is whether the (already complex) logic to implement that is worthwhile.
If my calculations are correct, using a base-4 representation for the `boost` value would allow us up to 64 ingredients, and base-3 would allow up to 80.
Alternatively we could simply decide that some arbitrary number of ingredients is a sensible upper limit. Thirty or so seems like it may be reasonable.
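
The base-10, base-4 and base-3 estimates above can be sanity-checked with a short calculation. This is only a sketch: it assumes the `boost` is stored as a 32-bit IEEE-754 float (as Java's `float` is), and `max_exponent` is a hypothetical helper written for illustration, not part of the project's code. It uses exact integer arithmetic so that the check itself cannot overflow.

```python
# Largest finite 32-bit IEEE-754 float, as an exact integer:
# (2 - 2**-23) * 2**127 == (2**24 - 1) * 2**104
# Assumption: the search engine stores the boost as a 32-bit float.
FLOAT32_MAX = (2**24 - 1) * 2**104


def max_exponent(base: int, multiplier: int = 1) -> int:
    """Return the largest n such that multiplier * base**n <= FLOAT32_MAX."""
    n = 0
    while multiplier * base ** (n + 1) <= FLOAT32_MAX:
        n += 1
    return n


for base in (10, 4, 3):
    # multiplier=2 models the doubling applied during exact-match searching
    print(base, max_exponent(base), max_exponent(base, multiplier=2))
```

Under that assumption the largest usable exponents come out as 38 for base-10, 63 for base-4 and 80 for base-3, with or without the exact-match doubling. Whether 63 corresponds to 63 or 64 ingredients depends on whether ingredient positions are counted from zero or from one, which the description above doesn't state.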
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Results should be returned based on all query ingredients -- or with an explanatory refinement message if the query omitted some of the ingredients.
Screenshots
N/A