feat: improve unified search content type results #2010

Merged
merged 13 commits into from
Jun 20, 2023


@zchsh zchsh commented Jun 19, 2023

🔗 Relevant links

🗒️ What

This PR refactors the "unified" search result tabs to ensure that relevant results are shown for all content types.

Why

Upstream, we execute a single search query against Algolia for the "all" tab, and then use client-side logic to split "all" results into separate content type tabs.

While efficient, and perhaps intuitive since the "all" tab truly represents the sum of results in all other tabs, this approach has a significant downside: for generic search terms that return many results, the split of results across content types is difficult, if not impossible, to control.

For context: for performance and user-experience reasons, we do not want "exhaustive" search. Rather than return every record that matches a given query, we return only the most relevant records, up to a limit we set in Algolia configuration.

As a concrete example, imagine the limit is 100 records. The query vault will return a full page of results in the "all" tab, so 100 results. But when split by content type, there is no reason to expect ~33 results in each of the separate content tabs. Since results are ranked by relevance, regardless of content type, some content types, by nature of their authoring practices and metadata structure, will "clobber" results from other content types for particular queries. Here are two screenshots of this exact behaviour:

[Screenshots: "vault" upstream and "waypoint" upstream]
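
To make the skew concrete, here is a minimal sketch of the upstream client-side split, run against made-up data. The content type names (`docs`, `tutorial`, `integration`), the `RankedHit` shape, and the 95/4/1 distribution are all assumptions for illustration, not values taken from the real index.

```typescript
// Hypothetical content type names; the real names in the codebase may differ.
type ContentType = 'docs' | 'tutorial' | 'integration'

interface RankedHit {
  objectID: string
  type: ContentType
}

/**
 * Split one relevance-ranked, capped result list into per-type tabs,
 * mirroring the single-query, client-side approach described above.
 */
function splitByContentType(
  hits: RankedHit[]
): Record<ContentType, RankedHit[]> {
  const tabs: Record<ContentType, RankedHit[]> = {
    docs: [],
    tutorial: [],
    integration: [],
  }
  for (const hit of hits) {
    tabs[hit.type].push(hit)
  }
  return tabs
}

// Illustrative data only: a capped page of 100 hits in which one
// content type dominates the relevance ranking.
const cappedHits: RankedHit[] = Array.from(
  { length: 100 },
  (_, i): RankedHit => ({
    objectID: `hit-${i}`,
    type: i < 95 ? 'docs' : i < 99 ? 'tutorial' : 'integration',
  })
)

const tabs = splitByContentType(cappedHits)
// docs: 95, tutorial: 4, integration: 1
```

Nothing in the split can recover results that the cap already discarded, which is why the per-tab counts follow whatever the ranking happens to favour.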

🛠️ How

In this PR, instead of executing a single query to Algolia, we execute four separate queries - one for the "all" tab, and one for each of the three separate content types.

This ensures that if there are any remotely relevant results in a particular content type, they'll show up in that content type tab.
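
A rough sketch of the "four queries" idea, assuming hypothetical content type names and a hypothetical `TabQuery` shape. It only builds the query descriptors; how they are sent to Algolia (and how the content type filter is encoded) is left out, and is not meant to mirror the actual implementation in this PR.

```typescript
// Hypothetical names for illustration; the real types may differ.
type ContentType = 'docs' | 'tutorial' | 'integration'
type TabType = 'global' | ContentType

interface TabQuery {
  tab: TabType
  query: string
  // Omitted for the "all" tab, which searches across every content type.
  contentType?: ContentType
  hitsPerPage: number
}

const CONTENT_TYPES: ContentType[] = ['docs', 'tutorial', 'integration']

/**
 * Build one unfiltered query for the "all" tab, plus one query per
 * content type, so each tab gets its own full page of results.
 */
function buildTabQueries(query: string, hitsPerPage: number): TabQuery[] {
  const globalQuery: TabQuery = { tab: 'global', query, hitsPerPage }
  const typedQueries = CONTENT_TYPES.map(
    (contentType): TabQuery => ({
      tab: contentType,
      query,
      contentType,
      hitsPerPage,
    })
  )
  return [globalQuery, ...typedQueries]
}

const vaultQueries = buildTabQueries('vault', 100)
```

Because each content-type query has its own page limit, a tab is only empty when that content type has no matching records at all.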

Note: The "all" tab will still show all results fully merged and ranked by relevance. It will still show the same contents as it did previously, with sometimes skewed results for generic terms. The only way to mitigate this while maintaining a unified index is to refine our ranking criteria and authoring practices.

📸 Design Screenshots

[Screenshots: "vault" with this PR and "waypoint" with this PR]

🧪 Testing

  • Visit the preview
    • Unified search results should function as they do upstream
    • Generic queries such as vault and waypoint should return a relatively balanced number of results across content type tabs

💭 Anything else?

Not at the moment!


vercel bot commented Jun 19, 2023

The latest updates on your projects.

Name | Status | Updated (UTC)
dev-portal | ✅ Ready | Jun 20, 2023 8:32pm



github-actions bot commented Jun 19, 2023

📦 Next.js Bundle Analysis

This analysis was generated by the next.js bundle analysis action 🤖

This PR introduced no changes to the javascript bundle 🙌

@@ -4,6 +4,6 @@
     "max_static_paths": 10
   },
   "flags": {
-    "enable_unified_search": false
+    "enable_unified_search": true

🚧 Note: will set flags back to false before merging.

* TODO: for non-specific `resultType`, filter to only show `integration`
* results for products with integrations config flags on.
*/
export function getAlgoliaFilters(

This function replaces get-algolia-product-filter-string. With separate content-type-specific search queries now running for each tab, I've expanded this function slightly to handle an optional contentType argument.

I've also expanded a bit on a TODO here related to the "integrations shown for a product that still have integrations flag off" issue identified in dev-portal#1986. Intent is to resolve that issue in a separate upcoming PR, which would be based off of the work in this PR.
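
For readers unfamiliar with Algolia filter strings, here is a minimal sketch of what a filter builder with an optional content type argument could look like. This is not the actual `getAlgoliaFilters` implementation: the function name `buildFilterString` and the attribute names `products` and `type` are assumptions. The only thing taken from Algolia itself is the `attribute:value` syntax combined with `AND`.

```typescript
// Hypothetical content type names; the real names may differ.
type ContentType = 'docs' | 'tutorial' | 'integration'

/**
 * Build an Algolia-style filter string from an optional product slug
 * and an optional content type. Both attribute names are assumed.
 */
function buildFilterString(
  productSlug?: string,
  contentType?: ContentType
): string {
  const clauses: string[] = []
  if (productSlug) {
    clauses.push(`products:${productSlug}`)
  }
  if (contentType) {
    clauses.push(`type:${contentType}`)
  }
  // An empty string means "no filtering", as used by the "all" tab
  // outside of any product context.
  return clauses.join(' AND ')
}

const vaultDocsFilter = buildFilterString('vault', 'docs')
// 'products:vault AND type:docs'
```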

@@ -22,7 +29,7 @@ import { useSetUpAndCleanUpCommandState } from 'components/command-bar/hooks'
  * when in that product context, Would typing-based filters, such as
  * `product:<productSlug>`, be preferable to the "product tag" approach?
  */
-export function useCommandBarProductTag() {
+export function useCommandBarProductTag(): CommandBarProductTag | null {

Cleaned up the return type here, bit of a "side quest", but was adjacent to some other work in this PR.

/**
* Render unified search results for the provided query input.
*/
function SearchResults({

This file is where the bulk of the refactor occurred.

I've taken an approach that is in some ways similar to the existing separate-index tabs. This allows four separate searches to happen (one for each content type, plus one "global" search without a content type filter), each returning filtered results up to the index's maximum single-page count.

The intent here is to thoroughly resolve the Skewed ranking on low-specificity queries issue mentioned at the end of the PR description of dev-portal#1986. That issue came up when executing a single search for the "all" tab, and then filtering down results client-side to populate every other tab.

Why this feels preferable

Apart from slightly worse performance (four queries instead of one), the only downside to this revised approach is that the results in the "All" tab are no longer simply the sum of the results in all other tabs. With this in mind, the count on the "All" tab has been removed.

These relatively minor compromises feel well worth avoiding the issue of relevant results for generic queries being lost to the abyss of skewed ranking criteria across content types.

Note: the "All" tab count removal, and this revised "separate search query per tab" approach, were discussed with design last Friday, so I think we're good to go here on that front 🙇‍♂️

* Transform unified search results into tab data that the
* `UnifiedHitsContainer` component can render.
*/
export function gatherSearchTabsData(

Slight refactor here to match some keys to make things easier, and to allow gather-search-tabs-content to accept a unified search results object rather than a single array of raw hits.
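
As a loose illustration of "accept a unified search results object rather than a single array of raw hits", here is a sketch under stated assumptions. The `UnifiedResults` and `TabData` shapes, the tab headings, and the function name `gatherTabs` are hypothetical; the real `gatherSearchTabsData` may be structured differently.

```typescript
// Hypothetical shapes for illustration only.
type TabType = 'global' | 'docs' | 'tutorial' | 'integration'

interface SearchHit {
  objectID: string
}

// One list of hits per tab, keyed by tab type.
type UnifiedResults = Record<TabType, SearchHit[]>

interface TabData {
  type: TabType
  heading: string
  hits: SearchHit[]
  hitCount: number
}

// Assumed display headings and tab order.
const TAB_HEADINGS: Record<TabType, string> = {
  global: 'All',
  docs: 'Documentation',
  tutorial: 'Tutorials',
  integration: 'Integrations',
}
const TAB_ORDER: TabType[] = ['global', 'docs', 'tutorial', 'integration']

/**
 * Because the results object and the tab data share the same keys,
 * the transform is a straightforward map over the tab order.
 */
function gatherTabs(results: UnifiedResults): TabData[] {
  return TAB_ORDER.map((type) => ({
    type,
    heading: TAB_HEADINGS[type],
    hits: results[type],
    hitCount: results[type].length,
  }))
}

const sampleTabs = gatherTabs({
  global: [{ objectID: 'a' }, { objectID: 'b' }],
  docs: [{ objectID: 'a' }],
  tutorial: [{ objectID: 'b' }, { objectID: 'c' }],
  integration: [],
})
```

Note that in the sample data the "global" count (2) is not the sum of the other tabs (3), which is the reason the count is hidden on that tab.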

*
* Note: this component needs to be used within an `InstantSearch` container
* imported from 'react-instantsearch-hooks-web'. That container provides
* the context from which `rawHits` are pulled.
*/
export function UnifiedHitsContainer({

With tabsData logic now outside of this component, it's become much more purely presentational, taking the prepped tabsData and rendering the tabbed search results.

Comment on lines -68 to +43
- <TabHeadingWithCount heading={heading} count={hitCount} />
+ <TabHeadingWithCount
+   heading={heading}
+   count={type === 'global' ? undefined : hitCount}
+ />

We hide the tab count for the global tab - it doesn't "add up" with the other tabs, so felt more confusing than helpful to include it.

import type { Hit } from 'instantsearch.js'

// TODO: set up an enum for this
export type UnifiedSearchableContentType =

Added this types file to hold some types I found myself re-using in a couple spots.

@zchsh zchsh marked this pull request as ready for review June 20, 2023 14:47
@zchsh zchsh requested a review from a team as a code owner June 20, 2023 14:47
@zchsh zchsh requested a review from braydenlove June 20, 2023 14:48

@BrandonRomano BrandonRomano left a comment


I think this is a totally reasonable approach. I'm not too concerned about performance RE: the 4 queries.

It looks like we already were managing these 4 indexes, and we didn't need to spin up any new indexes right? Just asking to ensure I shouldn't be looking anywhere else!


@braydenlove braydenlove left a comment


Nice!


zchsh commented Jun 20, 2023

@BrandonRomano Re: the 4 queries, they're 4 queries all on the same unified index! So no new indexes needed 👍

As an adjacent note: in theory, and hopefully in practice in the near future, we should be able to transition /tutorials/library to the new unified index as well. Once that is done (and Sentinel is migrated to Dev Dot) I believe we'll then be only using the unified index, which will open the door to making some significant simplifications and reducing maintenance load for our search indexing workflows 🔭


@emilypersson1 emilypersson1 left a comment


thanks Zach! this approach looks great to me 🙌

@zchsh zchsh added this pull request to the merge queue Jun 20, 2023
Merged via the queue into main with commit 9fbba1a Jun 20, 2023
@zchsh zchsh deleted the zs.unified-search-separate-filters branch June 20, 2023 20:42