Upgrade Pagefind to 1.3.0 and configure Pagefind logging levels #2728
🦋 Changeset detected
Latest commit: 8f5c689
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 1 package
✅ Deploy Preview for astro-starlight ready!
I’ve repeated @HiDeoo’s exercise from #1750 of comparing results for a selection of search terms in the Starlight docs. For results that changed rank, I’ve added the difference in ranking compared to Pagefind v1 to help highlight what changed. Bolded entries marked with 🟢 are results that I’ve subjectively decided are “good” or “best” results, to help track those across columns. I’ve also added a subjective “impact” assessment for each query to summarize my impressions.

The git-scm.com config columns refer to Pagefind’s options as used on git-scm.com:

    pageLength: 0.1, // boost longer pages
    termFrequency: 0.1, // do not favor short pages
    termSaturation: 2, // look for pages with more matches
    termSimilarity: 9, // prefer exact matches

For reference, the Pagefind defaults are:

    pageLength: 0.75, // range: 0 – 1
    termFrequency: 1, // range: 0 – 1
    termSaturation: 1.4, // range: 0 – 2
    termSimilarity: 1, // range: 0 – n

Query data
Query:
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Manual Setup | +1 Plugins Reference | +1 Plugins Reference | +1 Plugins Reference | +1 Plugins Reference |
2 | Plugins Reference | -1 🟢 Manual Setup | -1 🟢 Manual Setup | -1 🟢 Manual Setup | -1 🟢 Manual Setup |
3 | Using components | Using components | Using components | +2 Customizing Starlight | +2 Customizing Starlight |
4 | 🟢 Getting Started | 🟢 Getting Started | 🟢 Getting Started | -1 Using components | -1 Using components |
5 | Customizing Starlight | Customizing Starlight | Customizing Starlight | -1 🟢 Getting Started | -1 🟢 Getting Started |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | 🔻 Slightly worse | 🔻 Slightly worse | 🔻 Slightly worse | 🔻 Slightly worse |
Impact: 👎 Negative — slightly worse than before across the board, although not terribly
Query: installation
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | Site Search | +1 Authoring Content in Markdown | +1 Authoring Content in Markdown | +1 Authoring Content in Markdown | +1 Authoring Content in Markdown |
2 | Authoring Content in Markdown | -1 Site Search | -1 Site Search | +1 Customizing Starlight | +1 Customizing Starlight |
3 | Customizing Starlight | Customizing Starlight | Customizing Starlight | -2 Site Search | -2 Site Search |
4 | CSS & Styling | CSS & Styling | CSS & Styling | CSS & Styling | CSS & Styling |
5 | 🟢 Manual Setup | 🟢 Manual Setup | 🟢 Manual Setup | +1 Configuration Reference | +1 Configuration Reference |
6 | | | | -1 🟢 Manual Setup | |
Quality | 🛑 Bad | 🛑 Bad | 🛑 Bad | 🛑 Bad | 🛑 Bad |
Change | | ⚪️ No change | ⚪️ No change | 🔻 Worse | 🔻 Worse |
Impact: Neutral — we don’t have clear content for this query, closest being “Manual Setup”. So although some of these perform slightly worse, they are all pretty bad.
Query: page
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Pages | +15 Internationalization (i18n) | +15 Internationalization (i18n) | +5 Overrides Reference | 🟢 Pages |
2 | Site Search | +4 Overrides Reference | +4 Overrides Reference | +6 Frontmatter Reference | +4 Overrides Reference |
3 | Overriding Components | -1 Site Search | -1 Site Search | -2 🟢 Pages | +5 Frontmatter Reference |
4 | Customizing Starlight | +4 Frontmatter Reference | +4 Frontmatter Reference | +7 Configuration Reference | Customizing Starlight |
5 | Eco-friendly docs | +6 Configuration Reference | +6 Configuration Reference | +11 Internationalization (i18n) | +6 Configuration Reference |
6 | | | | | |
7 | | | | | |
8 | | 🟢 Pages | 🟢 Pages | | |
Quality | 💚 Good | 🛑 Bad | 🛑 Bad | 🟡 OK | 💚 Good |
Change | | 🔻 Worse | 🔻 Worse | 🔻 Worse | ⬆️ Better |
Impact: Mixed — wildly different results for basic config, which in the worst case removes the “Pages” result entirely. Only git-scm.com config works well.
Query: markdown
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Authoring Content in Markdown | +9 Overrides Reference | +9 Overrides Reference | 🟢 Authoring Content in Markdown | 🟢 Authoring Content in Markdown |
2 | Link Cards | -1 🟢 Authoring Content in Markdown | -1 🟢 Authoring Content in Markdown | +8 Overrides Reference | +3 Pages |
3 | Steps | -1 Link Cards | -1 Link Cards | +2 Pages | -1 Link Cards |
4 | Card Grids | +1 Pages | +1 Pages | Card Grids | +5 Overrides Reference |
5 | Pages | -1 Card Grids | -1 Card Grids | -4 Link Cards | -1 Card Grids |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | 🔻 Worse | 🔻 Worse | ⚪️ Neutral | ⬆️ Better |
Impact: Neutral — downranks the obvious result with simple configs, but does OK in others.
Query: component
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Using components | 🟢 Using components | 🟢 Using components | +2 Overrides Reference | +2 Overrides Reference |
2 | 🟢 Overriding Components | +19 Pages | +19 Pages | -1 🟢 Using components | -1 🟢 Using components |
3 | Overrides Reference | Overrides Reference | Overrides Reference | -1 🟢 Overriding Components | -1 🟢 Overriding Components |
4 | Eco-friendly docs | -2 🟢 Overriding Components | -2 🟢 Overriding Components | +2 File Tree | +2 File Tree |
5 | Code | +1 File Tree | +1 File Tree | +3 Card Grids | +3 Card Grids |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | ⚪️ Neutral | ⚪️ Neutral | ⚪️ Neutral | ⚪️ Neutral |
Impact: 👍 Positive — surfaces the same key content, plus adds “Pages” which does include page component docs
Query: css
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling |
2 | Icons | Icons | Icons | +2 Customizing Starlight | +2 Customizing Starlight |
3 | Starlight Showcase | +1 Customizing Starlight | +1 Customizing Starlight | +2 Configuration Reference | +2 Configuration Reference |
4 | Customizing Starlight | -1 Starlight Showcase | -1 Starlight Showcase | -2 Icons | -2 Icons |
5 | Configuration Reference | Configuration Reference | Configuration Reference | -2 Starlight Showcase | -2 Starlight Showcase |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | ⚪️ Neutral | ⚪️ Neutral | ⬆️ Better | ⬆️ Better |
Impact: Neutral — no significant changes for the most part, slightly in the last two columns as customization & configuration pages are probably more relevant here.
Query: language
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Internationalization (i18n) | +4 Overrides Reference | +4 Overrides Reference | 🟢 Internationalization (i18n) | 🟢 Internationalization (i18n) |
2 | Link Cards | -1 🟢 Internationalization (i18n) | -1 🟢 Internationalization (i18n) | +2 Configuration Reference | +2 Configuration Reference |
3 | Make your docs shine with Starlight | -1 Link Cards | +1 Configuration Reference | +2 Overrides Reference | +4 Authoring Content in Markdown |
4 | Configuration Reference | Configuration Reference | -2 Link Cards | +3 Authoring Content in Markdown | +1 Overrides Reference |
5 | Overrides Reference | +1 Pages | +1 Pages | +3 Sidebar Navigation | +3 Sidebar Navigation |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | 🔻 Slightly worse | 🔻 Slightly worse | ⬆️ Better | ⬆️ Better |
Impact: Neutral — over promotes “Overrides Reference” in first two variations, but does better selecting secondary results in last two columns.
Query: sidebar
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Sidebar Navigation | +4 Configuration Reference | +4 Configuration Reference | 🟢 Sidebar Navigation | 🟢 Sidebar Navigation |
2 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
3 | Pages | Pages | Pages | +1 Frontmatter Reference | +1 Frontmatter Reference |
4 | Frontmatter Reference | -3 🟢 Sidebar Navigation | -3 🟢 Sidebar Navigation | +1 Configuration Reference | -1 Pages |
5 | Configuration Reference | -1 Frontmatter Reference | -1 Frontmatter Reference | -2 Pages | Configuration Reference |
Quality | 💚 Good | 🟡 OK | 🟡 OK | 💚 Good | 💚 Good |
Change | | 🔻 Worse | 🔻 Worse | ⚪️ Neutral | ⚪️ Neutral |
Impact: Mixed — basic configs do OK but still worse than Pagefind 1.0, fixed by later variations
Query: lastUpdated
Impact: Neutral — no change
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
2 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference |
3 | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | ⚪️ No change | ⚪️ No change | ⚪️ No change | ⚪️ No change |
Query: plugin
Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
1 | 🟢 Plugins and Integrations | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference |
2 | 🟢 Plugins Reference | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations |
3 | Configuration Reference | +1 CSS & Styling | +1 CSS & Styling | Configuration Reference | Configuration Reference |
4 | CSS & Styling | +1 Site Search | +1 Site Search | CSS & Styling | CSS & Styling |
5 | Site Search | -2 Configuration Reference | -2 Configuration Reference | Site Search | Site Search |
Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
Change | | ⚪️ Neutral | ⚪️ Neutral | ⚪️ No change | ⚪️ No change |
Impact: Neutral — no significant change (reference page second would be nice, but hard to expect the search index to distinguish between guides and reference without assistance)
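The `+N` / `-N` annotations in the tables above are rank differences relative to the v1.0 column. As a rough sketch of how such annotations could be derived from two ordered result lists (the `rankDeltas` helper is hypothetical and not part of Pagefind or Starlight):

```typescript
interface RankedResult {
  title: string;
  rank: number;
  /** Positive means the result moved up; null means it was not in the baseline list. */
  delta: number | null;
}

/** Compare each result in `next` with its 1-based position in `baseline`. */
function rankDeltas(baseline: string[], next: string[]): RankedResult[] {
  const before = new Map(baseline.map((title, i) => [title, i + 1] as const));
  return next.map((title, i) => {
    const rank = i + 1;
    const previous = before.get(title);
    return { title, rank, delta: previous === undefined ? null : previous - rank };
  });
}

// Data from the first table: v1.0 vs. 1.3 with the git-scm.com config.
const v1 = ['Manual Setup', 'Plugins Reference', 'Using components', 'Getting Started', 'Customizing Starlight'];
const v13 = ['Plugins Reference', 'Manual Setup', 'Customizing Starlight', 'Using components', 'Getting Started'];
const deltas = rankDeltas(v1, v13);
```

For this input the deltas come out as +1, -1, +2, -1, -1, matching the last column of the first table.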
Assessment
Summary table
Aggregate counts over the 10 terms analyzed above. Row best values highlighted in bold.
Subjective result quality
Metric | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|---|
good | 9 | 7 | 7 | 8 | 9 |
ok | 0 | 1 | 1 | 1 | 0 |
bad | 1 | 2 | 2 | 1 | 1 |
Subjective quality change compared to v1.0
Metric | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
---|---|---|---|---|
improved | 0 | 0 | 2 | 4 |
worse | 5 | 5 | 3 | 2 |
net | -5 | -5 | -1 | +2 |
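For anyone wanting to re-check the tallies, here is a small sketch that recomputes the “improved”, “worse”, and “net” rows from the per-query “Change” rows above. It assumes “Slightly worse” counts as worse and that “Neutral” and “No change” both count as unchanged; the `tally` helper and the label encoding are illustrative only.

```typescript
type Change = 'better' | 'worse' | 'same';

// One entry per query, in the order the tables appear above:
// (first query), installation, page, markdown, component, css, language, sidebar, lastUpdated, plugin.
const changes: Record<string, Change[]> = {
  'termFrequency: 1': ['worse', 'same', 'worse', 'worse', 'same', 'same', 'worse', 'worse', 'same', 'same'],
  'termFrequency: 0.5': ['worse', 'same', 'worse', 'worse', 'same', 'same', 'worse', 'worse', 'same', 'same'],
  'termFrequency: 0': ['worse', 'worse', 'worse', 'same', 'same', 'better', 'better', 'same', 'same', 'same'],
  'git-scm.com config': ['worse', 'worse', 'better', 'better', 'same', 'better', 'better', 'same', 'same', 'same'],
};

/** Count improved and worsened queries and compute the net difference. */
function tally(labels: Change[]): { improved: number; worse: number; net: number } {
  const improved = labels.filter((label) => label === 'better').length;
  const worse = labels.filter((label) => label === 'worse').length;
  return { improved, worse, net: improved - worse };
}

const summary: Record<string, { improved: number; worse: number; net: number }> = {};
for (const config of Object.keys(changes)) {
  summary[config] = tally(changes[config] ?? []);
}
```

With the labels above, `summary` gives net values of -5, -5, -1, and +2, in line with the table.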
Commentary
I find it a bit hard to assess what this data means both for us and our docs results, and more broadly how it is likely to impact our users.
From one perspective results are largely unchanged for many of these queries, and while they may be subtly different, we may not have complained about these results if they had been presented to us out of context.
On the other hand, several of these queries (`page` and `sidebar`) seem to suffer from much worse results after the 1.3 update. In both those cases, only setting `termFrequency` to `0` resulted in a more reasonable result. However, there are other more marginal queries like `component` and `installation` where `termFrequency: 0` seems to slightly worsen results. It’s a bit hard to know whether essentially disabling frequency weighting like this would be an extreme decision or not.
The configuration parameters used for git-scm.com do actually perform OK though. Tallying up the subjective assessment of each result, this config produces comparable quality for these specific queries. It also produced slightly better results more often than it produced slightly worse results.
Conclusion
I think it may be safe to use the git-scm.com configuration as a default starting point and upgrade to v1.3.
At the same time, I’d like to expose these configuration options to users so that people are free to adjust these weightings to fit their content.
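To make the idea concrete, here is a minimal sketch of what “git-scm.com values as defaults, overridable by users” could look like. The `RankingOptions` shape mirrors the four Pagefind ranking weights quoted earlier; `gitScmRanking`, `ranges`, and `resolveRanking` are hypothetical names for illustration, not the API this PR ships.

```typescript
interface RankingOptions {
  pageLength: number;
  termFrequency: number;
  termSaturation: number;
  termSimilarity: number;
}

// Values used on git-scm.com, as quoted earlier in this thread.
const gitScmRanking: RankingOptions = {
  pageLength: 0.1,
  termFrequency: 0.1,
  termSaturation: 2,
  termSimilarity: 9,
};

// Ranges quoted earlier in this thread; termSimilarity has no stated upper bound.
const ranges: Record<keyof RankingOptions, [number, number]> = {
  pageLength: [0, 1],
  termFrequency: [0, 1],
  termSaturation: [0, 2],
  termSimilarity: [0, Infinity],
};

/** Merge user overrides onto the defaults and reject out-of-range values. */
function resolveRanking(overrides: Partial<RankingOptions> = {}): RankingOptions {
  const resolved: RankingOptions = { ...gitScmRanking, ...overrides };
  for (const key of Object.keys(ranges) as Array<keyof RankingOptions>) {
    const [min, max] = ranges[key];
    const value = resolved[key];
    if (!(value >= min && value <= max)) {
      throw new RangeError(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return resolved;
}

// A site that wants to disable frequency weighting but keep the other defaults:
const custom = resolveRanking({ termFrequency: 0 });
```

The resolved object could then be handed to Pagefind’s `ranking` option when the search UI is initialized, assuming that is where the integration passes it through.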
This is looking great 🌟
I did a diff of the JS files between prod and this PR to see what changed exactly. Most of the large additions relate to the ability to configure ranking, so they are directly connected to this PR, and to highlighting support through URL parameters.
Some thoughts:
- As I already had to do that in the past and had to resort to an override re-implementing the whole thing, I could see a world where users would like to customize `highlightParam`. This would enable users to implement their own highlighting or use the provided script for convenience. Here is what it looks like with this parameter configured and the provided highlighting script, where I searched for `component` and clicked a result: (screenshot)
  Definitely not a blocker, and could even be a follow-up PR, but I wanted to mention it.
- I guess with the logging changes, users may see a lot more warnings, e.g. when building the Starlight Docs:

      Note: Pagefind doesn't support stemming for the language ko. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language uk. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language ja. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language zh-cn. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language ko. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language uk. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language ja. Search will still work, but will not match across root words.
      Note: Pagefind doesn't support stemming for the language zh-cn. Search will still work, but will not match across root words.

  Should we consider adding a mention in the docs, e.g. in the "Site Search" guide, that not all features are supported for all languages?
Co-authored-by: HiDeoo <[email protected]>
I also just finished looking at the Pagefind UI changes in versions since 1.0.3 (the version currently in use in the Starlight monorepo) to see what is included in the ~4.5 KB (gzipped) bundle size increase. As far as I could tell, the main component of this increase is the expansion of language support: Māori, Croatian, Hungarian, Bengali, Vietnamese, Polish, and Danish support was added in v1.0.4; Ukrainian, Romanian, Czech, and Korean in v1.1.0; Swahili in v1.1.1; and Arabic, Farsi, and Hebrew in v1.2.0. That is 15 languages with unique strings that won’t compress very well. It’s kind of a shame in our context, where we would in theory be able to ship only the current language’s strings, but definitely not worth blocking an update over.
Ah right. This is not strictly speaking our change, it’s just the change in Pagefind 1.3.0 — that warning is logged with a I think maybe it’s OK on balance and not too likely to impact people as docs in many languages are still reasonably rare, so you probably only hit one of these if any? Not quite sure why the logging is doubled though? Would be good to only log that once per language.
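As an illustration of the “once per language” idea, here is a sketch of filtering captured log lines so each stemming note appears only once. This is not how Pagefind or Starlight handles logging; `dedupeStemmingNotes` is a hypothetical helper, and the pattern simply matches the message text quoted above.

```typescript
/** Keep the first stemming note per language; pass all other lines through unchanged. */
function dedupeStemmingNotes(lines: string[]): string[] {
  const seen = new Set<string>();
  const pattern = /doesn't support stemming for the language ([\w-]+)\./;
  return lines.filter((line) => {
    const language = pattern.exec(line)?.[1];
    if (language === undefined) return true; // not a stemming note
    if (seen.has(language)) return false; // already reported for this language
    seen.add(language);
    return true;
  });
}

// Rebuild the doubled output quoted above for ko, uk, ja, and zh-cn.
const note = (language: string) =>
  `Note: Pagefind doesn't support stemming for the language ${language}. Search will still work, but will not match across root words.`;
const languages = ['ko', 'uk', 'ja', 'zh-cn'];
const logged = [...languages, ...languages].map(note);
const unique = dedupeStemmingNotes(logged);
```

Here `logged` holds eight lines and `unique` holds four, one per language.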
Do you think it might make sense to make this a default behaviour with built-in styling in Starlight? I don’t see many drawbacks, and it seems like a helpful thing to enable by default and then see if people need to disable it. But that would probably be best done in a separate PR.
Interesting, I mainly focused on the changes in JS loaded by the browser by diffing loaded code. I assumed language support would not have a significant impact on this considering that the
I guess a potential drawback would be URL pollution that some users might not like. Definitely something for a follow-up PR: if we decide to go with this by default, we may want to do some research first.
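To show what the URL pollution concern looks like in practice, here is a small sketch using the standard `URL` API. The parameter name `highlight` is an assumption (it is whatever the site sets `highlightParam` to), and both helpers are hypothetical rather than part of Pagefind or Starlight.

```typescript
// Assumed parameter name; a site chooses this via Pagefind's `highlightParam` option.
const HIGHLIGHT_PARAM = 'highlight';

/** Append the searched term to a result URL, as highlighting via URL parameters requires. */
function withHighlight(resultUrl: string, term: string, base = 'https://example.com'): string {
  const url = new URL(resultUrl, base);
  url.searchParams.append(HIGHLIGHT_PARAM, term);
  return url.pathname + url.search + url.hash;
}

/** Strip the parameter again, e.g. before a visitor copies or shares the link. */
function withoutHighlight(href: string, base = 'https://example.com'): string {
  const url = new URL(href, base);
  url.searchParams.delete(HIGHLIGHT_PARAM);
  return url.pathname + url.search + url.hash;
}

const polluted = withHighlight('/guides/components/', 'component');
const clean = withoutHighlight('/guides/components/?highlight=component#props');
```

Here `polluted` is `/guides/components/?highlight=component`, which is the extra query string a visitor would see after clicking a result, and `clean` is `/guides/components/#props`.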
Ah yeah, me too, but I only looked at the core UI JS file and forgot about other stuff. AFAICT all language strings for the default UI (like “Search”, “Load more...”, “5 results” etc.) are included in that bundle. Although now you mention it, it would be an interesting option if Pagefind could load those dynamically for the current language like they do for other data.
After the follow-up discussion and potential ideas for future improvements (both on the Starlight side and maybe on the Pagefind side (e.g. UI string)), I'm personally happy with the PR.
Amazing work and glad to finally see this shipping after all the effort put into the research and making sure the default settings are good 🎉 👏 🚀