Upgrade Pagefind to 1.3.0 and configure Pagefind logging levels #2728

Merged
merged 15 commits into main from chris/pagefind-logging
Jan 13, 2025

Conversation

delucis
Member

@delucis delucis commented Dec 18, 2024

Description

  • This PR updates Starlight’s Pagefind dependency to 1.3.0.
  • Thanks to CloudCannon/pagefind#745 (“Configurable logging levels”), we can now forward Astro’s logging level to Pagefind to reduce (or increase) log verbosity in line with user preference.
  • This PR tweaks Pagefind’s default ranking options to preserve result quality, based on testing with Starlight’s own docs site. We ended up going with the config developed for git-scm.com.
  • It also exposes Pagefind’s ranking configuration for users who want to tweak it.
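
As a rough illustration of the last two points, here is a sketch of how user-supplied ranking values might be layered over the defaults chosen in this PR. The `PagefindRanking` interface and `resolveRanking` helper are hypothetical names used only for this example and are not Starlight's actual implementation; the default values are the git-scm.com ones quoted in the comparison comment in this thread.

```typescript
// Hypothetical shape for the four Pagefind ranking weights discussed in this PR.
interface PagefindRanking {
  pageLength: number;
  termFrequency: number;
  termSaturation: number;
  termSimilarity: number;
}

// Defaults proposed in this PR (the configuration used on git-scm.com).
const proposedDefaults: PagefindRanking = {
  pageLength: 0.1,
  termFrequency: 0.1,
  termSaturation: 2,
  termSimilarity: 9,
};

// Illustrative helper: user-provided values win, everything else falls back to the defaults.
function resolveRanking(overrides: Partial<PagefindRanking> = {}): PagefindRanking {
  return { ...proposedDefaults, ...overrides };
}

// Example: a site that wants Pagefind's stock term-frequency weighting back.
const custom = resolveRanking({ termFrequency: 1 });
console.log(custom);
```

Only the overridden key changes; the other three weights keep the proposed defaults.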
To-do
  • Consider the bundle size changes


changeset-bot bot commented Dec 18, 2024

🦋 Changeset detected

Latest commit: 8f5c689

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@astrojs/starlight Minor

@github-actions github-actions bot added the 🌟 core Changes to Starlight’s main package label Dec 18, 2024

netlify bot commented Dec 18, 2024

Deploy Preview for astro-starlight ready!

Name Link
🔨 Latest commit 8f5c689
🔍 Latest deploy log https://app.netlify.com/sites/astro-starlight/deploys/6784ec0cde43a6000744d6c3
😎 Deploy Preview https://deploy-preview-2728--astro-starlight.netlify.app
Lighthouse
1 paths audited
Performance: 100 (no change from production)
Accessibility: 100 (no change from production)
Best Practices: 100 (no change from production)
SEO: 100 (no change from production)
PWA: -

@astrobot-houston
Collaborator

astrobot-houston commented Dec 18, 2024

size-limit report 📦

| Path | Size |
| --- | --- |
| /index.html | 6.93 KB (-0.02% 🔽) |
| /_astro/*.js | 25.76 KB (+21.55% 🔺) |
| /_astro/*.css | 13.75 KB (-0.01% 🔽) |

@delucis
Member Author

delucis commented Dec 18, 2024

I’ve repeated @HiDeoo’s exercise from #1750 of comparing results for a selection of search terms in the Starlight docs.

For results that changed rank, I’ve added the difference in ranking compared to Pagefind v1 to help highlight what changed. Bolded entries marked with 🟢 are results that I’ve subjectively decided are “good” or “best” results, to help track those across columns. I’ve also added a subjective “impact” assessment for each query to summarize my impressions.

The git-scm.com config columns refer to Pagefind’s options as used on git-scm.com:

pageLength:     0.1,  // boost longer pages
termFrequency:  0.1,  // do not favor short pages
termSaturation: 2,    // look for pages with more matches
termSimilarity: 9,    // prefer exact matches

For reference, the Pagefind defaults are:

pageLength:     0.75, // range: 0 – 1
termFrequency:  1,    // range: 0 – 1
termSaturation: 1.4,  // range: 0 – 2
termSimilarity: 1,    // range: 0 – n
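
To make the ranges in the comments above concrete, here is a small sketch that checks a ranking config against them. The `ranges` table and `outOfRange` helper are illustrative names, not part of Pagefind or Starlight, and the unbounded upper limit for `termSimilarity` is an assumption based on the “0 – n” note.

```typescript
// Ranges as listed in the comment above; termSimilarity has no stated upper bound.
const ranges = {
  pageLength: [0, 1],
  termFrequency: [0, 1],
  termSaturation: [0, 2],
  termSimilarity: [0, Infinity],
} as const;

type RankingKey = keyof typeof ranges;

// Returns the names of any options that fall outside their listed range.
function outOfRange(config: Record<RankingKey, number>): RankingKey[] {
  return (Object.keys(ranges) as RankingKey[]).filter((key) => {
    const [min, max] = ranges[key];
    return config[key] < min || config[key] > max;
  });
}

const gitScm = { pageLength: 0.1, termFrequency: 0.1, termSaturation: 2, termSimilarity: 9 };
const pagefindDefaults = { pageLength: 0.75, termFrequency: 1, termSaturation: 1.4, termSimilarity: 1 };

// Both configs quoted above sit inside the listed ranges.
console.log(outOfRange(gitScm), outOfRange(pagefindDefaults));
```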

Query data

Query: setup

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Manual Setup | +1 Plugins Reference | +1 Plugins Reference | +1 Plugins Reference | +1 Plugins Reference |
| 2 | Plugins Reference | -1 🟢 Manual Setup | -1 🟢 Manual Setup | -1 🟢 Manual Setup | -1 🟢 Manual Setup |
| 3 | Using components | Using components | Using components | +2 Customizing Starlight | +2 Customizing Starlight |
| 4 | 🟢 Getting Started | 🟢 Getting Started | 🟢 Getting Started | -1 Using components | -1 Using components |
| 5 | Customizing Starlight | Customizing Starlight | Customizing Starlight | -1 🟢 Getting Started | -1 🟢 Getting Started |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | 🔻 Slightly worse | 🔻 Slightly worse | 🔻 Slightly worse | 🔻 Slightly worse |

Impact: 👎 Negative — slightly worse than before across the board, although not terribly
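
The rank differences shown in these tables can be derived mechanically from two ordered result lists. The sketch below uses a hypothetical `rankDeltas` helper (not something from this PR) on the top five “setup” results for v1.0 and for the git-scm.com config; a result missing from the baseline list is simply labelled `new`, which is a simplification for this example.

```typescript
// Positive delta = moved up compared to the baseline; negative = moved down.
function rankDeltas(baseline: string[], candidate: string[]): string[] {
  return candidate.map((title, index) => {
    const previous = baseline.indexOf(title);
    if (previous === -1) return `new ${title}`; // not in the baseline list at all
    const delta = previous - index;
    if (delta === 0) return title; // unchanged rank, no prefix
    return `${delta > 0 ? '+' : ''}${delta} ${title}`;
  });
}

const v1Setup = [
  'Manual Setup',
  'Plugins Reference',
  'Using components',
  'Getting Started',
  'Customizing Starlight',
];
const gitScmSetup = [
  'Plugins Reference',
  'Manual Setup',
  'Customizing Starlight',
  'Using components',
  'Getting Started',
];

const setupDeltas = rankDeltas(v1Setup, gitScmSetup);
console.log(setupDeltas);
// ['+1 Plugins Reference', '-1 Manual Setup', '+2 Customizing Starlight',
//  '-1 Using components', '-1 Getting Started']
```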

Query: installation

Rank 1.0 1.3, termFrequency: 1 1.3, termFrequency: 0.5 1.3, termFrequency: 0 1.3, git-scm.com config
1 Site Search +1 Authoring Content in Markdown +1 Authoring Content in Markdown +1 Authoring Content in Markdown +1 Authoring Content in Markdown
2 Authoring Content in Markdown -1 Site Search -1 Site Search +1 Customizing Starlight +1 Customizing Starlight
3 Customizing Starlight Customizing Starlight Customizing Starlight -2 Site Search -2 Site Search
4 CSS & Styling CSS & Styling CSS & Styling CSS & Styling CSS & Styling
5 🟢 Manual Setup 🟢 Manual Setup 🟢 Manual Setup +1 Configuration Reference +1 Configuration Reference
6 -1 🟢 Manual Setup
Quality 🛑 Bad 🛑 Bad 🛑 Bad 🛑 Bad 🛑 Bad
Change ⚪️ No change ⚪️ No change 🔻 Worse 🔻 Worse

Impact: Neutral — we don’t have clear content for this query, closest being “Manual Setup”. So although some of these perform slightly worse, they are all pretty bad.

Query: page

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Pages | +15 Internationalization (i18n) | +15 Internationalization (i18n) | +5 Overrides Reference | 🟢 Pages |
| 2 | Site Search | +4 Overrides Reference | +4 Overrides Reference | +6 Frontmatter Reference | +4 Overrides Reference |
| 3 | Overriding Components | -1 Site Search | -1 Site Search | -2 🟢 Pages | +5 Frontmatter Reference |
| 4 | Customizing Starlight | +4 Frontmatter Reference | +4 Frontmatter Reference | +7 Configuration Reference | Customizing Starlight |
| 5 | Eco-friendly docs | +6 Configuration Reference | +6 Configuration Reference | +11 Internationalization (i18n) | +6 Configuration Reference |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | 🟢 Pages | 🟢 Pages | | |
| Quality | 💚 Good | 🛑 Bad | 🛑 Bad | 🟡 OK | 💚 Good |
| Change | | 🔻 Worse | 🔻 Worse | 🔻 Worse | ⬆️ Better |

Impact: Mixed — wildly different results for basic config, which in the worst case removes the “Pages” result entirely. Only git-scm.com config works well.

Query: markdown

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Authoring Content in Markdown | +9 Overrides Reference | +9 Overrides Reference | 🟢 Authoring Content in Markdown | 🟢 Authoring Content in Markdown |
| 2 | Link Cards | -1 🟢 Authoring Content in Markdown | -1 🟢 Authoring Content in Markdown | +8 Overrides Reference | +3 Pages |
| 3 | Steps | -1 Link Cards | -1 Link Cards | +2 Pages | -1 Link Cards |
| 4 | Card Grids | +1 Pages | +1 Pages | Card Grids | +5 Overrides Reference |
| 5 | Pages | -1 Card Grids | -1 Card Grids | -4 Link Cards | -1 Card Grids |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | 🔻 Worse | 🔻 Worse | ⚪️ Neutral | ⬆️ Better |

Impact: Neutral — downranks the obvious result with simple configs, but does OK in others.

Query: component

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Using components | 🟢 Using components | 🟢 Using components | +2 Overrides Reference | +2 Overrides Reference |
| 2 | 🟢 Overriding Components | +19 Pages | +19 Pages | -1 🟢 Using components | -1 🟢 Using components |
| 3 | Overrides Reference | Overrides Reference | Overrides Reference | -1 🟢 Overriding Components | -1 🟢 Overriding Components |
| 4 | Eco-friendly docs | -2 🟢 Overriding Components | -2 🟢 Overriding Components | +2 File Tree | +2 File Tree |
| 5 | Code | +1 File Tree | +1 File Tree | +3 Card Grids | +3 Card Grids |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | ⚪️ Neutral | ⚪️ Neutral | ⚪️ Neutral | ⚪️ Neutral |

Impact: 👍 Positive — surfaces the same key content, plus adds “Pages” which does include page component docs

Query: css

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling | 🟢 CSS & Styling |
| 2 | Icons | Icons | Icons | +2 Customizing Starlight | +2 Customizing Starlight |
| 3 | Starlight Showcase | +1 Customizing Starlight | +1 Customizing Starlight | +2 Configuration Reference | +2 Configuration Reference |
| 4 | Customizing Starlight | -1 Starlight Showcase | -1 Starlight Showcase | -2 Icons | -2 Icons |
| 5 | Configuration Reference | Configuration Reference | Configuration Reference | -2 Starlight Showcase | -2 Starlight Showcase |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | ⚪️ Neutral | ⚪️ Neutral | ⬆️ Better | ⬆️ Better |

Impact: Neutral — no significant changes for the most part, slightly better in the last two columns as customization & configuration pages are probably more relevant here.

Query: language

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Internationalization (i18n) | +4 Overrides Reference | +4 Overrides Reference | 🟢 Internationalization (i18n) | 🟢 Internationalization (i18n) |
| 2 | Link Cards | -1 🟢 Internationalization (i18n) | -1 🟢 Internationalization (i18n) | +2 Configuration Reference | +2 Configuration Reference |
| 3 | Make your docs shine with Starlight | -1 Link Cards | +1 Configuration Reference | +2 Overrides Reference | +4 Authoring Content in Markdown |
| 4 | Configuration Reference | Configuration Reference | -2 Link Cards | +3 Authoring Content in Markdown | +1 Overrides Reference |
| 5 | Overrides Reference | +1 Pages | +1 Pages | +3 Sidebar Navigation | +3 Sidebar Navigation |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | 🔻 Slightly worse | 🔻 Slightly worse | ⬆️ Better | ⬆️ Better |

Impact: Neutral — over-promotes “Overrides Reference” in the first two variations, but does better at selecting secondary results in the last two columns.

Query: sidebar

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Sidebar Navigation | +4 Configuration Reference | +4 Configuration Reference | 🟢 Sidebar Navigation | 🟢 Sidebar Navigation |
| 2 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
| 3 | Pages | Pages | Pages | +1 Frontmatter Reference | +1 Frontmatter Reference |
| 4 | Frontmatter Reference | -3 🟢 Sidebar Navigation | -3 🟢 Sidebar Navigation | +1 Configuration Reference | -1 Pages |
| 5 | Configuration Reference | -1 Frontmatter Reference | -1 Frontmatter Reference | -2 Pages | Configuration Reference |
| Quality | 💚 Good | 🟡 OK | 🟡 OK | 💚 Good | 💚 Good |
| Change | | 🔻 Worse | 🔻 Worse | ⚪️ Neutral | ⚪️ Neutral |

Impact: Mixed — basic configs do OK but still worse than Pagefind 1.0, fixed by later variations

Query: lastUpdated

Impact: Neutral — no change
| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
| 2 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | ⚪️ No change | ⚪️ No change | ⚪️ No change | ⚪️ No change |

Query: plugin

| Rank | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| 1 | 🟢 Plugins and Integrations | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference | +1 🟢 Plugins Reference |
| 2 | 🟢 Plugins Reference | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations | -1 🟢 Plugins and Integrations |
| 3 | Configuration Reference | +1 CSS & Styling | +1 CSS & Styling | Configuration Reference | Configuration Reference |
| 4 | CSS & Styling | +1 Site Search | +1 Site Search | CSS & Styling | CSS & Styling |
| 5 | Site Search | -2 Configuration Reference | -2 Configuration Reference | Site Search | Site Search |
| Quality | 💚 Good | 💚 Good | 💚 Good | 💚 Good | 💚 Good |
| Change | | ⚪️ Neutral | ⚪️ Neutral | ⚪️ No change | ⚪️ No change |

Impact: Neutral — no significant change (reference page second would be nice, but hard to expect the search index to distinguish between guides and reference without assistance)

Assessment

Summary table

Aggregate counts over the 10 terms analyzed above. Row best values highlighted in bold.

Subjective result quality

| Metric | 1.0 | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- | --- |
| good | 9 | 7 | 7 | 8 | 9 |
| ok | 0 | 1 | 1 | 1 | 0 |
| bad | 1 | 2 | 2 | 1 | 1 |

Subjective quality change compared to v1.0

| Metric | 1.3, termFrequency: 1 | 1.3, termFrequency: 0.5 | 1.3, termFrequency: 0 | 1.3, git-scm.com config |
| --- | --- | --- | --- | --- |
| improved | 0 | 0 | 2 | 4 |
| worse | 5 | 5 | 3 | 2 |
| net | -5 | -5 | -1 | +2 |
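
For anyone wanting to reproduce the tally, here is a minimal sketch of how the “improved”, “worse”, and “net” figures can be computed from the per-query Change rows. It uses the git-scm.com config column as input; the `Change` type and `tally` function are illustrative names, and treating “Slightly worse” as worse and “Neutral”/“No change” as the same bucket is an assumption about how the counts were made.

```typescript
type Change = 'better' | 'worse' | 'same';

// Change labels for the git-scm.com config column, one per query, in the order above.
// "Slightly worse" is counted as worse; "Neutral" and "No change" are counted as same.
const gitScmChanges: Change[] = [
  'worse',  // setup
  'worse',  // installation
  'better', // page
  'better', // markdown
  'same',   // component
  'better', // css
  'better', // language
  'same',   // sidebar
  'same',   // lastUpdated
  'same',   // plugin
];

// Count improvements and regressions, then take the difference.
function tally(changes: Change[]) {
  const improved = changes.filter((c) => c === 'better').length;
  const worse = changes.filter((c) => c === 'worse').length;
  return { improved, worse, net: improved - worse };
}

const gitScmTally = tally(gitScmChanges);
console.log(gitScmTally); // { improved: 4, worse: 2, net: 2 }
```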

Commentary

I find it a bit hard to assess what this data means both for us and our docs results, and more broadly how it is likely to impact our users.

From one perspective results are largely unchanged for many of these queries, and while they may be subtly different, we may not have complained about these results if they had been presented to us out of context.

On the other hand, several of these queries (page and sidebar) seem to suffer from much worse results after the 1.3 update. In both those cases, only setting termFrequency to 0 resulted in a more reasonable result. However, there are other more marginal queries like component and installation where termFrequency: 0 seems to slightly worsen results. It’s a bit hard to know whether essentially disabling frequency weighting like this would be an extreme decision or not.

The configuration parameters used for git-scm.com do actually perform OK though. Tallying up the subjective assessment of each result, this config produces comparable quality for these specific queries. It also produced slightly better results more often than it produced slightly worse results.

Conclusion

I think it may be safe to use the git-scm.com configuration as a default starting point and upgrade to v1.3.

At the same time, I’d like to expose these configuration options to users so that people are free to adjust these weightings to fit their content.

@github-actions github-actions bot added the 📚 docs Documentation website changes label Jan 8, 2025
@astrobot-houston
Collaborator

astrobot-houston commented Jan 8, 2025

Lunaria Status Overview

🌕 This pull request will trigger status changes.

Learn more

By default, every PR changing files present in the Lunaria configuration's files property will be considered and trigger status changes accordingly.

You can change this by adding one of the keywords present in the ignoreKeywords property in your Lunaria configuration file in the PR's title (ignoring all files) or by including a tracker directive in the merged commit's description.

Tracked Files

Locale File Note
en reference/configuration.mdx Source changed, localizations will be marked as outdated.
Warnings reference
Icon Description
🔄️ The source for this localization has been updated since the creation of this pull request, make sure all changes in the source have been applied.

@delucis delucis marked this pull request as ready for review January 8, 2025 12:27
@delucis delucis added the 🌟 minor Change that triggers a minor release label Jan 8, 2025
Member

@HiDeoo HiDeoo left a comment

This is looking great 🌟

Did a diff of the JS files between prod and this PR to see what changed exactly, and most large additions are related to the ability to configure ranking so directly in relation to the PR and also for highlighting support through URL parameters.

Some thoughts:

  • As I already had to do that in the past and had to resort to an override re-implementing the whole thing, I could see a world where users would like to customize highlightParam. This would enable users to implement their own highlighting or use the provided script for convenience. Here is what it looks like with this parameter configured and the provided highlighting script where I search for component and clicked a result:

    image

    Definitely not a blocker, and could even be a follow-up PR but wanted to mention it.

  • I guess with the logging changes, users may see a lot more warnings, e.g. when building the Starlight Docs:

    Note: Pagefind doesn't support stemming for the language ko.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language uk.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language ja.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language zh-cn.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language ko.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language uk.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language ja.
    Search will still work, but will not match across root words.
    Note: Pagefind doesn't support stemming for the language zh-cn.
    Search will still work, but will not match across root words.
    

    Should we consider mentioning somewhere in the docs, e.g. in the "Site Search" guide, that not all features are supported for all languages?

Co-authored-by: HiDeoo <[email protected]>
@delucis
Member Author

delucis commented Jan 8, 2025

most large additions are related to the ability to configure ranking so directly in relation to the PR and also for highlighting support through URL parameters

I also just finished looking at the Pagefind UI changes in versions since 1.0.3 (the version currently in use in the Starlight monorepo) to see what is included in the ~4.5 KB (gzipped) bundle size increase.
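
As a quick sanity check on that figure, the absolute increase can be backed out of the size-limit report earlier in this thread (25.76 KB after a +21.55% change for `/_astro/*.js`). This is only a worked calculation; the variable names are made up for the example.

```typescript
// Figures from the size-limit report earlier in this thread.
const currentKb = 25.76;     // /_astro/*.js after this PR
const changePercent = 21.55; // reported relative increase

// current = previous * (1 + change / 100), so solve for previous.
const previousKb = currentKb / (1 + changePercent / 100);
const increaseKb = currentKb - previousKb;

console.log(previousKb.toFixed(2), increaseKb.toFixed(2));
```

That works out to an increase of a little over 4.5 KB, consistent with the estimate above.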

The main component of this increase seemed to me to be the expansion of language support, as far as I could tell: Māori, Croatian, Hungarian, Bengali, Vietnamese, Polish, and Danish support was added in v1.0.4; Ukrainian, Romanian, Czech, and Korean in v1.1.0; Swahili in v1.1.1; and Arabic, Farsi, and Hebrew in v1.2.0. That is 15 languages with unique strings that won’t compress very well. Kind of a shame in our context, where we would in theory be able to ship only the current language’s strings, but definitely not worth blocking an update over.

with the logging changes, users may see a lot more warning

Ah right. This is not strictly speaking our change, it’s just the change in Pagefind 1.3.0 — that warning is logged with a v_warn() method which was changed from LogLevel::Verbose to LogLevel::Quiet in 1.3.0.
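
For context, forwarding a log level could look roughly like the sketch below. This is not the code from this PR: the `AstroLogLevel` union and the `pagefindLogFlags` helper are hypothetical, and the mapping to `--verbose`, `--quiet`, and `--silent` assumes those are the Pagefind CLI flags for the corresponding levels, which should be checked against the Pagefind documentation for the version in use.

```typescript
// Log levels assumed for Astro's logger in this illustration.
type AstroLogLevel = 'debug' | 'info' | 'warn' | 'error' | 'silent';

// Hypothetical mapping from an Astro log level to Pagefind CLI flags.
// Assumption: --quiet keeps warnings and errors, --silent keeps errors only.
function pagefindLogFlags(level: AstroLogLevel): string[] {
  switch (level) {
    case 'debug':
      return ['--verbose'];
    case 'info':
      return []; // Pagefind's default verbosity
    case 'warn':
      return ['--quiet'];
    case 'error':
    case 'silent':
      return ['--silent'];
  }
}

console.log(pagefindLogFlags('warn')); // ['--quiet']
```

Under this mapping, a note logged at the quiet level (like the stemming one) would still appear at Astro's default and `warn` levels, and would only disappear at `error` or `silent`.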

I think maybe it’s OK on balance and not too likely to impact people as docs in many languages are still reasonably rare, so you probably only hit one of these if any? Not quite sure why the logging is doubled though? Would be good to only log that once per language.

I could see a world where users would like to customize highlightParam

Do you think it might make sense to make this a default behaviour with built-in styling in Starlight? I don’t see many drawbacks and seems like a helpful thing to just enable by default and then see if people need to disable it? But then that would probably be best done in a separate PR.
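
To illustrate what a custom implementation on the destination page might need to do, here is a minimal sketch that reads the search terms back out of the URL. It is not Pagefind's highlighting script; `termsToHighlight` is a hypothetical helper, and `highlight` is just an example value for whatever name `highlightParam` is configured with.

```typescript
// Illustrative only: collect search terms from a highlight query parameter.
// The parameter name mirrors the configured `highlightParam`; "highlight" is an assumed example.
function termsToHighlight(href: string, param = 'highlight'): string[] {
  return new URL(href).searchParams
    .getAll(param)
    .filter((term) => term.length > 0); // ignore empty values such as "?highlight="
}

const exampleHref = 'https://example.com/guides/components/?highlight=component';
const exampleTerms = termsToHighlight(exampleHref);
console.log(exampleTerms); // ['component']
```

The extra query parameter in `exampleHref` is also the kind of URL change that would become visible to users if this were enabled by default.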

@HiDeoo
Member

HiDeoo commented Jan 8, 2025

The main component of this increase seemed me to be the expansion of language support as far as I could tell

Interesting, I mainly focused on the changes in JS loaded by the browser by diffing loaded code.

I assumed language support would not have a significant impact on this considering that the pf_meta file and WASM file are localized, e.g. by visiting the English version the browser loads the wasm.en.pagefind & pagefind.en_4391141799.pf_meta files while visiting the French version the browser loads the wasm.fr.pagefind & pagefind.fr_9e8fe13141.pf_meta files.

Do you think it might make sense to make this a default behaviour with built-in styling in Starlight?

I guess a potential drawback would be URL pollution that some users might not like. Definitely in a follow-up PR as if we decide to go with this by default, I guess we may want to do some research first.

@delucis
Member Author

delucis commented Jan 8, 2025

I mainly focused on the changes in JS loaded by the browser by diffing loaded code.

Ah yeah, me too, but I only looked at the core UI js file, forgot about other stuff. AFAICT all language strings for the default UI (like “Search”, “Load more...”, “5 results” etc.) are included in that bundle. Although now you mention it, it would be an interesting option if Pagefind could load those dynamically for the current language like they do for other data.

Member

@HiDeoo HiDeoo left a comment

After the follow-up discussion and potential ideas for future improvements (on the Starlight side and maybe also on the Pagefind side, e.g. UI strings), I'm personally happy with the PR.

Amazing work and glad to finally see this shipping after all the effort put into the research and making sure the default settings are good 🎉 👏 🚀

@delucis delucis added this to the v0.31 milestone Jan 9, 2025
@delucis delucis merged commit e187383 into main Jan 13, 2025
16 checks passed
@delucis delucis deleted the chris/pagefind-logging branch January 13, 2025 10:47
@astrobot-houston astrobot-houston mentioned this pull request Jan 13, 2025
trueberryless added a commit to trueberryless/withastro-starlight that referenced this pull request Feb 15, 2025
Labels
🌟 core Changes to Starlight’s main package 📚 docs Documentation website changes 🌟 minor Change that triggers a minor release