
docs: improve searchibility with algolia #8325

Closed
wants to merge 3 commits

Conversation

ncclementi
Contributor

@ncclementi ncclementi commented Feb 12, 2024

This is a work in progress and not working yet; we have an issue with record sizes.

I was trying to run .github/workflows/upload-algolia.py locally and ran into this kind of error:

algoliasearch.exceptions.RequestException: Record at the position 459 objectID=reference/expression-tables.html#methods is too big size=224921/10000 bytes. Please have a look at https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/index-and-records-size-and-usage-limitations/#record-size-limits

We are generating massive objects that go into search.json.
For example, the object generated by this section https://ibis-project.org/reference/expression-tables#methods is ~225 KB (this one is entry 459); you can take a look at it here: https://ibis-project.org/search.json

Not even the paid plans would allow this.
For Build plans:

  • 10 KB for any individual record

For Standard, Premium and Grow plans:

  • 100 KB for any individual record
  • 10 KB average record size across all records
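A quick way to find the offending entries is to measure each record's serialized size against the 10 KB limit, the same check Algolia performs on upload. The sketch below is a hypothetical helper, not part of upload-algolia.py; the `objectID`/`text` field names are assumptions based on the error message, and it runs against synthetic records rather than the real search.json:

```python
import json

# Algolia's per-record limit on the Build plan (bytes).
RECORD_LIMIT_BYTES = 10_000

def oversized_records(records, limit=RECORD_LIMIT_BYTES):
    """Return (position, objectID, size) for every record over the limit."""
    too_big = []
    for pos, rec in enumerate(records):
        # Size as Algolia would see it: the serialized JSON record.
        size = len(json.dumps(rec).encode("utf-8"))
        if size > limit:
            too_big.append((pos, rec.get("objectID", "<missing>"), size))
    return too_big

# Synthetic records: one small, one mimicking the ~225 KB entry for
# reference/expression-tables.html#methods described above.
records = [
    {"objectID": "small-page", "text": "short entry"},
    {"objectID": "reference/expression-tables.html#methods", "text": "x" * 225_000},
]
for pos, object_id, size in oversized_records(records):
    print(f"Record at position {pos} objectID={object_id} is too big size={size}")
```

Running this over the ~1000 entries in the real search.json would give the full list of offenders rather than failing on the first one the way the upload does.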

I think a solution could be, instead of using the search.json from Quarto, to see what kind of index the Algolia crawler generates. According to these docs, it sounds like we could get it for free if we have a Netlify account, which I believe we have?

Sweet, there is already a crawler GHA https://github.com/algolia/algoliasearch-crawler-github-actions.

But I'm not sure if that's the right way to go about this. Opening this to discussion.

Closes: #7995

cc: @gforsyth

@ncclementi
Contributor Author

I decided to start fiddling with the search.json and it's very tricky: it's not only the records with "Examples" in them that generate big entries; we have other cases too.

We have ~68 cases where the objects are big, and stripping the examples only takes care of ~15.
I think if we come up with very specific rules to handle each case, it won't be sustainable.
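One generic alternative to per-case stripping rules is Algolia's usual advice of splitting a long page into several smaller records. A hedged sketch, assuming records carry `objectID` and `text` fields (those names are my assumption, not the actual search.json schema):

```python
import json

LIMIT = 10_000  # Algolia per-record limit on the Build plan (bytes)

def split_record(rec, limit=LIMIT, chunk_chars=8_000):
    """Split a record's text so each resulting record serializes under limit.

    chunk_chars is chosen below the limit to leave headroom for the other
    fields and JSON overhead; non-ASCII text may need a smaller value.
    """
    if len(json.dumps(rec).encode("utf-8")) <= limit:
        return [rec]
    text = rec.get("text", "")
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    # Derived records share the parent's fields but get distinct objectIDs.
    return [
        {**rec, "objectID": f"{rec['objectID']}-{n}", "text": chunk}
        for n, chunk in enumerate(chunks)
    ]

big = {"objectID": "reference/expression-tables.html#methods", "text": "x" * 225_000}
parts = split_record(big)
print(len(parts), "chunks, all under the limit:",
      all(len(json.dumps(p).encode("utf-8")) <= LIMIT for p in parts))
```

On the Algolia side, the split records would then need deduplication at query time (e.g. `distinct` on a shared attribute), which this sketch does not cover.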

We (with @cpcloud and @gforsyth) tried for a bit to connect the Algolia crawler via the Netlify interface, and we couldn't get the crawler to kick off.

The last thing we could try is handling everything via GHA. We could follow this example, https://github.com/algolia/algoliasearch-crawler-github-actions/blob/main/examples/netlify.yml, and see what happens.

@ncclementi
Contributor Author

Something got messed up on the rebase, I'll close and open a new PR.

@ncclementi ncclementi closed this Feb 21, 2024
@ncclementi ncclementi deleted the algolia branch February 21, 2024 16:04
@ncclementi ncclementi restored the algolia branch February 21, 2024 16:05
Linked issue: docs - improve search functionality