Add Typesense #44
Force-pushed from ca70475 to 859dd13.
Thank you @ruslandoga! I am currently on holiday, but I will try to carve out some time sooner rather than later to give you feedback. /cc @wojtekmach
Erlang uses ExDoc now, so they have the exact same structure. However, we will need to either poll them, ask them to ping us once they publish a new version, or ask them to push their docs to Hexdocs! I'd say we can postpone this to a follow-up pull request. On the other hand, I believe Gleam does not have the data in the format we need (we created our
@ruslandoga I'll try to review this sooner rather than later, but I have a lot on my plate until October 15th; after that it's gonna be my number one priority to see this through. Sorry about that.
This is very exciting! I left some comments below.
Force-pushed from ce7af3b to e18e317.
👋 I think it's ready for review now :) Updates since the last review:
search_data_js =
  Enum.find_value(files, fn {path, content} ->
    case Path.basename(path) do
      "search_data-" <> _digest -> content
We cannot trust this data since it's user-provided. Can they do anything dangerous by providing something we don't expect? Maybe we should do some rudimentary validation?
They can provide long strings like https://github.com/cloudpods-dev/docker-engine-api-elixir/blob/813cc557da483f623a8f484db04efc7e58db0376/lib/docker_engine_api/api/container.ex#L67, but Typesense seems to handle it fine. We can check for content size, maybe. I think if Typesense doesn't like the payload, it would simply reject it.
I added a test that checks that invalid fields in search items (like type being a map instead of a string, or doc being a list) are rejected: 8d58e4f
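To make the "rudimentary validation" idea above concrete, here is a minimal sketch. It is not the code from 8d58e4f: the module name SearchItemCheck, its function names, and the choice of type, title, and doc as required string fields are assumptions made for illustration, and it works on already-decoded items.

```elixir
defmodule SearchItemCheck do
  # Hypothetical helper, not taken from the PR.
  # Assumption: these fields must be present and must be strings.
  @string_fields ["type", "title", "doc"]

  # True only when the item is a map and every expected field is a binary.
  def valid_item?(item) when is_map(item) do
    Enum.all?(@string_fields, fn field -> is_binary(Map.get(item, field)) end)
  end

  # Anything that is not a map (list, string, number, nil) is rejected.
  def valid_item?(_other), do: false

  # Splits decoded items into {valid, rejected}; the caller decides
  # whether rejected items fail the whole upload or are just skipped.
  def partition(items) when is_list(items) do
    Enum.split_with(items, &valid_item?/1)
  end
end
```

With this shape, an item whose type is a map or whose doc is a list ends up in the rejected half of the tuple, so it would never reach Typesense. A byte-size limit on doc could be added to the same predicate if long strings turn out to be a problem.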
Co-authored-by: Eric Meadows-Jönsson
Force-pushed from 5d9f67d to bfc0539.
config/runtime.exs (outdated)
@@ -5,6 +5,9 @@ if config_env() == :prod do
     port: System.fetch_env!("HEXDOCS_PORT"),
     hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
     hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
+    typesense_url: System.fetch_env!("TYPESENSE_URL"),
+    typesense_api_key: System.fetch_env!("TYPESENSE_API_KEY"),
+    typesense_collection: System.fetch_env!("TYPESENSE_COLLECTION"),
I created a hexdocs-test collection on Typesense Cloud: https://cloud.typesense.org/clusters/ent97o5sv4dzx2f0p/collections
I think we can use it during alpha/beta testing.
config/runtime.exs (outdated)
@@ -5,6 +5,9 @@ if config_env() == :prod do
     port: System.fetch_env!("HEXDOCS_PORT"),
     hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
     hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
+    typesense_url: System.fetch_env!("TYPESENSE_URL"),
+    typesense_api_key: System.fetch_env!("TYPESENSE_API_KEY"),
This would be the "Admin API Key" for the https://cloud.typesense.org/clusters/ent97o5sv4dzx2f0p cluster; it can be downloaded from the dashboard.
config/runtime.exs (outdated)
@@ -5,6 +5,9 @@ if config_env() == :prod do
     port: System.fetch_env!("HEXDOCS_PORT"),
     hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
     hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
+    typesense_url: System.fetch_env!("TYPESENSE_URL"),
And this would be https://ent97o5sv4dzx2f0p.a1.typesense.net
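As a rough sketch of how the three settings from config/runtime.exs could fit together, the snippet below turns them into the URL and header for a Typesense bulk import. The module and function names are hypothetical, the use of action=create is an assumption, and no HTTP call is made, so this only shows the shape of the request and not how the PR actually sends it.

```elixir
defmodule TypesenseRequest do
  # Hypothetical helper, not taken from the PR.
  # Takes a keyword list with the same keys as the runtime config and
  # returns {url, headers} for the documents import endpoint.
  def import_request(config) do
    base_url = Keyword.fetch!(config, :typesense_url)
    api_key = Keyword.fetch!(config, :typesense_api_key)
    collection = Keyword.fetch!(config, :typesense_collection)

    # Percent-encode the collection name so it is safe as a path segment.
    segment = URI.encode(collection, &URI.char_unreserved?/1)

    uri = URI.parse(base_url)

    url =
      URI.to_string(%{
        uri
        | path: "/collections/#{segment}/documents/import",
          # Assumption: documents are only ever created, never upserted.
          query: URI.encode_query(%{"action" => "create"})
      })

    {url, [{"X-TYPESENSE-API-KEY", api_key}]}
  end
end
```

Keeping the request construction separate from the HTTP client makes it easy to check in a test that staging and prod point at different collections without touching the network.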
👋 Just wanted to check if there’s anything else we'd need to address before it's merged?
Looks good to me. We plan to test this out in our staging server before merging. Thank you for all the work!
Force-pushed from 4c553de to a4ba5c3.
This reverts commit 26169bb.
@ruslandoga this is now deployed to https://staging.hexdocs.pm. I have published a test package and it was correctly indexed into the hexdocs-test collection. Is this ready to publish to prod? What are the next steps? Thank you for all of your work on this and apologies for the delays.
Yes, I think it's ready for prod! I think the next step would be integrating global search into ex_doc. I can open a PR!
We can create a new collection if needed or we can continue using I think it's possible to clone / fork collections in Typesense Cloud, and if not, it's pretty easy to move the data around, since I guess by the nature of Hexdocs, the indexed data is immutable. So it should be OK either way.
Got it, excellent! Could you create hexdocs-staging and hexdocs-prod collections for us? We tend to follow that particular convention for external services. Should we backfill some data? Latest versions of all packages? All versions of all packages? We can also do nothing for now and make a decision when all pieces are ready. cc @josevalim
I've created Regarding backfilling, it definitely helps during development, but I tend to use a local Typesense instance. It's a bit outdated by now (e.g. missing proglang), but here are some docs I used before: https://hexdocs-artifacts.s3.eu-central-003.backblazeb2.com/docs_from_tarballs_all_versions.jsonl.zst (it's 190MB compressed, 4.4G uncompressed)
$ curl https://hexdocs-artifacts.s3.eu-central-003.backblazeb2.com/docs_from_tarballs_all_versions.jsonl.zst -O
$ zstd docs_from_tarballs_all_versions.jsonl.zst -d
# add proglang=elixir to all entries (-c keeps one JSON object per line, as JSONL requires)
$ jq -c '. + {proglang: "elixir"}' docs_from_tarballs_all_versions.jsonl > docs.jsonl
$ docker compose up typesense -d
# https://typesense.org/docs/27.1/api/collections.html#with-pre-defined-schema
$ curl "http://localhost:8108/collections" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: hexdocs" \
    -d '{"fields": [
          {"facet": true, "name": "proglang", "type": "string"},
          {"facet": true, "name": "type", "type": "string"},
          {"name": "title", "type": "string"},
          {"name": "doc", "type": "string"},
          {"facet": true, "name": "package", "type": "string"}
        ],
        "name": "hexdocs-local",
        "token_separators": [".", "_", "-", " ", ":", "@", "/"]
      }'
# https://typesense.org/docs/27.1/api/documents.html#import-a-jsonl-file
$ curl "http://localhost:8108/collections/hexdocs-local/documents/import?action=create" \
    -X POST \
    -T docs.jsonl \
    -H "X-TYPESENSE-API-KEY: hexdocs"
# sanity check
$ curl -H "X-TYPESENSE-API-KEY: hexdocs" "http://localhost:8108/collections/hexdocs-local/documents/777777"
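For anyone who prefers to stay in Elixir, the proglang step from the commands above can be sketched roughly as follows. The module name AddProglang is hypothetical, and the sketch assumes Elixir 1.18 or later for the built-in JSON module; like the jq expression, it overwrites any existing proglang value.

```elixir
defmodule AddProglang do
  # Hypothetical helper, not taken from the PR.
  # Assumption: Elixir 1.18+ (built-in JSON module).

  # Decodes one JSONL line, sets "proglang", and encodes it back to a single line.
  def transform_line(line, proglang \\ "elixir") do
    line
    |> JSON.decode!()
    |> Map.put("proglang", proglang)
    |> JSON.encode!()
  end

  # Streams the input line by line, so the uncompressed file never has to fit in memory.
  def transform_file(input, output, proglang \\ "elixir") do
    input
    |> File.stream!()
    |> Stream.map(&String.trim_trailing(&1, "\n"))
    |> Stream.reject(&(&1 == ""))
    |> Stream.map(&(transform_line(&1, proglang) <> "\n"))
    |> Stream.into(File.stream!(output))
    |> Stream.run()
  end
end
```

Calling AddProglang.transform_file("docs_from_tarballs_all_versions.jsonl", "docs.jsonl") would produce a file that can be imported with the same curl command.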
@ericmj @wojtekmach @ruslandoga let's definitely backfill. We only support recent ExDoc versions anyway, which will act as a filter. A couple things to figure out:
Overall, my next step suggestion is to build a home page for searching within a given set of packages. This thread on ElixirForum has a good example: https://elixirforum.com/t/hexdocs-search-engine-for-us-devs/46814/1 We could use it at least in two places:
For ExDoc, we would need to work on the related packages feature: elixir-lang/ex_doc#1811 My idea is that we would be able to store this information as a .json file as well. So
This one looks good to merge to me. We can continue in the other issues.
Sounds good, I'll finish infrastructure setup and deploy this to prod soon.
This is now running on prod, so far so good!
Amazing work @ruslandoga!
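To illustrate what "searching within a given set of packages" could look like against the collection schema shown earlier in this thread, here is a minimal sketch that only builds a Typesense search URL. The module and function names are hypothetical. It assumes that the package field holds plain package names that need no escaping inside the filter, and that title and doc are the fields to query; neither is confirmed by the thread.

```elixir
defmodule PackageSearch do
  # Hypothetical helper, not taken from the PR or from ExDoc.
  # Builds a search URL restricted to the given packages via filter_by.
  def search_url(base_url, collection, query, packages) do
    params = [
      {"q", query},
      # Assumption: search over the title and doc fields of the schema.
      {"query_by", "title,doc"},
      # Exact-match filter on the faceted package field.
      {"filter_by", "package:=[" <> Enum.join(packages, ",") <> "]"}
    ]

    base_url <> "/collections/" <> collection <> "/documents/search?" <> URI.encode_query(params)
  end
end
```

The request would still need an X-TYPESENSE-API-KEY header; for a public search page that would typically be a search-only key and never the admin key.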
This PR integrates Typesense into Hexdocs.
TODOs:
Logging excessively for now
Indexing in the same step for now
proglang
The format is the same: Add Typesense #44 (comment)
Commit reverted, approach needs discussion with Gleam team
CI results: ruslandoga#1