(EAI-428) versioned docs #699

Open: wants to merge 11 commits into base: main

@yakubova92 (Collaborator) commented May 5, 2025

Jira: https://jira.mongodb.org/browse/EAI-428

Changes

  • Ingest multiple versions of each data source
  • Query across multiple versions, defaulting to the current version
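
The query-side change can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `QueryFilters`, `filtersToQuery`, `sourceName` as a query key, and the `metadata.version.label` field are assumptions made for the example. Only `metadata.version.isCurrent` and the `$or` clause come from the diff under review.

```typescript
// Hypothetical filter shape accepted by a search call (names are assumptions).
interface QueryFilters {
  sourceName?: string;
  versionLabel?: string;
}

type MongoQuery = Record<string, unknown>;

// Translate caller-supplied filters into a MongoDB-style query document.
// When no version is requested, default to content that is either
// unversioned (no isCurrent field) or marked as the current version.
function filtersToQuery(filters: QueryFilters = {}): MongoQuery {
  const query: MongoQuery = {};
  if (filters.sourceName !== undefined) {
    query["sourceName"] = filters.sourceName;
  }
  if (filters.versionLabel !== undefined) {
    // Assumed field name for an explicit version match.
    query["metadata.version.label"] = filters.versionLabel;
  } else {
    // Same clause as the snippet reviewed in this PR.
    query["$or"] = [
      { "metadata.version.isCurrent": { $exists: false } },
      { "metadata.version.isCurrent": true },
    ];
  }
  return query;
}
```

Under these assumptions, calling `filtersToQuery({ sourceName: "docs" })` yields a query with the `$or` default, while passing a `versionLabel` replaces the default with an exact version match.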

Notes

  • This is a feature branch. All code has been reviewed in PRs merged into this branch.
  • Added a DO NOT MERGE flag to this PR until EAI-1001 can be added to it. At that point the code example dataset filter can replace its regex match on 'snooty' in the source name with a content type match.
  • When promoting to staging and prod, update the indexes manually.

* remove snooty prefix

* ingesting pages for all branches on each data source

* do not ingest (and delete if already exists) pages on inactive branches

* handle current version override

* cleanup unused code from previous version override implementation, tests

* update SnootyDataSource tests

* remove override for docs current version
* nearest neighbor search accepts filters, defaults to current version

* parse filters to mdb query
* get versions of a data source

* get versions for multiple data sources
* exclude old versions from dataset

* add test case

* fix other tests
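
The ingest behavior described in the commits above (ingest pages for every branch, but skip and delete pages on inactive branches) might look roughly like this. It is a sketch under stated assumptions: the `Branch` shape, its `active` flag, and `planBranchIngest` are hypothetical names for illustration and are not taken from `SnootyDataSource`.

```typescript
// Hypothetical branch record; field names are assumptions for this sketch.
interface Branch {
  name: string;
  active: boolean;
}

interface IngestPlan {
  ingest: string[]; // branches whose pages should be fetched and stored
  remove: string[]; // branches whose already-stored pages should be deleted
}

// Split a data source's branches into those to ingest and those to remove.
function planBranchIngest(branches: Branch[]): IngestPlan {
  const plan: IngestPlan = { ingest: [], remove: [] };
  for (const branch of branches) {
    if (branch.active) {
      plan.ingest.push(branch.name);
    } else {
      plan.remove.push(branch.name);
    }
  }
  return plan;
}
```

Keeping the decision in one pure function makes the delete-if-inactive rule easy to unit test separately from any network or database calls.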
@yakubova92 requested review from mongodben and nlarew May 6, 2025 19:18
@yakubova92 changed the title from "(EAI-428) feature versioned docs" to "(EAI-428) versioned docs" May 6, 2025
Comment on lines +31 to +34
$or: [
{ "metadata.version.isCurrent": { $exists: false } },
{ "metadata.version.isCurrent": true },
],
Collaborator

Do we need to apply the same kind of filter to the code example dataset? https://huggingface.co/datasets/mongodb-eai/code-examples

Collaborator Author

We do need to add that filter for the code example dataset, thanks for catching that. Code examples are complicated by the fact that we filter for source names that contain snooty or devcenter, and this PR removes the snooty prefix: https://github.com/mongodb/chatbot/blob/EAI-428-feature-versioned-docs/packages/datasets/src/mongoDbDatasetConstants.ts#L19-L22

Collaborator Author

The best solution is to filter by the content type field, which will be done in EAI-1001. I'm adding a DO NOT MERGE flag to this PR until I can add that ticket to it and update the filter for the code example dataset.
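
To illustrate why a name-based regex stops working once the snooty prefix is removed, here is a small sketch. The `PageLike` shape, the sample source names, and the `"tech-docs"` value are assumptions for the example; only the `sourceType` field name and the snooty/devcenter name match come from this thread.

```typescript
// Hypothetical minimal page shape for this sketch.
interface PageLike {
  sourceName: string;
  sourceType?: string;
}

// Name-based selection: depends on a prefix being present in the source name.
const legacyNamePattern = /snooty|devcenter/;

function matchesByName(page: PageLike): boolean {
  return legacyNamePattern.test(page.sourceName);
}

// Type-based selection: independent of how the source is named.
function matchesBySourceType(
  page: PageLike,
  allowedTypes: ReadonlySet<string>
): boolean {
  return page.sourceType !== undefined && allowedTypes.has(page.sourceType);
}

// Illustrative pages before and after the prefix removal (names are made up).
const beforeRename: PageLike = { sourceName: "snooty-docs", sourceType: "tech-docs" };
const afterRename: PageLike = { sourceName: "docs", sourceType: "tech-docs" };
```

With these sample values the regex matches `beforeRename` but not `afterRename`, while the `sourceType` check matches both, which is the motivation for filtering on content type instead.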

Collaborator

@mongodben mongodben left a comment

Looks good. Just a few small things to fix before merge.

@yakubova92 added the DO NOT MERGE (Not yet ready for merge) label May 7, 2025
yakubova92 added 3 commits May 9, 2025 14:39
* add sourceType to pages and embedded_content and ability to filter by it

* test case
@yakubova92 removed the DO NOT MERGE (Not yet ready for merge) label May 9, 2025
@yakubova92 requested a review from mongodben May 9, 2025 20:51