Track subsets in larger dataset #173

Open
ccstan99 opened this issue Aug 30, 2023 · 1 comment

Comments

@ccstan99
Collaborator

We have consolidated lots of smaller subsets into larger, logically grouped subsets like blogs. However, it'd still be nice to be able to pull sources from a smaller subset, which could then be used with Pinecone metadata. Consider adding a 'domain' column in MySQL, derived from the 'url', to easily find smaller subsets.
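As a rough illustration of deriving 'domain' from 'url', here is a minimal sketch using only the Python standard library. The function name `domain_from_url` and the choice to strip a leading `www.` are assumptions for illustration, not something already in the codebase; which URL field to feed in is still an open question.

```python
from typing import Optional
from urllib.parse import urlparse


def domain_from_url(url: Optional[str]) -> Optional[str]:
    """Return the lowercased host of a URL without a leading 'www.', or None.

    Hypothetical helper: the normalisation rules here are assumptions.
    """
    if not url:
        return None
    # urlparse only recognises the host when a scheme or '//' is present,
    # so bare values like 'example.com/path' get a '//' prefix first.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.hostname  # already lowercased, port removed
    if not host:
        return None
    # Treat 'www.example.com' and 'example.com' as the same subset.
    return host[4:] if host.startswith("www.") else host


if __name__ == "__main__":
    print(domain_from_url("https://www.lesswrong.com/posts/abc"))
```

The resulting value could be stored in the proposed 'domain' column and copied into the Pinecone metadata so that queries can filter on it.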

@mruwnik
Collaborator

mruwnik commented Sep 3, 2023

Will this be:

  • the domain of the url that is displayed to the user
  • the domain of the source url (if provided)
  • the domain of the place where the article was first found (e.g. from the alignment newsletter)
