Create Embedding Database for MSFT Container Images Repo #22

Open
nkkko opened this issue Sep 26, 2024 · 2 comments


nkkko commented Sep 26, 2024

Is your feature request related to a problem? Please describe.
The tool would benefit from a pre-built embedding database of devcontainer.json files from the Microsoft VSCode Dev Containers repository. Predefined embeddings would improve context accuracy and speed up the generation process.

Describe the solution you'd like

  • Crawl the Microsoft VSCode Dev Containers repository at https://github.com/microsoft/vscode-dev-containers/tree/main/containers.
  • Extract the content from the README.md files and the devcontainer.json files within each sub-folder.
  • Use the devcontainer schema to guide the extraction process.
  • Create embeddings for this content and store them in the SQLite database used by the tool.
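
One practical detail for the extraction step: devcontainer.json files commonly contain `//` and `/* */` comments, which Python's `json` module rejects. The following is a minimal sketch, not the tool's actual code, that strips comments and keeps a handful of schema properties. The `FIELDS` tuple and both function names are hypothetical choices made for illustration, and trailing commas are not handled.

```python
import json

# Hypothetical subset of devcontainer.json properties worth embedding.
FIELDS = ("name", "image", "build", "features", "forwardPorts", "postCreateCommand")


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments while leaving string literals untouched."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to the end of the line, keeping the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip past the closing marker (or to the end if unterminated).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def extract_fields(raw: str) -> dict:
    """Parse a commented devcontainer.json string and keep only the chosen fields."""
    config = json.loads(strip_jsonc_comments(raw))
    return {key: config[key] for key in FIELDS if key in config}
```

The extracted dictionary could then be serialized together with the README.md text to form the content that gets embedded.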

Describe alternatives you've considered

  • Fetching content from the MSFT repository on demand at generation time, which would be less efficient.
  • Not using predefined embeddings, which would slow down the generation process and reduce context accuracy for similar projects.

Additional context

  • Use the devcontainer.json schema to ensure accurate extraction and structuring of the content.
  • Ensure that the strategy for picking the best container image is efficient, leveraging the README.md content to infer the most appropriate image.
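
One way the image-picking strategy could work is to embed the target project's description and rank the stored template embeddings by similarity. The sketch below assumes cosine similarity as the metric, which the issue does not specify, and uses hypothetical names (`rank_templates`, `top_k`) and plain Python lists in place of real embedding vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_templates(query_vec, templates, top_k=3):
    """Return the top_k (name, score) pairs, best match first.

    templates is an iterable of (name, vector) pairs, for example one pair
    per sub-folder of the crawled repository.
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in templates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

With only a few hundred templates, a linear scan like this is likely fast enough, so a dedicated vector index may not be needed.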

Steps:

  1. Fetch and parse the README.md and devcontainer.json files from the specified Microsoft repository.
  2. Generate embeddings for the content using the configured embedding model.
  3. Store the resulting embeddings in the existing SQLite database inside the data/ directory.
  4. Update the main.py logic to use these pre-built embeddings, improving the efficiency and accuracy of the devcontainer.json generation process.
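
Step 3 could be sketched as follows, using only the standard library. The table name `template_embeddings`, its columns, and the function names are assumptions made for illustration; the tool's existing SQLite schema in data/ may differ and should take precedence. Vectors are stored as float32 BLOBs in the machine's native byte order.

```python
import sqlite3
from array import array


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS template_embeddings ("
        " name TEXT PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " embedding BLOB NOT NULL)"
    )
    return conn


def save_embedding(conn, name, content, vector):
    """Insert or overwrite one template's text and its embedding vector."""
    blob = array("f", vector).tobytes()  # pack as float32
    conn.execute(
        "INSERT OR REPLACE INTO template_embeddings (name, content, embedding)"
        " VALUES (?, ?, ?)",
        (name, content, blob),
    )
    conn.commit()


def load_embeddings(conn):
    """Return {name: vector} for every stored template."""
    rows = conn.execute("SELECT name, embedding FROM template_embeddings ORDER BY name")
    result = {}
    for name, blob in rows:
        vec = array("f")
        vec.frombytes(blob)
        result[name] = list(vec)
    return result
```

Keying rows by template name means re-running the crawl overwrites existing rows instead of duplicating them.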

nkkko commented Oct 9, 2024

/bounty $50


algora-pbc bot commented Oct 9, 2024

💎 $50 bounty • Daytona

Steps to solve:

  1. Start working: Comment /attempt #22 with your implementation plan
  2. Submit work: Create a pull request including /claim #22 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

If no one is assigned to the issue, feel free to tackle it, without confirmation from us, after registering your attempt. In the event that multiple PRs are made from different people, we will generally accept those with the cleanest code.

Please respect others by working on PRs that you are allowed to submit attempts to.

e.g. If you reached the limit of active attempts, please wait for the ability to do so before submitting a new PR.

If you cannot submit an attempt, you will not receive your payout.

Thank you for contributing to daytonaio/devcontainer-generator!

