AI-GCS: Create AI-generated code test dataset. Reuse existing dataset where relevant #7

Open
1 task
pombredanne opened this issue Dec 19, 2024 · 1 comment
pombredanne commented Dec 19, 2024

For proper evaluation, we need to create a reusable test dataset that mixes AI-generated code examples with more traditional code snippets.

The dataset should be tagged so that we know, when possible, the origin of each code snippet, as well as the tools used to create the dataset and the people or group behind it.

It should be designed for large-scale evaluation of the search algorithms.
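
As a rough illustration of the tagging idea, here is a minimal sketch of what one tagged entry could look like, stored as JSON Lines so that a large dataset can be streamed one record at a time. Nothing here is an agreed format: the `SnippetRecord` class, its field names, the helper functions, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class SnippetRecord:
    """One tagged code snippet. All field names are hypothetical."""
    snippet_id: str
    code: str
    ai_generated: bool
    origin: Optional[str] = None       # where the snippet came from, when known
    tool: Optional[str] = None         # tool used to produce the entry, when known
    contributor: Optional[str] = None  # person or group that added the entry


def dump_jsonl(records: Iterable[SnippetRecord]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def load_jsonl(text: str) -> Iterator[SnippetRecord]:
    """Parse JSON Lines back into records, skipping blank lines."""
    for line in text.splitlines():
        if line.strip():
            yield SnippetRecord(**json.loads(line))


# Made-up sample entries: one AI-generated, one traditional.
records = [
    SnippetRecord(
        "s1",
        "def add(a, b):\n    return a + b\n",
        ai_generated=True,
        tool="example-assistant",
        contributor="example-team",
    ),
    SnippetRecord(
        "s2",
        "int main(void) { return 0; }\n",
        ai_generated=False,
        origin="example-project",
    ),
]
round_tripped = list(load_jsonl(dump_jsonl(records)))
```

Leaving `origin`, `tool`, and `contributor` optional reflects the "when possible" caveat: an entry with unknown provenance can still be included and simply carries `null` for those tags.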

Eventually we should publish:

  • An open dataset with basic documentation, published on Zenodo.

See some links:

JonoYang commented Jan 3, 2025

We have started a repo to store and run extended matchcode tests at https://github.com/aboutcode-org/matchcode-tests

Projects
Status: In Progress