AI-GCS: Create AI-generated code test dataset. Reuse existing dataset where relevant #7

pombredanne · 2024-12-19T18:31:49Z

For proper evaluation, we need to create a reusable test dataset that mixes AI-generated code examples with more traditional code snippets.

The dataset should be tagged so that we know, when possible, the origin of each code snippet, as well as the tools used to create the dataset and the people or group who created it.
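To make the tagging idea concrete, here is a minimal sketch of what one tagged entry could look like. The `SnippetRecord` class, its field names, and the JSON-lines serialization are all assumptions for illustration; nothing here is an agreed schema for this dataset.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SnippetRecord:
    """One tagged dataset entry (hypothetical schema, for illustration only)."""

    snippet_id: str
    code: str
    ai_generated: bool
    origin: Optional[str] = None       # where the snippet came from, when known
    tool: Optional[str] = None         # tool used to generate or collect it
    contributor: Optional[str] = None  # person or group who added it


def to_json_line(record: SnippetRecord) -> str:
    """Serialize a record as one JSON line, with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values only; the origin is left unset because it is unknown here.
example = SnippetRecord(
    snippet_id="sample-0001",
    code="def add(a, b):\n    return a + b\n",
    ai_generated=True,
    tool="example-code-assistant",
    contributor="example-contributor",
)
line = to_json_line(example)
print(line)
```

Keeping the provenance fields optional reflects the "when possible" qualifier: an entry with an unknown origin can still be included and filtered out later.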

It should be designed for large-scale evaluation of the search algorithms.
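As a rough sketch of what such an evaluation could look like, the snippet below streams over (code, expected origin) pairs and measures how often a search function returns the expected origin. The `top1_accuracy` helper, the `fake_search` stand-in, and the choice of metric are assumptions for illustration, not the project's actual search API or evaluation method.

```python
from typing import Callable, Iterable, Optional, Tuple


def top1_accuracy(
    cases: Iterable[Tuple[str, str]],
    search: Callable[[str], Optional[str]],
) -> float:
    """Fraction of cases where search(code) returns the expected origin.

    Cases are consumed one at a time, so a large dataset can be streamed
    from disk instead of being loaded into memory.
    """
    total = 0
    hits = 0
    for code, expected_origin in cases:
        total += 1
        if search(code) == expected_origin:
            hits += 1
    return hits / total if total else 0.0


def fake_search(code: str) -> Optional[str]:
    """Hypothetical stand-in for a real search algorithm."""
    return "origin-a" if "add" in code else None


cases = [
    ("def add(a, b): return a + b", "origin-a"),
    ("def sub(a, b): return a - b", "origin-b"),
]
score = top1_accuracy(cases, fake_search)
print(score)
```

Passing the search function in as a parameter means the same tagged cases can be reused to compare several algorithms.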

Eventually we should publish an open dataset with its basic documentation on Zenodo.

See some links:

JonoYang · 2025-01-03T00:01:32Z

We have started a repo to store and run extended matchcode tests at https://github.com/aboutcode-org/matchcode-tests

pombredanne added this to 06-AI-generated Code Search Oct 3, 2024

pombredanne converted this from a draft issue Dec 19, 2024

pombredanne moved this to In Progress in 06-AI-generated Code Search Dec 23, 2024

pombredanne assigned JonoYang Dec 23, 2024

pombredanne added this to the 3-Development and Datasets milestone Dec 26, 2024
