Add gpt synthetic dataset generator and update poetry #198

Merged: 4 commits merged into eosphoros-ai:main on Dec 28, 2023

Conversation

qidanrui
Collaborator

First, install dbgpt_hub with pip install dbgpt-hub.

Second, put the Spider dataset under the dbgpt_hub/data folder:
[Screenshot: Screen Shot 2023-12-28 at 4 51 20 PM]
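
Before running the generator, it can help to confirm that the dataset landed where the example below looks for it. The following is a minimal sketch, not part of this PR: the helper names `spider_tables_path` and `check_spider_layout` are hypothetical, and the path layout is taken from the `table_file_path` argument used in the example.

```python
import os


def spider_tables_path(root_path):
    """Return the tables.json path the example expects under root_path."""
    return os.path.join(root_path, "dbgpt_hub", "data", "spider", "tables.json")


def check_spider_layout(root_path):
    """Return True if tables.json exists where the example looks for it."""
    return os.path.isfile(spider_tables_path(root_path))


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains dbgpt_hub.
    root = os.getcwd()
    status = "found" if check_spider_layout(root) else "missing"
    print(spider_tables_path(root), status)
```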

Then, run the following code:

import os

from dbgpt_hub.data_generator import generate_dataset_with_gpt

# COT_PROMPT (the prompt text) and ROOT_PATH (the directory that contains
# dbgpt_hub) are not defined in this snippet; define or import them first.
args = {
    "model": "gpt-3.5-turbo-16k",
    "prompt": COT_PROMPT,
    "num_text2sql_pair_each_db": 1,
    "table_file_path": os.path.join(ROOT_PATH, "dbgpt_hub/data/spider/tables.json"),
    "output_path": os.path.join(ROOT_PATH, "dbgpt_hub/data/spider/synthetic_data_with_gpt.json"),
}

generate_dataset_with_gpt(args)
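
For context on `table_file_path`: Spider's `tables.json` is a list of database descriptions, each with a `db_id`, `table_names_original`, and `column_names_original` (pairs of table index and column name, with index -1 reserved for the `*` wildcard). The sketch below shows one way such an entry could be rendered as schema text for a prompt. It is an illustration only and not necessarily how `generate_dataset_with_gpt` builds its prompts; `describe_schema` and `load_schemas` are hypothetical names.

```python
import json


def describe_schema(db):
    """Render one Spider tables.json entry as 'table(col, col, ...)' lines."""
    tables = db["table_names_original"]
    columns = {i: [] for i in range(len(tables))}
    for table_idx, col_name in db["column_names_original"]:
        if table_idx >= 0:  # index -1 is the '*' wildcard entry
            columns[table_idx].append(col_name)
    return "\n".join(
        f"{name}({', '.join(columns[i])})" for i, name in enumerate(tables)
    )


def load_schemas(table_file_path):
    """Map each db_id in tables.json to its rendered schema text."""
    with open(table_file_path, encoding="utf-8") as f:
        return {db["db_id"]: describe_schema(db) for db in json.load(f)}
```

With `num_text2sql_pair_each_db` set to 1, as in the example, one text-to-SQL pair would be requested for each database listed in the file.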

You can also run the generation code directly without any arguments:

from dbgpt_hub.data_generator import generate_dataset_with_gpt
generate_dataset_with_gpt()

Finally, the synthetic dataset is generated, as shown in the following figure:
[Screenshot: Screen Shot 2023-12-28 at 4 52 45 PM]
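
Since the screenshot is not reproduced here, a quick way to check the result is to load the file written to `output_path`. The sketch below assumes only that the output is valid JSON and makes no assumption about its fields; `summarize_json` is a hypothetical helper, not part of dbgpt_hub.

```python
import json


def summarize_json(path):
    """Return (top-level type name, number of top-level items) for a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Lists and dicts report their length; any other JSON value counts as one item.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

Calling `summarize_json` on the generated `synthetic_data_with_gpt.json` reports the container type and how many top-level records were written.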

Member

wangzaistone left a comment

LGTM

wangzaistone merged commit d6a8bb9 into eosphoros-ai:main on Dec 28, 2023
4 checks passed