update latest exp res #99

Merged 1 commit on Oct 16, 2023
6 changes: 4 additions & 2 deletions README.md
@@ -25,7 +25,9 @@ DB-GPT-Hub is an experimental project utilizing LLMs (Large Language Models) to

So far, we have successfully integrated multiple large models and established a complete workflow, including data processing, model SFT (Supervised Fine-Tuning) training, prediction output, and evaluation. The code is readily reusable within this project.

As of October 10, 2023, by fine-tuning an open-source 13-billion-parameter model with this project, **the execution accuracy on the Spider evaluation dataset has surpassed that of GPT-4!**

Part of the experimental results has been compiled into this project's [document](docs/eval_llm_result.md). By using this project together with more related data, the execution accuracy on the Spider evaluation set has already reached **0.825**.
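
To make the SFT data-processing step mentioned above concrete, here is a minimal sketch of turning one (question, schema, SQL) triple into an instruction-style training record; the prompt template and field names are illustrative assumptions, not the project's actual format:

```python
# Hypothetical formatting of a single text-to-SQL SFT example.
# The prompt template and output field names are assumptions; the project's
# own data-processing templates may differ.
def format_sft_example(question: str, schema: str, gold_sql: str) -> dict:
    prompt = (
        "You are a Text-to-SQL assistant.\n"
        f"Database schema:\n{schema}\n"
        f"Question: {question}\n"
        "Write the SQL query that answers the question."
    )
    return {"instruction": prompt, "output": gold_sql}

example = format_sft_example(
    question="How many singers do we have?",
    schema="singer(singer_id, name, country, age)",
    gold_sql="SELECT count(*) FROM singer",
)
print(example["instruction"])
print(example["output"])
```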

## 2. Fine-tuning Text-to-SQL

@@ -204,7 +206,7 @@ Run the following command:
```bash
python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred_file
```
You can find our latest evaluation results, together with part of the experiment results, [here](docs/eval_llm_result.md).
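
For intuition, "execution accuracy" counts a predicted query as correct when executing it returns the same result set as executing the gold query. The following is a toy sketch of that idea against a SQLite database, not the project's actual evaluator (which handles ordering, value plugging, and many more cases):

```python
# Toy illustration of execution accuracy: a predicted SQL query counts as
# correct if running it yields the same multiset of rows as the gold query.
# This is NOT the project's evaluator; database path and queries are examples.
import sqlite3
from collections import Counter

def same_execution_result(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute cannot match the gold result
    finally:
        conn.close()
    return Counter(pred_rows) == Counter(gold_rows)

# Hypothetical usage against one Spider database file:
# same_execution_result("spider/database/concert_singer/concert_singer.sqlite",
#                       "SELECT count(*) FROM singer",
#                       "SELECT count(*) FROM singer")
```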

## 4. RoadMap

5 changes: 3 additions & 2 deletions README.zh.md
@@ -23,7 +23,8 @@

DB-GPT-Hub is an experimental project that uses LLMs to implement Text-to-SQL parsing. It mainly covers dataset collection, data preprocessing, model selection and construction, and fine-tuning of weights. Through this pipeline, Text-to-SQL capability can be improved while model training costs are reduced, letting more developers take part in improving Text-to-SQL accuracy and ultimately enabling automated question answering over databases, so that users can run complex database queries through natural-language descriptions.
So far, we have built the complete pipeline of data processing, model SFT training, prediction output, and evaluation on top of multiple large models; **the code can be reused directly within this project**.
As of October 10, 2023, by fine-tuning an open-source 13B model with this project, the execution accuracy on the Spider evaluation set **has already surpassed GPT-4!**
Part of the experimental results has been compiled into this project's [document](docs/eval_llm_result.md). By using this project together with more related data, the execution accuracy on the Spider evaluation set has already reached **0.825**.

## 2. Fine-tuning Text-to-SQL

@@ -190,7 +191,7 @@ sh ./dbgpt_hub/scripts/export_merge.sh
```bash
python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred_file
```
You can find our latest evaluation and experiment results [here](docs/eval_llm_result.md).
## 4. Roadmap

We will divide the whole process into three phases:
9 changes: 5 additions & 4 deletions docs/eval_llm_result.md
@@ -7,13 +7,14 @@ This doc aims to summarize the performance of publicly available large language models
| ------------------------------ | ------------------ | ---------------------------------------------------------------------------------- |
| **GPT-4** | **0.762** | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
| ChatGPT | 0.728 | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b)|
| **CodeLlama-13b-Instruct-hf_lora** | **0.789** | SFT-trained in this project using only the Spider train dataset; evaluated the same way as in this project, with LoRA SFT. |
| **CodeLlama-13b-Instruct-hf_qlora** | **0.825** | SFT-trained in this project on roughly 50 thousand text-to-SQL examples; evaluated the same way as in this project, with QLoRA SFT; we made sure the training set was filtered against the Spider eval dataset. |
| CodeLlama-13b-Instruct-hf_qlora | 0.774 | SFT-trained in this project using only the Spider train dataset; evaluated the same way as in this project, with QLoRA (nf4, 4-bit) SFT. |
| wizardcoder | 0.610 | [text-to-sql-wizardcoder](https://github.com/cuplv/text-to-sql-wizardcoder/tree/main) |
| CodeLlama-13b-Instruct-hf | 0.556 | evaluated in this project with default parameters |
| Baichuan2-13B-Chat | 0.392 | evaluated in this project with default parameters |
| llama2_13b_hf | xxx | run in this project with the default parameter set. |
| llama2_13b_hf_lora_best | 0.744 | SFT-trained in this project using only the Spider train dataset; evaluated the same way as in this project. |
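
The QLoRA rows above refer to 4-bit NF4 quantized fine-tuning. Below is a minimal sketch of loading a base model that way with Hugging Face transformers and peft before attaching LoRA adapters; the base model name and hyperparameter values are assumptions, not the exact configuration behind these results:

```python
# Illustrative QLoRA-style setup: load the base model in 4-bit NF4 quantization,
# then attach LoRA adapters so only the small adapter weights are trained.
# Model name and hyperparameters are assumptions, not the project's exact config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # the "nf4, 4-bit" setting noted in the table
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf",  # assumed base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # enable gradients for k-bit training

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```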


