diff --git a/README.md b/README.md
index b58ab25..0c4f568 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,9 @@ DB-GPT-Hub is an experimental project utilizing LLMs (Large Language Models) to
 So far, we have successfully integrated multiple large models and established a complete workflow, including data processing, model SFT (Supervised Fine-Tuning) training, prediction output, and evaluation. The code is readily reusable within this project.
 
-As of October 10, 2023, by fine-tuning an open-source model of 13 billion parameters using this project, **the execution accuracy on the Spider evaluation dataset has surpassed that of GPT-4!**
+As of October 10, 2023, by fine-tuning an open-source model of 13 billion parameters using this project, **the execution accuracy on the Spider evaluation dataset has surpassed that of GPT-4!**
+
+Some of the experimental results have been compiled into this project's [document](docs/eval_llm_result.md). By using this project together with more related data, the execution accuracy on the Spider evaluation set has already reached **0.825**.
 
 ## 2. Fine-tuning Text-to-SQL
@@ -204,7 +206,7 @@ Run the following command:
 ```bash
 python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred_file
 ```
-You can find the results of our latest review [here](docs/eval_llm_result.md)
+You can find our latest evaluation results and some of the experimental results [here](docs/eval_llm_result.md).
 
 ## 4. RoadMap
 
diff --git a/README.zh.md b/README.zh.md
index afd5918..0ece1ce 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -23,7 +23,8 @@
 DB-GPT-Hub is an experimental project that uses LLMs to implement Text-to-SQL parsing. It mainly covers dataset collection, data preprocessing, model selection and construction, and fine-tuning of model weights. This pipeline improves Text-to-SQL capability while lowering model-training costs, so that more developers can take part in improving Text-to-SQL accuracy, with the ultimate goal of database-backed automatic question answering that lets users run complex database queries through natural-language descriptions.
 
 So far, we have built the complete pipeline of data processing, model SFT training, prediction output, and evaluation on top of multiple large models; **the code can be reused directly in this project**.
-As of October 10, 2023, after fine-tuning an open-source 13B model with this project, **the execution accuracy on the Spider evaluation set has already surpassed GPT-4!**
+As of October 10, 2023, after fine-tuning an open-source 13B model with this project, **the execution accuracy on the Spider evaluation set has already surpassed GPT-4!**
+Some of the experimental results have been compiled into this project's [document](docs/eval_llm_result.md); by combining this project with more related data, the execution accuracy on the Spider evaluation set has already reached **0.825**.
 
 ## 2. Text-to-SQL Fine-tuning
@@ -190,7 +191,7 @@ sh ./dbgpt_hub/scripts/export_merge.sh
 ```bash
 python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred_file
 ```
-You can find our latest evaluation results [here](docs/eval_llm_result.md).
+You can find our latest evaluation and experimental results [here](docs/eval_llm_result.md).
 
 ## 4. Roadmap
 We will divide the whole process into three phases:
diff --git a/docs/eval_llm_result.md b/docs/eval_llm_result.md
index 623d5ba..a91f5a2 100644
--- a/docs/eval_llm_result.md
+++ b/docs/eval_llm_result.md
@@ -7,13 +7,14 @@ This doc aims to summarize the performance of publicly available big language mo
 | ------------------------------ | ------------------ | ---------------------------------------------------------------------------------- |
 | **GPT-4** | **0.762** | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
 | ChatGPT | 0.728 | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b)|
-| **CodeLlama-13b-Instruct-hf_lora** | **0.789** | sft train by our this project,only used spider train dataset ,the same eval way in this project with lora SFT |
-| CodeLlama-13b-Instruct-hf_qlora | 0.774 | sft train by our this project,only used spider train dataset ,the same eval way in this project with qlora and nf4,bit4 SFT |
+| **CodeLlama-13b-Instruct-hf_lora** | **0.789** | SFT-trained with LoRA in this project, using only the Spider train dataset; evaluated with this project's standard method. |
+| **CodeLlama-13b-Instruct-hf_qlora** | **0.825** | SFT-trained in this project on around 50 thousand text-to-SQL examples; evaluated with this project's standard method. The training set was filtered to exclude the Spider eval dataset. |
+| CodeLlama-13b-Instruct-hf_qlora | 0.774 | SFT-trained with QLoRA (NF4, 4-bit) in this project, using only the Spider train dataset; evaluated with this project's standard method. |
 | wizardcoder | 0.610 | [text-to-sql-wizardcoder](https://github.com/cuplv/text-to-sql-wizardcoder/tree/main) |
 |CodeLlama-13b-Instruct-hf| 0.556 | eval in this project default param|
 |Baichuan2-13B-Chat|0.392| eval in this project default param|
-| llama2_13b_hf | xxx | run in this project,default param set |
-| llama2_13b_hf_lora_best | 0.744 | sft train by our this project,only used spider train dataset ,the same eval way in this project |
+| llama2_13b_hf | xxx | Run in this project with the default parameter set. |
+| llama2_13b_hf_lora_best | 0.744 | SFT-trained with LoRA in this project, using only the Spider train dataset; evaluated with this project's standard method. |
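For reference, the evaluation flow both README hunks point at reduces to the two commands already quoted in the diff. A minimal sketch, assuming the DB-GPT-Hub repository root as the working directory and keeping the README's own `Your_model_pred_file` placeholder:

```bash
# Minimal sketch of the final steps referenced in the hunks above.
# Assumes the DB-GPT-Hub repository root as the working directory.

# Merge the fine-tuned (LoRA/QLoRA) weights into the base model,
# as shown in the README.zh.md hunk context.
sh ./dbgpt_hub/scripts/export_merge.sh

# Score execution accuracy on the Spider evaluation set.
# "Your_model_pred_file" is the README's placeholder; replace it with
# the path to your model's prediction output file.
python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred_file
```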