Update README.zh.md

Add citation part
eosphoros-ai · Nov 15, 2023 · 50a583b
1 parent d8d8cae
Showing 1 changed file with 21 additions and 12 deletions.
diff --git a/README.zh.md b/README.zh.md
@@ -11,10 +11,7 @@
     <a href="https://opensource.org/licenses/MIT">
       <img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg" />
     </a>
-    <a href="https://opensource.org/licenses/MIT">
-      <img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg" />
-    </a>
-     <a href="https://github.com/eosphoros-ai/DB-GPT-Hub/releases">
+    <a href="https://github.com/eosphoros-ai/DB-GPT-Hub/releases">
       <img alt="Release Notes" src="https://img.shields.io/github/release/eosphoros-ai/DB-GPT-Hub" />
     </a>
     <a href="https://github.com/eosphoros-ai/DB-GPT-Hub/issues">
@@ -26,7 +23,7 @@
   </p>
 
 
-[**英文**](README.md) |[**Discord**](https://discord.gg/nASQyBjvY)|[**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC)|[**Huggingface**](https://huggingface.co/eosphoros)|[**Community**](https://github.com/eosphoros-ai/community)
+[**英文**](README.md) | [**Discord**](https://discord.gg/nASQyBjvY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
 </div>
 
 ## Contents
@@ -121,7 +118,7 @@ DB-GPT-Hub使用的是信息匹配生成法进行数据准备，即结合表信
 数据预处理部分，**只需运行如下脚本**即可：
 ```bash
 ## 生成train数据 和dev(eval)数据,
-sh dbgpt_hub/scripts/gen_train_eval_data.sh
+poetry run sh dbgpt_hub/scripts/gen_train_eval_data.sh
 ```
 在`dbgpt_hub/data/`目录你会得到新生成的训练文件example_text2sql_train.json 和测试文件example_text2sql_dev.json ，数据量分别为8659和1034条。 对于后面微调时的数据使用在dbgpt_hub/data/dataset_info.json中将参数`file_name`值给为训练集的文件名，如example_text2sql_train.json。
 
@@ -144,7 +141,7 @@ sh dbgpt_hub/scripts/gen_train_eval_data.sh
 默认QLoRA微调，运行命令：
 
 ```bash
-sh dbgpt_hub/scripts/train_sft.sh
+poetry run sh dbgpt_hub/scripts/train_sft.sh
 ```
 微调后的模型权重会默认保存到adapter文件夹下面，即dbgpt_hub/output/adapter目录中。  
 **如果使用多卡训练，想要用deepseed** ，则将train_sft.sh中默认的内容进行更改，
@@ -200,7 +197,7 @@ deepspeed --num_gpus 2  dbgpt_hub/train/sft_train.py \
 项目目录下`./dbgpt_hub/`下的`output/pred/`，此文件路径为关于模型预测结果默认输出的位置(如果没有则建上)。   
 预测运行命令：
 ```bash
-sh ./dbgpt_hub/scripts/predict_sft.sh
+poetry run sh ./dbgpt_hub/scripts/predict_sft.sh
 ```   
 脚本中默认带着参数`--quantization_bit `为QLoRA的预测，去掉即为LoRA的预测方式。  
 其中参数 `--predicted_out_filename` 的值为模型预测的结果文件名，结果在`dbgpt_hub/output/pred`目录下可以找到。
@@ -212,7 +209,7 @@ sh ./dbgpt_hub/scripts/predict_sft.sh
 #### 3.5.1 模型和微调权重合并
 如果你需要将训练的基础模型和微调的Peft模块的权重合并，导出一个完整的模型。则运行如下模型导出脚本：  
 ```bash
-sh ./dbgpt_hub/scripts/export_merge.sh
+poetry run sh ./dbgpt_hub/scripts/export_merge.sh
 ```
 注意将脚本中的相关参数路径值替换为你项目所对应的路径。      
 
@@ -222,7 +219,7 @@ sh ./dbgpt_hub/scripts/export_merge.sh
 运行以下命令来：
 
 ```bash
-python dbgpt_hub/eval/evaluation.py --plug_value --input  Your_model_pred_file
+poetry run python dbgpt_hub/eval/evaluation.py --plug_value --input  Your_model_pred_file
 ```
 你可以在[这里](docs/eval_llm_result.md)找到我们最新的评估和实验结果。
 **注意**： 默认的代码中指向的数据库为从[Spider官方网站](https://yale-lily.github.io/spider)下载的大小为95M的database，如果你需要使用基于Spider的[test-suite](https://github.com/taoyds/test-suite-sql-eval)中的数据库(大小1.27G)，请先下载链接中的数据库到自定义目录，并在上述评估命令中增加参数和值，形如`--db Your_download_db_path`。
@@ -278,11 +275,23 @@ python dbgpt_hub/eval/evaluation.py --plug_value --input  Your_model_pred_file
 非常感谢所有的contributors! 
  **20231104** ,尤其感谢 @[JBoRu](https://github.com/JBoRu) 提的[issue](https://github.com/eosphoros-ai/DB-GPT-Hub/issues/119)， 指出我们的之前按照官方网站的95M的数据库去评估的方式的不足，如论文《SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL》 指出的 "We consider two commonly-used evaluation metrics: execution accuracy (EX) and test-suite accuracy (TS) [32]. EX measures whether SQL execution outcome matches ground truth (GT), whereas TS measures whether the SQL passes all EX evaluation for multiple tests, generated by database-augmentation. Since EX contains false positives, we consider TS as a more reliable evaluation metric" 。
 
-## 七、Licence
+## 七、引用
+如果您觉得我们的项目对您的科研项目或者实际生产项目有帮助，请考虑在您的参考文献里引用`DB-GPT-Hub`:
+
+```bibtex
+@software{db-gpt-hub,
+    author = {DB-GPT-Hub Team},
+    title = {{DB-GPT-Hub}},
+    url = {https://github.com/eosphoros-ai/DB-GPT-Hub},
+    year = {2023}
+}
+```
+
+## 八、Licence
 
 The MIT License (MIT)
 
-## 八、Contact Information
+## 九、我们的联系方式
 我们是一个社区一起合作，如果你对我们的社区工作有任何建议，随时可以联系我们。如果你对DB-GPT-Hub子项目的深入实验和优化感兴趣，可以联系微信群里的wangzai，我们欢迎大家共同努力，使它变得更好。
 [![](https://dcbadge.vercel.app/api/server/nASQyBjvY?compact=true&style=flat)](https://discord.gg/nASQyBjvY)