docs: readme update & contact (#1097)
csunny authored Jan 22, 2024
1 parent 4f83363 commit 1484981
Showing 6 changed files with 93 additions and 196 deletions.
192 changes: 50 additions & 142 deletions README.md
@@ -33,42 +33,71 @@
</p>


[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**微信**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)

</div>

## What is DB-GPT?

DB-GPT is an open-source framework designed for the realm of large language models (LLMs) within the database field. Its primary purpose is to provide infrastructure that simplifies and streamlines the development of database-related applications. This is accomplished through the development of various technical capabilities, including:
DB-GPT is an open-source, data-domain large model framework. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration. These capabilities aim to simplify and facilitate the construction of large model applications around databases.

1. **SMMF (Service-oriented Multi-model Management Framework)**
2. **Text2SQL Fine-tuning**
3. **RAG (Retrieval Augmented Generation) framework and optimization**
4. **Data-Driven Agents framework collaboration**
5. **GBI (Generative Business Intelligence)**

DB-GPT simplifies the creation of these applications based on large language models (LLMs) and databases.

In the era of Data 3.0, enterprises and developers can build customized applications with minimal code, harnessing the power of large language models (LLMs) and databases.
In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.

### Data Agents
![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)

## Contents
- [Install](#install)
- [Demo](#demo)
- [Introduction](#introduction)
- [Install](#install)
- [Features](#features)
- [Contribution](#contribution)
- [Roadmap](#roadmap)
- [Contact](#contact-information)

[DB-GPT YouTube Video](https://www.youtube.com/watch?v=f5_g0OObZBQ)
## Introduction
The architecture of DB-GPT is shown in the following figure:

<p align="center">
<img src="./assets/dbgpt.png" width="800" />
</p>

The core capabilities include the following parts:

- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most in-demand area. DB-GPT has already implemented a RAG-based framework, allowing users to build knowledge-based applications with DB-GPT's RAG capabilities.

- **GBI (Generative Business Intelligence)**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology to build enterprise report analysis and business insights.

- **Fine-tuning Framework**: Model fine-tuning is an indispensable capability for any enterprise deploying in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning runs, it reached 82.5% accuracy on the Spider dataset.

- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven self-evolving fine-tuning framework, aiming to continuously make decisions and execute based on data.

- **Data Factory**: The Data Factory focuses on cleaning and processing trustworthy knowledge and data in the era of large models.

- **Data Sources**: Integrating various data sources to seamlessly connect production business data to the core capabilities of DB-GPT.
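As a rough illustration of the retrieve-then-generate loop that RAG refers to, the sketch below ranks text chunks against a question and packs the best matches into a prompt. It is not DB-GPT's RAG API: the `embed`, `retrieve`, and `build_prompt` helpers are hypothetical, and the bag-of-words vectors are an assumed stand-in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter


# Hypothetical toy "embedding": a bag-of-words term-count vector.
# A real RAG pipeline would call an embedding model and a vector database.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Augment the question with the retrieved context before generation.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


chunks = [
    "Orders are stored in the orders table with a created_at timestamp.",
    "The customers table holds one row per customer.",
    "Refunds are processed within five business days.",
]
top = retrieve("which table stores orders", chunks, k=1)
print(build_prompt("which table stores orders", top))
```

Swapping `embed` for a real embedding model and the `sorted` call for a vector-database query keeps the same overall shape.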

### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) A high-performance Text-to-SQL workflow that applies Supervised Fine-Tuning (SFT) to Large Language Models (LLMs).
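To make "applying SFT to LLMs for Text-to-SQL" concrete, the sketch below turns a database schema, a question, and a gold query into one supervised training pair. The instruction/input/output layout and the `schema_ddl` / `make_sft_sample` helpers are assumptions for illustration, not DB-GPT-Hub's actual data format; SQLite is used only because it ships with Python.

```python
import json
import sqlite3


def schema_ddl(conn: sqlite3.Connection) -> str:
    # sqlite_master keeps the original CREATE statement for every table.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(r[0] + ";" for r in rows)


def make_sft_sample(conn: sqlite3.Connection, question: str, gold_sql: str) -> dict:
    # Hypothetical instruction/input/output layout for one SFT training pair:
    # the model learns to map (schema + question) to the gold SQL.
    return {
        "instruction": "Translate the question into SQL for the schema below.\n"
        + schema_ddl(conn),
        "input": question,
        "output": gold_sql,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
sample = make_sft_sample(conn, "How many singers are there?", "SELECT count(*) FROM singer")
print(json.dumps(sample, indent=2))
```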

#### Text2SQL Finetune
- Supported LLMs
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2

- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
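Execution accuracy counts a predicted query as correct when running it returns the same result as the gold query. The sketch below shows that idea on an in-memory SQLite database. It is a simplification (order-insensitive multiset comparison, a single database) rather than the official Spider evaluation script, and `same_result` / `execution_accuracy` are hypothetical names.

```python
import sqlite3
from collections import Counter


def same_result(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    # Compare as multisets so row order does not matter (simplification).
    return Counter(pred) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    hits = sum(same_result(conn, p, g) for p, g in pairs)
    return hits / len(pairs) if pairs else 0.0


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer (name, age) VALUES ('A', 30), ('B', 41), ('C', 25);
""")
pairs = [  # (predicted SQL, gold SQL)
    ("SELECT count(*) FROM singer", "SELECT count(*) FROM singer"),
    ("SELECT name FROM singer WHERE age > 28", "SELECT name FROM singer WHERE age >= 30"),
    ("SELECT name FROM singer WHERE age > 40", "SELECT name FROM singer WHERE age > 26"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),  # invalid column
]
print(execution_accuracy(conn, pairs))  # 0.5
```

The second pair passes even though its SQL differs from the gold query, which is why execution accuracy can credit queries that only happen to match on the sample data.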

## Demo
##### Chat Data
![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)

##### Chat Excel
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [GPT-Vis](https://github.com/eosphoros-ai/GPT-Vis) Visualization protocol

## Install
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -120,26 +149,7 @@ At present, we have introduced several key features to showcase our current capa
- Support Datasources
- [Datasources](http://docs.dbgpt.site/docs/modules/connections)

## Introduction
The architecture of DB-GPT is shown in the following figure:

<p align="center">
<img src="./assets/DB-GPT.png" width="800" />
</p>

The core capabilities primarily consist of the following components:
1. Multi-Models: We support multiple Large Language Models (LLMs) such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, Vicuna, and proxy models like ChatGPT, Baichuan, Tongyi, Wenxin, and more.
2. Knowledge-Based QA: Our system enables high-quality intelligent Q&A based on local documents such as PDFs, Word documents, Excel files, and other data sources.
3. Embedding: We offer unified data vector storage and indexing. Data is embedded as vectors and stored in vector databases, allowing for content similarity search.
4. Multi-Datasources: This feature connects different modules and data sources, facilitating data flow and interaction.
5. Multi-Agents: Our platform provides Agent and plugin mechanisms, empowering users to customize and enhance the system's behaviour.
6. Privacy & Security: Rest assured that there is no risk of data leakage, and your data is 100% private and secure.
7. Text2SQL: We enhance Text-to-SQL performance through Supervised Fine-Tuning (SFT) applied to Large Language Models (LLMs).

### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) A high-performance Text-to-SQL workflow that applies Supervised Fine-Tuning (SFT) to Large Language Models (LLMs).
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [DB-GPT-Web](https://github.com/eosphoros-ai/DB-GPT-Web) ChatUI for DB-GPT

## Image
🌐 [AutoDL Image](https://www.codewithgpu.com/i/eosphoros-ai/DB-GPT/dbgpt)
@@ -151,106 +161,8 @@ The core capabilities primarily consist of the following components:
## Contribution

- Please run `black .` before submitting the code.
- For detailed guidelines on new contributions, please refer to [how to contribute](https://github.com/csunny/DB-GPT/blob/main/CONTRIBUTING.md)

## RoadMap

<p align="left">
<img src="./assets/roadmap.jpg" width="800px" />
</p>

### KBQA RAG optimization
- [x] Multi Documents
- [x] PDF
- [x] Excel, CSV
- [x] Word
- [x] Text
- [x] Markdown
- [ ] Code
- [ ] Images

- [x] RAG
- [ ] Graph Database
- [ ] Neo4j Graph
- [ ] Nebula Graph
- [x] Multi-Vector Database
- [x] Chroma
- [x] Milvus
- [x] Weaviate
- [x] PGVector
- [ ] Elasticsearch
- [ ] ClickHouse
- [ ] Faiss

- [ ] Testing and Evaluation Capability Building
- [ ] Knowledge QA datasets
- [ ] Question collection [easy, medium, hard]:
- [ ] Scoring mechanism
- [ ] Testing and evaluation using Excel + DB datasets

### Multi Datasource Support

- Multi Datasource Support
- [x] MySQL
- [x] PostgreSQL
- [x] Spark
- [x] DuckDB
- [x] SQLite
- [x] MSSQL
- [x] ClickHouse
- [ ] Oracle
- [ ] Redis
- [ ] MongoDB
- [ ] HBase
- [x] Doris
- [ ] DB2
- [ ] Couchbase
- [ ] Elasticsearch
- [ ] OceanBase
- [ ] TiDB
- [ ] StarRocks

### Multi-Models And vLLM
- [x] [Cluster Deployment](https://docs.dbgpt.site/docs/installation/model_service/cluster)
- [x] [Fastchat Support](https://github.com/lm-sys/FastChat)
- [x] [vLLM Support](https://docs.dbgpt.site/docs/installation/advanced_usage/vLLM_inference)
- [ ] Cloud-native environment and support for Ray environment
- [ ] Service registry (e.g., Nacos)
- [ ] Compatibility with OpenAI's interfaces
- [ ] Expansion and optimization of embedding models

### Agents market and Plugins
- [x] multi-agents framework
- [x] custom plugin development
- [x] plugin market
- [ ] Integration with CoT
- [ ] Enrich plugin sample library
- [ ] Support for AutoGPT protocol
- [ ] Integration of multi-agents and visualization capabilities, defining new LLM+Vis standards

### Cost and Observability
- [x] [debugging](https://docs.dbgpt.site/docs/application_manual/advanced_tutorial/debugging)
- [ ] Observability
- [ ] cost & budgets

### Text2SQL Finetune
- Supported LLMs
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2
- For detailed guidelines on new contributions, please refer to [how to contribute](https://github.com/eosphoros-ai/DB-GPT/blob/main/CONTRIBUTING.md)

- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!

[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)

## Licence
The MIT License (MIT)
@@ -272,8 +184,4 @@ If you find `DB-GPT` useful for your research or development, please cite the fo
We are working on building a community. If you have any ideas for it, feel free to contact us.
[![](https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat)](https://discord.gg/7uQnPuveTY)

<p align="center">
<img src="./assets/wechat.jpg" width="300px" />
</p>

[![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT&type=Date)](https://star-history.com/#csunny/DB-GPT)
95 changes: 42 additions & 53 deletions README.zh.md
@@ -8,19 +8,19 @@
<div align="center">
<p>
<a href="https://github.com/eosphoros-ai/DB-GPT">
<img alt="stars" src="https://img.shields.io/github/stars/csunny/db-gpt?style=social" />
<img alt="stars" src="https://img.shields.io/github/stars/eosphoros-ai/db-gpt?style=social" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT">
<img alt="forks" src="https://img.shields.io/github/forks/csunny/db-gpt?style=social" />
<img alt="forks" src="https://img.shields.io/github/forks/eosphoros-ai/db-gpt?style=social" />
</a>
<a href="https://opensource.org/licenses/MIT">
<img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT/releases">
<img alt="Release Notes" src="https://img.shields.io/github/release/csunny/DB-GPT" />
<img alt="Release Notes" src="https://img.shields.io/github/release/eosphoros-ai/DB-GPT" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT/issues">
<img alt="Open Issues" src="https://img.shields.io/github/issues-raw/csunny/DB-GPT" />
<img alt="Open Issues" src="https://img.shields.io/github/issues-raw/eosphoros-ai/DB-GPT" />
</a>
<a href="https://discord.gg/7uQnPuveTY">
<img alt="Discord" src="https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat" />
@@ -33,39 +33,56 @@
</a>
</p>

[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**WeChat**](https://github.com/csunny/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**WeChat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
</div>

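## What is DB-GPT?
DB-GPT is an open-source large model framework for the database domain. Its purpose is to build infrastructure for the large model field by developing a range of technical capabilities, including multi-model management, Text2SQL performance optimization, the RAG framework and its optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.

DB-GPT is an open-source large model framework for the data domain. Its purpose is to build infrastructure for the large model field by developing a range of technical capabilities, including multi-model management, Text2SQL performance optimization, the RAG framework and its optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.
In the Data 3.0 era, based on models and databases, enterprises and developers can build their own dedicated applications with less code.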

## Contents
## 效果演示

- [Installation](#安装)
- [Demo](#效果演示)
### Data Agents
![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)


## Contents
- [Architecture](#架构方案)
- [Installation](#安装)
- [Feature Overview](#特性一览)
- [Contribution](#贡献)
- [Roadmap](#路线图)
- [Contact Us](#联系我们)

[DB-GPT Video Introduction](https://www.bilibili.com/video/BV1au41157bj/?spm_id_from=333.337.search-card.all.click&vd_source=7792e22c03b7da3c556a450eb42c8a0f)
## 架构方案

## 效果演示
<p align="center">
<img src="./assets/dbgpt.png" width="800px" />
</p>

##### Chat Data
![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
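The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most pressing area. DB-GPT has already implemented a RAG-based framework, so users can build knowledge-based applications on top of DB-GPT's RAG capabilities.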

##### Chat Excel
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
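- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data-intelligence technology for building enterprise report analysis and business insights.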

#### Generate analytical charts from natural-language conversation
<p align="left">
<img src="./assets/dashboard.png" width="800px" />
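- **Fine-tuning framework**: Model fine-tuning is an indispensable capability for any enterprise deploying in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project; in recent fine-tuning, accuracy on Spider reached 82.5%.

- **Data-driven Multi-Agents framework**: DB-GPT provides a data-driven, self-evolving fine-tuning framework, aiming to continuously make decisions and act based on data.

- **Data factory**: In the large-model era, the data factory cleans and processes trustworthy knowledge and data.

- **Data sources**: Connects various data sources so that production business data flows seamlessly into DB-GPT's core capabilities.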

### RAG Production Deployment Architecture
<p align="center">
<img src="./assets/RAG-IN-ACTION.jpg" width="800px" />
</p>

### Submodules
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
- [GPT-Vis](https://github.com/eosphoros-ai/DB-GPT-Web) Visualization protocol

## 安装

![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -84,7 +101,7 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模
- [**Excel Chat**](https://www.yuque.com/eosphoros/dbgpt-docs/prugoype0xd2g4bb)
- [**Database Chat**](https://www.yuque.com/eosphoros/dbgpt-docs/wswpv3zcm2c9snmg)
- [**Report Analysis**](https://www.yuque.com/eosphoros/dbgpt-docs/vsv49p33eg4p5xc1)
- [**Plugins**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
- [**Agents**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
- [**Model Service Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/vubxiv9cqed5mc6o)
- [**Standalone Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/kwg1ed88lu5fgawb)
- [**Cluster Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/gmbp9619ytyn2v1s)
@@ -137,34 +154,6 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模
- [Supported Data Sources](https://www.yuque.com/eosphoros/dbgpt-docs/rc4r27ybmdwg9472)


## 架构方案
The overall architecture of DB-GPT is shown in the figure below:
<p align="center">
<img src="./assets/DB-GPT_zh.png" width="800px" />
</p>

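The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most pressing area. DB-GPT has already implemented a RAG-based framework, so users can build knowledge-based applications on top of DB-GPT's RAG capabilities.

- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data-intelligence technology for building enterprise report analysis and business insights.

- **Fine-tuning framework**: Model fine-tuning is an indispensable capability for any enterprise deploying in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project; in recent fine-tuning, accuracy on Spider reached 82.5%.

- **Data-driven Multi-Agents framework**: DB-GPT provides a data-driven, self-evolving fine-tuning framework, aiming to continuously make decisions and act based on data.

- **Data factory**: In the large-model era, the data factory cleans and processes trustworthy knowledge and data.

- **Data sources**: Connects various data sources so that production business data flows seamlessly into DB-GPT's core capabilities.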

### RAG Production Deployment Architecture
<p align="center">
<img src="./assets/RAG-IN-ACTION.jpg" width="800px" />
</p>

### Submodules
- [DB-GPT-Hub](https://github.com/csunny/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
- [DB-GPT-Plugins](https://github.com/csunny/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
- [DB-GPT-Web](https://github.com/csunny/DB-GPT-Web) Multi-platform interactive front-end UI

## Image

@@ -180,7 +169,11 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模

### Multi-model Usage

[Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)
- [Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)

### Data Agents Usage

- [Data Agents](https://www.yuque.com/eosphoros/dbgpt-docs/gwz4rayfuwz78fbq)

# 贡献
> Please run `black .` before submitting code
@@ -193,10 +186,6 @@ The MIT License (MIT)

# 路线图

<p align="left">
<img src="./assets/roadmap.jpg" width="800px" />
</p>

### Knowledge Base RAG Retrieval Optimization

- [x] Multi Documents
Binary file added assets/dbgpt.png
Binary file removed assets/roadmap.jpg
Binary file not shown.
Binary file modified assets/wechat.jpg
