Feat rdb summary wide table #2035

FOkvj · 2024-09-22T09:50:46Z

Description

When a relational database table is wide, it has a large number of fields. Besides containing many redundant fields, the full field list can exceed the maximum sequence length accepted by the embedding model when the summary is embedded, so the resulting embedding cannot accurately reflect the semantics of the summary. For wide tables, I therefore separate the fields from the table's basic information. If a table has too many fields, they are split into multiple chunks during summarization, and no chunk exceeds the maximum sequence length of the embedding model. If the table is not wide, the summary is unchanged: the table name, table description, and fields stay in a single chunk. At retrieval time, the table is retrieved first; the table name (id) is then used as a filter for a vector search of the fields with the query, and finally the table name and table description are assembled with the retrieved fields to form the result.
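The field-splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `chunk_fields`, `max_len`, `length_fn`, and `separator` are hypothetical names, and length is measured in characters by default, whereas a real embedding model would need a token count.

```python
from typing import Callable, List


def chunk_fields(
    fields: List[str],
    max_len: int,
    length_fn: Callable[[str], int] = len,
    separator: str = "\n",
) -> List[str]:
    """Greedily pack field descriptions into chunks of at most max_len units.

    Hypothetical helper for illustration only. A single field that is longer
    than max_len is still emitted as its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for field in fields:
        field_len = length_fn(field)
        # Cost of appending this field, including the separator if needed.
        extra = field_len + (length_fn(separator) if current else 0)
        if current and current_len + extra > max_len:
            # The current chunk is full: close it and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [field], field_len
        else:
            current.append(field)
            current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


fields = ["id int", "name varchar", "email varchar", "created_at datetime"]
print(chunk_fields(fields, max_len=20))
```

With the default character-based `length_fn` the limit holds exactly for every chunk made of fields that individually fit. With a tokenizer-based `length_fn`, the token count of a joined chunk may differ slightly from the sum of its parts, so some headroom below the model's limit would be prudent.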

How Has This Been Tested?

The wide-table summary and retrieval are tested in dbgpt/rag/assembler/tests/test_db_struct_assembler.py and dbgpt/rag/assembler/tests/test_embedding_assembler.py, respectively.

Snapshots:

Checklist:

My code follows the style guidelines of this project
I have already rebased the commits and made the commit messages conform to the project standard.
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
Any dependent changes have been merged and published in downstream modules

fangyinc · 2024-10-18T05:07:54Z

@Aries-ckt Please review it.

intfish123 · 2024-12-11T07:07:37Z

@Aries-ckt Is there any progress? When can this be merged? I am running into the same problem right now.

FOkvj · 2024-12-11T09:51:58Z

@Aries-ckt Is there any progress? When can this be merged? I am running into the same problem right now.

Try using this branch for now.

Aries-ckt · 2024-12-11T11:09:09Z

Hi, the PR itself works fine. My main concern is that existing users' table schema vector data might no longer be found.

FOkvj · 2024-12-12T10:09:36Z

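Hi, the PR itself works fine. My main concern is that existing users' table schema vector data might no longer be found.

I have modified the code to address this, so users' existing vector data can still be retrieved normally.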

Aries-ckt · 2024-12-17T16:32:09Z

@FOkvj Hi, why did you remove field_vector_connector?

FOkvj · 2024-12-17T23:43:09Z

@Aries-ckt Here, I made it created by default, so I think that code is no longer necessary.

Aries-ckt · 2024-12-18T01:49:16Z

But the call path get_summary -> _similarity_search -> _retrieve_field will then hit an error because self._field_vector_store_connector is None.
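One way to guard against this kind of failure is sketched below. It is only an illustration under stated assumptions, not the fix that was committed to this PR: `FieldStore`, `SchemaRetriever`, `retrieve_fields`, and `assemble` are hypothetical names, and the real connector API in the project may look different. The idea is that when no field store is configured, field retrieval returns nothing and the table-level summary is used as is.

```python
from typing import List, Optional, Protocol


class FieldStore(Protocol):
    """Hypothetical interface for a vector store that holds field chunks."""

    def search(self, query: str, table_name: str, top_k: int) -> List[str]:
        ...


class SchemaRetriever:
    """Illustrative retriever; not the class used in this PR."""

    def __init__(self, field_store: Optional[FieldStore] = None) -> None:
        self._field_store = field_store

    def retrieve_fields(
        self, query: str, table_name: str, top_k: int = 5
    ) -> List[str]:
        # Guard: without a field store there are no field chunks to search.
        if self._field_store is None:
            return []
        return self._field_store.search(query, table_name, top_k)

    def assemble(self, table_summary: str, query: str, table_name: str) -> str:
        # Append matching field chunks to the table summary when available.
        fields = self.retrieve_fields(query, table_name)
        if not fields:
            return table_summary
        return table_summary + "\n" + "\n".join(fields)
```

In this sketch, a retriever built without a field store simply returns the table summary unchanged, which is also how a single-chunk summary would be returned.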

FOkvj · 2024-12-18T09:38:28Z

@Aries-ckt fixed

Aries-ckt · 2024-12-18T09:55:50Z

Test Success

Aries-ckt

LGTM.

Aries-ckt · 2024-12-18T10:07:20Z

@FOkvj Are you in our WeChat group? And what's your WeChat alias?

FOkvj · 2024-12-18T11:57:17Z

@Aries-ckt Yep, I go by Wooop! 😊

csunny

LGTM~

dongzhancai1 and others added 3 commits September 17, 2024 18:04

feat(rdb_summary): Support summary generation and retrieval of wide t…

ae61874

…ables in relational databases

Merge branch 'eosphoros-ai:main' into feat-rdb_summary-wide_table

76368e6

feat:unit test for wide table summary and retrival

6f948e1

Aries-ckt mentioned this pull request Oct 7, 2024

[Bug]When running the project locally, the mysql query table structure is too long, causing the fields to be truncated characters truncated #2052

Open

Merge branch 'main' into feat-rdb_summary-wide_table

a66a68d

dongzhancai1 added 5 commits December 12, 2024 14:29

feat(rdbsummary): Ensure that old data is available

c4f2bd9

feat(rdbsummary): Ensure that old data is available

b536a99

feat(rdbsummary-wide-table): surport chatdashboard

d78acc1

feat(rdbsummary-wide-table): add wide table case

aa33e71

Merge branch 'feat-rdb_summary-wide_table'

a7f4ac6

dongzhancai1 added 4 commits December 12, 2024 19:25

chore

2149e2a

Merge branch 'main' into feat-rdb_summary-wide_table

d817230

chore(rdb_summary-wide_table))

fa4a988

fix(rdb_summary-wide_table): fix tests

750e3e8

dongzhancai1 added 2 commits December 18, 2024 11:16

fix(rdb_summary-wide_table): self._field_vector_store_connector None …

767539d

…error

fix(rdb_summary-wide_table): delete database profile

c6318c8

Aries-ckt approved these changes Dec 18, 2024


csunny approved these changes Dec 18, 2024


csunny merged commit 9b0161e into eosphoros-ai:main Dec 18, 2024
4 checks passed
