Add corpus label for identifying groups of documents #269

nenb · 2024-01-16T00:56:13Z

This PR addresses some aspects of #256. There are 3 changes:

A corpus_id kwarg has been added to core components like the source_storages and the Chat class
source_storage tables are now identified by the corpus_id label rather than a chat_id label
All use of the prepared flag has been removed.

More details
A corpus_id label allows easy reference to embeddings that have already been computed. Removing the convention of using chat_id to name the tables means that embeddings are no longer coupled to chats and can be re-used. Removing the prepared flag is necessary to prevent inconsistencies when re-using embeddings across chats (e.g., it would be inconsistent for the same corpus to have prepared values of True and False for different chats, which may occur for a variety of reasons).
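To illustrate the decoupling described above, here is a minimal, self-contained sketch. It is not the code from this PR: `InMemorySourceStorage`, its `store`/`retrieve` signatures, the toy embedding, and the simplified `Chat` dataclass are all hypothetical stand-ins. The only ideas taken from the description are that storage tables are keyed by corpus_id instead of chat_id, and that the label can fall back to the chat_id when none is given.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


class InMemorySourceStorage:
    """Hypothetical stand-in for a source storage (not the project's real class)."""

    def __init__(self):
        self._tables = {}  # corpus_id -> list of (text, embedding) rows
        self.documents_embedded = 0  # counts documents embedded by store()

    @staticmethod
    def _embed(text):
        # Toy two-dimensional "embedding" so the sketch has no dependencies.
        return [float(len(text)), float(sum(map(ord, text)) % 97)]

    def store(self, documents, *, corpus_id):
        # Tables are keyed by corpus_id, so an existing corpus is re-used
        # instead of being embedded again for every new chat.
        if corpus_id in self._tables:
            return
        self._tables[corpus_id] = [(doc, self._embed(doc)) for doc in documents]
        self.documents_embedded += len(documents)

    def retrieve(self, prompt, *, corpus_id, limit=1):
        # Rank stored rows by squared distance to the prompt embedding.
        query = self._embed(prompt)
        ranked = sorted(
            self._tables[corpus_id],
            key=lambda row: sum((q - e) ** 2 for q, e in zip(query, row[1])),
        )
        return [text for text, _ in ranked[:limit]]


@dataclass
class Chat:
    """Hypothetical, heavily simplified chat that only tracks its corpus."""

    storage: InMemorySourceStorage
    documents: list
    corpus_id: Optional[str] = None
    chat_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self):
        # Assumed fallback: without an explicit label, use the chat_id,
        # which reproduces the old one-table-per-chat behaviour.
        if self.corpus_id is None:
            self.corpus_id = self.chat_id
        self.storage.store(self.documents, corpus_id=self.corpus_id)

    def sources(self, prompt):
        return self.storage.retrieve(prompt, corpus_id=self.corpus_id)


storage = InMemorySourceStorage()
docs = ["alpha", "beta gamma"]
first = Chat(storage, docs, corpus_id="papers")
second = Chat(storage, docs, corpus_id="papers")  # re-uses the "papers" table
```

In this sketch the two chats have different chat_id values but share one table, so the documents are embedded only once. Because the storage itself records whether a corpus exists, no per-chat prepared flag is needed, and there is nothing that could disagree between chats.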

Not in scope for this PR
The UI will also require some UX work before it can make proper use of this PR. For now, I have simply hardcoded the corpus_id label to be equal to the chat_id value in the UI section of the code. This should give behaviour that is functionally equivalent to what currently exists.

nenb · 2024-01-16T14:02:32Z

Closing as it turns out that it is possible to implement similar functionality by designing a custom Document extension and a custom source_storage extension.

nenb mentioned this pull request Jan 16, 2024

Decouple prepare from chat #263

Closed

Add corpus label for identifying groups of documents

acc828f

nenb force-pushed the add-corpus branch from 6b3b89f to acc828f January 16, 2024 00:59

nenb marked this pull request as ready for review January 16, 2024 01:15

nenb requested a review from pmeier January 16, 2024 01:15

nenb closed this Jan 16, 2024
