Add corpus label for identifying groups of documents #269
This PR addresses some aspects of #256. There are 3 changes:

1. A `corpus_id` kwarg has been added to core components like the `source_storage`s and the `Chat` class.
2. `source_storage` tables are now identified by the `corpus_id` label rather than a `chat_id` label.
3. The `prepared` flag has been removed.

More details
A `corpus_id` label allows easy reference to embeddings that have already been computed. Removing the convention of using `chat_id` to name the tables means that embeddings are no longer coupled to chats and can be re-used. Removing the `prepared` flag is necessary to prevent inconsistencies when re-using embeddings across chats (e.g. it would be inconsistent for the same corpus to have `prepared` values of `True` and `False` for different chats, which may occur for a variety of reasons).

Not in scope for this PR
The UI will also require some UX work before it can make proper use of this PR. For now, I have simply hardcoded the `corpus_id` label to be equal to the `chat_id` value in the UI section of the code. This should give behaviour functionally equivalent to what currently exists.
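To illustrate the decoupling described above, here is a minimal, self-contained sketch. Everything in it is hypothetical: `InMemorySourceStorage`, `store`, `retrieve`, `has_corpus`, `_fake_embed`, and this `Chat` dataclass are illustrative stand-ins, not the project's actual `source_storage` or `Chat` API. It assumes tables are keyed by `corpus_id` and that readiness is derived from whether a corpus table already exists, rather than from a per-chat `prepared` flag. Two chats sharing a `corpus_id` compute embeddings only once, while setting `corpus_id` equal to `chat_id` (as the UI currently does) yields one table per chat.

```python
from dataclasses import dataclass


class InMemorySourceStorage:
    """Hypothetical source storage whose tables are keyed by corpus_id.

    Illustration only: this is NOT the project's actual source_storage API.
    """

    def __init__(self) -> None:
        # corpus_id -> list of (document name, embedding) rows
        self._tables: dict[str, list[tuple[str, list[float]]]] = {}
        # Counts how often embeddings were computed, to make re-use visible.
        self.embed_calls = 0

    def has_corpus(self, corpus_id: str) -> bool:
        return corpus_id in self._tables

    def store(self, documents: list[str], *, corpus_id: str) -> None:
        # Assumption: readiness is derived from the stored table itself, so no
        # per-chat "prepared" flag can disagree between chats.
        if self.has_corpus(corpus_id):
            return  # embeddings already computed: re-use them
        self.embed_calls += 1
        self._tables[corpus_id] = [(doc, _fake_embed(doc)) for doc in documents]

    def retrieve(self, *, corpus_id: str) -> list[str]:
        return [doc for doc, _ in self._tables[corpus_id]]


def _fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


@dataclass
class Chat:
    """Hypothetical chat that refers to its documents by corpus_id, not chat_id."""

    chat_id: str
    documents: list[str]
    source_storage: InMemorySourceStorage
    corpus_id: str

    def prepare(self) -> None:
        self.source_storage.store(self.documents, corpus_id=self.corpus_id)

    def sources(self) -> list[str]:
        return self.source_storage.retrieve(corpus_id=self.corpus_id)


storage = InMemorySourceStorage()
docs = ["report.pdf", "notes.txt"]

# Two chats share one corpus: embeddings are computed once and re-used.
a = Chat("chat-a", docs, storage, corpus_id="q3-reports")
b = Chat("chat-b", docs, storage, corpus_id="q3-reports")
a.prepare()
b.prepare()
print(storage.embed_calls)  # 1

# Mirroring the current UI workaround: corpus_id == chat_id gives one table per chat.
c = Chat("chat-c", docs, storage, corpus_id="chat-c")
c.prepare()
print(storage.embed_calls)  # 2
```

The design choice sketched here is that the label on the table, not the chat, determines identity, so any number of chats can point at the same precomputed embeddings.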