Compute text-embeddings for incoming messages via HF feature-extraction pipeline #507
Comments
I would take a look at this! Would it make sense to save the embeddings in a new table? My thinking is that with a new table with the columns
I think I could also take a look at this one, as it is related to classification of the messages in HF.
Having a new table would make sense to me to minimise schema changes on new models
@SummerSigh see this issue. Similar to embedders we are building for safety. Let's all keep in contact re this so we can cross use stuff @jojopirker.
I'll ping you guys in the discord channel :) @ontocord
@ontocord Ok! Sounds good!
if I understand correctly, this was solved in #540
We want to store an embedding together with each message in the DB to measure similarity and diversity (e.g. to detect (near-)duplicates).
<short_modelname>_embedding column to store the embedding of message-text, create alembic update script
scripts/backend_development/run-local.sh script.
(Non-collaborators: Please leave a comment if you want to work on this task. Someone will then assign the task to you.)
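To make the similarity goal concrete, here is a minimal sketch of how stored embeddings could be compared to flag (near-)duplicates. It is not taken from the repository: the function names, the message ids, the toy vectors, and the 0.95 threshold are all assumptions for illustration, and cosine similarity is just one reasonable choice of measure.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold.

    `embeddings` maps a message id to its embedding vector. The threshold is a
    hypothetical default and would need tuning for a real embedding model.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Toy 3-dimensional vectors standing in for real model embeddings.
embeddings = {
    "msg-1": [1.0, 0.0, 0.0],
    "msg-2": [0.99, 0.05, 0.0],
    "msg-3": [0.0, 1.0, 0.0],
}
duplicates = find_near_duplicates(embeddings)
```

The pairwise loop is quadratic in the number of messages, so it only suits small batches; for a full message table an approximate nearest-neighbour index would likely be needed.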