-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store Message embedding #540
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, thank you. I've left a few comments. my biggest worry here is about the question what happens if the text is too long. it might not "fail", i.e. we might get a result, but the model might not be trained for that length, so it might actually be garbage.
@@ -41,6 +51,10 @@ async def post(self, input: str) -> Any: | |||
async with session.post(self.api_url, headers=self.headers, json=payload) as response: | |||
# If we get a bad response | |||
if response.status != 200: | |||
from loguru import logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import things at top level
@@ -95,6 +95,7 @@ services: | |||
- DEBUG_SKIP_API_KEY_CHECK=True | |||
- DEBUG_USE_SEED_DATA=True | |||
- MAX_WORKERS=1 | |||
- DEBUG_SKIP_EMBEDDING_COMPUTATION=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also add this to the ansible playbook?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When deployed via ansible the fetching of the embeddings should not be skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When should be skipped? Only in local-run.sh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I asked again... probably also won't be necessary for Frontend development..
@@ -41,6 +51,10 @@ async def post(self, input: str) -> Any: | |||
async with session.post(self.api_url, headers=self.headers, json=payload) as response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know what happens if the text is longer than the intended text length of the model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this.
If there is a certain sentence length pre-defined in the model, the extra words will be truncated.
And the arch of our model is the following:
SentenceTransformer(
(0): Transformer({
'max_seq_length': 128,
'do_lower_case': False
}) with Transformer model: BertModel
(1): Pooling({
'word_embedding_dimension': 384,
'pooling_mode_cls_token': False,
'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False,
'pooling_mode_mean_sqrt_len_tokens': False})
)
So would have a sentence length of 128 words. The impact on us will depend then if there is an important amount of messages that have > 128 words.
Related Issue here.
877c744
to
ef7bd89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added my opinion regarding model-id
recommended changes:
- remove 2x space
- upper camel case enum name
HfUrl
Yes, all of the text processing is done by the endpoint. And we are also storing which is the version that we are using of that model.
Okay, have applied also those changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good from my side 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks a lot! :-)
Thankss! |
Create message embeddings and store them in the DB.
Related to this issue.