Store Message embedding #540

Merged
merged 13 commits into from
Jan 9, 2023

Conversation

nil-andreu
Contributor

Create message embeddings and store them in the DB.

Related to this issue.

@nil-andreu nil-andreu changed the title Message embedding Store Message embedding Jan 8, 2023
@nil-andreu nil-andreu marked this pull request as ready for review January 8, 2023 20:30
Collaborator

@yk yk left a comment

hey, thank you. I've left a few comments. My biggest worry here is what happens if the text is too long. It might not "fail" — i.e., we might still get a result — but the model might not be trained for that length, so the output might actually be garbage.

@@ -41,6 +51,10 @@ async def post(self, input: str) -> Any:
async with session.post(self.api_url, headers=self.headers, json=payload) as response:
# If we get a bad response
if response.status != 200:
from loguru import logger
Collaborator

import things at top level
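The fix is to hoist the import to module scope instead of importing inside the request handler. A minimal sketch of the pattern, using the standard-library `logging` module as a stand-in for loguru (the function name and message here are illustrative, not the PR's actual code):

```python
import logging  # top-level import, stand-in for `from loguru import logger`

logger = logging.getLogger(__name__)


async def post_sketch(status: int) -> bool:
    # Mirrors the reviewed branch: log and bail out on a bad response.
    if status != 200:
        logger.error("Hugging Face API returned status %s", status)
        return False
    return True
```

Besides style, top-level imports surface missing dependencies at startup rather than mid-request.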

@@ -95,6 +95,7 @@ services:
- DEBUG_SKIP_API_KEY_CHECK=True
- DEBUG_USE_SEED_DATA=True
- MAX_WORKERS=1
- DEBUG_SKIP_EMBEDDING_COMPUTATION=True
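A flag like this is typically consumed on the backend roughly as follows. This is a hypothetical sketch — the real backend would read it through its settings object rather than `os.environ` directly, and the helper name is illustrative:

```python
import os


def embeddings_enabled() -> bool:
    # Hypothetical helper: compute embeddings unless the debug flag is set.
    return os.getenv("DEBUG_SKIP_EMBEDDING_COMPUTATION", "False").lower() != "true"


os.environ["DEBUG_SKIP_EMBEDDING_COMPUTATION"] = "True"
print(embeddings_enabled())  # False once the flag is set
```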
Collaborator

could you also add this to the ansible playbook?
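If it were added, the change would be a one-line addition to the backend service's environment in the playbook — a hypothetical sketch, since the exact file and variable layout depend on the playbook's structure:

```yaml
# Hypothetical fragment of the ansible playbook's backend environment:
environment:
  DEBUG_SKIP_EMBEDDING_COMPUTATION: "True"
```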

Contributor

When deployed via ansible, shouldn't the fetching of the embeddings be left enabled rather than skipped?

Contributor Author

When should it be skipped? Only in local-run.sh?

Contributor

That's why I asked again... it probably also won't be necessary for frontend development.

@@ -41,6 +51,10 @@ async def post(self, input: str) -> Any:
async with session.post(self.api_url, headers=self.headers, json=payload) as response:
Collaborator

do we know what happens if the text is longer than the intended text length of the model?

Contributor Author

@nil-andreu nil-andreu Jan 9, 2023

I am not sure about this.
If the model has a pre-defined maximum sequence length, the extra tokens will be truncated.

The architecture of our model is the following:

SentenceTransformer(
  (0): Transformer({
        'max_seq_length': 128,
        'do_lower_case': False
      }) with Transformer model: BertModel
  (1): Pooling({
        'word_embedding_dimension': 384,
        'pooling_mode_cls_token': False,
        'pooling_mode_mean_tokens': True,
        'pooling_mode_max_tokens': False,
        'pooling_mode_mean_sqrt_len_tokens': False})
)

So it would have a maximum sequence length of 128 tokens. The impact on us will then depend on whether a significant share of messages exceeds 128 tokens.

Related Issue here.
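To gauge that impact, a rough check like the following could count how many messages would be truncated. This is a sketch using whitespace splitting as a stand-in tokenizer — the real BertModel uses WordPiece, so actual token counts run somewhat higher than word counts:

```python
# Sketch: estimate how many messages exceed the model's max_seq_length.
MAX_SEQ_LENGTH = 128  # from the SentenceTransformer config above


def would_be_truncated(text: str, max_len: int = MAX_SEQ_LENGTH) -> bool:
    # Whitespace split approximates (undercounts) WordPiece token count.
    return len(text.split()) > max_len


messages = [
    "short message",
    " ".join(["word"] * 200),  # 200 words: exceeds the 128-token window
]
truncated = sum(would_be_truncated(m) for m in messages)
print(f"{truncated}/{len(messages)} messages would be truncated")
```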

Collaborator

@andreaskoepf andreaskoepf left a comment

  • Added my opinion regarding model-id

recommended changes:

  • remove 2x space
  • upper camel case enum name HfUrl
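The naming recommendation in sketch form — a minimal UpperCamelCase `str` enum. The member name and URL value here are illustrative assumptions, not the PR's actual definition:

```python
from enum import Enum


class HfUrl(str, Enum):  # UpperCamelCase class name, per the review
    # Illustrative member; the real value lives in the PR's code.
    HUGGINGFACE_FEATURE_EXTRACTION = (
        "https://api-inference.huggingface.co/pipeline/feature-extraction"
    )


print(HfUrl.HUGGINGFACE_FEATURE_EXTRACTION.value)
```

Mixing in `str` lets the member be passed anywhere a plain URL string is expected.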

@nil-andreu
Contributor Author

  • Added my opinion regarding model-id

Yes, all of the text processing is done by the endpoint. And we are also storing which version of that model we are using.

recommended changes:

  • remove 2x space
  • upper camel case enum name HfUrl

Okay, I have also applied those changes.

Collaborator

@yk yk left a comment

good from my side 👍

Collaborator

@andreaskoepf andreaskoepf left a comment

Looks good, thanks a lot! :-)

@yk yk merged commit ba12c35 into LAION-AI:main Jan 9, 2023
@nil-andreu
Contributor Author

Looks good, thanks a lot! :-)

Thanks!
