Article about solving contexto.me #103
Conversation
✅ Deploy Preview for condescending-goldwasser-91acf0 ready!
To edit notification comments on pull requests, go to your Netlify site settings.
weight: 8
author: Andrei Vasnetsov
author_link: https://blog.vasnetsov.com/
date: 2022-06-28T08:57:07.604Z
old date
For me, the description sounds like an even better title than "... practical implications"
too long for a title
Can we think of rewriting the approach, but with Qdrant as a backend for similarity search? I think even the recommendation API might be a great idea, with positive and negative examples. That would also simplify exposing a demo, so everybody can open contexto.me in one tab and our demo in another, and solve the riddle on their own.
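To make the suggestion concrete, here is a minimal sketch of what that could look like with the Python Qdrant client. The collection name, point IDs, payload field, and the idea of feeding previously guessed words as positive/negative examples are assumptions for illustration, not something already in the article.

```python
from qdrant_client import QdrantClient

# Assumed setup: a local Qdrant instance and a hypothetical collection
# with one point per vocabulary word and the word stored in the payload.
client = QdrantClient(url="http://localhost:6333")
COLLECTION = "contexto-words"

# Words the game ranked as close to the secret word become positive
# examples, clearly distant ones become negative examples.
suggestions = client.recommend(
    collection_name=COLLECTION,
    positive=[12, 873],   # e.g. point ids of "house", "building"
    negative=[4051],      # e.g. point id of "blue"
    limit=10,
    with_payload=True,
)

for point in suggestions:
    print(point.payload["word"], point.score)
```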
It tends to get stuck in clusters of words that are similar to each other, forcing us to retrieve many words from the model.

Additionally, using linear algebra techniques to evaluate the exact vector based on distances to given points does not look feasible in this scenario.
This is because the exact word2vec model used to sort the words is unknown, as is the exact distance to the secret word.
It was unclear to me what "exact word2vec model" means until I read the prompt put into ChatGPT. Maybe we could replace it with the "original word2vec model"?
Based on the previous section, we can conclude that using only the most similar word found so far is not enough to generate efficient guesses.
A more efficient solution must also consider the words deemed dissimilar.

Let's consider the simplest case: we have guessed 2 words, `house` and `blue`, and received feedback on their similarity to the secret word.

One of the words is closer to the secret word than the other, so we can make some assumptions about the secret word.
We understand that the secret word is more likely to be similar to `house` than `blue`, but we only have the information about its relative similarity to these two words.

Let's assign a score to each word in the vocabulary based on this observation:

{{< figure src=/articles_data/solving-contexto/scoring-1.png caption="Scoring words based on 2 guesses">}}

We assign a +1 score to the words that are closer to `house` than `blue`, and a -1 score to the words that are closer to `blue` than `house`.

Now, we can use this score to rank the words in the vocabulary and use the word with the highest score as our next guess.

Let's see how the scores change after we make a third guess:

{{< figure src=/articles_data/solving-contexto/scoring-2.png caption="Ranking words based on next 2 guesses">}}

We can generalize this approach to any number of guesses.
The simplest way to do this is to sample pairs of guesses and update the scores iteratively. A sketch of this loop is shown below.

That's it! We can use this approach to suggest words one by one and extend the guess list accordingly.
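For reference, here is a minimal sketch of this scoring loop in Python/NumPy. The function name, the use of cosine similarity over normalized embeddings, and the fixed number of sampled pairs are illustrative assumptions rather than the article's exact implementation.

```python
import numpy as np

def score_vocabulary(vocab_vectors, guess_vectors, guess_ranks, n_pairs=100, seed=42):
    """Score every vocabulary word by sampling pairs of previous guesses.

    vocab_vectors: (V, d) embeddings of the whole vocabulary.
    guess_vectors: (G, d) embeddings of the words guessed so far (G >= 2).
    guess_ranks:   list of G ranks returned by the game (lower = closer).
    """
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(vocab_vectors))

    # Normalize so that a dot product equals cosine similarity.
    vocab = vocab_vectors / np.linalg.norm(vocab_vectors, axis=1, keepdims=True)
    guesses = guess_vectors / np.linalg.norm(guess_vectors, axis=1, keepdims=True)

    for _ in range(n_pairs):
        i, j = rng.choice(len(guesses), size=2, replace=False)
        # Let `i` be the guess the game ranked closer to the secret word.
        if guess_ranks[i] > guess_ranks[j]:
            i, j = j, i
        closer = vocab @ guesses[i] > vocab @ guesses[j]
        # +1 for words closer to the better guess, -1 for the rest.
        scores += np.where(closer, 1, -1)

    return scores

# The next guess is the highest-scoring word that has not been tried yet.
```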
I actually expected the approach would be to use the recommendation API, or even just search with all the embeddings stored in Qdrant. That might be an interesting case to promote, but there is no reference to Qdrant, while there could be.
there is a reference in the last section
Co-authored-by: George <[email protected]>
@generall I agree with @kacperlukawski that we can increase user engagement if we provide some interactive tool, like a website, to let them easily solve the game on their own. Speaking about the PR, I consider it ready for publication except for the unresolved merge conflict.
Ok, done: https://solve-contexto.qdrant.tech/
Cool! Word #86 turned out to be a tough one. Maybe we can set a constant random seed and also forbid anything besides numbers in the input form?