
Article about solving contexto.me #103

Open: wants to merge 11 commits into master

@generall (Member) commented Dec 9, 2022

No description provided.

@netlify (bot) commented Dec 9, 2022

Deploy Preview for condescending-goldwasser-91acf0 ready!

🔨 Latest commit: 0156bf0
🔍 Latest deploy log: https://app.netlify.com/sites/condescending-goldwasser-91acf0/deploys/6495dea061232a0008cdfcac
😎 Deploy Preview: https://deploy-preview-103--condescending-goldwasser-91acf0.netlify.app

weight: 8
author: Andrei Vasnetsov
author_link: https://blog.vasnetsov.com/
date: 2022-06-28T08:57:07.604Z
Member:

Old date.

Member:

To me, the description sounds like an even better title than "... practical implications".

@generall (Member, Author):

Too long for a title.

qdrant-landing/content/articles/solving-contexto.md: 7 outdated, resolved review threads
@kacperlukawski (Member) left a comment:

Could we consider rewriting the approach with Qdrant as the backend for similarity search? I think even the recommendation API, with positive and negative examples, might be a great fit. That would also make it simpler to expose a demo, so everybody could open contexto.me in one tab and our demo in another, and solve the riddle on their own.


It tends to get stuck in clusters of words that are similar to each other, forcing us to retrieve many words from the model.

Additionally, using linear algebra techniques to recover the exact vector from its distances to the given points does not look feasible in this scenario.
This is because the exact word2vec model used to sort the words is unknown, as is the exact distance to the secret word.
Member:

It was unclear to me what "exact word2vec model" means until I read the prompt given to ChatGPT. Maybe we could replace it with "original word2vec model"?

Comment on lines 99 to 122
Based on the previous section, we can conclude that using only the most similar word found so far is not enough to generate efficient guesses.
A more efficient solution must also consider the words deemed dissimilar.

Let's consider the simplest case: we have guessed two words, `house` and `blue`, and received feedback on their similarity to the secret word.

One of the words is closer to the secret word than the other, which lets us make some assumptions about the secret word.
Suppose the game ranks `house` closer than `blue`. Then the secret word is more likely to be similar to `house` than to `blue`, but we only have information about its relative similarity to these two words.

Let's assign a score to each word in the vocabulary based on this observation:

{{< figure src=/articles_data/solving-contexto/scoring-1.png caption="Scoring words based on 2 guesses">}}

We assign a score of +1 to words that are closer to `house` than to `blue`, and a score of -1 to words that are closer to `blue` than to `house`.

Now we can use this score to rank the words in the vocabulary and take the word with the highest score as our next guess.
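The single-pair scoring rule can be sketched in Python. This is a minimal illustration, not the article's implementation: it assumes cosine similarity as the similarity measure and uses hypothetical toy 2-D vectors in place of real word2vec embeddings; `cosine`, `score_pair`, and `next_guess` are illustrative names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score_pair(vocab, closer, farther):
    """Score every word against one pair of guesses.

    +1 if the word is more similar to `closer` than to `farther`,
    -1 if it is more similar to `farther`, 0 on an exact tie.
    """
    scores = {}
    for word, vec in vocab.items():
        diff = cosine(vec, vocab[closer]) - cosine(vec, vocab[farther])
        scores[word] = 1 if diff > 0 else (-1 if diff < 0 else 0)
    return scores


def next_guess(scores, guessed):
    """Pick the highest-scoring word that has not been guessed yet."""
    candidates = [w for w in scores if w not in guessed]
    return max(candidates, key=scores.__getitem__)


# Hypothetical toy 2-D "embeddings" standing in for a word2vec model.
vocab = {
    "house": [1.0, 0.1],
    "blue": [0.1, 1.0],
    "home": [0.9, 0.2],
    "sky": [0.2, 0.9],
    "roof": [0.8, 0.3],
}

# Assume the game ranked `house` closer to the secret word than `blue`.
scores = score_pair(vocab, closer="house", farther="blue")
print(scores)  # {'house': 1, 'blue': -1, 'home': 1, 'sky': -1, 'roof': 1}
print(next_guess(scores, guessed={"house", "blue"}))  # home
```

With a single pair, every word on the `house` side of the boundary gets the same score, so ties are common; `max` simply returns the first tied candidate.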

Let's see how scores change after we make a third guess:

{{< figure src=/articles_data/solving-contexto/scoring-2.png caption="Ranking words based on next 2 guesses">}}

We can generalize this approach to any number of guesses.
The simplest way to do this is to sample pairs of guesses and update the scores iteratively.
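A minimal sketch of this generalization, under the same assumptions as before (cosine similarity, hypothetical toy vectors). Here `guesses` maps each guessed word to the rank the game reported, with lower ranks meaning closer to the secret word; `score_vocab`, `num_pairs`, and `seed` are hypothetical names and defaults, not taken from the article.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score_vocab(vocab, guesses, num_pairs=100, seed=0):
    """Accumulate pairwise scores over randomly sampled pairs of guesses.

    `guesses` maps each guessed word to the rank reported by the game
    (1 is the secret word; larger ranks are less similar).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible suggestions
    scores = {word: 0 for word in vocab}
    guessed = list(guesses)
    if len(guessed) < 2:
        return scores  # a single guess carries no relative information
    for _ in range(num_pairs):
        a, b = rng.sample(guessed, 2)
        # The guess with the lower rank is the one closer to the secret word.
        closer, farther = (a, b) if guesses[a] < guesses[b] else (b, a)
        for word, vec in vocab.items():
            diff = cosine(vec, vocab[closer]) - cosine(vec, vocab[farther])
            if diff > 0:
                scores[word] += 1
            elif diff < 0:
                scores[word] -= 1
    return scores


vocab = {
    "house": [1.0, 0.1],
    "blue": [0.1, 1.0],
    "home": [0.9, 0.2],
    "sky": [0.2, 0.9],
    "roof": [0.8, 0.3],
}
guesses = {"house": 50, "blue": 900, "sky": 700}  # hypothetical ranks
scores = score_vocab(vocab, guesses)
candidates = [w for w in vocab if w not in guesses]
print(max(candidates, key=scores.__getitem__))  # home
```

Sampling a fixed number of pairs keeps the cost of each suggestion at `num_pairs` times the vocabulary size in comparisons, instead of growing with the n(n-1)/2 pairs available after n guesses.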

That's it! We can use this approach to suggest words one by one, extending the guess list accordingly.
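To see the whole loop, the sketch below simulates a game. `game_ranks` is a hypothetical stand-in that ranks the toy vocabulary against a known secret using the same vectors; in the real game the ranking comes from contexto.me, whose exact model is unknown. The solver itself only ever reads the ranks. All names and the toy vectors are illustrative assumptions.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def game_ranks(vocab, secret):
    """Hypothetical stand-in for the game: rank 1 is the secret word."""
    ordered = sorted(vocab, key=lambda w: cosine(vocab[w], vocab[secret]), reverse=True)
    return {word: i + 1 for i, word in enumerate(ordered)}


def suggest(vocab, guesses, rng, num_pairs=50):
    """Score words over sampled pairs of guesses; return the best unguessed one."""
    scores = {word: 0 for word in vocab}
    guessed = list(guesses)
    if len(guessed) >= 2:
        for _ in range(num_pairs):
            a, b = rng.sample(guessed, 2)
            closer, farther = (a, b) if guesses[a] < guesses[b] else (b, a)
            for word, vec in vocab.items():
                diff = cosine(vec, vocab[closer]) - cosine(vec, vocab[farther])
                scores[word] += (diff > 0) - (diff < 0)  # +1, -1 or 0
    candidates = [w for w in vocab if w not in guesses]
    return max(candidates, key=scores.__getitem__)


def solve(vocab, secret, first_guesses, seed=0):
    """Suggest words one by one, extending the guess list until the secret is found."""
    ranks = game_ranks(vocab, secret)
    rng = random.Random(seed)
    guesses = {word: ranks[word] for word in first_guesses}
    while secret not in guesses:  # each step adds a new word, so this terminates
        word = suggest(vocab, guesses, rng)
        guesses[word] = ranks[word]  # feedback from the (simulated) game
    return guesses


vocab = {
    "house": [1.0, 0.1], "blue": [0.1, 1.0], "home": [0.9, 0.2],
    "sky": [0.2, 0.9], "roof": [0.8, 0.3], "wall": [0.7, 0.4],
    "sea": [0.3, 0.8], "cloud": [0.4, 0.7],
}
history = solve(vocab, secret="wall", first_guesses=["blue", "house"])
print(list(history.items()))
```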
Member:

I actually expected the approach to use the recommendation API, or even just search over all the embeddings stored in Qdrant. That might be an interesting case to promote, but there is no reference to Qdrant, although there could be.

@generall (Member, Author):

There is a reference in the last section.

@generall requested a review from @joein on December 12, 2022 12:48
@joein (Member) commented Dec 12, 2022

@generall I agree with @kacperlukawski that we can increase user engagement if we provide an interactive tool, such as a website, that lets users easily solve the game on their own.

As for the PR itself, I consider it ready for publication, apart from the unresolved merge conflict.

@generall (Member, Author):

OK, done: https://solve-contexto.qdrant.tech/

@joein (Member) commented Dec 13, 2022

Cool!

Word #86 turned out to be a tough one.
I tried to solve it with the demo and made 64 attempts, then accidentally entered a string into the input field, which broke the demo.
When I refreshed the page, my progress was lost and I couldn't restore it.

Maybe we could set a constant random seed and also forbid anything other than numbers in the input form?
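Both suggestions could be sketched as follows, assuming the input field expects the numeric rank reported by contexto.me and that the demo's randomness comes from a seeded generator. This is Python for illustration only; `parse_rank` and `replay_pairs` are hypothetical helpers, not part of the demo's code.

```python
import random
from typing import List, Optional, Tuple


def parse_rank(raw: str) -> Optional[int]:
    """Return the rank typed by the user, or None if it is not a positive integer."""
    text = raw.strip()
    if not text.isdecimal():  # rejects "", "house", "-3", "4.5"
        return None
    value = int(text)
    return value if value >= 1 else None


def replay_pairs(guessed_words: List[str], num_pairs: int, seed: int = 42) -> List[Tuple[str, str]]:
    """Sample pairs with a fixed seed, so a reload reproduces the same sequence."""
    rng = random.Random(seed)
    return [tuple(rng.sample(guessed_words, 2)) for _ in range(num_pairs)]


print(parse_rank(" 64 "))   # 64
print(parse_rank("house"))  # None
words = ["house", "blue", "sky"]
print(replay_pairs(words, 3) == replay_pairs(words, 3))  # True
```

Rejecting invalid input before it reaches the solver keeps a typo from corrupting the guess list, while the fixed seed only makes the random choices repeatable; restoring lost guesses would still require persisting them.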
