word2vec new results! #36
@peparedes @geneyoo @xih |
This is super super good! I think we are onto something with the Google [...] Anyways, we should test with the new dataset and see what we get. We [...] Good job Pierre! P On Fri, May 1, 2015 at 9:08 PM, Pierre Karashchuk [email protected]
|
Awesome. The training over our LiveJournal dataset finished, and you can find the models in: Let me know if you prefer the binary version of the Google format, in which case I can convert it to a .bin file. We can start comparing results from Google News vs. our "user-id tagged" model now. I'll write a script to test some word similarities. |
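A minimal sketch of what such a word-similarity comparison script might look like. The two models here are toy dictionaries mapping words to vectors, and the names `toy_news`, `toy_journal`, `most_similar`, and `compare_models` are hypothetical; the real models would first have to be loaded from the word2vec files mentioned above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(model, word, topn=3):
    """Return up to topn (word, similarity) pairs closest to `word` in one model."""
    if word not in model:
        return []
    target = model[word]
    scored = [(other, cosine(target, vec))
              for other, vec in model.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def compare_models(models, words, topn=3):
    """For each query word, collect the nearest neighbours from every model."""
    return {word: {name: most_similar(model, word, topn)
                   for name, model in models.items()}
            for word in words}


# Toy stand-ins (assumptions, not real data) for the two trained models.
toy_news = {"sick": [1.0, 0.0], "ill": [0.9, 0.1], "happy": [0.0, 1.0]}
toy_journal = {"sick": [1.0, 0.2], "flu": [1.0, 0.3], "happy": [-0.2, 1.0]}

if __name__ == "__main__":
    report = compare_models({"news": toy_news, "journal": toy_journal}, ["sick"])
    for word, per_model in report.items():
        for name, neighbours in per_model.items():
            print(word, name, neighbours)
```

Printing the neighbours side by side per query word makes it easy to eyeball where the two models disagree.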
Hey Pablo, If I run a few queries for each life event and give you 100 matches, could [...] --Pierre On Fri, May 1, 2015 at 9:18 PM, Pablo Paredes [email protected]
|
Sure, I can put them to run overnight... the only problem is that the [...] P On Fri, May 1, 2015 at 9:49 PM, Pierre Karashchuk [email protected]
|
@geneyoo Actually, Google's c-text format looks like the bin format! @peparedes Oh, we can run it during the day tomorrow if that's better. |
Yeah... it is amazing how much noise "bad" turkers enter... but just send [...] P On Fri, May 1, 2015 at 9:55 PM, Pierre Karashchuk [email protected]
|
@peparedes You can find it in: There is a file called "life_events.csv", which has a list of life events with their ids. I would give the turkers the life event description (e.g. major change in sleeping habits) and then the list of sentences, and ask them to copy the ones which correspond to the description. |
So I built the multi-query system that we talked about, where we query with multiple phrases and see which sentences match best.
This works really really well!!
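The thread does not say how the multi-query scoring works, so the following is only a rough sketch under assumptions: each phrase and sentence is embedded as the average of its word vectors, and a sentence's score is its mean cosine similarity to all query phrases. The `toy_vectors` table and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(text, vectors):
    """Average the vectors of the known words in `text`; None if no word is known."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def multi_query(queries, sentences, vectors, topn=100):
    """Rank sentences by mean cosine similarity to all query phrases (assumed scoring)."""
    query_vecs = [q for q in (embed(text, vectors) for text in queries)
                  if q is not None]
    if not query_vecs:
        return []
    results = []
    for sentence in sentences:
        sentence_vec = embed(sentence, vectors)
        if sentence_vec is None:
            continue  # no known words, so the sentence cannot be scored
        score = sum(cosine(q, sentence_vec) for q in query_vecs) / len(query_vecs)
        results.append((sentence, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:topn]


# Toy word vectors (assumptions, not taken from the trained models).
toy_vectors = {
    "sick": [1.0, 0.0], "fever": [0.9, 0.1], "cold": [0.8, 0.2],
    "party": [0.0, 1.0], "fun": [0.1, 0.9],
}

if __name__ == "__main__":
    ranked = multi_query(["sick", "fever cold"],
                         ["I had a fever", "fun party", "xyz"],
                         toy_vectors)
    for sentence, score in ranked:
        print(round(score, 3), sentence)
```

Averaging over several query phrases should dampen the effect of any single noisy phrase, which may be one reason a multi-query setup outperforms single queries.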
Here are the phrases used to query the system (these are phrases Pablo, Dennis, and I grabbed from the query results on "illness"):
Here are the top 100 results from the multi-query: