Exclude retweeted_user_name/quoted_user_name from mentioned_user_names? #20

Open
boogheta opened this issue Feb 24, 2021 · 6 comments

@boogheta
Member

To be discussed

(Identified in the fakers corpus while investigating why more than 70% of the tweets had a non-empty mentioned_user_names.)

cc @bmaz @Yomguithereal

@Yomguithereal
Member

Indeed, it seems error-prone to me that those users end up in mentioned_user_names. From the name, I understand that we are only talking about people who are "@"-mentioned in the text of the tweet, no?

That said, a case could be made for a separately named column containing all involved participants, since that is sometimes useful for building networks. But since we can also build this list from the data we already have, I would lean toward not adding columns, to keep things lean and clear, no?

@boogheta
Member Author

Yes, that's exactly how I see it as well.

@Yomguithereal
Member

So, for reference: things currently work this way because the entities.user_mentions returned by the API contain those users, not because the fallback extraction of mentions from the tweet's text fails. This means we are basically having a semantic disagreement with Twitter itself here, which requires a somewhat more thorough discussion.

@boogheta
Member Author

boogheta commented Mar 1, 2021

Yep, indeed.
Still, I believe that for data analysis use cases, what makes most sense is being able to rebuild this full list from retweeted_user_name, quoted_user_name and mentioned_user_names, rather than having everything inside mentioned_user_names and having to filter out the retweeted and quoted users whenever we only want the actual mentions.
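
A minimal Python sketch of that rebuild, assuming each tweet is a dict keyed by the column names used in this thread and that mentioned_user_names is already a list holding only the "@"-mentions from the text (the actual export may store it differently, e.g. as a single delimited string). The `TweetRow` type and `involved_users` function are hypothetical illustrations, not part of gazouilloire.

```python
from typing import Optional, TypedDict


class TweetRow(TypedDict, total=False):
    # Hypothetical per-tweet record using the column names from this thread.
    # Assumption: mentioned_user_names lists only "@"-mentions from the text.
    mentioned_user_names: list[str]
    retweeted_user_name: Optional[str]
    quoted_user_name: Optional[str]


def involved_users(tweet: TweetRow) -> list[str]:
    """Rebuild every account involved in a tweet from the separate columns."""
    candidates: list[str] = []
    for key in ("retweeted_user_name", "quoted_user_name"):
        name = tweet.get(key)
        if name:  # skip missing keys, None and empty strings
            candidates.append(name)
    candidates.extend(tweet.get("mentioned_user_names", []))

    # Deduplicate while keeping first-seen order; screen names are compared
    # case-insensitively, as Twitter handles are.
    seen: set[str] = set()
    result: list[str] = []
    for name in candidates:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


tweet: TweetRow = {"retweeted_user_name": "alice", "mentioned_user_names": ["bob", "Alice"]}
print(involved_users(tweet))  # ['alice', 'bob']
```

Keeping the columns separate and deriving the union on demand is what makes this a cheap, lossless operation; going the other way (recovering pure mentions from a merged list) is harder.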

@bmaz
Collaborator

bmaz commented Mar 1, 2021

Do we consider replies and mentions to be the same thing?

@boogheta
Member Author

boogheta commented Mar 2, 2021

Good question indeed, I hadn't thought about this: we also have replied_to_user_name to consider. If we decide to remove quoted and retweeted users from mentioned_user_names, it would probably make sense to remove replied-to users as well!
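
Per tweet, the removal being discussed amounts to a set difference. A minimal Python sketch, assuming each tweet is a dict keyed by the column names used in this thread with mentioned_user_names as a list; `strict_mentions` and its `exclude_reply` flag are hypothetical names, not part of gazouilloire.

```python
from typing import Any, Mapping


def strict_mentions(tweet: Mapping[str, Any], exclude_reply: bool = True) -> list[str]:
    """Drop retweeted, quoted (and optionally replied-to) accounts from the mentions."""
    keys = ["retweeted_user_name", "quoted_user_name"]
    if exclude_reply:
        keys.append("replied_to_user_name")
    # Screen names are compared case-insensitively; None/empty values are ignored.
    excluded = {tweet[k].lower() for k in keys if tweet.get(k)}
    return [n for n in tweet.get("mentioned_user_names", []) if n.lower() not in excluded]


tweet = {
    "quoted_user_name": "carol",
    "replied_to_user_name": "dave",
    "mentioned_user_names": ["carol", "dave", "erin"],
}
print(strict_mentions(tweet))                       # ['erin']
print(strict_mentions(tweet, exclude_reply=False))  # ['dave', 'erin']
```

One limitation of filtering by name alone: if the author also explicitly "@"-mentions the quoted or replied-to account in the text, that mention is dropped too. Telling the two cases apart would likely require position information such as the indices attached to each entry of entities.user_mentions.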

@boogheta boogheta transferred this issue from medialab/gazouilloire May 4, 2021