Doesn't understand context #29

Open

hwsamuel opened this issue Nov 27, 2020 · 1 comment
Comments

@hwsamuel

The library seems to work more like a dictionary lookup for swear words. For example, it correctly tags "fucking idiot" as negative, but it also tags "fucking awesome!" as negative. Maybe the training set's features were unigrams?

@menkotoglou

From my point of view, that happens because of the learning algorithm the library uses. Since each word is tokenized on its own, "fucking" gets a huge probability of being profane, because it is treated as profane in any context. For example, you cannot say "fucking awesome!" in a professional environment. If you placed "fucking awesome!" in clean_data.csv, you would label it as 1 (profane), not 0 (not profane).
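
To illustrate the point, here is a minimal sketch of why a unigram bag-of-words model cannot use context. The toy training data and the Naive Bayes classifier below are only stand-ins for whatever the library actually trains on, not its real pipeline:

```python
# Minimal sketch: unigram features only learn per-word evidence.
# The toy data and model here are illustrative assumptions, not the
# library's actual training set or algorithm.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "fucking idiot",            # profane
    "you are a fucking moron",  # profane
    "what a fucking mess",      # profane
    "awesome work",             # clean
    "this is great",            # clean
    "really nice job",          # clean
]
labels = [1, 1, 1, 0, 0, 0]

# Unigram features: each word is counted on its own, so the model
# never sees the surrounding words.
vectorizer = CountVectorizer(ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

# "fucking" occurs only in the profane examples, so its weight dominates
# any sentence containing it, even "fucking awesome!".
tests = ["fucking idiot", "fucking awesome!"]
print(clf.predict(vectorizer.transform(tests)))  # [1 1]
```

With unigrams, "fucking awesome!" gets essentially the same score contribution from "fucking" as "fucking idiot" does, which matches the behaviour reported above. Adding bigram features or context-aware embeddings would be needed for the model to separate the two.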
