Commit

remove some verbs from tweets
ahangarha committed Mar 1, 2020
1 parent 9e4129a commit cfe28c3
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions twc.py
@@ -75,6 +75,7 @@ def clean_tweet(tweet):
tweet = remove_emoji(tweet)
normalizer = Normalizer()
tweet = normalizer.normalize(tweet)
tweet = re.sub(r'ن?می[‌]\S+','',tweet) # removes verbs such as می‌شود or نمی‌گویند
tokens = word_tokenize(tweet)
tokens = [token for token in tokens if token not in stopwords.persian]
tokens = [token for token in tokens if token not in stopwords.english]
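
A minimal sketch of what the added `re.sub` line appears to do, isolated from the rest of `clean_tweet`. The names `ZWNJ`, `MI`, `NA`, `MI_VERB`, and `strip_mi_verbs` are hypothetical and do not exist in `twc.py`; the pattern is the one from the commit, with the invisible zero-width non-joiner inside `[...]` written as an explicit escape so it is visible in review.

```python
import re

# All names below are illustrative assumptions, not part of twc.py.
ZWNJ = "\u200c"      # zero-width non-joiner, invisible in most editors
MI = "\u0645\u06cc"  # "می", the prefix the commit targets
NA = "\u0646"        # "ن", the optional negation prefix

# Same pattern as in the commit: optional "ن", then "می", then ZWNJ,
# then one or more non-whitespace characters.
MI_VERB = re.compile(NA + "?" + MI + ZWNJ + r"\S+")


def strip_mi_verbs(text):
    """Delete every match of the pattern; surrounding spaces are left in place."""
    return MI_VERB.sub("", text)


positive = MI + ZWNJ + "شود"         # می‌شود
negative = NA + MI + ZWNJ + "گویند"  # نمی‌گویند
no_zwnj = MI + "شود"                 # same letters without ZWNJ: not matched

print(repr(strip_mi_verbs("a " + positive + " b")))
print(repr(strip_mi_verbs(no_zwnj)))
```

Two properties of the pattern are worth keeping in mind. It only fires when the ZWNJ is present, so it relies on the preceding `normalizer.normalize` call having inserted it. It also has no word-boundary anchor and no part-of-speech check, so any run of characters containing `می` followed by ZWNJ is removed, whether or not it is a verb.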
