Analyze the sentiment of tweets (positive, negative, or neutral) by applying a convolutional neural network (CNN) to Word2Vec vector representations of the words. US airline data is used for the demonstration.
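To make the idea of running a conv-net over word vectors concrete, here is a minimal sketch of how a tokenized tweet could be turned into a fixed-size matrix. The function name `sentence_matrix`, the toy 3-dimensional vectors, and the zero-padding scheme are assumptions for illustration only; they are not taken from the project's code, which uses a trained Word2Vec model.

```python
from typing import Dict, List


def sentence_matrix(tokens: List[str],
                    vectors: Dict[str, List[float]],
                    max_len: int,
                    dim: int) -> List[List[float]]:
    """Stack word vectors into a max_len x dim matrix, padded with zero rows."""
    # Unknown words are mapped to a zero vector (an assumption of this sketch).
    rows = [list(vectors.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    # Pad short tweets so every matrix has the same number of rows.
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


# Toy embeddings with hypothetical values, standing in for a Word2Vec model.
toy_vectors = {
    "flight": [0.1, 0.3, -0.2],
    "delayed": [-0.4, 0.0, 0.5],
}

matrix = sentence_matrix(["flight", "delayed", "again"], toy_vectors, max_len=5, dim=3)
print(len(matrix), len(matrix[0]))
```

Every tweet ends up with the same shape, which is what lets a convolution slide over the word axis regardless of the original tweet length.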
The Twitter data was scraped in February 2015, and contributors were asked to first classify tweets as positive, negative, or neutral and then to categorize the reasons for negative tweets (such as “late flight” or “rude service”).
- lasagne - Create conv-net
- nltk - Data pre-processing
- sklearn - Provide useful tools e.g. stratified cross-validation
Begin by creating a directory, e.g. twitter_sentiment, for storing the training data, the Word2Vec model, and the CNN model, and set FILE_PATH to this directory.
- data: contains the training data (airline data in this case) and the test data.
- wordvec: the word embedding model is saved here.
- model: the CNN model is saved here.
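The layout above could be created with a few lines of Python. This is only a sketch: the helper name `make_project_dirs` is hypothetical, and it assumes that data, wordvec, and model are sub-directories of FILE_PATH. Whether the notebooks expect FILE_PATH as a string or with a trailing slash is not stated here, so adjust as needed.

```python
from pathlib import Path


def make_project_dirs(file_path):
    """Create the data, wordvec and model sub-directories under file_path."""
    root = Path(file_path)
    for name in ("data", "wordvec", "model"):
        # exist_ok=True makes the call safe to repeat.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Directory name taken from the example above.
FILE_PATH = make_project_dirs("twitter_sentiment")
print(sorted(p.name for p in FILE_PATH.iterdir() if p.is_dir()))
```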
Train the CNN using the model_airline notebook:
jupyter notebook model_airline.ipynb
This can take some time to finish; when it is done, a cnn.npz file is created.
Then make predictions on Twitter data:
jupyter notebook predictions.ipynb
airline_data = Data('Airline-Sentiment-2-w-AA.csv', FILE_PATH)
airline_df = airline_data.csv_df(['airline_sentiment', 'text']) # load data
airline_data.pre_process(airline_df) # pre-process data
airline_df.head()
| | airline_sentiment | text | tokenized |
|---|---|---|---|
| 0 | neutral | What said | [said] |
| 1 | positive | plus youve added commercials to the experienc... | [plus, youve, added, commercials, experience, ... |
| 2 | neutral | I didnt today Must mean I need to take anothe... | [didnt, today, must, mean, need, take, another... |
| 3 | negative | its really aggressive to blast obnoxious ente... | [really, aggressive, blast, obnoxious, enterta... |
| 4 | negative | and its a really big bad thing about it | [really, big, bad, thing] |
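The tokenized column suggests that pre-processing lower-cases the text and drops stop words. A rough, standard-library approximation is sketched below. The `tokenize` function and the tiny `STOP_WORDS` set are assumptions made for this example; the project itself relies on nltk, and its exact steps are not shown here.

```python
import re

# Small illustrative stop-word list (an assumption); the project uses nltk instead.
STOP_WORDS = {"a", "about", "and", "i", "it", "its", "the", "to", "what"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs only, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(tokenize("and its a really big bad thing about it"))
```

With this stop-word list, the call prints `['really', 'big', 'bad', 'thing']`, which matches the tokenized value shown for row 4.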
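The model is then trained. The log below prints the class frequencies of the training and validation sets for each cross-validation round, followed by the score for that round: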
train freq [1890 2479 2479]
val freq [473 620 620]
Extracting ...
Extracting ...
Training cv 1 ...
[LibSVM]0.880910683012
train freq [1890 2479 2479]
val freq [473 620 620]
Extracting ...
Extracting ...
Training cv 2 ...
[LibSVM]0.879743140689
train freq [1890 2479 2479]
val freq [473 620 620]
Extracting ...
Extracting ...
Training cv 3 ...
[LibSVM]0.877408056042
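The train and validation frequencies in the log are consistent with a stratified split of roughly 80/20 per class. The sketch below shows the idea using only the standard library. It is not sklearn's implementation, and the function name `stratified_split`, the `val_fraction` default, and the toy labels are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that each class keeps about the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for validation.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


# Toy labels (hypothetical), not the airline data.
labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
train_idx, val_idx = stratified_split(labels)
print("train freq", Counter(labels[i] for i in train_idx))
print("val freq", Counter(labels[i] for i in val_idx))
```

On the toy labels this yields 40/24/16 training examples and 10/6/4 validation examples per class, so the class proportions are the same in both parts.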
The trained model is then used to predict the sentiment of the collected airline tweets.
Group the tweets by the sentiment that the CNN model assigns to them, then find the most frequent words in each group:
ALL = Prediction(FILE_PATH, 'FOUR_AIRLINES.csv', max_len_train=19)
ALL.prepare_data(['text', 'airline'], wv_size=600)
ALL.get_result(n_preview=10, n_top=20, name='ALL_result', verbose=False)
===Positive===
[('thanks', 170), ('thank', 139), ('great', 136), ('flight', 120), ('service', 65),
('love', 48), ('fly', 44), ('crew', 39), ('leggings', 38), ('best', 38),
('flying', 35), ('much', 34), ('night', 34), ('good', 34), ('always', 32),
('us', 31), ('home', 31), ('time', 30), ('last', 30), ('got', 30)]
===Negative===
[('flight', 400), ('get', 155), ('stop', 144), ('tickets', 137), ('time', 128),
('seaworld', 127), ('selling', 123), ('via', 122), ('urge', 121), ('service', 108),
('customer', 100), ('still', 95), ('one', 92), ('delayed', 91), ('flights', 83),
('us', 78), ('bag', 75), ('flying', 74), ('hours', 72), ('hour', 70)]
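Word counts like the ones above can be produced with `collections.Counter`. The following is a minimal sketch, not the code behind `get_result`: the function name `top_words`, the string sentiment labels, and the toy tweets are assumptions.

```python
from collections import Counter


def top_words(tokenized_tweets, predicted, n_top=20):
    """Map each predicted sentiment to its n_top most common (word, count) pairs."""
    counters = {}
    for tokens, sentiment in zip(tokenized_tweets, predicted):
        counters.setdefault(sentiment, Counter()).update(tokens)
    return {s: c.most_common(n_top) for s, c in counters.items()}


# Toy tokenized tweets and predictions (hypothetical data).
tweets = [
    ["thanks", "great", "flight"],
    ["thanks", "crew"],
    ["flight", "delayed"],
    ["flight", "delayed", "hours"],
    ["flight", "bag"],
]
predicted = ["positive", "positive", "negative", "negative", "negative"]

result = top_words(tweets, predicted, n_top=2)
print(result["negative"])
```

Here the negative group yields `[('flight', 3), ('delayed', 2)]`, in the same (word, count) format as the lists above.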
Take a look at the context in which some of the most frequent words are used in the tweets classified as negative:
ALL.check(word='flight', sentiment=3, n_view=10)
"@AmericanAir with that totally random flight cancellation"
"Baffled by @AmericanAir boarding passengers with full knowledge that the captain is still on an inbound flight. SMH. #FAIL"
"#TFW you finally get off your @AmericanAir flight that taxied for over an hour. https://t.co/TyAlTpAWFC"
"@AmericanAir unfortunately made the mistake of booking @united, who put me on an @aircanada flight. Next time!"
"@AmericanAir the flight attend made me stow iPad for landing b/c it "has a keypad". Travel weekly. Never had to stow. Is this new? #AA1164"
"@AmericanAir I have a question about my seats on upcoming flight."
"@AmericanAir I joined Twitter to tell people how bad you are! Second time missing a connecting flight in LA but somehow isn't your fault!"
"RT @RandyStillinger: World War II veterans get a hero's welcome by the #AATeam upon arrival on @AmericanAir #SoaringValor charter flight wi�"
"At LAS with a colleague waiting for our golf clubs because they missed our flight. Might not make our tee time. Unacceptable @AmericanAir"
"@AmericanAir I'm not Ralph ;) That's your awesome flight attendant."
This kind of analysis could help airlines improve the relevant services.
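The context check above, which lists tweets of a given predicted sentiment that contain a chosen word, can be approximated as follows. This is a sketch under assumptions: `tweets_with_word` is a hypothetical helper, sentiments are plain strings rather than the numeric codes passed to `ALL.check`, and the tweets are toy examples.

```python
import re


def tweets_with_word(tweets, predicted, word, sentiment, n_view=10):
    """Return up to n_view tweets with the given predicted sentiment that contain word."""
    # Whole-word, case-insensitive match, so "flight" does not match "flights".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = [
        text for text, label in zip(tweets, predicted)
        if label == sentiment and pattern.search(text)
    ]
    return matches[:n_view]


# Toy tweets and predictions (hypothetical data).
tweets = [
    "My flight was cancelled with no notice.",
    "Great crew today, thanks!",
    "Flights delayed again",
    "Lovely flight, thank you",
]
predicted = ["negative", "positive", "negative", "positive"]

for text in tweets_with_word(tweets, predicted, "flight", "negative"):
    print(text)
```

Reading the matching tweets in full helps to spot misclassified examples as well as the actual complaints behind a frequent word.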