Use KMeans to categorize @WhiteHouse tweet topics and a d3 streamgraph to visualize

dereklieu/whitehouse-tweet-topics

Happening Now

What is this?

I used k-means clustering, a machine learning algorithm, to group tweets from the Obama administration. Because k-means is an unsupervised learning algorithm, I had next to no control over how the clusters would form, aside from some basic normalization (removing URLs, replacing hashtags with words, stemming, etc.).
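To make the pipeline above concrete, here is a minimal sketch of the normalization and clustering steps using only the Python standard library. It is not the code in this repo: the function names (`normalize`, `crude_stem`, `vectorize`, `kmeans`), the suffix-stripping stemmer, and the bag-of-words vectors are all assumptions chosen for illustration. A real run would more likely use an established stemmer and a library k-means implementation.

```python
import math
import random
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_hashtag(match):
    # Hypothetical helper: "#ActOnClimate" becomes "Act On Climate".
    return CAMEL_RE.sub(" ", match.group(1))


def crude_stem(word):
    # Stand-in for a real stemmer: strip a few common English suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tweet):
    # Remove URLs, replace hashtags with words, lowercase, then stem.
    text = URL_RE.sub(" ", tweet)
    text = HASHTAG_RE.sub(split_hashtag, text)
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def vectorize(token_lists):
    # Bag-of-words counts over the shared vocabulary, scaled to unit length.
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        vec = [0.0] * len(vocab)
        for t in tokens:
            vec[index[t]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    # Plain Lloyd's algorithm: assign each point to its nearest centroid,
    # then move each centroid to the mean of its members.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Calling `kmeans(vectorize([normalize(t) for t in tweets])[1], k)` returns one cluster label per tweet. The only inputs are the normalized text and `k`, which is why the resulting groupings are hard to steer.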

Once I had these clusters, I gave them group titles based on what I observed to be the most common concept.
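The titles were chosen by hand, but listing each cluster's most frequent terms is one way to support that kind of judgment. The sketch below is illustrative only and is not taken from this repo. The `top_terms` name and its inputs (normalized token lists plus one cluster label per tweet) are assumptions.

```python
from collections import Counter


def top_terms(token_lists, labels, n=5):
    # Count tokens per cluster and keep the n most frequent in each.
    counts = {}
    for tokens, label in zip(token_lists, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return {
        label: [term for term, _ in counter.most_common(n)]
        for label, counter in counts.items()
    }


if __name__ == "__main__":
    tokens = [["health", "care", "health"], ["health", "insur"], ["job", "econom", "job"]]
    for label, terms in top_terms(tokens, [0, 0, 1], n=3).items():
        print(label, terms)
```

A person still has to read the term lists and decide on a title. The counts only surface candidates.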

I visualized the results using d3. You can see the work at http://lieu.io/whitehouse-tweet-topics/.

The raw data dumps from Twitter are available in the repo here. The scripts I ultimately used to do the clustering, in addition to a few dead-ends, are here.

Who am I?

I'm an engineer and designer working at Development Seed in Washington, DC. You can find me on Twitter.
