HongHuangNeu/TweetClustering

A text clustering program in Scala, written for the "sentiment radar" project of an Information Retrieval course. Clustering is done with the K-Means algorithm.

idAssign.scala transforms a text file (one line per file) into a new file of key-text pairs.
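A minimal sketch of the key assignment idea, not the actual contents of idAssign.scala. It assumes each input line is one text, that keys are zero-based sequential integers, and that key and text are separated by a tab. The names `IdAssignSketch` and `assignIds` are hypothetical, and file I/O is left out.

```scala
object IdAssignSketch {
  // Pair every input line with a sequential integer key and render it as "key<TAB>text".
  // Zero-based numbering and the tab separator are assumptions made for illustration.
  def assignIds(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.map { case (text, id) => s"$id\t$text" }
}
```

For example, `IdAssignSketch.assignIds(Seq("first tweet", "second tweet"))` would yield two lines keyed `0` and `1`.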

parser.scala takes the output of idAssign.scala and removes non-alphanumeric characters from each line of text. It also stems the text.
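The cleaning and stemming step could look roughly like the sketch below. This is an illustration under assumptions, not the code in parser.scala: it lowercases the text, replaces each non-alphanumeric character with a space, and uses a deliberately naive suffix stripper as a stand-in for a real stemmer (the README does not say which stemmer is used). `ParserSketch`, `clean`, `naiveStem`, and `parse` are hypothetical names.

```scala
import java.util.Locale

object ParserSketch {
  // Lowercase, replace anything that is not an ASCII letter, digit, or space
  // with a space, then split into non-empty tokens.
  def clean(text: String): List[String] =
    text.toLowerCase(Locale.ROOT)
      .replaceAll("[^a-z0-9 ]", " ")
      .split(" ")
      .filter(_.nonEmpty)
      .toList

  // Toy stemmer (assumption): strips the first matching suffix from a short list,
  // provided at least three characters remain. A real stemmer handles far more cases.
  def naiveStem(token: String): String = {
    val suffixes = Seq("ing", "ed", "es", "s")
    suffixes
      .find(s => token.endsWith(s) && token.length > s.length + 2)
      .map(s => token.dropRight(s.length))
      .getOrElse(token)
  }

  // Clean a line and stem every token, joining the result with single spaces.
  def parse(text: String): String =
    clean(text).map(naiveStem).mkString(" ")
}
```

With this toy stemmer, `ParserSketch.parse("Loving the #new phones!!")` gives `"lov the new phon"`.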

tfidf.scala takes the output of parser.scala, extracts tf-idf features, and performs K-Means clustering using Apache Spark MLlib.
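To make the tf-idf features concrete, here is a plain-Scala sketch that does not use Spark. It is a stand-in for illustration, not the MLlib pipeline in tfidf.scala. It weights raw term counts by the smoothed inverse document frequency log((m + 1) / (df + 1)), the form given in the Spark MLlib documentation, where m is the number of documents and df is the number of documents containing the term. `TfIdfSketch` and `tfidf` are hypothetical names.

```scala
object TfIdfSketch {
  // docs: each document is a sequence of already cleaned and stemmed tokens.
  // Returns, per document, a sparse map from term to tf-idf weight.
  def tfidf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val m = docs.size
    // Document frequency: in how many documents each term occurs at least once.
    val df: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      // Term frequency: raw count of each term inside this document.
      val tf = doc.groupBy(identity).map { case (t, occ) => t -> occ.size }
      tf.map { case (t, count) => t -> count * math.log((m + 1.0) / (df(t) + 1.0)) }
    }
  }
}
```

A term that occurs in every document gets weight 0, so it does not influence the distance computations of a later clustering step.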

sampleTweets.scala takes a random sample of the tweets.
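One simple way to draw such a sample is to shuffle and take a prefix, sketched below. This assumes the tweets fit in memory and that sampling is without replacement; the README does not say how sampleTweets.scala does it. `SampleSketch` and `sample` are hypothetical names, and the seed parameter is added only to make runs reproducible.

```scala
import scala.util.Random

object SampleSketch {
  // Draw up to n lines uniformly at random without replacement.
  // The same seed always produces the same sample.
  def sample(lines: Seq[String], n: Int, seed: Long): Seq[String] =
    new Random(seed).shuffle(lines).take(n)
}
```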
