Homework assignments for CS657, Mining Massive Datasets. The assignments use Spark and Hadoop through the Python API and cover word counting, association rule mining, linear regression, and recommender systems. The final project is not in this repo; it lives in my NOVA HTI personal repo. For the final project, I scraped Craigslist ads on similar topics, extracted the text of each ad, and used that text to cluster the ads and measure how similar the topics were. I set up an automatic scraper that pushed its results to a database, then pulled the data back out and ran NLP and topic modeling in Spark to do the clustering.
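As a rough illustration of the word-count pattern mentioned among the assignments, here is a minimal pure-Python sketch that mirrors the usual flatMap / map / reduceByKey stages of a Spark job. It is not taken from this repo's code; the function name `word_count`, the tokenizing regex, and the sample lines are all hypothetical, and it runs without Spark installed.

```python
import re
from collections import Counter


def word_count(lines):
    """Count words across an iterable of text lines (hypothetical helper)."""
    # "flatMap" stage: split every line into lowercase word tokens.
    words = (w for line in lines for w in re.findall(r"[a-z']+", line.lower()))
    # "map" stage: pair each word with an initial count of 1.
    pairs = ((w, 1) for w in words)
    # "reduceByKey" stage: sum the counts for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "spark again"]  # made-up input
    print(word_count(sample))
```

In a real Spark job, the same three stages would run as distributed transformations over an RDD rather than over an in-memory generator.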
sarmstr5/cs657_mining_massive_datasets