GitHub - sarmstr5/cs657_mining_massive_datasets

Homework assignments for CS657, Mining Massive Datasets. The assignments use Spark and Hadoop through the Python API and cover word counting, association rule mining, linear regression, and recommender systems. The final project is not in this repo; it is in my NOVA HTI personal repo. For the final project, I scraped Craigslist ads under similar topics, extracted the wording of the ads, and used that wording to cluster the ads and see how similar those topics were. I set up an automatic scraper that pushed results to a database, then pulled the data from the database and performed NLP and topic modeling in Spark to do the clustering.

Folders and files (95 commits)

association_rule_mining
craigslist_clustering
hadoop_wordcount
linear_regression
movie_recommender_system
.gitignore
LICENSE
README.md
__init__.py
eigen.ipynb
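
For readers unfamiliar with the word count assignment named in the description, the following is a minimal sketch of the map and reduce steps in plain Python. It is not taken from the repository: the function names `map_phase`, `reduce_phase`, and `word_count` are hypothetical, the tokenization rule is an assumption, and the actual assignments run on Hadoop and Spark rather than on a single machine.

```python
from collections import Counter
from functools import reduce
import re


def map_phase(line):
    """Emit a (word, 1) pair for every lowercase word in one input line."""
    # Assumption: words are runs of letters and apostrophes.
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def reduce_phase(counts, pair):
    """Fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] += n
    return counts


def word_count(lines):
    """Count words across all lines using a map step followed by a reduce step."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce(reduce_phase, pairs, Counter())


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "spark counts words"]
    print(word_count(sample))
```

In a distributed setting the same two steps would typically run in parallel over partitions of the input, with the pairs grouped by word between the map and reduce stages.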
