Hadoop-ApacheSpark-Analysis

This project was carried out in collaboration with my colleagues Roberta Pappolla and Lorenzo Ferri. Its goal was to simulate a machine learning/data science project on a large dataset, so a cluster computing framework was used: Hadoop/Apache Spark. Various ML techniques were applied: Classification, Clustering, Regression, Dimensionality Reduction, Feature Engineering, etc.

NOTE: Most notebooks have their comments written in Italian, sorry about that! I'm happy to clarify anything, just get in touch.

Contributions are more than welcome!

Repository contents:
_Files
.gitattributes
.gitignore
1_DataPreparation_Clustering_FeaturesEngineering.ipynb
2_DataUnderstanding_DimensionalityReduction.ipynb
3_Classification_Regression.ipynb
README.md

Show some 💚 by starring this repository!

Repository: emailic/Hadoop-ApacheSpark-Clustering-Classification-Dimensionality-Reduction