This is the project done in collaboration with my colleagues Roberta Pappolla and Lorenzo Ferri. The scope of the project was a simulation of a machine learning/data science project on a big dataset. Thus, a cluster computing framework was used: Hadoop/Apache Spark. Various ML techniques were deployed: Classification, Clustering, Regression, DImensionality Reduction, Feature Engineering, etc.
NOTE: Most notebooks have the comments added in Italian language, sorry for that! I'm available to clarify anything, just get in touch.
Contrbutions are more than welcome!