The aim of this project is to perform an in-depth analysis of a dataset in order to answer a specific question. We chose the "Facebook Comment Volume" dataset, which contains a wealth of useful information about Facebook posts (number of comments, number of shares, etc.).
The aim of our analysis is to answer the following question: does the length of a post influence how widely it is read and, consequently, the number of comments and shares it receives?
To address this question, we went through two preliminary phases:
- data cleaning of the dataset,
- exploratory analysis.
After these phases, we applied two machine learning techniques, supervised learning and unsupervised learning, to predict the number of shares a post will receive in the future.
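As an illustration, the supervised part of this pipeline (clean the data, then fit a regressor on the remaining rows to predict shares) could be sketched as follows. This is a minimal sketch, not the project's actual code: the column names (`post_length`, `page_likes`, `shares`) are assumptions, and a small synthetic table stands in for the real CSV, which would be loaded with `pd.read_csv(...)`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Facebook Comment Volume data;
# the real project would load the dataset from its CSV file instead.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "post_length": rng.integers(10, 2000, n),   # assumed attribute name
    "page_likes": rng.integers(0, 100_000, n),  # assumed attribute name
    "shares": rng.integers(0, 500, n).astype(float),
})
# Simulate missing target values so the cleaning step has work to do.
df.loc[rng.choice(n, 20, replace=False), "shares"] = np.nan

# Data cleaning: drop rows whose target value is missing.
df = df.dropna(subset=["shares"])

# Supervised learning: predict shares from the other attributes.
X, y = df[["post_length", "page_likes"]], df["shares"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```

On the real dataset, the same skeleton would simply swap the synthetic table for the loaded CSV and extend the feature list to the full set of attributes.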
First of all, we considered a significant number of datasets, each interesting both for the topics it covered and for the configuration of its attributes. We gradually discarded some of them, keeping the sets that were both better structured and more engaging to us.
In the end we opted for "Facebook Comment Volume".
The most important reason we chose the Facebook dataset was its subject matter: we were looking for a set to which we could apply all the concepts learned during the course, but one that also captured our attention and stimulated further study and analysis. A second criterion was size: we wanted a sufficiently large dataset, in terms of both entries and attributes, since working with a large amount of information lets us produce more consistent results across a range of studies, such as exploratory analysis, supervised learning, and unsupervised learning. A third criterion was structure: the dataset we worked on had to contain meaningful values, that is, attributes able to adequately describe the real-world domain of interest.