This project uses the "Individual Household Electric Power Consumption" dataset from the UCI Machine Learning Repository, which records the electricity consumption of a single household over 4 years. First, we will generate a normal population, draw several samples from it, and study the differences between them. Then we will carry out a hypothesis test for equality of variances between two attributes of the dataset.
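As a rough illustration of the sampling step, the sketch below simulates a normal population, draws a few random samples from it, and compares their means and variances. It then forms the ratio of two sample variances, which is the statistic behind an F-test for equality of variances. The population parameters (mean 240, standard deviation 3), the sample size, the seed, and the helper names `draw_samples` and `variance_ratio` are illustrative assumptions, not values or functions taken from the project. The p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible (assumed value)

# Hypothetical normal population: mean 240, standard deviation 3 (illustrative only).
population = [random.gauss(240.0, 3.0) for _ in range(10_000)]


def draw_samples(pop, n_samples, size):
    """Draw n_samples simple random samples (without replacement) of the given size."""
    return [random.sample(pop, size) for _ in range(n_samples)]


def variance_ratio(x, y):
    """F statistic for H0: var(x) == var(y), with the larger sample variance on top."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)


samples = draw_samples(population, n_samples=3, size=50)

# Samples from the same population differ only by sampling noise.
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: mean={statistics.mean(s):.2f}, "
          f"variance={statistics.variance(s):.2f}")

f_stat = variance_ratio(samples[0], samples[1])
print(f"F = {f_stat:.3f} with (49, 49) degrees of freedom")
```

A ratio close to 1 is consistent with equal variances. How far from 1 it must be to reject the null hypothesis depends on the degrees of freedom and the chosen significance level.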
In addition, we will study these data using regression techniques to determine whether there are linear relationships between some of the variables. We will then reduce the dimensionality of the data with principal component analysis (PCA), apply clustering and classification trees, and interpret the results. Finally, we will carry out an analysis of variance (ANOVA) to compare the means of a characteristic across several populations.
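To make the last step concrete, here is a minimal sketch of the F statistic used in a one-way ANOVA, computed by hand from the between-group and within-group sums of squares. The function name `one_way_anova_f`, the group labels, and the numbers are made up for illustration; they are not measurements from the dataset or results from the project.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical groups (made-up values), e.g. one characteristic in three populations.
example = {
    "group_a": [1.0, 2.0, 3.0],
    "group_b": [2.0, 3.0, 4.0],
    "group_c": [5.0, 6.0, 7.0],
}

f_value, df_b, df_w = one_way_anova_f(list(example.values()))
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value indicates that the group means differ by more than the within-group variation would explain. Turning it into a p-value requires the F distribution with the reported degrees of freedom.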
The dataset can be found at: https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption
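For readers who want to inspect the data, the following sketch shows one way the file could be parsed with the Python standard library. It assumes the layout described on the UCI page: a semicolon-separated text file with a header row, dates in `dd/mm/yyyy` format, and `?` marking missing values. The column subset, the `read_rows` helper, and the inline sample rows are illustrative assumptions, not real measurements.

```python
import csv
import io
from datetime import datetime

# Illustrative rows in the assumed layout (made-up values, not real measurements).
SAMPLE = """Date;Time;Global_active_power;Voltage
16/12/2006;17:24:00;4.2;234.8
16/12/2006;17:25:00;?;?
16/12/2006;17:26:00;5.4;233.3
"""


def read_rows(handle):
    """Yield one dict per row: parse the timestamp, map '?' to None, cast to float."""
    for row in csv.DictReader(handle, delimiter=";"):
        stamp = datetime.strptime(
            f"{row.pop('Date')} {row.pop('Time')}", "%d/%m/%Y %H:%M:%S"
        )
        values = {k: (None if v in ("?", "") else float(v)) for k, v in row.items()}
        yield {"timestamp": stamp, **values}


rows = list(read_rows(io.StringIO(SAMPLE)))
# Keep only rows where every measurement is present.
complete = [r for r in rows if None not in r.values()]
print(len(rows), "rows read,", len(complete), "without missing values")
```

To read the downloaded file instead of the inline sample, the same function can be passed an open file handle, for example `open(path, newline="")`, where `path` is wherever the extracted text file was saved.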
Documentation for Phase 1 and Phase 2 of this project can be found in phase1-report and phase2-report.