This data addresses student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary and five-level classification tasks and under regression tasks. Important note: the target attribute G3 is strongly correlated with attributes G2 and G1. This is because G3 is the final year grade (issued at the 3rd period), while G1 and G2 are the 1st and 2nd period grades. Predicting G3 without G2 and G1 is more difficult, but such a prediction is much more useful (see the source paper for details).
- | - |
---|---|
Data Set Characteristics | Multivariate |
Attribute Characteristics | Integer |
Number of Attributes | 33 |
Number of Instances | 649 |
Associated Tasks | Classification, Regression |
Missing Values? | N/A |
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
Paper - Using Data Mining to Predict Secondary School Student Performance
Use "Math Course" Dataset
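As a starting point, the selected attributes and the target can be read with the standard library alone. This is a minimal sketch, assuming the math course file is named `student-mat.csv`, is semicolon-delimited, and stores the binary attributes as `yes`/`no`; `load_students` and `encode` are hypothetical helper names, not part of this project.

```python
import csv

# The 16 attributes this README uses as predictors.
FEATURES = ["traveltime", "studytime", "failures", "schoolsup", "famsup",
            "activities", "paid", "internet", "nursery", "higher",
            "romantic", "freetime", "goout", "Walc", "Dalc", "health"]


def encode(value):
    """Encode yes/no answers as 1/0; parse everything else as an integer."""
    if value == "yes":
        return 1
    if value == "no":
        return 0
    return int(value)


def load_students(fileobj, delimiter=";"):
    """Return (X, g3): encoded feature rows and the final grades (G3)."""
    X, g3 = [], []
    for row in csv.DictReader(fileobj, delimiter=delimiter):
        X.append([encode(row[name]) for name in FEATURES])
        g3.append(int(row["G3"]))
    return X, g3


# Usage (assumed file name):
# with open("student-mat.csv", newline="") as f:
#     X, g3 = load_students(f)
```

G1 and G2 are deliberately not read here, since the attribute list above excludes them.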
Use the following attributes to predict the final grade in five categories (A, B, C, D, F): traveltime, studytime, failures, schoolsup, famsup, activities, paid, internet, nursery, higher, romantic, freetime, goout, Walc, Dalc, health
Measure the accuracy on the test subset (30% of instances)
Model | Accuracy |
---|---|
AdaBoost Scikit Learn | 0.3277 |
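The five-level target and the 70/30 evaluation can be sketched as follows. The grade cut-offs (A: 16-20, B: 14-15, C: 12-13, D: 10-11, F: 0-9) are an assumption taken from the five-level scheme in the cited paper, not something this README states; `five_level`, `holdout_split`, and `accuracy` are hypothetical helper names, and the fixed seed is only there to make the split repeatable.

```python
import random


def five_level(g3):
    """Map a 0-20 final grade to A/B/C/D/F (assumed cut-offs)."""
    if g3 >= 16:
        return "A"
    if g3 >= 14:
        return "B"
    if g3 >= 12:
        return "C"
    if g3 >= 10:
        return "D"
    return "F"


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With five classes, always predicting one class already gives that class's share of the test set as accuracy, so the score in the table above is best read against that baseline.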
Use the following attributes to predict the final grade in two categories (Pass (A, B, C, D), Fail (F)): traveltime, studytime, failures, schoolsup, famsup, activities, paid, internet, nursery, higher, romantic, freetime, goout, Walc, Dalc, health
Measure the accuracy on the test subset (30% of instances)
Model | Decision Stumps | Accuracy |
---|---|---|
AdaBoost From Scratch | 16 | 0.6471 |
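For reference, AdaBoost over decision stumps for the pass/fail task can be sketched in plain Python. This is an illustrative sketch, not the implementation behind the table above: it assumes the 16 in the table is the number of stumps, that Pass means G3 >= 10, and that labels are encoded as +1 (pass) and -1 (fail). All function names are hypothetical.

```python
import math


def pass_fail(g3):
    """Encode the binary target: +1 for pass (assumed G3 >= 10), -1 for fail."""
    return 1 if g3 >= 10 else -1


def stump_predict(stump, x):
    """A stump predicts `polarity` when x[feature] <= threshold, else the opposite."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, n_stumps=16):
    """Train n_stumps weighted stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_stumps):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The exhaustive threshold search is quadratic in the number of samples per feature, which is acceptable for a dataset of a few hundred rows but would need a sorted sweep for larger data.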