Student Performance Data Set

Dataset

Student Performance Data Set

Data Set Information

This data concerns student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see the paper for more details).
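The binary and five-level targets are both derived from the 0-20 final grade G3. The sketch below shows that derivation. It assumes the thresholds we understand Cortez and Silva (2008) to use: pass when G3 >= 10, and five bands I (16-20), II (14-15), III (12-13), IV (10-11), V (0-9). Check them against the paper before relying on them. The function names are hypothetical.

```python
def binary_label(g3: int) -> str:
    """Two-class target (assumed threshold): pass when the final grade G3 is at least 10 out of 20."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be in 0..20, got {g3}")
    return "pass" if g3 >= 10 else "fail"


def five_level_label(g3: int) -> str:
    """Five-level target (assumed bands): I is the best band, V means fail."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be in 0..20, got {g3}")
    if g3 >= 16:
        return "I"
    if g3 >= 14:
        return "II"
    if g3 >= 12:
        return "III"
    if g3 >= 10:
        return "IV"
    return "V"


# A student with G3 = 13 passes and falls in band III.
print(binary_label(13), five_level_label(13))  # prints "pass III"
```

Because G1 and G2 track G3 so closely, a prediction that is useful early in the year would compute these labels from G3 only and leave G1 and G2 out of the feature set.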

Abstract

| Characteristic | Value |
| --- | --- |
| Data Set Characteristics | Multivariate |
| Attribute Characteristics | Integer |
| Number of Attributes | 33 |
| Number of Instances | 649 |
| Associated Tasks | Classification, Regression |
| Missing Values? | N/A |

Source

Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez

Result

Paper: Cortez and Silva (2008), "Using Data Mining to Predict Secondary School Student Performance"

Using the "Math Course" dataset

Use the following attributes to predict the final grade in five categories (A, B, C, D, F): traveltime, studytime, failures, schoolsup, famsup, activities, paid, internet, nursery, higher, romantic, freetime, goout, Walc, Dalc, health.
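The 16 predictors above mix two types: yes/no flags and small ordinal integers. The sketch below turns one dataset row into an integer vector. It assumes the layout of the UCI CSV files: semicolon-separated, with a header row, and with "yes"/"no" strings for the binary attributes. The helpers `encode_row` and `load_features` are hypothetical names. This is not the code behind the results below.

```python
import csv
import io

# The 16 predictors listed above, in the same order.
FEATURES = [
    "traveltime", "studytime", "failures", "schoolsup", "famsup", "activities",
    "paid", "internet", "nursery", "higher", "romantic", "freetime", "goout",
    "Walc", "Dalc", "health",
]
# Assumed to be stored as "yes"/"no" strings; all other features are small integers.
BINARY = {"schoolsup", "famsup", "activities", "paid", "internet", "nursery", "higher", "romantic"}


def encode_row(row: dict) -> list:
    """Turn one CSV row (column name -> string) into a 16-element integer vector."""
    vector = []
    for name in FEATURES:
        value = row[name].strip()
        if name in BINARY:
            vector.append(1 if value == "yes" else 0)  # yes -> 1, no -> 0
        else:
            vector.append(int(value))  # ordinal scale, e.g. studytime or health
    return vector


def load_features(text: str):
    """Parse semicolon-separated CSV text; return (feature vectors, G3 grades)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    X, y = [], []
    for row in reader:
        X.append(encode_row(row))
        y.append(int(row["G3"]))
    return X, y
```

Assuming the Math file is named `student-mat.csv` as in the UCI archive, `load_features(open("student-mat.csv", newline="").read())` would return the feature matrix and the raw G3 grades. The grades still need to be binned into letter categories.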

Accuracy is measured on the test subset (30% of instances).

| Model | Accuracy |
| --- | --- |
| AdaBoost (scikit-learn) | 0.3277 |
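The README does not say how the 30% test subset was drawn. The sketch below assumes a plain random hold-out split with a fixed seed, and no stratification by grade category. It also shows the accuracy metric reported in the table. `holdout_split` is a hypothetical helper, not scikit-learn's `train_test_split`.

```python
import random


def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle instance indices once and hold out `test_fraction` of them for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


def accuracy(y_true, y_pred):
    """Fraction of test instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With a single small hold-out set, the reported accuracy can shift noticeably with the seed. Repeating the split several times gives a sense of that variance.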

Use the same attributes to predict the final grade in two categories, Pass (A, B, C, D) and Fail (F): traveltime, studytime, failures, schoolsup, famsup, activities, paid, internet, nursery, higher, romantic, freetime, goout, Walc, Dalc, health.
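The two-class target simply collapses the letter grades: A through D count as Pass and F as Fail. Boosting code usually expects labels in {-1, +1}. The sketch below therefore assumes Pass = +1 and Fail = -1, a convention that is not stated in the source. `pass_fail` is a hypothetical name.

```python
PASS_LETTERS = {"A", "B", "C", "D"}


def pass_fail(letter: str) -> int:
    """Map a letter grade to the two-class target: +1 for Pass (A-D), -1 for Fail (F)."""
    letter = letter.strip().upper()
    if letter in PASS_LETTERS:
        return 1
    if letter == "F":
        return -1
    raise ValueError(f"unexpected letter grade: {letter!r}")
```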

Accuracy is measured on the test subset (30% of instances).

| Model | Decision Stumps | Accuracy |
| --- | --- | --- |
| AdaBoost (from scratch) | 16 | 0.6471 |
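The "from scratch" result uses AdaBoost over decision stumps, but the README does not include that code. The block below is a minimal sketch of discrete AdaBoost, not the author's implementation. It assumes labels in {-1, +1} and reads the table's 16 as the number of boosting rounds, that is, the number of stumps. Each stump is found by exhaustive search over features and thresholds. All function names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted training error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # Predict `polarity` when x[j] <= threshold, otherwise the opposite class.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] <= threshold else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, polarity)
    return best


def stump_predict(stump, row):
    _, j, threshold, polarity = stump
    return polarity if row[j] <= threshold else -polarity


def adaboost_fit(X, y, n_stumps=16):
    """Discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(n_stumps):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp so the log stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified instances, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The exhaustive stump search costs roughly features × distinct values × instances per round. That is cheap at this dataset's size, but sorting each feature once and sweeping thresholds would scale better.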