This is the Harvard edX Data Science capstone project.
The data for the project is in a .csv file in the data folder.
The .R file reads the .csv and creates a .rda file in the RDA folder.
The .Rmd file reads from the RDA folder; the .rda file is already included there for your convenience.
The final report is saved as a PDF.
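The read-then-cache workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the folder layout, the file names (example.csv, example.rda), the object name example_data, and the placeholder rows are all hypothetical and are not taken from this repository. It runs in a temporary directory so it does not touch the real project files.

```r
# Hypothetical layout: a scratch project root with data/ and RDA/ folders.
root <- file.path(tempdir(), "capstone_sketch")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "RDA"), recursive = TRUE, showWarnings = FALSE)

csv_path <- file.path(root, "data", "example.csv")  # hypothetical file name
rda_path <- file.path(root, "RDA", "example.rda")   # hypothetical file name

# Placeholder rows so the sketch runs on its own; this is not the project's data.
write.csv(data.frame(id = 1:3, value = c(0.5, 1.5, 2.5)),
          csv_path, row.names = FALSE)

# Roughly what the .R script does: read the .csv and cache it as an .rda file.
example_data <- read.csv(csv_path)
save(example_data, file = rda_path)

# Roughly what the .Rmd report does: load the cached object
# instead of re-reading the .csv.
rm(example_data)
load(rda_path)  # restores the object under its saved name, example_data
```

Because load() restores objects under the names they were saved with, the report only needs to know the path to the .rda file and the object name used by the script.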
These are the directions for this project:
For this project, you will be creating a supervised machine learning algorithm using techniques that go beyond standard linear regression. You will have the opportunity to use a publicly available dataset to solve the problem of your choice. The UCI Machine Learning Repository and Kaggle are good places to seek out a dataset. Kaggle also maintains a curated list of datasets that are cleaned and ready for machine learning analyses.
The ability to clearly communicate the process and insights gained from an analysis is an important skill for data scientists. You will submit a report that documents your analysis and presents your findings, with supporting statistics and figures. The report must be written in English and uploaded as both a PDF document and an Rmd file. Although the exact format is up to you, the report should include the following at a minimum:
an introduction/overview/executive summary section that describes the dataset and summarizes the goal of the project and key steps that were performed;
a methods/analysis section that explains the process and techniques used, such as data cleaning, data exploration and visualization, any insights gained, and your modeling approach;
a results section; and
a conclusion section.
Your project submission will be graded both by your peers and by a staff member. The peer grading will give you an opportunity to check out the projects done by other learners.