Linear regression, classification, and resampling

Content

Description
Learning Outcomes
Assignments
Contacts
Delivery of the Learning Module
Schedule
Requirements
Resources
How to get help
Folder Structure

Description

This module introduces the skills required to design, implement, and test basic statistical learning methods, including regression, classification, and clustering, and to validate models with resampling techniques. It contrasts modeling for prediction with modeling for inference, and explores the trade-off between prediction accuracy and model interpretability as well as the bias-variance trade-off. Participants also gain exposure to key tools such as pandas, NumPy, and scikit-learn.

Learning Outcomes

By the end of the module, participants will be able to:

Implement and interpret the results from several supervised learning approaches for classification and regression, using libraries like pandas, numpy, and scikit-learn.
Use resampling methods such as cross-validation and bootstrapping to select and evaluate models.
Determine the requirements for reproducible machine learning and ensure consistency across model implementations.
Analyze the uncertainties and limitations associated with model results and understand the ethical implications of applying these models in real-world decision-making.
Effectively explain the trade-offs and considerations for various statistical methods to both technical and non-technical audiences.
Apply pandas, numpy, and scikit-learn for data manipulation, model implementation, and evaluation.
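
As a rough illustration of the resampling and reproducibility outcomes above, the following is a minimal sketch, not course material. It assumes a synthetic dataset, and the column names `x1` and `x2`, the seed value, and the choice of logistic regression are all hypothetical choices made for this example. It runs 5-fold cross-validation with scikit-learn and a simple percentile bootstrap with NumPy, fixing the random seed so repeated runs give the same numbers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # assumed value; any fixed seed makes the run repeatable

# Synthetic data (hypothetical, for illustration only): two features, binary label.
rng = np.random.default_rng(SEED)
X = pd.DataFrame(rng.normal(size=(200, 2)), columns=["x1", "x2"])
noise = rng.normal(scale=0.5, size=200)
y = (X["x1"] + 0.5 * X["x2"] + noise > 0).astype(int)

# Scaling lives inside the pipeline so it is refit on each training fold.
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation: each fold is held out once as a validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=SEED)
cv_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")

# Bootstrap: resample with replacement to gauge uncertainty in the mean of x1.
values = X["x1"].to_numpy()
n_boot = 1000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    sample = rng.choice(values, size=values.size, replace=True)
    boot_means[b] = sample.mean()

# 95% percentile interval from the bootstrap distribution.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% interval for mean(x1): [{ci_low:.3f}, {ci_high:.3f}]")
```

The spread of the fold scores and the width of the bootstrap interval both give a sense of how much a reported result could vary, which connects to the outcome on analyzing uncertainty.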

Assignments

Participants should review the Assignment Submission Guide for instructions on how to complete assignments in this module.

Assignment 1

Assignment 2

Assignment 3

Assignment Due-dates

Assessment	Content	Due Date
Assignment 1	Classification (Sessions 1, 2)	Jan 12
Assignment 2	Regression (Sessions 3, 4)	Jan 19
Assignment 3	Clustering & Resampling (Sessions 5, 6)	Jan 26

Contacts

Questions can be submitted to the #cohort-5-help channel on Slack.

Technical Facilitator: Julia. Questions can be sent via Slack
Learning Support Staff: Kasra. Questions can be sent via Slack
Learning Support Staff: Vishakh. Questions can be sent via Slack
Learning Support Staff: Dmytro. Questions can be sent via Slack

Delivery of the Learning Module

This module will include live learning sessions and optional, asynchronous work periods. During live learning sessions, the Technical Facilitator will introduce and explain key concepts and demonstrate core skills. Learning is facilitated during this time. Before and after each live learning session, the instructional team will be available for questions related to the core concepts of the module.

Optional work periods are for seeking help from peers and the Learning Support team, and for working through the homework and assignments in the learning module with access to live help. Content is not facilitated during work periods; this time should be driven by participants. We encourage participants to come to these work periods with questions and problems to work through.

Participants are encouraged to engage actively during the learning module. The key to developing the core skills in each learning module is practice. The more participants code along with the instructional team and apply the skills in each module, the more likely it is that these skills will solidify.

The technical facilitator will introduce the concepts through a collaborative live coding session using the Python notebooks found under /01_materials/notebooks/. Slides can be found under /01_materials/slides/.

Schedule

Week 1 will focus on intro and classification methods
Week 2 will focus on regression methods
Week 3 will focus on clustering and statistical inference topics

Requirements

Participants are expected to have completed the Shell, Git, and Python learning modules.
Participants are encouraged to ask questions, and collaborate with others to enhance learning.
Participants must have a computer and an internet connection to participate in online activities.
Participants must not use generative AI such as ChatGPT to generate code in order to complete assignments. It should be used only as a supportive tool for seeking out answers to questions participants may have.
We expect participants to have completed the steps in the onboarding repo.
We encourage participants to keep their camera on by default and to turn it off only as needed. This greatly enhances the learning experience for all participants and provides real-time feedback for the instructional team.

Resources

Feel free to use the following as resources:

Documents

Textbook: Data Science: A First Introduction
Introduction to Statistical Learning with Python Documentation (ISLP)

Videos

Introduction to Statistical Learning with Python Video Playlist

Simple Linear Regression

Linear Regression, explained in 2 minutes
Linear Regression, Clearly Explained!!!

Multiple linear regression, interactions, qualitative predictors

Multiple Regression, Clearly Explained!!!

Classification (logistic regression, generative models)

StatQuest: K-nearest neighbors, Clearly Explained

Resampling methods (CV, bootstrap) and Linear model selection and regularization

Machine Learning Fundamentals: Cross Validation
Bootstrapping Main Ideas!!!
Bootstrapping Method
$k$-fold Cross Validation

Alternative Textbook: Data Science: A First Introduction (Chapters 5-10)

How to get help

Folder Structure

.
├── .github
├── .gitignore
├── 01_materials
├── 02_activities
├── 03_instructional_team
├── 04_this_cohort
├── LICENSE
├── README.md
└── steps_to_ask_for_help.png

.github: Contains issue templates and pull request templates for the repository.
01_materials: Module slides (.pdf) and Python notebooks used during learning sessions.
02_activities: Contains assignments.
03_instructional_team: Resources for the instructional team.
04_this_cohort: Additional materials and resources for this cohort.
.gitignore: Files to exclude from this folder, specified by the Technical Facilitator.
LICENSE: The license for this repository.
README.md: This file.
steps_to_ask_for_help.png: Guide on how to ask for help.
