Foundational Technical Skills I

All participants in the Data Science and Machine Learning Software Foundations Certificates are required to complete two Foundational Skills modules. Foundational Technical Skills I will teach the following topics:

Introduction to Unix Shell, Git, and GitHub

This module topic provides a comprehensive introduction to the Unix shell, covering file and directory navigation, command usage, script creation, and basic functions involving pipes, filters, and loops. Participants will get started with version control and GitHub, and explore the ethical implications of reproducibility. Subtopics include Git setup, repository management (recording, viewing, and undoing changes), branch creation, and collaborative workflows. Advanced commands, debugging, and history editing will also be introduced. Participants will learn effective problem-solving techniques using Google and Stack Overflow, with an emphasis on reproducibility and documentation. This module topic emphasizes ethics and equity considerations in projects and fosters discussion-based learning through pre-class readings and live coding exercises.

Learning Outcomes:

  • Develop the ability to work comfortably in the terminal and write scripts proficiently using basic commands, variables, pipes, filters, and loops.
  • Understand how to use version control systems effectively to preserve personal work, access and edit previous versions of code, collaborate with peers, and identify and debug errors in code.
  • Develop the skills to troubleshoot issues independently by identifying problems, conducting research, and formulating questions that incorporate the components of reproducibility.
  • Identify ethical considerations within the field, including scrutinizing the composition of datasets for biases and considering the historical context of abuses of power.

Module Delivery: Technical Facilitator-led live webinars.

Introduction to Python

This module topic will focus on the essentials of coding in Python and the ethical considerations of using algorithms. Participants will learn how to design functions, repeat code using loops, store data in lists, test and debug code, and manipulate data using data analysis and visualization tools such as numpy, pandas, matplotlib, seaborn, and plotly. Participants will also take part in a facilitated discussion about the Tuskegee experiment, its long-term effects, and the trustworthiness of AI applications in disparate social systems.

Learning Outcomes:

  • Understand various Python data types and their role in coding.
  • Implement the Function Design Recipe to create functions in Python and reduce duplication.
  • Utilize numpy and pandas to analyze a dataset and manipulate numerical and tabular data in Python.
  • Interact with databases using Python, and visualize data with libraries such as matplotlib, seaborn, and plotly.
  • Learn debugging and testing techniques to troubleshoot errors and ensure code correctness.
  • Understand the ethical issues surrounding software, and be prepared to answer technical job interview questions confidently.

Module Delivery: Technical Facilitator-led live webinars.

Building Research Software

Much research today is done using software, so researchers need to be comfortable building, maintaining, and improving high-quality software. This module topic equips participants with the skills to build robust software that can be used to answer research questions. It focuses on how to write short programs effectively, as part of a small team, in a reproducible way. Research software that is built well can be reused by other teams, not just by the researcher who originally wrote it.

Learning Outcomes:

  • Learn how to work as a team within a Git/GitHub setting, including branching, merging, conflicts, and pull requests.
  • Know how to create bug reports and prioritize requests.
  • Develop comfort with using makefiles and configuring programs.
  • Proficiently test software, handle errors, and track provenance.
  • Know how to create Python packages.
  • Acquire comfort in calling APIs.
  • Develop comfort with Docker.

Module Delivery: Technical Facilitator-led live webinars.

Foundational Technical Skills II

All participants in the Data Science and Machine Learning Software Foundations Certificates are required to complete two Foundational Skills modules. Foundational Technical Skills II will teach the following topics:

SQL

SQL is used throughout the machine learning pipeline and is a fundamental skill for data scientists to master. This module topic will focus on the technical skills needed to work with SQL, including ingesting flat-file datasets (JSON, CSV), designing queries, and managing relational databases. It will also examine common data management concerns, data access management, and compliance with data privacy requirements. Learners will be introduced to principles of reproducibility, data sharing, and data ethics (for example, respecting the people whose data we use). This module topic will also cover professional skills such as communication with a variety of stakeholders, and documentation.

Learning Outcomes:

  • Develop a better understanding of the structure of databases.
  • Save and transport data in CSV and JSON file formats.
  • Become familiar with querying and manipulating data in SQL.
  • Become familiar with the legal framework around sharing data.
  • Analyze data requirements and work with different stakeholders such as analysts and managers.

Module Delivery: Technical Facilitator-led live webinars.

Estimation, Machine Learning, and Testing

This module topic provides the skills required to design, implement, test, and validate a variety of supervised learning models. It will cover the basics of statistical learning, including modelling for prediction versus inference, the trade-off between prediction accuracy and model interpretability, and the all-important bias-variance trade-off. Each section of this module topic addresses a distinct set of supervised learning methods, applied to real data sets.

Learning Outcomes:

  • Understand, implement, and interpret the results from several supervised learning approaches for regression and classification.
  • Utilize resampling methods to extract more information from a data set and to choose the best model.
  • Perform exploratory data analysis for unsupervised learning.
  • Understand what is required for reproducible learning.
  • Appreciate the uncertainties associated with model results and the ethical consequences of acting on these results.

Module Delivery: Technical Facilitator-led live webinars.

Production

Building a model differs significantly from creating a model that others can use. This module topic focuses on everything that happens after the model has been built, including machine learning system requirements (reliability, scalability, maintainability, and adaptability), feature engineering, model development and deployment, monitoring, and infrastructure and tooling.

Learning Outcomes:

  • Design machine learning systems that are reliable, scalable, maintainable, and adaptable.
  • Apply feature engineering techniques to optimize machine learning models.
  • Deploy machine learning models in production using various strategies.
  • Implement monitoring and alerting systems to troubleshoot and diagnose issues in production.

Module Delivery: Technical Facilitator-led live webinars.

Data Science Skills (following completion of foundational modules)

Participants pursuing the Data Science Certificate will complete the Data Science Skills module. Data Science Skills will teach the following two topics:

Sampling

This module topic will introduce the fundamentals of sampling, probability, and survey methodology. Subtopics include simple probability samples, stratified sampling, cluster sampling, addressing non-response, estimation, and survey quality. Participants will consider the theoretical foundations of different sampling approaches, as well as practical applications of this knowledge in contexts such as market research, political polling, and the Canadian census.

Learning Outcomes:

  • Gain proficiency in drawing simple probability samples.
  • Understand complex sampling procedures and the tradeoffs involved.
  • Acquire the skills to identify and address sources of error or inaccuracies in data stemming from sampling strategies.
  • Develop an intuition around survey quality.

Module Delivery: Technical Facilitator-led live webinars.

Visualization

Regardless of the quality of your analyses and data-related findings, their impact will be severely limited if you cannot communicate them effectively. The technical portion of this module topic offers a step-by-step walkthrough of choosing, creating, and modifying data visualizations using ggplot. Discussions will include general design principles that apply to other data visualization tools used in industry and academia (e.g., Python, Tableau, Power BI).

Incorporating case studies and real-world examples, the ethical components of this module topic will include:

  • ensuring reproducibility with data visualization
  • building awareness of the decision-making that goes into sharing data visually
  • addressing inequity in data visualization by focusing on accessible design

Learning Outcomes:

  • Acquire the skills to create and customize data visualizations from start to finish.
  • Gain insight into the general design principles for creating accessible and equitable data visualizations.
  • Develop an understanding of data visualization as a purposeful act of storytelling, and of its ethical and professional implications.

Module Delivery: Technical Facilitator-led live webinars.

Machine Learning Skills (following completion of foundational modules)

Participants pursuing the Machine Learning Software Foundations Certificate will complete the Machine Learning Skills module. Machine Learning Skills will teach the following two topics:

Algorithms & Data Structures

There is often a need to work out which algorithm or data structure to use in a given practical situation. This module topic develops comfort with these choices through Big-O notation, recursive functions, and a survey of common data structures.

Learning Outcomes:

  • Assess options and choices around fundamental algorithms and data structures using Big-O notation.
  • Develop comfort with recursive functions.
  • Identify appropriate data structures.
  • Transform a client-led problem into an optimization challenge and identify opportunities for improvement.
  • Identify causes for slow-running code and implement strategies to optimize performance.

Module Delivery: Technical Facilitator-led live webinars, each lasting 2.5 hours for a total of ~25 hours.

Deep Learning Foundations

This module topic builds on the statistical foundation provided in Estimation, Machine Learning, and Testing, adding theory around linear methods and classification. It will also focus on model assessment, inference, and boosting, and lays the foundation for deep learning with neural networks and related approaches.

Learning Outcomes:

  • Apply advanced linear methods such as Lasso and Ridge regression for feature selection and regularization, and understand their theoretical underpinnings.
  • Evaluate machine learning models with techniques such as hypothesis testing and confidence intervals, and interpret the results in the context of the problem domain.
  • Apply boosting algorithms such as XGBoost and LightGBM to improve the performance of machine learning models on large and complex datasets.
  • Implement neural network architectures such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) in Python and/or R, and understand how to tune their hyperparameters for optimal performance.
  • Discuss ethical considerations in machine learning, such as fairness, accountability, and transparency, and identify potential biases and issues that may arise in the development and deployment of machine learning models.

Module Delivery: Technical Facilitator-led live webinars.