
UofT-DSI/r


Introduction To R

Welcome! This course, offered through the University of Toronto's Data Sciences Institute (DSI) professional programming, focuses on the tools and skills needed to handle the extensive data generated by advancing information technology. One prominent tool is R, a free, open-source language and environment widely used for data science. The course provides comprehensive coverage of R and data science topics and demonstrates their practical applications using RStudio.

Contents

  1. Description
  2. Learning Outcomes
  3. Course Contact
  4. Delivery Instructions
  5. Course Notes
  6. Materials
  7. Schedule
  8. Marking Scheme

Course Overview

Description

This course is designed as an immersive learning experience. The estimated time commitment is approximately 20 hours of in-class instruction and 8 hours of optional but highly recommended tutorials. In total, learners can expect to dedicate approximately 28 hours over 10 days to complete the course.

The first part of this course teaches R with a focus on manipulating and visualizing data. Learners will set up a functional RStudio workflow, work with different file types, import, transform, and manipulate data tables, use functions and loops, create data visualizations, and learn how to troubleshoot problems in their code. Both base R and tidyverse methods are taught. To work reproducibly, learners will create R Projects. The second part of the course covers the ethics of consent, Equity, Diversity & Inclusion (EDI) training, and professional skills including presentation, project management, and data security. The course concludes with an industry case study.

This course is designed for learners with a degree outside Computer Science or Statistics who want to strengthen their data science skills for their careers.

Learning Outcomes

By the end of this course, learners will be able to:

  1. Use R proficiently: understand the various options for working with R, the purpose of RStudio and R Projects, best coding practices in R, and the position of R among other data science tools.

  2. Apply manipulation and wrangling techniques to describe and define the characteristics of datasets. Learners will become proficient in accessing built-in R datasets and importing external datasets into R, identifying and describing data structures, reshaping datasets through manipulation techniques, detecting missing values, cleaning data, summarizing data, exporting data, and reporting findings. Learners will also be able to produce a wide range of informative and visually appealing charts, graphs, and plots to effectively communicate insights and patterns hidden within the data.

  3. Develop the skills and strategies necessary to diagnose and fix errors in R. Learners will gain proficiency in interpreting error messages, identifying common coding mistakes, and utilizing debugging tools and techniques effectively. Learners will be able to apply systematic approaches to troubleshoot and resolve errors in their R code.

  4. Demonstrate a comprehensive understanding of the concept of consent in data-based studies. Learners will be able to identify the ethical considerations and legal requirements related to obtaining informed consent from participants, and apply appropriate strategies to ensure ethical data usage.

  5. Acquire the skills and knowledge required to create compelling presentations and effectively manage projects using R. Learners will demonstrate proficiency in utilizing R's visualization and presentation packages to generate visually appealing and informative slides, charts, and graphs. They will also gain proficiency in project management techniques specific to R, including organizing and structuring R code, as well as effectively documenting code.

Logistical Information

Course Contact

  • The instructor for this course is Julia Gallucci (she/her), PhD student. Email the instructor at [email protected]. Emails must use the subject-line prefix DSI-IntroR (e.g., DSI-IntroR: Inquiry about Lecture I). Response time: 48 hours on weekdays, 48-72 hours on weekends.
  • Course Support for this course is Jessie Wang (she/her), PhD student. Email the course support at [email protected]. Emails must use the subject-line prefix DSI-IntroR (e.g., DSI-IntroR: Inquiry about Lecture I). Response time: 48 hours on weekdays, 48-72 hours on weekends.

Delivery Instructions

Classes

  • The course spans 10 days, with classes held from December 11 to 20, 2023. Classes run 6 PM - 8:30 PM EST Monday to Thursday and 9 AM - 11:30 AM EST on Saturdays. The course is online and synchronous, conducted through Zoom (the meeting ID and passcode are provided in the email with the subject 'Data Sciences Institute, UofT – Welcome & Pre-Class info'). To reduce online fatigue, each class includes one or two brief breaks. If you have any difficulty joining a live lecture, email the Instructor with a description of the issue and the time and date it occurred (including a screenshot if available) so that your participation marks are not affected. If, due to unforeseen circumstances, a live (synchronous) lecture is disrupted or cannot be held, the instructor will upload a recording and notify learners via email. Learners are responsible for viewing the recording.

Tutorials

  • Tutorial sessions are held on the same dates as the regular classes: Monday to Thursday from 5:30 PM to 6 PM EST and from 8:30 PM to 9 PM EST, and Saturdays from 8:30 AM to 9 AM EST and from 11:30 AM to 12 PM EST. Attendance is optional and the structure is flexible. Tutorials are an opportunity to seek clarification on software-related questions, homework, and assignments. The tutorial sessions are led by the Course Support.

Course Notes

  • All course material is available via the IntroductionToR GitHub repository. The folder structure is as follows:

    • Assessments:
      This folder contains assessment files for learners.
    • Lessons-AllFiles:
      This folder contains all files (R Markdown, slide HTML, slide PDFs, images, data, etc.) and is designed for the instructor.
    • Lessons-Data:
      This folder contains data only and is designed for the learners. Learners should download and copy this folder as 'data' folder within their R Project.
    • Lessons-PDF:
      This folder contains slide PDFs only and is designed for the learners. Learners should download the slides and use them to prepare before class or to review afterwards; class time will be mostly live coding. Each slide deck ends with homework for that lesson. Learners are strongly encouraged to attempt the homework and to attend tutorial sessions for help.
    • Lessons-Rscripts:
      This folder contains R scripts used by the instructor. It will be updated after each class and learners may download it for reference.
    • Teaching-Notes:
      This folder contains lesson plans only and is designed to guide the instructor.
    • README: README file.
    • .gitignore: List of files, specified by the instructor, that Git should ignore.

Materials

  • Learners must have an internet connection, a computer with administrative privileges, a microphone, and all required software installed in order to participate in online activities.
  • Learners must have R installed (http://www.r-project.org/). We will help with downloading.
  • Learners must have RStudio installed (previously: http://www.rstudio.com/; now: https://posit.co/download/rstudio-desktop/). We will help with downloading.
  • Learners must have a GitHub account (https://github.com/).
  • Screen space can be a limitation during online learning, since you'll want to see the instructor's screen while keeping RStudio open to type along. If you have a second monitor or a larger tablet for following the course while keeping your laptop screen free for coding, that would be great! If not, don't worry; we'll manage!
  • Key texts: General reference
  • Key texts: For specific topics
    • Alexander, 2022. Telling Stories with Data. CRC Press. https://www.tellingstorieswithdata.com/
    • de Graaf, 2019. Managing Your Data Science Projects: Learn Salesmanship, Presentation, and Maintenance of Completed Models. Apress.
    • Healy, 2018. Data Visualization: A Practical Introduction. Princeton University Press.
    • Timbers et al., 2021. Data Science: A First Introduction. https://ubc-dsci.github.io/introduction-to-datascience/
    • Wickham, 2021. Mastering Shiny. O'Reilly. https://mastering-shiny.org/
    • Wiley, Matt, and Wiley, Joshua F., 2020. Advanced R 4 Data Programming and the Cloud: Using PostgreSQL, AWS, and Shiny. Apress.

Schedule

The schedule may be modified as needed; learners will be informed of any changes. The course will be taught using R version 4.2.1 and RStudio Desktop version 2022.06.23.

Class topics

  • Class 0 (before Class 1): Getting set up! (R/RStudio installation). Slides: Instructions
  • Class 1 (Monday 11 December, 6 PM - 8:30 PM EST): Hello World! and Work Practices (R basics; file types; errors). Slides: 0- Introduction; 1- Hello World; 2- Work Practices
  • Class 2 (Tuesday 12 December, 6 PM - 8:30 PM EST): Data in R (tibbles, strings, factors, times, missing values). Slides: 3- Data in R
  • Class 3 (Wednesday 13 December, 6 PM - 8:30 PM EST): Manipulation (filtering, arranging, selecting, mutating, piping, grouping, summarizing). Slides: 4-Manipulating
  • Class 4 (Thursday 14 December, 6 PM - 8:30 PM EST): Wrangling (importing data, pivoting, joining data, data tables). Slides: 5-Wrangling
  • Class 5 (Saturday 16 December, 9 AM - 11:30 AM EST): Programming (custom functions, loops, logic statements, purrr, simulations). Slides: 6-Programming; 6-Extra
  • Class 6 (Monday 18 December, 6 PM - 8:30 PM EST): Visualization (initialization, choosing chart types, ggplot, customizing). Slides: 7-Visualization
  • Class 7 (Tuesday 19 December, 6 PM - 8:30 PM EST): Shiny, ethics, inequity and professional skills. Slides: 8-Shiny; 9-Ethics; 10-Inequity; 11-Professional Skills
  • Class 8 (Wednesday 20 December, 6 PM - 8:30 PM EST): Industry case study (Kevin Ha). Slides: none

In-class code & summaries

  • Class 1 (Monday 11 December): in-class coding examples; summary sheet
  • Class 2 (Tuesday 12 December): in-class coding examples; summary sheet
  • Class 3 (Wednesday 13 December): in-class coding examples; summary sheet
  • Class 4 (Thursday 14 December): in-class coding examples; summary sheet
  • Class 5 (Saturday 16 December): in-class coding examples; summary sheet
  • Class 6 (Monday 18 December): in-class coding examples; summary sheet
  • Class 7 (Tuesday 19 December): in-class coding examples; in-class app example 1; in-class app example 2; summary sheet
  • Class 8 (Wednesday 20 December): no code or summary

Marking Scheme

Grading is Pass/Fail, based on learners' demonstration of the learning outcomes. This is assessed through two components: two assignments and class participation.

Assignments

Assignments will be introduced in class and can be discussed in tutorials; questions can also be sent to the Instructor or Course Support by email. Assignments are due by midnight (11:59 PM EST). Please arrange any extensions in advance with the Instructor or Course Support. Submit each assignment via its Google Form as a PDF knitted from R Markdown, titled DSI-IntroR: Assignment X, Name. The assignments are located in the Assessments folder and are also listed below. For each assignment you will find a .pdf file (the knitted R Markdown file) for convenient reading and an .Rmd file that can be modified and submitted. To download a file, click "Raw" and select "Save as." Assignments are graded Pass/Fail based on learners' demonstration of the learning outcomes (see each assignment's grading rubric for details).

Assignment Due Dates

  • Assignment 1 (Rmd modifiable template): due Sunday 17 December, by 11:59 PM EST. Submission link: https://forms.gle/ssLQEDirnKEQZuSv7
  • Assignment 2 (Rmd modifiable template): due Friday 22 December, by 11:59 PM EST. Submission link: https://forms.gle/UheooTnEzUPRZAQY9

Acknowledgements

  • Slides are adapted from Anjali Silva, originally from Amy Farrow under the supervision of Rohan Alexander, University of Toronto. Slides have been created and modified by Julia Gallucci for Summer 2023.

  • We wish to acknowledge this land on which the University of Toronto operates. For thousands of years it has been the traditional land of the Huron-Wendat, the Seneca, and most recently, the Mississaugas of the Credit River. Today, this meeting place is still the home to many Indigenous people from across Turtle Island and we are grateful to have the opportunity to work on this land.

Maintainer

Contributions

  • IntroductionToR welcomes issues, enhancement requests, and other contributions. To submit an issue, use the repository's GitHub Issues page.