Introduction to Reproducible Science: A 3-day Summer Workshop

  • Author: Nelson Roque, PhD
  • [email protected]
  • Director of the Context Lab at University of Central Florida

Intentions of this Book and Web Course

  • Train the next generation of scientists to work with data, regardless of its type, format, or volume.
  • Make available a set of open-source materials for learning how to practice reproducible science using code-based techniques.
  • House sample workflows and code snippets that support research and data science activities.

Background

A reproducibility crisis (Ioannidis, 2005; Open Science Collaboration, 2015) has emerged as a threat to the scientific enterprise. Over the last decade, I have built proficiency in wrangling and modeling text, image, video, and eye-tracking data and, more recently, sensor data. I look forward to training the next generation of scientists in code-based methods they can apply in their own research.

  • Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. doi:10.1371/journal.pmed.0020124.
  • Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.

Workshop Format

Before the Workshop

  1. Install R - Download R
  2. Install RStudio - Download RStudio
  3. Install packages for various analyses
```
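# Install the packages used across the three workshop days
# (run once in the R console; requires an internet connection)
install.packages(c('tidyverse', 'devtools', 'readr', 'tidytext', 'textdata',
                   'topicmodels', 'wordcloud', 'ggwordcloud'))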
```
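When re-running setup on a machine that already has some of these packages, it can be convenient to install only what is missing. The following is a minimal sketch using base R; the helper name `missing_packages` is hypothetical and not part of the workshop materials.

```r
# Packages listed in the setup step above
workshop_pkgs <- c("tidyverse", "devtools", "readr", "tidytext", "textdata",
                   "topicmodels", "wordcloud", "ggwordcloud")

# Hypothetical helper: return the entries of `pkgs` that are not yet installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(workshop_pkgs)

# Install only in an interactive session (e.g., the RStudio console),
# and only when something is actually missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Calling `missing_packages(workshop_pkgs)` again after installation should return an empty character vector if everything installed successfully.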

Learning Objectives

  • Describe tools and techniques that support open and reproducible science.
  • List and describe the FAIR Principles (https://www.go-fair.org/fair-principles).
  • Develop a code-only pipeline that makes data prep and analyses reproducible.
  • Develop a long-term learning plan for practicing reproducible science tools and techniques.
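To make the idea of a code-only pipeline concrete, here is a minimal sketch in base R. It is an illustration under assumptions, not the workshop's own pipeline: the data frame, the function names `prep_data` and `summarise_scores`, and the file path mentioned in the comment are all hypothetical.

```r
# Stand-in for reading a raw file, e.g. read.csv("data/raw.csv") (hypothetical path)
raw <- data.frame(
  id    = 1:6,
  group = c("a", "a", "b", "b", "b", NA),
  score = c(10, 12, NA, 9, 11, 8)
)

# Step 1: data prep as a function, so no cleaning happens by hand
prep_data <- function(df) {
  # Keep only rows with no missing values in any column
  df[complete.cases(df), ]
}

# Step 2: analysis that takes the prepped data as its only input
summarise_scores <- function(df) {
  # Mean score per group
  aggregate(score ~ group, data = df, FUN = mean)
}

clean   <- prep_data(raw)
results <- summarise_scores(clean)
print(results)

# Record the R version and attached packages alongside the results
print(sessionInfo())
```

Because every step from raw data to results lives in the script, anyone with the same inputs can rerun it and compare their output, which is the core of the reproducibility objective above.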

Workshop Schedule

  • Day 1 | July 6, 2022
    • What is Reproducible Science?
    • Reproducible & FAIR Data Workflows
    • Tools Supporting Reproducible Science
    • Overview of available tools
      • Skill 1: Using EndNote for Reference Management
      • Skill 2: Using Git (and GitHub) for code management and collaboration
    • Orientation to R, RStudio, RMarkdown
      • Skill 3: R syntax primer
    • Data Science: Latest trends
    • Long-term Learning Recommendations
  • Day 2 | July 8, 2022
    • Data wrangling and visualization of Big Data
      • Skill 1: Data wrangling the Google Mobility dataset
    • Reproducible survey research
      • Qualtrics survey design tips
      • Skill 2: Data wrangling Qualtrics data
    • Working with JSON data
      • Skill 3: cleaning and visualizing keystroke JSON data
  • Day 3 | July 11, 2022
    • Text mining
      • Skill 1: word and bigram frequency analysis
      • Skill 2: generating word clouds
      • Skill 3: sentiment analysis
    • Interacting with APIs and JSON data
      • Skill 4: querying an API for results and aggregating the data
    • Closing Discussion & Q/A

Submit your questions

Do you have any questions about the workshop or related content? Submit your questions here

Resources

Books

Cheatsheets

Visualization

Stats

Blogs

Interactive Learning Tools