- Author: Nelson Roque, PhD
- Director of the Context Lab at University of Central Florida
- Train the next generation of scientists to work with data, regardless of type, format, or volume.
- Make available a set of open-source materials for learning how to practice reproducible science using code-based techniques.
- This repository is intended to house sample workflows and code snippets that support research and data science activities.
A reproducibility crisis (Ioannidis, 2005; Open Science Collaboration, 2015) has emerged as a threat to the scientific enterprise. Over the last decade, I have worked to become proficient in data wrangling and modeling of text, image, video, and eye-tracking data and, more recently, sensor data. I look forward to training the next generation of scientists in code-based methods they can apply in their research.
- Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. doi:10.1371/journal.pmed.0020124.
- Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.
- Location: In person at UCF, in PSY301Q
- Format: Live, with recordings available for later viewing.
- Register: Registration open until July 6th - click here to register
- Materials:
  - Data: see the `data` folder of this repository.
  - Slides: see the `slides` folder of this repository.
  - Code: see the `scripts` folder of this repository.
  - Textbook: Click here for a web-based textbook to accompany this workshop
- Install R - Download R
- Install RStudio - Download RStudio
- Install packages for various analyses
```
install.packages(c('tidyverse', 'devtools', 'readr', 'tidytext', 'textdata',
'topicmodels', 'wordcloud', 'ggwordcloud'))
```
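
Re-running the command above reinstalls every package each time. A minimal sketch, assuming you only want to install what is not already present, is shown here; `missing_packages` is a hypothetical helper name used for illustration, not a function from any package.

```r
# Packages listed in the install step above
pkgs <- c("tidyverse", "devtools", "readr", "tidytext", "textdata",
          "topicmodels", "wordcloud", "ggwordcloud")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# `installed` defaults to the names of all packages in the local library.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(pkgs)

# Uncomment to install only the missing packages:
# if (length(to_install) > 0) install.packages(to_install)
```

Keeping the check separate from the install call lets you inspect `to_install` before anything is downloaded.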
- Describe various tools and techniques supportive of open and reproducible science.
- List and describe the FAIR Principles (https://www.go-fair.org/fair-principles)
- Develop a code-only pipeline to allow reproducibility of data prep and analyses.
- Develop a long-term learning plan for practicing reproducible science tools and techniques.
- Day 1 | July 6th, 2022
  - What is Reproducible Science?
  - Reproducible & FAIR Data Workflows
  - Tools Supporting Reproducible Science
    - Overview of available tools
    - Skill 1: Using EndNote for reference management
    - Skill 2: Using Git (and GitHub) for code management and collaboration
  - Orientation to R, RStudio, R Markdown
    - Skill 3: R syntax primer
  - Data Science: Latest trends
  - Long-term Learning Recommendations
- Day 2 | July 8th, 2022
  - Data wrangling and visualization of Big Data
    - Skill 1: Data wrangling the Google Mobility dataset
  - Reproducible survey research
    - Qualtrics survey design tips
    - Skill 2: Data wrangling Qualtrics data
  - Working with JSON data
    - Skill 3: Cleaning and visualizing keystroke JSON data
- Day 3 | July 11th, 2022
  - Text mining
    - Skill 1: Word and bigram frequency analysis
    - Skill 2: Generating wordclouds
    - Skill 3: Sentiment analysis
  - Interacting with APIs and JSON data
    - Skill 4: Querying an API for results and data aggregation
  - Closing Discussion & Q/A
Do you have any questions about the workshop or related content? Submit your questions here
- R for Data Science
- Advanced R
- R Graphics Cookbook
- Text Mining with R
- Reproducible Analyses with R
- Featured Bookdown books