diff --git a/content/Concepts.md b/content/Concepts.md index 34a0604..2e01893 100644 --- a/content/Concepts.md +++ b/content/Concepts.md @@ -1,9 +1,9 @@ --- -layout: default -title: Concepts & Syntax -nav_order: 7 -parent: Workshop Content -has_toc: false +layout: default +title: Concepts & Syntax +nav_order: 7 +parent: Workshop Content +has_toc: false --- # **Concepts and Basic Syntax** @@ -13,49 +13,49 @@ has_toc: false In R, an operator is a special symbol or keyword used to perform operations on one or more values. There are several types of operators in R, including: {: .list-with-space} -* **Assignment Operators**: These operators are used to assign a value to a variable. +* **Assignment Operators**: These operators are used to assign a value to a variable. Examples include `<-` and `=`. -* **Arithmetic Operators**: These operators are used to perform basic mathematical operations such as addition, subtraction, multiplication, and division. Examples include - - `+` for addition, - - `-` for subtraction, - - `*` for multiplication, - - `/` for division, +* **Arithmetic Operators**: These operators are used to perform basic mathematical operations such as addition, subtraction, multiplication, and division. Examples include + - `+` for addition, + - `-` for subtraction, + - `*` for multiplication, + - `/` for division, - and `^` for power. Type `?Arithmetic` to read the R document in the help tab. -* **Logical Operators**: These operators are used to perform logical operations on values or variables, and return a logical value of TRUE or FALSE. Examples include - - `&` for logical AND, - - `|` for logical OR, - - and `!` for logical NOT. +* **Logical Operators**: These operators are used to perform logical operations on values or variables, and return a logical value of TRUE or FALSE. Examples include + - `&` for logical AND, + - `|` for logical OR, + - and `!` for logical NOT. Type `?base::Logic` to read the R document in the help tab. 
-* **Comparison Operators**: These operators are used to compare two values or variables and return a logical value of TRUE or FALSE. Examples include - - `<` for less than, - - `>` for greater than, - - `==` for equal to, - - and `!=` for not equal to. +* **Comparison Operators**: These operators are used to compare two values or variables and return a logical value of TRUE or FALSE. Examples include + - `<` for less than, + - `>` for greater than, + - `==` for equal to, + - and `!=` for not equal to. Type `?Comparison` to read the R document in the help tab. -* **Miscellaneous Operators**: These include - * the hash sign `#` indicates a comment in the code, - * the colon sign `:` creates a sequence of numbers, - * the square bracket `[]` indexes an object such as a vector, - * the dollar sign `$` accesses a variable from a data frame, - * the percent sign `%` is used for special operators like modulo, - * and the double colon `::` used to access functions or variables from a specific package. - +* **Miscellaneous Operators**: These include + * the hash sign `#` indicates a comment in the code, + * the colon sign `:` creates a sequence of numbers, + * the square bracket `[]` indexes an object such as a vector, + * the dollar sign `$` accesses a variable from a data frame, + * the percent sign `%` is used for special operators like modulo, + * and the double colon `::` used to access functions or variables from a specific package. + Understanding how to use these different types of operators is important for writing efficient and effective code in R. ### Practice 1 -Compare the magnitude of the following numbers: -* the sum of all integers from 1 to 100, -* 10 to the power of 11, -* 11 to the power of 10. +Compare the magnitude of the following numbers: +* the sum of all integers from 1 to 100, +* 10 to the power of 11, +* 11 to the power of 10.
Solution @@ -70,7 +70,7 @@ a < c
## 2. Functions and Packages ### Function -When we calculate the sum of all the integers from 1 to 100, we use a formula to help simplify the calculation. What if we cannot memorize the formula? Does it mean that we have to sum those integers by brute force, typing 1 + 2 + 3 + ... all the way up to 100? We can use a built-in function `sum()` with the colon operator `:`. +When we calculate the sum of all the integers from 1 to 100, we use a formula to help simplify the calculation. What if we cannot memorize the formula? Does it mean that we have to sum those integers by brute force, typing 1 + 2 + 3 + ... all the way up to 100? We can use a built-in function `sum()` with the colon operator `:`. Input {: .label .label-green} @@ -78,23 +78,23 @@ Input sum(1:100) ``` -In R, a **function** is a block of code that performs a specific task and can be called or executed by the user. Functions are an essential part of R programming because they allow you to automate tasks and reuse code. As an R user, you can rely on functions heavily. +In R, a **function** is a block of code that performs a specific task and can be called or executed by the user. Functions are an essential part of R programming because they allow you to automate tasks and reuse code. As an R user, you can rely on functions heavily. -When R is installed, it comes with a default package named `base`, which contains some **built-in functions** that you can use. For example, +When R is installed, it comes with a default package named `base`, which contains some **built-in functions** that you can use. 
For example, -| `mean()` | Calculates the arithmetic mean of a vector of numbers -| `sd()` | Calculates the standard deviation of a vector of numbers -| `str()` | Displays the structure of an R object -| `table()` | Creates a frequency table of a vector or factor +| `mean()` | Calculates the arithmetic mean of a vector of numbers +| `sd()` | Calculates the standard deviation of a vector of numbers +| `str()` | Displays the structure of an R object +| `table()` | Creates a frequency table of a vector or factor | `plot()` | Creates a basic plot of data ### Package -As introduced earlier, R is powerful partially because it is extensible by installing additional packages quickly. In R, a **package** is a collection of functions, data, and documentation. +As introduced earlier, R is powerful partially because it is extensible by installing additional packages quickly. In R, a **package** is a collection of functions, data, and documentation. Packages can be installed using the `install.packages()` function, which downloads the package from a repository and installs it on your system. Once a package is installed, its functions and data can be loaded into the R environment using the `library()` function. You do not need to install a package repeatedly, but you do need to load it again with `library()` every time you restart RStudio. If you want to use a function from a package without loading it, you can use the double colon operator, for example, `dplyr::filter()`, where `dplyr` is the package name, and `filter()` is a function from the package. ### Practice 2 -The `tidyverse` is a collection of popular R packages for data manipulation and visualization. Use the following commands to install and load the `tidyverse` package: +The `tidyverse` is a collection of popular R packages for data manipulation and visualization. 
Use the following commands to install and load the `tidyverse` package: Input {: .label .label-green} @@ -122,14 +122,14 @@ library(tidyverse) # Load package ## 3. Data Type, Vector, and Data Frame -### Data Type +### Data Type In R, there are several **data types** that are commonly used, including: * Numeric: Represents numbers with decimal points. Can be either integers (e.g., 3, 7) or double/floating-point numbers (e.g., 3.14, 2.5e-3). * Character: Represents text strings. Enclosed in quotes (e.g., "hello", 'world'). * Factor: Represents categorical data with a fixed set of possible values or levels (e.g., "male", "female"). * Logical: Represents Boolean values (TRUE or FALSE). The missing-value marker NA is also logical by default. -The `typeof()` function can help check the data type of an object. For example, `a`, `b`, and `c` we created in practice 1 are all of the double numeric type. +The `typeof()` function can help check the data type of an object. For example, `a`, `b`, and `c` we created in practice 1 are all of the double numeric type. Input {: .label .label-green} @@ -137,9 +137,9 @@ Input typeof(a); typeof(b); typeof(c) # check for data type, the semicolon ";" separates multiple statements on the same line of code ``` -Each data type has its own properties and functions that can be used to manipulate and analyze data. Data type is an important concept in R because it affects how the data is stored, processed, and analyzed. +Each data type has its own properties and functions that can be used to manipulate and analyze data. Data type is an important concept in R because it affects how the data is stored, processed, and analyzed. 
Choosing the appropriate data type can help you optimize your memory usage, perform the necessary data manipulations, conduct the appropriate statistical analyses, and create effective visualizations of your data. -### Vector +### Vector A **vector** is a basic data structure that represents a sequence of values of *the same data type*. A vector can be created using the `c()` function, which combines values into a vector. For example, to create a vector based on `a`, `b` and `c`, you can use the following code: @@ -155,15 +155,15 @@ To access an element in a vector, you can use the operator `[]` and a number ins A **data frame** is a two-dimensional tabular data structure that represents a rectangular grid of data, where each row represents an observation and each column represents a variable. Essentially, a data frame is several equal-length vectors - one for each column. The data in each column must be of the same type, while the data in each row can be different types. -R comes with several **built-in data frames**. These data sets can be useful for learning and practicing data manipulation, analysis, and visualization techniques. To name a few, +R comes with several **built-in data frames**. These data sets can be useful for learning and practicing data manipulation, analysis, and visualization techniques. To name a few, -| `iris` | A data frame containing measurements of the length and width of petals and sepals for three species of iris flowers (setosa, versicolor, and virginica). -| `mtcars` | A data frame containing information about 32 automobiles, including miles per gallon (mpg), horsepower (hp), and other variables related to performance and design. -| `airquality` | A data frame containing daily measurements of air quality in New York City from May to September 1973, including ozone, solar radiation, wind speed, and temperature. 
-| `ChickWeight` | A data frame containing the weights of chicks over time, along with an identifier for each chick and the diet it received. -| `CO2` | A data frame containing measurements of carbon dioxide uptake in grass plants, along with each plant's origin, its chilling treatment, and the ambient carbon dioxide concentration. +| `iris` | A data frame containing measurements of the length and width of petals and sepals for three species of iris flowers (setosa, versicolor, and virginica). +| `mtcars` | A data frame containing information about 32 automobiles, including miles per gallon (mpg), horsepower (hp), and other variables related to performance and design. +| `airquality` | A data frame containing daily measurements of air quality in New York City from May to September 1973, including ozone, solar radiation, wind speed, and temperature. +| `ChickWeight` | A data frame containing the weights of chicks over time, along with an identifier for each chick and the diet it received. +| `CO2` | A data frame containing measurements of carbon dioxide uptake in grass plants, along with each plant's origin, its chilling treatment, and the ambient carbon dioxide concentration. -These data sets can be accessed by name in R and can be loaded into memory using the `data()` function. For example, to load the `iris` data frame into memory, you can use `data(iris)`. +These data sets can be accessed by name in R and can be loaded into memory using the `data()` function. For example, to load the `iris` data frame into memory, you can use `data(iris)`. To access an element in a data frame, you can use the operator `[]` and two numbers inside it indicating the row and column position of the element. For example, to access the element in the second row and fourth column in `iris`, the syntax is `iris[2, 4]`. 
You can also access a variable in a data frame with the `$` operator. For example, `iris$Sepal.Length` returns the `Sepal.Length` variable from the `iris` dataset. @@ -180,29 +180,29 @@ You can import **foreign data** into R as well. The beginners-friendly way is to The following screenshots show how to download and import the 2016 and 2021 Census data about the race and gender of judges in Canada [https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/PG2NB4](https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/PG2NB4). The dataset is retrieved from [Abacus](https://abacus.library.ubc.ca/). Try following the steps to import the data into your RStudio.

- -Figure 4. Download Data from Abacus

+ +Figure 4. Download Data from Abacus

- -Figure 5. Find the Downloaded File and Copy Download Link

+ +Figure 5. Find the Downloaded File and Copy Download Link

- -Figure 6. Import Dataset through Environment

- + +Figure 6. Import Dataset through Environment

+

- -Figure 7. Import Dataset Option 1 Paste URL

+ +Figure 7. Import Dataset Option 1 Paste URL

- -Figure 8. Import Dataset Option 2 Browse File

+ +Figure 8. Import Dataset Option 2 Browse File

## 4. Working Directory -I copied the following code from the Code Preview section on the bottom right corner in Figure 7 when importing data by URL. +I copied the following code from the Code Preview section on the bottom right corner in Figure 7 when importing data by URL. Input {: .label .label-green} @@ -215,14 +215,14 @@ View(X104526_gbrecs_true) Using this code, others can easily import the target data from the URL, which is much easier than following the series of screenshots in practice 3. This small step also contributes to research transparency and reproducibility. Let's paste this code into a script and save it for future use.

- -Figure 9. Save a Script

+ +Figure 9. Save a Script

When you click the save button, a pop-up window will ask you to specify where to save the script. If we have several files to save, such as plots and datasets, we can keep this window from popping up repeatedly by setting the working directory. The **working directory** is a file path on your computer that sets the default location of any files you read into R, or save out of R. To set the working directory, you can go to the toolbar or use code.

- -Figure 10. Set Working Directory

+ +Figure 10. Set Working Directory

Input {: .label .label-green} @@ -231,7 +231,6 @@ getwd() # Get working directory setwd() # Set working directory ``` - -This page is meant to introduce some core concepts and basic syntax in R. -What questions do you have about the terminology and syntax? Now is a good time for you to share your questions, thoughts and comments. -{: .note} +

+This page is meant to introduce some core concepts and basic syntax in R. +What questions do you have about the terminology and syntax? Now is a good time for you to share your questions, thoughts and comments. diff --git a/content/R.md b/content/R.md index 6c29bad..2b51399 100644 --- a/content/R.md +++ b/content/R.md @@ -1,14 +1,14 @@ --- -layout: default -title: R -nav_order: 5 -parent: Workshop Content -has_toc: false +layout: default +title: R +nav_order: 5 +parent: Workshop Content +has_toc: false --- -# **[What is R?](https://www.r-project.org/about.html)** -**Is R the right tool for your data analysis needs?** +# **[What is R?](https://www.r-project.org/about.html)** +**Is R the right tool for your data analysis needs?** ## 1. A software for statistical computing and graphics @@ -21,7 +21,7 @@ R vs other statistical software such as SPSS, SAS, STATA, Mplus, HLM etc. : * Flexibility: R is highly customizable. You can customize plots, functions or even develop packages for your own needs.
(Optional: [An interesting metaphor](https://rstudio-education.github.io/hopr/preface.html#:~:text=Busses%20are%20very,SPSS.%20%2D%20Greg%20Snow))
* Reproducibility: R code for data manipulation and analysis can be written and saved in scripts, which can be run anytime to reproduce the results given the raw data and scripts.
-(Optional: [A reproducible example](https://journal.r-project.org/articles/RJ-2022-021/#example-gb-rainfall-paper) +(Optional: [A reproducible example](https://journal.r-project.org/articles/RJ-2022-021/#example-gb-rainfall-paper) [CRAN Task View: Reproducible Research](https://cran.r-project.org/web/views/ReproducibleResearch.html))
* Popularity: R has a large and active user community, which provides a wealth of resources and support.
(Optional: [Getting Help with R](https://support.posit.co/hc/en-us/articles/200552336-Getting-Help-with-R))
@@ -30,10 +30,10 @@ R vs other statistical software such as SPSS, SAS, STATA, Mplus, HLM etc. : ### Cons * User-friendliness: Other statistical software like SPSS is generally considered to be more user-friendly and easier to learn than R, particularly for users with no programming experience.
* Authority: A poorly written or unreliable R package is risky, leading to errors or incorrect results. When you are unsure whether a package is trustworthy, it is recommended to check for its popularity, author reputation, and whether it is well-documented, peer-reviewed and actively maintained.
-(Optional: +(Optional: [Packages grouped by subject area on CRAN](https://cran.r-project.org/web/views/) -[Check how many downloads a CRAN package has?](https://stackoverflow.com/questions/40835078/check-how-many-downloads-a-cran-package-has) -[How to Evaluate R Packages?](https://rfortherestofus.com/2020/07/how-to-evaluate-r-packages/) +[Check how many downloads a CRAN package has?](https://stackoverflow.com/questions/40835078/check-how-many-downloads-a-cran-package-has) +[How to Evaluate R Packages?](https://rfortherestofus.com/2020/07/how-to-evaluate-r-packages/) [Ten simple rules for finding and selecting R packages](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009884) )
@@ -43,10 +43,10 @@ R vs other statistical software such as SPSS, SAS, STATA, Mplus, HLM etc. : R is a specialized language designed for statistical computing and graphics. It is NOT a general-purpose programming language for software and web development. However, Python, a general-purpose programming language, is popular for data analysis, and shares some of R's features, such as being open source and having a strong community. It can be a hard call if you want to choose only one. Each has its own strengths:

- +

-*Source: Coursera, [Python or R for Data Analysis: Which Should I Learn?](https://www.coursera.org/articles/python-or-r-for-data-analysis)* +*Source: Coursera, [Python or R for Data Analysis: Which Should I Learn?](https://www.coursera.org/articles/python-or-r-for-data-analysis)* -This page is meant to help you to decide whether R is the right tool for your analysis needs. -What questions do you have about what R is used for and its pros and cons compared to alternatives? Now is a good time for you to share your questions, thoughts and comments. -{: .note} +

+This page is meant to help you to decide whether R is the right tool for your analysis needs. +What questions do you have about what R is used for and its pros and cons compared to alternatives? Now is a good time for you to share your questions, thoughts and comments. diff --git a/content/RStudio.md b/content/RStudio.md index 8d53889..4863014 100644 --- a/content/RStudio.md +++ b/content/RStudio.md @@ -1,46 +1,46 @@ --- -layout: default -title: RStudio -nav_order: 6 -parent: Workshop Content -has_toc: false +layout: default +title: RStudio +nav_order: 6 +parent: Workshop Content +has_toc: false --- -# **[What is RStudio?](https://posit.co/products/open-source/rstudio/)** +# **[What is RStudio?](https://posit.co/products/open-source/rstudio/)** RStudio is an integrated development environment (IDE) for R, designed to provide a powerful and user-friendly interface for working with R. When you use RStudio, R executes the commands in the background. You must install R to use RStudio, but you can use R without RStudio.

- +
-Figure 1. R and RStudio icons in Windows app list +Figure 1. R and RStudio icons in Windows app list

- -Figure 2. RGui + +Figure 2. RGui

Note. The RGui is the original R user interface. We will use RStudio instead, to access R during the workshop.

- -Figure 3. RStudio + +Figure 3. RStudio

The RStudio interface has a toolbar and four main panes: -* On the very top, you can find a **toolbar** with buttons such as File, Code, View, Plots, etc. +* On the very top, you can find a **toolbar** with buttons such as File, Code, View, Plots, etc. * On the top left is the **Source Editor**. If it is your first time opening RStudio, the source editor may not show up. You can click from the toolbar 'File - New File - R Script' and then a script file called Untitled1 will show up there. Writing your code in the script is recommended if you want to save your work for future use. -* On the bottom left is the **Console**. It is where you can type in R commands and see the outputs. Writing your code in the console is recommended for quick exploration. +* On the bottom left is the **Console**. It is where you can type in R commands and see the outputs. Writing your code in the console is recommended for quick exploration. * The top right pane includes tabs such as Environment and History. The **Environment** tab allows you to see what objects are in the workspace. The **History** tab allows you to see the commands that you have entered. -* The bottom right pane shows tabs such as Plots, Packages and Help. **Plots** is where you can view your plots. **Packages** is where you can view the list of all the installed packages. In **Help**, you can browse the built-in help system of R, which is super helpful. +* The bottom right pane shows tabs such as Plots, Packages and Help. **Plots** is where you can view your plots. **Packages** is where you can view the list of all the installed packages. In **Help**, you can browse the built-in help system of R, which is super helpful. -If you have questions after the workshop, a cheat sheet is available for quick referral: [RStudio IDE Cheat Sheet](https://posit.co/wp-content/uploads/2022/10/rstudio-ide-1.pdf). 
The cheat sheet is also accessible within RStudio through the toolbar 'Help - Cheat Sheets - RStudio IDE Cheat Sheet'. +If you have questions after the workshop, a cheat sheet is available for quick referral: [RStudio IDE Cheat Sheet](https://posit.co/wp-content/uploads/2022/10/rstudio-ide-1.pdf). The cheat sheet is also accessible within RStudio through the toolbar 'Help - Cheat Sheets - RStudio IDE Cheat Sheet'.

- -Figure 4. Accessing RStudio Cheat Sheet + +Figure 4. Accessing RStudio Cheat Sheet

-This page is meant to be a brief introduction and we will refer to them later to help you familiarize with the interface. -What questions do you have about the RStudio interface? Now is a good time for you to share your questions, thoughts and comments. -{: .note} +

+This page is meant to be a brief introduction and we will refer to them later to help you familiarize with the interface. +What questions do you have about the RStudio interface? Now is a good time for you to share your questions, thoughts and comments. diff --git a/content/images/install_package.png b/content/images/install_package.png index 72d6ecb..13d9cc8 100644 Binary files a/content/images/install_package.png and b/content/images/install_package.png differ