Introduction to R and RStudio
Last updated on 2024-01-29
Overview
Questions
- How do you find your way around RStudio?
- How do you manage projects in R?
- How do you install packages?
- How do you interact with R?
Objectives
After completing this episode, participants should be able to…
- Create self-contained projects in RStudio
- Install additional packages using R code
- Manage packages
- Define a variable
- Assign data to a variable
- Call functions
Project management in RStudio
RStudio is an integrated development environment (IDE), which means it provides a (much prettier) interface for the R software. For RStudio to work, you need to have R installed on your computer. Because RStudio runs R for you, you never actually have to open the R software itself.
RStudio provides a useful feature: projects. A project is a self-contained working space (i.e. a working directory) that R refers to when looking for and saving files. You can create a project in an existing directory (folder) or in a new one.
Creating RStudio Project
We’re going to create a project in RStudio in a new directory. To create a project, go to:
- File
- New Project
- New directory
- Place the project somewhere you will easily find it on your laptop and name the project data-carpentry
- Create project
Organising working directory
Creating an RStudio project is a good first step towards good project management. However, most of the time it is a good idea to organise the working space further. This is one suggestion of what your R project can look like. Let’s go ahead and create the other folders:
- data/ - should be where your raw data is. READ ONLY
- data_output/ - should be where your data output is saved. READ AND WRITE
- documents/ - all the documentation associated with the project (e.g. cookbook)
- fig_output/ - your figure outputs go here. WRITE ONLY
- scripts/ - all your code goes here. READ AND WRITE
You can create these folders as you would any other folders on your laptop, but R and RStudio offer handy ways to do it directly in your RStudio session.
You can use the RStudio interface to create a folder in your project by going to the bottom right pane, opening the Files tab, and clicking on the Folder icon. A dialog box will appear, allowing you to type the name of the folder you want to create.
An alternative solution is to create the folders using the R command dir.create(). In the console type:
R

dir.create('data')
dir.create('data_output')
dir.create('documents')
dir.create('fig_output')
dir.create('scripts')
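If you run these commands a second time, dir.create() warns that the folders already exist. As a minimal sketch of one way around that (the variable names folders and folder are only illustrative), you can loop over the folder names and create only the ones that are missing:

```r
# Folder names suggested for the project
folders <- c("data", "data_output", "documents", "fig_output", "scripts")

for (folder in folders) {
  # Create the folder only if it does not exist yet,
  # so the code can be re-run without warnings
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}

# Check that every folder is now present
dir.exists(folders)
```

The folders are created relative to the current working directory, which is the project folder when you work inside an RStudio project.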
Two main ways to interact with R
There are two main ways to interact with R through RStudio:
- test and play within the interactive R console
- write and save an R script (.R file)

Each of these modes of interaction has its advantages and drawbacks.
| | Console | R script |
|---|---|---|
| Pros | Immediate results | Complete record of your work |
| Cons | Work lost once you close RStudio | Messy if you just want to print things out |
Creating a script
During the workshop we will mostly use an .R script to have full documentation of what has been written. This way we will also be able to reproduce the results. Let’s create one now and save it in the scripts directory.
- File
- New File
- R Script
- A new Untitled script will appear in the source pane.
- Save it using the floppy disc icon.
- Select the scripts/ folder as the file location
- Name the script intro-to-r.R
Running the code
Note that all code written in the script can also be executed directly in the interactive console. We will now learn how to run the code both in the console and the script.
- In the Console you run the code by hitting Enter at the end of the line
- In the R script there are two ways to execute the code:
  - You can use the Run button on the top right of the script window.
  - Alternatively, you can use a keyboard shortcut: Ctrl + Enter, or Command + Return for Mac users.
In both cases, the active line (the line where your cursor is placed) or a highlighted snippet of code will be executed. A common source of errors in scripts, such as a previously created object not being found, is code on earlier lines that has not been executed: make sure that all code has been executed as described above. To run all lines before the active line, you can use the keyboard shortcut Ctrl + Alt + B on Windows/Linux or Command + Option + B on Mac.
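To see what this kind of error looks like, here is a small sketch. The object name undefined_value is hypothetical and is deliberately never created. tryCatch() is used only so that the error message is captured as text instead of stopping the script.

```r
# undefined_value has never been assigned, so R cannot find it
msg <- tryCatch(
  undefined_value * 2,
  error = function(e) conditionMessage(e)
)

# Print the captured error message
msg
```

If you see a message like this in your own work, run the line that creates the object first and then try again.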
Escaping
The console shows it’s ready to get new commands with the > sign. It will show the + sign if it still requires input for the command to be executed.
Sometimes you don’t know what is missing, you change your mind and want to run something else, or your code is running much too long and you just want it to stop. The way to do it is to press Esc.
Packages
A great power of R lies in packages: add-on sets of functions that are built by the community. Once they go through a quality process, they are available to download from a repository called CRAN. They need to be explicitly activated. Now, we will be using the tidyverse package, which is actually a collection of useful packages. Another package that we will use is here.
You were asked to install the tidyverse package in preparation for the workshop. You need to install a package only once, so you won’t have to do it again. We will, however, need to install the here package. To do so, please go to your script and type:
R

install.packages('here')
Callout
If you are not sure whether you have the tidyverse package installed, you can check in the Packages tab in the bottom right pane. In the search box start typing ‘tidyverse’ and see if it appears in the list of installed packages. If not, you will need to install it by writing in the script:
R

install.packages('tidyverse')
Commenting your code
Now we have a bit of an issue with our script. As mentioned, the packages need to be installed only once, but now they will be installed each time we run the script, which can take a lot of time if we’re installing a large package like tidyverse.
To keep a record of having installed the packages without running the installation again, you can use a comment. In R, anything written after a hash sign # is ignored in execution. Thanks to this feature, you can annotate your code. Let’s adapt our script by changing the first lines into comments:
R

# install.packages('here')
# install.packages('tidyverse')
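An alternative to commenting out the installation lines is to check first which packages are missing. The sketch below is only one possible approach, not part of the workshop script, and the names pkgs, is_installed and to_install are illustrative. requireNamespace() is a base R function that returns TRUE when a package is available. The install.packages() call is left as a comment so that nothing is installed unless you decide to run it.

```r
# Packages used in this workshop
pkgs <- c("tidyverse", "here")

# TRUE for each package that is already installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
to_install <- pkgs[!is_installed]
to_install

# Uncomment the next line to install only the missing packages
# if (length(to_install) > 0) install.packages(to_install)
```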
Installing packages is not sufficient to work with them. You will need to load them each time you want to use them. To do that, you use the library() command:
R

# Load packages
library(tidyverse)
library(here)
Handling paths
You have created a project, which is your working directory, and a few sub-folders that will help you organise your project better. But now, each time you save or retrieve a file from those folders, you will need to specify the path from the folder you are in (most likely the scripts/ folder) to those files.
That can become complicated and might cause a reproducibility problem if the person using your code (including future you) is working in a different sub-folder.
We will use the here package to tackle this issue. This package converts relative paths from the root (main folder) of your project to absolute paths (the exact location on your computer). For instance, instead of writing out the full path like “C:/Users/YourName/Documents/r-geospatial-urban/data/file.csv” or “~/Documents/r-geospatial-urban/data/file.csv”, you can use the here() function to create a path relative to your project’s main directory. This makes your code more portable and reproducible, as it doesn’t depend on a specific location of your project on your computer.
It might be confusing, so let’s see how it works. We will use the here() function from the here package. In the console, we write:
R

here()
here('data')
You all probably have something different printed out. And this is fine, because here adapts to your computer’s specific situation.
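You can also pass several folder and file names to here() and it will join them into one path. The following is a minimal sketch, assuming the here package is installed and you are working inside your RStudio project. The variable name csv_path is only illustrative.

```r
library(here)

# Build a path to a file in the data/ folder, starting from the project root
csv_path <- here("data", "gapminder_data.csv")
csv_path

# Check whether a file exists at that location
# (FALSE if nothing has been saved there yet)
file.exists(csv_path)
```

Because the path starts from the project root, the same line works no matter which sub-folder your script is saved in.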
Download files
We still need to download data for the first part of the workshop. You can do it with the function download.file(). We will save it in the data/ folder, where the raw data should go. In the script, we will write:
R

# Download the data and save it in the data/ folder of the project
download.file('https://bit.ly/geospatial_data',
              here('data', 'gapminder_data.csv'))
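Every time the script runs, the file is downloaded again. As a hedged sketch of one way to avoid that, the helper below downloads a file only when it is not already on your computer. The function name download_if_missing is hypothetical and not part of any package. The usage line is left as a comment so that nothing is downloaded by accident.

```r
# Hypothetical helper: download a file only if it is not there yet
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Make sure the destination folder exists before downloading
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile)
  }
  # Return the path so it can be reused later
  invisible(destfile)
}

# Usage in this lesson, saving the file in the data/ folder:
# download_if_missing('https://bit.ly/geospatial_data',
#                     here('data', 'gapminder_data.csv'))
```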
Importing data into R
Three of the most common ways of importing data in R are:
- loading a package with pre-installed data;
- downloading data from a URL;
- reading a file from your computer.

For larger datasets, database connections or API requests are also possible. We will not cover these in the workshop.
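As a small illustration of the first and third ways, here is a sketch that uses only what comes with R. The iris data set ships with the datasets package. The temporary CSV file and its values are made up for this example and are not workshop data.

```r
# Loading data that comes with a package (datasets is installed with R)
data("iris", package = "datasets")
head(iris, 3)

# Reading a file from your computer:
# first write a tiny made-up table to a temporary CSV file
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          tmp_csv, row.names = FALSE)

# then read it back into R
toy <- read.csv(tmp_csv)
toy
```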
Introduction to R
You can use R as a calculator. For example, you can write:
R

1+100
1*100
1/100
Variables and assignment
However, what’s more useful is that in R we can store values and use them whenever we need to. We do this using the assignment operator <-, like this:
R

x <- 1/40
Notice that assignment does not print a value. Instead, we’ve stored it for later in something called a variable. The variable x now contains the value 0.025:
R

x
Look for the Environment tab in the upper right pane of RStudio. You will see that x and its value have appeared in the list of Values. Our variable x can be used in place of a number in any calculation that expects a number, e.g. when calculating a square root:
R

sqrt(x)
Variables can also be reassigned. This means that we can assign a new value to the variable x:
R

x <- 100
x
You can use one variable to create a new one:
R

y <- sqrt(x) # you can use the value stored in object x to create y
y
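One detail that often surprises beginners is that y stores the value that was calculated at the time of assignment. It is not linked to x afterwards. A short sketch, reusing the same variable names:

```r
x <- 100
y <- sqrt(x)  # y is calculated from the current value of x, so y is 10

x <- 25       # reassigning x does not change y
y             # y is still 10

y <- sqrt(x)  # to update y, run the assignment again
y             # now y is 5
```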