This chapter serves as reading material for Meeting 1 of your Introduction to R programming classes. We strongly advise you to work through all the steps described BEFORE the meeting takes place.
Tip: On the right side, you will find the table of contents. If you click on a section, the document will jump directly to it.
The first thing to do is to download and install both R and RStudio for your operating system. But why do you have to install two things (R AND RStudio)? Isn’t one enough? The answer is simple.
R is a programming language. It comes with a simple interface; however, that interface is rather impractical to use, and it lacks useful features such as code completion. RStudio is a program that provides a more functional interface for various programming languages, including R. RStudio has useful features such as code completion, file management, an environment overview, R-projects (more on this later), and many more.
In other words: RStudio makes your life easier when you have to work with R. Therefore, we ask you to install both R AND RStudio.
After downloading R and RStudio, install them in the correct order (first R, then RStudio) to ensure that everything works properly:
To download and install R and RStudio, follow the guide on this website.
If you already have an older version of R or RStudio installed, it is advisable to update them regularly. If you want to start fresh, uninstall both before you install the newest versions.
If you do this, you might have to reinstall packages that you had installed before.
Before we get into working with R specifically, in this section we present some general recommendations for organizing your files.
Over the past years, we have noticed a change in how people use their computers. This comes naturally, as nowadays basically all devices have search functionalities that make it easy for users to find their files. Because of the increased use of smartphones and tablets, computers have also become more application-based than they used to be: now you can open, for example, Microsoft Word, and the program lets you open the most recently used documents with just one click.
However, this easy-to-use approach can become an issue when you deal with large numbers of files, when you collaborate with other people, or when you need to know where your files are stored. Imagine you are writing your Bachelor or Master thesis: to not lose track of your data and report drafts, it is useful to create a folder structure that allows you to easily find all files without having to use a search function.
Therefore, we formulate the following advice:
Move files on your computer using the File Explorer (Windows) or the Finder (Mac). If you don’t know how to open it, on Windows press the Windows key + E simultaneously; on Mac press CMD + OPTION + SPACEBAR.
Do NOT open any data files using Excel or similar programs. Depending on the language settings of your device, this might mess up the data. For example, if your device is set to German, Spanish, or any other language that uses commas (,) instead of periods (.) as decimal separators, you will have to write additional R code to read the data properly. To prevent this, you can set your device language to international English.
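As one example of such additional code, base R offers read.csv2(), which assumes semicolons between columns and commas as decimal separators (a common export format in those language settings). The sketch below is a minimal illustration with made-up data passed through the text argument, so it runs without a file; in practice you would give a file path from your project folder instead. The tidyverse's readr package offers read_csv2() for the same purpose.

```r
# Hypothetical example data: semicolons separate the columns,
# commas are used as decimal separators
csv_text <- "height;weight\n1,75;70,5\n1,82;80,2"

# read.csv2() expects exactly this format (sep = ";", dec = ",")
body_data <- read.csv2(text = csv_text)

# Both columns are read as numbers, now printed with periods
body_data
str(body_data)
```

If you used read.csv() on the same data instead, R would not recognise the columns and numbers correctly, because it expects commas between columns and periods as decimal separators.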
On Mac there are issues when downloading files from Canvas using the Safari browser. Use Google Chrome or Mozilla Firefox instead. By default, most browsers save your downloaded files in the Downloads folder on your device.
To choose yourself where a file is downloaded to, either right-click the link and select Save link as, or change your browser settings so that the browser asks you where to save each download. For example, in Google Chrome go to Settings → Downloads and enable Ask where to save each file before downloading:
Follow the steps below to set up a folder structure for this course:
Of course, you can do this for all your courses.
When working with R, we always advise using RStudio. Therefore, do NOT open R itself; open RStudio instead. If you do this for the first time, it should look like Figure 1.4:
As you can see in Figure 1.4, in what is labelled the Console, RStudio tells me that my currently installed R version is R version 4.5.0. This was the most recent R version at the time of creating this document, so by now there might be a newer one. Please always install the newest R version.
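You can also ask R for its version directly, which is handy when you are not sure whether you are up to date. A minimal sketch using the base R object R.version.string and the function getRversion(); the exact output depends on your installation, and "4.5.0" below is only the version mentioned above, used as an example threshold.

```r
# Print the installed R version as text (it starts with "R version")
R.version.string

# getRversion() returns a version object that can be compared
getRversion()

# TRUE if your R is at least version 4.5.0, FALSE otherwise
getRversion() >= "4.5.0"
```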
If you opened R directly, and not RStudio, it will look like Figure 1.5; please close this program and open RStudio (see Figure 1.4) instead!
At BMS we work with R-projects. Using R-projects ensures that R will always be able to find all (data) files that you need for your current project. To create a project, follow the steps outlined below.
First, you click on File:
Then you click on New Project:
As you already created a (project) folder in the steps above, follow the steps from Figure 1.8 to Figure 1.12.
Now click on Existing Directory:
Then you hit Browse:
And navigate to your existing directory on the Desktop:
Click on Open and afterwards on Create Project:
If you are still not certain how to create an R-project, please watch the following video:
Let’s start by using R as a calculator. For that, type the following in the console and hit Enter:
1 + 1
[1] 2
4.79 * 148.27
[1] 710.2133
It also knows the order of operations:
1 + 2.36 * 15.13
[1] 36.7068
And if we use parentheses:
(1 + 2.36) * 15.13
[1] 50.8368
If you write your code in the console only, you are not able to save what you did for later sessions. To do that, you should ALWAYS write all your code in a script. To open a script, follow these steps:
First, you click on File in the top left of your RStudio window:
Next up, you click on New File:
Afterwards, you click on R Script:
RStudio should now have opened a script for you, named Untitled1 for now. But before you start writing code in it, you should save this script. Click on the save icon:
Navigate to your project folder if necessary (if you followed all the steps above properly and created an R project, you should already be in the correct folder), give the script a name like meeting_1 and hit save:
Now, your script should look like this, named meeting_1.R instead of Untitled1:
Once you have saved your script, you can write the code from the previous section in it. Afterwards, your script should look like this:
If you look closely, you can see that I already ran all the code and that the output is in the console. To run R code, highlight the code you want to run in the script and press CTRL + Enter (CMD + Enter for Mac users).
If you are still not certain how to create and save an R-script, please watch the following video:
R is more than a calculator. An important term for you to know is object. When you run the following code, you will see three objects, named x, y and z, in the environment on the top right.
x <- 1
y <- 2.36
z <- 15.13
Now check your R environment in the top right; it should look like this:
Subsequently, you can use objects that are in your environment for further calculations:
x + y * z
[1] 36.7068
Below, we first describe the different kinds of operators you can encounter in R. Afterwards, we explain how to define the different types of objects and what the corresponding output in R looks like.
The names you give to objects are arbitrary. Choose names that make sense given the context of your project. In this chapter, the names are rather abstract; in Chapter 2 we will use more informative names.
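To illustrate the difference, the sketch below repeats the multiplication from the calculator example, once with abstract names and once with informative ones. The names price_per_unit, number_of_units and total_price are made up for this illustration; R computes the same result either way, but the second version explains itself.

```r
# Abstract names: it is hard to tell what these numbers mean
a <- 4.79
b <- 148.27
a * b

# Informative (hypothetical) names: the code describes itself
price_per_unit <- 4.79
number_of_units <- 148.27
total_price <- price_per_unit * number_of_units
total_price  # [1] 710.2133
```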
In the previous example we defined the objects x, y and z using the following code:
x <- 1
y <- 2.36
z <- 15.13
If you inspect the code closely, you will notice the little “arrow” pointing to the left; we call it the assignment operator <-. It is used to assign a value to a symbolic object, just like above.
Specifically, we assigned the value \(1\) to an object called x, the value \(2.36\) to an object called y, and the value \(15.13\) to an object called z. If you run this code, your environment will include the following:
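The assignment operator can also overwrite an object: if you assign a new value to a name that already exists, the old value is replaced, and the environment shows the new value. A minimal sketch with the object x from above:

```r
x <- 1
x          # prints [1] 1

# Assign a new value to the same name: the right side is computed
# first (1 + 1), then the result replaces the old value of x
x <- x + 1
x          # prints [1] 2
```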
Aside from the assignment operator, other operators you might use are arithmetic operators, such as + or -. You use these for calculations; for instance, if you type x + y * z, you will see the output 36.7068.
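A few more arithmetic operators are sketched below with the objects x, y and z from above. The last two (%% and %/%) are standard R operators that are not mentioned elsewhere in this chapter; they are included only as extras you may come across.

```r
x <- 1
y <- 2.36
z <- 15.13

z - y     # subtraction:    [1] 12.77
z / x     # division:       [1] 15.13
y^2       # exponentiation: [1] 5.5696
7 %% 3    # remainder after integer division: [1] 1
7 %/% 3   # integer division: [1] 2
```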
Throughout the Data Analysis courses that you will follow this year, you will also use logical and comparison operators, including the following:
| Type | Operator | Meaning |
|---|---|---|
| Comparison | > | larger than |
| Comparison | < | smaller than |
| Comparison | >= | larger than or equal to |
| Comparison | <= | smaller than or equal to |
| Comparison | == | equal to |
| Comparison | != | not equal to |
| Logical | & | AND |
| Logical | \| | OR |
| Logical | ! | NOT |
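The sketch below shows these operators in action with the objects x, y and z from above. Comparisons return TRUE or FALSE, and the logical operators combine such results. In R, comparisons are evaluated before & and |, so y > x & z < 10 means (y > x) & (z < 10).

```r
x <- 1
y <- 2.36
z <- 15.13

y > x              # [1] TRUE
x == 1             # [1] TRUE
z != 15.13         # [1] FALSE
y >= 2.36          # [1] TRUE

# Logical operators combine TRUE/FALSE values
y > x & z < 10     # TRUE AND FALSE: [1] FALSE
y > x | z < 10     # TRUE OR FALSE:  [1] TRUE
!(x == 1)          # NOT TRUE:       [1] FALSE

# Comparisons work element by element on vectors
c(1, 5, 10) > 4    # [1] FALSE  TRUE  TRUE
```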
In R you can encounter different types of objects, including Values, Vectors, Matrices, Data Frames, Lists and Functions.
Below, we explain them one-by-one.
Objects that hold a single number are called values. To define one, write and run code like:
x_1 <- 1
To print x_1 in the console, simply type x_1 in your script or console and run the code.
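Strictly speaking, R has no separate type for single values: a value like x_1 is stored as a vector of length one, which is why the output starts with [1]. A minimal sketch that prints x_1 and checks this with the base functions length() and is.vector():

```r
x_1 <- 1

x_1              # prints [1] 1
length(x_1)      # [1] 1
is.vector(x_1)   # [1] TRUE
```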
Vectors are objects that essentially hold more than just a single value. In R, you can define a vector using the c() function (more on functions later in this chapter), where you can list the values you want to include separated by a comma. For example, let’s create a vector that only contains the numbers \(1\) and \(2\):
x_2 <- c(1, 2)
Yielding:
[1] 1 2
There are also vectors with characters (think of it as “words”) only:
x_3 <- c("a", "b")
Yielding:
[1] "a" "b"
And of course there are mixed vectors too, holding both numbers and characters:
x_mix <- c("Hello", "World", "!", 5, 10, 15)
Yielding:
[1] "Hello" "World" "!" "5" "10" "15"
R automatically treats mixed vectors as character vectors: the numbers are converted to text, which is why they are printed in quotation marks.
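You can check this conversion yourself with the base function class(). The sketch below also shows that numbers stored as text can be turned back into numbers with as.numeric(), as long as the text actually looks like a number.

```r
x_mix <- c("Hello", "World", "!", 5, 10, 15)

class(x_mix)                 # [1] "character"
x_mix[4]                     # the number 5 is now the text "5"

# Elements 4 to 6 look like numbers, so they can be converted back
as.numeric(x_mix[4:6])       # [1]  5 10 15
```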
Further, there are matrices. They are basically like an Excel table (just without names for the columns or rows):
x_4 <- matrix(c(x_2, x_3), ncol = 2, byrow = FALSE)
Yielding:
[,1] [,2]
[1,] "1" "a"
[2,] "2" "b"
This is what is called a 2 x 2 matrix: it has 2 rows and 2 columns.
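You can ask R for the dimensions of a matrix with dim(), and pick out elements with square brackets: the number before the comma selects rows, the number after it selects columns. A minimal sketch that rebuilds x_4 from above:

```r
x_2 <- c(1, 2)
x_3 <- c("a", "b")
x_4 <- matrix(c(x_2, x_3), ncol = 2, byrow = FALSE)

dim(x_4)      # rows and columns:            [1] 2 2
x_4[2, 1]     # element in row 2, column 1:  [1] "2"
x_4[, 2]      # the entire second column:    [1] "a" "b"
```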
Some people call matrices with 1 row a row vector, and matrices with 1 column a column vector. While the term vector might indicate that they are equal, they are technically NOT the same.
You can convert matrices to data frames:
x_5 <- data.frame(x_4)
Yielding:
X1 X2
1 1 a
2 2 b
And assign variable names:
names(x_5) <- c(
"Numbers",
"Characters"
)
Yielding:
Numbers Characters
1 1 a
2 2 b
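Once a data frame has variable names, you can refer to a single variable with the $ sign. The sketch below rebuilds x_5 from above and uses the base functions names() and nrow(); the printed output assumes R 4.0 or newer, where text columns stay character instead of becoming factors.

```r
x_2 <- c(1, 2)
x_3 <- c("a", "b")
x_4 <- matrix(c(x_2, x_3), ncol = 2, byrow = FALSE)
x_5 <- data.frame(x_4)
names(x_5) <- c("Numbers", "Characters")

names(x_5)          # the variable names
x_5$Characters      # one variable: [1] "a" "b"
nrow(x_5)           # number of rows: [1] 2
```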
Data frames can hold variables of different types. The most common ones are:
| Variable type | Meaning |
|---|---|
| dbl | double (numeric) |
| int | integer (numeric) |
| num | numeric |
| fct | factor (categorical) |
| lgl | logical (categorical) |
| chr | character |
| lbl | labelled |
| Missing Values | all types |
Always check the variable types (later you will see how to do that using the glimpse() function).
While data frames can hold different variable types, matrices cannot. In Section 1.7.2.3, for instance, the entire matrix is of type character, although the first column only holds the values \(1\) and \(2\).
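The sketch below makes this difference visible. It uses typeof() for the matrix, and sapply(), class() and str() (all base R) for a data frame; glimpse() from the tidyverse gives similar information. The data frame x_df is a hypothetical example built directly from the vectors, so each column keeps its own type.

```r
x_2 <- c(1, 2)
x_3 <- c("a", "b")

# A matrix holds a single type, so the numbers are turned into text
x_4 <- matrix(c(x_2, x_3), ncol = 2, byrow = FALSE)
typeof(x_4)                   # [1] "character"

# Building a data frame directly from the vectors keeps each type
x_df <- data.frame(Numbers = x_2, Characters = x_3)
sapply(x_df, class)           # Numbers: "numeric", Characters: "character"

# str() is a base R way to check all variable types at once
str(x_df)
```

Note that x_5 from above was converted from the character matrix x_4, so its Numbers column is stored as text as well. This is exactly why you should always check the variable types.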
Last, there are lists. Lists are objects that can “hold” a collection of other objects: data frames, matrices, vectors, functions, other lists, and so on.
x_6 <- list(x_1, x_2, x_3, x_4, x_5)
Yielding:
[[1]]
[1] 1
[[2]]
[1] 1 2
[[3]]
[1] "a" "b"
[[4]]
[,1] [,2]
[1,] "1" "a"
[2,] "2" "b"
[[5]]
Numbers Characters
1 1 a
2 2 b
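To get a single element out of a list, you use double square brackets, or the $ sign if the elements have names. A minimal sketch with a shorter, three-element version of x_6; the named list x_7 is a hypothetical example.

```r
x_1 <- 1
x_2 <- c(1, 2)
x_3 <- c("a", "b")
x_6 <- list(x_1, x_2, x_3)

length(x_6)     # number of elements in the list: [1] 3
x_6[[2]]        # the second element itself:      [1] 1 2

# Elements can also be given names and accessed with $
x_7 <- list(numbers = x_2, letters = x_3)
x_7$letters     # [1] "a" "b"
```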
Functions return (i.e. produce) values (e.g. 10) or other R objects. There are built-in functions like mean(), but also functions from packages (more on this later), like mutate() from the package dplyr. Unless you code a function yourself (disclaimer: you will not have to do that), it will NOT show up in your R environment.
For example, the mean() function has \(2\) important arguments (i.e. settings that you can change), x and na.rm:
Instead of x, you would provide the vector or variable that you want to compute the mean over, for instance:
mean(x_2)
[1] 1.5
As you may remember, x_2 is a vector containing two numbers. However, sometimes you have missing data, meaning that one of the numbers in your vector is NA:
x_2_na <- c(1, 2, NA)
If you now ran mean(x_2_na), the result would be NA. To deal properly with the missing values (the NAs), you have to set the na.rm argument to TRUE, like this:
mean(x_2_na, na.rm = TRUE)
[1] 1.5
This gives you the mean of the remaining numbers in the vector, as it now ignores all NAs.
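Before deciding how to handle missing values, it can help to see where they are. The sketch below repeats both mean() calls and adds the base function is.na(), which marks every missing element with TRUE; summing these TRUE values counts the NAs.

```r
x_2_na <- c(1, 2, NA)

mean(x_2_na)                  # [1] NA
mean(x_2_na, na.rm = TRUE)    # [1] 1.5

# is.na() shows which elements are missing
is.na(x_2_na)                 # [1] FALSE FALSE  TRUE
sum(is.na(x_2_na))            # number of missing values: [1] 1
```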
For every function in R, and for most functions coming with R packages, you can find the arguments of the function using R’s internal help function, by typing ? followed by the function name in the console and hitting enter, for example ?mean.
In case there are too many objects in your environment that you do not need, you can remove all objects and start over by typing rm(list = ls()) in your console and hitting enter. To remove only one specific object, for instance x_1, type rm(x_1) in your console and hit enter.
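A minimal sketch of both commands. The base function exists() is used here only to check whether an object is still in the environment; note that the last line really does empty your environment, so only run it when you want to start over.

```r
x_1 <- 1
x_2 <- c(1, 2)

rm(x_1)              # remove only x_1
exists("x_1")        # [1] FALSE
exists("x_2")        # [1] TRUE, x_2 is still there

rm(list = ls())      # remove all objects from the environment
exists("x_2")        # [1] FALSE
```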
R packages can be considered add-on programs for R. There are thousands of different packages, each offering different functionalities. Some packages offer completely new functionalities that R is otherwise not capable of; other packages offer alternatives to built-in functionalities that make your life easier. One of the most important packages you will use is the tidyverse, which you will now install. In Section 2.2, you can learn more about this package.
Installing packages is simple. For example, when you want to install the package tidyverse (which you will use a lot) you just have to type and run the code below:
install.packages("tidyverse")
In general, you can think of a package like a program on your computer or an app on your phone. You install it once, and every now and then it might be smart to update it. But you definitely do not need to install it every time again! Therefore, after installing the package, you can remove the code or comment it out (put a # in front of the code):
# install.packages("tidyverse")
Whenever you restart your phone or computer, it starts without opening all your apps. Of course there are settings to change that, but this is the default behavior. The same goes for R and RStudio: when you start a new session, R starts empty. Therefore, it is crucial to always specify in your scripts which packages you want to work with. You do that in the first lines of the script. The code for loading the tidyverse package is:
library(tidyverse)
This will give you the following message in the console (if the tidyverse was not loaded before):
R does not load all installed packages automatically, as this might take a long time. Over time you end up with many different packages that you installed on your computer. Loading all of them would take too long (and potentially crash your computer). Therefore, R starts empty and you need to tell R what you want to work with yourself. Think of it like clothes: In the morning you pick the clothes you want to wear that day, rather than wearing all different clothes that you have at once.
Write the code for loading the packages at the top of your script, as you only need to load them once per session. Also annotate your code (describe what it does) using #, as in the example below:
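A script that follows this advice might start like the sketch below. The title comment and the small calculation are only placeholders for your own work; everything after a # is ignored by R, so comments never change what the code does. The sketch assumes the tidyverse is already installed.

```r
# Meeting 1: Introduction to R

# Install the package once (afterwards, keep this line commented out)
# install.packages("tidyverse")

# Load the packages needed in this script
library(tidyverse)  # a collection of packages, including dplyr with mutate()

# Compute the mean of a small vector, ignoring missing values
x_2_na <- c(1, 2, NA)
mean(x_2_na, na.rm = TRUE)
```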