1 Getting started with R & Programming 101 – Introduction to R programming

This chapter serves as reading material for Meeting 1 of your Introduction to R programming classes. We strongly advise you to walk through all the steps described here BEFORE the meeting takes place.

Tip: On the right side you find the table of contents. You can click on a section, and the document will jump directly there.

1.1 Download & installation of R & RStudio

The first step is to download and install both R and RStudio for your operating system. But why do you have to install two things (R AND RStudio)? Isn’t one enough? The answer is simple.

R is a programming language. It comes with a simple interface; however, that interface is rather impractical to use and lacks useful features such as code completion. RStudio is a program that provides a more convenient interface for various programming languages, including R. RStudio has useful features such as code completion, file management, an environment overview, R-projects (more on this later) and many more.

In other words: RStudio makes your life easier when you have to work with R. Therefore, we ask you to install both R AND RStudio.

After downloading R and RStudio, you have to install them in the correct order to ensure that everything works properly:

First install R.
Only once that is done, install RStudio. This ensures that RStudio can find the correct version of R.

To download and install R and RStudio, follow the guide on this website.

Note

If you already have an older version of R or RStudio installed, it is advised to update them regularly. If you want to start fresh, uninstall both before you install the newest versions.

Caution

If you do this, you might have to re-install packages that you already installed before.

1.2 Organizing files on your computer

Before we get into working with R specifically, in this section we present some general recommendations for organizing your files.

Over the past years, we have noticed that the way people use their computers has changed. This comes naturally, as nowadays basically all devices have search functions that make it easy for users to find their files. Because of the increased use of smartphones and tablets, computers have also become more application-based than they used to be: Now you can open, for example, Microsoft Word, and the program lets you open the most recently used documents with just one click.

However, this easy-to-use approach can become an issue when you deal with a large number of files, when you collaborate with other people, or when you need to know where your files are stored. Imagine you are writing your Bachelor's or Master's thesis: To avoid losing track of your data and report drafts, it is useful to create a folder structure that allows you to easily find all files without having to use a search function.

Therefore, we formulate the following advice:

Create folders for your projects and your studies, rather than simply keeping your files on the Desktop or in the Downloads folder.

In these folders, create subfolders for different parts of the project. For example, when you have multiple drafts of your report, move the older drafts into a separate folder to keep an overview. For your studies, create a subfolder for each module, then a subfolder for each course, and within that folder a subfolder for each meeting.

Move files on your computer by using the File Explorer (Windows) or the Finder (Mac). If you don’t know how to open it, on Windows you can simultaneously press WIN + E, on Mac press CMD + OPTION + SPACEBAR.
Do NOT open any data files using Excel or similar programs. Depending on the language settings of your device, this might mess up the data. For example, if your device is set to German, Spanish or any other language that uses commas ( , ) as decimal separators instead of periods ( . ), you will have to write additional R code to properly read the data. To prevent this, you can set your device language to international English.
On Mac there are issues when downloading files from Canvas using the Safari browser. Use Google Chrome or Mozilla Firefox instead. By default, most browsers save your downloaded files in the Downloads folder on your device.
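
The "additional R code" mentioned above for data with decimal commas could look like the following minimal sketch. It assumes a semicolon-separated file that uses commas as decimal separators. The column names and values are made up for illustration, and the data are passed in as text so that the example runs without a file.

```r
# Hypothetical example data: semicolon-separated, with decimal commas
raw_text <- "id;score\n1;2,5\n2;3,75"

# read.csv2() expects ";" as the column separator and "," as the decimal mark
scores <- read.csv2(text = raw_text)

# The score column is now read as numbers (2.5 and 3.75), not as text
scores$score
```

For a real file you would pass the file name instead of the text argument.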

Tip

To choose for yourself where a file is downloaded to, either right-click on the link and select save link as, or specify in the browser settings that you want to be asked where each file is saved. For example, in Google Chrome go to Settings –> Downloads and enable Ask where to save each file before downloading:

1.3 For this course

Follow the steps below to set up a folder structure for this course:

Go to your Desktop
Right click, then select new folder. Call this new folder “Introduction to R”.
Double click the Introduction to R folder, and create new folders with the names “Meeting 1”, “Meeting 2” and “Meeting 3”.
Click on Meeting 1 and put all files for this meeting in this folder (for example your notes and any files you download from Canvas). In the following weeks, you can do the same for the other meetings.

Of course, you can do this for all your courses.
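
Once R is installed, the same folder structure could also be created with code. The following is only a sketch and not a required step: it uses a temporary directory as the base folder so that it is safe to run, and the object names base_dir, course_dir and meetings are chosen for illustration.

```r
# Base folder: a temporary directory keeps this sketch harmless;
# replace it with the path to your Desktop to create the real folders
base_dir <- tempdir()

course_dir <- file.path(base_dir, "Introduction to R")
meetings <- c("Meeting 1", "Meeting 2", "Meeting 3")

# Create the course folder and one subfolder per meeting
for (meeting in meetings) {
  dir.create(file.path(course_dir, meeting),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist inside the course folder
list.files(course_dir)
```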

1.4 Setting up R

When working with R, we always advise using RStudio. Therefore, do NOT open R itself, but open RStudio instead. If you do this for the first time, it should look like Figure 1.4:

As you can see in Figure 1.4, in what is labelled the Console, RStudio tells me that my currently installed R version is R version 4.5.0. This was the most recent R version at the time of creating this document, so by now there might be a newer one. Please always install the newest R version.

Caution

If you opened R directly, and not RStudio, it will look like Figure 1.5; please close this program and open RStudio (see Figure 1.4) instead!

1.4.1 Create a project

At BMS we work with R-projects. Using R-projects, you can ensure that R will always be able to find all (data) files that you need for your current project. To create a project, follow the steps outlined below.

First, you click on File:

Then you click on New Project:

As you already created a (project) folder in the steps above, follow the steps from Figure 1.8 to Figure 1.12.

Now click on Existing Directory:

Then you hit Browse:

And navigate to your existing directory on the Desktop:

Click on Open and afterwards on Create Project:

Important

If you are still not certain how to create an R-project, please watch the following video:

1.5 Simple calculations

Let’s start by using R as a calculator. For that, type the following in the console and hit Enter:

1 + 1

[1] 2

4.79 * 148.27

[1] 710.2133

It also knows the order of operations:

1 + 2.36 * 15.13

[1] 36.7068

And if we use parentheses:

(1 + 2.36) * 15.13

[1] 50.8368
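
The other basic arithmetic operators work in the same way. The short sketch below is an addition to the examples above and uses arbitrary numbers:

```r
10 / 4    # division: 2.5
2^3       # exponentiation (2 to the power of 3): 8
7 - 10    # subtraction, the result can be negative: -3
```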

1.6 R-scripts - Writing reproducible R Code

If you write your code in the console only, you are not able to save what you did for later sessions. To do that, you should ALWAYS write all your code in a script. To open a script, follow these steps:

First, you click on File in the top left of your RStudio window:

Next up, you click on New File:

Afterwards, you click on R Script:

You should now see that RStudio opened a script for you. So far, it is named Untitled1. But before you start to write code in R, you should save this script. Click on the save icon:

Navigate to your project folder if necessary (if you followed all the steps above properly and created an R project, you should already be in the correct folder), give the script a name like meeting_1 and hit save:

Now, your script should look like this, named meeting_1.R instead of Untitled1:

Once you saved your script, you can repeat the previously written code. Afterwards, your script should look like this:

If you look closely, you can see that I already ran all the code and that the output is in the console. To run R code, just highlight (in the script) the code you want to run and hit CTRL + Enter at the same time (CMD + Enter for Mac users).
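
As a rough sketch, the contents of meeting_1.R at this point could look like the following. The lines starting with # are comments added here for illustration; R ignores them when you run the code.

```r
# meeting_1.R
# Simple calculations from Section 1.5

1 + 1
4.79 * 148.27

# Order of operations
1 + 2.36 * 15.13
(1 + 2.36) * 15.13
```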

Important

If you are still not certain how to create and save an R-script, please watch the following video:

1.7 Objects in R

R is more than a calculator. An important term to know is object. When you run the following code, you see three objects, named x, y and z, in the environment on the right.

x <- 1
y <- 2.36
z <- 15.13

Now check your R environment in the top right. It should look like this:

Subsequently, you can use objects that are in your environment for further calculations:

x + y * z

[1] 36.7068
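
The result of a calculation can itself be stored in an object. The sketch below assumes the objects x, y and z from above; the name result is chosen only for illustration.

```r
x <- 1
y <- 2.36
z <- 15.13

# Store the result of a calculation in a new object
result <- x + y * z
result

# Assigning to an existing name overwrites the old value
result <- result + 1
result
```

After running this, result appears in the environment next to x, y and z and holds the most recently assigned value.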

Before you read about the different types of objects you can encounter in R, we first introduce the different kinds of operators. Afterwards, you will see how to define the different types of objects and what the output in R looks like.

Important

The names you give to objects are arbitrary. Choose names based on what makes sense given the context of your project. In this chapter the names chosen are rather abstract; in Chapter 2 we will use more informative names.

1.7.1 Operators

In the previous example we defined the objects x, y and z using the following code:

x <- 1
y <- 2.36
z <- 15.13

If you inspect the code closely, you notice the little “arrow” pointing to the left. We call this the assignment operator <-. It is used to assign a value to a named object, just like above.

Specifically, we assigned the value \(1\) to an object called x, the value \(2.36\) to an object called y and the value \(15.13\) to an object called z. If you run this code, your environment will include the following:

Aside from the assignment operator, other operators you might use are arithmetic operators, such as + or -. These you use for calculations, for instance if you type x + y * z you would see the output 36.7068.

Throughout the Data Analysis courses that you will follow this year, you will also use logical and comparison operators, including the following:

Type	Operator	Meaning
Comparison	>	larger than
Comparison	<	smaller than
Comparison	>=	larger than or equal to
Comparison	<=	smaller than or equal to
Comparison	==	equal to
Comparison	!=	not equal to
Logical	&	AND
Logical	|	OR
Logical	!	NOT
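
To illustrate the table, here is a small sketch that applies these operators to the objects x, y and z defined earlier. Every comparison returns either TRUE or FALSE.

```r
x <- 1
y <- 2.36
z <- 15.13

x > y            # FALSE: 1 is not larger than 2.36
y <= z           # TRUE
x == 1           # TRUE: note the double equals sign for comparisons
x != 1           # FALSE

x < y & y < z    # TRUE: both comparisons are TRUE
x > y | y < z    # TRUE: at least one comparison is TRUE
!(x > y)         # TRUE: NOT turns FALSE into TRUE
```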

1.7.2 Different types of objects

In R you can encounter different types of objects, including Values, Vectors, Matrices, Data Frames, Lists and Functions.

Below, we explain them one-by-one.

1.7.2.1 Values

Objects that hold a single number are called values. To define them, write and run code like:

x_1 <- 1

To print x_1 in the console, simply type x_1 in your script or console and run the code.

1.7.2.2 Vectors

Vectors are objects that essentially hold more than just a single value. A vector is a one-dimensional set of values of the same type, like a sample of numbers or answers collected from people.

In R, you can define a vector using the c() function (more on functions later in this chapter), where you can list the values you want to include separated by a comma. For example, let’s create a vector that only contains the numbers \(1\) and \(2\):

x_2 <- c(1, 2)

Yielding:

[1] 1 2

There are also vectors with characters (think of it as “words”) only:

x_3 <- c("a", "b")

Yielding:

[1] "a" "b"

And of course there are mixed vectors too, holding both numbers and characters:

x_mix <- c("Hello", "World", "!", 5, 10, 15)

Yielding:

[1] "Hello" "World" "!"     "5"     "10"    "15"

R automatically treats mixed vectors as character vectors: as the quotation marks in the output show, the numbers 5, 10 and 15 are now stored as text.
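
One way to check how R stores a vector is the built-in class() function. This is a small sketch using the vectors defined above:

```r
x_2 <- c(1, 2)
x_3 <- c("a", "b")
x_mix <- c("Hello", "World", "!", 5, 10, 15)

class(x_2)     # "numeric"
class(x_3)     # "character"
class(x_mix)   # "character": the numbers were converted to text

# Calculations work on numeric vectors ...
x_2 * 2
# ... but x_mix * 2 would stop with an error, because text cannot be multiplied
```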

1.7.2.3 Matrices

Further, there are matrices. They are basically like an Excel table (just without the titles for the columns or rows). In a matrix, every value is treated as the same type:

x_4 <- matrix(c(x_2, x_3), ncol = 2, byrow = FALSE)

Yielding:

     [,1] [,2]
[1,] "1"  "a" 
[2,] "2"  "b"

This is what is called a 2 x 2 matrix: It has 2 rows and 2 columns.

Note

Some people call matrices with 1 row a row vector, and matrices with 1 column a column vector. While the term vector might indicate that they are equal, they are technically NOT the same.
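
The difference between a plain vector and a one-row or one-column matrix can be made visible with dim(), which returns the number of rows and columns. The object names in this sketch are chosen for illustration.

```r
v <- c(1, 2, 3)                    # a plain vector
row_vec <- matrix(v, nrow = 1)     # matrix with 1 row and 3 columns
col_vec <- matrix(v, ncol = 1)     # matrix with 3 rows and 1 column

dim(v)         # NULL: a plain vector has no dimensions
dim(row_vec)   # 1 3
dim(col_vec)   # 3 1

is.matrix(v)         # FALSE
is.matrix(row_vec)   # TRUE
```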

1.7.2.4 Data frames

A data frame is a dataset organized in rows and columns where each column is a variable and columns can have different types (numbers, text, categories):

A column is a variable, meaning a characteristic you measure or record for each person or observation (e.g., age, gender, how much they like pizza).
A row is an observation, representing one individual case or participant with their values for each variable.

You can convert matrices to data frames:

x_5 <- data.frame(x_4)

Yielding:

  X1 X2
1  1  a
2  2  b

And assign variable names:

names(x_5) <- c(
  "Numbers",
  "Characters"
)

Yielding:

  Numbers Characters
1       1          a
2       2          b

In data frames you can have different variable types. The most common ones are:

Variable type	Meaning
dbl	double (numeric)
int	integer (numeric)
num	numeric
fct	factor (categorical)
lgl	logical (categorical)
chr	character
lbl	labelled
Missing Values	all types

Always check the variable types (later you will see how to do that using the glimpse() function).

Important

While data frames can hold different variable types, matrices cannot. In Section 1.7.2.3, for instance, the entire matrix is of type character, although the first column only holds the values \(1\) & \(2\).
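
The following sketch shows the difference. It builds a data frame directly from two vectors, so each column keeps its own type, and compares it with a data frame converted from the character matrix. The name x_5_typed is chosen for illustration, and the last line assumes R version 4.0 or newer, where text columns stay of type character.

```r
# Build the data frame directly from two vectors of different types
x_5_typed <- data.frame(
  Numbers = c(1, 2),
  Characters = c("a", "b")
)

# The $ picks a single column from a data frame
class(x_5_typed$Numbers)      # "numeric"
class(x_5_typed$Characters)   # "character"

# For comparison: converting the character matrix keeps everything as text
x_4 <- matrix(c("1", "2", "a", "b"), ncol = 2, byrow = FALSE)
class(data.frame(x_4)$X1)     # "character"
```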

1.7.2.5 Lists

Last, there are lists. Lists are objects that can “hold” a collection of other objects. A list can hold data frames, matrices, vectors, functions, other lists, and so on.

x_6 <- list(x_1, x_2, x_3, x_4, x_5)

Yielding:

[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] "a" "b"

[[4]]
     [,1] [,2]
[1,] "1"  "a" 
[2,] "2"  "b" 

[[5]]
  Numbers Characters
1       1          a
2       2          b
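
To get a single element back out of a list, you can use double square brackets with the position of the element, or the $ sign if the elements have names. The sketch below uses a smaller list than x_6; the list and its element names are made up for illustration.

```r
x_1 <- 1
x_2 <- c(1, 2)
x_3 <- c("a", "b")

# A small list whose elements have names
my_list <- list(value = x_1, numbers = x_2, chars = x_3)

my_list[[2]]      # second element by position: 1 2
my_list$chars     # element by name: "a" "b"
length(my_list)   # number of elements: 3
```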

1.8 Functions

Functions return (i.e. produce) values (e.g. 10) or other R objects. There are built-in functions like mean(), but also functions that come from packages (more on this later), like mutate() from the package dplyr. Unless you code a function yourself (Disclaimer: You will not have to do that), it will NOT show up in your R environment.

For example, the mean() function has \(2\) important arguments (i.e. settings that you can change), x and na.rm:

Figure 1.22: Default arguments of *mean()*

Instead of x, you would provide the vector or variable that you want to compute the mean over, for instance:

mean(x_2)

[1] 1.5

As you may remember from above, x_2 is a vector containing two numbers. However, sometimes you have missing data, meaning that one of the numbers in your vector would be NA:

x_2_na <- c(1, 2, NA)

If you now ran mean(x_2_na), the result would be NA. To properly deal with the missing values (the NAs) in this situation, you have to change the na.rm argument to TRUE, like this:

mean(x_2_na, na.rm = TRUE)

[1] 1.5

This gives you the mean of the remaining numbers in the vector, as mean() now ignores all NAs.
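
The sketch below puts the two situations side by side. It also shows two additions that are not covered above: the built-in is.na() function, which checks for missing values, and the fact that sum() has an na.rm argument as well.

```r
x_2_na <- c(1, 2, NA)

mean(x_2_na)                # NA: one missing value makes the result missing
is.na(x_2_na)               # FALSE FALSE TRUE: which values are missing
sum(is.na(x_2_na))          # 1: the number of missing values
sum(x_2_na, na.rm = TRUE)   # 3: sum() also has an na.rm argument
```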

For every function in R, and for most functions coming with R packages, you can find the arguments of the function using R’s internal help function, by typing ? followed by the function name in the console and hitting enter, for example ?mean.

Tip

In case there are too many objects in your environment that you do not need, you can remove all objects and start over by typing rm(list = ls()) in your console and hitting enter. To remove only one specific object, for instance x_1, type rm(x_1) in your console and hit enter.
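
As a small sketch of removing a single object, the example below creates two throwaway objects (the names scratch_a and scratch_b are made up) and uses the built-in exists() function to check whether an object is still there:

```r
scratch_a <- 1
scratch_b <- 2

exists("scratch_a")   # TRUE: the object is in the environment

rm(scratch_a)         # remove one specific object

exists("scratch_a")   # FALSE: it is gone
exists("scratch_b")   # TRUE: other objects are untouched
```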

1.9 R packages

R packages can be considered add-on programs for R. There are thousands of different packages, each offering different functionality. Some packages offer completely new functionality that R is otherwise not capable of, while other packages offer alternatives to built-in functionality, making your life easier. One of the most important packages you will use is the tidyverse, which you will now install. In Section 2.2, you can learn more about this package.

1.9.1 Installing packages

Installing packages is simple. For example, when you want to install the package tidyverse (which you will use a lot) you just have to type and run the code below:

install.packages("tidyverse")

In general, you can think of a package like a program on your computer or an app on your phone. You install it once, and every now and then it might be smart to update it. But you definitely do not need to install it again every time! Therefore, after installing the package, you can remove or comment out the code (put a # in front of the code):

# install.packages("tidyverse")
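
If you are unsure whether a package is already installed, you can check before installing it again. The sketch below is one possible way to do so, using the built-in requireNamespace() function. It only prints a reminder and does not install anything itself; the object name tidyverse_installed is chosen for illustration.

```r
# TRUE if the tidyverse is already installed, FALSE otherwise
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (!tidyverse_installed) {
  # Only a reminder is printed here; run the install line yourself once
  message('tidyverse is not installed yet: run install.packages("tidyverse")')
}
```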

1.9.2 Loading packages

Whenever you restart your phone or computer, it starts without opening all your apps. Of course there are settings to change that, but this is the default behavior. The same goes for R and RStudio. When you start a new session, R will start empty. Therefore, it is crucial to always specify in your scripts what packages you want to work with. You do that in the first lines of the script. The code for loading the tidyverse package is:

library(tidyverse)

This will give you this message in the console (if the tidyverse was not loaded before):

Figure 1.23: Message after loading the *tidyverse*

Important

R does not load all installed packages automatically, as this might take a long time. Over time you end up with many different packages that you installed on your computer. Loading all of them would take too long (and potentially crash your computer). Therefore, R starts empty and you need to tell R yourself what you want to work with. Think of it like clothes: In the morning you pick the clothes you want to wear that day, rather than wearing all the clothes you own at once.

Tip

Write the code for loading the packages at the top of your script, as you only need to load them once per session. Also annotate your code (describe what it does) using the #, as in the example below:
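
A minimal sketch of what such an annotated script could look like is shown here. The section comments and the object name total are made up for illustration. The check around library(tidyverse) is an addition that keeps the sketch runnable even if the tidyverse is not installed; in your own scripts a plain library(tidyverse) at the top is enough.

```r
# meeting_1.R

# Load packages (only needed once per session) ----
# Assumes the tidyverse was installed once, as described in Section 1.9.1
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
}

# Simple calculation ----
# Add two numbers and store the result in an object
total <- 1 + 1
total
```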