
How to code in R

R is an interpreted language, meaning you have to enter commands written in the R language to carry out operations on your data. The writing of such commands is referred to as coding or programming.

There is a fundamental difference between interpreted languages and software that provides a point-and-click interface (e.g., SPSS or Excel). Software such as Excel and SPSS may seem comfortable, since you can use the mouse to perform most tasks. However, because it is difficult to keep track of your clicks and selections, it quickly becomes nearly impossible to replicate a procedure without a lot of hassle. Writing code line by line allows you to continuously assess what you have written and how it is being interpreted, and to go back, change and redo any step. A code-based workflow is more or less mandatory for reproducible and sound research in any field.

Basic programming in R

This chapter provides an overview of the R language. You will learn how basic operations are executed and interpreted. Don’t worry if you’ve never programmed before; you will learn everything you need to know.

Data analysis means using your computer’s memory and mathematical capabilities to store data, manipulate data and perform calculations. Computers are complicated, but R is simple, so you only need to learn the very basics of how R operates with your computer. Essentially, you tell R (by executing R commands) to direct your computer to perform operations. Here is a very simple operation, in which we ask the computer to add the numbers 1 and 9.

1+9
## [1] 10

As you can see in the output above, R returns the value 10 as its first result (indicated by [1]). We did not save any information in this example, but we could have. Consider the next example, in which we first save the number 1 to an object named my_number and the number 9 to an object named your_number. We use the special R operator <- to assign the numbers to the objects. The <- operator is pronounced “gets”. We then add these numbers and get the same result as above:

my_number <- 1
your_number <- 9
my_number + your_number
## [1] 10

In the second example we saved the value 1 to our computer’s memory. This was done by assigning 1 to an R object, which we named my_number. So my_number is an R object. You can access the object at any time during your R session, since it is stored in your computer’s memory.

What would happen if you multiplied your_number by 10? Let’s try:

10*your_number
## [1] 90

The result is 90. R has saved the number 9 in the computer’s memory and now multiplies that number by 10, which yields 90. Let’s divide the numbers instead:

my_number/your_number
## [1] 0.1111111

You can actually use R as a calculator, as this example shows:

(100-(10*9))/2
## [1] 5
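
As a small illustration of calculator-style use, the sketch below shows how parentheses change the order in which R evaluates an expression. The numbers are arbitrary and chosen only for this example.

```r
# Multiplication is evaluated before addition
2 + 3 * 4
## [1] 14

# Parentheses force the addition to be evaluated first
(2 + 3) * 4
## [1] 20

# The ^ operator raises a number to a power
2^3
## [1] 8

# Division returns decimals when needed
10 / 4
## [1] 2.5
```

When in doubt about the order of evaluation, adding parentheses makes the intention explicit.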

Let’s try a new example where we tell R to print all numbers between 1 and 100. We do this with the colon (:) operator, which tells R to create a sequence of integers from 1 to 100:

1:100
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100

You’ll notice that the first number on each row is placed in brackets ([1], [19], [37], [55], [73], [91]). Those numbers are just indicators that tell you the position of the first value on that line. So the second line starts with the 19th value in the series. You can mostly ignore the numbers that appear in brackets.
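
The bracketed numbers are positions, not values. In the sequence 1:100 the two happen to coincide, so the sketch below uses a sequence that starts at 101 instead. The object name x is only an illustration, and it uses square brackets to pick out a single position, which is not covered in the text above.

```r
# A sequence of 100 integers starting at 101
x <- 101:200

# The value stored in position 19 is 119, not 19
x[19]
## [1] 119

# The sequence still contains 100 values
length(x)
## [1] 100
```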

Incomplete commands

If you type an incomplete command and press Enter, R will display the + prompt on the next row, which means R is waiting for you to type the rest of your command. Either finish the command or hit Escape to start over. In the example below, we write 5- on the first line and press Enter. R displays the + prompt on the next line so that we can provide what we want to subtract from the number 5. We will provide the number 1 and press Enter, which will yield the number 4.

> 5 -
+ 1
[1] 4

Cancelling commands in R

Some R commands may take a long time to run. You can cancel a command once it has begun by pressing Ctrl + C when R runs in a terminal; in RStudio, press Escape or click the stop button in the console. Note that it may also take R a long time to cancel the command.

Error messages

Code that does not work, for whatever reason, will return an error. An error simply means that R cannot carry out the command. Errors must be addressed by correcting the code. Most errors are due to simple typos, but they can certainly be more complex. Error messages usually begin with “Error in…” (or simply “Error:”), followed by an explanation that may or may not be helpful in addressing the error. We will now enter a command with a typo:

librar(tidyverse)
Error in librar(tidyverse) : could not find function "librar"

The error is due to the fact that we misspelled library as librar. The correct command is:

library(tidyverse)
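
Another very common error is a misspelled object name. The sketch below is only an illustration: my_nmber is a deliberate typo, and tryCatch() (a base R function not covered in this chapter) is used to capture the error message so that the remaining code keeps running. The message text shown assumes an English-language R session.

```r
# my_nmber does not exist (deliberate typo), so R raises an error.
# tryCatch() captures the error and returns its message as text.
result <- tryCatch(
  my_nmber + 1,
  error = function(e) conditionMessage(e)
)
result
## [1] "object 'my_nmber' not found"
```

The message names the object R could not find, which is usually enough to locate the typo.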

Commenting code in R

It is important to write comments while you write R code. Virtually all programming languages allow you to write comments by using special symbols. R uses the hashtag symbol (#), which is known as the commenting symbol in R. Anything that follows a hashtag on a line will not be interpreted. You should always write comments and annotations so that you can remind yourself why and how you did things.

For the remainder of the book, I’ll use hashtags to display the output of R code. I’ll use a single hashtag to add my own comments and a double hashtag, ##, to display the results of code. I’ll avoid showing >s and [1]s unless I want you to look at them.
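
Here is a minimal sketch of commented code. The object names and numbers are made up for illustration only.

```r
# Number of participants enrolled (made-up value)
n_participants <- 120

n_dropouts <- 15  # a comment can also follow code on the same line

# A line of code that starts with a hashtag is not run:
# n_dropouts <- 0

# Participants remaining in the study
n_participants - n_dropouts
## [1] 105
```

Placing a hashtag in front of a line of code is a convenient way to switch that line off temporarily without deleting it.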

Warnings and messages

  • Warnings: Occasionally your code may return a warning, which means that the commands were executed but something needs your attention. For example, if you create a plot and some values are missing, you may obtain a warning saying: Warning: Removed 13 rows containing missing values. This warning simply informs you that the plot may not represent all observations in your data.
  • Messages: Messages contain benign information. For example, when you load a package you may get a message containing information about package updates, versions, etc.
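
The difference can be sketched with two base R functions, as.numeric() and message(). The object name values and the message text are made up for this example. The key point is that a warning does not stop the code: the result is still produced.

```r
# "three" cannot be converted to a number, so R warns and inserts NA
values <- as.numeric(c("1", "2", "three"))
## Warning: NAs introduced by coercion

# The command was still executed and the object was created
values
## [1]  1  2 NA

# message() prints benign information
message("Analysis finished")
## Analysis finished
```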

Vectors

A vector is a sequence of data elements of the same basic type. Vectors are important in R and there is nothing complicated about them. Here is a vector containing the numbers 1, 2, 3, 4 and 5. We will create the vector using the function c(), in which c is short for combine.

c(1, 2, 3, 4, 5) 
## [1] 1 2 3 4 5

We could have saved some space by using the colon operator (:):

c(1:5) 
## [1] 1 2 3 4 5

And here is a vector of character strings.

c("David", "Maria", "Mohamed", "Singh", "Lana") 
## [1] "David"   "Maria"   "Mohamed" "Singh"   "Lana"
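
Because all elements of a vector must be of the same basic type, R converts the elements to a common type when you mix them. The sketch below illustrates this with made-up object names and the base R function class(), which reports the type of an object.

```r
# A vector of numbers
numbers <- c(1, 2, 3)
class(numbers)
## [1] "numeric"

# Mixing numbers and text turns every element into text
mixed <- c(1, "two", 3)
mixed
## [1] "1"   "two" "3"
class(mixed)
## [1] "character"
```

Note the quotation marks around "1" and "3": they are now stored as text, so you can no longer do arithmetic with them.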

Objects

In the examples above we created vectors, but we never stored them in our computer’s memory. To store them we need to create an R object, which we do using the assignment operator <-. We will now create an object called my_numbers:

my_numbers <- c(1, 2, 3, 4, 5) 

We now have an R object (my_numbers) that we can use and manipulate again. What is an R object? It is just a name that you can use to refer to stored data. Whenever you use the object’s name in your R code, R will replace the name with the data stored in the object. Consider the example below, in which we store the number 100 in an object called a, then type a as a command and press Enter.

a <- 100
a
## [1] 100

How to create an R object

To create an R object, choose a name and then use the assignment operator <- to pass data into it. This combination looks like an arrow, <-. R will make an object that contains everything on the right-hand side of the <- operator. If you want to see what is stored in an object, simply type the name of the object and press Enter.

Where can I see my objects in R?

All objects are displayed in the Environment pane of RStudio, as shown in Figure 3. This pane will show you all of the objects you’ve created since opening RStudio.

Figure 3: The RStudio Environment pane displays your R objects.

Naming an R object

You can name an object in R almost anything you want, but there are a few rules. First, a name cannot start with a number. Second, a name cannot contain certain special symbols, such as ^, !, $, @, +, -, / or *:

| Allowed   | Not allowed |
|-----------|-------------|
| a         | 1per        |
| b         | $           |
| FOO       | ^dia        |
| my_var    | 5th         |
| .day      | !bad        |
| my.object | ?var        |
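
The naming rules can be tried out directly. The sketch below assigns arbitrary values to some of the names listed as allowed above. Names that start with a number cause a syntax error, so those lines are commented out. The last line uses make.names(), a base R function not mentioned in the text, only to show how R itself would repair an invalid name.

```r
# Valid names: the assigned values are arbitrary
a <- 1
FOO <- 2
my_var <- 3
.day <- 4
my.object <- 5

# Invalid names: these lines would stop with a syntax error if run
# 1per <- 1
# 5th <- 1

# make.names() turns an invalid name into a valid one
make.names("1per")
## [1] "X1per"
```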

R is case sensitive

R is case-sensitive, so income and Income refer to different objects. This applies throughout R: whether you are referring to objects or using functions, you must respect upper and lower case. If you try to use the function mean() by writing Mean(), you will obtain an error.
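
A minimal sketch of case sensitivity, using made-up values. The function exists() is a base R function, not covered in this chapter, that checks whether an object with a given name has been created.

```r
# Two different objects whose names differ only in case
income <- 100
Income <- 250

# They hold different values
income == Income
## [1] FALSE

# No object called INCOME has been created
exists("INCOME")
## [1] FALSE
```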

R overwrites objects

If you create an object and later assign new information to the same name, R will overwrite the old contents without asking for permission. Bear this in mind when creating and naming objects. The example below shows how we overwrite the object my_number:

my_number <- 100
my_number <- 999
my_number
## [1] 999

You can use the function ls() to print out all objects currently stored in your R environment:

ls()
## [1] "a"           "my_number"   "my_numbers"  "your_number"

R uses element-wise execution

Let’s create a new vector called new_numbers:

new_numbers <- c(1, 2, 3, 4, 5, 6)

We will now see what happens when we manipulate this object in different ways.

new_numbers/2
## [1] 0.5 1.0 1.5 2.0 2.5 3.0

As you can see above, R has divided each element by 2.

new_numbers-1
## [1] 0 1 2 3 4 5

As you can see above, R has subtracted 1 from each element.

Hence, when you manipulate a vector, R applies the same operation to each element in the vector.

If you use two or more vectors in an operation, R will line up the vectors and perform a sequence of individual operations. For example, when you run new_numbers * new_numbers, R lines up the two new_numbers vectors and then multiplies the first element of vector 1 by the first element of vector 2. R then multiplies the second element of vector 1 by the second element of vector 2, and so on, until every element has been multiplied. The result will be a new vector the same length as the first two, as shown below:

new_numbers*new_numbers
## [1]  1  4  9 16 25 36
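
Element-wise execution also works with two different vectors of the same length. The sketch below computes body mass index for three hypothetical people. The object names and values are made up for illustration, and round() is used only to shorten the printed result.

```r
# Made-up weights (kg) and heights (m) for three people
weight_kg <- c(60, 72, 85)
height_m <- c(1.65, 1.80, 1.75)

# Each weight is divided by the square of the height in the same position
bmi <- weight_kg / height_m^2

# Round to one decimal for display
round(bmi, 1)
## [1] 22.0 22.2 27.8
```

The first result uses the first weight and the first height, the second result uses the second pair, and so on.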

If you give R two vectors of unequal lengths, R will repeat the shorter vector until it is as long as the longer vector, and then do the math.

a <- c(2)
b <- c(4, 4, 4, 4)
a*b
## [1] 8 8 8 8
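
The same repetition happens when the shorter vector has more than one element. The sketch below uses made-up vectors; it also shows that R issues a warning when the length of the longer vector is not a multiple of the length of the shorter one.

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40)

# short is repeated as 1, 2, 1, 2 before multiplying
short * long
## [1] 10 40 30 80

# Lengths 3 and 2 do not match up evenly, so R warns but still computes
odd <- c(1, 2, 3)
odd * short
## Warning: longer object length is not a multiple of shorter object length
## [1] 1 4 3
```

Such a warning is usually a sign that two vectors you expected to have the same length do not, so it is worth checking your data when you see it.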

Scripts

An R script is simply a text file (with the extension .R) containing all the commands that you want to execute. These commands are usually consecutive, meaning that the lines are executed one at a time, from top to bottom. You will virtually always write your R code in longer scripts. To create a new R script, go to the main menu in RStudio and select File > New File > R Script. This will open a new text file, which you can then save via File > Save As (the file extension is .R). Here is an example of a short R script:

my_numbers <- c(1, 1, 1, 1)
your_numbers <- c(2, 2, 2, 2)
our_numbers <- my_numbers+your_numbers
our_mean <- mean(our_numbers)
our_mean
## [1] 3

You can run these lines one at a time by highlighting each line and pressing the Run button (or pressing Control+Return on Windows or Command+Return on Mac). You can also highlight all of the lines and run them in one go by pressing the Run button.

Scripts make your work reproducible. They enable you to rerun your code at any time and obtain the same result again. You can also adjust or correct your code and then rerun it. Consider the situation where you need to redo your entire analysis after excluding, e.g., all men from the study; this could be done by adjusting a line or two and then rerunning the script.
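
As a rough sketch of this idea, the short script below computes a mean age with or without men. The data, the object names and the exclude_men switch are all made up for illustration, and the script uses square brackets and an if statement, which are not covered in this chapter.

```r
# Made-up data: sex and age for five participants
sex <- c("man", "woman", "woman", "man", "woman")
age <- c(54, 61, 47, 70, 58)

# Change this single line to rerun the analysis with or without men
exclude_men <- TRUE

# Keep only the ages of participants who are not men, if requested
if (exclude_men) {
  age_used <- age[sex != "man"]
} else {
  age_used <- age
}

mean(age_used)
## [1] 55.33333
```

Setting exclude_men to FALSE and rerunning the script gives the mean age for all five participants instead.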