R Packages

Packages in R

What is an R package?

R packages contain collections of functions and tools for your research project. You can view a package as an extension to R; such extensions facilitate and expand the functionality of R. You will need functions that are not included in the default R installation, and there are over 15,000 R packages containing such functions. Installing a package usually takes only a few seconds. Packages are written by the R community, and packages published on CRAN (the Comprehensive R Archive Network) must pass automated checks before they are accepted, although the quality of individual packages still varies. Packages come with documentation describing their functions.

Some examples of popular R packages for research:

  • ggplot2 package for visualizing data in R.
  • dplyr package for data wrangling in R.
  • tidyr package for tidying data in R.
  • purrr package for functional programming (applying functions to vectors and lists) in R.
  • survival package for survival analysis in R.
  • lme4 package for creating mixed (random) effects models.
  • caret package for pre-processing data and for training and evaluating prediction models across all commonly used frameworks.
  • mlr3 package, another option for pre-processing data and for training and evaluating prediction models across all commonly used frameworks.

In summary: there is most likely an R package for whatever you want to do.

How to install an R package

Before using any R package you need to install the package and load it, so that you can use it in your current R session (a session is launched every time you start R). R comes with a few packages pre-installed. Such packages contain core functionality, e.g. basic mathematical operations, handling of data frames, etc. However, the vast majority of packages are not installed by default. To use them you need to download and install them. Note the following:

  • You only install a package once.
  • You must load a package every session you want to use it.
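
To see which packages come pre-installed with R, one option is to query the list of installed packages by priority. This is a minimal sketch assuming a standard R installation; the object name base_packages is only illustrative.

```r
# List the packages that belong to base R (priority "base") in this installation
base_packages <- rownames(installed.packages(priority = "base"))
print(base_packages)

# stats and utils are part of a standard R installation
c("stats", "utils") %in% base_packages
```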

Here is how you install the dplyr package:

install.packages("dplyr")

Note the quotation marks above.
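
Because a package only needs to be installed once, a script can check whether it is already present before calling install.packages(). The sketch below is one possible approach, not the only one; missing_packages() is a hypothetical helper name, and the example uses stats and utils (which ship with R) so that running it downloads nothing.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
# requireNamespace() returns TRUE/FALSE instead of raising an error.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("stats", "utils")  # replace with e.g. "dplyr" in your own script
to_install <- missing_packages(wanted)

# Only call install.packages() when something is actually missing
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

With this pattern the same script can be run repeatedly without reinstalling packages each time.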

How to update an R package

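You may occasionally need to update an R package. The first argument of update.packages() is the library location, not a package name, so the package to update is passed through the oldPkgs argument:

update.packages(oldPkgs = "dplyr")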
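
Before updating, it can help to check which version is currently installed. A small sketch using packageVersion(), shown here with the stats package only because it ships with R; substitute the package you are interested in.

```r
# Version of an installed package ("stats" is always available)
installed_version <- packageVersion("stats")
print(installed_version)

# package_version objects can be compared against a version string
installed_version >= "3.0.0"
```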

Installing packages in R using the click interface

You can also install packages in R without writing code. We do not recommend this, because it is preferable to have all operations and commands documented in the code. To install the rms package, which contains many useful functions for regression modeling, do the following:

  1. In the Files pane of RStudio:
    1. Click on the “Packages” tab
    2. Click on “Install”
    3. Type the name of the package under “Packages (separate multiple with space or comma):” In this case, type rms
    4. Click “Install”

However, the best way is to install the packages by writing install.packages("rms") in the source/script pane.

Let’s install some packages. Since we will be installing multiple packages, we will create a vector that includes the names of all desired packages. We’ll do this in two different ways, which yield identical results.

Method 1

# create an object called new_packages,
# which is a vector containing names of desired packages
new_packages <- c("dplyr", "ggplot2", "rms", "survival")

# install these packages
install.packages(new_packages)

Method 2

install.packages(c("dplyr", "ggplot2", "rms", "survival"))

Method 2 uses fewer lines and performs the same task. This is one of the beauties of R: you can nest function calls inside one another. In this case, c() is a function whose result is passed directly to the function install.packages().
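
The same nesting idea can be tried without installing anything. In this sketch length() stands in for install.packages() purely for illustration; the object names count_stored and count_nested are assumptions made for the example.

```r
# Method 1 style: store the vector first, then pass the object to a function
new_packages <- c("dplyr", "ggplot2", "rms", "survival")
count_stored <- length(new_packages)

# Method 2 style: call c() directly inside the other function
count_nested <- length(c("dplyr", "ggplot2", "rms", "survival"))

# Both approaches give the same result
identical(count_stored, count_nested)
```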

6.1.4 Loading packages in R

Every time you launch R a new session is started. This means that your current working environment is empty, as shown below.

[Figure: Load and install packages in RStudio]
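
You can also inspect the state of a session from the console. The following sketch uses search(), which lists what is currently attached; in a fresh session it shows only the default packages.

```r
# Packages and environments currently attached to the session, in search order
attached <- search()
print(attached)

# base is always attached; other installed packages appear only after library()
"package:base" %in% attached
```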

6.1.5 How to load a package in R

When you start RStudio, some packages are loaded by default. These are the basic packages (also called base R), i.e. packages containing fundamental functions. You will soon have hundreds of other packages installed, but they will not be loaded automatically when you start a new session. You need to load these packages manually every time you start a new session in RStudio. Loading packages is done using the library() function.

For example, to load the ggplot2 and dplyr packages, run the following commands in the Console pane:

library(ggplot2)
library(dplyr)

Note that quotation marks are not required when loading packages.
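
When the package names are stored in a character vector, library() needs the argument character.only = TRUE so that it treats its input as a string rather than as a literal package name. A minimal sketch, using stats and utils only because they are always installed; packages_to_load is an illustrative name.

```r
# Package names held as strings, e.g. read from a configuration
packages_to_load <- c("stats", "utils")

# character.only = TRUE tells library() that pkg contains the name as text
for (pkg in packages_to_load) {
  library(pkg, character.only = TRUE)
}

# Confirm that each package is now attached
all(paste0("package:", packages_to_load) %in% search())
```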

6.1.5.1 Errors when loading packages in R

R will return an error if you attempt to load a package which is not installed. We will now try to load a package called polish which is not installed:

library(polish)
Error in library(polish): there is no package called 'polish'
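
If a script should continue instead of stopping at this error, the call can be wrapped in tryCatch(). The sketch below is one way to do that; try_load() is a hypothetical helper written for this example, and what it returns for polish depends on whether that package happens to be installed on your machine.

```r
# Hypothetical helper: returns "loaded" on success,
# or the text of the error message if loading fails
try_load <- function(pkg) {
  tryCatch(
    {
      library(pkg, character.only = TRUE)
      "loaded"
    },
    error = function(e) conditionMessage(e)
  )
}

print(try_load("stats"))   # a package that ships with R
print(try_load("polish"))  # the package from the example above
```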

6.1.5.2 Successful loading of a package in R

R often prints a message when a package is successfully loaded, although many packages load silently. This message could include a note from the package author or other important information. Loading the package dplyr results in the following message:

library(dplyr)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

There are important notes in the message above. You will see similar messages often and you have to pay attention to them. The message says the following:

  • Loading dplyr resulted in masking of some functions in other packages.
  • Specifically, loading of dplyr resulted in masking of the functions filter and lag, which are contained in the stats package, as well as the functions intersect, setdiff, setequal and union, which are contained in the base package.
  • For example, if you use the filter() function now, R will apply the filter() function of dplyr, and not the filter() function of stats. You can, however, force R to use the filter() function of stats; to do so, you declare explicitly that you want the function from the stats package, as follows: stats::filter().
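
Masking can be reproduced without dplyr. In the sketch below a user-defined function named filter plays the role of the masking function (an assumption made so the example runs on any R installation); the package::function form still reaches the version in stats.

```r
x <- c(1, 2, 3, 4, 5)

# A function named filter in the workspace masks stats::filter,
# similar in effect to what happens when dplyr is loaded
filter <- function(...) "called the masking filter()"
filter(x)

# The explicit form reaches the stats version:
# here a centred moving average over three values
smoothed <- stats::filter(x, rep(1 / 3, 3))
print(smoothed)

# Remove the masking definition again
rm(filter)
```

Writing the package name explicitly also documents, for anyone reading the script, which implementation was intended.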

6.1.6 Tidyverse – The Revolution

R has traditionally been considered a difficult language to learn. Consider the situation where you wish to filter your data frame in order to keep a subset of your original observations. You may, for example, have a data frame with measurements on men and women and now wish to only keep the men for further analyses. In the old days, we used to write:

my_data_frame[my_data_frame$Sex == 'Males',]

  • my_data_frame is the name of the data frame.
  • Sex is a variable in that data frame.
  • The $ symbol is used to refer to a variable (column) in my_data_frame.
    • my_data_frame$Sex means that we wish to access Sex in my_data_frame.
  • We use brackets ([]) to subset the data frame and we apply the condition that Sex should equal “Males”.
  • We write a comma (,) after the condition. Inside the brackets, rows are specified before the comma and columns after it; leaving the column position empty keeps all columns.
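
The pieces described above can be run step by step on a small data frame. The data below are invented for illustration, and keep_row and men_only are names chosen for this sketch.

```r
# A small made-up data frame
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males", "Females"),
  Age = c(34, 29, 51, 42)
)

# The condition yields one TRUE/FALSE value per row
keep_row <- my_data_frame$Sex == "Males"
print(keep_row)

# Rows go before the comma, columns after it; an empty column slot keeps all columns
men_only <- my_data_frame[keep_row, ]
print(men_only)
```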

This is just one of many examples of R code which many users find difficult to write and read. The tidyverse contains numerous functions which simplify life in R and make R code considerably easier to read and write. Below follows the same code using dplyr, which is one of the packages included in the tidyverse:

filter(my_data_frame, Sex=="Males")
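
To check that the two styles select the same rows, they can be run side by side. This sketch uses invented data and guards the dplyr call with requireNamespace(), so it also runs (and simply prints a hint) when dplyr is not installed; base_result and dplyr_result are illustrative names.

```r
# A small made-up data frame
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males", "Females"),
  Age = c(34, 29, 51, 42)
)

# Base R subsetting
base_result <- my_data_frame[my_data_frame$Sex == "Males", ]

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr::filter() is called explicitly to avoid confusion with stats::filter()
  dplyr_result <- dplyr::filter(my_data_frame, Sex == "Males")
  # Row names differ between the two results, so compare a column instead
  print(identical(base_result$Age, dplyr_result$Age))
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") first.")
}
```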

To install the tidyverse you enter the following command:

install.packages("tidyverse")

This will download and install the tidyverse from CRAN. The tidyverse package is a collection of several packages, including:

  • ggplot2 for creating graphics
  • tibble for handling data frames
  • tidyr for tidying data frames
  • readr for importing data into R
  • purrr for applying functions in various ways
  • dplyr for manipulating data frames

6.1.7 Other R packages

You can view all R packages on CRAN: