Packages in R
What is an R package?
R packages contain collections of functions and tools for your research project. You can view a package as an extension to R; such extensions facilitate and expand the functionality of R. You will need functions that are not included in the default R installation, and there are over 15,000 R packages containing such functions. You can usually install a package within a few seconds. Packages are written by the R community, and most undergo validation by experts, which supports trust in their quality. R packages come with documentation for the functions they provide.
Some examples of popular R packages for research:
- ggplot2: package for visualizing data in R.
- dplyr: package for data wrangling in R.
- tidyr: package for tidying data in R.
- purrr: package for writing functions effectively in R.
- survival: package for survival analysis in R.
- lme4: package for creating mixed (random) effects models.
- caret: package for pre-processing data, and for developing and evaluating prediction models across commonly used modeling frameworks.
- mlr3: another package for pre-processing data, and for developing and evaluating prediction models across commonly used modeling frameworks.
In summary: there is most likely an R package for whatever you want to do.
How to install an R package
Before using an R package you need to install it and load it, so that you can use it in your current R session (a session is launched every time you start R). R comes with a few packages pre-installed. These contain core functionality, e.g. basic mathematical operations, handling of data frames, etc. However, the vast majority of packages are not installed by default. To use them you need to download and install them. Note the following:
- You only install a package once.
- You must load a package in every session in which you want to use it.
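You install a package with the install.packages() function, giving the package name as its argument.
Note the quotation marks around the package name; they are required.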
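As a minimal sketch of the "install once" rule, the helper below installs a package only when it is not already present. The name install_if_missing is a hypothetical helper, not part of R. The call uses stats, which ships with R, so running the example should not download anything.

```r
# Hypothetical helper (not part of base R): install a package only if it is missing.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package is already installed
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # nothing was installed
  }
  install.packages(pkg)       # note: pkg is a character string
  invisible(TRUE)             # an installation was attempted
}

# "stats" ships with R, so no installation is triggered here
was_installed_now <- install_if_missing("stats")
print(was_installed_now)
```

Calling install_if_missing("rms") instead would download rms from CRAN the first time and do nothing on later calls.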
How to update an R package
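You may occasionally need to update an R package, for instance to get bug fixes or new features. Base R provides the update.packages() function for this; running install.packages() again for a single package also installs the latest available version.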
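A short sketch of how this might look in practice. packageVersion() reports the version that is currently installed. The update call is left commented out because it needs an internet connection and can take a while; ask = FALSE is an optional argument that skips the confirmation prompts. The variable name current_version is illustrative.

```r
# Check which version of a package is currently installed
# ("stats" ships with R, so this works in any installation)
current_version <- packageVersion("stats")
print(current_version)

# Update all installed packages that have a newer version available.
# Commented out here because it requires internet access:
# update.packages(ask = FALSE)
```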
Installing packages in R using the click interface
You can also install packages in R without writing code. We do not recommend this, because it is preferable to have all operations and commands documented in code. To install the rms package, which contains many useful functions for regression modeling, do the following:
- In the Files pane of RStudio:
- Click on the “Packages” tab
- Click on “Install”
- Type the name of the package under “Packages (separate multiple with space or comma)”. In this case, type rms
- Click “Install”
However, the best way is to install packages by writing install.packages("rms") in the source/script pane.
Let’s install some packages. Since we will be installing multiple packages, we will create a vector that contains the names of all desired packages. We’ll do this in two different ways, which yield identical results.
# Method 1: create an object called new_packages,
# which is a vector containing names of desired packages
new_packages <- c("dplyr", "ggplot2", "rms", "survival")
# install these packages
install.packages(new_packages)

# Method 2: pass the vector directly to install.packages()
install.packages(c("dplyr", "ggplot2", "rms", "survival"))
Method 2 takes fewer lines and performs the same task. This is one of the beauties of R: you can nest function calls inside other function calls. In this case, c() is a function whose call is nested inside the call to install.packages().
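Because install.packages() accepts a character vector, the same vector can be filtered first so that only packages not yet installed are requested. The sketch below is one possible way to do that with base R; missing_packages is an illustrative name, and the install call is commented out because it needs internet access.

```r
# Names of the packages we want to have available
new_packages <- c("dplyr", "ggplot2", "rms", "survival")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names
already_installed <- rownames(installed.packages())

# Keep only the names that are not installed yet
missing_packages <- setdiff(new_packages, already_installed)
print(missing_packages)

# Install only what is missing (guarded, because install.packages()
# should not be called with an empty vector in a script):
# if (length(missing_packages) > 0) install.packages(missing_packages)
```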
6.1.4 Loading packages in R
Every time you launch R, a new session is started. This means that your working environment is empty: objects you created and packages you loaded in an earlier session are not carried over automatically.
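To inspect what a session currently contains, base R offers ls() and search(). This is a small sketch; the variable names are illustrative.

```r
# Objects you have created in the current session (the global environment).
# In a freshly started session this is typically empty.
objects_in_workspace <- ls(envir = globalenv())
print(objects_in_workspace)

# Packages (and other environments) currently attached to the session.
# Only the default packages appear here until you call library().
attached <- search()
print(attached)
```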
6.1.5 How to load a package in R
When you start RStudio, some packages are loaded by default. These are the basic packages (also called R base), i.e. packages containing fundamental functions. You will soon have hundreds of other packages installed, but they will not be loaded automatically when you start a new session. You need to load these packages manually every time you start a new session in RStudio. Loading packages is done using the library() function.
For example, to load the dplyr package, run the following command in the Console pane: library(dplyr)
Note that quotation marks are not required when loading packages.
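A minimal sketch of loading a package, using tools as a stand-in because it ships with R but is not attached by default, so the example runs without installing anything. Both forms of the call are accepted by library().

```r
# Load (attach) the tools package; quotation marks are optional here
library(tools)

# The quoted form works as well and does no harm if already loaded
library("tools")

# Confirm that the package is now attached to the session
is_attached <- "package:tools" %in% search()
print(is_attached)
```

The same pattern applies to library(dplyr) once dplyr has been installed.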
6.1.5.1 Errors when loading packages in R
R will return an error if you attempt to load a package which is not installed. We will now try to load a package called polish, which is not installed, by running library(polish):
Error in library(polish): there is no package called 'polish'
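If a script should keep running when a package is missing, the error can be caught with tryCatch(). The sketch below uses a made-up package name, notARealPackage123, which is assumed not to be installed; character.only = TRUE tells library() that the name is stored in a variable.

```r
# Hypothetical package name, assumed not to exist on this machine
pkg <- "notARealPackage123"

outcome <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  # library() signals an error when the package is not installed
  error = function(e) "not installed"
)

print(outcome)
```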
6.1.5.2 Successful loading of a package in R
R may print a message when a package is successfully loaded. This could be a note from the package author or other important information, such as name conflicts with packages that are already loaded. Loading the package dplyr results in the following message:
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
There are important notes in the message above. You will see similar messages often and you have to pay attention to them. The message says the following:
- Loading dplyr resulted in masking of some functions in other packages.
- Specifically, loading dplyr resulted in masking of the functions filter() and lag(), which are contained in the stats package, as well as the functions intersect(), setdiff(), setequal() and union(), which are contained in the base package.
- For example, if you use the filter() function now, R will apply the filter() function of dplyr, and not the filter() function of stats. You can, however, force R to use the filter() function of stats; to do so, you declare explicitly that you want the function from the stats package by prefixing it with the package name and two colons: stats::filter()
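The package::function() notation works whether or not another package has masked the name. A small sketch using only base R: stats::filter() applies a linear filter to a numeric series, here a three-point moving average. The variable names are illustrative.

```r
# A short numeric series
x <- c(1, 2, 3, 4, 5)

# Explicitly call filter() from the stats package.
# rep(1/3, 3) gives equal weights, i.e. a three-point moving average.
smoothed <- stats::filter(x, rep(1/3, 3))

# The first and last values are NA because the window does not fit there
print(smoothed)
```

Writing dplyr::filter() in the same way would select the dplyr version explicitly, which can make scripts easier to read when both packages are in use.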
6.1.6 Tidyverse – The Revolution
R has traditionally been considered a difficult language to learn. Consider the situation where you wish to filter your data frame in order to keep a subset of your original observations. You may, for example, have a data frame with measurements on men and women and now wish to keep only the men for further analyses. In the old days, we used to write:
my_data_frame[my_data_frame$Sex == 'Males',]
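In this code:
- my_data_frame is the name of the data frame.
- Sex is a variable in that data frame.
- The $ symbol is used to refer to a variable (column) in my_data_frame.
- my_data_frame$Sex means that we wish to access Sex in my_data_frame.
- We use brackets ([ ]) to subset the data frame and we apply the condition that Sex should equal “Males”.
- We write a comma (,) after “Males”. Inside the brackets, rows are specified before the comma and columns after it; leaving the column position empty keeps all columns.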
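The pieces listed above can be tried out on a small data frame. The data below are made up for illustration; only the column name Sex and the value "Males" come from the text.

```r
# A toy data frame (values are invented for this example)
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males"),
  Age = c(54, 61, 47)
)

# $ extracts one column as a vector
my_data_frame$Sex

# The comparison yields one TRUE/FALSE value per row
keep <- my_data_frame$Sex == "Males"

# Rows go before the comma; an empty column position keeps all columns
men <- my_data_frame[keep, ]

# Naming a column after the comma keeps only that column
men_age <- my_data_frame[keep, "Age"]

print(men)
print(men_age)
```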
This is just one of many examples of R code which most users find difficult to write and read. The tidyverse contains numerous functions which simplify life in R. Indeed, the tidyverse makes R code considerably easier to read and write. With dplyr, which is one of the packages included in the tidyverse, the same subsetting can be written as filter(my_data_frame, Sex == 'Males')
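To compare the two styles side by side, the sketch below runs the base R version and, if dplyr happens to be installed, the dplyr version on the same made-up data. The requireNamespace() guard is an addition for this example so that the code still runs on a machine without dplyr.

```r
# A toy data frame (values are invented for this example)
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males"),
  Age = c(54, 61, 47)
)

# Base R: bracket subsetting
men_base <- my_data_frame[my_data_frame$Sex == "Males", ]
print(men_base)

# dplyr: the condition refers to the column name directly.
# Only runs when dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  men_dplyr <- dplyr::filter(my_data_frame, Sex == "Males")
  print(men_dplyr)
}
```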
To install the tidyverse, enter the following command: install.packages("tidyverse")
This will download and install tidyverse from CRAN. The tidyverse package actually includes several packages:
- ggplot2 for creating graphics
- tibble for handling data frames
- tidyr for tidying data frames
- readr for importing data into R
- purrr for applying functions in various ways
- dplyr for manipulating data frames
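One way to check which of the packages listed above are already available on your machine is sketched below, using only base R. The names core_packages and is_installed are illustrative. Note that requireNamespace() loads a package's namespace in the background when it is found, but does not attach it to the session.

```r
# The packages named in the list above
core_packages <- c("ggplot2", "tibble", "tidyr", "readr", "purrr", "dplyr")

# TRUE for each package that is installed, FALSE otherwise
is_installed <- vapply(
  core_packages,
  requireNamespace,
  logical(1),
  quietly = TRUE
)

print(is_installed)
```

Once the tidyverse is installed, library(tidyverse) loads these packages in a single call.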
6.1.7 Other R packages
You can view all R packages on CRAN: