Packages in R
What is an R package?
R packages are collections of functions and tools for your research project. You can view a package as an extension that expands the functionality of R. You will frequently need functions which are not included in the default R installation, and there are over 14,000 R packages which contain such functions. Installing a package usually takes only a few seconds. Packages are written by the R community, and packages distributed through CRAN must pass automated checks before they are published, which gives a baseline of quality. You can also inspect the entire source code of any R package (although novices rarely do this).
Some of the most popular R packages for research are as follows:
- ggplot2: package for visualizing data in R.
- dplyr: package for data wrangling in R.
- tidyr: package for tidying data in R.
- purrr: package for writing functions effectively in R.
- survival: package for survival analysis in R.
- lme4: package for creating mixed (random) effects models.
How to install an R package
Before using an R package you need to install it and then load it, so that you can use it in your current R session (a session is launched every time you start R). R comes with a few packages pre-installed. These packages contain core functionality, such as functions for basic mathematical operations, handling of data frames, etc. However, the vast majority of packages are not installed by default. To use them you need to download and install them. You only need to install a package once. Packages are installed with the install.packages() function.
Note that when you install an R package, the package name must be enclosed in quotation marks, for example install.packages("rms").
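The installation step can be sketched as follows. This is a minimal sketch, assuming an internet connection and a configured CRAN mirror. The helper name install_if_missing is hypothetical and not part of R; it only wraps install.packages() so that packages which are already installed are skipped.

```r
# Hypothetical helper (not part of R): install only the packages that are
# not yet present in the library.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    # Package names are passed as quoted character strings
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)
}

# 'stats' ships with R, so nothing is downloaded here
install_if_missing("stats")

# Typical use (requires internet access):
# install_if_missing("rms")
```

Calling install.packages("rms") directly does the same job; the wrapper merely avoids re-downloading a package that is already installed.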
How to update an R package
You may occasionally need to update an R package. This is done with the update.packages() function.
You can also update a package by installing it again (R will overwrite the old version of the package).
Sometimes package authors change how functions (in that particular package) are used. This may cause errors when executing your code after updating the package. This is fortunately easy to resolve by scrutinizing the new package documentation. Every R package comes with detailed documentation.
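A hedged sketch of the update step, assuming dplyr is already installed and that CRAN is reachable. The network calls are wrapped in interactive() so that the block can be sourced safely in a script; the oldPkgs and ask arguments restrict the update to one package and skip the confirmation prompt.

```r
# Version of a package that is currently installed (works offline);
# 'stats' ships with R and is used here only so the line runs anywhere
current_version <- packageVersion("stats")
current_version

# The calls below contact CRAN, so they only run in an interactive session
if (interactive()) {
  # Update a single package without being asked for confirmation
  update.packages(oldPkgs = "dplyr", ask = FALSE)

  # Open the documentation index of the package after updating
  help(package = "dplyr")
}
```

Comparing packageVersion() before and after the update is a quick way to confirm that a newer version was actually installed.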
Installing packages in R using the click interface
You can actually install packages in R without writing code. We do not recommend this, as it is always preferable to have all operations/commands documented in the code. However, for the sake of completeness we will guide you through this process as well. To install the rms package, which contains many useful functions for regression modelling, do the following:
- In the Files pane of RStudio:
- Click on the “Packages” tab
- Click on “Install”
- Type the name of the package under “Packages (separate multiple with space or comma):”. In this case, type rms.
- Click “Install”
However, the best way is to install packages by writing install.packages("rms") in the source/script pane.
Let’s install some packages. Since we will be installing multiple packages, we will create a vector which includes the names of all desired packages. We’ll do this in two different ways, which yield identical results.
Method 1:
# create an object called new_packages,
# which is a vector containing names of all desired packages
new_packages <- c("dplyr", "ggplot2", "rms", "survival")
# install these packages
install.packages(new_packages)
Method 2:
install.packages(c("dplyr", "ggplot2", "rms", "survival"))
Method 2 takes fewer lines and performs the same task. This is one of the beauties of R: you can embed functions within other functions! In this case, c() is a function which is nested inside the function install.packages().
In Method 1 (above) there are three lines starting with the symbol #, and those lines are comments. A comment is simply a note that explains the code so that you and your co-workers can remember and understand it. Comments are not interpreted by R.
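The equivalence of the two methods can be checked without installing anything. The short sketch below only builds the vector of names and shows that nesting c() inside another function gives the same result as storing it in an object first; length() is used here purely as a harmless stand-in for install.packages().

```r
# Method 1: store the names in an object first
new_packages <- c("dplyr", "ggplot2", "rms", "survival")

# Method 2 passes the result of c() straight to the outer function;
# the vector it builds is the same one
identical(new_packages, c("dplyr", "ggplot2", "rms", "survival"))  # TRUE

# Nesting works with any function, e.g. length() around c()
length(c("dplyr", "ggplot2", "rms", "survival"))  # 4
```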
Loading packages in R
Every time you launch R, a new session is started. This means that your current environment is empty, as shown below.
When you start RStudio, some packages are loaded by default. These are the basic packages (also called base R), i.e., packages containing fundamental functions. You may have, or will soon have, hundreds of other packages installed, but they will not be loaded by default (refer to the screenshot above). You need to load these packages manually every time you start a new session in RStudio. Loading packages is done using the library() function.
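To see which packages are loaded in the current session, something like the following can be used. This is a small sketch relying on the base functions search() and .packages(); the exact list depends on your R configuration.

```r
# Packages (and other environments) attached to the current session,
# in the order in which R searches them
search()

# The same information as plain package names
attached <- .packages()
attached

# Base packages such as 'base' are attached at start-up
"base" %in% attached
```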
How to load a package in R
After you’ve installed a package, you can load it using the library() command. For example, to load the dplyr package, run library(dplyr) in the Console pane.
Note that quotation marks are not needed when loading packages!
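A brief illustration of loading, using the tools package only because it ships with every R installation and therefore runs without installing anything first. The same pattern applies to dplyr once it is installed.

```r
# Both forms load the same package
library(tools)      # unquoted name (the usual style)
library("tools")    # a quoted name is also accepted

# Confirm that the package is now attached to the session
"package:tools" %in% search()

# Same pattern for an installed add-on package:
# library(dplyr)
```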
Errors when loading packages in R
R will return an error if you attempt to load a package which is not installed. We will now try to load a package called polish, which is not installed, using library(polish):
Error in library(polish): there is no package called 'polish'
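If you want a script to cope with a missing package instead of stopping, one option is sketched below. The package name is hypothetical and chosen so that it is very unlikely to be installed; requireNamespace() and tryCatch() are base R functions.

```r
# Hypothetical package name, assumed not to be installed
pkg <- "notARealPackage123"

# requireNamespace() returns FALSE instead of stopping with an error
is_installed <- requireNamespace(pkg, quietly = TRUE)
is_installed

# library() stops with an error; tryCatch() captures the message instead
load_message <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
load_message
```

Note the character.only = TRUE argument, which is needed because the package name is stored in a variable rather than typed directly.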
Successful loading of a package in R
R will return a message when a package is successfully loaded. This message could include a message from the package author or other important information. Loading the package dplyr results in the following message:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
There are important notes in the message above. You will see similar messages often and you have to pay attention to them. The message says the following:
- Loading dplyr resulted in masking of some functions in other packages.
- Specifically, loading dplyr resulted in masking of the functions filter and lag, which are located in the stats package, as well as the functions intersect, setdiff, setequal and union, which are located in the base package.
Although masking may sound as if the functions are inactivated, they are actually still accessible, but you have to specify the package name when using masked functions. Hence, if you want to use the filter function from the stats package after loading dplyr, you have to write stats::filter().
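The package prefix can be tried out directly. The sketch below uses made-up numbers and runs whether or not dplyr is loaded, because the prefix always selects the named package's version of the function.

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; here a centred 3-point
# moving average. The prefix guarantees that the stats version is used
# even when dplyr is attached.
moving_avg <- stats::filter(x, rep(1 / 3, 3))
moving_avg

# The same prefix works for any package, masked or not
combined <- base::union(c(1, 2), c(2, 3))
combined
```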
Errors when using functions in not yet loaded packages
R will return an error if you attempt to use functions in packages which are not loaded. As stated above, you have to load each package you want to use every time you start a new RStudio session. In the example below, we try to use the function called splined(), which is located in a package not currently loaded.
Error in splined(1) : could not find function "splined"
This error message simply means that R cannot find the function in any of the currently loaded packages.
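A quick way to check whether R can find a function, and to call a function from an installed package that has not been loaded, is sketched below. It assumes that no object named splined exists in your session; tools::file_ext() is used only as an illustration of the package prefix.

```r
# exists() reports whether R can find an object with this name
exists("splined")

# Calling a function with the package prefix works without library();
# the package only needs to be installed
tools::file_ext("measurements.csv")
```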
Tidyverse – A revolution in the R world
R used to be considered a difficult language, largely because its syntax can be rather complicated. Consider the situation where you wish to filter your data frame in order to keep a subset of your original observations. You may, for example, have a data frame with measurements on men and women and now wish to keep only the men for further analyses. In the old days, we used to write:
my_data_frame[my_data_frame$Sex == 'Males',]
- my_data_frame is the name of the data frame.
- Sex is a variable in that data frame.
- The $ symbol is used to refer to a variable (column) in a data frame; my_data_frame$Sex means that we wish to access Sex in my_data_frame.
- We use brackets ([ ]) to subset the data frame and we apply the condition that Sex should equal “Males”.
- We write a comma (,) after “Males”. The condition before the comma selects rows; leaving the part after the comma empty keeps all columns.
This is just one of many examples of R code which most users find difficult to write and read. The tidyverse contains packages which simplify coding in R. Indeed, the tidyverse makes it much easier to read and write R code. With dplyr, which is one of the packages included in the tidyverse, the same operation is written as filter(my_data_frame, Sex == "Males").
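The two styles can be compared side by side on a small, made-up data frame (the values are invented for illustration). The dplyr part runs only if dplyr is installed and is called with the dplyr:: prefix so that no library() call is needed.

```r
# Small made-up data frame, for illustration only
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males", "Females"),
  Age = c(54, 61, 47, 58)
)

# Base R: the logical condition before the comma selects rows
men_base <- my_data_frame[my_data_frame$Sex == "Males", ]
men_base

# dplyr: the same subset, run only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  men_dplyr <- dplyr::filter(my_data_frame, Sex == "Males")
  # Both approaches keep the same observations
  print(identical(men_base$Age, men_dplyr$Age))
}
```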
To install the tidyverse, enter the command install.packages("tidyverse"). This will download and install the tidyverse from CRAN. The tidyverse package includes several packages, among them ggplot2, dplyr, tidyr and purrr.
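A hedged sketch of installing the tidyverse and listing its member packages. It assumes internet access for the installation step, which is only attempted in an interactive session; tidyverse_packages() is a function provided by the tidyverse package itself.

```r
# Install the tidyverse only if it is missing (requires internet access)
if (!requireNamespace("tidyverse", quietly = TRUE) && interactive()) {
  install.packages("tidyverse")
}

# List the packages that belong to the tidyverse, if it is installed
tidyverse_available <- requireNamespace("tidyverse", quietly = TRUE)
if (tidyverse_available) {
  print(tidyverse::tidyverse_packages())
}
```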
Other R packages
You can browse all available R packages on CRAN (the Comprehensive R Archive Network).