Packages in R

Johan Svensson, February 22, 2020

What is an R package?

R packages are collections of functions and tools for your research project. You can think of a package as an extension that expands the functionality of R. You will frequently need functions that are not included in the default R installation, and there are over 14,000 R packages that provide such functions. Installing a package usually takes only a few seconds. Packages are written by members of the R community, and most undergo validation by experts so that the quality can be trusted. You can also inspect the entire source code of any R package (although novices rarely do this).

Some of the most popular R packages for research are as follows:

  • ggplot2 package for visualizing data in R.
  • dplyr package for data wrangling in R.
  • tidyr package for tidying data in R.
  • purrr package for functional programming in R (applying functions to lists and vectors).
  • survival package for survival analysis in R.
  • lme4 package for creating mixed (random) effects models.

How to install an R package

Before using an R package you need to install it and then activate (load) it so that you can use it in your current R session (a new session is launched every time you start R). R comes with a few packages pre-installed. These contain core functionality, such as functions for basic mathematical operations and for handling data frames. However, the vast majority of packages are not installed by default. To use them you need to download and install them. You only need to install a package once. Here is how you install the dplyr package:

install.packages("dplyr")

Note that when you install an R package you must put the package name in quotation marks, as demonstrated above.

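How to update an R package

You may occasionally need to update an R package. This is done by executing the following command, where the oldPkgs argument names the package to update (the first unnamed argument of update.packages() is a library path, not a package name):

update.packages(oldPkgs = "dplyr")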

You can also update a package by installing it again (R will overwrite the old version of the package).

Sometimes package authors change how functions (in that particular package) are used. This may cause errors when executing your code after updating the package. This is fortunately easy to resolve by scrutinizing the new package documentation. Every R package comes with detailed documentation.
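When code breaks after an update, it helps to know which version of a package you are running. The following is a minimal sketch using packageVersion() from base R. It uses the stats package only because it ships with R and is therefore always available; you would substitute the package you care about, such as "dplyr". The version number "3.0.0" is an arbitrary value chosen for illustration.

```r
# Look up the installed version of a package.
# "stats" is used here only because it is always installed with R.
installed_version <- packageVersion("stats")
print(installed_version)

# Versions can be compared against a version string.
# "3.0.0" is an arbitrary example threshold.
is_recent_enough <- installed_version >= "3.0.0"
print(is_recent_enough)
```

Writing the version down in your script or notes makes it easier to find the matching documentation later.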

Installing packages in R using the click interface

You can also install packages in R without writing code. We do not recommend this, as it is always preferable to have all operations/commands documented in the code. However, for the sake of completeness we will guide you through this process as well. To install the rms package, which contains many useful functions for regression modelling, do the following:

  1. In the Files pane of RStudio:
    1. Click on the “Packages” tab
    2. Click on “Install”
    3. Type the name of the package under “Packages (separate multiple with space or comma):” In this case, type rms
    4. Click “Install”

However, the best way is to install the packages by writing install.packages("rms") in the source/script pane.

Let’s install some packages. Since we will be installing multiple packages, we will create a vector which includes the names of all desired packages. We’ll do this in two different ways, which yield identical results.

Method 1:

# create an object called new_packages,
# which is a vector containing names of all desired packages
new_packages <- c("dplyr", "ggplot2", "rms", "survival")

# install these packages
install.packages(new_packages)

Method 2:

install.packages(c("dplyr", "ggplot2", "rms", "survival"))

Method 2 takes fewer lines and performs the same task. This is one of the beauties of R: you can nest functions within other functions! In this case c() is a function, which is called inside the function install.packages().

In Method 1 (above) there are three lines starting with the symbol #, and those lines are comments. A comment is simply a note that explains the code so that you and your co-workers can remember and understand it. Comments are not interpreted by R.
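Building on the vector approach above, here is a small sketch that installs only those packages that are not already installed, which avoids reinstalling everything each time the script is run. The helper name find_missing_packages is hypothetical (it is not part of R); it relies on installed.packages() and setdiff(), both of which are included in the default R installation. The interactive() check is an added precaution, assumed here so that nothing is downloaded when the script runs unattended.

```r
# Hypothetical helper: return the names in `pkgs` that are not yet installed.
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

new_packages <- c("dplyr", "ggplot2", "rms", "survival")
to_install <- find_missing_packages(new_packages)
print(to_install)

# Install only what is missing, and only when R is used interactively
# (for example in the RStudio console).
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```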

Loading packages in R

Every time you launch R a new session is started. This means that your current environment is empty, as shown below.

When you start RStudio some packages are loaded by default. These are the basic packages (also called R base), i.e. packages containing fundamental functions. You may have, or will soon have, hundreds of other packages installed, but they will not be loaded by default (refer to the screenshot above). You need to load these packages manually every time you start a new session in RStudio. Loading packages is done using the library() command.

How to load a package in R

After you’ve installed a package, you can now load it using the library() command. For example, to load the ggplot2 and dplyr packages, run the following commands in the Console pane:

library(ggplot2)
library(dplyr)

Note that quotation marks are not needed when loading packages!
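To check which packages are currently loaded in your session, you can use search() from base R, which lists everything attached to the session. This is a small sketch; the exact list you see depends on your setup and on which packages you have loaded so far.

```r
# List everything that is attached in the current session.
attached <- search()
print(attached)

# Check whether a specific package is attached.
# "package:base" is always present; after library(dplyr) you would
# also find "package:dplyr" in this list.
base_is_attached <- "package:base" %in% attached
print(base_is_attached)
```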

Errors when loading packages in R

R will return an error if you attempt to load a package which is not installed. We will now try to load a package called polish which is not installed:

library(polish)
Error in library(polish): there is no package called 'polish'
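If you want a script to handle this situation instead of stopping with an error, one option is to check first whether the package is installed. The sketch below uses requireNamespace() from base R, which returns TRUE or FALSE rather than raising an error. The variable name is_installed and the wording of the message are illustrative choices only.

```r
# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
is_installed <- requireNamespace("polish", quietly = TRUE)

if (is_installed) {
  library(polish)
} else {
  message("The package 'polish' is not installed, so it cannot be loaded.")
}
```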

Successful loading of a package in R

R will return a message when a package is successfully loaded. This message could include a message from the package author or other important information. Loading the package dplyr results in the following message:

library(dplyr)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

There are important notes in the message above. You will see similar messages often and you have to pay attention to them. The message says the following:

  • Loading dplyr resulted in masking of some functions in other packages.
  • Specifically, loading dplyr resulted in masking of the functions filter and lag, which are located in the stats package, as well as the functions intersect, setdiff, setequal and union, which are located in the base package.

Although masking may sound as if the functions are inactivated, they are actually still accessible but you have to specify the package name when using masked functions. Hence, if you – after loading dplyr – want to use the filter function from the stats package, then you have to write stats::filter.
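The package::function notation can be tried out directly. The sketch below first asks R which package the bare name filter currently points to, and then calls the stats version explicitly. The input 1:5 and the three equal weights are made-up values for illustration; with them, stats::filter() computes a three-point moving average.

```r
# Which package does the bare name `filter` currently resolve to?
# In a fresh session this is "stats"; after library(dplyr) it is "dplyr".
print(environmentName(environment(filter)))

# Call the stats version explicitly, regardless of what is masked.
# Here it computes a centered 3-point moving average of 1, 2, 3, 4, 5.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
print(moving_avg)
```

The first and last values are NA because a centered average needs a neighbor on both sides.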

Errors when using functions in not yet loaded packages

R will return an error if you attempt to use functions in packages which are not loaded. As stated above, you have to load each package you want to use every time you start a new RStudio session. In the example below, we try to use the function called splined(), which is located in a package not currently loaded.

splined(1)
Error in splined(1) : could not find function "splined"

This error message simply means that R cannot find the function in any of the currently loaded packages.
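Two small checks can help here. The sketch below uses exists() to ask whether R can currently find a function, and then shows that a function from an installed but not loaded package can still be called with the package::function notation. The tools package is used as the example only because it ships with R but is not loaded by default; the input string is made up.

```r
# Can R find a function called "splined" right now?
# This is FALSE unless a loaded package or your own code defines it.
splined_found <- exists("splined")
print(splined_found)

# A function in an installed package can be called without library()
# by prefixing it with the package name.
title <- tools::toTitleCase("data science")
print(title)
```

For a single call the prefix is convenient; if you use many functions from the same package, loading it with library() is usually easier to read.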

Tidyverse – A revolution in the R world

R used to be considered a difficult language because its syntax can be rather complicated. Consider the situation where you wish to filter your data frame in order to keep a subset of your original observations. You may, for example, have a data frame with measurements on men and women and now wish to keep only the men for further analyses. In the old days, we used to write:

my_data_frame[my_data_frame$Sex == 'Males',]
  • my_data_frame is the name of the data frame.
  • Sex is a variable in that data frame.
  • The $ symbol is used to refer to a variable (column) in my_data_frame.
    • my_data_frame$Sex means that we wish to access Sex in my_data_frame.
    • We use brackets ([]) to subset the data frame and we apply the condition that Sex should equal “Males”.
    • We write a comma (,) after “Males”.

This is just one of many examples of R code which many users find difficult to write and read. The tidyverse contains packages which simplify coding in R and make R code considerably easier to read and write. Below follows the same code using dplyr, which is one of the packages included in the tidyverse:

filter(my_data_frame, Sex == "Males")
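To try both versions yourself, here is a self-contained sketch. The data frame and its values are made up for illustration; only the column name Sex and the value "Males" come from the example above. The dplyr part is wrapped in a check so that the code also runs if dplyr has not been installed yet.

```r
# A small made-up data frame for illustration.
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males", "Females"),
  Age = c(34, 29, 51, 46)
)

# Base R: subset the rows where Sex equals "Males".
men_base <- my_data_frame[my_data_frame$Sex == "Males", ]
print(men_base)

# dplyr: the same subset, run only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  men_dplyr <- dplyr::filter(my_data_frame, Sex == "Males")
  print(men_dplyr)
}
```

Both versions keep the same two rows; the only visible difference is that base R keeps the original row numbers while dplyr renumbers them.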

To install the tidyverse you enter the following command:

install.packages("tidyverse")

This will download and install the tidyverse from CRAN. The tidyverse package includes several packages, among them:

  • ggplot2
  • tibble
  • tidyr
  • readr
  • purrr
  • dplyr
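Once installed, the packages listed above can be loaded together with a single library(tidyverse) call instead of one library() call per package. The sketch below checks first whether the tidyverse is installed so that it does not stop with an error; the variable name has_tidyverse and the message text are illustrative choices.

```r
# Check whether the tidyverse is installed before trying to load it.
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (has_tidyverse) {
  # Loads ggplot2, dplyr, tidyr and the other core packages in one step.
  library(tidyverse)
} else {
  message("The tidyverse is not installed yet; install it before loading.")
}
```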

Other R packages

You can view all R packages on CRAN:
