by Steve Belcher, Sr Technical Specialist, Microsoft Data & AI
In some companies, R users can’t download R packages from CRAN. That might be because they work in an environment that’s isolated from the internet, or because company policy dictates that only specific R packages and/or package versions may be used. In this article, we share some ways you can set up a private R package repository you can use as a source of R packages.
When access to the internet is limited, or package zip files may not be downloaded, the best way to maintain R packages for the corporation is to implement a custom package repository. This gives the company the most flexibility to ensure that only authorized and secure packages are available to the firm’s R users. You can use a custom repository with R downloaded from CRAN, with Microsoft R Open, with Microsoft R Client and Microsoft ML Server, or with self-built R binaries.
Setting Up a Package Repository
One of the strengths of the R language is the thousands of third-party packages that have been made publicly available via CRAN, the Comprehensive R Archive Network. R includes several functions that make it easy to download and install these packages. However, in many enterprise environments, access to the Internet is limited or non-existent. In such environments, it is useful to create a local package repository that users can access from within the corporate firewall.
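From the user's side, pointing R at an internal repository comes down to setting the `repos` option. The following is a minimal sketch; the URL `https://r-repo.example.com/cran` is a hypothetical placeholder, so substitute the location your repository is actually served from (a `file:///` URL to a shared drive also works).

```r
# Hypothetical internal repository URL (an assumption, not a real server).
local_repo <- "https://r-repo.example.com/cran"

# Make the internal repository the default for this session.
# Placing the same call in Rprofile.site applies it to every user of that
# R installation.
options(repos = c(CRAN = local_repo))

# install.packages() and available.packages() now consult only local_repo.
# install.packages("data.table")   # uncomment once the repository exists
print(getOption("repos"))
```

Because the option replaces the default CRAN entry rather than adding to it, users cannot accidentally fall back to a public mirror.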
Your local repository may contain source packages, binary packages, or both. If at least some of your users will be working on Windows systems, you should include Windows binaries in your repository. Windows binaries are R-version-specific; if you are running R 3.3.3, you need Windows binaries built under R 3.3. These versioned binaries are available from CRAN and other public repositories. If at least some of your users will be working on Linux systems, you must include source packages in your repository.
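The version dependence shows up in the directory layout of a CRAN-like repository: source packages live under `src/contrib`, while Windows binaries live under a directory named for the R minor version. The sketch below illustrates this; the helper `repo_layout()` and the root path `/srv/R/repo` are hypothetical names chosen for the example.

```r
# Return the directories a CRAN-like repository uses for source packages
# and for Windows binaries built for a given R minor version (e.g. "3.3").
# repo_layout() is an illustrative helper, not part of base R.
repo_layout <- function(root, r_version) {
  c(
    source     = file.path(root, "src", "contrib"),
    win_binary = file.path(root, "bin", "windows", "contrib", r_version)
  )
}

# R 3.3.3 looks for Windows binaries under the "3.3" directory.
r_full  <- "3.3.3"
r_minor <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", r_full)
layout  <- repo_layout("/srv/R/repo", r_minor)   # hypothetical root path
print(layout)

# utils::contrib.url() computes the source location from a repository URL.
src_url <- contrib.url("file:///srv/R/repo", type = "source")
print(src_url)
```

Calling `contrib.url()` with `type = "win.binary"` returns the binary location for whichever R version is running, which is why each supported R version needs its own binary directory.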
The main CRAN repository only includes Windows binaries for the current and prior release of R, but you can find packages for older versions of R at the daily CRAN snapshots archived by Microsoft at MRAN. This is also a convenient source of older versions of binary packages for current R releases.
There are two ways to create the package repository: either mirror an existing repository, or create a new repository and populate it with just those packages you want to be available to your users. Maintaining a local mirror of an existing repository is typically easier and less error-prone. However, the entire set of packages available on CRAN is large, and if disk space is a concern you may want to restrict yourself to a subset of the available packages. Managing your own repository also gives you complete control over what is made available to your users.
Creating a Repository Mirror
Maintaining a repository mirror is easiest if you can use the rsync tool; this is available on most Linux systems and is available for Windows users as part of the Rtools collection. We will use rsync to copy packages from the original repository to your private repository.
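As a rough sketch of what the rsync step might look like when driven from R, the helper below assembles the rsync arguments for mirroring one subdirectory. The helper name `mirror_args()`, the destination `/srv/R/CRAN`, and the choice of upstream are assumptions; use an rsync source you are permitted to mirror from and check its usage policy first.

```r
# Build the rsync argument vector for mirroring part (or all) of a repository.
# `upstream` and `dest` are assumptions: replace them with an rsync source
# you may use and a local directory with enough disk space.
mirror_args <- function(upstream, dest, subdir = "") {
  src <- if (nzchar(subdir)) paste0(upstream, "/", subdir, "/") else paste0(upstream, "/")
  dst <- if (nzchar(subdir)) file.path(dest, subdir) else dest
  c("-rtlzv",     # recurse, keep times and symlinks, compress, verbose
    "--delete",   # remove local files that no longer exist upstream
    src,
    paste0(dst, "/"))
}

rsync_args <- mirror_args("cran.r-project.org::CRAN", "/srv/R/CRAN",
                          subdir = "src/contrib")
cat("rsync", rsync_args, "\n")

# Run it only where rsync is installed and the upstream is reachable:
# dir.create("/srv/R/CRAN/src/contrib", recursive = TRUE, showWarnings = FALSE)
# system2("rsync", rsync_args)
```

Passing `subdir = "src/contrib"` limits the copy to source packages; leaving `subdir` empty mirrors the whole tree, which takes considerably more disk space. The `--delete` flag keeps the mirror an exact copy, so do not point the destination at a directory that holds anything else.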
Creating a Custom Repository
As mentioned above, a custom repository gives you complete control over which packages are available to your users. Here, too, you have two basic choices in terms of populating your repository: you can either rsync specific directories from an existing repository, or you can combine your own locally developed packages with packages from other sources. The latter option gives you the greatest control, but in the past, this has typically meant you needed to manage the contents using home-grown tools.
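For the second option, base R already supplies the key piece: `tools::write_PACKAGES()` generates the index files that `install.packages()` reads. The sketch below shows one way to combine it with a file copy; `add_source_packages()` is a hypothetical helper, and the temporary directory stands in for a shared location on your network.

```r
# Copy package tarballs into a repository's source directory and rebuild
# the PACKAGES index that install.packages() reads.
# add_source_packages() is an illustrative helper, not an existing function.
add_source_packages <- function(repo_root, tarballs = character()) {
  contrib <- file.path(repo_root, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  if (length(tarballs) > 0) {
    file.copy(tarballs, contrib, overwrite = TRUE)
  }
  # Regenerate the index files for everything currently in the directory.
  tools::write_PACKAGES(contrib, type = "source")
  contrib
}

# A temporary directory stands in for the real shared location.
repo_root <- file.path(tempdir(), "custom-repo")
contrib   <- add_source_packages(repo_root)
print(contrib)

# Adding a locally developed package (hypothetical file name):
# add_source_packages(repo_root, "mypkg_0.1.0.tar.gz")
```

The index must be regenerated every time a package is added, removed, or updated; otherwise clients will not see the change.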
Custom Repository Considerations
The creation of a custom repository will give you ultimate flexibility to provide access to needed R packages while maintaining R installation security for the corporation. You could identify domain-specific packages and rsync them from the Microsoft repository to your in-house custom repository. As part of this process, it makes sense to perform security and compliance scans on downloaded packages before adding them to your internal repository.
To aid in the creation of a custom repository, a consultant at Microsoft created the miniCRAN package, which allows you to construct a repository from a subset of packages on CRAN (as well as other CRAN-like repositories). The miniCRAN package also includes a function for adding your own custom packages to the new repository, which promotes sharing of code with your colleagues.
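A minimal sketch of that workflow follows. It assumes the miniCRAN package is installed and that the machine running it can reach an upstream repository; the wrapper `build_mini_repo()`, the upstream URL, and the example package names and path are illustrative choices rather than anything prescribed by miniCRAN.

```r
# Sketch: build a small repository holding a few packages plus their
# dependencies. Requires miniCRAN and network access to the upstream.
# build_mini_repo() is an illustrative wrapper, not a miniCRAN function.
build_mini_repo <- function(pkgs, path,
                            upstream = c(CRAN = "https://cloud.r-project.org"),
                            types = c("source", "win.binary")) {
  if (!requireNamespace("miniCRAN", quietly = TRUE)) {
    stop("Install the miniCRAN package first.")
  }
  # Resolve the dependency tree so the repository is self-contained.
  pkg_list <- miniCRAN::pkgDep(pkgs, repos = upstream, type = "source",
                               suggests = FALSE)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  # Download the packages and write a PACKAGES index for each type.
  miniCRAN::makeRepo(pkg_list, path = path, repos = upstream, type = types)
  pkg_list
}

# Example call (package names and path are illustrative):
# build_mini_repo(c("ggplot2", "data.table"), path = "/srv/R/miniCRAN")
```

Setting `suggests = FALSE` keeps the repository small by skipping optional dependencies. Locally developed packages can then be added with miniCRAN's `addLocalPackage()` function; consult its help page for the arguments it expects.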
As with many capabilities in the R ecosystem, other packages and products are available for creating and working with repositories. A few open-source packages for working with R repositories are packrat, renv, and drat. If you are looking for a supported, commercially available product to manage access to packages within your organization, RStudio offers the RStudio Package Manager.