An introduction to R package development and unit testing: Community Bonding Period (GSoC 2017)

First of all, this post assumes that you are a Linux user and an open source supporter! 😉

This post aims to cover the basics of R package development and testing in R (as the title screams!). The initial step, installing R, is covered here. You can visit that post if required.

Now for the second and main phase: developing a software package in the R programming language. A good IDE such as RStudio makes this easier; it is built around R, a language for statistical computing and graphics that is widely used for data analysis. A package basically consists of the following components, in the form of files and folders:

  • R: Contains the core R code, i.e. the reusable functions.
  • tests: Contains the functions and files related to unit testing (explained later in this post) of the R code.
  • man: A very important component consisting of .Rd files, which document the package.
  • DESCRIPTION: Stores important metadata about your package, including the dependencies it relies on.
  • Other components include installed files, compiled code, and data, which mainly consists of sample data.
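To make the layout above concrete, here is a minimal sketch that builds such a skeleton by hand in a temporary directory, using base R only. The package name "mypkg", the function add_one, and the DESCRIPTION field values are all hypothetical and chosen purely for illustration.

```r
# Build a minimal package skeleton in a temporary directory (base R only).
# "mypkg" is a hypothetical package name used only for this sketch.
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "tests"), showWarnings = FALSE)
dir.create(file.path(pkg, "man"), showWarnings = FALSE)

# DESCRIPTION: package metadata and dependencies (illustrative values)
writeLines(c(
  "Package: mypkg",
  "Title: A Toy Example Package",
  "Version: 0.0.1",
  "Description: Illustrates the layout of an R package.",
  "License: GPL-3",
  "Suggests: testthat"
), file.path(pkg, "DESCRIPTION"))

# R/: reusable functions live here
writeLines("add_one <- function(x) x + 1", file.path(pkg, "R", "add_one.R"))

# Inspect the resulting files and folders
list.files(pkg, recursive = TRUE, include.dirs = TRUE)
```

In practice an IDE or a helper package would usually generate this structure for you; writing it by hand here just shows that a package is nothing more than files and folders in an agreed layout.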

After developing the package, familiarity with a good version control system like Git, along with a hosting service like GitHub, is essential. It helps in sharing your code with users and developers over the internet. A brief introduction to Git and GitHub is given in An easy guide to GitHub.

Now, after creating and sharing your code, the main thing I am going to work on for the next two or three months is analyzing the performance of a given R package in terms of important metrics like time and memory. Our project deals with a package called Rperform, which checks and analyzes the performance of other R packages that use Git and GitHub.
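As a rough idea of what measuring run time looks like, here is a minimal sketch using only base R's system.time(). It is not Rperform's API; the function slow_sum and the helper time_runs are hypothetical names made up for this example, and memory measurement is left out entirely.

```r
# Hypothetical function whose run time we want to track
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Call f(...) several times and collect the elapsed seconds of each run
time_runs <- function(f, ..., times = 5) {
  vapply(seq_len(times), function(i) {
    system.time(f(...))[["elapsed"]]
  }, numeric(1))
}

timings <- time_runs(slow_sum, 1e5, times = 5)
summary(timings)
```

Repeating the measurement matters because a single timing is noisy; comparing such summaries across different versions of the code is the general idea behind tracking performance over time.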

To tackle this, it is essential to get familiar with the basic concept of testing. Unit testing, automated testing, and formal testing are basically different names for the same concept. Testing plays as important a role in a software package as its development. Various packages and tools are available for profiling R code, such as lineprof, proftools, and the Rprof and summaryRprof functions. These tools have limitations that make them unsuitable for the relatively large-scale performance analysis that package developers require. ‘testthat’, on the other hand, is a package used for unit testing R packages, and Rperform takes its inspiration from it. That is why getting clear on the concepts behind the ‘testthat’ package is very important for me.

In the context of unit testing and the ‘testthat’ package, the main benefits of testing are as follows:

  • Earlier bug fixing: reduces the number of bugs in production code
  • Saves development time
  • Contributes to code integrity and code refactoring
  • Leads to cost reduction
  • Helps in documenting the code/software

Formally, testing means checking whether your function/program/package behaves the way you expect, i.e. whether it gives the right output or not. In R, the testthat package is used for exactly this purpose. An impressive overview of testing in R can be found here. testthat organizes test code into a hierarchy of blocks, as given below:

  • Expectations: The finest-grained testing functions; each checks a single result
  • Tests: Groups of expectations
  • Contexts: Groups of tests
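The hierarchy above can be sketched with a tiny example. It assumes the testthat package is installed, and the function add_one is hypothetical. The sketch shows only the two inner levels: a test_that() block (a test) wrapping several expect_*() calls (expectations). The file that holds such tests would then serve as their context.

```r
library(testthat)  # assumption: testthat is installed

# Hypothetical function under test
add_one <- function(x) x + 1

# A test groups related expectations under a descriptive name
test_that("add_one increments numeric input", {
  expect_equal(add_one(1), 2)               # expectation: single value
  expect_equal(add_one(c(1, 2)), c(2, 3))   # expectation: vectorised input
  expect_error(add_one("a"))                # expectation: non-numeric input fails
})
```

If any expectation fails, the whole test is reported as failed along with the descriptive name, which makes it easier to see which behaviour broke.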
