Tuesday, February 08, 2011

How Does R Build a Package with C Source Code?

According to Writing R Extensions, I put C files into the package's src subdirectory, supply a configure.ac and a Makevars.in, and the command “R CMD INSTALL” compiles the extension. I’m working in Linux, so it uses configure somehow, but what exactly does it do? Where do the defaults come from, and how can I change them? As of R version 2.12.1, the build scripts are written in R and live in the tools package; they are no longer written in Perl.

There are a lot of files to put in an R package, from help files to a general description of the package. We are concerned with these three, using a package called foo as an example.
  • foo/configure.ac - Input to create a configure script.
  • foo/src/Makevars.in - The configure script will create Makevars from this file.
  • foo/src/foo.c - This is our C source to compile.
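To make the layout concrete, here is a minimal sketch that creates those three files. The contents are my own placeholders, not anything R ships: the version number, the foo_add function, and the single PKG_LIBS line are all assumptions for illustration. A real package also needs a DESCRIPTION file and the rest, which this skips.

```shell
# Create the three files discussed above for a hypothetical package foo.
mkdir -p foo/src

# Input for autoconf; it only asks configure to generate src/Makevars.
cat > foo/configure.ac <<'EOF'
AC_INIT([foo], [0.1])
AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT
EOF

# Template for Makevars; configure replaces @LIBS@ with its LIBS value.
cat > foo/src/Makevars.in <<'EOF'
PKG_LIBS = @LIBS@
EOF

# A tiny C function written for the .C() calling convention,
# where every argument arrives as a pointer.
cat > foo/src/foo.c <<'EOF'
#include <R.h>

void foo_add(double *a, double *b, double *sum)
{
    *sum = *a + *b;
}
EOF

ls foo foo/src
```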
There are three commands R makes available to work with source while you develop it.
  • R CMD build foo - This creates a tar file of the directory suitable for installing later.
  • R CMD INSTALL foo - This compiles and installs foo into a directory.
  • R CMD check foo - This does all kinds of detailed checks on the health of the R package, all listed in Writing R Extensions, and it also calls R CMD INSTALL, so it’s a good way to smoke test compilation.
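Strung together, one pass over a package during development might look like the sketch below. The function name dev_cycle and the default library directory rlib are made up for this example; the R commands and the -l option are the ones described above.

```shell
# Sketch of one development pass over a package directory.
# dev_cycle and the rlib default are hypothetical names, not part of R.
dev_cycle() {
    pkg=$1
    lib=${2:-rlib}
    if [ ! -d "$pkg" ]; then
        echo "dev_cycle: no package directory '$pkg'" >&2
        return 2
    fi
    if ! command -v R >/dev/null 2>&1; then
        echo "dev_cycle: R is not on the PATH" >&2
        return 3
    fi
    mkdir -p "$lib" || return 1
    # Compile and install straight from the source directory.
    R CMD INSTALL -l "$lib" "$pkg" || return 1
    # Run the full set of checks, which installs the package again.
    R CMD check "$pkg" || return 1
    # Bundle the directory into a tarball named like foo_0.1.tar.gz.
    R CMD build "$pkg"
}
```

Calling it as “dev_cycle foo” from the directory that contains foo should stop at the first step that fails.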
If you call “R CMD check foo”, then it calls the function tools:::.check_packages(), which eventually installs the package into a local library directory by calling R again:

R CMD INSTALL -l '/home/username/Documents/rlib/foo.Rcheck' --no-html --no-multiarch '/Users/ajd27/Documents/rlib/foo'

As an aside, calling “R CMD blah” sets R environment variables and then invokes a shell script which looks for a script called blah in R’s bin directory. If it doesn’t find one, it just executes whatever you passed it. Try “R CMD ls -la .” or “R CMD env|sort” to see what environment variables R defines.
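The dispatch is simple enough to imitate. The function below is only my imitation of the behavior just described, not R's actual script: r_cmd_sketch is a made-up name, it sets a single variable where R sets many, and the R home path in the example call is deliberately fake.

```shell
# Imitation of the "R CMD blah" dispatch described above (not R's real script).
r_cmd_sketch() {
    sketch_home=$1; shift
    cmd=$1; shift
    (
        # R defines many variables at this point; one is enough for the sketch.
        R_HOME=$sketch_home
        export R_HOME
        if [ -f "$R_HOME/bin/$cmd" ] && [ -x "$R_HOME/bin/$cmd" ]; then
            # A script of that name exists in R's bin directory: run it.
            exec "$R_HOME/bin/$cmd" "$@"
        else
            # Otherwise execute whatever was passed, with the environment set.
            exec "$cmd" "$@"
        fi
    )
}

# There is no bin/env under the fake home, so the system env runs instead
# and shows the variable the wrapper exported.
r_cmd_sketch /tmp/no-such-r-home env | grep '^R_HOME='
```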

The install command is implemented in tools:::.install_packages(). The easiest way to see what it does is to look in the R source code, in the src/library/tools/R/install.R. It executes these steps on your behalf.
  1. Define R-specific variables. These are listed below for one sample.
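  2. Look for an executable foo/configure. As far as I can tell from install.R, INSTALL does not run autoconf itself, so generate foo/configure from foo/configure.ac with autoconf beforehand.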
  3. Call foo/configure, whose main goal is to make foo/src/Makevars from foo/src/Makevars.in.
  4. Look for makefiles in foo/src and call make in foo/src to create shared libraries.
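For debugging, it can help to walk through roughly the same steps by hand. This is a sketch under assumptions: manual_install_steps is a name I made up, it runs autoconf only when no configure script is present, and it uses R CMD SHLIB for the compile step. Run this way, configure does not see the R-specific variables from step 1, so a configure script that depends on them may behave differently.

```shell
# Rough manual equivalent of the install steps, for debugging only.
# manual_install_steps is a hypothetical helper, not something R provides.
manual_install_steps() {
    pkg=$1
    if [ ! -d "$pkg" ]; then
        echo "manual_install_steps: no package directory '$pkg'" >&2
        return 2
    fi
    name=$(basename "$pkg")
    (
        cd "$pkg" || exit 1
        # Generate configure from configure.ac if it is not there yet.
        if [ ! -x configure ]; then
            autoconf || exit 1
        fi
        # Fill in src/Makevars from src/Makevars.in.
        ./configure || exit 1
        # Compile and link the C sources; SHLIB picks up src/Makevars.
        cd src || exit 1
        R CMD SHLIB -o "$name.so" ./*.c
    )
}
```

With the example package in place, “manual_install_steps foo” should leave foo/src/foo.so behind if every step succeeds.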
We can specify arguments to configure on the INSTALL command line, with --configure-args and --configure-vars. For instance, typing

R CMD INSTALL "--configure-args=--enable-lizards --disable-frogs" foo

will call

./configure --enable-lizards --disable-frogs

Use of quotation marks varies depending on the shell. The only way R modifies the execution of the configure command is to define the variables listed at the end of this post. Writing R Extensions, however, recommends that authors of configure scripts ask R for defaults using R’s config command. Try

R CMD config --help

to see a list of variables R remembers from when it was configured and compiled.
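Individual values can be queried one at a time, which is how a configure script would use it. A small sketch: r_config is a wrapper name I made up so the loop degrades gracefully when R is not installed, and the four variables are just a sample of what the help output lists.

```shell
# Ask R for a value it recorded when it was configured and compiled.
# r_config is a hypothetical wrapper; it fails quietly if R is absent.
r_config() {
    command -v R >/dev/null 2>&1 || return 1
    R CMD config "$1"
}

for var in CC CFLAGS CPPFLAGS LDFLAGS; do
    if value=$(r_config "$var" 2>/dev/null); then
        echo "$var=$value"
    else
        echo "$var: not available (is R on the PATH?)"
    fi
done
```

Inside configure.ac, Writing R Extensions suggests the same idea: find R_HOME with “R RHOME” and then set CC and CFLAGS from “R CMD config CC” and “R CMD config CFLAGS”.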

When R calls make, it chains a few makefiles together with -f. The first is the Makevars that configure just customized. The next, Makeconf, is a list of variables, mostly from when R itself was configured. The last, shlib.mk, supplies the target that builds a shared library.

make -f Makevars -f /opt/local/lib/R/etc/x86_64/Makeconf -f /opt/local/lib/R/share/make/shlib.mk SHLIB='foo.so' OBJECTS='foo.o'
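The two long paths in that line are built from R_HOME and R_ARCH, both of which appear in the variable list at the end of this post. The sketch below only prints the command rather than running it; shlib_make_command is a made-up helper name, and the values passed in are the ones from my sample installation.

```shell
# Print the make command for a given R home, sub-architecture, and package.
# shlib_make_command is a hypothetical helper; it runs nothing.
shlib_make_command() {
    r_home=$1
    r_arch=$2   # e.g. /x86_64, or empty when R has no sub-architectures
    name=$3
    printf "make -f Makevars -f %s -f %s SHLIB='%s.so' OBJECTS='%s.o'\n" \
        "$r_home/etc$r_arch/Makeconf" \
        "$r_home/share/make/shlib.mk" \
        "$name" "$name"
}

shlib_make_command /opt/local/lib/R /x86_64 foo
```

On an installation without sub-architectures, R_ARCH is empty and Makeconf sits directly in the etc directory.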

The only variables not defined explicitly with Makeconf are
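  • PKG_CFLAGS - Extra flags for the C compiler. Include directories work here for C code, though Writing R Extensions puts -I and -D flags in PKG_CPPFLAGS.
  • PKG_CPPFLAGS - For the C preprocessor: -I include directories and -D defines.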
  • PKG_CXXFLAGS - For the C++ compiler.
  • PKG_OBJCFLAGS - Objective C’s CFLAGS.
  • PKG_OBJCXXFLAGS - Objective C++’s CFLAGS.
  • PKG_LIBS - Where we put libraries and the directories that hold them.
These are the only variables we should bother to define within Makevars.in. Everything else, from CXX to CFLAGS, is already set explicitly in Makeconf, so tough cookies if we want to change it, unless we are willing to make a custom target in our own Makefile in the src directory.

For our package, foo, we want to give the person installing the software a way to customize the include directories and library locations, so we probably want to check in our configure.ac for the existence of FOO_CFLAGS and FOO_LIBS and assign those values to PKG_CFLAGS and PKG_LIBS. Using package-specific names helps when multiple packages are installed, and using variables at all saves the people installing from figuring out how to pass command-line arguments to R.
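Here is one way that could look, as a sketch rather than a tested recipe. It uses autoconf's AC_ARG_VAR, which both documents a variable in “./configure --help” and substitutes it into output files. The /opt/foo paths are invented, and the final sed command only mimics the substitution configure would perform so the result is visible without running autoconf.

```shell
# Write a configure.ac and Makevars.in that pass FOO_CFLAGS and FOO_LIBS
# through to PKG_CFLAGS and PKG_LIBS.
mkdir -p foo/src

cat > foo/configure.ac <<'EOF'
AC_INIT([foo], [0.1])
dnl AC_ARG_VAR documents each variable and substitutes @NAME@ in outputs.
AC_ARG_VAR([FOO_CFLAGS], [compiler flags for foo, e.g. -I/opt/foo/include])
AC_ARG_VAR([FOO_LIBS], [linker flags for foo, e.g. -L/opt/foo/lib -lfoo])
AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT
EOF

cat > foo/src/Makevars.in <<'EOF'
PKG_CFLAGS = @FOO_CFLAGS@
PKG_LIBS = @FOO_LIBS@
EOF

# Mimic configure's substitution with sed, using invented example paths.
FOO_CFLAGS='-I/opt/foo/include'
FOO_LIBS='-L/opt/foo/lib -lfoo'
sed -e "s|@FOO_CFLAGS@|$FOO_CFLAGS|" \
    -e "s|@FOO_LIBS@|$FOO_LIBS|" foo/src/Makevars.in
```

The person installing could then set FOO_CFLAGS and FOO_LIBS in the environment before running R CMD INSTALL, or pass them with --configure-vars.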

Sample Variables Defined Before Configure and Make

AWK=awk
DYLD_LIBRARY_PATH=/opt/local/lib/R/lib/x86_64
EGREP=/usr/bin/grep -E
LN_S=ln -s
MAKE=make
PAGER=/usr/bin/less
PERL=/opt/local/bin/perl
R_ARCH=/x86_64
R_BROWSER=/opt/local/bin/kfmclient
R_BZIPCMD=/opt/local/bin/bzip2
R_DOC_DIR=/opt/local/lib/R/doc
R_GZIPCMD=/opt/local/bin/gzip
R_HOME=/opt/local/lib/R
R_INCLUDE_DIR=/opt/local/lib/R/include
R_LIBRARY_DIR=/Users/username/Documents/rlib/foo.Rcheck
R_LIBS=/Users/ajd27/Documents/rlib/foo.Rcheck
R_LIBS_SITE=
R_LIBS_USER=~/R/x86_64-apple-darwin10.6.0-library/2.12
R_PACKAGE_DIR=/Users/ajd27/Documents/rlib/foo.Rcheck/foo
R_PACKAGE_NAME=foo
R_PAPERSIZE=letter
R_PDFVIEWER=/opt/local/bin/ggv
R_PLATFORM=x86_64-apple-darwin10.6.0
R_PRINTCMD=lpr
R_RD4DVI=ae
R_RD4PDF=times,hyper
R_SESSION_TMPDIR=/var/folders/4Z/4ZJIk-FGFl4hDOSWoQMBmU+++TI/-Tmp-//Rtmp3r18dL
R_SHARE_DIR=/opt/local/lib/R/share
R_TEXI2DVICMD=/opt/local/bin/texi2dvi
R_UNZIPCMD=/usr/bin/unzip
R_ZIPCMD=/usr/bin/zip
SED=/usr/bin/sed
TAR=/usr/bin/gnutar
TR=/usr/bin/tr
WHICH=/usr/bin/which