Using R — Calling C Code ‘Hello World!’

This entry is part 7 of 14 in the series Using R

One of the reasons that R has so much functionality is that people have incorporated a lot of academic code written in C, C++, Fortran and Java into various packages.  Libraries written in these languages are often both robust and fast.  If you are using R to support people in a particular field, you may be called upon to incorporate some outside code into your R environment.  Unfortunately, much of the documentation on how to do this is written at a very high level.  In this post we will distil some of the available information on calling C code from R into three “Hello World” examples.

Most programmers have some experience with the C language.  C is low level, quite fast and is often used to implement other programming languages, including about 50% of R and the core of the reference Python interpreter (CPython).  Many libraries of C code exist that can greatly enhance the speed and functionality of R for a particular use case.  Most of the time, C code will be used to speed up algorithms involving loops, which can be very slow in R, but there are other reasons to link R with existing C libraries.

In my case, I am creating an R package to work with seismic data available through IRIS DMC web services.  After discussions with the client we decided it would be desirable to access binary versions of the data that can be processed with the available libmseed C library.  Now my C is a little rusty, not having written anything in C in over a decade, but I should be fine once I find the “Hello World” example of how to call C code from R.  Ay, there’s the rub!  The official R documentation has little in the way of easy examples and the highly acclaimed Rcpp package is targeted at experienced C++ programmers, not scripting-language science mashup hacks.  After a little googling, I found the following sites to contain useful information for an R-calling-C beginner:

  • Calling C and Fortran from R (Charlie Geyer)
  • An Introduction to the .C Interface to R (Peng and de Leeuw, 2002) **
  • Calling other languages from R (R.M. Ripley, 2009) *
  • Calling C from R (Darren Wilkinson, 2010)
  • Calling C Functions from R (Acadia Centre for Mathematical Modelling and Computation, 2010)
  • System and foreign language interfaces (Writing R Extensions manual)

The short list presented above may be all you need to get started but, to be thoroughly pedantic, we’ll go through a couple of iterations of a “Hello World!” example below that explain some of the important details.  These ‘baby steps’ may be helpful for those who, like myself, could use a C refresher.

Birth of C code

Just to start at the very beginning here is the classic introduction to C — helloA.c:

On Unix/Linux systems you can compile this exciting bit of C code with make (for example, make helloA), which relies on its built-in rules to start the compiler with suitable flags and arguments.  Then just run the new executable at the command line (./helloA).

To make this executable easier to run from our R session, we’ll create a file named wrappers.R where we will put a wrapper function, helloA(), that invokes the compiled program through R’s system() function.

And now we can start up R and try out our new function.

OK, that example was included primarily for those out there who haven’t seen C code in a while.  But it walked us through all the necessary steps that we will repeat as things get more complicated:  write C code; compile; write R wrapper function; invoke from R.  In the example above, we didn’t accept any arguments or return anything to R.  We also started up two separate processes (R and ‘helloA’), each with its own memory space.  One of the main reasons to call C code directly from R is to avoid launching a separate process through system() and to keep our data in a single memory space.  The bigger your datasets and the more often you call your C code, the more important this becomes.

Crawling with C

The most basic method for calling C code from R is to use the .C() function described in the System and foreign language interfaces section of the Writing R Extensions manual.  Other methods exist, including the .Call() and .External() functions, but we’ll stick with the simplest version for this post.  Here are the important points to understand about the .C() function in R:

  • Arguments to the user-compiled C code must be pointers to simple R/C data types.
  • The C code must have a return type of void.
  • With no return value, results are communicated from the C code to R by modifying one of the arguments.
  • R sees the values returned by the .C() function as a list.

From the R Extensions manual we get a nice table of R/C data type equivalences:

  R storage mode    C type
  logical           int *
  integer           int *
  double            double *
  complex           Rcomplex *
  character         char **
  raw               unsigned char *

Note that to get the type Rcomplex you will have to add #include <R.h> in your code.  Also note that a “character vector” in R is a vector of strings, each of which may contain multiple characters.  That’s why the corresponding C type is char **.
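To make these rules concrete, here is a minimal sketch of a .C()-style function that uses two of the types from the table.  The function name vecSum and its argument names are hypothetical, chosen only for illustration.  The code needs no R headers, so it compiles as plain C as well as with R CMD SHLIB.

```c
/* vecSum.c: a hypothetical example, not part of the original post.
 * Every argument is a pointer, the return type is void, and the
 * result travels back to R through the 'total' argument. */
void vecSum(double *x, int *n, double *total) {
  int i;
  double sum = 0.0;
  for (i = 0; i < *n; i++) {
    sum += x[i];
  }
  *total = sum;  /* R reads this from the list returned by .C() */
}

/* From R, once the shared object is built and loaded, the call
 * might look like this (a sketch):
 *   x <- c(1.5, 2.5, 3)
 *   .C("vecSum", x = as.double(x), n = as.integer(length(x)),
 *      total = as.double(0))$total
 */
```

Note that the vector length has to be passed explicitly, because a bare double * carries no size information.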

So let’s change our standalone C code to make it callable via the .C() function.  Instead of printing our greeting directly to stdout we’ll return it to R and let the R session deal with it.  Here is our new version of the code:
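A minimal sketch of what helloB.c could contain is shown below.  The file name is an assumption that follows the helloA.c and helloC.c naming used in this post; the function name helloB matches the R wrapper of the same name.

```c
/* helloB.c: a sketch of a greeting function callable through .C() */

/* .C() passes an R character vector as char **, so *greeting is the
 * first (and here only) string.  We point it at our message and let
 * R copy the text when the call returns. */
void helloB(char **greeting) {
  *greeting = "Hello World!";
}
```

Once the shared object has been built and loaded, a call such as .C("helloB", greeting = "")$greeting should hand the string back to R.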

This time we compile it into a .so shared object file using R CMD SHLIB.  This will take care of a lot of compiler flags that we really don’t want to have to remember.

We’ll create an additional wrapper function in wrappers.R to invoke this new function.  But before we can invoke our new function we will have to load the shared object file we just created with dyn.load().  Also note that we must pass a character string argument to our function so that memory for the string is allocated in R.  Finally, we will name the arguments we pass to .C() to make it easy to extract our greeting from the list returned by .C().

Running our two examples side by side we see that helloA() prints out immediately and returns an integer, the exit status returned by the system() call.  The function helloB(), on the other hand, returns the character string we want.

Baby Steps

Now that we have successfully brought C code into R, let’s try to do some useful work without straying too far from our “Hello World!” example.  We will follow the exact same recipe and create a function that accepts some input, does some computation and then returns the result.  To keep things simple we’ll just count the characters in the greeting we pass to our function.  Here is the C code for helloC.c:
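A minimal sketch of helloC.c, assuming the function is named helloC and takes the greeting plus a second argument, here called count (a hypothetical name), to carry the result back:

```c
/* helloC.c: count the bytes in the greeting passed in from R */
#include <string.h>

/* 'greeting' arrives as char ** (an R character vector) and 'count'
 * as int * (an R integer vector); the answer is written into count
 * because a .C() function cannot return a value. */
void helloC(char **greeting, int *count) {
  *count = (int) strlen(*greeting);
}
```

On the R side the wrapper would pass something like count = as.integer(0) so that R allocates the integer that the C code fills in.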

You may now compile the code with R CMD SHLIB helloC.c.  The R wrapper function should be a little smarter this time and should ensure that greeting is of type character.  It is a very good idea to use the as.<type>() functions, such as as.integer() or as.character(), to ensure that your arguments have the correct type before passing them to .C().

So let’s try it out to see what we get for different greetings:

As we see, the C strlen() function counts the number of bytes rather than the number of characters and may return surprising results for non-ASCII strings.  For our Russian greeting, encoded as UTF-8, we have one byte each for the space and exclamation point but two bytes for each Cyrillic character.  (Note to self: be careful when working with international character sets.)
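The byte-versus-character difference can be sketched in C.  The function below, helloChars, is hypothetical and not part of the original post.  It assumes the string is valid UTF-8 and counts characters by skipping continuation bytes (those of the form 10xxxxxx).

```c
/* helloChars.c: hypothetical companion to helloC.c.
 * Assumes *greeting is valid UTF-8. */
#include <string.h>

void helloChars(char **greeting, int *bytes, int *chars) {
  const unsigned char *p = (const unsigned char *) *greeting;
  int n = 0;
  for (; *p != '\0'; p++) {
    /* Continuation bytes look like 10xxxxxx; every other byte
     * starts a new character. */
    if ((*p & 0xC0) != 0x80) {
      n++;
    }
  }
  *bytes = (int) strlen(*greeting);  /* what strlen() reports */
  *chars = n;                        /* what a reader would count */
}
```

Within R itself the same distinction is available without any C code: nchar(x, type = "bytes") versus nchar(x, type = "chars").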

That aside aside, I hope this very low level introduction to R’s .C() interface takes some of the fear out of compiling C code for use with R.  Like so many things in life, it’s not really that hard once you know how to do it.  If you intend to move forward with the .C() interface your next stop should probably be the excellent 2002 tutorial: An Introduction to the .C Interface to R.  This PDF covers the use of the .C() interface and walks users through basic “Hello World!” examples and on to statistically useful examples including a kernel density estimator and C code for convolutions.

.C() vs .Call()

There has been some recent (March 2012) discussion on r-devel on the pros and cons of .C() vs .Call().  This discussion lays out some of the issues concerning the .C() interface and includes links to Peter Dalgaard’s excellent 2004 presentation explaining R interfaces to C code.  This is a must-read if you want to delve into the gory details of R internal data types.  This discussion also brought to light Simon Urbanek’s 2012 Quick R API cheat sheet, which collects a lot of the details you need when using the .Call() interface.

The next post in this series, .Call(“hello”), works through the same three “Hello World!” examples using the .Call() interface.  The important differences between the two interfaces are summarized here:

.C()

  • allows you to write simple C code that knows nothing about R
  • only simple data types can be passed
  • all argument type conversion and checking must be done in R
  • all memory allocation must be done in R
  • all arguments are copied locally before being passed to the C function (memory bloat)

.Call()

  • allows you to write simple R code
  • allows for complex data types
  • allows for a C function return value
  • allows C function to allocate memory
  • does not require wasteful argument copying
  • requires much more knowledge of R internals
  • is the recommended, modern approach for serious C programmers