Thursday, April 19, 2012

R: running from a Java process

After trying multiple linear regression in a sandbox, let's try some integration.
In this post we will focus on how to install R and run it from a regular Java process; in the next post we will plug R into Hadoop MapReduce.

R is a programming language and software environment written largely in C and Fortran, so interaction with Java requires a JNI layer. This layer is provided by the Java/R Interface (JRI) project [1], which ships platform-specific native libraries (.so files on Linux).
To prepare the environment, we need both R and JRI installed and configured. On Ubuntu, this takes two commands:
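To make the JNI dependency concrete, here is a minimal sketch of what happens on the Java side: the JVM maps a library name to a platform-specific file (for example `libjri.so` on Linux), and `System.loadLibrary` throws `UnsatisfiedLinkError` when that file cannot be found. The library name `jri` is assumed to match JRI's native library; the class `NativeLoadDemo` is hypothetical.

```java
public class NativeLoadDemo {

    // Returns the platform-specific file name the JVM will search for,
    // e.g. "libjri.so" on Linux or "jri.dll" on Windows.
    static String platformFileName(String libName) {
        return System.mapLibraryName(libName);
    }

    // Attempts to load a native library from java.library.path.
    // Returns false instead of throwing when the library cannot be found,
    // which is the typical symptom of a misconfigured JRI setup.
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + platformFileName(libName) + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "jri" is assumed to be the name of JRI's native library.
        String lib = args.length > 0 ? args[0] : "jri";
        System.out.println("Looking for " + platformFileName(lib));
        System.out.println(tryLoad(lib) ? "Loaded " + lib : "Not loaded; check java.library.path");
    }
}
```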
sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-rjava

For other platforms, follow the steps in [5] to install R and in [6] to install JRI.

To make the .so files visible to a Java process, we need to update LD_LIBRARY_PATH and pass -Djava.library.path to the JVM. The reasoning behind this configuration is covered in more depth in [2] and [3].
On Ubuntu, the run.sh script will look like this:
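The same two settings can also be applied by a Java process that launches another JVM. The sketch below builds the child's command line with `-Djava.library.path` and prepends the R and JRI directories to `LD_LIBRARY_PATH` in its environment; it also sets `R_HOME`, which JRI generally expects. The paths `/usr/lib/R`, `/usr/lib/R/site-library/rJava/jri` and the `JRI.jar` location are assumed Ubuntu defaults for the packages above, and the main class `com.example.RMain` is hypothetical; adjust all of them to your installation.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JriLauncher {
    // Assumed Ubuntu locations; verify them on your machine.
    static final String R_HOME = "/usr/lib/R";
    static final String JRI_DIR = "/usr/lib/R/site-library/rJava/jri";

    // Builds "java -Djava.library.path=<JRI_DIR> -cp <classpath> <mainClass>".
    static List<String> buildCommand(String classpath, String mainClass) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Djava.library.path=" + JRI_DIR);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    // Sets R_HOME and prepends the R and JRI library directories to any
    // LD_LIBRARY_PATH already present (LD_LIBRARY_PATH is colon-separated).
    static void configureEnvironment(Map<String, String> env) {
        env.put("R_HOME", R_HOME);
        String extra = R_HOME + "/lib:" + JRI_DIR;
        String existing = env.get("LD_LIBRARY_PATH");
        env.put("LD_LIBRARY_PATH",
                existing == null || existing.isEmpty() ? extra : extra + ":" + existing);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // JRI.jar must be on the child's classpath; its location here is an assumption.
        String classpath = JRI_DIR + "/JRI.jar" + File.pathSeparator + ".";
        ProcessBuilder pb = new ProcessBuilder(buildCommand(classpath, "com.example.RMain"));
        configureEnvironment(pb.environment());
        pb.inheritIO(); // forward the child's stdout/stderr to this console
        int exit = pb.start().waitFor();
        System.out.println("Child JVM exited with code " + exit);
    }
}
```

The environment is configured on the child rather than on the current JVM because the dynamic loader reads LD_LIBRARY_PATH when a process starts, and a running JVM cannot change its own environment; a wrapper script achieves the same effect from the shell.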


With the environment configured, we can now turn to the code:




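Before creating the R engine, a quick preflight check can turn cryptic native-library errors into readable messages. The sketch below checks that `R_HOME` is set, that the JRI native library is present in one of the `java.library.path` directories, and that JRI's engine class `org.rosuda.JRI.Rengine` is on the classpath. The class `JriPreflight` and its method names are hypothetical, and the library name `jri` is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JriPreflight {

    // Returns every directory in searchPath that contains the
    // platform-specific file for libName (e.g. "libjri.so" on Linux).
    static List<String> findLibrary(String searchPath, String libName) {
        List<String> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    // Checks for a class without initializing it, so no static initializer
    // runs here (for JRI, that may include loading the native library).
    static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, JriPreflight.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String rHome = System.getenv("R_HOME");
        System.out.println("R_HOME: " + (rHome == null ? "NOT SET" : rHome));

        String libPath = System.getProperty("java.library.path");
        List<String> hits = findLibrary(libPath, "jri");
        System.out.println(hits.isEmpty()
                ? "JRI native library not found on java.library.path=" + libPath
                : "JRI native library found in " + hits);

        System.out.println("JRI classes on classpath: " + classAvailable("org.rosuda.JRI.Rengine"));
    }
}
```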
[1] Java/R Interface
http://www.rforge.net/rJava/

[2] Talking R through Java
http://binfalse.de/2011/02/talking-r-through-java/

[3] java.library.path and LD_LIBRARY_PATH
http://kalblogs.blogspot.ca/2009/01/java.html

[4] How to convert a data frame column to numeric type?
http://stackoverflow.com/questions/2288485/how-to-convert-a-data-frame-column-to-numeric-type

[5] CRAN mirrors (choose your preferred location and follow the R installation instructions)
http://cran.stat.sfu.ca/mirrors.html

[6] rJava package on CRAN
http://cran.r-project.org/web/packages/rJava/
