Translation of R (S or Splus language) to Common Lisp for efficient compilation

A. D. Corlan

Efficient compilation of the R (S, Splus) language by translation to Common Lisp

What it is. R2cl is a minimal, proof-of-concept translator from the R or S language into Common Lisp. Compiling the resulting code with the free CMU Common Lisp native code compiler can then give a speedup of over 1000 times, making it comparable to FORTRAN, Ada or C.

The sole purpose of the current version (0.1) is to allow the automatic translation and compilation of a simple benchmark. We did not implement any feature that is not needed for this purpose.

Download. R2cl-0.1.tgz, released Jan 7, 2006.

Installation and usage. You first need to install cmucl (CMU Common Lisp) and R. Both are included in most Linux distributions, but you must select them for installation. I think most recent versions (after 2000) will work. I have R 1.5.1 and cmucl x86-linux 3.0.8 18c+ running on Debian 3.0 (woody). You must be running the bash shell (which is the default) and have GNU make on your system (which you normally do).

Just untar the distribution archive:

tar xvzf R2cl-0.1.tgz

It will create the directory R2cl. Type:

cd R2cl
make

and watch the translation, the compilation and the execution times of the benchmarks. That's all, unless you want to have a look at the code.

Rationale. R is a reimplementation of the S language for statistical processing. It is a free project, see r-project.org. R is a highly expressive 'matrix' language (it makes extensive use of matrix and vector operators) with a huge library of statistical functions. However, this expressivity requires a dynamic execution engine based on an 'infinite memory' model, which is slow. It also requires that expressions and functions can be constructed by a program or entered by an operator at runtime, which enforces an interpreted approach. This makes straightforward implementations of the language slow. For programs that might have an equivalent in FORTRAN, that FORTRAN equivalent would be over 1000 times faster.

Lisp is a language with the same execution model, except that it is even more general and includes extensive macro and object-oriented features. There are many flavors, but the main currents are Scheme (which is reductionist, favoring simplicity) and Common Lisp (the ANSI standard, which is extensive and highly concerned with efficient compilation).

None of these languages would be easy to translate fully into, say, C. To have the complete language you need, at least in part, to have the interpreting engine and the 'infinite memory' machine intimately combined with the generated code.
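To make this point concrete, here is a minimal sketch in Common Lisp of code that is built as data at runtime and only then evaluated or compiled. It is not taken from R2cl; the variable names are hypothetical and chosen only for illustration. The R counterpart of the first step would be something like eval(parse(text = "1 + 2 * 3")).

```lisp
;; Hypothetical illustration, not part of R2cl.

;; Build an expression as an ordinary list at runtime, then evaluate it.
;; R analogue: eval(parse(text = "1 + 2 * 3"))
(defparameter *expr* (list '+ 1 (list '* 2 3)))   ; the list (+ 1 (* 2 3))
(print (eval *expr*))                              ; evaluates to 7

;; Build a function definition as data, then compile it at runtime.
;; This only works because the compiler is part of the running system.
(defparameter *square-source* '(lambda (x) (* x x)))
(defparameter *square* (compile nil *square-source*))
(print (funcall *square* 7))                       ; evaluates to 49
```

A program that does this cannot be reduced to a stand-alone C translation, because the evaluator or compiler has to stay available while the program runs.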

However, there is no reason why pieces of code that make no use of this generality (such as source code generation and execution at runtime) should not be compilable to code as fast as FORTRAN. In practice, this is a huge task. Nevertheless, it has been achieved in free implementations such as CMU Common Lisp, as I was delighted to learn when trying these benchmarks.
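As a sketch of what such 'non-general' code can look like, consider a simple numeric loop. The R fragment in the comment and the function name sum-of-squares are assumptions made for illustration; they are not the R2cl benchmark, and the Lisp code is hand-written rather than translator output. The type and optimization declarations are what allow a native code compiler to use unboxed floating point arithmetic instead of generic dispatch.

```lisp
;; Hypothetical R source, shown only for comparison:
;;   s <- 0
;;   for (i in 1:n) s <- s + x[i] * x[i]

(defun sum-of-squares (x)
  "Return the sum of squares of the double-float vector X."
  ;; Declaring the array and accumulator types lets the compiler
  ;; emit direct floating point instructions for the loop body.
  (declare (type (simple-array double-float (*)) x)
           (optimize (speed 3) (safety 1)))
  (let ((s 0d0))
    (declare (type double-float s))
    (dotimes (i (length x) s)
      (incf s (* (aref x i) (aref x i))))))

;; Make sure the function is compiled to native code
;; (a no-op in implementations that always compile).
(compile 'sum-of-squares)

(let ((x (make-array 3 :element-type 'double-float
                       :initial-contents '(1d0 2d0 3d0))))
  (print (sum-of-squares x)))   ; 1 + 4 + 9
```

How much faster this runs than the interpreted R loop depends on the implementation and the machine, so no timing is claimed here.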

The semantics of R are a subset of the semantics of Scheme. R also includes, of course, a wealth of data types suitable for statistics. The semantics (expressive mechanisms) of Common Lisp are probably a superset of those of Scheme. This means that R is easy to express in Scheme (actually the engine of the free implementation is a Scheme engine) and perhaps easy to translate into Common Lisp. The latter is interesting because of the availability of the efficient compiler.

So I arrived at this toy project, to check whether such a route to the efficient compilation of R is possible.

Many R applications don't need speed. You can describe the job you need done so quickly that you rarely mind waiting a couple of seconds for the system to do it. However, there are also whole classes of applications where speed is necessary, and if a complete port of R to Common Lisp (a really big project) were done, they would all become feasible for R users like myself.