Programming language benchmarks

A.D. Corlan > Language Benchmarks

This page contains the same program, implemented in the same way, in C, Ada, FORTRAN, Lisp, FORTH, Java, Perl, R and Ruby and run on a 300MHz Pentium under the "woody" Debian GNU/Linux, using exclusively tools that come with the Debian distribution, except for compiling Java I used the jikes compiler from IBM.

This is relevant for applications which involve intensive computations for multiple users, such as simulators with open web access (like warfarissimo and autovaca). For such applications, performance translates directly into cost: the number of machines you need to buy and maintain in order to serve a given number of users or to achieve a required response time is directly proportional to computation time.

The program we benchmarked computes the same 100-term polynomial 500,000 times, using exactly the same algorithm. In all programs the indices of the polynomial are kept in a local float vector. In this, the program only tests the quality of code which accesses local vectors and performs simple arithmetics in loops, and is free from differences in the standard library, operating system calls and, indeed, the presence of any advanced language features.

There are two versions of the program: (1) all computations are done inside a single function body; (2) each polynomial is computed through a call to a function; the vector of the 100 coefficients are kept in a statically dimensioned array inside this function; it is called 500000 times.

The execution speed of the first version was also explored through benchmarks on AMD 64x2.

Results are below (in seconds needed to perform the simple task outlined above). Click on the link with a language name to see the details (program and how to compile and run) for each language:

Language	single body (s)	with call (s)
FORTRAN, g77 V2.95.4	2.73	2.73
Ada 95, gnat V3.13p	2.73	2.74
C, hand optimized ^**, gcc V2.95.4	2.73
Java, gcj V3.0	3.03	15.53
D, gcc V4.0.3⁺	¹⁾3.43	¹⁾3.98
C, gcc V2.95.4	3.61	3.57
R translated to lisp using R2cl v0.1 and compiled with cmucl	3.69
Lisp, CMU Common Lisp V3.0.8 18c+, build 3030	4.69	10.69
Java, jikes V1.15 (bytecompiled)	8.23	13.54
FORTH, hand optimized ** Gforth 0.6.1	¹⁾18.21
FORTH,** Gforth 0.6.1	¹⁾27.26
Python** +psyco (interpreted)	¹⁾168.50
Perl, more optimized^$ V5.6.1 (natively compiled)	209.20
Perl, more optimized^$ V5.6.1 (interpreted)	258.64
Perl, hand optimized*** V5.6.1 (bytecompiled)	306.18
Perl* V5.6.1 (natively compiled)	367.23
Python** V2.1.2 (interpreted)	505.50
Perl* V5.6.1 (bytecompiled)	515.04
RUBY*** (interpreted)	1074.52
R V1.5.1 (interpreted)	5662.64

* Contributed by Matei Conovici
** Contributed by Mihai Manolescu
*** Contributed by zgrim
^$ Contributed by Radu Greab
⁺ Contributed by Mihai Militaru

¹⁾estimated

naive (no hand-optimization) gcc-compiled C is some 30% slower than fortran and ada, otherwise none is affected by a detectable overhead when allocating a local vector in a function;
the gcj compiler does a good work, increasing the speed of code more than twice; however, allocation of local variables (the `pol' array for example) offsets all the gain obtained through native code compilation;
the performance of the CMU-CL lisp compiler was stellar; it was better than Java, even compiled natively, despite its highly dynamic nature; of 10.69 seconds, some 6.5 are for garbage collection.
bytecompiled Perl code is comparable in speed with interpreted R code and is not much slower than Perl code compiled natively; notice the fact that in R we use vectorial operators which also make the text very compact.

Discussion. The relevance of these tests rests in the fact that in any language, performance is governed by how quickly operations, loop, access to array elements, function calls, etc, are performed. Machine-oriented languages (C, FORTRAN, Ada and to some extent Java) perform on average 100 times faster than human-expression-oriented languages (Perl, R, Python, Ruby).

Programming a 1 GHz Pentium [fastest, 1000$, 2002 PC] in Perl is like programming a 10 MHz something [overclocked, 50$, 1982 Z80-based ZX Spectrum] in FORTRAN!

This is no big surprise as we are looking at the classical trade-off between the power of a language (allowing the programmer to express something in a compact way) and the performace of an implementation (which classically was related to the language being close to the machine representation).

However, the huge exception is CommonLisp. Lisp is the most powerful language that is, representing the classical extreme choice for expressive power instead of efficient implementation.

This choice seems to have paid off. Given enough expression power it was possible to write a compiler that competes with any machine-oriented language.

Conclusions (for me):

For new projects in which one is free to choose the language, one should choose Lisp.
An example is the offline generation of web views (such as this) of complex, heterogenous databases where there are two requirements: quick implementation of new ideas and a preference for updating the static web site in 10 minutes rather than 15 hours.
If efficiency is very critical and only if the application must be multithreaded, one should choose Ada. Multithreaded, online simulation servers, like the ones mentioned above, are typical examples.
Other languages, with which one might be highly familiar due to historical reasons, could become efficient languages by being translated into CommonLisp and compiled with CMU-CL.

Appendix. Details of each program and how to compile and run it:

FORTRAN

tespol.f

      program tespol
      dimension pol(100)
      real pol
      integer i,j,n
      real su,pu,mu
      real x

      n = 500000
      x = 0.2
      mu = 10.0
      pu = 0.0
      do i = 1,n
         do j=1,100
            mu = (mu + 2.0) / 2.0
            pol(j) = mu
         enddo
         su = 0.0
         do j=1,100
            su = x * su + pol(j)
         enddo
         pu = pu + su
      enddo
      write (*,*) pu
      end

Compile and run with:

f77 tespol.f -O6 -o tespol
time ./tespol

tespol2.f

      function dopoly(x)
      real x
      real su,mu
      integer j
      dimension pol(100)
      real pol

         do j=1,100
            mu = (mu + 2.0) / 2.0
            pol(j) = mu
         enddo
         su = 0.0
         do j=1,100
            su = x * su + pol(j)
         enddo

      dopoly = su
      end

      program tespol

      integer i
      real pu
      real x


      n = 500000
      x = 0.2
      mu = 10.0
      pu = 0.0
      do i = 1,n
         pu = pu + dopoly(x)
      enddo
      write (*,*) pu
      end

Compile and run with:

f77 tespol2.f -O6 -o tespol2
time ./tespol2

Ada-95

tpol.adb

with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_Io; use Ada.Text_Io;

procedure Tpol is

   Pol: array(1..100) of Float;
   N: Integer:= Integer'Value(Argument(1));
   X: Float:= Float'Value(Argument(2));
   S: Float;
   Mu: Float:= 10.0;
   Pu: Float:= 0.0;

begin
   for I in 1..N loop
      for J in 1..100 loop
         Mu := (Mu + 2.0) / 2.0;
         Pol(J) := Mu;
      end loop;
      S := 0.0;
      for J in 1..100 loop
         S := X * S + Pol(J);
      end loop;
      Pu := Pu+S;
   end loop;
   Put_Line(Float'Image(Pu));
end Tpol;

Compile and run with:

gnatmake -O6 tpol
time ./tpol 500000 0.2

tpol2.adb

with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_Io; use Ada.Text_Io;

procedure Tpol2 is

   N: Integer:= Integer'Value(Argument(1));
   X: Float:= Float'Value(Argument(2));
   Pu: Float:= 0.0;

   function Dopol(X: Float) return Float is

      Pol: array(1..100) of Float;
      S: Float;
      Mu: Float:= 10.0;

   begin
      for J in 1..100 loop
         Mu := (Mu + 2.0) / 2.0;
         Pol(J) := Mu;
      end loop;
      S := 0.0;
      for J in 1..100 loop
         S := X * S + Pol(J);
      end loop;
      return S;
   end Dopol;

begin
   for I in 1..N loop
      Pu := Pu+Dopol(X);
   end loop;
   Put_Line(Float'Image(Pu));
end Tpol2;

Compile and run with:

gnatmake -O6 tpol2
time ./tpol2 500000 0.2

Java

public class tpoly {

    static public void main(String argv[]) {
     float mu = (float)10.0;
     float x,s;
     float pu = (float)0.0;
     int su, i, j, n;
     float pol[] = new float[100];

     n = 500000;
     x = (float)0.2;
     for(i=0; i<n; i++) {
       for (j=0; j<100; j++) {
         mu =  (mu + (float)2.0) / (float)2.0;
         pol[j] = mu;
       }
       s = (float)0.0;
       for (j=0; j<100; j++) {
         s = x*s + pol[j];
       }
       pu += s;
     }
     System.out.println(pu);
  }
}

Compile and run with:

gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly tpoly.java
time ./tpoly

or compile to bytecode and run with:

jikes -O tpoly.java
time java tpoly

public class tpoly2 {

    static float dopoly(float x) {
	float pol[] = new float[100];
	int j;
	float mu = (float)10.0;
	float s;

	for (j=0; j<100; j++) {
	    mu =  (mu + (float)2.0) / (float)2.0;
	    pol[j] = mu;
	}
	s = (float)0.0;
	for (j=0; j<100; j++) {
	    s = x*s + pol[j];
	}
	return s;

    }

    static public void main(String argv[]) {
     float x;
     float pu = (float)0.0;
     int i, n;

     n = 500000;
     x = (float)0.2;
     for(i=0; i<n; i++) {
       pu += dopoly(x);
     }
     System.out.println(pu);
  }
}

Compile and run with:

gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly2 tpoly2.java
time ./tpoly2

or compile to bytecode and run with:

jikes -O tpoly2.java
time java tpoly2

tepol.c

#include <stdio.h>
#include <stdlib.h>

main(short argc, char **argv) {
  float mu = 10.0;
  float x,s;
  float pu = 0.0;
  int su, i, j, n;
  float pol[100];

  n = atol(argv[1]);
  x = atof(argv[2]);
  for(i=0; i<n; i++) {
    for (j=0; j<100; j++) {
      pol[j] = mu = (mu + 2.0) / 2.0;
    }
    s = 0.0;
    for (j=0; j<100; j++) {
      s = x*s + pol[j];
    }
    pu += s;
  }
  printf("%f\n",pu);
}

Compile and run with:

gcc -O6 tepol.c -o tepol
time ./tepol 500000 0.2

tepol2.c


#include <stdio.h>
#include <stdlib.h>

float dopoly(float x) {

  float mu = 10.0;
  float s;
  int j;
  float pol[100];

  for (j=0; j<100; j++) {
    pol[j] = mu = (mu + 2.0) / 2.0;
  }
  s = 0.0;
  for (j=0; j<100; j++) {
    s = x*s + pol[j];
  }
  return s;
}

main(short argc, char **argv) {
  float x;
  float pu = 0.0;
  int  i, n;

  n = atol(argv[1]);
  x = atof(argv[2]);
  for(i=0; i<n; i++) {
    pu += dopoly(x);
  }
  printf("%f\n",pu);
}

Compile and run with:

gcc -O6 tepol2.c -o tepol2
time ./tepol2 500000 0.2

Hand optimised C

bench1.c (by Mihai Manolescu)


#include <stdio.h>
#include <stdlib.h>

main(short argc, char **argv) {
  float mu = 10.0;
  float x,s;
  float pu = 0.0;
  int su, i, j, n;
  float pol[100];
  register tm1;
  register int tp1;

  n = atol(argv[1]);
  x = atof(argv[2]);
  for(i=0; i<n; i++) {
    tp1=2;
    tm1=1/2.0;
    for (j=0; j<100; j++) {
      pol[j] = mu = (mu + tp1) * tm1;
    }
    s = 0.0;
    for (j=0; j<100; j++) {
      s = x*s + pol[j];
    }
    pu += s;
  }
  printf("%f\n",pu);
}

Common-Lisp

testpol.lisp

(defun eval-pol (n x)
  (declare (fixnum n) (single-float x))
  (let ((su 0.0) (mu 10.0) (pu 0.0)
	(pol (make-array 100 :element-type 'single-float)))
    (declare (single-float su) (single-float mu) (single-float pu))
    (dotimes (i n)
	     (declare (fixnum i))
	     (setf su 0.0)
	     (dotimes (j 100)
		      (declare (fixnum j))
		      (setf mu (the single-float (/ (+ mu 2.0) 2.0)))
		      (setf (aref pol j)
			    (the single-float mu)))
	     (dotimes (j 100)
		      (declare (fixnum j))
		      (setf su (the single-float
				    (+ (aref pol j) (the single-float (* su x))))))
	     (setf pu (the single-float (+ pu su)))
	     )
    (prin1 pu)
    ))

Start CMU Common lisp:

lisp

... then load, compile and run with:

(load "testpol.lisp")
(compile #'eval-pol)
(time (eval-pol 500000 0.2))

testpol2.lisp


(proclaim '(optimize))

(declaim (start-block dopoly eval-pol))

  (defun dopoly (x) 
    (declare (ftype (function (single-float) single-float) dopoly))
    (declare (single-float x))
    (let ((su 0.0) (mu 10.0)
	  (pol (make-array 100 :element-type 'single-float)))
      (declare (single-float su) (single-float mu))
      (dotimes (j 100)
	(declare (fixnum j))
	(setf mu (the single-float (/ (the single-float (+ mu 2.0)) 2.0)))
	(setf (aref pol j)
	      (the single-float mu)))
      (dotimes (j 100)
	(declare (fixnum j))
	(setf su (the single-float
		      (+ (aref pol j) (the single-float (* su x))))))
      su)
    )


(defun eval-pol (n x)
  (declare (fixnum n) (single-float x))
  (let ((pu 0.0))
    (declare (single-float pu))
    (dotimes (i n)
	     (declare (fixnum i))
	     (setf pu (the single-float (+ pu (the single-float (dopoly x)))))
	     )
    (prin1 pu)
    ))

(declaim (end-block))

Start CMU Common lisp:

lisp

... then load, compile and run with:

(load "testpol2.lisp")
(compile #'eval-pol)
(time (eval-pol 500000 0.2))

tepol.d (by Mihai Militaru)

//tepol.d

import std.stdio;
import std.conv;

int main(char[][] args)
{
	float mu = 10.0;
	float x,s;
	float pu = 0.0;
	int su, i, j, n;
	float pol[100];

	n = toLong(args[1]);
	x = toFloat(args[2]);
	for(i=0; i<n; i++)
	{
		for (j=0; j<100; j++)
	{
		pol[j] = mu = (mu + 2.0) / 2.0;
	}
	s = 0.0;
	for (j=0; j%lt;100; j++)
	{
		s = x*s + pol[j];
	}
		pu += s;
	}
	writefln("%f\n",pu);
	return 0;
}

Compile with:

gdmd -O -release -inline tepol.d

Run with:

time ./tepol 500000 0.2

tepol2.d (by Mihai Militaru)

//tepol2.d

import std.stdio;
import std.conv;

float dopoly(float x)
{
	float mu = 10.0;
	float s;
	int j;
	float pol[100];

	for (j=0; j<100; j++)
	{
		pol[j] = mu = (mu + 2.0) / 2.0;
	}
	s = 0.0;
	for (j=0; j<100; j++)
	{
		s = x*s + pol[j];
	}
	return s;
}

int main(char[][] args)
{
	float x;
	float pu = 0.0;
	int  i, n;

	n = toLong(args[1]);
	x = toFloat(args[2]);
	for(i=0; i<n; i++)
	{
	pu += dopoly(x);
	}
	printf("%f\n",pu);
	return 0;
}

Compile with:

gdmd -O -release -inline tepol2.d

Run with:

time ./tepol2 500000 0.2

Python

tpytpol.py (by Mihai Manolescu)

n = 500000
x = 0.2

def t(x):
    mu = 10.0
    pu = 0.0
    pol =[0] * 100
    r = range(0,100)

    for i in range(0,n):
        for j in r:
            pol[j] = mu = (mu + 2.0) / 2.0
        su = 0.0
        for j in r:
            su = x * su + pol[j]
        pu = pu + su
    print pu

t(x)

Run with:

time python tpytpol.py

Perl

tperlpol.pl (by Matei Conovici)

#! /usr/bin/perl


$n = $ARGV[0];
$x = $ARGV[1];

$mu = 10;
$pu = 0;

@pol = ();

for ($i = 0; $i < $n; $i++) {
	for ($j = 0; $j < 100; $j++) {
		$mu = ($mu + 2) / 2;
		
		$pol[$j] = $mu;
	}

	$s = 0;
	for ($j = 0; $j < 100; $j++) {
		$s = $x * $s + $pol[$j];
	}

	$pu += $s;
}

print "$pu\n";

Compile and run with:

perlcc -O -o tperlpol tperlpol.pl
time ./tperlpol 500000 0.2

or, for byte compilation:

perlcc -B -o tperlpol tperlpol.pl
time ./tperlpol 500000 0.2

Perl (hand optimized)

pl.pl (by zgrim)

my($n,$x,$mu,$pu,@pol)=(500000,0.2,10,0,(0..99));
for(1..$n) {
    for (0..$#pol)  {   $pol[$_] = $mu = ( $mu + 2 ) * 0.5      } 
    $s = 0;
    for(0..$#pol)   {   $s = $x * $s + $pol[$_]                 } 
    $pu += $s
} 
print qq[$pu\n];

Compile and run with:

perlcc -B -o pl pl.pl
time ./pl

Perl (hand optimized for native compilation)

perl3.pl by Radu Greab

#!/usr/bin/perl -w

use strict;

my $n = $ARGV[0];
my $x = $ARGV[1];

my $mu = 10;
my $pu = 0;

my @pol;

foreach (0 .. $n - 1) {
    foreach (0 .. 99) {
        $pol[$_] = $mu = ($mu + 2) / 2;
    }

    my $s = 0;
    foreach (0 .. 99) {
        $s = $x * $s + $pol[$_];
    }

    $pu += $s;
}

print "$pu\n";

Compile and run with:

perlcc -O -o perl3 perl3.pl
time ./perl3

R (or S, Splus)

trpol2.R

trpol2 <- function(n,x) {
  mu <<- 10.0
  pu <<- 0.0
  pol <<- 1:100
  tp1 <<- 2.0
  tm1 <<- 1/2.0
  for (i in 1:n) {
    for (j in 1:100) {
      mu <<- (mu + tp1) * tm1
      pol[j] <<- mu
    }
    s <<- 0.0;
    for (j in 1:100) {
      s <<- x*s + pol[j];
    }
    pu <- s+pu;
  }
  print(pu)
}

Execute the program with:

time echo " source(\"trpol2.R\") ; trpol2(500000,0.2) ; q() " | R --no-save

Ruby (by zgrim)

pol.rb

n = 500000
x = 0.2
mu = 10
pu = 0
pol = []

n.times do
    0.upto(99) { |j| pol[j] = mu = (mu + 2) * 0.5 }
    s = 0
    0.upto(99) { |j| s = x * s + pol[j] }
    pu += s
end

print pu,"\n"

Execute the program with:

time ruby pol.rb

FORTH (by Mihai Manolescu)

bench.fs


--- bench.fs:

\ gforth source code for speed test
\                                                       mmihai, sept '04
\

100 floats allocate throw
constant farray

: init_farray
    0e0
    farray
    100 0 do
        dup fdup f!
        float+
    loop
    drop fdrop
;

: my_loop
    init_farray
    0e0                 \ x pu
    10e0                \ x pu mu

    0 do
        farray
        100 0 do
            2e0 f+ f2/
            dup fdup f!
            float+
        loop
        drop                    \ x pu mu
        frot frot               \ mu x pu
        fswap 0e0               \ mu pu x su
        farray
        100 0 do
            fover  f*
            dup f@ f+
            float+
        loop
        drop
        frot f+                 \ mu x pu
        frot                    \ x pu mu
    loop
    fdrop f. cr fdrop
;

0.2e0 500000 my_loop

Execute the program with:

time gforth-fast bench.fs -e bye

FORTH, hand optimised (by Mihai Manolescu)

bench2.fs

\ gforth source code for speed test
\                                                       mmihai, sept '04
\

falign here 100 floats allot constant farray

: my_loop
    0e0                 \ x pu
    10e0                \ x pu mu

    0 do
        farray
        2e0 fswap
        100 0 do
            fover f+ fover f/
            dup fdup f!
            float+
        loop
        fnip
        drop                    \ x pu mu
        frot frot               \ mu x pu
        fswap 0e0               \ mu pu x su
        farray
        100 0 do
            dup
            fover  f*
            f@ f+
            float+
        loop
        drop
        frot f+                 \ mu x pu
        frot                    \ x pu mu
    loop
    fdrop f. cr fdrop
;

0.2e0 500000 my_loop

Execute the program with:

time gforth-fast bench2.fs -e bye