A.D. Corlan > Language Benchmarks |
Programming language benchmarks
Translations of this page
This page contains the same program, implemented in the same way, in C, Ada, FORTRAN, Lisp, FORTH, Java, Perl, R and Ruby and run on a 300MHz Pentium under the "woody" Debian GNU/Linux, using exclusively tools that come with the Debian distribution, except for compiling Java I used the jikes compiler from IBM.
This is relevant for applications which involve intensive computations for multiple users, such as simulators with open web access (like warfarissimo and autovaca). For such applications, performance translates directly into cost: the number of machines you need to buy and maintain in order to serve a given number of users or to achieve a required response time is directly proportional to computation time.
The program we benchmarked computes the same 100-term polynomial 500,000 times, using exactly the same algorithm. In all programs the indices of the polynomial are kept in a local float vector. In this, the program only tests the quality of code which accesses local vectors and performs simple arithmetics in loops, and is free from differences in the standard library, operating system calls and, indeed, the presence of any advanced language features.
There are two versions of the program: (1) all computations are done inside a single function body; (2) each polynomial is computed through a call to a function; the vector of the 100 coefficients are kept in a statically dimensioned array inside this function; it is called 500000 times.
The execution speed of the first version was also explored through benchmarks on AMD 64x2.
Results are below (in seconds needed to perform the simple task outlined above). Click on the link with a language name to see the details (program and how to compile and run) for each language:
Language | single body (s) | with call (s) |
FORTRAN, g77 V2.95.4 | 2.73 | 2.73 |
Ada 95, gnat V3.13p | 2.73 | 2.74 |
C, hand optimized **, gcc V2.95.4 | 2.73 | |
Java, gcj V3.0 | 3.03 | 15.53 |
D, gcc V4.0.3+ | 1)3.43 | 1)3.98 |
C, gcc V2.95.4 | 3.61 | 3.57 |
R translated to lisp using R2cl v0.1 and compiled with cmucl | 3.69 | |
Lisp, CMU Common Lisp V3.0.8 18c+, build 3030 | 4.69 | 10.69 |
Java, jikes V1.15 (bytecompiled) | 8.23 | 13.54 |
FORTH, hand optimized ** Gforth 0.6.1 | 1)18.21 | |
FORTH,** Gforth 0.6.1 | 1)27.26 | |
Python** +psyco (interpreted) | 1)168.50 | |
Perl, more optimized$ V5.6.1 (natively compiled) | 209.20 | |
Perl, more optimized$ V5.6.1 (interpreted) | 258.64 | |
Perl, hand optimized*** V5.6.1 (bytecompiled) | 306.18 | |
Perl* V5.6.1 (natively compiled) | 367.23 | |
Python** V2.1.2 (interpreted) | 505.50 | |
Perl* V5.6.1 (bytecompiled) | 515.04 | |
RUBY*** (interpreted) | 1074.52 | |
R V1.5.1 (interpreted) | 5662.64 |
1)estimated
Discussion. The relevance of these tests rests in the fact that in any language, performance is governed by how quickly operations, loop, access to array elements, function calls, etc, are performed. Machine-oriented languages (C, FORTRAN, Ada and to some extent Java) perform on average 100 times faster than human-expression-oriented languages (Perl, R, Python, Ruby).
Programming a 1 GHz Pentium [fastest, 1000$, 2002 PC] in Perl is like programming a 10 MHz something [overclocked, 50$, 1982 Z80-based ZX Spectrum] in FORTRAN!
This is no big surprise as we are looking at the classical trade-off between the power of a language (allowing the programmer to express something in a compact way) and the performace of an implementation (which classically was related to the language being close to the machine representation).
However, the huge exception is CommonLisp. Lisp is the most powerful language that is, representing the classical extreme choice for expressive power instead of efficient implementation.
This choice seems to have paid off. Given enough expression power it was possible to write a compiler that competes with any machine-oriented language.
Conclusions (for me):
An example is the offline generation of web views (such as this) of complex, heterogenous databases where there are two requirements: quick implementation of new ideas and a preference for updating the static web site in 10 minutes rather than 15 hours.
Appendix. Details of each program and how to compile and run it:
tespol.f
program tespol dimension pol(100) real pol integer i,j,n real su,pu,mu real x n = 500000 x = 0.2 mu = 10.0 pu = 0.0 do i = 1,n do j=1,100 mu = (mu + 2.0) / 2.0 pol(j) = mu enddo su = 0.0 do j=1,100 su = x * su + pol(j) enddo pu = pu + su enddo write (*,*) pu endCompile and run with:
f77 tespol.f -O6 -o tespol time ./tespol
tespol2.f
function dopoly(x) real x real su,mu integer j dimension pol(100) real pol do j=1,100 mu = (mu + 2.0) / 2.0 pol(j) = mu enddo su = 0.0 do j=1,100 su = x * su + pol(j) enddo dopoly = su end program tespol integer i real pu real x n = 500000 x = 0.2 mu = 10.0 pu = 0.0 do i = 1,n pu = pu + dopoly(x) enddo write (*,*) pu endCompile and run with:
f77 tespol2.f -O6 -o tespol2 time ./tespol2
tpol.adb
with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io; procedure Tpol is Pol: array(1..100) of Float; N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); S: Float; Mu: Float:= 10.0; Pu: Float:= 0.0; begin for I in 1..N loop for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; Pu := Pu+S; end loop; Put_Line(Float'Image(Pu)); end Tpol;
Compile and run with:
gnatmake -O6 tpol time ./tpol 500000 0.2
tpol2.adb
with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io; procedure Tpol2 is N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); Pu: Float:= 0.0; function Dopol(X: Float) return Float is Pol: array(1..100) of Float; S: Float; Mu: Float:= 10.0; begin for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; return S; end Dopol; begin for I in 1..N loop Pu := Pu+Dopol(X); end loop; Put_Line(Float'Image(Pu)); end Tpol2;
Compile and run with:
gnatmake -O6 tpol2 time ./tpol2 500000 0.2
public class tpoly { static public void main(String argv[]) { float mu = (float)10.0; float x,s; float pu = (float)0.0; int su, i, j, n; float pol[] = new float[100]; n = 500000; x = (float)0.2; for(i=0; i<n; i++) { for (j=0; j<100; j++) { mu = (mu + (float)2.0) / (float)2.0; pol[j] = mu; } s = (float)0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } System.out.println(pu); } }Compile and run with:
gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly tpoly.java time ./tpolyor compile to bytecode and run with:
jikes -O tpoly.java time java tpoly
public class tpoly2 { static float dopoly(float x) { float pol[] = new float[100]; int j; float mu = (float)10.0; float s; for (j=0; j<100; j++) { mu = (mu + (float)2.0) / (float)2.0; pol[j] = mu; } s = (float)0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } static public void main(String argv[]) { float x; float pu = (float)0.0; int i, n; n = 500000; x = (float)0.2; for(i=0; i<n; i++) { pu += dopoly(x); } System.out.println(pu); } }Compile and run with:
gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly2 tpoly2.java time ./tpoly2or compile to bytecode and run with:
jikes -O tpoly2.java time java tpoly2
tepol.c
#include <stdio.h> #include <stdlib.h> main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f\n",pu); }Compile and run with:
gcc -O6 tepol.c -o tepol time ./tepol 500000 0.2
tepol2.c
#include <stdio.h> #include <stdlib.h> float dopoly(float x) { float mu = 10.0; float s; int j; float pol[100]; for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } main(short argc, char **argv) { float x; float pu = 0.0; int i, n; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { pu += dopoly(x); } printf("%f\n",pu); }Compile and run with:
gcc -O6 tepol2.c -o tepol2 time ./tepol2 500000 0.2
bench1.c (by Mihai Manolescu)
#include <stdio.h> #include <stdlib.h> main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; register tm1; register int tp1; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { tp1=2; tm1=1/2.0; for (j=0; j<100; j++) { pol[j] = mu = (mu + tp1) * tm1; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f\n",pu); }
testpol.lisp
(defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((su 0.0) (mu 10.0) (pu 0.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu) (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf su 0.0) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (+ mu 2.0) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) (setf pu (the single-float (+ pu su))) ) (prin1 pu) ))Start CMU Common lisp:
lisp... then load, compile and run with:
(load "testpol.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))
testpol2.lisp
(proclaim '(optimize)) (declaim (start-block dopoly eval-pol)) (defun dopoly (x) (declare (ftype (function (single-float) single-float) dopoly)) (declare (single-float x)) (let ((su 0.0) (mu 10.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu)) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (the single-float (+ mu 2.0)) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) su) ) (defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((pu 0.0)) (declare (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf pu (the single-float (+ pu (the single-float (dopoly x))))) ) (prin1 pu) )) (declaim (end-block))Start CMU Common lisp:
lisp... then load, compile and run with:
(load "testpol2.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))
tepol.d (by Mihai Militaru)
//tepol.d import std.stdio; import std.conv; int main(char[][] args) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; n = toLong(args[1]); x = toFloat(args[2]); for(i=0; i<n; i++) { for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j%lt;100; j++) { s = x*s + pol[j]; } pu += s; } writefln("%f\n",pu); return 0; }Compile with:
gdmd -O -release -inline tepol.dRun with:
time ./tepol 500000 0.2
tepol2.d (by Mihai Militaru)
//tepol2.d import std.stdio; import std.conv; float dopoly(float x) { float mu = 10.0; float s; int j; float pol[100]; for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } int main(char[][] args) { float x; float pu = 0.0; int i, n; n = toLong(args[1]); x = toFloat(args[2]); for(i=0; i<n; i++) { pu += dopoly(x); } printf("%f\n",pu); return 0; }Compile with:
gdmd -O -release -inline tepol2.dRun with:
time ./tepol2 500000 0.2
tpytpol.py (by Mihai Manolescu)
n = 500000 x = 0.2 def t(x): mu = 10.0 pu = 0.0 pol =[0] * 100 r = range(0,100) for i in range(0,n): for j in r: pol[j] = mu = (mu + 2.0) / 2.0 su = 0.0 for j in r: su = x * su + pol[j] pu = pu + su print pu t(x)Run with:
time python tpytpol.py
tperlpol.pl (by Matei Conovici)
#! /usr/bin/perl $n = $ARGV[0]; $x = $ARGV[1]; $mu = 10; $pu = 0; @pol = (); for ($i = 0; $i < $n; $i++) { for ($j = 0; $j < 100; $j++) { $mu = ($mu + 2) / 2; $pol[$j] = $mu; } $s = 0; for ($j = 0; $j < 100; $j++) { $s = $x * $s + $pol[$j]; } $pu += $s; } print "$pu\n";Compile and run with:
perlcc -O -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2or, for byte compilation:
perlcc -B -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2
pl.pl (by zgrim)
my($n,$x,$mu,$pu,@pol)=(500000,0.2,10,0,(0..99)); for(1..$n) { for (0..$#pol) { $pol[$_] = $mu = ( $mu + 2 ) * 0.5 } $s = 0; for(0..$#pol) { $s = $x * $s + $pol[$_] } $pu += $s } print qq[$pu\n];Compile and run with:
perlcc -B -o pl pl.pl time ./pl
Perl (hand optimized for native compilation)
perl3.pl by Radu Greab
#!/usr/bin/perl -w use strict; my $n = $ARGV[0]; my $x = $ARGV[1]; my $mu = 10; my $pu = 0; my @pol; foreach (0 .. $n - 1) { foreach (0 .. 99) { $pol[$_] = $mu = ($mu + 2) / 2; } my $s = 0; foreach (0 .. 99) { $s = $x * $s + $pol[$_]; } $pu += $s; } print "$pu\n";Compile and run with:
perlcc -O -o perl3 perl3.pl time ./perl3
trpol2.R
trpol2 <- function(n,x) { mu <<- 10.0 pu <<- 0.0 pol <<- 1:100 tp1 <<- 2.0 tm1 <<- 1/2.0 for (i in 1:n) { for (j in 1:100) { mu <<- (mu + tp1) * tm1 pol[j] <<- mu } s <<- 0.0; for (j in 1:100) { s <<- x*s + pol[j]; } pu <- s+pu; } print(pu) }Execute the program with:
time echo " source(\"trpol2.R\") ; trpol2(500000,0.2) ; q() " | R --no-save
pol.rb
n = 500000 x = 0.2 mu = 10 pu = 0 pol = [] n.times do 0.upto(99) { |j| pol[j] = mu = (mu + 2) * 0.5 } s = 0 0.upto(99) { |j| s = x * s + pol[j] } pu += s end print pu,"\n"Execute the program with:
time ruby pol.rb
bench.fs
--- bench.fs: \ gforth source code for speed test \ mmihai, sept '04 \ 100 floats allocate throw constant farray : init_farray 0e0 farray 100 0 do dup fdup f! float+ loop drop fdrop ; : my_loop init_farray 0e0 \ x pu 10e0 \ x pu mu 0 do farray 100 0 do 2e0 f+ f2/ dup fdup f! float+ loop drop \ x pu mu frot frot \ mu x pu fswap 0e0 \ mu pu x su farray 100 0 do fover f* dup f@ f+ float+ loop drop frot f+ \ mu x pu frot \ x pu mu loop fdrop f. cr fdrop ; 0.2e0 500000 my_loopExecute the program with:
time gforth-fast bench.fs -e bye
FORTH, hand optimised (by Mihai Manolescu)
bench2.fs
\ gforth source code for speed test \ mmihai, sept '04 \ falign here 100 floats allot constant farray : my_loop 0e0 \ x pu 10e0 \ x pu mu 0 do farray 2e0 fswap 100 0 do fover f+ fover f/ dup fdup f! float+ loop fnip drop \ x pu mu frot frot \ mu x pu fswap 0e0 \ mu pu x su farray 100 0 do dup fover f* f@ f+ float+ loop drop frot f+ \ mu x pu frot \ x pu mu loop fdrop f. cr fdrop ; 0.2e0 500000 my_loopExecute the program with:
time gforth-fast bench2.fs -e bye
Copyright (c) 2003 Alexandru Dan Corlan et al.