
More Cylon benchmarks

Cylon and Pentium 4 and Core Duo
by Dermot Hogan
Wednesday 31 January 2007.

There’s been a query about the preliminary Cylon benchmarks I ran a few months ago (Cylon is the fast debugger in the Developer Edition of Ruby In Steel), so I’ve just redone them. I get similar results to those I got before. The runs use no breakpoints, so I’m measuring purely the raw Cylon overhead with no extraneous factors.
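To give a feel for why a debugger can cost something even when no breakpoint is set, here is a minimal sketch using Ruby's built-in TracePoint hook (Ruby 2.0 and later). This is only a stand-in: it is an assumption that a per-line hook is a fair model of debugger overhead in general, and it says nothing about how Cylon itself is implemented. The helper name count_line_events is made up for the illustration.

```ruby
# Count how many line events the interpreter reports while a block runs.
# TracePoint is used purely as an illustration of a per-line hook;
# it is NOT a description of Cylon's internals.
def count_line_events
  events = 0
  trace = TracePoint.new(:line) { events += 1 }
  trace.enable
  begin
    yield
  ensure
    trace.disable
  end
  events
end

total = 0
events = count_line_events do
  1000.times do
    total += 1
  end
end

puts "loop iterations: #{total}, line events seen: #{events}"
```

Every executed line triggers the hook, so even a hook that does almost nothing is called at least once per loop iteration here. That per-line cost is the kind of thing a "no breakpoints" benchmark isolates.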

The results are a bit slower than before because I made some changes to the code after the original benchmarks and I haven’t re-optimised the code paths since. For the original benchmarks, I hand-optimised the Cylon code paths, eliminating every extra branch and unneeded call in the most commonly used paths. I’ll do that again when we release the multi-threaded version later in the year.


This is on a 2.8GHz Pentium 4 with 1.5GB of memory.

Pentium 4    count        debug (s)    nodebug (s)    overhead
factorial    100000       25.109       20.938         20%
linear       100000000    87.031       59.859         45%

And this is on a 1GHz Celeron with 1GB of memory.

Celeron      count        debug (s)    nodebug (s)    overhead
factorial    100000       144.348      95.978         50%
linear       100000000    178.336      106.974        66%

I’ve also done the benchmarks on a Core Duo 6600 with 2GB of memory.

Core Duo     count        debug (s)    nodebug (s)    overhead
factorial    100000       14.844       14.297         4%
linear       100000000    44.203       27.344         62%
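For anyone who wants to check the overhead column, it can be reproduced from the debug and nodebug timings as (debug - nodebug) / nodebug. The sketch below does that for the three tables above; the method name overhead_percent and the RESULTS hash are just names chosen for this illustration.

```ruby
# Overhead of the debug run relative to the plain run, as a percentage.
def overhead_percent(debug, nodebug)
  (debug - nodebug) / nodebug * 100.0
end

# Timings in seconds copied from the three tables above: [debug, nodebug].
RESULTS = {
  "Pentium 4" => { "factorial" => [25.109, 20.938],  "linear" => [87.031, 59.859] },
  "Celeron"   => { "factorial" => [144.348, 95.978], "linear" => [178.336, 106.974] },
  "Core Duo"  => { "factorial" => [14.844, 14.297],  "linear" => [44.203, 27.344] }
}

RESULTS.each do |machine, tests|
  tests.each do |name, (debug, nodebug)|
    puts "%-10s %-10s %6.1f%%" % [machine, name, overhead_percent(debug, nodebug)]
  end
end
```

A value of 100% would mean the debug run took twice as long as the plain run. Every figure here comes out well under that.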


I’ve done the Core Duo one several times with various values of the count and got the same ratios (yes, I did check that I was actually running the debugger).

I can’t figure out what is going on with the factorial test on the Core Duo, or why the linear test shows a higher overhead there than on the Pentium 4 (62% against 45%) – but those are the figures I got. I would guess that there might be something odd in the Ruby interpreter that causes this – last time I looked it was compiled using Microsoft C++ v6. Equally, there might be something weird in my code that’s throwing the benchmarks off. When I get a bit more time (when I do the multithreaded version), I’ll recompile Ruby using the latest C++ compiler and see what I get.


It seems to me that my Pentium 4 benchmarks are pretty much in line with what I got previously. They are a bit slower than the first benchmarks, and that’s down to me not optimising things as far as I did initially. It’s certainly not twice as slow.

The Celeron doesn’t show as much of an improvement as the Pentium 4 for some reason. However, it’s a pretty ancient machine, over 6 years old now.

The Core Duo results need more exploration. In one case the Cylon debugger adds only 4% overhead (which I find difficult to believe, but those are indeed the results), while for the linear test it adds 62%. Even so, that is still nothing like twice as slow.

It would seem that there’s more to the Intel Core Duo than meets the eye.


The code I used is here. It’s very simple:

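def fac(n)
  n == 1 ? 1 : n * fac(n-1)
end

count = 0

# Time.now is assumed for both timestamps; the right-hand sides are missing from this listing.
tstart = Time.now
# Factorial test: swap the comment marker on the next two lines to run it instead.
#0.upto(100000) {fac(50)}
0.upto(100000000) {count += 1}
tend = Time.now
puts "%10.3f" % tstart.to_f
puts "%10.3f" % tend.to_f
diff = tend - tstart
puts "%10.3f" % diff.to_f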
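The same two workloads can also be timed with Benchmark.realtime from Ruby's standard library, which makes it easy to repeat a run at several counts and compare the ratios. This is a sketch rather than the harness used for the figures above: the helper names time_factorial and time_linear are invented for the example, and the counts are scaled well down so it finishes quickly.

```ruby
require 'benchmark'

# Recursive factorial, as in the listing above.
def fac(n)
  n == 1 ? 1 : n * fac(n - 1)
end

# Elapsed seconds for `iterations` calls of fac(50).
def time_factorial(iterations)
  Benchmark.realtime { iterations.times { fac(50) } }
end

# Elapsed seconds for `iterations` increments of a local counter.
def time_linear(iterations)
  count = 0
  Benchmark.realtime { iterations.times { count += 1 } }
end

# Counts are far smaller than the 100000 and 100000000 used for the tables.
[1_000, 10_000].each do |n|
  puts "factorial %9d %10.3f" % [n, time_factorial(n)]
end

[100_000, 1_000_000].each do |n|
  puts "linear    %9d %10.3f" % [n, time_linear(n)]
end
```

Running the script once under the debugger and once without it, then dividing the two times for each row, gives the overhead ratio at each count.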
© SapphireSteel Software 2014