Updated C compiler benchmarks, Feb 2012

With several new compiler releases out, it’s time to update my compiler benchmark results (last round of results here, description of the benchmark programs here). Note that the tests were run on a different machine this time and the number of iterations was tweaked, so the numerical results aren’t comparable with the previous round.

Without further ado:

So, what’s interesting in this set of results? Overall, LLVM and GCC are now close competitors. Notably, GCC canes LLVM in bm4, where GCC appears to generate very good “memmove” code while LLVM generates much more concise but apparently much slower code. GCC also wins in bm8 (trivial loop removal): neither compiler actually performs this optimization, but GCC apparently generates faster code. On the other hand, LLVM beats GCC quite handily in bm6 (a common subexpression elimination problem).
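For readers unfamiliar with the term, here is a minimal sketch of the kind of code a common subexpression elimination test exercises. It is not the bm6 source: the function names sum_naive and sum_hoisted and the arithmetic are invented for illustration only.

```c
#include <stddef.h>

/* Hypothetical example, not the actual bm6 source: (a * b) is written
   twice, so a compiler performing CSE can compute it once and reuse it. */
long sum_naive(const long *v, size_t n, long a, long b)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i] * (a * b) + (a * b);
    }
    return total;
}

/* The same computation with the shared subexpression hoisted by hand. */
long sum_hoisted(const long *v, size_t n, long a, long b)
{
    const long ab = a * b;
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i] * ab + ab;
    }
    return total;
}
```

A compiler that eliminates the common subexpression should produce essentially the same machine code for both functions; comparing the two in the assembly output is one way to check.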

GCC 4.6.2 improves quite a bit over 4.5.2 in bm5 (essentially a common subexpression refactoring test). However, it’s slightly worse in bm3 and for some reason there is a huge drop in performance for the bm7 test (stack placement of returned structure).
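As a rough illustration of what “stack placement of returned structure” refers to, the sketch below passes a structure returned by one function straight into another. It is not the bm7 source: the struct layout, its size, and the names foo, bar and call_chain are assumptions chosen for the example.

```c
/* Hypothetical stand-ins for the pattern; not the actual bm7 source. */
struct big {
    long v[32];
};

/* Returns a large structure by value. */
struct big foo(long seed)
{
    struct big r;
    for (int i = 0; i < 32; i++) {
        r.v[i] = seed + i;
    }
    return r;
}

/* Takes the same structure by value. */
long bar(struct big b)
{
    long sum = 0;
    for (int i = 0; i < 32; i++) {
        sum += b.v[i];
    }
    return sum;
}

long call_chain(long seed)
{
    /* Ideally the result of foo() is built directly in the stack slot
       that bar() reads its argument from; a compiler that misses this
       copies the whole structure before the call. */
    return bar(foo(seed));
}
```

Whether a copy is emitted depends on the compiler, the optimisation level and the calling convention, so the generated assembly is the only reliable way to tell.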

The Firm suite, surprisingly, mostly loses ground with 1.20.0 and remains uncompetitive.

Edit 25/03/12: I’ve noticed a flaw in bm6, which when corrected causes GCC to perform much worse – about 0.5 seconds rather than the 0.288 reported above.


11 thoughts on “Updated C compiler benchmarks, Feb 2012”

      1. Hallo Dav, excuse the spam, I had sent these posts from my phone which failed to show the already posted comments. Greetings.

  1. Hello Davin (I hope that is your name?),

    for the record, some results as of 2018:
    I ran clang 3.8.1-24 against gcc 6.3.0 on Debian x64:

    wall time in seconds

    bm1.c-gcc
    1.27
    1.27
    1.28
    1.27
    1.27
    #clang seems to do (unnecessary?) stuff before returning
    bm1.c-clang
    1.59
    1.59
    1.60
    1.59
    1.64

    #shift not simplified!
    bm10.c-gcc
    0.25
    0.25
    0.25
    0.24
    0.24
    bm10.c-clang
    0.01
    0.01
    0.01
    0.01
    0.01

    bm2.c-gcc
    0.98
    0.99
    0.99
    1.00
    1.05
    bm2.c-clang
    1.00
    0.99
    1.00
    0.98
    0.98

    bm3.c-gcc
    0.49
    0.49
    0.49
    0.49
    0.49
    bm3.c-clang
    0.49
    0.49
    0.49
    0.49
    0.49

    bm4.c-gcc
    0.56
    0.45
    0.57
    0.55
    0.54
    bm4.c-clang
    0.49
    0.49
    0.64
    0.66
    0.51

    # “NumSift”
    #https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
    #still open as of 201803
    bm5.c-gcc
    0.14
    0.13
    0.13
    0.13
    0.13
    bm5.c-clang
    0.08
    0.08
    0.08
    0.08
    0.08

    # redundant && || not eliminated
    #http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32306
    #Still open as of 201803
    bm6.c-gcc
    0.31
    0.31
    0.31
    0.31
    0.31
    bm6.c-clang
    0.17
    0.17
    0.17
    0.17
    0.17

    #gcc’s issues seem to be resolved
    bm7.c-gcc
    0.94
    0.90
    0.90
    0.94
    0.93
    bm7.c-clang
    0.93
    0.90
    0.93
    0.93
    0.94

    bm8.c-gcc
    0.25
    0.25
    0.25
    0.25
    0.25
    bm8.c-clang
    0.24
    0.25
    0.24
    0.25
    0.24

    bm9.c-gcc
    0.92
    0.93
    0.92
    0.92
    0.92
    bm9.c-clang
    0.87
    0.87
    0.86
    0.86
    0.88

    It seems many of gcc’s issues are still open.

    1. Great! Yes, that’s my name, although on this blog I usually go by “davmac”.

      It’s interesting that for bm10, clang seems to partially unroll the loop (by 16 iterations) – which makes it much faster, but it still doesn’t perform the ultimate optimisation of removing the loop altogether.

      For bm7 gcc is as good as clang now but as far as I can tell neither compiler optimises fully by storing the result of the call to foo() directly “in place” for where it can be passed to bar() – both actually copy the result (gcc emits a call to memcpy, clang uses “rep movsq”).

      The benchmarks as a whole are a little unfair to GCC, because some of them come from GCC bugs. It is much more difficult to find good Clang bugs to make tests out of because their bug database is not as well organised.

      1. Indeed, the results for bm2 are wrong.
        That’s because I wrongly assumed bm2 and bm3 should use the same structs.

        In bm1 I don’t understand why clang 3.8 does anything more than mov eax, 100; retq
        https://godbolt.org/g/4bPsh5
        But that seems fixed by more recent clang (4.0+)

        In bm10 both compilers can improve, as you said, by realising that after a certain number of shifts the value is directly compile-time computable.
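The point about repeated shifts can be sketched as follows. This is not the bm10 source: the 32-bit operand, the one-bit right shift and the names shift_loop and shift_closed are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical loop, not the actual bm10 source: shift right by one
   bit, n times. */
uint32_t shift_loop(uint32_t x, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        x >>= 1;
    }
    return x;
}

/* Closed form of the loop above. Once n reaches the bit width, every
   bit has been shifted out and the result is 0. A single shift by 32
   or more is undefined behaviour in C, hence the explicit guard. */
uint32_t shift_closed(uint32_t x, unsigned n)
{
    return n >= 32 ? 0u : x >> n;
}
```

A compiler that recognises this pattern could replace the loop with the guarded single shift, or with a constant when n is known at compile time.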
