Wrap on integer overflow is not a good idea

A discussion of undefined behaviour and compiler optimisation, particularly in regards to signed integer overflow.

C (and C++) compilers are becoming notorious for exploiting the notion of undefined behaviour – the idea that certain things a program might do have no behaviour prescribed by the language standard, and that the compiler can assume the program doesn’t do these things when it is generating object code. Quite a few people have been objecting to this, since it can result in the generated code not doing what the programmer intended; the problem is becoming more noticeable over time, as compilers introduce more sophisticated optimisation techniques which are more likely to exploit the notion.

One prominent example is that of signed integer overflow. Most C programmers are developing for machines which use a 2’s complement representation of integers; addition and subtraction, with such a representation, are implemented in exactly the same way as for unsigned arithmetic. If the addition of two positive signed integers overflows – that is, if the result is larger than can be represented – the processor will produce a number that, when interpreted as a 2’s complement signed integer, will appear to be negative. This is called “wrapping” because the value has “wrapped around” from the high end of the numeric range to the low end.

For this reason, you occasionally see C code that looks something like this:

int b = a + 1000;
if (b < a) { // overflow
    puts("input too large!"); return;
}

The “if” statement is designed to detect the overflow condition (in this case from adding 1000 to the value of the variable ‘a’) and report an error. The problem is that, in C, signed integer overflow is one case of undefined behaviour. Compilers, for some time now, have performed an analysis which shows that the condition can never be true: if I add 1000 (or any positive number) to another value, the result cannot be smaller than the original value; if overflow occurred, that is undefined behaviour, and it is the programmer’s responsibility (arguably) to ensure that their program never exhibits such behaviour. Therefore, the compiler may decide that the entire if statement can be removed as an optimisation (the condition can never be true, so the statement can never have an effect and may as well not be there).

The problem with this compiler optimisation, in this case, is that it has removed the test that the programmer specifically used in an attempt to detect the overflow situation and handle it. An example of this with a real compiler can be seen here. (Side note: the godbolt.org site on which that example is hosted is great! You can edit the code and see the compiled form with a wide range of compilers. Play with it!). Observe that the overflow check is not removed if the type is changed to an unsigned integer, since unsigned overflow has defined behaviour in C (or rather, more accurately, unsigned arithmetic is defined to wrap and thus the overflow does not actually occur).
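
For comparison, here is a minimal sketch of that unsigned variant (my own reconstruction, not the code from the godbolt example): the check has the same shape as before, but because unsigned arithmetic is defined to wrap, the compiler is not permitted to remove it.

#include <stdio.h>

void check(unsigned int a)
{
    unsigned int b = a + 1000u;
    if (b < a) { // wrapped around: a + 1000 exceeded UINT_MAX
        puts("input too large!");
        return;
    }
    // ... use b ...
}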

So is this wrong? Some have argued that it is, though it’s clear that many compiler vendors feel that it’s legitimate. The main arguments made by proponents of (edit: implementation-defined) wrapping overflow behaviour, if I understand them correctly, boil down to variants of the following:

  • Wrapping on overflow is a useful behaviour.
  • Wrapping is the behaviour expected by programmers.
  • Exploiting undefined behaviour semantics on overflow gives no significant benefit.
  • The C language standard, in regards to undefined behaviour, gives license for implementations “ignoring the situation completely with unpredictable results”, but this doesn’t allow optimisations to assume that the situation for which the undefined behaviour is prescribed will not come about.

Let’s look at these one by one:

Wrapping on overflow is a useful behaviour?

The main utility of wrapping behaviour is to be able to detect overflow after it occurs. (If there are other uses that could not be handled using unsigned integers instead, I am not immediately able to think of any and suspect they are rare). While this would indeed simplify the problem of avoiding the use of erroneously overflowed results, it certainly doesn’t help in all cases (consider multiplication, or addition of two unknown quantities with unknown sign).

For the trivial case where wrapping behaviour does allow simply detecting overflow after it occurs, it is also straightforward to determine whether overflow would occur, before it actually does so. The example above can be rewritten as follows:

if (a > INT_MAX - 1000) { // would overflow
    puts("input too large!");
    return;
}
int b = a + 1000;

That is, you can perform a check to see whether the result of an addition will exceed the maximum representable value, rather than performing the addition and then trying to determine whether that overflow occurred by checking if the result is mathematically inconsistent. (If the sign of both operands is unknown, the check becomes significantly more complicated, but this is also true when checking for overflow after the operation with wrapping overflow semantics).
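
For illustration, here is one common form of the pre-check for addition when the signs of both operands are unknown (a sketch only; the function name and interface are mine, not taken from the original example):

#include <limits.h>

// Returns 1 and stores a + b in *result if the sum is representable,
// or returns 0 (without performing the addition) if it would overflow.
int try_add(int a, int b, int *result)
{
    if (b > 0 ? a > INT_MAX - b : a < INT_MIN - b) {
        return 0; // a + b would overflow
    }
    *result = a + b;
    return 1;
}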

With this in mind, I’m not really convinced that wrapping overflow is generally useful.

Wrapping is the behaviour expected by programmers?

It’s more difficult to argue against this point, since clearly at least some C programmers have written code which expects wrapping semantics for signed integer overflow. However, I don’t think that this alone is a strong argument for implementing wrapping semantics by default (note that several compilers implement options for wrapping overflow, if it really is desired).

An obvious mitigation for the problem of programmers expecting this particular behaviour is for the compiler to issue a warning when it optimises based on the alternative undefined-behaviour-is-assumed-not-to-occur semantics. Unfortunately, as we see in the godbolt.org link above, compilers don’t always do so (Gcc 7.3 does but 8.1 does not, so this appears to be a regression).

Exploiting undefined behaviour semantics on overflow gives no significant benefit?

If true in all cases, this would be a compelling argument for having compilers default to wrap-on-overflow, since it is probably better to allow the “detect overflow after it occurs” mechanism described above to work even though it is technically incorrect – if only because that mechanism may already be in use in existing (and arguably broken) code.

I suspect that with typical C programs the benefit of this particular optimisation (removing checks for mathematically impossible conditions) is usually negligible, because C programs tend to be written by programmers who are seeking good performance and who tend to hand-optimise their code anyway: that is, if it’s obvious that a particular “if” statement has a condition that can never be true, the programmer would likely have removed the statement themselves. Indeed, a search reveals a few studies where the effectiveness of this optimisation has been questioned, tested, and found to be mostly insignificant for the particular benchmarks under test. However, while in many cases there is no benefit for C, the code generation engines and optimisers in compilers are commonly general and could be used for other languages where the same might not be true; consider C++, where it is somewhat idiomatic in templated code to rely on the optimiser to remove redundant code, rather than doing so manually. There is also the case of languages being transpiled to C and relying on the C compiler to optimise away redundant code.

Also, even without overflow check elimination, it is not necessarily correct to assume that wrapping integers has minimal direct cost, even on machines which use 2’s complement representation. The Mips architecture, for example, can perform arithmetic operations only in registers, which are of a fixed size (32 bits). A “short int” is generally 16 bits and a “char” is 8 bits; when held in a register, a variable of one of these types effectively occupies the full register width, and forcing it to wrap at the limit of its declared type would require at least one additional operation and possibly the use of an additional register (to contain an appropriate bitmask). I have to admit that it’s been a while since I’ve had exposure to any Mips code and so I’m a little fuzzy on the precise cost involved, but I’m certain it is non-zero and other RISC architectures may well have similar issues.
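
To illustrate where that extra work appears, consider a function operating on a narrower type (a sketch of my own; the exact instruction sequence and cost vary by architecture and compiler):

// The operands are promoted to int (register width) before the addition;
// converting the result back to short is where a compiler for a
// register-only architecture has to emit extra truncation or
// sign-extension work to make the value wrap at 16 bits.
short add_shorts(short a, short b)
{
    return (short)(a + b);
}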

The language standard does not allow signed integer overflow to do anything other than wrap, if that’s what the underlying architecture does?

This argument is particularly weak when examined. It essentially claims that, by the text of the standard, “undefined behaviour” actually grants only limited license to the implementation (compiler). The text that proponents latch on to, which forms part of the definition of undefined behaviour, says precisely the following:

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, …

The claim is that “ignoring the situation completely” would not allow for assuming that the situation leading to the undefined behaviour – overflowing addition, for example – could not happen, but rather that, if it does happen, the implementation must carry on as if it did not happen but must respect the result it would obtain from asking the processor to perform such an operation (putting it another way: as if the translation from source to machine code was direct and naive).

First, we should observe that this text is in a NOTE and therefore non-normative (it cannot prescribe behaviour), according to the ISO directives mentioned in the foreword of the same document:

In accordance with Part 3 of the ISO/IEC Directives, this foreword, the introduction, notes, footnotes, and examples are also for information only.

Given that the “possible undefined behaviour” text appears in such a note, it is not prescriptive. Note that the actual definition text for “undefined behavior” reads:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

The important part is the final clause: the standard imposes no requirements on undefined behaviour; the list of “possible undefined behaviors” in the note contains merely examples and cannot be definitive. “Imposes no requirements” is unambiguous.

Some extend the argument to say that, regardless of what the text actually says, the intention of the language committee when those words were drafted was that the behaviour should in general match that of the underlying hardware, as closely as possible, assuming a naive translation to machine code. This may be true, though I’ve not seen any evidence (such as historical text) that supports it. Even if it were true, however, it would not necessarily apply to the current incarnation of the text.

Final thoughts

The arguments for wrapping on overflow are mostly flawed. Probably the strongest argument that can be made is a combination: it is occasionally expected by less experienced programmers (who do not understand the nuances of C and of its undefined behaviour), and is not particularly harmful to performance – however, the latter is not true in all cases, and the former is somewhat dubious reasoning when considered by itself.

Personally, I feel that I would much rather have trap on overflow than wrap. That is, I would rather that a program crash instead of continuing with either undefined behaviour or a potentially incorrect value, either of which could be a security issue. This would certainly have a slight performance impact on most(?) architectures, particularly x86, but on the other hand it would immediately flag overflow bugs rather than allowing them to be exploited or to produce incorrect results further down the line. It could also in theory allow the compiler to safely remove redundant comparisons following a potential overflow, because trapping ensures that an overflowed value can never actually be produced, though I note that Clang and Gcc both apparently fail to take advantage of this.

Fortunately, both trapping and wrapping options are implemented by the compiler I use most often, which is Gcc. The “-ftrapv” and “-fwrapv” command line arguments can be used to enable each respectively.
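
As an aside, for code that only wants trapping behaviour at particular points, the __builtin_add_overflow family of builtins (available in recent Gcc and Clang releases) can be used to build a checked addition by hand. A minimal sketch, with a helper name of my own choosing:

#include <stdio.h>
#include <stdlib.h>

// Checked addition: returns the sum, or aborts if it is not representable.
// __builtin_add_overflow performs the addition as if with infinite
// precision and reports whether the result fits in the destination type.
static int checked_add(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result)) {
        fputs("integer overflow\n", stderr);
        abort(); // trap rather than continue with a wrapped value
    }
    return result;
}

int main(void)
{
    printf("%d\n", checked_add(2000000000, 2000000000)); // aborts here
    return 0;
}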

There are of course a number of other causes of undefined behaviour; integer overflow is only one. I don’t necessarily think that all of these are useful and I do think that there are plenty of specific cases where the semantics should be defined by the language, or at least classified as implementation-defined. And I’m wary of compiler vendors being too liberal in their interpretation: if the compiler behaves in ways that are counter-intuitive, especially for someone who has read the language specification themselves, there is always the risk of real software bugs resulting; if the opportunities for optimisation that such an interpretation opens up are negligible, it is hardly worthwhile to adopt it. An examination of some issues around this area may be the topic of a future post.

Addendum (24 Aug 2018)

I’ve realised that much of the above could be better written. To briefly summarise, clarify, and add some minor points:

  • I was not trying to argue that undefined behaviour on overflow is preferable to wrapping, but rather that wrapping is not much better than undefined behaviour on overflow in practice. In particular, you can get security issues from wrapping behavior in much the same way as you can with undefined behaviour – and I’d argue that many security issues resulting from unchecked integer overflow, other than those which come from the compiler removing erroneous post-overflow checks, actually come from the fact that the value has wrapped around rather than any other undefined behaviour associated with the overflow.
  • The only real benefit of wrap-on-overflow is that it doesn’t cause post-overflow checks to be removed. While that might eliminate some attack vectors, it leaves open the possibility that some overflows won’t be checked for at all (i.e. the programmer did not include an overflow check) and will be uncaught.
  • If security is not a concern but speed of execution is, undefined behaviour on overflow may allow better optimisation possibilities and provide a performance benefit, at least in some cases. On the other hand if security is a concern, wrap-on-overflow potentially leaves holes open.
  • This means that between trap-on-overflow, wrap-on-overflow, and undefined-behaviour-on-overflow, I see very few cases where wrap-on-overflow should be the preferred choice.
  • In regards to post-overflow checks, I have concerns that leaving them in place could lead to the wrong perception (that post-overflow checks work and are guaranteed to work). Trap-on-overflow avoids this problem. Good warnings help to alleviate it.
  • I think that any programmer writing security-sensitive code will ideally have a good grasp of the semantics of, and the potential pitfalls of, the language they are writing in. For C, this means understanding the overflow semantics and nuances of undefined behaviour. It’s unfortunate that some C programmers still don’t seem to have that level of understanding.
  • I have seen a claim that “most C programmers expect wrapping behaviour”, but I’m not aware of any evidence that this is true. (I’ve said “some” above since I’ve seen anecdotal evidence and I doubt this would be disputed anyway).
  • There are two separate issues: one is what the C language standard should require, and another is what the compilers should implement. I’m (somewhat) ok with the language standard specifying undefined behaviour for overflow. This post is arguing for what the behaviour of compilers should be.
  • Trap-on-overflow need not require that every operation is checked for overflow; ideally it would only mandate that the program either behaves in a mathematically consistent manner or aborts, which allows for “temporary overflow” that doesn’t generate an incorrect result. This would allow optimising “a + b - b” to “a” (which wrapping also does) and “(a * b) / b” to “a” (which wrapping doesn’t); a short sketch of both follows below.
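
To make that last point concrete, a small sketch (the function names are mine):

// With wrapping semantics, (a + b) - b always gets back to a, so a compiler
// may fold f1 to just 'a'. (a * b) / b, however, does not recover 'a' once
// the product has wrapped, so f2 may only be folded if the compiler can
// assume the intermediate result is mathematically consistent (as with
// undefined-behaviour or consistent-or-abort semantics).
int f1(int a, int b) { return (a + b) - b; }
int f2(int a, int b) { return (a * b) / b; }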

Addendum (21 Nov 2018)

It turns out that the GCC implementation (circa 8.2) of -ftrapv is terrible – it doesn’t detect all overflow cases, and it causes a large performance degradation due to farming out most arithmetic operations to support functions (i.e. it imposes a function call overhead for a simple addition). The Clang/LLVM implementation is much better.

 

Clever hacks are just not.

Just recently I wrote about the clever “dual ABI” hack found in GCC 5’s implementation of the standard C++ library. I quipped at the end of that post:

I suspect the right way to handle changing ABI is to do it the way it’s always been done – by bumping the soname.

Having recently done some work on a C++ project, I’ve discovered an issue present in the dual ABI implementation. What seemed like trivial code to handle an exception from attempting to open a non-existent file just wasn’t working. I narrowed it down to the following test case:

#include <fstream>
#include <iostream>
#include <typeinfo>

int main(int argc, char **argv)
{
    using namespace std;
    using std::ios;
    
    ifstream a_file;
    a_file.exceptions(ios::badbit | ios::failbit);
    
    try {
        a_file.open("a-non-existent-file", ios::in);
    }
    catch (ios_base::failure &exc) {
        cout << "Caught exception on attempt to open non-existing file." << endl;
    }
    catch (exception &exc) {
        cout << "Caught standard exception: " << typeid(exc).name() << endl;
    }

    return 0;
}

Surprisingly, this fails to work properly (prints the wrong message) when compiled with a straight “g++ -o simple simple.cc”. It works correctly when compiled instead with “g++ -D_GLIBCXX_USE_CXX11_ABI=0 -o simple simple.cc”. I would have filed a bug, but it seems the problem is known.

This is a pretty major bug. Any C++ program that handles I/O errors by catching the appropriate exceptions just isn’t going to work. Furthermore, this isn’t just a minor problem; having to separate out implementations of ios_base and derived classes for the two ABIs would significantly reduce the benefit of even having the dual ABI. The problem with “clever” hacks like the dual ABI libstdc++ is that they are (1) often not so clever and are (2) always most definitely hacks. The proposed solution? Heap more hacks on top so the first hack works:

One option to make both forms work would be to hack the EH runtime to make handlers for std::ios_base::failure able to catch std::ios_base::failure[abi:cxx11] objects, and vice versa, creating an object of the other type on the fly.

Urgh. Just urgh.

What really annoys me most about this nonsense is that the one sane option I have – stripping out the ‘abi_tag’ nonsense from the source, re-compiling libstdc++ and calling it libstdc++.so.7 – would probably cause me problems further down the track, because third-party binaries are going to want the new ABI with the old soname (and even if I didn’t care about those, I’d still be one version number up from the official libstdc++ from that point on, which feels like it could cause problems).

Tale of Two ABIs

With GCC 5.2 just having been released, I figured it was time to upgrade my system to the latest-and-greatest in GNU compiler technology. So far the new compiler seems fine, but there’s a subtle issue that has been introduced to do with the “dual ABI” in the standard C++ library implementation, libstdc++. A Red Hat developer (“rhjason”) writes about it here. If I try to boil it down to the essence:

  • The C++11 standard has some complexity requirements which require re-engineered implementations of certain standard classes, among them std::string and std::list. This in turn requires an ABI change (perhaps the sizes of the structures have changed, or inline functions exist which now need to be implemented differently; given that std::list is a template, there’s really no getting around this).

    “Complexity requirements” refers to the algorithmic complexity (think ‘big-O’ notation). In particular std::list size() must now be O(1) – that is, it must take a constant amount of time regardless of the number of elements in the list (GCC bug). Previously the size() method worked by counting the elements in the list, which is O(n); there’s an explanation (in the form of an argument against) here. (I’m not sure if I agree with this argument).

  • Changing the ABI normally means changing the soname, the name of the dynamic library that you link against. In this case it would have been a bump from libstdc++.so.6 to libstdc++.so.7.

    This does, admittedly, come with problems. It’s possible to have some program (A) link against a library (B) and also against the C++ library (libstdc++.so.7). However, it may be the case that the library B was linked against the older version of the C++ library (libstdc++.so.6). When you execute A, then, it will dynamically link against both versions of the library. Because these libraries have (at least some) common symbol names, one will override the other. That means, for instance, that library B may call some function but get the libstdc++.so.7 implementation of it, which has the wrong ABI from B’s point of view. There’s also the possibility of passing standard library objects (such as strings) between A and B, for which the ABI differs. Either case might lead to data structure corruption, incorrect behaviour and possibly crashes.

    (Note that this problem is not limited to the C++ library. Any library which changes its ABI and bumps its soname potentially causes similar issues).

  • To combat these problems, the GCC/libstdc++ folk decided to put the old and new ABI into a single dynamic library (libstdc++.so.6).

    In many cases, nested inline namespaces (explanation) are used to avoid symbol clashes between the old and new ABI; however this is not possible in every case (because you can’t have a namespace inside a class, for instance). So, the compiler now understands ‘__attribute__ ((abi_tag("cxx11")))’, which can be applied to an inline namespace or a declaration.

    (I do not understand why the tag should ever actually need to be applied to a namespace, and indeed it’s not really done in libstdc++. It seems that you do not need to supply the abi argument if you do – so you can have just ‘__attribute__ ((abi_tag))’ – but neither is it at all clear to me what the purpose of this would be. With a few experiments I see that it has some effect on warnings when -Wabi-tag is used, and might affect the abi tag that functions/variables inside the namespace “inherit” if they use an abi-tagged type).

  • You can select the ABI to compile against using the _GLIBCXX_USE_CXX11_ABI preprocessor macro (set it to 1 for the new ABI, or 0 for the old ABI). If the macro is not set, it defaults to 1 when you include a C++ header.

So, this allows two ABIs to exist in a single dynamic library. As an added bonus you may get link-time errors if you try to link to a library using the other ABI. However, there is at least one significant issue with this strategy; Allan McRae (an Arch Linux guy) writes about it here:

This discovered an issue when building software using the new C++ ABI with clang, which builds against the new ABI (as instructed in the GCC header file), but does not know about the abi_tag attribute. This results in problems such as (for example) any function in a library with a std::string return type will be mangled with a [abi:cxx11] ABI tag. Clang does not handle these ABI tags, so will not add the tag to the mangled name and then linking will fail.

In other words, the GCC people have basically decided that other compilers don’t exist. This is, I have to say, pretty shitty. The LLVM guys are now forced to choose between supporting this non-standard abi_tag attribute or supporting only the non-C++11 ABI when using GNU libstdc++. Although I hope they go with the former (because it’ll make life easier for me personally) one could understand if they decided not to. I suspect the right way to handle changing ABI is to do it the way it’s always been done – by bumping the soname.

Compiler Bugs Worst Bugs

I think that compiler bugs – the kind where they produce the wrong code, i.e. an incorrect compilation – are perhaps the worst kind of bug, because they can be very difficult to identify and they can cause subtle issues (including security issues) in other code that should work correctly. That’s why this GCC bug bothers me a lot. Not only is it a “wrong code” bug, but it is easy to reproduce without using any language extensions or unusual compiler options.

Just -O2 is needed when compiling the following program:

#include <assert.h>

unsigned int global;

unsigned int two()
{
    return 2 * global;
}

unsigned int six()
{
    return 3 * two();
}

unsigned int f()
{
    return two() * 2 + six() * 5;
}

void g(const unsigned int from_f)
{
    const unsigned int thirty_four = two() * 2 + six() * 5;
    assert(from_f == thirty_four);
}

int main()
{
    global = 1;
    const unsigned int f_result = f();
    g(f_result);
}

It’s easy to reproduce, and it’s obviously wrong. Somehow the compiler is managing to mess up the calculation of (2 * 2 + 6 * 5). And yet, it’s classified as Priority 2. And furthermore, GCC 4.9.3 was just recently released, with this bug still present. It makes me start to wonder about the quality of GCC. I’m waiting for this to be fixed before I move to the 4.9 series (5.x series is way too new for my liking, though I might skip over 4.9 if 5.2 is released in the near future).

I’d like to run my compiler benchmarking tests on LLVM 3.6.1 and GCC, but I’ll hold off for a bit. I have done some quick testing with LLVM though and I have to say it is exceeding expectations.

Updated C compiler benchmarks, Feb 2012

With several new compiler releases it’s time to update my compiler benchmark results (last round of results here, description of the benchmark programs here). Note that the tests were on a different machine this time and the number of iterations was tweaked so numerical results aren’t comparable.

Without further ado:

So, what’s interesting in this set of results? Generally, note that LLVM and GCC now compete closely for dominance. Notably, GCC canes LLVM in bm4, where it appears that GCC generates very good “memmove” code, whereas LLVM generates much more concise but apparently also much slower code; also in bm8 (trivial loop removal – though neither compiler actually performs this optimization, GCC apparently generates faster code). On the other hand LLVM beats GCC quite handily in bm6 (a common subexpression elimination problem).

GCC 4.6.2 improves quite a bit over 4.5.2 in bm5 (essentially a common subexpression refactoring test). However, it’s slightly worse in bm3 and for some reason there is a huge drop in performance for the bm7 test (stack placement of returned structure).

The Firm suite, surprisingly, mostly loses ground with 1.20.0 and remains uncompetitive.

Edit 25/03/12: I’ve noticed a flaw in bm6, which when corrected causes GCC to perform much worse – about 0.5 seconds rather than the 0.288 reported above.

Updated C compiler benchmarks: llvm-2.7, libfirm-1.18.1

Since I did my previous C compiler benchmark, there have been a few new software releases, including Gcc 4.5, llvm 2.7 (and 2.6) and libfirm 1.18 (and 1.17). Here are results from benchmarking these compilers:


Most important things: Gcc version is now 4.4.4, though this hasn’t made much difference to the results. llvm-2.7 is benchmarked both with the “native” clang frontend, and the gcc-4.2 based frontend; note the difference in bm1, 3 and 6.

In the case of LLVM, results have improved significantly in bm2 and bm5.

For LibFirm, results have improved dramatically for bm5, but are also significantly worse for bm3.

Gcc is still (subjectively) the overall winner, though it does get beaten in a few of the benchmarks (bm2,6,7) by LLVM and even in a couple (bm5,7) by LibFirm.

Gcc 4.5 has been released but has yet to be benchmarked. Edit 13/7/2010: Gcc 4.5 benchmarked, and the results are so similar to those for Gcc 4.4.4 that it’s not worth reporting them separately.

There are three new tests:

bm8 tests the compiler’s ability to remove trivial loops (loops with empty bodies, but a large number of iterations). The loops are implemented with “goto” rather than a C loop structure, and there is an inner and outer loop. No tested compiler manages to remove the loops.

bm9 tests the compiler’s ability to break up a loop into two separate loops, when doing so avoids an efficiency problem caused by a potential aliasing pointer. The function in the benchmark takes two arguments: an int array and an int pointer “p”. The inner part of the loop uses the value pointed at by “p” as part of an expression assigned to each array element in turn. Because “p” might alias one of the array elements, the stored value must be re-loaded each iteration, unless the compiler breaks the loop into two. None of the tested compilers perform this optimization.
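
For reference, a sketch of the kind of loop being described (this is my own reconstruction, not the actual benchmark code):

// Because p may point into arr, the compiler must reload *p on every
// iteration, unless it either proves there is no aliasing or splits the
// loop into separate aliasing and non-aliasing versions.
void bm9_style(int *arr, int n, const int *p)
{
    for (int i = 0; i < n; i++) {
        arr[i] = *p + i;
    }
}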

bm10 tests that a shift-left of a value inside a large loop can be reduced to a constant (due to having shifted all the bits out). None of the tested compilers manage this.

It’s disappointing that some optimizations which potentially yield huge benefit are not implemented in any of the tested free compilers; however, GCC 4.5 has been released and is yet to be tested (edit 13/7/2010: tested, no surprises). Also, GCC includes some options for loop optimization which are not turned on at any optimization level, due to them requiring external libraries at build time; it would be interesting to see whether these would make any difference in any of the tests (edit 13/7/2010: they don’t).

Further update 4/8/2010: I’ve just tested with gcc 4.5.1. No very noticeable differences from the performance of 4.4.4 were seen; however, I am running a new kernel and this seems to slightly affect the numbers for 4.4.4; for instance, bm3 comes down to about the 0.162 mark, more-or-less even with LLVM.