Saturday, August 29, 2015

No, this isn't good code

I saw this tweet go by. No, I don't think it's good code:




What this code is trying to solve is the "integer overflow" vulnerability. I don't think it solves the problem well.

The first problem is that the result is undefined. Some programmers will call safemulti_size_t() without checking the result. When they do, the code will behave differently depending on the previous value of *res. Instead, the code should return a defined value in this case, such as zero or SIZE_MAX. Knowing that this sort of thing will usually be used for memory allocations, which you want to have fail, then a good choice would be SIZE_MAX.

The worse problem is integer division. On today's Intel processors, integer multiplication takes a single clock cycle, but integer division takes between 40 and 100 clock cycles. Since you'll be usually dividing by small numbers, it's likely to be closer to 40 clock cycles rather than 100, but that's still really bad. If your solution to security problems is by imposing unacceptable tradeoffs, then you are doing security wrong. If you introduced this level of performance hit, then you might as well be programming in a safer language like JavaScript than in C.

An alternative would be the OpenBSD function reallocarray(), which I'm considering using in all my code as a replacement for malloc(), calloc(), and realloc(). It looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

void *
reallocarray(void *optr, size_t nmemb, size_t size)
{
    if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size) {
            errno = ENOMEM;
            return NULL;
    }
    return realloc(optr, size * nmemb);
}

Firstly, it doesn't call the horrible integer division function (unless one of the parameters is larger than 2-gigs on a 32-bit processor). Secondly, it always has a defined result.

Personally, I would improve upon this function by simply calling a signal(). Virtually no code can recover from a bad memory allocation, so instead of returning NULL, it's better to crash right here.





5 comments:

Anonymous said...

MUL_NO_OVERFLOW is actually 65536 on 32-bit platforms, not 2G.

Anyway, the fast way of doing this is by performing the multiplication on double-precision integers.

#if SIZE_MAX == 0xffffffff
typedef uint64_t double_uint;
#else
typedef __uint128_t double_uint;
#endif

static bool
safemult_sizet(size_t a, size_t b, size_t *r) {
double_uint res = (double_uint)a*b;
*r = (size_t)res;
if(res > SIZE_MAX) return false;
return true;
}

Anonymous said...

Potential improvement, divide by lesser number. Have not benchmarked.
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))
#define IS_IT_WRAP_SAFE(smaller,larger) ((larger) < MUL_NO_OVERFLOW || !(smaller) || (SIZE_MAX / (smaller) >= (larger)))
bool
isItWrapSafe(size_t a,size_t b)
{
if (a<=b)
return IS_IT_WRAP_SAFE(a, b);
else
return IS_IT_WRAP_SAFE(b, a);
}

Anonymous said...

Such a shame that hardware multiply instruction tend not to indicate whether an overflow occurred. Er... wait a second...

Lorenzo Grespan said...

Odd that nobody pointed out how, if a = 0 the result will be different than if b = 0. Which breaks the definition of "multiplication".
Secondly, as far as I remember C is not lazy so if b = 0, you have a nice division by zero because tmp/b will be evaluated. But I'm happy to be wrong here.

Bwanshoom said...

Lorenzo, you'll be happy then. I think you mean short-circuited evaluation and C definitely does it. If b is 0 the division will not occur - and that would be an obvious bug to have in this example.