Wednesday, February 01, 2023

C can be memory-safe

The idea of memory-safe languages is in the news lately. C/C++ is famous for being the world's system language (that runs most things) but also infamous for being unsafe. Many want to solve this by hard-forking the world's system code, either by changing C/C++ into something that's memory-safe, or rewriting everything in Rust.

Forking is a foolish idea. The core principle of computer-science is that we need to live with legacy, not abandon it.

And there's no need. Modern C compilers already have the ability to be memory-safe, we just need to make minor -- and compatible -- changes to turn it on. Instead of a hard-fork that abandons legacy system, this would be a soft-fork that enables memory-safety for new systems.

Consider the most recent memory-safety flaw in OpenSSL. They fixed it by first adding a memory-bounds, then putting every access to the memory behind a macro PUSHC() that checks the memory-bounds:

A better (but currently hypothetical) fix would be something like the following:

size_t maxsize CHK_SIZE(outptr) = out ? *outlen : 0;

This would link the memory-bounds maxsize with the memory outptr. The compiler can then be relied upon to do all the bounds checking to prevent buffer overflows, the rest of the code wouldn't need to be changed.

An even better (and hypothetical) fix would be to change the function declaration like the following:

int ossl_a2ulabel(const char *in, char *out, size_t *outlen CHK_INOUT_SIZE(out));

That's the intent anyway, that *outlen is the memory-bounds of out on input, and receives a shorter bounds on output.

This specific feature isn't in compilers. But gcc and clang already have other similar features. They've only been halfway implemented. This feature would be relatively easy to add. I'm currently studying the code to see how I can add it myself. I could just mostly copy what's done for the alloc_size attribute. But there's a considerable learning curve, I'd rather just persuade an existing developer of gcc or clang to add the new attributes for me.

Once you give the programmer the ability to fix memory-safety problems like the solution above, you can then enable warnings for unsafe code. The compiler knew the above code was unsafe, but since there's no practical way to fix it, it's pointless nagging the programmer about it. With this new features comes warnings about failing to use it.

In other words, it becomes compiler-guided refactoring. Forking code is hard, refactoring is easy.

As the above function shows, the OpenSSL code is already somewhat memory safe, just based upon the flawed principle of relying upon diligent programmers. We need the compiler to enforce it. With such features, the gap is relative small, mostly just changing function parameter lists and data structures to link a pointer with its memory-bounds. The refactoring effort would be small, rather than a major rewrite.

This would be a soft-fork. The memory-bounds would work only when compiled with new compilers. The macro would be ignored on older systems. 

This memory-safety is a problem. The idea of abandoning C/C++ isn't a solution. We already have the beginnings of a solution in modern gcc and clang compilers. We just need to extend that solution.

No comments: