Wednesday, February 14, 2024

C can be memory safe, part 2

This post from last year was posted to a forum, so I thought I'd write up some rebuttals to their comments.

The first comment is by David Chisnall, creator of CHERI C/C++, which proposes we can solve the problem with CPU instruction set extensions. It's a good idea, but after 14 years, CPUs haven't had their instruction-sets upgraded. Even mainstream RISC V processors haven't been created using those extensions.

Chisnall: "If your safety requires you to insert explicit checks, it’s not safe". This is true from one perspective, false from another. My proposal includes compilers spitting out warnings whenever bounds information doesn't exist.

C is full of problems in theory that doesn't exist in practice because the compiler spits out warnings telling programmers to fix the problem. Warnings can also note cases where programmers probably made mistakes. We can't achieve perfect guarantees, because programmers can still make mistakes, but we can certainly achieve "good enough".

Chisnall: ....tread safety..... I'm not sure I full understand the comment. I understand that CHERI can guarantee atomicity of bounds checking, which would require multiple (interruptible) instructions otherwise. The number of cases where this is a problem, and the C proposal would be no worse than other languages like Rust.

Chisnall: Temporal safety.... A lot of Rust "ownership" techniques can be applied to C with these annotations, namely, marking which variables OWN allocated memory and which simply BORROW it. I've reviewed a lot of famous use-after-free and double-free bugs, and most can be trivially fixed by annotation.

Chisnall: If you are writing a blog never having actually tried to make large (million line or more) C codebases memory safe, you probably underestimate the difficulty by at least one order of magnitude. I'm both a programmer who has written a million lines of code in my lifetime as well as a hacker with decades of experience looking for such bugs. The goal isn't to pursue the ideal of 100% safe language, but of getting rid of 99% of safety errors. 1% less safe makes the goal an order of magnitude easier to reach.

snej: This post seems to epitomize the common engineer trait of seeing any problem you haven’t personally worked on as trivial. Sure bro, you’ll add a few patches to Clang and GCC and with those new attributes our C code will be safe. It’ll only take a few weeks and then no one will need Rust anymore. But I've spent decades working on this. The comment epitomizes the common trait of not realizing how much thought and expertise is behind the post. I few patches to clang and GCC will make make C safer. The solution is far less safe than Rust. In fact, my proposal makes code more interoperable and translatable into Rust. Right now, translating C into Rust creates just a bunch of 'unsafe' code that needs to be cleaned up. With such annotations, in a refactoring step using existing testing frameworks, results in code that can no be auto-translated safely in to Rust.

As for existing clang/gcc attributes, there are only a couple that match the macros I propose. They dod show how trivial it would be to actually go further.

danso: In addition to the criticisms I share with everyone else, I found this to be one of the most “talk is cheap, show me the code” posts I’ve ever read. The reason I wrote the post is because learning clang/gcc internals is a long process, and when asking for help, I needed something to point to "this is what I'm trying to achieve". I'm not trying to communicate what other people should do, I'm communicating what I'm trying to do. I still don't know clang/gcc internals enough to even get started ... any pointers would be helpful.




No comments: