Wednesday, February 23, 2011

What's the deal with deleting data from flash drives?

Before flying back to the United States, you wipe the flash in your SSD. You run “dd if=/dev/zero of=foo; rm foo” twice in order to fill the file system with zeroes. You then run your elite hacker tools (such as "photorec", which restores deleted photos) to confirm that the drive does indeed contain only zeroes.

Yet, when passing through customs, the border guards seize your laptop and find the proof of your crimes committed as a member of Anonymous and Wikileaks.

What went wrong?


What happened was that the FBI removed the flash chips from the drive, soldered them to a different circuit board, and then read the data directly from the chips. The SSD controller chip didn't tell you the truth when it said it erased all the flash.

Your first problem was using /dev/zero. Many SSD controllers compress blocks of data. Some even do “deduplication”. By writing all zeroes, you actually only overwrote about 10% of the drive. The remaining 90% of your original data was still on the flash chips, despite the fact the controller claimed the entire disk was zeroed out. You should’ve used /dev/random instead to overwrite more of the drive.
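To get a feel for why zeroes are such a poor choice, here is a minimal sketch that uses Python's zlib as a stand-in for a controller's compression. Real controllers use proprietary algorithms, so the ratios are only illustrative, and the 1 MiB block size is an arbitrary assumption.

```python
import os
import zlib

BLOCK_SIZE = 1024 * 1024  # 1 MiB of "user data"; size chosen arbitrarily


def compressed_fraction(data: bytes) -> float:
    """Return the compressed size as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


zero_block = bytes(BLOCK_SIZE)         # the kind of data /dev/zero produces
random_block = os.urandom(BLOCK_SIZE)  # the kind of data a random source produces

zero_fraction = compressed_fraction(zero_block)
random_fraction = compressed_fraction(random_block)

print(f"zeroes shrink to {zero_fraction:.4%} of their original size")
print(f"random data ends up at {random_fraction:.4%} of its original size")
```

A compressing controller that receives the zero block has almost nothing to store, so it touches very little flash. The random block does not shrink, so the controller has to write all of it somewhere.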

But even that is not enough. Researchers showed that even after overwriting the entire drive seven times with random data, they were still able to retrieve 1% of the original data.

That is because SSDs have about 10% extra, or “spare”, space. This space is used both to replace blocks that have gone bad over time and to give the controller flexibility when choosing which free blocks to use, so that it can avoid unnecessary erase/write cycles (“write amplification”).

Overwriting 100% of the disk means that you’ve missed 10% of the flash. But overwriting 110% doesn’t work either. The free space on flash isn’t managed as a first-in/first-out queue of blocks. Instead, when the SSD controller needs a new block, it chooses one randomly from the list of free blocks. Thus, no matter how many times you overwrite the visible disk space, there is still a chance (no matter how slight) that the one block containing incriminating evidence was not overwritten.
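The random choice of free blocks can be sketched as a toy simulation. Everything here is an assumption made for illustration: the function name, the block counts, the idea that every physical block (spares included) starts out holding old data, and the simplification that a freed block keeps its contents until the controller picks it again. Real controllers are far more complicated.

```python
import random


def surviving_original_blocks(logical_blocks, spare_blocks, writes, seed=0):
    """Count physical blocks still holding pre-wipe data after `writes` block writes."""
    if spare_blocks < 1:
        raise ValueError("this toy model needs at least one spare block")
    rng = random.Random(seed)
    physical_blocks = logical_blocks + spare_blocks
    # Worst case: every physical block starts out holding old data.
    holds_original = [True] * physical_blocks
    mapping = list(range(logical_blocks))                # logical -> physical
    free = list(range(logical_blocks, physical_blocks))  # the spare pool
    for i in range(writes):
        lba = i % logical_blocks           # overwrite the visible disk in order
        pick = rng.randrange(len(free))    # controller picks a random free block
        new_block = free[pick]
        free[pick] = mapping[lba]          # old block is freed, contents intact
        mapping[lba] = new_block
        holds_original[new_block] = False  # the chosen block is erased and rewritten
    return sum(holds_original)


LOGICAL, SPARE = 1000, 100  # roughly 10% spare space
after_100_percent = surviving_original_blocks(LOGICAL, SPARE, writes=1000)
after_110_percent = surviving_original_blocks(LOGICAL, SPARE, writes=1100)
print("old blocks left after overwriting 100%:", after_100_percent)
print("old blocks left after overwriting 110%:", after_110_percent)
```

In this model, one full pass always leaves exactly as many old blocks as there are spares, because each write consumes one old block from the free pool and puts another old one back. Writing another 10% picks from that pool at random, so some old blocks typically survive it too.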

So how can you erase your flash and make sure all incriminating evidence is gone? Well, you can’t. At least, the researchers found no method that could guarantee all data would be deleted.

Or, asked another way, how can you query the device in order to see if the remaining, un-erased, data is incriminating? You can't do that either. When you read "raw" sectors from the disk, it's just the logical sectors translated by the drive controller. There is no (documented) way to retrieve data directly from the flash.
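The gap between what you can read and what is physically on the chips can be sketched with a toy translation layer. The ToyFTL class, its method names, and its simple take-the-next-free-page allocation are all hypothetical and exist only for illustration; they are not how any particular controller works.

```python
class ToyFTL:
    """Hypothetical flash translation layer: logical sectors map to physical pages."""

    def __init__(self, logical_sectors, physical_pages):
        if physical_pages <= logical_sectors:
            raise ValueError("need at least one spare page")
        self.logical_sectors = logical_sectors
        self.pages = [b""] * physical_pages    # what is really on the flash
        self.mapping = {}                      # logical sector -> physical page
        self.free = list(range(physical_pages))

    def write(self, sector, data):
        if not 0 <= sector < self.logical_sectors:
            raise IndexError("sector out of range")
        new_page = self.free.pop(0)            # allocation policy simplified
        self.pages[new_page] = data
        old_page = self.mapping.get(sector)
        self.mapping[sector] = new_page
        if old_page is not None:
            self.free.append(old_page)         # stale data stays until reused

    def read(self, sector):
        """All the host ever sees: the page the sector currently maps to."""
        return self.pages[self.mapping[sector]]


SECRET = b"incriminating evidence"
ZEROES = bytes(len(SECRET))

ftl = ToyFTL(logical_sectors=2, physical_pages=4)
ftl.write(0, SECRET)
ftl.write(0, ZEROES)  # the "wipe"

print("host reads zeroes:", ftl.read(0) == ZEROES)
print("secret still on flash:", SECRET in ftl.pages)
```

The host can only call read(), which follows the mapping. The stale page is invisible from that side, but anyone who pulls the chips gets the equivalent of the pages list.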

There are other answers, of course, the best being disk encryption. You don't need to erase the data if it has been encrypted.

There is possibly another answer. Controllers probably contain undocumented features that allow the flash to be accessed directly and programmed or erased. Therefore, some enterprising hacker might find undocumented features of a controller chip (such as the Toshiba T6UG1XBG used in the MacBook Air) and create a tool that securely erases the drive.


Conclusion



SSDs are built from a controller chip and the flash memory. The controller chip hides the details of flash, and makes the device appear as a normal disk drive. It performs magic, like wear leveling, to make flash appear as a disk.

Because of this, you cannot access the flash directly. Even when you think you are reading/writing “raw” blocks on the drive, you aren’t. Even if you think you’ve overwritten all the data multiple times on the drive, you haven’t: some original fragments may remain.

The current published research shows no guaranteed, 100% effective means to erase all incriminating evidence from the drive. Your only solution to protect yourself is encryption.

Footnotes


(1) Reliably Erasing Data From Flash-Based Solid State Drives by Wei, Grupp, Spada, and Swanson. This is the core research into the problem.

(2) Understanding and Choosing the Best SSD by Anand Lal Shimpi. Explains why controllers mess things up.

(3) ONFI, Open NAND Flash Interface. If you figure out how to bypass the controller, this is how you will do it.

5 comments:

Nick Sharratt said...

"Your only solution to protect yourself is encryption."

...or physical destruction of the device once used.

sudopeople said...

Why's everything always got to be so complicated? Fireplace.

John Mundinger said...

...or, you could use the flash drive only for legal purposes. Then, it wouldn't matter what the FBI found on it. ;))

It was great to see you!

John Mundinger said...

...or, you could use your flash drive only for legal applications. Then, it wouldn't matter what the fbi found on it.

btw, it was great to see you!

Anonymous said...

you could, but where's the fun in that? :D