Sunday, June 20, 2021

When we'll get a 128-bit CPU

On Hacker News, this article claiming "You won't live to see a 128-bit CPU" is trending. Sadly, it was non-technical, so it didn't really contain anything useful. I thought I'd write up some technical notes.

The issue isn't the CPU, but memory. It's not about the size of computations, but when CPUs will need more than 64 bits to address all the memory future computers will have. It's a simple question of math and Moore's Law.
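
As a rough illustration of that math, here is a minimal sketch in Python. The starting point (a machine with 2^40 bytes, or 1 TiB, of memory) and the two-year doubling period are assumptions picked for illustration, not figures from the article or from this post.

```python
def years_until_exhausted(current_bits, address_bits=64, years_per_doubling=2.0):
    """Years until memory size outgrows an address space.

    current_bits: log2 of today's memory size in bytes (an assumed input).
    address_bits: width of the address space being outgrown.
    years_per_doubling: assumed Moore's Law doubling period.
    """
    doublings_needed = max(0, address_bits - current_bits)
    return doublings_needed * years_per_doubling


# Assumption: a large machine today has 2**40 bytes (1 TiB) of memory.
print(years_until_exhausted(40))  # 24 doublings * 2 years = 48.0
```

Changing either assumption moves the answer a lot, which is the point: the date depends on where memory sizes start and how fast they really double.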

Thursday, April 29, 2021

Anatomy of how you get pwned

Today, somebody had a problem: they kept seeing a popup on their screen, an obvious scam trying to sell them McAfee anti-virus. Where was this coming from?

In this blogpost, I follow this rabbit hole on down. It starts with "search engine optimization" links and leads to an entire industry of tricks, scams, exploiting popups, trying to infect your machine with viruses, and stealing emails or credit card numbers.

Evidence of the attack first appeared with occasional popups like the following. The popup isn't part of any webpage.




This is obviously a trick. But from where? How did it "get on the machine"?

There's lots of possible answers. But the most obvious answer (to most people), that your machine is infected with a virus, is likely wrong. Viruses are generally silent, doing evil things in the background. When you see something like this, you aren't infected ... yet.

Instead, things popping up with warnings are almost entirely due to evil websites. But that's confusing, since this popup doesn't appear within a web page. It's off to one side of the screen, nowhere near the web browser.

Moreover, we spent some time diagnosing this. We restarted the web browser in "troubleshooting mode" with all extensions disabled and went to a clean website like Twitter. The popup still kept happening.

As it turns out, he had another window with Firefox running under a different profile. So while he cleaned out everything in this one profile, he wasn't aware the other one was still running.

This happens a lot in investigations. We first rule out the obvious things, and then struggle to find the less obvious explanation -- when it was the obvious thing all along.

In this case, the reason the popup wasn't attached to a browser window is because it's a new type of popup notification that's supposed to act more like an app and less like a web page. It has a hidden web page underneath called a "service worker", so the popups keep happening when you think the webpage is closed.

Once we figured out the mistake of the other Firefox profile, we quickly tracked this down and saw that indeed, it was in the Notification list with Permissions set to Allow. Simply changing this solved the problem.

Note that the above picture of the popup has a little wheel in the lower right. We are taught not to click on dangerous things, so the user in this case was avoiding it. However, had the user clicked on it, it would've led him straight here to the solution. I can't recommend you click on such a thing and trust it, because that means in the future, malicious tricks will contain such safe looking icons that aren't so safe.

Anyway, the next question is: which website did this come from?

The answer is Google.

In the news today was the story of the Michigan guys who tried to kidnap the governor. The user googled "attempted kidnap sentencing guidelines". This search produced a page with the following top result:


Google labels this a "featured snippet". This isn't an advertisement, nor a "promoted" result. But it's a link that Google's algorithms think is somehow more worthy than the rest.

This happened because hackers tricked Google's algorithms. It's been a constant cat and mouse game for 20 years, in an industry known as "search engine optimization" or SEO. People are always trying to trick Google into placing their content highest, both legitimate companies and the quasi-illegitimate that we see here. In this case, they seem to have succeeded.

The way this trick works is that the hackers posted a PDF instead of a webpage containing the desired text. Since PDF documents are much less useful for SEO purposes, Google apparently trusts them more.

But the hackers have found a way to make PDFs more useful. They designed it to appear like a webpage with the standard CAPTCHA. You click anywhere on the page such as saying "I'm not robot", and it takes you to the real website.



But where is the text I was promised in Google's search result? It's there, behind the image. PDF files have layers. You can put images on top that hide the text underneath. Humans only see the top layer, but Google's indexing spiders see all the layers, and will index the hidden text. You can verify this by downloading the PDF and using tools to examine the raw text:


If you click on the "I am not robot" in the fake PDF, it takes you to a page like the following:


Here's where the "hack" happened. The user misclicked on "Allow" instead of "Block" -- accidentally. Once they did that, popups started happening, even when this window appeared to go away.

The lesson here is that "misclicks happen". Even the most knowledgeable users, the smartest of cybersecurity experts, will eventually misclick themselves.

As described above, once we identified this problem, we were able to safely turn off the popups by going to Firefox's "Notification Permissions".

Note that the screenshots above are a mixture of Firefox images from the original user, and pictures of Chrome where I tried to replicate the attack in one of my browsers. I didn't succeed -- I still haven't been able to get any popups appearing on my computer.

So I tried a bunch of different browsers: Firefox, Chrome, and Brave on both Windows and macOS.

Each browser produced a different result, a sort of A/B testing based on the User-Agent (the string sent to webservers that identifies which browser you are using). Sometimes following the hostile link from that PDF attempted to install a popup script, as in our original example, but sometimes it tried something else.
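
To make the User-Agent idea concrete, here is a minimal sketch in Python of how one could prepare the same request under several browser identities. It only builds the requests and never sends them. The URL, the `build_requests` helper, and the two User-Agent strings are illustrative assumptions, not the values used in this investigation.

```python
from urllib.request import Request

# Illustrative User-Agent strings (assumptions, not the ones from this test).
USER_AGENTS = {
    "firefox-windows": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) "
        "Gecko/20100101 Firefox/88.0"
    ),
    "chrome-macos": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/90.0.4430.93 Safari/537.36"
    ),
}


def build_requests(url, user_agents=USER_AGENTS):
    """Build one unsent request per browser identity for the same URL."""
    return {
        name: Request(url, headers={"User-Agent": agent})
        for name, agent in user_agents.items()
    }


# Hypothetical placeholder URL; nothing is fetched here.
reqs = build_requests("http://example.invalid/landing")
for name, req in reqs.items():
    # urllib stores header names capitalized as "User-agent".
    print(name, "->", req.get_header("User-agent"))
```

A server that keys its behavior on this header can hand each of these requests a different page, which is consistent with what the different browsers saw.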

For example, on my Firefox, it tried to download a ZIP file containing a virus:


When I attempt to download, Firefox tells me it's a virus -- probably because Firefox knows the site where it came from is evil.

However, Microsoft's free anti-virus didn't catch it. One reason is that it comes as an encrypted zip file. In order to open the file, you have to first read the unencrypted text file to get the password -- something humans can do but anti-virus products aren't able to do (or at least, not well).


So I opened the password file to get the password ("257048169") and extracted the virus. This is mostly safe -- as long as I don't run it. Viruses are harmless sitting on your machine as long as they aren't running. I say "mostly" because even for experts, "misclicks happen", and if I'm not careful, I may infect my machine.

Anyway, I want to see what the virus actually is. The easiest way to do that is upload it to VirusTotal, a website that runs all the known anti-virus programs on a submission to see what triggers what. It tells me that somebody else uploaded the same sample 2 hours ago, and that a bunch of anti-virus vendors detect it, with the following names:


With VirusTotal, you can investigate why anti-virus products think it may be a virus. 

For example, anti-virus companies will run viruses to see what they do. They run them in "emulated" machines that are a lot slower, but safer. If viruses find themselves running in an emulated environment, then they stop doing all the bad behaviors the anti-virus programs might detect. So they repeatedly check the timestamp to see how fast they are running -- if too slow, they assume emulation.

But this itself is a bad behavior. This timestamp detection is one of the behaviors the anti-virus programs triggered on as suspicious.


You can go investigate on VirusTotal other things it found with this virus.

Viruses and disconnected popups weren't the only tricks. In yet another attempt with web browsers, the hostile site attempted to open lots and lots of windows full of advertising. This is a direct way they earn money -- hacking the advertising companies rather than hacking you.

In yet another attempt with another browser, this time from my MacBook Air, it asked for an email address:

I happily obliged, giving it a fake address.

At this point, the hackers are going to try to use the same email and password to log into Gmail, into a few banks, and so on. It's one of the top hacks these days (if not the most important hack) -- since most people reuse the same password for everything, even though it's not asking you for your Gmail or bank password, most of the time people will simply reuse them anyway. (This is why you need to keep important passwords separate from unimportant ones -- and write down your passwords or use a password manager).

Anyway, I now get the next webpage. This is a straight up attempt to steal my credit card -- maybe. 
This is a website called "AppCine.net" that promises streaming movies, for free signup, but requires a credit card.

This may be a quasi-legitimate website. I say "quasi" because their goal isn't outright credit card fraud, but a "dark pattern" whereby they make it easy to sign up for the first month free with a credit card, and then make it nearly impossible to stop the service, where they continue to bill you month after month. As long as the charges are small each month, most people won't bother going through all the effort of canceling the service. And since it's not actually fraud, people won't call their credit card company and reverse the charges, since they actually did sign up for the service and haven't canceled it.

It's a slimy thing the Trump campaign did in the last election. Their website asked for one time donations but tricked people into unwittingly making it a regular donation. This caused a lot of "chargebacks" as people complained to their credit card company.

In truth, everyone does the same pattern: makes it easy to sign up, and sign up for more than you realize, and then makes it hard to cancel. I thought I'd canceled an AT&T phone but found out they'd kept billing me for 3 years, despite the phone no longer existing or using their network.

They probably have a rewards program. In other words, they aren't out there doing SEO hacking of Google. Instead, they pay others to do it for them, and then give a percentage of the profit, either for incoming links or, more probably, for "conversion": money paid whenever somebody actually enters their credit card number and signs up.

Those people are in turn a different middleman. It probably goes like this:
  • somebody skilled at SEO optimization, who sends links to a broker
  • a broker who then forwards those links to other middlemen
  • middlemen who then deliver those links to sites like AppCine.net that actually ask for an email address or credit card
There's probably even more layers -- like any fine tuned industry, there are lots of specialists who focus on doing their job well.

Okay, I'll play along, and I enter a credit card number to see what happens (I have a bunch of used debit cards to play this game). This leads to an error message saying the website is down and they can't deliver videos for me, but then pops up another box asking for my email, from yet another movie website:

This leads to yet another site:
It's an endless series. Once a site "converts" you, it then simply sells the link back to another middleman, who then forwards you on to the next. I could probably sit there all day with fake email addresses and credit cards and still not come to the end of it all.

Summary

So here's what we found.

First, there was a "search engine optimization" hacker who specializes in getting their content at the top of search results for random terms.

Second, they pass hits off to a broker who distributes the hits to various hackers who pay them. These hackers will try to exploit you with:
  • popups pretending to be anti-virus warnings that show up outside the browser
  • actual virus downloads in encrypted zips that try to evade anti-virus, but not well
  • endless new windows selling you advertising
  • attempts to steal your email address and password, hoping that you've simply reused one from legitimate websites, like Gmail or your bank
  • signups for free movie websites that try to get your credit card and charge you legally
Even experts get confused. I had trouble helping this user track down exactly where the popup was coming from. Also, any expert can misclick and make the wrong thing happen -- this user had been clicking the right thing "Block" for years and accidentally hit "Allow" this one time.

Wednesday, April 21, 2021

Ethics: University of Minnesota's hostile patches

The University of Minnesota (UMN) got into trouble this week for doing a study where they submitted deliberately vulnerable patches into open-source projects, in order to test whether hostile actors can do this to hack things. After a UMN researcher submitted a crappy patch to the Linux Kernel, kernel maintainers decided to rip out all recent UMN patches.

Both things can be true:

  • Their study was an important contribution to the field of cybersecurity.
  • Their study was unethical.
It's like Nazi medical research on victims in concentration camps, or U.S. military research on unwitting soldiers. The research can simultaneously be wildly unethical but at the same time produce useful knowledge.

I'd agree that their paper is useful. I would not be able to immediately recognize their patches as adding a vulnerability -- and I'm an expert at such things.

In addition, the sorts of bugs it exploits show a way forward in the evolution of programming languages. It's not clear that a "safe" language like Rust would be the answer. Linux kernel programming requires tracking resources in ways that Rust would consider inherently "unsafe". Instead, the C language needs to evolve with better safety features and better static analysis. Specifically, we need to be able to annotate the parameters and return statements from functions. For example, if a pointer can't be NULL, then it needs to be documented as a non-nullable pointer. (Imagine if pointers could be signed and unsigned, meaning, can sometimes be NULL or never be NULL).

So I'm glad this paper exists. As a researcher, I'll likely cite it in the future. As a programmer, I'll be more vigilant in the future. In my own open-source projects, I should probably review some previous pull requests that I've accepted, since many of them have been the same crappy quality of simply adding a (probably) unnecessary NULL-pointer check.

The next question is whether this is ethical. Well, the paper claims to have sign-off from their university's IRB -- their Institutional Review Board that reviews the ethics of experiments. Universities created IRBs to deal with the fact that many medical experiments were done on either unwilling or unwitting subjects, such as the Tuskegee Syphilis Study. All medical research must have IRB sign-off these days.

However, I think IRB sign-off for computer security research is stupid. Things like masscanning of the entire Internet are undecidable with traditional ethics. I regularly scan every device on the IPv4 Internet, including your own home router. If you paid attention to the packets your firewall drops, some of them would be from me. Some consider this a gross violation of basic ethics and get very upset that I'm scanning their computer. Others consider this to be the expected consequence of the end-to-end nature of the public Internet, that there's an inherent social contract that you must be prepared to receive any packet from anywhere. Kerckhoffs's Principle from the 1800s suggests that the core ethic of cybersecurity is exposure to such things rather than trying to cover them up.

The point isn't to argue whether masscanning is ethical. The point is to argue that it's undecided, and that your IRB isn't going to be able to answer the question better than anybody else.

But here's the thing about masscanning: I'm honest and transparent about it. My very first scan of the entire Internet came with a tweet "BTW, this is me scanning the entire Internet".

A lot of ethical questions in other fields come down to honesty. If you have to lie about it or cover it up, then there's a good chance it's unethical.

For example, the west suffers a lot of cyberattacks from Russia and China. Therefore, as a lone wolf actor capable of hacking them back, is it ethical to do so? The easy answer is that when discovered, would you say "yes, I did that, and I'm proud of it", or would you lie about it? I admit this is a difficult question, because it's posed in terms of whether you'd want to evade the disapproval from other people, when the reality is that you might not want to get novichoked by Putin.

The above research is based on a lie. Lying has consequences.

The natural consequence here is that now that UMN did that study, none of the patches they submit can be trusted. It's not just this one submitted patch. The kernel maintainers are taking a scorched-earth response, reverting all recent patches from the university and banning future patches from them. It may be a little hysterical, but at the same time, this is a new situation that no existing policy covers.

I partly disagree with the kernel maintainer's conclusion that the patches "obviously were _NOT_ created by a static analysis tool". This is exactly the sort of noise static analyzers have produced in the past. I reviewed the source file for how a static analyzer might come to this conclusion, and found it's exactly the sort of thing it might produce.

But at the same time, it's obviously noise and bad output. If the researcher were developing a static analyzer tool, they should understand that this is crap noise and bad output from the static analyzer. They should not be submitting low-quality patches like this one. The main concern that researchers need to focus on for static analysis isn't increasing detection of vulns, but decreasing noise.

In other words, the debate here is whether the researcher is incompetent or dishonest. Given that UMN has practiced dishonesty in the past, it's legitimate to believe they are doing so again. Indeed, "static analysis" research might also include research in automated ways to find subversive bugs. One might create a static analyzer to search code for ways to insert a NULL pointer check to add a vuln.

Now incompetence is actually a fine thing. That's the point of research, is to learn things. Starting fresh without all the preconceptions of old work is also useful. That researcher has problems today, but a year or two from now they'll be an ultra-competent expert in their field. That's how one achieves competence -- making mistakes, lots of them.

But either way, the Linux kernel maintainer response of "we are not part of your research project" is valid. These patches are crap, regardless of which research project they are pursuing (static analyzer or malicious patch submissions).


Conclusion

I think the UMN research into bad-faith patches is useful to the community. I reject the idea that their IRB, which is focused on biomedical ethics rather than cybersecurity ethics, would be useful here. Indeed, it's done the reverse: IRB approval has tainted the entire university with the problem rather than limiting the fallout to just the researchers that could've been disavowed.

The natural consequence of being dishonest is that people can't trust you. In cybersecurity, trust is hard to win and easy to lose -- and UMN lost it. The researchers should have understood that "dishonesty" was going to be a problem.

I'm not sure there is a way to ethically be dishonest, so I'm not sure how such useful research can be done without the researchers or sponsors being tainted by it. I just know that "dishonesty" is an easily recognizable issue in cybersecurity that needs to be avoided. If anybody knows how to be ethically dishonest, I'd like to hear it.

Update: This person proposes a way this research could be conducted to ethically be dishonest:

Friday, March 26, 2021

A quick FAQ about NFTs

I thought I'd write up 4 technical questions about NFTs. They may not be the ones you ask, but they are the ones you should be asking. The questions:

  • What does the token look like?
  • How does it contain the artwork? (or, where is the artwork contained?)
  • How are tokens traded? (How do they get paid? How do they get from one account to another?)
  • What does the link from token to artwork mean? Does it give copyrights?
I'm going to use 4 sample tokens that have been sold for outrageous prices as examples.

#1 What does the token look like?

An NFT token has a unique number, analogous to:

  • your social security number (SSN#)
  • your credit card number
  • the VIN# on your car
  • the serial number on a dollar bill
  • etc.

This unique number is composed of two things:

  • the contract number, identifying the contract that manages the token
  • the unique token identifier within that contract
Here are some example tokens, listing the contract number (the long string) and token ID (short number), as well as a link to a story on how much it sold for recently.

With these two numbers, you can go find the token on the blockchain, and read the code to determine what the token contains, how it's traded, its current owner, and so on.


#2 How do NFTs contain artwork? or, where is artwork contained?

Tokens can't*** contain artwork -- art is too big to fit on the blockchain. That Beeple piece is 300 megabytes in size. Therefore, tokens point to artwork that is located somewhere other than the blockchain.

*** (footnote) This isn't actually true. It's just that it's very expensive to put artwork on the blockchain. That Beeple artwork would cost about $5million to put onto the blockchain. Yes, this is less than a tenth the purchase price of $69million, but when you account for all the artwork for which people have created NFTs, the total exceeds the prices for all NFTs.

So if artwork isn't on the blockchain, where is it located? and how do the NFTs link to it?

Our four examples of NFT mentioned above show four different answers to this question. Some are smart, others are stupid -- and by "stupid" I mean "tantamount to fraud".

The correct way to link a token with a piece of digital art is through a hash, which can be used with the decentralized darknet.

A hash is a unique cryptographic "key" (sic) generated from the file contents. In practice, no two files with different contents (or different lengths) will generate the same hash. A hacker can't create a different file that generates the same hash. Therefore, the hash becomes the identity of the file -- if you have a hash and a file, you can independently verify the two match.
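
Here is a minimal sketch in Python of that verification step. It assumes SHA-256 printed as hex, purely for illustration; IPFS identifiers such as the "Qm..." string below use a different encoding and are not reproduced by this code. The byte strings stand in for real files.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, expected_hash: str) -> bool:
    """Check that a file's contents match a previously recorded hash."""
    return file_hash(data) == expected_hash


# Stand-in file contents (assumptions for the example).
original = b"pretend these bytes are the artwork file"
tampered = b"pretend these bytes are the artwork file!"

recorded = file_hash(original)
print(matches(original, recorded))  # True
print(matches(tampered, recorded))  # False
```

Anyone holding the recorded hash can run this check on their own, without trusting whoever handed them the file.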

The hash (and therefore unique identity) of the Beeple file is the following string:

QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA

With the hash, it doesn't matter where the file is located right now in cyberspace. It only matters that at some point in the future, when the owner of the NFT wants to sell it, they can produce the file which provably matches the hash.

To repeat: because of the magic of cryptographic hashes, the artwork in question doesn't have to be located anywhere in particular.

However, people do like having a live copy of the file available in a well known location. One way of doing this is with the darknet, which is essentially a decentralized version of the web. In much the same way the blockchain provides decentralized transactions, darknet services provide decentralized file sharing. The most famous of such services is BitTorrent. The most popular for use with NFTs is known as IPFS (InterPlanetary File System). A hash contained within an NFT token often links to the IPFS system.

In the $69million Beeple NFT, this link is:

ipfs://ipfs/QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz

Sharp eyed readers will notice the hash of the artwork (above) doesn't match the hash in this IPFS link.

That's because the NFT token points to a metadata file that contains the real hash, along with other information about the artwork. The QmPAg.... hash points to metadata that contains the QmXkx... hash.

But a chain of hashes in this manner is still just as secure as a single hash -- indeed, that's what the "blockchain" is -- a hash chain. In the future, when the owner sells this NFT, they'll need to provide both files, the metadata and the artwork, to conclusively transfer ownership.
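
A chain like this can be checked one link at a time. The sketch below is a toy version in Python, assuming SHA-256 hex digests and a JSON metadata file with a hypothetical "artwork_hash" field; the real metadata format and IPFS hash encoding differ.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(token_hash: str, metadata_bytes: bytes, artwork_bytes: bytes) -> bool:
    """Check token -> metadata -> artwork, one hash at a time."""
    # Link 1: the token's hash must match the metadata file.
    if sha256_hex(metadata_bytes) != token_hash:
        return False
    # Link 2: the hash stored inside the metadata must match the artwork.
    metadata = json.loads(metadata_bytes)
    return sha256_hex(artwork_bytes) == metadata["artwork_hash"]


# Build a toy chain: artwork, then metadata naming its hash, then the token hash.
artwork = b"pretend artwork bytes"
metadata = json.dumps(
    {"title": "example", "artwork_hash": sha256_hex(artwork)}
).encode()
token_hash = sha256_hex(metadata)

print(verify_chain(token_hash, metadata, artwork))              # True
print(verify_chain(token_hash, metadata, b"a different file"))  # False
```

Changing either file breaks a link, which is why the owner needs to produce both files for the proof to hold.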

Thus, in answer to the question of where the artwork is located (in the NFT? on the web?), the answer is often that the NFT token contains a hash pointing to the darknet.

Let's look at another token on our list, the $180k AP artwork. The NFT links to the following URL:

https://ap-nft.everipedia.org/api/presidential-2020/1

Like the above example with Beeple, this too points to a metadata file, with a link to the eventual artwork (here). However, this chain is broken in the middle with that URL -- it isn't decentralized, and there's no guarantee in the future that it'll exist. The company "Everipedia" could go out of business tomorrow, or simply decide to stop sharing the file to the web, or decide to provide a different file at that location. In these cases, the thing the NFT points to disappears.

In other words, 50 years from now, after WW III and we've all moved to the off-world colonies, the owner of Beeple's NFT will still be able to sell it, providing the two additional files. The owner of this AP NFT probably won't -- the link will probably have disappeared from the web -- they won't be able to prove that the NFT they control points to the indicated artwork.

I would call this tantamount to fraud -- almost. The information is all there for the buyer to check, so they know the problems with this NFT. They obviously didn't care -- maybe they plan on being able to offload the NFT onto another buyer before the URL disappears.

Now let's look at the CryptoPunks #7804 NFT. The contract points to the same hash of an image file that contains all 10,000 possible token images. That hash is the following. Click on it to see the file it maps to:

ac39af4793119ee46bbff351d8cb6b5f23da60222126add4268e261199a2921b

The token ID in question is #7804. If you look in that file for the 7804th face, you'll see which one the token matches.

Unfortunately, the original contract doesn't actually explain how we arrive at the 7804th sub-image. Do we go left to right? Top down? or some other method? Currently, there exists a website that does the translation using one algorithm, but in the future, there's no hard proof which token maps to which face inside that massive image.
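
The ambiguity is easy to show with a few lines of Python. This sketch assumes the 10,000 faces are laid out as a 100 x 100 grid and numbered from zero; both the grid shape and the numbering are assumptions for illustration, since the contract doesn't specify them.

```python
def cell_row_major(index: int, columns: int) -> tuple[int, int]:
    """Count left to right, then top to bottom. Returns (row, column)."""
    return divmod(index, columns)


def cell_column_major(index: int, rows: int) -> tuple[int, int]:
    """Count top to bottom, then left to right. Returns (row, column)."""
    column, row = divmod(index, rows)
    return row, column


# Assumption: 10,000 faces in a 100 x 100 grid, counted from 0.
print(cell_row_major(7804, 100))     # (78, 4)
print(cell_column_major(7804, 100))  # (4, 78)
```

The same token ID lands on two different faces depending on the counting order, so the mapping only holds as long as everyone agrees on an algorithm that lives outside the contract.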

Now let's look at the CryptoKitty #896775. In this case, there's no hashes involved, and no image. Instead, each kitty is expressed as a pattern of "genes", with contracts that specify how two kittens can breed together to create a new kitty's genes. The above token contains the gene sequence:

235340506405654824796728975308592110924822688777991068596785613937685997

There are other contracts on the blockchain that can interact with this. 

The CryptoKitty images we see are generated by an algorithm that reads the gene sequence. Thus, there is no image file, no hash of a file. The algorithm that does this is located off-chain, so again we have the problem that in the future, the owner of the token may not be able to prove ownership of the correct image.

So what we see in these examples is one case where there's a robust hash chain linking the NFT with the corresponding image file, and three examples where the link is problematic -- ranging from slightly broken to almost fraudulent.


#3 How are tokens traded?

There are two ways you can sell your NFTs:

  • off the blockchain
  • on the blockchain

The Beeple artwork was sold through Christie's -- meaning off blockchain. Christie's conducted the bidding and collected the payment, took its cut, and gave the rest to the artist. The artist then transferred the NFT. We can see this on the blockchain where Beeple transferred the NFT for $0, but we can't see the flow of money off blockchain.

This is the exception. The rule is that NFTs are supposed to be traded on blockchain.

NFT contracts don't have auction or selling capabilities themselves. Instead, they follow a standard (known as ERC721) that allows them to be managed by other contracts. A person controlling a token selects some other auction/selling contract that matches the terms they want, and gives control to that contract.

Because contracts are code, both sides know what the terms are, and can be confident they won't be defrauded by the other side.

For example, a contract's terms might be to provide for bids over 5 days, transfer the NFT from the owner to the buyer, and transfer coins from the buyer to the previous owner.

This is really why NFTs are so popular: not ownership of artwork, but on blockchain buying and selling of tokens. It's the ability to conduct such commerce where the rules are dictated by code rather than by humans, where such transfers happen in a decentralized manner rather than through a central authority that can commit fraud.

So the upshot is that if you own an NFT, you can use the Transfer() function to transfer it to some other owner, or you can authorize some other contract to do the selling for you, which will eventually call this Transfer() function when the deal is done. Such a contract will likely also transfer coins in the other direction, paying you for your token.
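
The flow can be modeled with a toy in Python. This is a sketch of the idea only, not the real ERC721 interface or any actual contract: the class names, method names, addresses, and the fixed-price sale are all assumptions for illustration.

```python
class ToyTokenContract:
    """Toy stand-in for an NFT contract; not the real ERC721 interface."""

    def __init__(self):
        self.owner_of = {}  # token_id -> owner address
        self.approved = {}  # token_id -> address allowed to move the token

    def mint(self, token_id, owner):
        self.owner_of[token_id] = owner

    def approve(self, caller, token_id, operator):
        if self.owner_of[token_id] != caller:
            raise PermissionError("only the owner can approve")
        self.approved[token_id] = operator

    def transfer(self, caller, token_id, new_owner):
        owner = self.owner_of[token_id]
        if caller != owner and self.approved.get(token_id) != caller:
            raise PermissionError("caller may not move this token")
        self.owner_of[token_id] = new_owner
        self.approved.pop(token_id, None)  # approval ends with the transfer


class ToySaleContract:
    """Toy fixed-price sale: swaps coins for the token in one call."""

    def __init__(self, address, tokens, balances, token_id, price):
        self.address = address
        self.tokens = tokens
        self.balances = balances  # address -> coin balance
        self.token_id = token_id
        self.price = price

    def buy(self, buyer):
        seller = self.tokens.owner_of[self.token_id]
        if self.balances.get(buyer, 0) < self.price:
            raise ValueError("insufficient coins")
        # Move the token first; if this raises, no coins have moved.
        self.tokens.transfer(self.address, self.token_id, buyer)
        self.balances[buyer] -= self.price
        self.balances[seller] = self.balances.get(seller, 0) + self.price


tokens = ToyTokenContract()
tokens.mint(1, "alice")
balances = {"alice": 0, "bob": 10}
sale = ToySaleContract("sale-contract", tokens, balances, token_id=1, price=7)

tokens.approve("alice", 1, "sale-contract")  # owner hands control to the sale
sale.buy("bob")
print(tokens.owner_of[1], balances)  # bob {'alice': 7, 'bob': 3}
```

The owner never hands the token to the buyer directly. She authorizes the selling code, and that code moves the token one way and the coins the other.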


#4 What does this all mean?

If you break into the Louvre Museum and steal the Mona Lisa, you will control the artwork. But you won't own it. The word "ownership" is defined to mean your legal rights over the object. If the legal authorities catch up with you, they'll stick you in jail and transfer control of the artwork back to the rightful legal owner.

We keep talking about "ownership" of NFTs, but this is fiction. Instead, all that you get when you acquire an NFT is "control" -- control of just the token even, and not of the underlying artwork. Much of what happens in blockchain/cryptocurrencies isn't covered by the law. Therefore, you can't really "own" tokens. But you certainly control them (with the private key in your wallet that matches the public key of your account/address on the blockchain).

This is why NFTs are problematic: people are paying attention to the fiction ("ownership") and not the technical details ("control"). We see that in the AP artwork above which simply links to a URL instead of a hash, missing a crucial step. They weren't paying attention to the details.

There are other missing steps. For example, I can create my own NFTs representing all these artworks and sell them (maybe covered in a future blogpost). It's a fiction that one of these is valid and my copy NFTs are invalid.

On the other hand, this criticism can go too far. Some people claim the entire blockchain/cryptocurrency market is complete fiction. This isn't true -- there's lots of obvious value in transactions that are carried out by code rather than by humans.

For example, an oil company might sell tokens for oil futures, allowing people to trade such futures on the blockchain. Ultimately, though, the value of such tokens comes down to faith in the original issuer that they'll deliver on the promise -- that the controller of the token will eventually get something in the real world. There are lots of companies being successful with this sort of thing, such as the BAT token used in the "Brave" web browser that provides websites with micropayment revenue instead of advertising revenue.

Thus, the difference here is that cryptocurrencies are part fiction, part real -- tied to real world things. But NFTs representing artwork are pretty much completely fiction. They confer no control over the artwork in the real world. Whatever tie a token has to the artwork is purely in your imagination.

Saturday, March 20, 2021

Deconstructing that $69million NFT

"NFTs" have hit the mainstream news with the sale of an NFT based digital artwork for $69 million. I thought I'd write up an explainer. Specifically, I deconstruct that huge purchase and show what actually was exchanged, down to the raw code. (The answer: almost nothing).

The reason for this post is that every other description of NFTs describes what they pretend to be. In this blogpost, I drill down on what they actually are.

Note that this example is about "NFT artwork", the thing that's been in the news. There are other uses of NFTs, which work very differently than what's shown here.

tl;dr

I have a long bit of text explaining things. Here is the short form that allows you to drill down to the individual pieces.

  • Beeple created a piece of art in a file
  • He created a hash that uniquely, and unhackably, identified that file
  • He created a metadata file that included the hash to the artwork
  • He created a hash to the metadata file
  • He uploaded both files (metadata and artwork) to the IPFS darknet decentralized file sharing service
  • He created, or minted a token governed by the MakersTokenV2 smart contract on the Ethereum blockchain
  • Christie's created an auction for this token
  • The auction was concluded with a payment of $69 million worth of Ether cryptocurrency. However, nobody has been able to find this payment on the Ethereum blockchain; the money was probably transferred through some private means.
  • Beeple transferred the token to the winner, who transferred it again to this final Metakovan account
Each of the links above allows you to drill down to exactly what's happening on the blockchain. The rest of this post discusses things in long form.

Why do I care?

Well, you don't. It makes you feel stupid that you haven't heard about it, when everyone is suddenly talking about it as if it's been a thing for a long time. But the reality is, they didn't know what it was a month ago, either. Here is the Google Trends graph to prove this point -- interest has only exploded in the last couple months:

The same applies to me. I've been aware of them (since the CryptoKitties craze from a couple years ago) but haven't invested time reading source code until now. Much of this blogpost is written as notes as I discover for myself exactly what was purchased for $69 million, reading the actual transactions.


So what is it?

My definition: "Something new that can be traded on a blockchain that isn't a fungible cryptocurrency".

In this post, I'm going to explain it in technical detail. Before this, you might want to pause and see what everyone else is saying about it. You can look on Wikipedia to answer that question, or look at the following definition from CNN (the first result when I google it):
Non-fungible tokens, or NFTs, are pieces of digital content linked to the blockchain, the digital database underpinning cryptocurrencies such as bitcoin and ethereum. Unlike NFTs, those assets are fungible, meaning they can be replaced or exchanged with another identical one of the same value, much like a dollar bill.
You can also get a list of common NFT systems here. While this list of NFT systems contains a lot of things related to artwork (as described in this blogpost), a lot aren't. For example, CryptoKitties is an online game, not artwork (though it too allows ties to pictures of the kitties).


What is fungible?

Let's define the word fungible first. The word refers to goods you purchase that can be replaced by an identical good, like a pound of sugar, an ounce of gold, a barrel of West Texas Intermediate crude oil. When you buy one, you don't care which one you get.

In contrast, an automobile is a non-fungible good -- if you order a Tesla Model 3, you won't be satisfied with just any car that comes out of the factory, but one that matches the color and trim that you ordered. Artwork is a well known non-fungible asset -- there's only one Mona Lisa painting in the world, for example.

Dollar bills and coins are fungible tokens -- they represent the value printed on the currency. You can pay your bar bill with any dollars. 

Cryptocurrencies like Bitcoin, ZCash, and Ethereum are also "fungible tokens". That's where they get their value, from their fungibility.

NFTs, or non-fungible tokens, are the idea of trading something unique (non-fungible, not the same as anything else) on the blockchain. You can trade them, but each is unique, like a painting, a trading card, a rare coin, and so on.

This is a token -- it represents a thing. You aren't trading an artwork itself on the blockchain, but a token that represents the artwork. I mention this because most descriptions of NFTs say that you are buying artwork -- you aren't. Instead, you are buying a token that points to the artwork.

The best real world example is a receipt for purchase. Let's say you go to the Louvre and buy the Mona Lisa painting, and they give you a receipt attesting to the authenticity of the transaction. The receipt is not the artwork itself, but something that represents the artwork. It's proof you legitimately purchased it -- that you didn't steal it. If you ever resell the painting, you'll probably need something like this proving the provenance of the piece.


Show me an example!

So let's look at an example NFT, the technical details, to see how it works. We might as well use this massive $69 million purchase as our example. Some news reports describing the purchase are here: [1] [2] [3].

None of these stories say what actually happened. They say the "artwork was purchased", but what does that actually mean? We are going to deconstruct that here. (The answer is: the artwork wasn't actually purchased).


What was the artwork?

It's a piece created by an artist named "Beeple" (Mike Winkelmann), called "Everydays: The First 5000 Days". It's a roughly 440-megapixel image (21,069 x 21,069 pixels), which is about 300 megabytes in size. A thumbnail of this work is shown below.



So the obvious question is where is this artwork? Is it somewhere on the blockchain? Well, no, the file is 300-megabytes in size, much too large to put on the blockchain. Instead, the file exists somewhere out in cyberspace (described below).

What exists on the blockchain is a unique fingerprint linking to the file, known as a hash.


What is a hash?

It's at this point we need to discuss cryptography: it's not just about encryption, but also random numbers, public keys, and hashing.

A "hash" passes all the bytes of a file through an algorithm to generate a short signature or fingerprint unique to that file. For practical purposes, no two files with different contents will ever have the same hash. The most popular algorithm is SHA-256, which produces a 256-bit hash.

We call it a cryptographic hash to differentiate it from weaker algorithms. With a strong algorithm, it's essentially impossible for a hacker to create a different file that has the same hash -- even if the hacker tried really hard.

Thus, the hash is the identity of the file. The identity of the artwork in question is not the title of the piece mentioned above, other pieces of art can also be given that title. Instead, the identity of the artwork is its hash. Other pieces of artwork cannot have the same hash.

For this artwork, that 300-megabyte file is hashed, producing a 256-bit value. Written in hex, this value is:

6314b55cc6ff34f67a18e1ccc977234b803f7a5497b94f1f994ac9d1b896a017

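Hexadecimal results in long strings. There are shorter ways of representing hashes. One is a format called MultiHash, which is what IPFS (described below) uses. Its value is shown below. It is also a 256-bit SHA-256 fingerprint of this same artwork, just written in a more compact encoding. (Strictly speaking, IPFS computes its hash over its own chunked representation of the file, so the two values aren't literally the same 256 bits re-encoded, but they identify the same content.)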

QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA

This is the identity of the artwork. If you want to download the entire 300-megabyte file, simply copy and paste that into google, and it'll lead you to someplace in cyberspace where you can download it. Once you download it, you can verify the hash, such as with the command-line tool OpenSSL:

$ openssl dgst -sha256 everdays5000.jfif

SHA256(everdays5000.jfif)= 6314b55cc6ff34f67a18e1ccc977234b803f7a5497b94f1f994ac9d1b896a017

The above is exactly what I've done -- I downloaded the file from cyberspace, named it "everdays5000.jfif", and then calculated the hash to see if it matches. As you can tell by comparing my result with the hash above, they do match, so I know I have an exact copy of the artwork.
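If you'd rather not use OpenSSL, the same check can be sketched in a few lines of Python with the standard hashlib module. The function names and the chunk size are my own choices for illustration; the expected value is the hex hash shown above.

```python
import hashlib

# The hex SHA-256 value published for the artwork (copied from above).
EXPECTED = "6314b55cc6ff34f67a18e1ccc977234b803f7a5497b94f1f994ac9d1b896a017"

def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so a 300-megabyte file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_artwork(path):
    """True if the file at `path` hashes to the published value."""
    return sha256_file(path) == EXPECTED
```

Calling matches_artwork("everdays5000.jfif") on the downloaded file should return True if the copy is exact; flipping even one bit of the file would make it return False.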


Where to download the image from cyberspace?

Above, I downloaded the file in order to demonstrate calculating the hash. It doesn't live on the blockchain, so where does it live?

There are two answers. The first answer is potentially anywhere in cyberspace. Thousands of people have downloaded the file onto their personal computers, so obviously it exists on their machines -- you just can't get at it. If you ever do come across it somewhere, you can always verify it's the exact copy by looking at the hash.

The second answer is somewhere on the darknet. The term "darknet" refers to various systems on the Internet other than the web. Remember, the "web" is not the "Internet", but simply one of many services on the Internet.

The most popular darknet services are decentralized file sharing systems like BitTorrent and IPFS. In much the same way that blockchains are decentralized transaction services, these two systems are decentralized file services. When something is too big to live on the blockchain, it often lives on the darknet, usually via IPFS.

The way these services identify files is through their hashes. If you know their hash, you can stick it into one of these services and find it. Thus, if you want to find this file on IPFS, download some IPFS aware software, and plug in the hash.

There's an alternative privacy-focused browser called "Brave" that includes darknet features (TOR, BitTorrent, and IPFS). To download this file using Brave, simply use the following URL:

ipfs://QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA

But an easier way is to use one of the many IPFS gateways. These are web servers that will copy a file off the darknet and make it available to you. Here is a URL using one of those gateways:

https://ipfsgateway.makersplace.com/ipfs/QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA

If you click on this link within your browser, you'll download the 300-megabyte file from the IPFS darknet. It'll take a while, the service is slow. Once you get it, you can verify the hashes match. But since the URL is based on the hash, of course they should match, unless there was some error in transmission.
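To see what's inside that identifier in the URL, here is a minimal sketch that decodes it. It assumes the string is base58 text (the Bitcoin alphabet) whose first byte names the hash algorithm and whose second byte gives the digest length, which is how MultiHash lays things out for short codes like these. The helper names are mine, not part of any IPFS library.

```python
# Base58 alphabet used by Bitcoin and IPFS (no 0, O, I, or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_decode(text):
    """Decode a base58 string into raw bytes."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body

def parse_multihash(text):
    """Split a base58 MultiHash into (algorithm code, length, digest).

    Assumes the code and length each fit in a single byte.
    """
    raw = base58_decode(text)
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return code, length, digest

code, length, digest = parse_multihash("QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA")
print(hex(code), length, digest.hex())
```

The first two values should come out as 0x12 (the MultiHash code for SHA-256) and 32 (the digest length in bytes), which is why these identifiers always start with "Qm".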


So this hash is on the blockchain?

Well, it could've been, but it wasn't. Instead, the hash that's on the blockchain points to a file containing metadata -- and it's the metadata that points to the hash.

In other words, it's a chain of hashes. The hash on the blockchain (as we'll see below) is this one here (I've made it a link so you can click on it to see the raw data):

QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz

When you click on this, you see a bunch of JSON data. Below, I've stripped away the uninteresting stuff to show the meaningful bits:

title:"EVERYDAYS: THE FIRST 5000 DAYS"
description:"I made a picture from start to finish every single day from May 1st, 2007 - January 7th, 2021.  This is every motherfucking one of those pictures."
digital_media_signature:"6314b55cc6ff34f67a18e1ccc977234b803f7a5497b94f1f994ac9d1b896a017"
raw_media_file:"https://ipfsgateway.makersplace.com/ipfs/QmXkxpwAHCtDXbbZHUwqtFucG1RMS6T87vi1CdvadfL7qA"

Now remember that due to the magic of cryptographic hashes, this chain can't be broken. One hash leads to the next, such that changing any single bit breaks the chain. Indeed, that's what a "blockchain" is -- a hash chain. Changing any bit of information anywhere on the Bitcoin blockchain is immediately detectable, because it throws off the hash calculations.

So we have a chain: 

hash -> metadata -> hash -> artwork

So if you own the root, you own the entire chain.
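Here is a toy version of that chain in Python, to show why changing a single bit anywhere breaks it. This is a sketch under simplifying assumptions: the artwork is a short stand-in byte string, both links use plain SHA-256 (the real root is an IPFS MultiHash), and the function names are mine. Only the digital_media_signature field name comes from the real metadata.

```python
import hashlib
import json

def sha256_hex(data):
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for the real 300-megabyte image file.
artwork = b"pretend these bytes are the artwork"

# The metadata file records the hash of the artwork.
metadata = json.dumps({
    "title": "EVERYDAYS: THE FIRST 5000 DAYS",
    "digital_media_signature": sha256_hex(artwork),
}, sort_keys=True).encode()

# The root hash (the thing stored in the token) is the hash of the metadata.
root_hash = sha256_hex(metadata)

def verify_chain(root_hash, metadata, artwork):
    """Walk hash -> metadata -> hash -> artwork, checking every link."""
    if sha256_hex(metadata) != root_hash:
        return False
    claimed = json.loads(metadata)["digital_media_signature"]
    return sha256_hex(artwork) == claimed

print(verify_chain(root_hash, metadata, artwork))    # True

# Flip one bit in the artwork and the chain no longer verifies.
tampered = bytes([artwork[0] ^ 1]) + artwork[1:]
print(verify_chain(root_hash, metadata, tampered))   # False
```

Note what this does and doesn't show: anybody holding the root hash can verify the files, but verifying is all the chain lets you do.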

Note that this chain seems unbreakable here, in this $69 million NFT token. However, in a lot of other tokens, it's not. I mean, the hash chain itself doesn't promise much (it simply points at the artwork, giving no control over it), but other NFTs promise even less.


So what, exactly, is the NFT that was bought and sold?

Here's what Christie's sold. Here's how Christie's describes it:

Beeple (b. 1981)
EVERYDAYS: THE FIRST 5000 DAYS
token ID: 40913
wallet address: 0xc6b0562605D35eE710138402B878ffe6F2E23807
smart contract address: 0x2a46f2ffd99e19a89476e2f62270e0a35bbf0756
non-fungible token (jpg)
21,069 x 21,069 pixels (319,168,313 bytes)
Minted on 16 February 2021. This work is unique.

The seller is the artist Beeple. The artist created the token (shown below) and assigned their wallet address as the owner. This is their wallet address:

0xc6b0562605D35eE710138402B878ffe6F2E23807

When Beeple created the token, he did so using a smart contract that governs the rules for the token. Such smart contracts are what make Ethereum different from Bitcoin, allowing things to be created and managed on the blockchain other than simple currency transfers. Contracts have addresses on the blockchain, too, but no person controls them -- they are rules for decentralized transfer of things, with nobody (other than the code) in control.

There are many smart contracts that can manage NFTs. The one Beeple chose is known as MakersTokenV2. This contract has the following address:

0x2a46f2ffd99e19a89476e2f62270e0a35bbf0756

Note that if you browse this link, you'll eventually get to the code so that you can read the smart contract and see how it works. It's a derivation of something known as ERC721 that defines the properties of a certain class of non-fungible tokens.

Finally, we get to the actual token being sold here. It is:

#40913

In other words, it's the 40913th token created and managed by the MakersTokenV2 contract. The full description of what Christie's is selling is this token number governed by the named contract on the Ethereum blockchain:

Ethereum -> 0x2a46f2ffd99e19a89476e2f62270e0a35bbf0756 -> 40913

We have to search the blockchain in order to find the transaction that created this token. The transaction is identified by the hash:

0x84760768c527794ede901f97973385bfc1bf2e297f7ed16f523f75412ae772b3

The smart contract is code, so in the above transaction, Beeple calls functions within the contract to create a new token, assign digital media to it (the hash), and assign himself owner of the newly created token.

After doing this, the token #40913 now contains the following information:

creator : 0xc6b0562605d35ee710138402b878ffe6f2e23807
metadataPath : QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz
tokenURI : ipfs://ipfs/QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz

This is the thing that Christie's auction house sold. As you can see in their description above, it all points to this token on the blockchain.
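Most browsers can't open that tokenURI directly, but it's easy to turn into a gateway link like the ones earlier in this post. A minimal sketch: the function name is mine, and the gateway is simply the MakersPlace one used above (any other IPFS gateway should work the same way).

```python
# Gateway prefix taken from the URLs earlier in this post.
GATEWAY = "https://ipfsgateway.makersplace.com/ipfs/"

def to_gateway_url(token_uri, gateway=GATEWAY):
    """Convert an ipfs:// URI into an ordinary https:// gateway URL."""
    prefix = "ipfs://"
    if not token_uri.startswith(prefix):
        raise ValueError("not an ipfs:// URI")
    path = token_uri[len(prefix):]
    # Some tokens (including this one) repeat "ipfs/" after the scheme.
    if path.startswith("ipfs/"):
        path = path[len("ipfs/"):]
    return gateway + path

print(to_gateway_url("ipfs://ipfs/QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz"))
```

The resulting link is the same metadata file shown earlier, so the token leads to the metadata, which leads to the artwork.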

Now after the auction, the next step is to transfer the token to the new owner. Again, the contract is code, so this is calling the "Transfer()" function in that code. Beeple is the only person who can do this transfer, because only he knows the private key that controls his wallet. This transfer is done in the transaction below:

0xa342e9de61c34900883218fe52bc9931daa1a10b6f48c506f2253c279b15e5bf 

token : 40913
from : 0xc6b0562605d35ee710138402b878ffe6f2e23807
to : 0x58bf1fbeac9596fc20d87d346423d7d108c5361a

That's not the current owner. Instead, it was soon transferred again in the following transaction:

0x01d0967faaaf95f3e19164803a1cf1a2f96644ebfababb2b810d41a72f502d49 

token : 40913
from : 0x58bf1fbeac9596fc20d87d346423d7d108c5361a
to : 0x8bb37fb0f0462bb3fc8995cf17721f8e4a399629

That final address is known to belong to a person named "Metakovan", who the press has identified as the buyer of the piece. I don't know what that intermediary address between Beeple and Metakovan was, but it's common in the cryptocurrency world to have many accounts that people transfer things between, so I bet it also belongs to Metakovan.
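To make the mechanics concrete, here is a toy model of what the contract is doing in those three transactions. This is not the MakersTokenV2 code (which is written in Solidity and does much more); the class and method names are hypothetical, and the "sender" is simply passed in, whereas on the real blockchain it's proven by a signature from the sender's private key.

```python
class ToyTokenContract:
    """A drastically simplified stand-in for an NFT contract."""

    def __init__(self):
        self.owner_of = {}        # token id -> controlling address
        self.metadata_path = {}   # token id -> hash of the metadata file

    def mint(self, sender, token_id, metadata_path):
        """Create a new token and assign it to whoever created it."""
        if token_id in self.owner_of:
            raise ValueError("token already exists")
        self.owner_of[token_id] = sender
        self.metadata_path[token_id] = metadata_path

    def transfer(self, sender, to, token_id):
        """Hand the token to another address; only the controller may do so."""
        if self.owner_of.get(token_id) != sender:
            raise PermissionError("only the current controller can transfer")
        self.owner_of[token_id] = to

# Addresses copied from the transactions above.
BEEPLE = "0xc6b0562605d35ee710138402b878ffe6f2e23807"
INTERMEDIARY = "0x58bf1fbeac9596fc20d87d346423d7d108c5361a"
METAKOVAN = "0x8bb37fb0f0462bb3fc8995cf17721f8e4a399629"

contract = ToyTokenContract()
contract.mint(BEEPLE, 40913, "QmPAg1mjxcEQPPtqsLoEcauVedaeMH81WXDPvPx3VC5zUz")
contract.transfer(BEEPLE, INTERMEDIARY, 40913)
contract.transfer(INTERMEDIARY, METAKOVAN, 40913)
print(contract.owner_of[40913])
```

The whole "sale" amounts to changing one entry in a table from one address to another. The artwork itself never moves and isn't touched.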


How are things transferred?

Like everything on the blockchain, control is transferred via public/private keys. Your wallet address is a hash of your public key, which everyone knows. Anybody can transfer something to your public address without you being involved.

But every public key has a matching private key. Both are generated together, because they are mathematically related. Only somebody who knows the private key that matches the wallet address can transfer something out of the wallet to another person.

Thus Beeple's account has the following public address. But we don't know his private key, which he has stored in a file on a computer somewhere.

0xc6b0562605D35eE710138402B878ffe6F2E23807


To summarize what was bought and sold

So that's it. To summarize:

  • Beeple created a piece of art in a file
  • He created a hash that uniquely, and unhackably, identified that file
  • He created a metadata file that included the hash to the artwork
  • He created a hash to the metadata file
  • He uploaded both files (metadata and artwork) to the IPFS darknet decentralized file sharing service
  • He created, or minted a token governed by the MakersTokenV2 smart contract on the Ethereum blockchain
  • Christie's created an auction for this token
  • The auction was concluded with a payment of $69 million worth of Ether cryptocurrency. However, nobody has been able to find this payment on the Ethereum blockchain; the money was probably transferred through some private means.
  • Beeple transferred the token to the winner, who transferred it again to this final Metakovan account
And that's it.

Okay, I understand. But I have a question. WHAT IS AN NFT????

So if you've been paying attention, and understood everything I've said, then you should still be completely confused. What exactly was purchased that was worth $69 million?

If we are asking what Metakovan purchased for his $69 million, it comes down to this: the ability to transfer MakersTokenV2 #40913 to somebody else.

That's it. That's everything he purchased. He didn't purchase the artwork, he didn't purchase the copyrights, he didn't purchase anything more than the ability to transfer that token. Even saying he owns the token is a misnomer, since the token lives on the blockchain. Instead, since only Metakovan knows the private key that controls his wallet, all that he possesses is the ability to transfer the token to the control of another private key.

It's not even as unique as people claim. Beeple can mint another token for the same artwork. Anybody else can mint a token for Beeple's artwork. Insignificant changes can be made to that artwork, and tokens can be minted for that, too. There's nothing hard and fast controlled by the code -- the relationship is in people's minds.

If you are coming here asking why somebody thinks this is worth $69 million, I have no answer for you.


The conclusion

I think there are two things that are clear here:
  • This token is not going to be meaningful to most of us: who cares if the token points to a hash that eventually points to a file freely available on the Internet?
  • This token is meaningful to those in the "crypto" (meaning "cryptocurrency") community, but it's in their minds, rather than something hard and fast controlled by code or cryptography.
In other words, the work didn't sell for $69 million of real money.

For one thing, it's not the work that was traded, or rights or control over that work. It's simply a token that pointed to the work.

For another thing, it was sold for 42329.453 ETH, not $dollars. Early adopters with lots of cryptocurrency are likely to believe the idea that the token is meaningful, whereas outsiders with $dollars don't.

An NFT is ultimately like those plaques you see next to paintings in a museum telling people about the donor or philanthropist involved -- only this plaque is somewhere where pretty much nobody will see it.




Sunday, February 28, 2021

We are living in 1984 (ETERNALBLUE)

In the book 1984, the protagonist questions his sanity, because his memory differs from what appears to be everybody else's memory.

The Party said that Oceania had never been in alliance with Eurasia. He, Winston Smith, knew that Oceania had been in alliance with Eurasia as short a time as four years ago. But where did that knowledge exist? Only in his own consciousness, which in any case must soon be annihilated. And if all others accepted the lie which the Party imposed—if all records told the same tale—then the lie passed into history and became truth. ‘Who controls the past,’ ran the Party slogan, ‘controls the future: who controls the present controls the past.’ And yet the past, though of its nature alterable, never had been altered. Whatever was true now was true from everlasting to everlasting. It was quite simple. All that was needed was an unending series of victories over your own memory. ‘Reality control’, they called it: in Newspeak, ‘doublethink’.

I know that EternalBlue didn't cause the Baltimore ransomware attack. When the attack happened, the entire cybersecurity community agreed that EternalBlue wasn't responsible.

But this New York Times article said otherwise, blaming the Baltimore attack on EternalBlue. And there are hundreds of other news articles [eg] that agree, citing the New York Times. There are no news articles that dispute this.

In a recent book, the author of that article admits it's not true, that EternalBlue didn't cause the ransomware to spread. But they defend the claim as being essentially true, in that EternalBlue is responsible for a lot of bad things, even if, technically, not in this case. Such errors are justified on the grounds that they are generalizations and simplifications needed for the mass audience.

So we are left with the situation Orwell describes: all records tell the same tale -- when the lie passes into history, it becomes the truth.

Orwell continues:

He wondered, as he had many times wondered before, whether he himself was a lunatic. Perhaps a lunatic was simply a minority of one. At one time it had been a sign of madness to believe that the earth goes round the sun; today, to believe that the past is inalterable. He might be ALONE in holding that belief, and if alone, then a lunatic. But the thought of being a lunatic did not greatly trouble him: the horror was that he might also be wrong.

I'm definitely a lunatic, alone in my beliefs. I sure hope I'm not wrong.




Update: Other lunatics document their struggles with Minitrue:

Saturday, February 27, 2021

Review: Perlroth's book on the cyberarms market

New York Times reporter Nicole Perlroth has written a book on zero-days and nation-state hacking entitled “This Is How They Tell Me The World Ends”. Here is my review.


I’m not sure what the book intends to be. The blurbs from the publisher imply a work of investigative journalism, in which case it’s full of unforgivable factual errors. However, it reads more like a memoir, in which case errors are to be expected/forgivable, with content often from memory rather than rigorously fact-checked notes.


But even with this more lenient interpretation, there are important flaws that should be pointed out. For example, the book claims the Saudis hacked Bezos with a zero-day. I claim that’s bunk. The book claims zero-days are “God mode” compared to other hacking techniques; I claim they are no better than the alternatives, usually worse, and rarely used.


But I can’t really list all the things I disagree with. It’s no use. She’s a New York Times reporter, impervious to disagreement.


If this were written by a tech journalist, then criticism would be the expected norm. Tech is full of factual truths, such as whether 2+2=5, where it’s possible for a thing to be conclusively known. All journalists make errors -- tech journalists are constantly making small revisions correcting their errors after publication.


The best example of this is Ars Technica. They pride themselves on their reader forums, where readers comment, opine, criticize, and correct stories. Sometimes readers add more interesting information to the story, providing free content to other readers. Sometimes they fix errors.


It’s often unpleasant for the journalists, who steel themselves after hitting “Submit…”. They have a lot of practice defending or correcting every assertion they make, from both legitimate and illegitimate criticism. This makes them astoundingly good journalists -- mistakes that editors miss, readers don’t. They get trained fast to deal with criticism.


The mainstream press doesn’t have this tradition. To be fair, it couldn’t. Tech forums have techies with knowledge and experience, while the mainstream press has ignorant readers with opinions. Regardless of the story’s original content, it’ll devolve into people arguing about whether Epstein was murdered (for example).


Nicole Perlroth is a mainstream reporter on a techy beat. So you see a conflict here between the expectations both sides have for each other. Techies expect a tech journalist who’ll respond to factual errors; she doesn’t expect all this criticism. She doesn’t see techie critics for what they are -- subject matter experts that would be useful sources to make her stories better. She sees them as enemies that must be ignored. This makes her stories sloppy by technical standards. I hate that this sounds like a personal attack when it’s really more a NYTimes problem -- most of their cyber stories struggle with technical details, regardless of author.


This problem is made worse by the fact that the New York Times doesn’t have “news stories” so much as “narratives”. They don’t have neutral stories reporting what happened, but narratives explaining a larger point.


A good example is this story that blames the Baltimore ransomware attack on the NSA’s EternalBlue. The narrative is that EternalBlue is to blame for damage all over the place, and it uses the Baltimore ransomware as an example. However, EternalBlue wasn’t responsible for that particular ransomware -- as techies point out.


Perlroth doesn’t fix the story. In her book, she instead criticizes techies for focusing on “the technical detail that in this particular case, the ransomware attack had not spread with EternalBlue”, and that techies don’t acknowledge “the wreckage from EternalBlue in towns and cities across the country”.


It’s a bizarre response from a journalist, refusing to fix a falsehood in a story because the rest of the narrative is true.


Some of the book is correct, telling you some real details about the zero-day market. I can't say it won't be useful to some readers, though the useful bits are buried in a lot of non-useful stuff. But most of the book is wrong about the zero-day market, a slave to the narrative that zero-days are going to end the world. I mean, I should say, I disagree with the narrative and her political policy ideas -- I guess it's up to you to decide for yourself if it's "wrong". Apart from inaccuracies, a lot is missing -- for example, you really can't understand what a "zero-day" is without also understanding the 40-year history of vuln-disclosure.


I could go on a long spree of corrections, and others have their own long list of inaccuracies, but there’s really no point. She's already defended her book as being more of a memoir than a work of journalistic integrity, so her subjective point of view is what it's about, not facts. Her fundamental narrative of the Big Bad Cyberarms Market is a political one, so any discussion of accuracy will be in service of political sides rather than the side of truth.


Moreover, she’ll just attack me for my “bruised male ego”, as she has already done to other expert critics.