Saturday, December 31, 2016

Your absurd story doesn't make me a Snowden apologist

Defending truth in the Snowden Affair doesn't make one an "apologist" for either side. There are plenty of ardent supporters on both sides who need to be debunked. The latest (anti-Snowden) example is the HPSCI committee report on Snowden [*], and stories like this one in the Wall Street Journal [*]. Pointing out the obvious holes doesn't make us "apologists".

As Edward Epstein documents in the WSJ story, one of the lies Snowden told was telling his employer (Booz Allen) that he was being treated for epilepsy, when in fact he was fleeing to Hong Kong to give documents to Greenwald and Poitras.

Well, of course he did. If you are going to leak a bunch of documents to the press, you can't do that without deceiving your employer. That's the very definition of this sort of "whistleblowing". Snowden has been quite open to the public about the lies he told his employer, including this one.

Rather than evidence that there's something wrong with Snowden, the way Snowden-haters (is that the opposite of "apologist"?) seize on this is evidence that they are a bit unhinged.


The next "lie" is the difference between the number of documents Greenwald says he received (10,000) and the number investigators claim were stolen (1.5 million). This is not the discrepancy it seems. A "document" as counted by the NSA is not the same thing as a "file" you might get on a thumb drive, as was shown by the various ways of counting the size of the Chelsea (Bradley) Manning leaks. Also, the NSA can only see which files Snowden accessed, not which ones were subsequently copied to a thumb drive.

Finally, there is the more practical issue that Snowden couldn't review the documents while at work. He'd instead have to download databases and copy whole directories to his thumb drives. Only away from work would he have the chance to winnow down which documents he wanted to take to Hong Kong, deleting the rest. Nothing Snowden has said conflicts with the idea that he deleted a lot of material he never gave to journalists, never took to Hong Kong, and never took to Moscow.


The next "lie" is that Snowden claims the US revoked his passport after he got on the plane from Hong Kong and before he landed in Moscow.

This is factually wrong, insofar as the US had revoked his passport (and issued an arrest warrant) and notified Hong Kong of the revocation a day before the plane took off. However, as numerous news reports at the time noted, the US information [in the arrest warrant] was contradictory and incomplete, and thus Hong Kong did nothing to stop Snowden from leaving [*]. The Guardian [*] quotes a Hong Kong official as saying Snowden left "through a lawful and normal channel". Seriously, countries are much less concerned about checking the passports of passengers leaving than of those arriving.

It's the WSJ article that's clearly prevaricating here, quoting a news article where a Hong Kong official admits being notified, but not quoting the officials saying that the information was bad, that they took no action, and that Snowden left in the normal way.


The next item is Snowden's claim he destroyed all his copies of US secrets before going to Moscow. To debunk this, the WSJ refers to an NPR interview [*] with Frants Klintsevich, deputy chairman of the defense and security committee within the Duma at the time. Klintsevich is quoted as saying "Let's be frank, Snowden did share intelligence".

But Snowden himself debunks this:
The WSJ piece was written a week after this tweet. It's hard to imagine why they ignored it. Either the tweet is itself a lie (in which case, it should've been addressed in the article), or it totally debunks the statement. If Klintsevich is "only speculating", then nothing after that point can be used to show Snowden is lying.

Thus, again we have proof that Epstein cannot be trusted. He clearly has an angle and bends evidence to service that angle, rather than being a reliable source of information.


I am no Snowden apologist. Most of my blogposts regarding Snowden have gone the other way, criticizing the way those like The Intercept distort Snowden disclosures in an anti-NSA/anti-USA manner. In areas of my experience (network stuff), I've blogged showing that those reporting on Snowden are clearly technically deficient.

But in this post, I show how Edward Epstein is clearly biased/untrustworthy, and how he bends the facts into a character attack on Snowden. I've documented it in a clear way that you can easily refute if I'm wrong. This is not because I'm biased toward Snowden, but because I'm biased toward the truth.

Thursday, December 29, 2016

Some notes on IoCs

Obama "sanctioned" Russia today for those DNC/election hacks, kicking out 35 diplomats (**), closing diplomatic compounds (**), and seizing assets of named individuals/groups (***). They also published "IoCs" (indicators of compromise) from those attacks: fingerprints/signatures that point back to the attackers, like virus patterns, file hashes, and IP addresses.

These IoCs are of low quality. They are published as a political tool, to prove they have evidence pointing to Russia. They have limited utility to defenders, or those publicly analyzing attacks.

Consider the YARA rule included in US-CERT's "GRIZZLY STEPPE" announcement:

[the report's YARA signature, named "PAS_TOOL_PHP_WEB_KIT", shown as an image in the original post]

What is this? What does this mean? What do I do with this information?

It's a YARA rule. YARA is a tool ostensibly for malware researchers, to quickly classify files. It's not really an anti-virus product designed to prevent or detect an intrusion/infection, but to analyze an intrusion/infection afterward -- such as attributing the attack. Signatures like this will identify a well-known file found on infected/hacked systems.
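For readers who have never touched YARA, here's roughly what using such a rule looks like. This is a minimal sketch with the yara-python module; the rule text is a made-up stand-in for illustration (not the actual GRIZZLY STEPPE signature), and the web-root path is likewise just an example.

```python
# Minimal sketch of classifying files with a YARA rule (pip install yara-python).
# The rule below is a placeholder, NOT the actual PAS_TOOL_PHP_WEB_KIT signature.
import os
import yara

rules = yara.compile(source=r'''
rule Example_Web_Shell_Marker
{
    strings:
        $php  = "<?php"
        $eval = "eval(base64_decode("
    condition:
        $php and $eval
}
''')

# Classify files after the fact: walk an example web root and report matches.
for dirpath, _, filenames in os.walk("/var/www"):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            if rules.match(path):
                print("matched:", path)
        except yara.Error:
            pass  # unreadable file, skip it
```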

What this YARA rule detects is, as the name suggests, the "PAS TOOL WEB KIT", a web shell tool that's popular among Russian/Ukrainian hackers. If you google "PAS TOOL PHP WEB KIT", the second result points to the tool in question. You can download a copy here [*], or you can view it on GitHub here [*].

Once a hacker gets comfortable with a tool, they tend to keep using it. That implies the YARA rule is useful at tracking the activity of that hacker, to see which other attacks they've been involved in, since it will find the same web shell on all the victims.

The problem is that this P.A.S. web shell is popular, used by hundreds if not thousands of hackers, mostly associated with Russia, but also throughout the rest of the world (judging by hacker forum posts). This makes using the YARA signature for attribution problematic: just because you found P.A.S. in two different places doesn't mean it's the same hacker.

A web shell, by the way, is one of the most common things hackers use once they've broken into a server. It allows further hacking and exfiltration traffic to appear as normal web requests. It typically consists of a script file (PHP, ASP, PERL, etc.) that forwards commands to the local system. There are hundreds of popular web shells in use.
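To make that concrete, here's a deliberately minimal, hypothetical sketch in Python (real web shells like P.A.S. are PHP). The route name and parameter are invented; the point is only that the attacker's commands and output travel as ordinary HTTP requests and responses, blending into normal web traffic.

```python
# Illustrative sketch only (do not deploy): a web shell is just a script that
# takes a command from an ordinary HTTP parameter, runs it on the server, and
# returns the output in the HTTP response.
import subprocess
from flask import Flask, request

app = Flask(__name__)

@app.route("/shell")
def shell():
    cmd = request.args.get("cmd", "echo no command given")   # command arrives as a web request
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr                     # output goes back as a web response

if __name__ == "__main__":
    app.run()
```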

We have little visibility into how the government used these IoCs. IP addresses and YARA rules like this are weak, insufficient for attribution by themselves. On the other hand, if they've got web server logs from multiple victims where commands from those IP addresses went to this specific web shell, then the attribution would be strong that all these attacks are by the same actor.

In other words, these rules could reflect the fact that the government has excellent information for attribution. Or, they could reflect that it has only weak bits and pieces. It's impossible for us outsiders to tell. IoCs/signatures are fetishized in the cybersecurity community: people love the small rule but ignore the complexity and context around it, often misunderstanding what's going on. (I've written thousands of the things, and I'm constantly annoyed by those who don't understand what they mean.)

I see people on Twitter praising the government for releasing these IoCs. What I'm trying to show here is why I'm not nearly as enthusiastic about their quality.



Note#1: BTW, the YARA rule has to trigger on the PHP statements, not on the embedded BASE64-encoded stuff. That's because the latter is encrypted with a password, so it could be different for every hacker.

Note#2: Yes, the hackers who use this tool can evade detection by minor changes that avoid this YARA rule. But that's not a concern -- the point is to track the hacker using this tool across many victims, to attribute attacks. The point is not to act as an anti-virus/intrusion-detection system that triggers on "signatures".

Note#3: Publishing the YARA rule burns it. The hackers it detects will presumably move to different tools, like PASv4 instead of PASv3. Presumably, the FBI/NSA/etc. have a variety of YARA rules for various web shells used by known active hackers, to attribute attacks to various groups. They aren't publishing those because they want to avoid burning the rules.

Note#4: The PDF from the DHS has pretty diagrams about the attacks, but it doesn't appear this web shell was used in any of them. It's difficult to see where it fits in the overall picture.



(**) No, not really. Apparently, kicking out the diplomats was punishment for something else, not related to the DNC hacks.

(***) It's not clear if these "sanctions" have any teeth.

Wednesday, December 28, 2016

IoT saves lives but infosec wants to change that

The cybersecurity industry mocks/criticizes IoT. That's because they are evil and wrong. IoT saves lives. This was demonstrated a couple weeks ago when a terrorist attempted to drive a truck through a Christmas market in Germany. The truck had an Internet-connected braking system (firmware updates, configuration, telemetry). When it detected the collision, it applied the brakes, bringing the truck to a stop. Injuries and deaths were a tenth of those in the similar Nice truck attack earlier in the year.

All the trucks shipped by Scania in the last five years have had mobile-phone connectivity to the Internet. Scania pulls back telemetry from its trucks, partly to improve drivers, but also to help improve the computerized features of the trucks. They put everything under the microscope, such as how to improve air conditioning to make the trucks more environmentally friendly.

Among their features is the "Autonomous Emergency Braking" system. This is the system that saved lives in Germany.

You can read up on these features on their website, or in their annual report [*].


My point is this: the cybersecurity industry is a bunch of police-state fetishists that want to stop innovation, to solve the "security" problem first before allowing innovation to continue. This will only cost lives. Yes, we desperately need to solve the problem. Almost certainly, the Scania system can trivially be hacked by mediocre hackers. But if Scania had waited first to secure its system before rolling it out in trucks, many more people would now be dead in Germany. Don't listen to cybersecurity professionals who want to stop the IoT revolution -- they just don't care if people die.



Update: Many, such as the first commenter, point out that the emergency brakes operate independently of the Internet connection, thus disproving this post.

That's silly. That's the case with all IoT devices. The toaster still toasts without the Internet. The surveillance camera still records video without the Internet. My car, which also has emergency brakes, still stops. In almost no IoT device is Internet connectivity integral to day-to-day operation. Instead, Internet connectivity is for things like configuration, telemetry, and downloading firmware updates -- as in the case of Scania.

While the brakes don't make their decision based on the current connectivity, connectivity is nonetheless essential to the equation. Scania monitors its fleet of 170,000 trucks and uses that information to make trucks, including braking systems, better.

My car is no more or less Internet-connected than the Scania truck, yet hackers have released exploits for it at hacking conferences, and it's cited as a classic example of an IoT device. Before you say a Scania truck isn't an IoT device, you first have to get all those other hackers to stop calling my car an IoT device.

Wednesday, December 21, 2016

"From Putin with Love" - a novel by the New York Times

In recent weeks, the New York Times has written many stories on Russia's hacking of the Trump election. This front page piece [*] alone takes up 9,000 words. Combined, the NYTimes coverage on this topic exceeds the length of a novel. Yet, for all this text, the number of verifiable facts also equals that of a novel, namely zero. There's no evidence this was anything other than an undirected, Anonymous-style op based on a phishing campaign.

Tuesday, December 13, 2016

That anti-Trump Recode article is terrible

Trump's a dangerous populist. However, the left-wing media's anti-Trump fetishism is doing nothing to stop Trump. It's no better than "fake news" -- it gets passed around a lot on social media, but is intellectually bankrupt and unlikely to change anybody's mind. A good example is this op-ed on Re/code [*] about Silicon Valley leaders visiting Trump.

The most important feature of that Re/code article is that it contains no criticism of Trump other than the fact that he's a Republican. Half the country voted for Trump. Half the country voted Republican. It's not just Trump that this piece imagines as being unreasonable, but half the country. It's a fashionable bigotry among some of Silicon Valley's leftist elite.

But CEOs live in a world where half their customers are Republican, and half their shareholders are Republican. They cannot lightly take political positions that differ from those of their investors/customers. The Re/code piece claims CEOs said "we are duty-bound as American citizens to attend". No, what they were really saying was "we are duty-bound as officers of our corporations to attend".

The word "officer", as in "Chief Operating Officer", isn't an arbitrary title like "Senior Software Engineer" that has no real meaning. Instead, "officer" means "bound by duty". It includes a lot of legal duties they can go to jail for failing to follow, and additional duties to shareholders the board can fire them for failing to follow.

Normal employees can have Twitter disclaimers saying "these are my personal opinions only, not that of my employer". Officers of corporations cannot. They are the employer. They cannot champion political causes of their own that would impact their stock price. Sure, they can do minor things, like vote, or contribute quietly to campaigns, as long as they aren't too public. They can also do political things that enhance stock price, such as opposing encryption backdoors. Tim Cook can announce he's gay, because that enhances the brand image among Apple's key demographic of millennials. It's not something he could do if he were the CEO of John Deere Tractors.

Among the things CEOs cannot do is take a stance against Donald Trump. Boeing is a good example. Boeing's CEO criticized Trump's stance on free trade, and 30 minutes later Trump tweeted criticisms of a $4 billion contract with Boeing, causing an immediate billion-dollar drop in Boeing's stock price.

This incident shows why the rest of us need to oppose Trump. Such vindictive politics is how democracies have failed. We cannot allow this to happen here. But the hands of CEOs are tied -- they are duty bound to avoid such hits to their stock price.

On the flip side, this is one of the few chances CEOs will get to lobby Trump. If Trump has proven anything, it's that he has no real positions on things. This would be a great time to change his mind on "encryption backdoors", for example.


Trump is a dangerous populist who sows distrust in the institutions that give us a stable, prosperous country. Any institution, from the press, to the military, to the intelligence services, to the election system, is attacked and brought into disrepute, even when it supports him. Trump has a dubious relationship with the truth, such as his repeated insistence that he won by a landslide rather than a slim margin. He has deep character flaws, such as his vindictive attacks against those who oppose him (Boeing is just one of many examples). Hamilton electors cite deep, patriotic principles for changing their votes, such as Trump's foreign influences and demagoguery.

What I'm demonstrating here is that thinking persons have good reasons to oppose Trump that can be articulated without mentioning political issues that divide Democrats and Republicans. That the Re/code article is unable to do so makes it simply "hyper-partisan news", the sort that strokes people's prejudices and passions to get passed around a lot on social media, but which is unlikely to inform anybody or change any minds. In other words, it's no better than "fake news".





Saturday, December 10, 2016

Some notes on a Hamilton election

At least one elector for Trump has promised to switch his vote, becoming a "Hamilton Elector". Assuming 36 more electors (about 10% of Trump's total) do likewise, and Trump fails to get the 270-vote absolute majority, then what happens? Since none of the constitutional law scholars I follow have taken a stab at this, I thought I would write up some notes.

Wednesday, December 07, 2016

Orin's flawed argument on IP address privacy

In the PlayPen cases, judges have ruled that if you use the Tor network, then you don't have a reasonable expectation of privacy. It's a silly demonstration of how the law is out of sync with reality, since the entire point of using Tor is privacy.

Law prof Orin Kerr has a post discussing it. His conclusion is correct, that when the FBI exploits 0day and runs malware on your computer, then it's a search under the Fourth Amendment, requiring a warrant upon probable cause.

However, his reasoning is partly flawed. The title of his piece, "Remotely accessing an IP address inside a target computer is a search", is factually wrong. The IP address in question is not inside a target computer. This may be meaningful.


First, let's discuss how the judge reasons that there's no expectation of privacy with Tor. This is a straightforward application of the Third Party Doctrine: as soon as you give something to a third party, your privacy rights over it are lost. Since you give your IP address to Tor, you lose privacy rights over it. You don't have a reasonable expectation of privacy: yes, you have an expectation of privacy, but it's not a reasonable one, and thus it's not protected.

The same is true of all your other digital information. Your credit card receipts, phone metadata, email archive, and all the rest of things you want to keep private on the Internet are not (currently) covered by the Fourth Amendment.

If you are thinking this is bullcrap, then you'd be right. Everyone knows the Third Party Doctrine doesn't fit the Internet. We want these things to be private from the government, meaning that it must get a warrant to access them. But it's going to take a clueful Supreme Court overturning past precedent, or an armed revolution, to change things.


But that doesn't necessarily fit this case.  As Orin Kerr's post points out:
Fourth Amendment law regulates how the government learns information, not what information it learns
In other words, it doesn't matter if the FBI is allowed to get your IP address, they still need a warrant to search your computer. If you've got public information in your house, the FBI still needs a warrant to enter your house in order to get it.

Where Orin's argument is flawed is that the IP address isn't on the computer being searched by the FBI's "NIT" malware. In other cases, the FBI will be able to discover a target's IP address without a search of their computer. His post would be better titled something like "Infecting with malware is always a search" instead.

The way the Internet works is that computers have a local IP address that's meaningful only on the local network (like the one inside your home). For example, my laptop currently has the address 192.168.1.107. This may, in fact, be the same address as your laptop. That's because addresses starting with 192.168.x.x are extremely popular for home networks (along with 10.x.x.x). It's like how we can both have the address 1079 Elm St., just in different cities, since every city has an "Elm Street" somewhere.

As data leaves your computer, the local address is translated (network address translation) into a public IP address. Google "what's my ip address", and it will tell you your public IP address. Google knows it, but your computer doesn't.
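Here's a minimal sketch of that difference, assuming a typical NAT'd home setup. The echo service (api.ipify.org) is just one example of a "what's my IP" endpoint; any such service would do.

```python
# The local address comes from the operating system; the public address is
# only known to machines outside your NAT, so we ask a public echo service.
import socket
import urllib.request

# Local (RFC 1918) address the home router assigned, e.g. 192.168.1.107.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 53))       # no packets sent; just selects the outbound interface
local_ip = s.getsockname()[0]
s.close()

# Public address, as seen by a server out on the Internet.
public_ip = urllib.request.urlopen("https://api.ipify.org").read().decode()

print("local: ", local_ip)       # meaningful only inside your home network
print("public:", public_ip)      # the address remote servers actually see
```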

Instead, it's your home router that knows your public IP address, using the public IP on its Internet side and your local IP on the home-network side.

This Cisco router knows my public IP address
It can get even more complicated. When I travel, I use my iPhone as a wifi hotspot. But my iPhone is given a local IP address within the cellphone company's network, an address shared with hundreds of other cellphone customers. Thus, it's AT&T's routers that know my public IP address; neither my phone nor my laptop knows it.

Phone doesn't know its public IP, only a local 10.x.x.x IP

In the PlayPen case, the FBI discovers the target's public IP address by causing it to transmit information to the FBI. This information goes through the network address translator, and when it arrives on the FBI server, has the public IP address associated with it. In other words, the point where it's discovered is on the FBI's server located in Quantico, not within the "NIT" malware running on the person's computer. The malware on the computer does not "access" the IP address in any fashion --- but by generating traffic from inside the home, it causes the IP address to be revealed outside the home.
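A minimal sketch of that server side, purely hypothetical and not the FBI's actual setup, shows where the address actually becomes visible: it's logged from the incoming connection, not read out of the visitor's computer.

```python
# Minimal HTTP server that logs the source address of every request it receives.
from http.server import BaseHTTPRequestHandler, HTTPServer

class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address holds the post-NAT public address the request came from
        print("request for %s from %s" % (self.path, self.client_address[0]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

HTTPServer(("0.0.0.0", 8080), LoggingHandler).serve_forever()
```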

Rather than using malware to infect a computer, the FBI might try other ways to discover a suspect's IP address. They might host a PDF or Word document on the server with a simple image tag pointing to the FBI's server. When the user opens the document, their Acrobat/Word program isn't protected by Tor. Their computer will then contact the FBI's server looking for the image, revealing their public IP address. In this example, no exploit or malware is being used. In fact, Tor warns users about this problem. The target reveals their own public IP address purely because they are unaware of the consequences of their actions.

If this were how the FBI were discovering the IP address, rather than using malware, then the judge's reasoning would (probably) be correct. Since the FBI relied upon user stupidity rather than malware, no search was done.

I'd like to see Orin update his post: either to clarify, contrary to what his title says, that what he really means is "Running malware on a target is always a search", or conversely, to describe how this "image tag" example is, despite my feelings, a search.


As a wholly separate note, I'd like to point out a different flaw in the judge's reasoning. Yes, the Tor entry node knows your IP address, but it doesn't know who you are or which traffic is yours. Yes, the Tor exit node sees your traffic, but it doesn't know your IP address.

Technically, both your traffic and IP address are public (according to the Third Party Doctrine), but the private bit is the fact that the two are related. The "Tor network" isn't a single entity, but a protocol for how various different entities work together. No single entity in the Tor network sees your IP address combined with your activity or identity. Even when the FBI and NSA themselves run Tor nodes, they still can't piece it together. It is a private piece of information.

In other words, the four-digit PIN for your ATM card appears in a list of all 10,000 possible PINs, so it's a public number. But which PIN belongs to you is still a secret. Or, consider a website that lists all possible IP addresses: which one is yours is the secret.

Thus, the judge is wrong. The private information is not the public IP address; it's the public IP address combined with the traffic. The person isn't trying to keep their public IP address private; what they are trying to keep private is the fact that this IP address accessed the PlayPen servers.


Summary

This is a stupid post, because it doesn't disagree with Orin's conclusion: the FBI running malware always needs a warrant, even if the information it is after is public. However, the technical details are wrong -- the IP address the FBI is after is located nowhere inside the computer they are searching.

Monday, December 05, 2016

That "Commission on Enhancing Cybersecurity" is absurd

An Obama commission has published a report on how to "Enhance Cybersecurity". It's promoted as having been written by neutral, bipartisan, technical experts. Instead, it's almost entirely dominated by special interests and the Democrat politics of the outgoing administration.

In this post, I go through a random sample of the 53 "action items" proposed by the document. I show how they are policy issues, not technical issues. Indeed, much of the time the technical details are warped to conform to special interests.


IoT passwords

The recommendations include such things as Action Item 2.1.4:
Initial best practices should include requirements to mandate that IoT devices be rendered unusable until users first change default usernames and passwords. 
This recommendation for changing default passwords is repeated many times. It comes from the way the Mirai worm exploits devices by using hardcoded/default passwords.

But this is a misunderstanding of how these devices work. Take, for example, the infamous Xiongmai camera. It has user accounts on the web server to control the camera. If the user forgets the password, the camera can be reset to factory defaults by pressing a button on the outside of the camera.

But here's the deal with security cameras. They are placed at remote sites miles away, up on the second story where people can't mess with them. In order to reset them, you need to put a ladder in your truck and drive 30 minutes out to the site, then climb the ladder (an inherently dangerous activity). Therefore, Xiongmai provides a RESET.EXE utility for remotely resetting them. That utility happens to connect via Telnet using a hardcoded password.

The above report misunderstands what's going on here. It sees Telnet and a hardcoded password, and makes assumptions. Some people assume that this is the normal user account -- it's not; it's unrelated to the user accounts on the web server portion of the device. Requiring the user to change the password on the web service would have no effect on the Telnet service. Other people assume the Telnet service is accidental, and that good security hygiene would remove it. Instead, it's an intended feature of the product, to remotely reset the device. Fixing the "password" issue as described in the above recommendations would simply mean the manufacturer would create a different, custom backdoor that hackers would eventually reverse engineer, creating a MiraiV2 botnet. Instead of security guides banning backdoors, they need to come up with a standard for remote reset.

The report's characterization of Mirai as an IoT botnet is likewise wrong. Mirai is a botnet of security cameras. Security cameras are fundamentally different from IoT devices like toasters and fridges because they are often exposed to the public Internet. To stream video to your phone from your security camera, you need a port open on the Internet. Non-camera IoT devices, however, are overwhelmingly protected by a firewall, with no exposure to the public Internet. While you can create a botnet of Internet cameras, you cannot create a botnet of Internet toasters.

The point I'm trying to demonstrate here is that the above report was written by policy folks with little grasp of the technical details of what's going on. They use Mirai to justify several of their "Action Items", none of which actually apply to the technical details of Mirai. It has little to do with IoT, passwords, or hygiene.


Public-private partnerships
Action Item 1.2.1: The President should create, through executive order, the National Cybersecurity Private–Public Program (NCP3) as a forum for addressing cybersecurity issues through a high-level, joint public–private collaboration.
We've had public-private partnerships to secure cyberspace for over 20 years, such as the FBI InfraGard partnership. President Clinton had a plan in 1998 to create a public-private partnership to address cyber vulnerabilities. President Bush declared public-private partnerships the "cornerstone" of his 2003 plan to secure cyberspace.

Here we are 20 years later, and this document is full of new, naive proposals for public-private partnerships. There's no analysis of why they have failed in the past, or a discussion of which ones have succeeded.

The many calls for public-private programs reflect the left-wing nature of this supposedly "bipartisan" document, which sees government as a paternalistic entity that can help. The right wing doesn't believe the government provides any value in these partnerships. In my 20 years of experience with government public-private partnerships in cybersecurity, I've found them to be at best a time waster and at worst a way to coerce "voluntary measures" out of companies that hurt the public's interest.


Build a wall and make China pay for it
Action Item 1.3.1: The next Administration should require that all Internet-based federal government services provided directly to citizens require the use of appropriately strong authentication.
This would cost at least $100 per person, for 300 million people, or $30 billion. In other words, it'll cost more than Trump's wall with Mexico.

Hardware tokens are cheap. Blizzard (a popular gaming company) must deal with widespread account hacking from "gold sellers", and provides second-factor authentication to its gamers for $6 each. But that ignores the enormous support costs involved. How does a person prove their identity to the government in order to get such a token? To replace a lost token? When old tokens break? What happens if somebody's token is stolen?

And that's the best case scenario. Other options, like using cellphones as a second factor, are non-starters.

This is actually not a bad recommendation, as far as government services are involved, but it ignores the costs and difficulties involved.

But then the recommendations go on to suggest this for private sector as well:
Specifically, private-sector organizations, including top online retailers, large health insurers, social media companies, and major financial institutions, should use strong authentication solutions as the default for major online applications.
No, no, no. There is no reason for a "top online retailer" to know your identity. I lie about my identity. Amazon.com thinks my name is "Edward Williams", for example.

They get worse with:
Action Item 1.3.3: The government should serve as a source to validate identity attributes to address online identity challenges.
In other words, they are advocating a cyber-dystopic police-state wet-dream where the government controls everyone's identity. We already see how this fails with Facebook's "real name" policy, where everyone from political activists in other countries to LGBTQ people in this country gets harassed for revealing their real names.

Anonymity and pseudonymity are precious rights on the Internet that we now enjoy -- rights endangered by the radical policies in this document. This document frequently claims to promote security "while protecting privacy". But the government doesn't protect privacy -- much of what we want from cybersecurity is to protect our privacy from government intrusion. This is nothing new; you've heard this privacy debate before. What I'm trying to show here is that the one-sided view of privacy in this document demonstrates how it's dominated by special interests.


Cybersecurity Framework
Action Item 1.4.2: All federal agencies should be required to use the Cybersecurity Framework. 
The "Cybersecurity Framework" is a bunch of nonsense that would require another long blogpost to debunk. It requires months of training and years of experience to understand. It contains things like "DE.CM-4: Malicious code is detected", as if that's a thing organizations are simply able to do.

All the while it ignores the most common cyber attacks (SQL/web injections, phishing, password reuse, DDoS). It's a typical example where organizations spend enormous amounts of money following process while getting no closer to solving what the processes are attempting to solve. Federal agencies using the Cybersecurity Framework are no safer from my pentests than those who don't use it.

It gets even crazier:
Action Item 1.5.1: The National Institute of Standards and Technology (NIST) should expand its support of SMBs in using the Cybersecurity Framework and should assess its cost-effectiveness specifically for SMBs.
Small businesses can't afford even to read the "Cybersecurity Framework". Simply reading the doc and trying to understand it would exceed their entire IT/computer budget for the year. It would take a high-priced consultant earning $500/hour to tell them that "DE.CM-4: Malicious code is detected" means "buy antivirus and keep it up to date".


Software liability is a hoax invented by the Chinese to make our IoT less competitive
Action Item 2.1.3: The Department of Justice should lead an interagency study with the Departments of Commerce and Homeland Security and work with the Federal Trade Commission, the Consumer Product Safety Commission, and interested private sector parties to assess the current state of the law with regard to liability for harm caused by faulty IoT devices and provide recommendations within 180 days. 
For over a decade, leftists in the cybersecurity industry have been pushing the concept of "software liability". Every time there is a major new development in hacking, such as the worms around 2003, they come out with documents explaining why there's a "market failure" and that we need liability to punish companies to fix the problem. Then the problem is fixed, without software liability, and the leftists wait for some new development to push the theory yet again.

It's especially absurd for the IoT marketspace. The harm, as they imagine, is DDoS. But the majority of devices in Mirai were sold by non-US companies to non-US customers. There's no way US regulations can stop that.

What US regulations will stop is IoT innovation in the United States. Regulations are so burdensome, and liability lawsuits so punishing, that they will kill all innovation within the United States. If you want to get rich with a clever IoT Kickstarter project, forget about it: your entire development budget will go to cybersecurity. The only companies that will be able to afford to ship IoT products in the United States will be large industrial concerns like GE that can afford the overhead of regulation/liability.

Liability is a left-wing policy issue, not one supported by technical analysis. Software liability has proven to be immaterial in any past problem and current proponents are distorting the IoT market to promote it now.


Cybersecurity workforce
Action Item 4.1.1: The next President should initiate a national cybersecurity workforce program to train 100,000 new cybersecurity practitioners by 2020. 
The problem in our industry isn't the lack of "cybersecurity practitioners", but the overabundance of "insecurity practitioners".

Take "SQL injection" as an example. It's been the most common way hackers break into websites for 15 years. It happens because programmers, those building web-apps, blindly paste input into SQL queries. They do that because they've been trained to do it that way. All the textbooks on how to build webapps teach them this. All the examples show them this.
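Here's a minimal sketch of that pattern and its fix, using Python's standard sqlite3 module; the table, data, and input are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")
conn.execute("INSERT INTO users VALUES ('bob', 'bob@example.com')")

name = "nobody' OR '1'='1"   # attacker-controlled input

# The textbook pattern: paste the input straight into the SQL string.
# The attacker's quote characters become part of the query, which now returns every row.
query = "SELECT email FROM users WHERE name = '" + name + "'"
print("injected:     ", conn.execute(query).fetchall())

# What should be taught instead: a parameterized query, where the driver
# keeps the data separate from the SQL.
print("parameterized:", conn.execute(
    "SELECT email FROM users WHERE name = ?", (name,)).fetchall())
```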

So you have government programs on one hand pushing tech education, teaching kids to build web-apps with SQL injection. Then you propose to train a second group of people to fix the broken stuff the first group produced.

The solution to SQL/website injections is not more practitioners, but stopping programmers from creating the problems in the first place. The solution to phishing is to use the tools already built into Windows and networks that sysadmins use, not adding new products/practitioners. These are the two most common problems, and they happen not because of a lack of cybersecurity practitioners, but because the lack of cybersecurity as part of normal IT/computers.

I point this out to demonstrate yet again that the document was written by policy people with little or no technical understanding of the problem.


Nutritional label
Action Item 3.1.1: To improve consumers’ purchasing decisions, an independent organization should develop the equivalent of a cybersecurity “nutritional label” for technology products and services—ideally linked to a rating system of understandable, impartial, third-party assessment that consumers will intuitively trust and understand. 
This can't be done. Grab some IoT devices, like my thermostat, my car, or a Xiongmai security camera used in the Mirai botnet. These devices are so complex that no "nutritional label" can be made for them.

One of the things you'd like to know is all the software dependencies, so that if there's a bug in OpenSSL, for example, then you know your device is vulnerable. Unfortunately, that requires a nutritional label with 10,000 items on it.
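As a minimal sketch of what such an inventory even looks like, here's just the Python packages installed on whatever machine runs it; a real device's label would also have to cover the OS, firmware, and every C library (OpenSSL and the rest), which is how you get to thousands of entries.

```python
# List the installed Python distributions and their versions -- one small
# slice of the dependency inventory a "nutritional label" would require.
from importlib import metadata

for dist in sorted(metadata.distributions(),
                   key=lambda d: (d.metadata["Name"] or "").lower()):
    print(dist.metadata["Name"], dist.version)
```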

Or, one thing you'd want to know is that the device has no backdoor passwords. But that would miss the Xiongmai devices. The web service has no backdoor passwords. If you caught the Telnet backdoor password and removed it, then you'd miss the special secret backdoor that hackers would later reverse engineer.

This is a policy position chasing a non-existent technical issue, pushed by Peiter Zatko, who has gotten hundreds of thousands of dollars in government grants to push it. It's his way of getting rich and has nothing to do with sound policy.


Cyberczars and ambassadors

Various recommendations call for the appointment of various CISOs, Assistant to the President for Cybersecurity, and an Ambassador for Cybersecurity. But nowhere does it mention these should be technical posts. This is like appointing a Surgeon General who is not a doctor.

Government's problems with cybersecurity stem from the way technical knowledge is so disrespected. The current cyberczar prides himself on his lack of technical knowledge, because that helps him see the bigger picture.

Ironically, many of the other Action Items are about training cybersecurity practitioners, employees, and managers. None of this can happen as long as leadership is clueless. Technical details matter, as I show above with the Mirai botnet. Subtlety and nuance in technical details can call for opposite policy responses.


Conclusion

This document is promoted as being written by technical experts. However, nothing in the document is neutral technical expertise. Instead, it's almost entirely a policy document dominated by special interests and left-wing politics. In many places it makes recommendations to the incoming Republican president. His response should be to round-file it immediately.

I only chose a few items, as this blogpost is long enough as it is. I could pick almost any of the 53 Action Items to demonstrate how they are policy- and special-interest-driven rather than reflecting technical expertise.

Thursday, December 01, 2016

Electoral college should ignore Lessig

Reading this exchange between law profs disappoints me. [1] [2] [3] [4] [5]

The decision Bush v Gore cites the same principle as Lessig, that our system is based on "one person one vote". But it uses that argument to explain why votes should not be changed once they are cast:
Having once granted the right to vote on equal terms, the State may not, by later arbitrary and disparate treatment, value one person's vote over that of another.
Lessig cites the principle of "one person one vote", but in a new and novel way. He applies it in an arbitrary way that devalues some of the votes that have already been cast. Specifically, he claims that votes cast for state electors should now be re-valued as direct votes for a candidate.

The United States isn't a union of people. It's a union of states. It says so right in the name. Compromises between the power of the states and the power of the people have been with us forever. That's why states get two Senators regardless of size, while Representatives in the House are assigned in proportion to population. The Presidential election is expressly a related compromise, assigning each state a number of electors equal to its number of Senators plus Representatives.

The Constitution doesn't even say electors should be chosen by a vote. It's up to the states to decide. All states have chosen elections, but they could've demanded a wrestling match or juggling contest instead. The point is that the Constitution, the historical papers, and 200 years of history reject Lessig's idea that the President should be elected by a popular vote.

Moreover, this election shows the value of election by states. The tension nowadays is between big urban areas and rural areas. In the city, when workers lose their jobs due to immigration or trade, they can go down the street and get another job. In a rural area, when the factory shuts down, the town is devastated, and there are no other jobs to be had. The benefits of free trade are such that even Trump can't roll them back -- but as a nation we need to address the disproportionate impact changes have on rural communities. That rural communities can defend their interests is exactly why our Constitution is the way it is -- and why the President isn't chosen with a popular vote.

Hillary did not win the popular vote. No popular vote was held. Instead, we had state-by-state votes for electors. It's implausible that the per-candidate totals would have been the same had this been a popular vote. Candidates would have spent their time and money campaigning across the entire country instead of just battleground states. Voters would have had different motivations in choosing candidates and in deciding whether to abstain. There is no more clearly "disparate and arbitrary" treatment of votes than claiming your vote for an elector (or your abstention) will now instead be treated as a national vote for a candidate.

Hillary got only 48% of the vote, what we call a plurality. Counting abstentions, that's only 26% of the vote. The rules of the Electoral College demand the winner get an absolute majority, meaning 50% even counting abstentions, or almost double what Hillary got in votes. So among the arbitrary rules Lessig has pulled out of his hat is that a plurality is now sufficient. Even though 74% of voters did not vote for her, Lessig uses the principle of "one person one vote" to claim she is the unambiguous choice of the people.

Even if you accept all this, there is still the problem that our election system isn't accurate. As Bush v Gore noted, around 2% of ballots nationwide didn't clearly show a choice of presidential candidate. Others have pointed to weather in different parts of the country as having a significant impact on voter turnout. In science, we call this measurement error. It means that any result within 2% is scientifically a tie. That's more than the difference between Hillary and Trump. Yes, elections must still choose a winner despite a tie. However, an Electoral College evaluating the "sense of the people" (as Lessig cites Federalist #68) is bound by no such limitation. That they see no clear winner in the popular vote is the best view to take -- not that Hillary won some sort of mandate.

My point isn't to show that Lessig is wrong so much as to show that his argument is arbitrary. Had the positions been reversed, with Hillary getting the electoral vote and Trump the popular vote, Lessig could cite the same principle of "one person one vote" and the same Federalist #68 to demonstrate why the Electoral College should still choose Hillary. In other words, Lessig would argue that the principle means (as in Bush v Gore) that Hillary's electors should not devalue the votes cast for them by treating them as a popular vote. Lessig would argue that since Trump didn't get a statistically significant absolute majority, there was no clear "sense of the people".

America is in danger of populism, which ravages our institutions that make our country prosperous, stable, and "great". Trump is populist on the right, but Lessig is a populist on the left. Lessig ran for the presidency on the left on a platform no less populist than Trump's. This current piece, demanding we follow arbitrary rules to get desired results, is no less an attack on the institution of the "Rule of Law" and "Equal Protection" than Trump's attacks.

What "should" the Electoral College do? Whatever the heck they want. I would point out that Federalist #68 does warn about the influence of "foreign powers" and of men using the "little arts of popularity" to gain the Presidency. This matches Trump accurately. I would hope that at least some Trump-electors consider this and change their votes. Historically, that we haven't seen more electors change their votes seems to be a bit of a problem.

Sunday, November 27, 2016

No, it’s Matt Novak who is a fucking idiot

I keep seeing this Gizmodo piece entitled “Snowden is a fucking idiot”. I understand the appeal of the piece. The hero worship of Edward Snowden is getting old. But the piece itself is garbage.

The author, Matt Novak, is of the new wave of hard-core leftists intolerant of those who disagree with them. His position is that everyone is an idiot who doesn’t agree with his views: Libertarians, Republicans, moderate voters who chose Trump, and even fellow left-wingers that aren’t as hard-core.

If you carefully read his piece, you’ll see that Novak doesn’t actually prove Snowden is wrong. Novak doesn’t show how Snowden disagrees with facts, but only how Snowden disagrees with the left-wing view of the world, "libertarian garbage" as Novak puts it. It’s only through deduction that we come to the conclusion: those who aren’t left-wing are idiots, Snowden is not left-wing, therefore Snowden is an idiot.

The question under debate in the piece is:
technology is more important than policy as a way to protect our liberties
In other words, if you don’t want the government spying on you, then focus on using encryption (use Signal) rather than trying to change the laws so they can’t spy on you.

On a factual basis (rather than political), Snowden is right. If you live in Germany and don’t want the NSA spying on you there is little policy-wise that you can do about it, short of convincing Germany to go to war against the United States to get the US to stop spying.

Likewise, for all those dissenters in countries with repressive regimes, technology precedes policy. You can't effect change until you can first protect yourself from the state police who throw you in jail for dissenting. Use Signal.

In our own country, Snowden is right about “politics”. Snowden’s leak showed how the NSA was collecting everyone’s phone records to stop terrorism. Privacy organizations like the EFF supported the reform bill, the USA FREEDOM ACT. But rather than stopping the practice, the “reform” opened up the phone records to all law enforcement (FBI, DEA, ATF, IRS, etc.) for normal law enforcement purposes.

Imagine the protestors out there opposing the Dakota Access Pipeline. The FBI is shooting down their drones and blasting them with water cannons. Now, because of the efforts of the EFF and other privacy activists, using the USA FREEDOM ACT, the FBI is also grabbing everyone’s phone records in the area. Ask yourself who is the fucking idiot here: the guy telling you to use Signal, or the guy telling you to focus on “politics” to stop this surveillance.

Novak repeats the hard-left version of the creation of the Internet:
The internet has always been monitored by the state. It was created by the fucking US military and has been monitored from day one. Surveillance of the internet wasn’t invented after September 11, 2001, no matter how many people would like to believe that to be the case.
No, the Internet was not created by the US military. Sure, the military contributed to the Internet, but the majority of contributions came from corporations, universities, and researchers. The left-wing claim that the government/military created the Internet involves highlighting their contributions while ignoring everyone else’s.

The Internet was not "monitored from day one", because until the 1990s it wasn't even an important enough network to monitor. As late as 1993, the Internet was dwarfed in size and importance by numerous other computer networks -- until the web took off that year, the Internet was considered a temporary research project. Those like Novak writing the history of the Internet are astonishingly ignorant of the competing networks of those years. They miss XNS, AppleTalk, GOSIP, SNA, Novell, DECnet, Bitnet, Uunet, Fidonet, X.25, Telenet, and all the other things that were really important during those years.

And, mass Internet surveillance did indeed come only after 9/11. The NSA’s focus before that was on signals and telephone lines, because that’s where all the information was.  When 9/11 happened, they were still trying to catch up to the recent growth of the Internet. Virtually everything Snowden documents came after 9/11. Sure, they had programs like FAIRVIEW that were originally created to get telephone information in the 1970s, but these programs only started delivering mass Internet information after 9/11. Sure, the NSA occasionally got emails before 9/11, but nothing like the enormous increase in collection afterwards.

What I’ve shown here is that Matt Novak is a fucking idiot. He gets basic facts wrong about how the Internet works. He doesn’t prove Snowden’s actually wrong by citing evidence, only that Snowden is wrong because he disagrees with what leftists like Novak believe to be right. All the actual evidence supports Snowden in this case.

It doesn't mean we should avoid politics. Technology and politics are different things; it's not either-or. Whether we do one has no impact on deciding to do the other. But if you are a DAPL protester, use Signal instead of unencrypted messaging or phone calls, rather than waiting for activists to pass legislation.

Monday, November 21, 2016

The false-false-balance problem

Until recently, journalism in America prided itself on objectivity -- reporting the truth without taking sides. That's because big debates are always complex and nuanced, and both sides are reasonable. Therefore, when writing an article, reporters attempt to achieve balance by quoting people/experts/proponents on both sides of an issue.

But what about those times when one side is clearly unreasonable? You'd never try to achieve balance by citing those who believe in aliens and Bigfoot, for example. Thus, journalists have come up with the theory of false-balance to justify being partisan and one-sided on certain issues.

Typical examples where journalists cite false-balance are reporting on anti-vaxxers, climate-change denialists, and Creationists. More recently, false-balance became an issue in the 2016 Trump election.

But this concept of false-balance is wrong. It's not that anti-vaxxers, denialists, Creationists, and white supremacists are reasonable. Instead, the issue is that the left-wing has reframed the debate. They've simplified it into something black-and-white, removing nuance, in a way that shows their opponents as being unreasonable. The media then adopts the reframed debate.


Let's talk anti-vaxxers. One of the policy debates is whether the government has the power to force vaccinations on people (or on people's children). Reasonable people say the government doesn't have this power. Many (if not most) people hold this opinion while agreeing that vaccines are both safe and effective (that they don't cause autism).

Consider this February 2015 interview with Chris Christie. He's one of the few politicians who have taken the position that government can override personal choice, such as in the case of an outbreak. Yet, when he said "parents need to have some measure of choice in things as well, so that's the balance that the government has to decide", he was broadly reviled as an anti-vaxxer throughout the media. The press reviled other Republican candidates the same way, even while ignoring almost identical statements made at the same time by the Obama administration. They also ignored clearly anti-vax comments from both Hillary and Obama during the 2008 election.

Yes, we can all agree that anti-vaxxers are a bunch of crazy nutjobs. In calling for objectivity, we aren't saying that you should take them seriously. Instead, we are pointing out the obvious bias in the way the media attacked Republican candidates as being anti-vaxxers, and then hiding behind "false-balance".


Now let's talk evolution. The issue is this: Darwinism has been set up as some sort of competing religion against belief in God(s). High schools teach children to believe in Darwinism, but not to understand it. Few kids graduate understanding Darwinism, which is why it's invariably misrepresented in mass media (X-Men, Planet of the Apes, Waterworld, Godzilla, Jurassic Park, etc.). The only movie I can recall getting evolution correct is Idiocracy.

Also, evolution has holes in it. This isn't a bad thing in science, every scientific theory has holes. Science isn't a religion. We don't care about the holes. That some things remain unexplained by a theory doesn't bother us. Science has no problem with gaps in knowledge, where we admit "I don't know". It's religion that has "God of the gaps", where ignorance isn't tolerated, and everything unexplained is explained by a deity.

The hole in evolution is how the cell evolved. The fossil record teaches us a lot about multi-cellular organisms over the last 400 million years, but not much about how the cell evolved in the 4 billion years on planet Earth before that. I can point to radioisotope dating and fossil finds to prove dinosaurs existed 250 million to 60 million years ago, thus disproving your crazy theory of a 10,000-year-old Earth. But I can't point to anything that disproves your view that a deity created the original cellular organisms. I don't agree with that theory, but I can't disprove it, either.

The point is that Christians have a good point: Darwinism is taught as a competing religion. You see this in the way books deny the holes in our knowledge, insisting that Darwinism explains even how cells evolved, and that doubting Darwin is blasphemy.

The Creationist solution is wrong, we can't teach religion in schools. But they have a reasonable concern about religious Darwinism. The solution there is to do a better job teaching it as a science. If kids want to believe that one of the deities created the first cells, then that's okay, as long as they understand the fossil record and radioisotope dating.


Now let's talk Climate Change. This is a tough one, because you people have lost your collective minds. The debate is over how much change, how much danger, and how much cost. The debate is not over whether it's true. We all agree it's true, even most Republicans. By keeping the debate to the black-and-white "Is global warming true?", the left-wing can avoid the debate over "How much warming?".

Consider this exchange from one of the primary debates:

Moderator: ...about climate change...
RUBIO: Because we’re not going to destroy our economy ...
Moderator: Governor Christie, ... what do you make of skeptics of climate change such as Senator Rubio?
CHRISTIE: I don’t think Senator Rubio is a skeptic of climate change.
RUBIO: I'm not a denier/skeptic of climate change.

The media (in this case CNN) is so convinced that Republicans deny climate change that they can't hear any other statement. Rubio clearly didn't deny Climate Change, but the moderator was convinced that he did. Every statement is seen as outright denial, or as code words for denial. Thus, convinced of the falseness of false-balance, the media never sees the fact that most Republicans are reasonable.

Similar proof of Republican non-denial is this page full of denialism quotes. If you actually look at the quotes, you'll see that when taken in context, virtually none of the statements deny climate change. For example, when Senator Dan Sullivan says there's "no concrete scientific consensus on the extent to which humans contribute to climate change", he is absolutely right. There is 97% consensus that mankind contributes to climate change, but there is widespread disagreement on how much.

That "97% consensus" is incredibly misleading. Whenever it's quoted, the speaker immediately moves the bar, claiming that scientists also agree with whatever crazy thing the speaker wants, like hurricanes getting worse (they haven't -- at least, not yet).

There's no inherent reason why Republicans would disagree with addressing Climate Change. For example, Washington State recently voted on a ballot initiative to impose a revenue-neutral carbon tax. The important part is "revenue neutral": Republicans hate expanding government, but they don't oppose policies that keep government the same size. Democrats opposed this initiative, precisely because it didn't expand the size of government. That proves Democrats are less concerned with a bipartisan approach to addressing climate change than with using it as a wedge issue to promote their agenda of increased regulation and increased spending.

If you are serious about addressing Climate Change, then agree that Republicans aren't deniers, and then look for bipartisan solutions.


Conclusion

The point here is not to try to convince you of any political opinion. The point here is to describe how the press has lost objectivity by adopting the left wing's reframing of the debate. Instead of seeing a balanced debate between two reasonable sides, they see a warped debate between a reasonable (left-wing) side and an unreasonable (right-wing) side. The idea that the opposing side is unreasonable is so incredibly seductive that they can never give it up.

That Christie had to correct the moderator in the debate should teach you that something is rotten in journalism. Christie understood Rubio's remarks, but the debate moderator could not. Journalists cannot even see the climate debate because they are wedded to the left-wing's corrupt view of the debate.

The charge of "false balance" is wrong. In debates that evenly divide the population, the issues are complex and nuanced, and both sides are reasonable. That's the law. It doesn't matter what the debate is. If you see the debate simplified to the point where one side is obviously unreasonable, then it's you who has the problem.



Dinner with Rajneeshees

One evening I answered the doorbell to find a burgundy clad couple on the doorstep. They were followers of the Bagwan Shree Rajneesh, whose cult had recently purchased a large ranch in the eastern part of the state. No, they weren't there to convert us. They had come for dinner. My father had invited them.

My father was a journalist who had been covering the controversies with the cult's neighbors. Yes, they were a crazy cult that would later break up after committing acts of domestic terrorism. But this couple was a pair of young professionals (lawyers) who, except for their clothing, looked and behaved like normal people. They would go on to live normal lives after the cult.

Growing up, I lived in two worlds. One was the normal world, which encourages you to demonize those who disagree with you. On the political issues that concern you most, you divide the world into the righteous and the villains. It's not enough to believe the other side wrong; you must also believe them to be evil.

The other world was that of my father, teaching me to see the other side of the argument. I guess I grew up with my own Atticus Finch (from To Kill a Mockingbird), who set an ideal. In much the same way that Atticus told his children that they couldn't hate even Hitler, I was told I couldn't hate even the crazy Rajneeshees.

Monday, November 14, 2016

Comments for my biracial niece

I spent the night after Trump’s victory consoling my biracial niece, who was worried about the election. Here are my comments. You won’t like them, since you probably expect the opposite given the title. But it’s what I said.


I preferred Hillary, but that doesn’t mean Trump is an evil choice.

Don’t give into the hate. You get most of your news via social media sites like Facebook and Twitter, which are at best one-sided and unfair. At worst, they are completely inaccurate. Social media posts are driven by emotion, not logic. Sometimes that emotion is love of cute puppies. Mostly it’s anger, fear, and hate. Instead of blindly accepting what you read, challenge it. Find the original source. Find a better explanation. Search for context.

Don’t give into the hate. The political issues that you are most concerned about are not simple and one-sided with obvious answers. They are complex and nuanced. Just because somebody disagrees with you doesn’t mean they are unreasonable or evil. In today’s politics, it has become the norm that we can’t simply disagree with somebody, but must also vilify and hate them. We’ve redefined politics to be the fight between the virtuous (whatever side we are on) and the villains (the other side). The reality is that both sides are equally reasonable, equally virtuous.

Don’t give into the hate. Learn “critical thinking”. Learn how “cherry picking” the fringe of the opposing side is used to tarnish the mainstream. Learn how “strawman arguments” make the other side sound dumb. Learn how “appeal to emotion” replaces logic. Learn how “ad hominem” attacks target an opponent’s credibility rather than their arguments. Learn how issues are simplified into “black vs. white” options rather than the nuance and complexity that actually exists.

Don’t give into the hate. The easy argument is that it’s okay to be hateful and bigoted toward Trump and his supporters because they are bigoted against you. No, it’s not okay to hate anybody, not even Hitler, as Atticus Finch explains in “To Kill A Mockingbird”. In that book, Atticus even tries to understand, and not hate, Robert Ewell, the racist antagonist in the book who eventually tries to stab Scout (Atticus’s daughter). Trump’s supporters may be wrong, but it’s a wrongness largely based on ignorance, not malice. Yes, they probably need to be kindly educated, but they don’t deserve punishment and hate.

America is the same country it was last week. Its citizens haven't changed; only one man in an office has changed. The President has little actual power, either to fix things (as his supporters want) or to break things (as his opponents fear). We have strong institutions, from Congress to the Courts to the military, that will hold him in check. The biggest worries are that he's the first President in history with no government experience, and that he's strongly "populist" (which historically has been damaging for countries). We should be watchful, and more willing to stand up and fight when Trump does something bad. However, we shouldn't give into hate.

How to teach endian

On /r/programming is this post about byte-order/endianness. It gives the same information as most documents on the topic. It is wrong. It's been wrong for over 30 years. Here's how it should be taught.

One of the major disciplines in computer science is parsing/formatting. This is the process of converting the external format of data (file formats, network protocols, hardware registers) into the internal format (the data structures that software operates on).

It should be a formal computer-science discipline, because it's actually a lot more difficult than you'd expect. That's because the majority of vulnerabilities in software that hackers exploit are due to parsing bugs. Since programmers don't learn about parsing formally, they figure it out for themselves, creating ad hoc solutions that are prone to bugs. For example, programmers assume external buffers cannot be larger than internal ones, leading to buffer overflows.
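To make that concrete, here's a minimal, hypothetical sketch of such a bug (the struct and function names are mine, not from any real code base): the unsafe version trusts a length byte taken from the external data, while the safe version validates it against both the internal buffer size and the amount of input actually available.

#include <string.h>

/* A hypothetical internal structure we parse into. */
struct record { char name[32]; };

/* BAD: trusts the external length byte; overflows r->name if it exceeds 31. */
void parse_record_unsafe(struct record *r, const unsigned char *buf, size_t len)
{
    size_t name_len = buf[0];            /* length byte from external data */
    memcpy(r->name, buf + 1, name_len);  /* buffer overflow if name_len > 31 */
    (void)len;                           /* input size ignored -- that's the bug */
}

/* BETTER: validate the external length against internal and external limits. */
int parse_record_safe(struct record *r, const unsigned char *buf, size_t len)
{
    if (len < 1)
        return -1;
    size_t name_len = buf[0];
    if (name_len >= sizeof(r->name) || name_len + 1 > len)
        return -1;                       /* reject malformed input */
    memcpy(r->name, buf + 1, name_len);
    r->name[name_len] = '\0';
    return 0;
}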

An external format must be well-defined. What the first byte means must be written down somewhere, then what the second byte means, and so on. For Internet protocols, these formats are written in RFCs, such as RFC 791 for the "Internet Protocol". For file formats, these are written in documents, such as those describing GIF files, JPEG files, MPEG files, and so forth.

Among the issues is how integers should be represented. The definition must include the size, whether the integer is signed or unsigned, what the bits mean (almost always two's complement), and the byte-order. Integers with values above 255 must be represented with more than one byte. Whether those bytes go left-to-right or right-to-left is known as byte-order.

We also call this endianness, where one form is big-endian and the other is little-endian. The name is a joke, referring to Jonathan Swift's tale Gulliver's Travels, in which two nations go to war over whether an egg should be cracked on the big end or the little end. The joke refers to the Holy Wars in computing where two sides argued strongly for one byte-order or the other. The point of the term "endianness" is that neither format really matters.

However, big-endian is how humans naturally process numbers. If we have the hex value 0x2211, then we expect that representing this number in a file/protocol will consist of one byte with the value 0x22 followed by another byte with the value 0x11. In a little-endian format specification, however, the order of bytes will be reversed, with a value of 0x2211 represented with 0x11 followed by 0x22.

This is further confused by the fact that the nibbles within a byte are still written in conventional, big-endian order. In other words, the big-endian format for the number 0x1234 is 0x12 0x34. However, the little-endian format is 0x34 0x12 -- not 0x43 0x21, as you might naively expect by trying to swap everything around in your mind.

If little-endian is so confusing to the human mind, why would anybody ever use it? The answer is that it can be more efficient for logic circuits. Or at least, back in the 1970s, when CPUs had only a few thousand logic gates, it could be more efficient. Therefore, a lot of internal processing was little-endian, and this bled over into external formats as well.

On the other hand, most network protocols and file formats remain big-endian. Format specifications are written for humans to understand, and big-endian is easier for us humans.


So once you understand the byte-order issue in external formats, the next problem is figuring out how to parse it, to convert it into an internal data structure. Well, we first have to understand how to parse things in general.

There are two ways of parsing things: buffered or streaming. In the buffered model, you read in the entire input first (the entire file, or the entire network packet), then parse it. In the streaming model, you read a byte at a time, parse that byte, then read the next byte. Streaming is best for very large files or for data streamed across TCP network connections.

However, buffered parsing is the general way most people do it, so I'll assume that in this guide.

Let's assume you've read the file (or network data) into a buffer we'll call buf. You parse that buffer at the current offset until you reach the end.
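In case you're wondering how buf gets filled in the buffered model, a minimal sketch might look like the following (load_file is a hypothetical helper of my own, with error handling kept short):

#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc()ed buffer; store its size in *sizep.
 * Returns NULL on error. (Hypothetical helper, not a standard function.) */
unsigned char *load_file(const char *filename, size_t *sizep)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return NULL;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    unsigned char *buf = malloc(size);
    if (buf && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);
    *sizep = (size_t)size;
    return buf;
}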

Given that, then the way you'd parse a big-endian integer x is the following line of code:

 x = buf[offset] * 256 + buf[offset+1];

Or, if you prefer logical operators, you might do it as:

 x = buf[offset]<<8 | buf[offset+1];

Compilers always translate multiplication by powers-of-2 into shift instructions, so either statement will perform the same. Some compilers are smart enough to recognize this pattern as parsing an integer, and might replace this with loading two bytes from memory and byte-swapping instead.

For a little-endian integer in the external data, you'd reverse how you parse this, like one of the following two statements.

x = buf[offset+1] * 256 + buf[offset];
x = buf[offset] + buf[offset+1] * 256;
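The same pattern extends to wider integers. Here's a sketch of 32-bit versions (the helper names parse_be32/parse_le32 are my own, not from any standard library); note they produce the same result regardless of what CPU the code runs on:

#include <stddef.h>
#include <stdint.h>

/* Parse a 32-bit big-endian integer at the given offset. */
uint32_t parse_be32(const unsigned char *buf, size_t offset)
{
    return ((uint32_t)buf[offset]     << 24) |
           ((uint32_t)buf[offset + 1] << 16) |
           ((uint32_t)buf[offset + 2] <<  8) |
           ((uint32_t)buf[offset + 3]);
}

/* Parse a 32-bit little-endian integer at the given offset. */
uint32_t parse_le32(const unsigned char *buf, size_t offset)
{
    return ((uint32_t)buf[offset])           |
           ((uint32_t)buf[offset + 1] <<  8) |
           ((uint32_t)buf[offset + 2] << 16) |
           ((uint32_t)buf[offset + 3] << 24);
}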

If we were talking about JavaScript, C#, or a bunch of other languages, the conversation about endianness would end here. But since we're talking about C/C++, we've got some additional wrinkles to deal with.


The problem with C is that it's a low-level language, which means it exposes the internal format of integers to the programmer. The code above deals only with the external representation of integers and doesn't care about the internal representation: it doesn't care whether you are running on a little-endian x86 CPU or some big-endian RISC CPU.

But in C, you can parse an integer by relying upon the internal CPU representation. It would look something like the following:

 x = *(short*)(buf + offset);

This code produces different results on a little-endian machine and a big-endian machine. If the two bytes are 0x22 and 0x11, then on a big-endian machine this produces a short integer with a value of 0x2211, but a little-endian machine produces the value of 0x1122.

If the external format is big-endian, then on a little-endian machine, you'll have to byte-swap the result. In other words, the code would look something like:

 x = *(short*)(buf + offset);
 #ifdef LITTLE_ENDIAN
 x = (x >> 8) | ((x & 0xFF) << 8);
 #endif

Of course, you'd never write code that looks like this. Instead, you'd use a macro, as follows:

 x = ntohs(*(short*)(buf + offset));

The macro means network-to-host-short, where network byte-order is big-endian and host byte-order is whatever the CPU uses. On a little-endian host CPU, the bytes are swapped as shown above. On a big-endian CPU, the macro is defined as nothing. This macro is defined in the standard sockets headers, like <arpa/inet.h>. There is a broad range of similar byte-swapping macros in other libraries.

In truth, this is not how it's really done, parsing an individual integer at a time. Instead, what programmers do is define a packed C structure that corresponds to the external format they are trying to parse, then cast the buffer into that structure.

For example, on Linux the include file <netinet/ip.h> defines the Internet Protocol header:

struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN 
u_char ip_hl:4, /* header length */
ip_v:4; /* version */
#endif
#if BYTE_ORDER == BIG_ENDIAN 
u_char ip_v:4, /* version */
ip_hl:4; /* header length */
#endif
u_char ip_tos; /* type of service */
short ip_len; /* total length */
u_short ip_id; /* identification */
short ip_off; /* fragment offset field */
u_char ip_ttl; /* time to live */
u_char ip_p; /* protocol */
u_short ip_sum; /* checksum */
struct in_addr ip_src,ip_dst; /* source and dest address */
};

To "parse" the header, you'd do something like:

 struct ip *hdr = (struct ip *)buf;
 printf("checksum = 0x%04x\n", ntohs(hdr->ip_sum));

This is considered the "elegant" way of doing things, because there is no "parsing" at all. On big-endian CPUs, it's also a no-op -- it costs precisely zero instructions in order to "parse" the header, since both the internal and external structures map exactly.

In C, though, the exact layout of structures is undefined. There is often padding between structure members to keep integers aligned on natural boundaries. Therefore, compilers have directives to declare a structure as "packed" to get rid of such padding, thus forcing the internal structure to match the external one.
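For reference, such a directive might look like the following (record_hdr is a made-up example; the attribute is the GCC/Clang extension, with the MSVC pragma shown in a comment):

/* Without "packed", most compilers insert 3 bytes of padding after "type"
 * so that "length" lands on a 4-byte boundary. The GCC/Clang attribute
 * below removes the padding, making the struct 5 bytes, matching a
 * hypothetical 5-byte external format. */
struct record_hdr {
    unsigned char type;
    unsigned int  length;
} __attribute__((packed));

/* MSVC equivalent:
 *   #pragma pack(push, 1)
 *   struct record_hdr { unsigned char type; unsigned int length; };
 *   #pragma pack(pop)
 */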

But this is the wrong wrong wrong way of doing it. Just because it's possible in C doesn't mean it's a good idea.

Some people think it's faster. It's not really faster. Even low-end ARM CPUs are super fast these days, multiple-issue with deep pipelines. What determines their speed is more often things like branch mispredictions and long dependency chains; the number of instructions is almost an afterthought. Therefore, the difference in performance between the "zero overhead" mapping of a structure on top of external data and parsing a byte at a time is almost immeasurable.

On the other hand, there is the cost in "correctness". The C language does not define the result of casting an integer as shown in the above examples. As wags have pointed out, instead of returning the expected two-byte number, acceptable behavior is to erase the entire hard disk.

In the real world, undefined code has led to compiler problems as optimizers work around these issues. Sometimes important lines of code are removed from a program because the compiler strictly interprets the rules of the C language standard. Using undefined behavior in C truly produces undefined results -- quite at odds with what the programmer expected.

The result of parsing a byte at a time is defined. The result of casting integers and structures is not. Therefore, that practice should be avoided. It confuses compilers. It confuses static and dynamic analyzers that try to verify the correctness of code.

Moreover, there is the practical matter that such casting confuses programmers. Programmers understand parsing external formats fairly well, but mixing internal and external endianness causes endless confusion. It causes no end of buggy code. It causes no end of ugly code. I read a lot of open-source code, and code that parses integers the right way is consistently much easier to read than code that uses macros like ntohs(). I've seen code where the poor confused programmer keeps swapping integers back and forth, not understanding what's going on, simply adding another byte-swap whenever the input to the function was in the wrong order.
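To make the contrast concrete, here's a sketch of pulling a few IPv4 header fields out of a buffer the well-defined way, a byte at a time, instead of casting struct ip onto it (offsets follow RFC 791; the function name is mine):

#include <stddef.h>
#include <stdio.h>

/* Parse selected IPv4 header fields a byte at a time (offsets per RFC 791).
 * This is well-defined C and gives identical results on big-endian and
 * little-endian hosts. */
void print_ip_fields(const unsigned char *buf, size_t len)
{
    if (len < 20)
        return;                                    /* too short for an IPv4 header */
    unsigned version       = buf[0] >> 4;
    unsigned header_length = (buf[0] & 0x0F) * 4;  /* in bytes */
    unsigned total_length  = buf[2] * 256 + buf[3];
    unsigned ttl           = buf[8];
    unsigned protocol      = buf[9];
    unsigned checksum      = buf[10] * 256 + buf[11];

    printf("version=%u hdrlen=%u totlen=%u ttl=%u proto=%u checksum=0x%04x\n",
           version, header_length, total_length, ttl, protocol, checksum);
}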

Conclusion

There is the right way to teach endianness: it's a parser issue, dealing with external data formats and protocols. You deal with it in C/C++ the same way as in JavaScript or C# or any other language.

Then there is the wrong way to teach endianness: that it's a CPU issue in C/C++, that you intermingle internal and external structures, and that you swap bytes. This has caused no end of trouble over the years.

Those teaching endianness need to stop the old way and adopt the new way.




Bonus: alignment

The thing is that casting integers has never been a good solution. Back in the 1980s, on the first RISC processors like SPARC, integers had to be aligned on even byte boundaries or the program would crash. Formats and protocols were defined to keep these things aligned most of the time, but every so often an odd file would misalign things, and the program would mysteriously crash with a "bus error".

Thankfully, this nonsense has disappeared, but even today a lot of processors have performance problems with unaligned data. In other words, casting a structure on top of data appears to cost zero CPU instructions, but this ignores the often considerable effort it took to keep all the integers aligned before that step was reached.
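As an aside, if you really do want to load a multi-byte integer in one shot, the usual portable workaround is memcpy() rather than a pointer cast: it avoids the alignment (and aliasing) problems, though the result is still in host byte order, so the endianness question doesn't go away. A sketch (the helper name is mine):

#include <stdint.h>
#include <string.h>

/* Copying through memcpy() avoids the unaligned-access and aliasing problems
 * of *(uint16_t *)(buf + offset); compilers typically turn it into a single
 * load. Note the value is still in host byte order. */
uint16_t load_u16(const unsigned char *buf, size_t offset)
{
    uint16_t x;
    memcpy(&x, buf + offset, sizeof x);
    return x;
}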




Bonus: sockets

The API for network programming is "sockets". In some cases, you have to use the ntohs() family of macros. For example, when binding to a port, you execute code like the following:

 sin.sin_port = htons(port);

You do this because the API defines it this way, not because you are parsing data.

Some programmers make the mistake of keeping byte-swapped versions of IP addresses and port numbers throughout their code. This is wrong. Their code should keep these values in host byte order internally, and pass them through the byte-swapping macros only at the interface to the sockets layer.
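A minimal sketch of that discipline (bind_port is a hypothetical helper, with error handling kept short): the port is carried around the program as an ordinary host-order integer, and htons()/htonl() appear only where the sockaddr is filled in for the sockets API.

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Keep "port" in host byte order everywhere in the program;
 * convert only when filling in the sockaddr for the sockets API. */
int bind_port(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);  /* byte-swap only at the interface */
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}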





Sunday, November 06, 2016

Yes, the FBI can review 650,000 emails in 8 days

In today's news, Comey announced that the FBI has reviewed all 650,000 emails found on Anthony Weiner's computer and determined there's nothing new. Some have questioned whether this could be done in 8 days. Of course it could be -- those were 650,000 emails to Weiner, not Hillary.




Reading Weiner's own emails, those unrelated to his wife Huma or to Hillary, is unlikely to be productive. Therefore, the FBI is going to filter those 650,000 Weiner emails down to the ones that were also sent to/from Hillary and Huma.

That's easy for automated tools to do. Just search the From: and To: fields for email addresses known to be used by Hillary and associates. For example, search for hdr29@hrcoffice.com (Hillary's current email address) and ha16@hillaryclinton.com (Huma Abedin's current email).

Below is an example email header from the Podesta dump:

From: Jennifer Palmieri <jpalmieri@hillaryclinton.com>
Date: Sat, 2 May 2015 11:23:56 -0400
Message-ID: <-8018289478115811964@unknownmsgid>
Subject: WJC NBC interview
To: H <hdr29@hrcoffice.com>, John Podesta <john.podesta@gmail.com>, 
 Huma Abedin <ha16@hillaryclinton.com>, Robby Mook <re47@hillaryclinton.com>, 
 Kristina Schake <kschake@hillaryclinton.com>

This is likely to filter down the emails to a manageable few thousand.

Next, filter out the emails already in the FBI's possession. The easiest way is using the Message-ID: header, a unique value generated for every email. If a Weiner email has the same Message-ID as an email already retrieved from Huma and Hillary, then the FBI can ignore it.
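As a toy illustration of how cheap that check is, here's a sketch in C (the function names are mine; a real e-discovery tool would also parse the Message-ID out of each email's headers first):

#include <stdlib.h>
#include <string.h>

/* Comparator for an array of C strings, for use with bsearch(). */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Return 1 if msgid is among the Message-IDs already in hand.
 * known_ids must already be sorted with the same comparator. */
int already_have(const char *msgid, const char **known_ids, size_t count)
{
    return bsearch(&msgid, known_ids, count, sizeof(*known_ids), cmp_str) != NULL;
}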

This is then likely to reduce the number of emails needing review to less than a thousand, or less than a hundred, or even all the way down to zero. And indeed, that's what NBC News is reporting:




The point is this. Computer geeks have tools that make searching the emails extremely easy. Given those emails, a list of known email accounts for Hillary and associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day.

The question isn't whether the FBI could review all those emails in 8 days, but why the FBI couldn't have reviewed them all in one or two days. Or even why they couldn't have reviewed them before Comey made that horrendous announcement that they were reviewing the emails.





Thursday, November 03, 2016

In which I have to debunk a second time

So Slate is doubling down on their discredited story of a secret Trump server. Tip for journalists: if you are going to argue against an expert who debunked your story, try to contact that expert first, so they don't have to do what I'm going to do here: show the obvious flaws. Also, pay attention to the data.


The experts didn't find anything

The story claims:
"I spoke with many DNS experts. They found the evidence strongly suggestive of a relationship between the Trump Organization and the bank".
No, he didn't. He gave experts limited information and asked them whether it was consistent with a conspiracy theory. He didn't ask whether it was "suggestive" of the conspiracy theory, or whether this was the best theory that fit the data.

This is why "experts" quoted in the press need to go through "media training", to avoid getting your reputation harmed by bad journalists who try their best to put words in your mouth. You'll be trained to recognize bad journalists like this, and how not to get sucked into their fabrications.


Jean Camp isn't an expert

On the other hand, Jean Camp isn't an expert. I've never heard of her before, and she gets details wrong. Take, for example, this blogpost where she discusses lookups for the domain mail.trump-email.com.moscow.alfaintra.net. She says:
This query is unusual in that is merges two hostnames into one. It makes the most sense as a human error in inserting a new hostname in some dialog window, but neglected to hit the backspace to delete the old hostname.
Uh, no. It's normal DNS behavior with non-FQDNs (names that aren't fully qualified). If the lookup for a name fails, computers will try again, appending the local domain suffix to the end. In other words, when Twitter's DNS was taken offline by the DDoS attack a couple of weeks ago, those monitoring DNS saw a zillion lookups for names like "www.twitter.com.example.com".

I've reproduced this on my desktop by configuring the suffix moscow.alfaintra.net.



I then pinged "mail1.trump-email.com" and captured the packets. As you can see, after the initial lookups failed, Windows tried appending the suffix.



I don't know what Jean Camp is an expert of, but this is sorta a basic DNS concept. It's surprising she'd get it wrong. Of course, she may be an expert in DNS who simply had a brain fart (this happens to all of us), but looking across her posts and tweets, she doesn't seem to be somebody who has a lot of experience with DNS. Sorry for impugning her credibility, but that's the way the story is written. It demands that we trust the quoted "experts". 

Call up your own IT department at Slate. Ask your IT nerds if this is how DNS operates. Note: I'm saying your average, unremarkable IT nerds can debunk an "expert" you quote in your story.

Understanding "spam" and "blacklists"

The new article has a paragraph noting that the IP address doesn't appear on spam blocklists:
Was the server sending spam—unsolicited mail—as opposed to legitimate commercial marketing? There are databases that assiduously and comprehensively catalog spam. I entered the internet protocal address for mail1.trump-email.com to check if it ever showed up in Spamhaus and DNSBL.info. There were no traces of the IP address ever delivering spam.
This is a profound misunderstanding of how these things work.

Colloquially, we call those sending mass marketing emails, like Cendyn, "spammers". But those running blocklists have a narrower definition: if emails contain an option to "opt out" of future emails, then technically they're not "spam".

Cendyn is constantly getting added to blocklists when people complain. They spend considerable effort contacting the many organizations maintaining blocklists, proving they honor "opt-outs", and getting "white-listed" instead of "black-listed". Indeed, the entire spam-blacklisting industry is a bit of a scam -- getting white-listed often involves a bit of cash.

Those maintaining blacklists keep records going back only a few months. The article is in error saying there's no record of Cendyn ever sending spam. Instead, if an address comes up clean, it means there's no record for the past few months. And if Cendyn is on the white-lists, there would be no record of "spam" at all, anyway.

As somebody who frequently scans the entire Internet, I'm constantly getting on/off blacklists. It's a real pain. At the moment, my scanner address "209.126.230.71" doesn't appear to be on any blacklists. Next time a scan kicks off, it'll probably get added -- but only by a few, because most have white-listed it.


There is no IP address limitation

The story repeats the theory, which I already debunked, that the server has a weird configuration that limits who can talk to it:
The scientists theorized that the Trump and Alfa Bank servers had a secretive relationship after testing the behavior of mail1.trump-email.com using sites like Pingability. When they attempted to ping the site, they received the message “521 lvpmta14.lstrk.net does not accept mail from you.”
No, that's how Listrak (who actually controls the server) configures all their marketing servers. Anybody can confirm this themselves by pinging all the servers in this range:


In case you don't want to run the scans yourself, you can look it up on Shodan and see that there are at least 4,000 servers around the Internet that give the same error message.


Again, go back to Chris Davis from your original story and ask him about this. He'll confirm that there's nothing nefarious or weird going on here, that it's just how Listrak has decided to configure all its spam-sending engines.

Either this conspiracy goes much deeper, with hundreds of servers involved, or this is a meaningless datapoint.


Where did the DNS logs come from?

Tea Leaves and Jean Camp are showing logs of private communications. Where did these logs come from? This information isn't public. It means somebody has done something like hack into Alfa Bank, or it means researchers who monitor DNS (for maintaining DNS, and for doing malware research) have broken their NDAs and possibly the law.

The data is incomplete and inconsistent. Those who work for other companies, like Dyn, claim it doesn't match their own data. We have good reason to doubt these logs. There's a good chance that the source doesn't have as comprehensive a view as "Tea Leaves" claim. There's also a good chance the data has been manipulated.

Specifically, I have a source who claims records for trump-email.com were changed in June, meaning either my source or Tea Leaves is lying.

Until we know more about the source of the data, it's impossible to believe the conclusions that only Alfa Bank was doing DNS lookups.

By the way, if you are a company like Alfa Bank and you don't want the "research" community seeing leaked intranet DNS requests, then you should probably reconfigure your DNS resolvers. You'll want to look into RFC 7816 "query minimization", supported by the Unbound and Knot resolvers.


Do the graphs show interesting things?

The original "Tea Leaves" researchers are clearly acting in bad faith. They are trying to twist the data to match their conclusions. For example, in the original article, they claim that peaks in the DNS activity match campaign events. But looking at the graph, it's clear these are unrelated. It display the common cognitive bias of seeing patterns that aren't there.

Likewise, they claim that the timing throughout the day matches what you'd expect from humans interacting back and forth between Moscow and New York. No. This is what the activity looks like, graphing the number of queries by hour:

As you can see, there's no pattern. When workers go home at 5pm in New York City, it's midnight in Moscow. If humans were involved, you'd expect an eight hour lull during that time. Likewise, when workers arrive at 9am in New York City, you expect a spike in traffic for about an hour until workers in Moscow go home. You see none of that here. What you instead see is a random distribution throughout the day -- the sort of distribution you'd expect if this were DNS lookups from incoming spam.

The point is that we know the original "Tea Leaves" researchers aren't trustworthy, that they've convinced themselves of things that just aren't there.


Does Trump control the server in question?

OMG, this post asks the question, after I've debunked the original story, and still gets the answer wrong.

The answer is that Listrak controls the server. Not even Cendyn controls it, really, they just contract services from Listrak. In other words, not only does Trump not control it, the next level company (Cendyn) also doesn't control it.


Does Trump control the domain in question?

OMG, this new story continues to make the claim the Trump Organization controls the domain trump-email.com, despite my debunking that Cendyn controls the domain.

Look at the WHOIS info yourself. All the contact info goes to Cendyn. It fits the pattern Cendyn chooses for their campaigns.
  • trump-email.com
  • mjh-email.com
  • denihan-email.com
  • hyatt-email.com

Cendyn even spells "Trump Orgainzation" wrong.


There's a difference between a "server" and a "name"

The article continues to make trivial technical errors, like confusing what a server is with what a domain name is. For example:
One of the intriguing facts in my original piece was that the Trump server was shut down on Sept. 23, two days after the New York Times made inquiries to Alfa Bank
The server has never been shut down. Instead, the name "mail1.trump-email.com" was removed from Cendyn's DNS servers.

It's impossible to debunk everything in these stories because they garble the technical details so much that it's impossible to know what the heck they are claiming.


Why did Cendyn change things after Alfa Bank was notified?

It's a curious coincidence that Cendyn changed their DNS records a couple days after the NYTimes contacted Alfa Bank.

But "coincidence" is all it is. I have years of experience with investigating data breaches. I know that such coincidences abound. There's always weird coincidence that you are certain are meaningful, but which by the end of the investigation just aren't.

The biggest source of coincidences is that IT is always changing things and always messing things up. It's the nature of IT. Thus, you'll always see a change in IT that matches some other event. Those looking for conspiracies ignore the changes that don't match, and focus on the one that does, so it looms suspiciously.

As I've mentioned before, I have a source who says Cendyn changed things around in June. This makes me believe that "Tea Leaves" is selectively presenting the changes to highlight the one in September.

In any event, many people have noticed that the registrar email "Emily McMullin" has the same last name as Evan McMullin running against Trump in Utah. This supports my point: when you do hacking investigations, you find irrelevant connections all over the freakin' place.


"Experts stand by their analysis"

This new article states:
I’ve checked back with eight of the nine computer scientists and engineers I consulted for my original story, and they all stood by their fundamental analysis
Well, of course, they don't want to look like idiots. But notice the subtle rephrasing of the question: the experts stand by their analysis. That doesn't mean the same thing as standing behind the reporter's analysis. The experts made narrow judgments, which even I stand behind as mostly correct, given the data they were given at the time. None of them were asked whether the entire conspiracy theory holds up.

What you should ask is people like Chris Davis or Paul Vixie whether they stand behind my analysis in the past two posts. Or really, ask any expert. I've documented things in sufficient clarity. For example, go back to Chris Davis and ask him again about the "limited IP address" theory, and whether it holds up against my scan of that data center above.


Conclusion

Other major news outlets all passed on the story, because even non-experts could see it's flawed. The data means nothing. The Slate journalist nonetheless went forward with the story, tricking experts and finding some non-experts.

But as I've shown, given a complete technical analysis, the story falls apart. Most of what's strange is perfectly normal. The data itself (the DNS logs) is untrustworthy. The story treats unknowns (like how the mail server rejects certain IP addresses) as "unknowable" things that confirm the conspiracy, when they are in fact simply things unknown at the current time, which become knowable with a little research.

What I show in my first post, and in this post, is more data. This data shows context. This data explains the unknowns that Slate presents. Moreover, you don't have to trust me -- anybody can replicate my work and see for themselves.