Wednesday, December 07, 2016

Orin's flawed argument on IP address privacy

In the PlayPen cases, judges have ruled that if you use the Tor network, then you don't have a reasonable expectation of privacy. It's a silly demonstration of how the law is out of sync with reality, since the entire point of using Tor is privacy.

Law prof Orin Kerr has a post discussing it. His conclusion is correct, that when the FBI exploits 0day and runs malware on your computer, then it's a search under the Fourth Amendment, requiring a warrant upon probable cause.

However, his reasoning is partly flawed. The title of his piece, "Remotely accessing an IP address inside a target computer is a search", is factually wrong. The IP address in question is not inside the target computer. This distinction may be meaningful.

First, let's discuss how the judge reasons that there's no expectation of privacy with Tor. This is a straightforward application of the Third Party Doctrine: as soon as you give something to a third party, your privacy rights over it are lost. Since you give your IP address to Tor, you lose privacy rights over it. You don't have a reasonable expectation of privacy: yes, you have an expectation of privacy, but it's not a reasonable one, and thus it's not protected.

The same is true of all your other digital information. Your credit card receipts, phone metadata, email archive, and all the rest of things you want to keep private on the Internet are not (currently) covered by the Fourth Amendment.

If you are thinking this is bullcrap, then you'd be right. Everyone knows the Third Party Doctrine doesn't fit the Internet. We want these things to be private from the government, meaning, that they must get a warrant to access them. But it's going to take a clueful Supreme Court overturning past precedent or an armed revolution in order to change things.

But that doesn't necessarily fit this case.  As Orin Kerr's post points out:
Fourth Amendment law regulates how the government learns information, not what information it learns
In other words, it doesn't matter if the FBI is allowed to get your IP address, they still need a warrant to search your computer. If you've got public information in your house, the FBI still needs a warrant to enter your house in order to get it.

Where Orin's argument is flawed is that the IP address isn't on the computer being searched by the FBI's "NIT" malware. In other cases, the FBI will be able to discover a target's IP address without a search of their computer. His post would be better titled something like "Infecting with malware is always a search" instead.

The way the Internet works is that computers have a local IP address that's meaningful only on the local network (like the one inside your home). For example, my laptop currently has a 192.168.x.x address, which may, in fact, be the same address as your laptop's. That's because addresses starting with 192.168.x.x are extremely popular for home networks (along with 10.x.x.x). It's like how we can both have the address 1079 Elm Street, just in different cities, since every city has an "Elm Street" somewhere.

As data leaves your computer, the local address is translated (network address translation) into a public IP address. Google "what's my ip address", and it will tell you your public IP address. Google knows it, but your computer doesn't.

Instead, it's your home router that knows your public IP address, translating between your public IP on the Internet and the local IPs on your home network.
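This split between reserved private ranges and public addresses can be checked with Python's standard ipaddress module. A minimal sketch (the specific addresses are just examples I've picked):

```python
import ipaddress

# RFC 1918 reserves blocks like 192.168.0.0/16 and 10.0.0.0/8 for
# private (home/office) networks; they are never routed on the
# public Internet, which is why millions of homes can reuse them.
for addr in ["192.168.1.10", "10.0.0.5", "8.8.8.8"]:
    ip = ipaddress.ip_address(addr)
    print(addr, "private" if ip.is_private else "public")
```

The first two print as "private", the last as "public".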

This Cisco router knows my public IP address
It can get even more complicated. When I travel, I use my iPhone as a wifi hotspot. But my iPhone is given a local IP address within the cellphone company's network, an address shared with hundreds of other cellphone customers. Thus, it's AT&T's routers that know my public IP address; neither my phone nor my laptop knows it.

Phone doesn't know its public IP, only a local 10.x.x.x IP

In the PlayPen case, the FBI discovers the target's public IP address by causing it to transmit information to the FBI. This information goes through the network address translator, and when it arrives on the FBI server, has the public IP address associated with it. In other words, the point where it's discovered is on the FBI's server located in Quantico, not within the "NIT" malware running on the person's computer. The malware on the computer does not "access" the IP address in any fashion --- but by generating traffic from inside the home, it causes the IP address to be revealed outside the home.
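A minimal sketch of the mechanism, using Python sockets on loopback (so no NAT is actually in the path here; the setup is hypothetical): the server learns the client's address from the arriving connection itself, not from anything read off the client's machine. Behind a NAT, the address seen at this point would be the translated public one:

```python
import socket
import threading

# Server side; in the scenario above, this is the FBI's server.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # port 0 = let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def client():
    # The client never "tells" the server its address; the transport does.
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", port))
    c.close()

t = threading.Thread(target=client)
t.start()
conn, peer = srv.accept()           # peer = address as seen on arrival
t.join()
conn.close()
srv.close()
print(peer[0])                      # loopback here; the public IP behind a NAT
```

The point of the sketch: `peer` comes from the connection's packet headers as they arrive, after any translation along the way.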

Rather than using malware to infect a computer, the FBI might try other ways to discover a suspect's IP address. They might host a PDF or Word document on the server with a simple image tag pointing to the FBI's server. When the user opens the document, their Acrobat/Word program isn't protected by Tor. Their computer will then contact the FBI's server looking for the image, revealing their public IP address. In this example, no exploit or malware is being used. In fact, Tor warns users about this problem. The target reveals their public IP address purely because they are unaware of the consequences of their actions.

If this were how the FBI were discovering the IP address, rather than using malware, then the judge's reasoning would (probably) be correct. Since the FBI relied upon user stupidity rather than malware, no search was done.

I'd like to see Orin update his post. Either to clarify, contrary to what his title says, that what he really means is "Running malware on a target is always a search". Or conversely, describe how this "image tag" example is, despite my feelings, a search.

As a wholly separate note, I'd like to point out a different flaw in the judge's reasoning. Yes, the Tor entry node knows your IP address, but it doesn't know who you are or what traffic is associated with it. Yes, the Tor exit node knows your traffic, but it doesn't know your IP address.

Technically, both your traffic and IP address are public (according to the Third Party Doctrine), but the private bit is the fact that the two are related. The "Tor network" isn't a single entity, but a protocol for how various different entities work together. No single entity in the Tor network sees your IP address combined with your activity or identity. Even when the FBI and NSA themselves run Tor nodes, they still can't piece it together. It is a private piece of information.
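A toy model (my own illustration, not real Tor internals) makes the point concrete: write down what each hop in a circuit can observe, and check that no single hop can link the IP address to the traffic:

```python
# Hypothetical summary of what each hop in a Tor circuit observes.
circuit = {
    "entry_node":  {"client_ip": True,  "destination_traffic": False},
    "middle_node": {"client_ip": False, "destination_traffic": False},
    "exit_node":   {"client_ip": False, "destination_traffic": True},
}

# The private bit: no single hop sees both pieces together.
linkers = [hop for hop, sees in circuit.items()
           if sees["client_ip"] and sees["destination_traffic"]]
print(linkers)   # []
```

Each column of the table is individually "public" under the Third Party Doctrine, but the row-by-row correlation is what stays private.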

In other words, the 4-digit PIN for your ATM card is located in this document, so it's a public number. But which PIN belongs to you is still a secret.

Thus, the judge is wrong. The private information is not the public IP address. The private information is the public IP address combined with the traffic. The person isn't trying to keep their public IP address private; what they are trying to keep private is the fact that this IP address accessed the PlayPen servers.


This is a stupid post, because it doesn't disagree with Orin's conclusion: FBI running malware always needs a warrant, even if the information they are after is public. However, the technical details are wrong -- the IP address the FBI is after is located nowhere inside the computer they are searching.

Monday, December 05, 2016

That "Commission on Enhancing Cybersecurity" is absurd

An Obama commission has published a report on how to "Enhance Cybersecurity". It's promoted as having been written by neutral, bipartisan, technical experts. Instead, it's almost entirely dominated by special interests and the Democrat politics of the outgoing administration.

In this post, I'm going through a random list of some of the 53 "action items" proposed by the document. I show how they are policy issues, not technical issues. Indeed, much of the time the technical details are warped to conform to special interests.

IoT passwords

The recommendations include such things as Action Item 2.1.4:
Initial best practices should include requirements to mandate that IoT devices be rendered unusable until users first change default usernames and passwords. 
This recommendation for changing default passwords is repeated many times. It comes from the way the Mirai worm exploits devices by using hardcoded/default passwords.

But this is a misunderstanding of how these devices work. Take, for example, the infamous Xiongmai camera. It has user accounts on the web server to control the camera. If the user forgets the password, the camera can be reset to factory defaults by pressing a button on the outside of the camera.

But here's the deal with security cameras. They are placed at remote sites miles away, up on the second story where people can't mess with them. In order to reset them, you need to put a ladder in your truck and drive 30 minutes out to the site, then climb the ladder (an inherently dangerous activity). Therefore, Xiongmai provides a RESET.EXE utility for remotely resetting them. That utility happens to connect via Telnet using a hardcoded password.

The above report misunderstands what's going on here. It sees Telnet and a hardcoded password, and makes assumptions. Some people assume that this is the normal user account -- it's not; it's unrelated to the user accounts on the web server portion of the device. Requiring the user to change the password on the web service would have no effect on the Telnet service. Other people assume the Telnet service is accidental, that good security hygiene would remove it. Instead, it's an intended feature of the product, there to remotely reset the device. Fixing the "password" issue as described in the above recommendations would simply mean the manufacturer would create a different, custom backdoor that hackers would eventually reverse engineer, creating a MiraiV2 botnet. Instead of security guides banning backdoors, they need to come up with a standard for remote reset.

That characterization of Mirai as an IoT botnet is wrong. Mirai is a botnet of security cameras. Security cameras are fundamentally different from IoT devices like toasters and fridges because they are often exposed to the public Internet. To stream video on your phone from your security camera, you need a port open on the Internet. Non-camera IoT devices, however, are overwhelmingly protected by a firewall, with no exposure to the public Internet. While you can create a botnet of Internet cameras, you cannot create a botnet of Internet toasters.

The point I'm trying to demonstrate here is that the above report was written by policy folks with little grasp of the technical details of what's going on. They use Mirai to justify several of their "Action Items", none of which actually apply to the technical details of Mirai. It has little to do with IoT, passwords, or hygiene.

Public-private partnerships
Action Item 1.2.1: The President should create, through executive order, the National Cybersecurity Private–Public Program (NCP 3 ) as a forum for addressing cybersecurity issues through a high-level, joint public–private collaboration.
We've had public-private partnerships to secure cyberspace for over 20 years, such as the FBI's InfraGard partnership. President Clinton had a plan in 1998 to create a public-private partnership to address cyber vulnerabilities. President Bush declared public-private partnerships the "cornerstone" of his 2003 plan to secure cyberspace.

Here we are 20 years later, and this document is full of new, naive proposals for public-private partnerships. There's no analysis of why they have failed in the past, or a discussion of which ones have succeeded.

The many calls for public-private programs reflect the left-wing nature of this supposed "bipartisan" document, which sees government as a paternalistic entity that can help. The right-wing doesn't believe the government provides any value in these partnerships. In my 20 years of experience with public-private partnerships in cybersecurity, I've found them to be a time waster at best and, at worst, a way to coerce "voluntary measures" out of companies that hurt the public's interest.

Build a wall and make China pay for it
Action Item 1.3.1: The next Administration should require that all Internet-based federal government services provided directly to citizens require the use of appropriately strong authentication.
This would cost at least $100 per person, for 300 million people, or $30 billion. In other words, it'll cost more than Trump's wall with Mexico.

Hardware tokens are cheap. Blizzard (a popular gaming company) must deal with widespread account hacking from "gold sellers", and provides second factor authentication to its gamers for $6 each. But that ignores the enormous support costs involved. How does a person prove their identity to the government in order to get such a token? To replace a lost token? When old tokens break? What happens if somebody's token is stolen?

And that's the best case scenario. Other options, like using cellphones as a second factor, are non-starters.

This is actually not a bad recommendation, as far as government services are involved, but it ignores the costs and difficulties involved.

But then the recommendations go on to suggest this for private sector as well:
Specifically, private-sector organizations, including top online retailers, large health insurers, social media companies, and major financial institutions, should use strong authentication solutions as the default for major online applications.
No, no, no. There is no reason for a "top online retailer" to know your identity. I lie about my identity. One top online retailer thinks my name is "Edward Williams", for example.

They get worse with:
Action Item 1.3.3: The government should serve as a source to validate identity attributes to address online identity challenges.
In other words, they are advocating a cyber-dystopic police-state wet-dream where the government controls everyone's identity. We already see how this fails with Facebook's "real name" policy, where everyone from political activists in other countries to LGBTQ in this country get harassed for revealing their real names.

Anonymity and pseudonymity are precious rights on the Internet that we now enjoy -- rights endangered by the radical policies in this document. This document frequently claims to promote security "while protecting privacy". But the government doesn't protect privacy -- much of what we want from cybersecurity is to protect our privacy from government intrusion. This is nothing new, you've heard this privacy debate before. What I'm trying to show here is that the one-sided view of privacy in this document demonstrates how it's dominated by special interests.

Cybersecurity Framework
Action Item 1.4.2: All federal agencies should be required to use the Cybersecurity Framework. 
The "Cybersecurity Framework" is a bunch of nonsense that would require another long blogpost to debunk. It requires months of training and years of experience to understand. It contains things like "DE.CM-4: Malicious code is detected", as if that's a thing organizations are able to do.

All the while it ignores the most common cyber attacks (SQL/web injections, phishing, password reuse, DDoS). It's a typical example where organizations spend enormous amounts of money following process while getting no closer to solving what the processes are attempting to solve. Federal agencies using the Cybersecurity Framework are no safer from my pentests than those who don't use it.

It gets even crazier:
Action Item 1.5.1: The National Institute of Standards and Technology (NIST) should expand its support of SMBs in using the Cybersecurity Framework and should assess its cost-effectiveness specifically for SMBs.
Small businesses can't even afford to read the "Cybersecurity Framework". Simply reading the doc and trying to understand it would exceed their entire IT/computer budget for the year. It would take a high-priced consultant earning $500/hour to tell them that "DE.CM-4: Malicious code is detected" means "buy antivirus and keep it up to date".

Software liability is a hoax invented by the Chinese to make our IoT less competitive
Action Item 2.1.3: The Department of Justice should lead an interagency study with the Departments of Commerce and Homeland Security and work with the Federal Trade Commission, the Consumer Product Safety Commission, and interested private sector parties to assess the current state of the law with regard to liability for harm caused by faulty IoT devices and provide recommendations within 180 days. 
For over a decade, leftists in the cybersecurity industry have been pushing the concept of "software liability". Every time there is a major new development in hacking, such as the worms around 2003, they come out with documents explaining why there's a "market failure" and that we need liability to punish companies to fix the problem. Then the problem is fixed, without software liability, and the leftists wait for some new development to push the theory yet again.

It's especially absurd for the IoT marketspace. The harm, as they imagine, is DDoS. But the majority of devices in Mirai were sold by non-US companies to non-US customers. There's no way US regulations can stop that.

What US regulations will stop is IoT innovation in the United States. Regulations are so burdensome, and liability lawsuits so punishing, that they will kill all innovation within the United States. If you want to get rich with a clever IoT Kickstarter project, forget about it: your entire development budget will go to cybersecurity. The only companies that will be able to afford to ship IoT products in the United States will be large industrial concerns like GE that can afford the overhead of regulation/liability.

Liability is a left-wing policy issue, not one supported by technical analysis. Software liability has proven to be immaterial in any past problem and current proponents are distorting the IoT market to promote it now.

Cybersecurity workforce
Action Item 4.1.1: The next President should initiate a national cybersecurity workforce program to train 100,000 new cybersecurity practitioners by 2020. 
The problem in our industry isn't the lack of "cybersecurity practitioners", but the overabundance of "insecurity practitioners".

Take "SQL injection" as an example. It's been the most common way hackers break into websites for 15 years. It happens because programmers, those building web-apps, blindly paste input into SQL queries. They do that because they've been trained to do it that way. All the textbooks on how to build webapps teach them this. All the examples show them this.
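A minimal sketch of the bug, using Python's built-in sqlite3 (the table and data are invented for illustration): pasting input into the query string lets an attacker rewrite the query, while a parameterized query keeps the same input as inert data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

name = "x' OR '1'='1"   # attacker-controlled input

# Vulnerable: input pasted straight into the SQL string. The query
# becomes ... WHERE name = 'x' OR '1'='1', matching every row.
rows_bad = db.execute(
    "SELECT secret FROM users WHERE name = '" + name + "'").fetchall()

# Correct: parameterized query; the driver treats the input as data,
# so the odd-looking name simply matches nothing.
rows_good = db.execute(
    "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

print(len(rows_bad), len(rows_good))   # 1 0
```

The vulnerable version leaks the secret; the parameterized version returns nothing, which is exactly the one-line habit the textbooks should be teaching.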

So you have government programs on one hand pushing tech education, teaching kids to build web-apps with SQL injection. Then you propose to train a second group of people to fix the broken stuff the first group produced.

The solution to SQL/website injections is not more practitioners, but stopping programmers from creating the problems in the first place. The solution to phishing is to use the tools already built into Windows and networks that sysadmins use, not adding new products/practitioners. These are the two most common problems, and they happen not because of a lack of cybersecurity practitioners, but because the lack of cybersecurity as part of normal IT/computers.

I point this out to demonstrate yet again that the document was written by policy people with little or no technical understanding of the problem.

Nutritional label
Action Item 3.1.1: To improve consumers’ purchasing decisions, an independent organization should develop the equivalent of a cybersecurity “nutritional label” for technology products and services—ideally linked to a rating system of understandable, impartial, third-party assessment that consumers will intuitively trust and understand. 
This can't be done. Grab some IoT devices, like my thermostat, my car, or a Xiongmai security camera used in the Mirai botnet. These devices are so complex that no "nutritional label" can be made for them.

One of the things you'd like to know is all the software dependencies, so that if there's a bug in OpenSSL, for example, then you know your device is vulnerable. Unfortunately, that requires a nutritional label with 10,000 items on it.

Or, one thing you'd want to know is that the device has no backdoor passwords. But that would miss the Xiongmai devices. The web service has no backdoor passwords. If you caught the Telnet backdoor password and removed it, then you'd miss the special secret backdoor that hackers would later reverse engineer.

This is a policy position chasing a non-existent technical issue, pushed by Peiter Zatko, who has gotten hundreds of thousands of dollars in government grants to push the issue. It's his way of getting rich and has nothing to do with sound policy.

Cyberczars and ambassadors

Various recommendations call for the appointment of various CISOs, Assistant to the President for Cybersecurity, and an Ambassador for Cybersecurity. But nowhere does it mention these should be technical posts. This is like appointing a Surgeon General who is not a doctor.

Government's problems with cybersecurity stem from the way technical knowledge is so disrespected. The current cyberczar prides himself on his lack of technical knowledge, because that helps him see the bigger picture.

Ironically, many of the other Action Items are about training cybersecurity practitioners, employees, and managers. None of this can happen as long as leadership is clueless. Technical details matter, as I show above with the Mirai botnet. Subtlety and nuance in technical details can call for opposite policy responses.


This document is promoted as being written by technical experts. However, nothing in the document is neutral technical expertise. Instead, it's almost entirely a policy document dominated by special interests and left-wing politics. In many places it makes recommendations to the incoming Republican president. His response should be to round-file it immediately.

I only chose a few items, as this blogpost is long enough as it is. I could pick almost any of the 53 Action Items to demonstrate how they are policy- and special-interest-driven rather than reflecting technical expertise.

Thursday, December 01, 2016

Electoral college should ignore Lessig

Reading this exchange between law profs disappoints me. [1] [2] [3] [4] [5]

The decision Bush v Gore cites the same principle as Lessig, that our system is based on "one person one vote". But it uses that argument to explain why votes should not be changed once they are cast:
Having once granted the right to vote on equal terms, the State may not, by later arbitrary and disparate treatment, value one person's vote over that of another.
Lessig cites the principle of "one person one vote", but in a new and novel way. He applies it in an arbitrary way that devalues some of the votes that have already been cast. Specifically, he claims that votes cast for state electors should now be re-valued as direct votes for a candidate.

The United States isn't a union of people. It's a union of states. It says so right in the name. Compromises between the power of the states and the power of the people have been with us forever. That's why states get two Senators regardless of size, but Representatives to the House are assigned proportional to population. The Presidential election is expressly a related compromise, assigning each state a number of electors equal to its number of Senators plus Representatives.

The Constitution doesn't even say electors should be chosen using a vote. It's up to the states to decide. All states have chosen election, but they could've demanded a wrestling match or juggling contest instead. The point is that the Constitution, historical papers, and 200 years of history rejects Lessig's idea that the President should be elected with a popular vote.

Moreover, this election shows the value of election by states. The tension nowadays is between big urban areas and rural areas. In the city, when workers lose their jobs due to immigration or trade, they can go down the street and get another job. In a rural area, when the factory shuts down, the town is devastated, and there are no other jobs to be had. The benefits of free trade are such that even Trump can't roll them back -- but as a nation we need to address the disproportionate impact changes have on rural communities. That rural communities can defend their interests is exactly why our Constitution is the way it is -- and why the President isn't chosen with a popular vote.

Hillary did not win the popular vote. No popular vote was held. Instead, we had state-by-state votes for electors. It's implausible that the per-candidate votes would have been the same had this been a popular vote. Candidates would have spent their time and money campaigning across the entire country instead of just battleground states. Voters would have had different motivations on which candidates to choose and on whether they should abstain. There is nothing more clearly "disparate and arbitrary" treatment of votes than claiming that your vote for an elector (or abstention) will now instead be treated as a national vote for the candidate.

Hillary got only 48% of the vote, what we call a plurality. Counting abstentions, that's only 26% of the vote. The rules of the Electoral College demand the winner get an absolute majority, meaning 50% even with abstentions, or almost double what Hillary got in votes. So among the arbitrary rules that Lessig has pulled out of his hat is that a plurality is now sufficient. Even though 74% of voters did not vote for her, Lessig claims the principle of "one person one vote" means she is the unambiguous choice of the people.

Even if you accept all this, there is still the problem that our election system isn't accurate. As Bush v Gore noted, around 2% of ballots nationwide didn't clearly show a choice of presidential candidate. Others have pointed to weather in different parts of the country as having a significant impact on voter turnout. In science, we call this a measurement error. It means that any vote within 2% is scientifically a tie. That's more than the difference between Hillary and Trump. Yes, elections must still choose a winner despite a tie. However, an Electoral College evaluating the "sense of the people" (as Lessig cites Federalist #68) is bound by no such limitation. That they see no clear winner among the popular vote is the best view to take -- not that Hillary won some sort of mandate.

My point isn't to show that Lessig is wrong so much as to show that his argument is arbitrary. Had the positions been reversed, with Hillary getting the electoral vote and Trump the popular vote, Lessig could cite the same principle of "one person one vote" and the same Federalist #68 in order to demonstrate why the Electoral College should still choose Hillary. In other words, Lessig would argue that the principle means (as in Bush v Gore) that Hillary's electors should not devalue the votes cast for them by treating them as a popular vote. Lessig would argue that since Trump didn't get a statistically significant absolute majority, there was no clear "sense of the people".

America is in danger of populism, which ravages our institutions that make our country prosperous, stable, and "great". Trump is populist on the right, but Lessig is a populist on the left. Lessig ran for the presidency on the left on a platform no less populist than Trump's. This current piece, demanding we follow arbitrary rules to get desired results, is no less an attack on the institution of the "Rule of Law" and "Equal Protection" than Trump's attacks.

What "should" the Electoral College do? Whatever the heck they want. I would point out that Federalist #68 does warn about the influence of "foreign powers" and of men using the "little arts of popularity" to gain the Presidency. This matches Trump accurately. I would hope that at least some Trump-electors consider this and change their votes. Historically, that we haven't seen more electors change their votes seems to be a bit of a problem.

Sunday, November 27, 2016

No, it’s Matt Novak who is a fucking idiot

I keep seeing this Gizmodo piece entitled “Snowden is a fucking idiot”. I understand the appeal of the piece. The hero worship of Edward Snowden is getting old. But the piece itself is garbage.

The author, Matt Novak, is of the new wave of hard-core leftists intolerant of those who disagree with them. His position is that everyone is an idiot who doesn’t agree with his views: Libertarians, Republicans, moderate voters who chose Trump, and even fellow left-wingers that aren’t as hard-core.

If you carefully read his piece, you’ll see that Novak doesn’t actually prove Snowden is wrong. Novak doesn’t show how Snowden disagrees with facts, but only how Snowden disagrees with the left-wing view of the world, "libertarian garbage" as Novak puts it. It’s only through deduction that we come to the conclusion: those who aren’t left-wing are idiots, Snowden is not left-wing, therefore Snowden is an idiot.

The question under debate in the piece is:
technology is more important than policy as a way to protect our liberties
In other words, if you don’t want the government spying on you, then focus on using encryption (use Signal) rather than trying to change the laws so they can’t spy on you.

On a factual basis (rather than political), Snowden is right. If you live in Germany and don’t want the NSA spying on you there is little policy-wise that you can do about it, short of convincing Germany to go to war against the United States to get the US to stop spying.

Likewise, for all those dissenters in countries with repressive regimes, technology precedes policy. You can't effect change until you first can protect yourselves from the state police who throw you in jail for dissenting. Use Signal.

In our own country, Snowden is right about “politics”. Snowden’s leak showed how the NSA was collecting everyone’s phone records to stop terrorism. Privacy organizations like the EFF supported the reform bill, the USA FREEDOM ACT. But rather than stopping the practice, the “reform” opened up the phone records to all law enforcement (FBI, DEA, ATF, IRS, etc.) for normal law enforcement purposes.

Imagine the protestors out there opposing the Dakota Access Pipeline. The FBI is shooting down their drones and blasting them with water cannons. Now, because of the efforts of the EFF and other privacy activists, using the USA FREEDOM ACT, the FBI is also grabbing everyone’s phone records in the area. Ask yourself who is the fucking idiot here: the guy telling you to use Signal, or the guy telling you to focus on “politics” to stop this surveillance.

Novak repeats the hard-left version of the creation of the Internet:
The internet has always been monitored by the state. It was created by the fucking US military and has been monitored from day one. Surveillance of the internet wasn’t invented after September 11, 2001, no matter how many people would like to believe that to be the case.
No, the Internet was not created by the US military. Sure, the military contributed to the Internet, but the majority of contributions came from corporations, universities, and researchers. The left-wing claim that the government/military created the Internet involves highlighting their contributions while ignoring everyone else’s.

The Internet was not "monitored from day one", because until the 1990s, it wasn't even an important enough network to monitor. As late as 1993, the Internet was dwarfed in size and importance by numerous other computer networks -- until the web took off that year, the Internet was considered a temporary research project. Those like Novak writing the history of the Internet are astonishingly ignorant of the competing networks of those years. They miss XNS, AppleTalk, GOSIP, SNA, Novell, DECnet, Bitnet, UUNET, Fidonet, X.25, Telenet, and all the other things that were really important during those years.

And, mass Internet surveillance did indeed come only after 9/11. The NSA’s focus before that was on signals and telephone lines, because that’s where all the information was.  When 9/11 happened, they were still trying to catch up to the recent growth of the Internet. Virtually everything Snowden documents came after 9/11. Sure, they had programs like FAIRVIEW that were originally created to get telephone information in the 1970s, but these programs only started delivering mass Internet information after 9/11. Sure, the NSA occasionally got emails before 9/11, but nothing like the enormous increase in collection afterwards.

What I’ve shown here is that Matt Novak is a fucking idiot. He gets basic facts wrong about how the Internet works. He doesn’t prove Snowden’s actually wrong by citing evidence, only that Snowden is wrong because he disagrees with what leftists like Novak believe to be right. All the actual evidence supports Snowden in this case.

This doesn't mean we should avoid politics. Technology and politics are different things; it's not either-or. Whether we do one has no impact on deciding to do the other. But if you are a DAPL protester, use Signal rather than unencrypted messaging or phone calls, instead of waiting for activists to pass legislation.

Monday, November 21, 2016

The false-false-balance problem

Until recently, journalism in America prided itself on objectivity -- reporting the truth without taking sides. That's because big debates are always complex and nuanced, and both sides are equally reasonable. Therefore, when writing an article, reporters attempt to achieve balance by quoting people/experts/proponents on both sides of an issue.

But what about those times when one side is clearly unreasonable? You'd never try to achieve balance by citing those who believe in aliens and big-foot, for example. Thus, journalists have come up with the theory of false-balance to justify being partisan and one-sided on certain issues.

Typical examples where journalists cite false-balance are reporting on anti-vaxxers, climate-change denialists, and Creationists. More recently, false-balance has become an issue in the 2016 Trump election.

But this concept of false-balance is wrong. It's not that anti-vaxxers, denialists, Creationists, and white supremacists are reasonable. Instead, the issue is that the left-wing has reframed the debate. They've simplified it into something black-and-white, removing nuance, in a way that shows their opponents as being unreasonable. The media then adopts the reframed debate.

Let's talk anti-vaxxers. One of the policy debates is whether the government has the power to force vaccinations on people (or on people's children). Reasonable people say the government doesn't have this power. Many (if not most) people hold this opinion while agreeing that vaccines are both safe and effective (that they don't cause autism).

Consider this February 2015 interview with Chris Christie. He's one of the few politicians who have taken the position that government can override personal choice, such as in the case of an outbreak. Yet, when he said "parents need to have some measure of choice in things as well, so that's the balance that the government has to decide", he was broadly reviled as an anti-vaxxer throughout the media. The press reviled other Republican candidates the same way, even while ignoring almost identical statements made at the same time by the Obama administration. They also ignored clearly anti-vax comments from both Hillary and Obama during the 2008 election.

Yes, we can all agree that anti-vaxxers are a bunch of crazy nutjobs. In calling for objectivity, we aren't saying that you should take them seriously. Instead, we are pointing out the obvious bias in the way the media attacked Republican candidates as being anti-vaxxers, and then hiding behind "false-balance".

Now let's talk evolution. The issue is this: Darwinism has been set up as some sort of competing religion against belief in God(s). High schools teach children to believe in Darwinism, but not to understand it. Few kids graduate understanding Darwinism, which is why it's invariably misrepresented in mass media (X-Men, Planet of the Apes, Waterworld, Godzilla, Jurassic Park, etc.). The only movie I can recall getting evolution correct is Idiocracy.

Also, evolution has holes in it. This isn't a bad thing in science, every scientific theory has holes. Science isn't a religion. We don't care about the holes. That some things remain unexplained by a theory doesn't bother us. Science has no problem with gaps in knowledge, where we admit "I don't know". It's religion that has "God of the gaps", where ignorance isn't tolerated, and everything unexplained is explained by a deity.

The hole in evolution is how the cell evolved. The fossil record teaches us a lot about multi-cellular organisms over the last 400 million years, but not much about how the cell evolved in the 4 billion years on planet Earth before that. I can point to radioisotope dating and fossil finds to prove dinosaurs existed 250 million to 60 million years ago, thus disproving your crazy theory of a 10,000-year-old Earth. But I can't point to anything that disagrees with your view that a deity created the original cellular organisms. I don't agree with that theory, but I can't disprove it, either.

The point is that Christians have a good point that Darwinism is taught as a competing religion. You see this in the way books deny holes in knowledge, insisting that Darwinism explains even how cells evolved, and that doubting Darwin is blasphemy.

The Creationist solution is wrong, we can't teach religion in schools. But they have a reasonable concern about religious Darwinism. The solution there is to do a better job teaching it as a science. If kids want to believe that one of the deities created the first cells, then that's okay, as long as they understand the fossil record and radioisotope dating.

Now let's talk Climate Change. This is a tough one, because you people have lost your collective minds. The debate is over how much change, how much danger, and how much cost. The debate is not over whether it's true. We all agree it's true, even most Republicans. By keeping the debate on the black-and-white "Is global warming true?", the left-wing can avoid the debate over "How much warming?".

Consider this exchange from one of the primary debates:

Moderator: ...about climate change...
RUBIO: Because we’re not going to destroy our economy ...
Moderator: Governor Christie, ... what do you make of skeptics of climate change such as Senator Rubio?
CHRISTIE: I don’t think Senator Rubio is a skeptic of climate change.
RUBIO: I'm not a denier/skeptic of climate change.

The media (in this case CNN) is so convinced that Republicans deny climate change that they can't hear any other statement. Rubio clearly didn't deny Climate Change, but the moderator was convinced that he did. Every statement is seen as outright denial, or code words for denial. Thus, convinced of the falseness of false-balance, the media never sees the fact that most Republicans are reasonable.

Similar proof of Republican non-denial is this page full of denialism quotes. If you actually look at the quotes, you'll see that when taken in context, virtually none of the statements deny climate change. For example, when Senator Dan Sullivan says "no concrete scientific consensus on the extent to which humans contribute to climate change", he is absolutely right. There is 97% consensus that mankind contributes to climate change, but there is widespread disagreement on how much.

That "97% consensus" is incredibly misleading. Whenever it's quoted, the speaker immediately moves the bar, claiming that scientists also agree with whatever crazy thing the speaker wants, like hurricanes getting worse (they haven't -- at least, not yet).

There's no inherent reason why Republicans would disagree with addressing Climate Change. For example, Washington State recently voted on a bill to impose a revenue neutral carbon tax. The important part is "revenue neutral": Republicans hate expanding government, but they don't oppose policies that keep government the same size. Democrats opposed this bill, precisely because it didn't expand the size of government. That proves that Democrats are less concerned with a bipartisan approach to addressing climate change, but instead simply use it as a wedge issue to promote their agenda of increased regulation and increased spending.

If you are serious about addressing Climate Change, then agree that Republicans aren't deniers, and then look for bipartisan solutions.


The point here is not to try to convince you of any political opinion. The point here is to describe how the press has lost objectivity by adopting the left-wing's reframing of the debate. Instead of seeing balanced debate between two reasonable sides, they see a warped debate between a reasonable (left-wing) side and an unreasonable (right-wing) side. That the opposing side is unreasonable is so incredibly seductive that they can never give it up.

That Christie had to correct the moderator in the debate should teach you that something is rotten in journalism. Christie understood Rubio's remarks, but the debate moderator could not. Journalists cannot even see the climate debate because they are wedded to the left-wing's corrupt view of the debate.

The concept of false-balance is wrong. In debates that evenly divide the population, the issues are complex and nuanced, and both sides are reasonable. That's the law. It doesn't matter what the debate is. If you see the debate simplified to the point where one side is obviously unreasonable, then it's you who has the problem.

Dinner with Rajneeshees

One evening I answered the doorbell to find a burgundy-clad couple on the doorstep. They were followers of the Bhagwan Shree Rajneesh, whose cult had recently purchased a large ranch in the eastern part of the state. No, they weren't there to convert us. They had come for dinner. My father had invited them.

My father was a journalist who had been covering the controversies with the cult's neighbors. Yes, they were a crazy cult which would later break up after committing acts of domestic terrorism. But this couple was a pair of young professionals (lawyers) who, except for their clothing, looked and behaved like normal people. They would go on to live normal lives after the cult.

Growing up, I lived in two worlds. One was the normal world, which encourages you to demonize those who disagree with you. On the political issues that concern you most, you divide the world into the righteous and the villains. It's not enough to believe the other side wrong; you must also believe them to be evil.

The other world was that of my father, teaching me to see the other side of the argument. I guess I grew up with my own Atticus Finch (from To Kill a Mockingbird), who set an ideal. In much the same way that Atticus told his children that they couldn't hate even Hitler, I was told I couldn't hate even the crazy Rajneeshees.

Monday, November 14, 2016

Comments for my biracial niece

I spent the night after Trump’s victory consoling my biracial niece, worried about the election. Here are my comments. You won’t like them; given the title, you probably expected the opposite. But it’s what I said.

I preferred Hillary, but that doesn’t mean Trump is an evil choice.

Don’t give into the hate. You get most of your news via social media sites like Facebook and Twitter, which are at best one-sided and unfair. At worst, they are completely inaccurate. Social media posts are driven by emotion, not logic. Sometimes that emotion is love of cute puppies. Mostly it’s anger, fear, and hate. Instead of blindly accepting what you read, challenge it. Find the original source. Find a better explanation. Search for context.

Don’t give into the hate. The political issues that you are most concerned about are not simple and one-sided with obvious answers. They are complex and nuanced. Just because somebody disagrees with you doesn’t mean they are unreasonable or evil. In today’s politics, it has become the norm that we can’t simply disagree with somebody, but must also vilify and hate them. We’ve redefined politics to be the fight between the virtuous (whatever side we are on) and the villains (the other side). The reality is that both sides are equally reasonable, equally virtuous.

Don’t give into the hate. Learn “critical thinking”. Learn how “cherry picking” the fringe of the opposing side is used to tarnish the mainstream. Learn how “strawman arguments” make the other side sound dumb. Learn how “appeal to emotion” replaces logic. Learn how “ad hominem” statements attack the credibility of an opponent’s arguments. Learn how issues are simplified into “black vs. white” options rather than the nuance and complexity that actually exists.

Don’t give into the hate. The easy argument is that it’s okay to be hateful and bigoted toward Trump and his supporters because they are bigoted against you. No, it’s not okay to hate anybody, not even Hitler, as Atticus Finch explains in “To Kill A Mockingbird”. In that book, Atticus even tries to understand, and not hate, Robert Ewell, the racist antagonist in the book who eventually tries to stab Scout (Atticus’s daughter). Trump’s supporters may be wrong, but it’s a wrongness largely based on ignorance, not malice. Yes, they probably need to be kindly educated, but they don’t deserve punishment and hate.

America is the same country it was last week. Its citizens haven't changed; only one man in an office has changed. The President has little actual power, either to fix things (as his supporters want) or to break things (as his opponents fear). We have strong institutions, from Congress, to the Courts, to the military, that will hold him in check. The biggest worries are that he's the first President in history with no government experience, and that he's strongly "populist" (which historically has been damaging for countries). We should be watchful, and more willing to stand up and fight when Trump does something bad. However, we shouldn't give into hate.

How to teach endian

On /r/programming is this post about byte-order/endianness. It gives the same information as most documents on the topic. It is wrong. It's been wrong for over 30 years. Here's how it should be taught.

One of the major disciplines in computer science is parsing/formatting. This is the process of converting the external format of data (file formats, network protocols, hardware registers) into the internal format (the data structures that software operates on).

It should be a formal computer-science discipline, because it's actually a lot more difficult than you'd expect. That's because the majority of vulnerabilities in software that hackers exploit are due to parsing bugs. Since programmers don't learn about parsing formally, they figure it out for themselves, creating ad hoc solutions that are prone to bugs. For example, programmers assume external buffers cannot be larger than internal ones, leading to buffer overflows.

An external format must be well-defined. What the first byte means must be written down somewhere, then what the second byte means, and so on. For Internet protocols, these formats are written in RFCs, such as RFC 791 for the "Internet Protocol". For file formats, these are written in documents, such as those describing GIF files, JPEG files, MPEG files, and so forth.

Among the issues is how integers should be represented. The definition must include the size, whether signed/unsigned, what the bits mean (almost always two's-complement), and the byte-order. Integers that have values above 255 must be represented with more than one byte. Whether those bytes go left-to-right or right-to-left is known as byte-order.

We also call this endianness, where one form is big-endian and the other little-endian. This is a joke, referring back to Jonathan Swift's tale Gulliver's Travels, in which two nations went to war over whether an egg should be cracked on the big end or the little end. The joke refers to the holy wars in computing where two sides argued strongly for one byte-order or the other. The commentary embedded in the term "endianness" is that neither format matters.

However, big-endian is how humans naturally process numbers. If we have the hex value 0x2211, then we expect that representing this number in a file/protocol will consist of one byte with the value 0x22 followed by another byte with the value 0x11. In a little-endian format specification, however, the order of bytes will be reversed, with a value of 0x2211 represented with 0x11 followed by 0x22.

This is further confused by the fact that the nibbles in the byte will still be written in conventional, big-endian order. In other words, the big-endian format for the number 0x1234 is 0x12 0x34. However, the little-endian format is 0x34 0x12 -- not 0x43 0x21 as you might naively expect by trying to swap everything around in your mind.

If little-endian is so confusing to the human mind, why would anybody ever use it? The answer is that it can be more efficient for logic circuits. Or at least, back in the 1970s, when CPUs had only a few thousand logic gates, it could be more efficient. Therefore, a lot of internal processing was little-endian, and this bled over into external formats as well.

On the other hand, most network protocols and file formats remain big-endian. Format specifications are written for humans to understand, and big-endian is easier for us humans.

So once you understand the byte-order issue in external formats, the next problem is figuring out how to parse it, to convert it into an internal data structure. Well, we first have to understand how to parse things in general.

There are two ways of parsing things: buffered or streaming. In the buffered model, you read in the entire input first (like the entire file, or the entire network packet), then parse it. In the streaming model, you read a byte at a time, parse that byte, then read in the next byte. Streaming mode is best for very large files or for streaming data across TCP network connections.

However, buffered parsing is the general way most people do it, so I'll assume that in this guide.

Let's assume you've read the file (or network data) into a buffer we'll call buf. You parse that buffer at the current offset until you reach the end.

Given that, then the way you'd parse a big-endian integer x is the following line of code:

 x = buf[offset] * 256 + buf[offset+1];

Or, if you prefer logical operators, you might do it as:

 x = buf[offset]<<8 | buf[offset+1];

Compilers translate multiplication by powers-of-2 into shift instructions, so either statement will perform the same. Some compilers are smart enough to recognize this pattern as parsing an integer, and might replace it with loading two bytes from memory and byte-swapping instead.

For a little-endian integer in the external data, you'd reverse how you parse this, like one of the following two statements.

x = buf[offset+1] * 256 + buf[offset];
x = buf[offset] + buf[offset+1] * 256;

If we were talking about JavaScript, C#, or a bunch of other languages, at this point the conversation about endianness would end. But if we're talking about C/C++, we've got some additional wrinkles to deal with.

The problem with C is that it's a low-level language, meaning it exposes the internal format of integers to the programmer. In contrast, the code above deals only with the external representation of integers and doesn't care about the internal representation. It doesn't care whether you are running on an x86 little-endian CPU or some RISC big-endian CPU.

But in C, you can parse an integer by relying upon the internal CPU representation. It would look something like the following:

 x = *(short*)(buf + offset);

This code produces different results on a little-endian machine and a big-endian machine. If the two bytes are 0x22 and 0x11, then on a big-endian machine this produces a short integer with a value of 0x2211, but a little-endian machine produces the value of 0x1122.

If the external format is big-endian, then on a little-endian machine, you'll have to byte-swap the result. In other words, the code would look something like:

 x = *(short*)(buf + offset);
 x = ((x >> 8) & 0xFF) | ((x & 0xFF) << 8);

Of course, you'd never write code that looks like this. Instead, you'd use a macro, as follows:

 x = ntohs(*(short*)(buf + offset));

The macro means network-to-host-short, where network byte-order is big-endian and host byte-order is whatever the CPU uses. On a little-endian host CPU, the bytes are swapped as shown above. On a big-endian CPU, the macro is defined as nothing. This macro is defined in standard sockets libraries, like <arpa/inet.h>. There is a broad range of similar macros in other libraries for byte-swapping integers.

In truth, this is not how it's really done, parsing an individual integer at a time. Instead, what programmers do is define a packed C structure that corresponds to the external format they are trying to parse, then cast the buffer into that structure.

For example, on Linux there is the include file <netinet/ip.h>, which defines the Internet protocol header:

struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN
	u_char ip_hl:4, /* header length */
	       ip_v:4; /* version */
#else
	u_char ip_v:4,  /* version */
	       ip_hl:4; /* header length */
#endif
	u_char ip_tos;  /* type of service */
	short ip_len;   /* total length */
	u_short ip_id;  /* identification */
	short ip_off;   /* fragment offset field */
	u_char ip_ttl;  /* time to live */
	u_char ip_p;    /* protocol */
	u_short ip_sum; /* checksum */
	struct in_addr ip_src, ip_dst; /* source and dest address */
};

To "parse" the header, you'd do something like:

 struct ip *hdr = (struct ip *)buf;
 printf("checksum = 0x%04x\n", ntohs(hdr->ip_sum));

This is considered the "elegant" way of doing things, because there is no "parsing" at all. On big-endian CPUs, it's also a no-op -- it costs precisely zero instructions in order to "parse" the header, since both the internal and external structures map exactly.

In C, though, the exact layout of structures is undefined. There is often padding between structure members to keep integers aligned on natural boundaries. Therefore, compilers have directives to declare a structure as "packed" to get rid of such padding, thus strictly defining the internal structure to match the external structure.

But this is the wrong wrong wrong way of doing it. Just because it's possible in C doesn't mean it's a good idea.

Some people think it's faster. It's not really faster. Even low-end ARM CPUs are super fast these days, multiple-issue with deep pipelines. What determines their speed is more often things like branch mispredictions and long dependency chains. The number of instructions is almost an afterthought. Therefore, the difference in performance between the "zero overhead" mapping of a structure on top of external data, versus parsing a byte at a time, is almost immeasurable.

On the other hand, there is the cost in "correctness". The C language does not define the result of casting an integer as shown in the above examples. As wags have pointed out, instead of returning the expected two-byte number, acceptable behavior is to erase the entire hard disk.

In the real world, undefined code has led to compiler problems as compilers try to optimize around issues. Sometimes important lines of code are removed from a program because the compiler strictly interprets the rules of the C language standard. Using undefined behavior in C truly produces undefined results -- quite at odds with what the programmer expected.

The result of parsing a byte at a time is defined. The result of casting integers and structures is not. Therefore, that practice should be avoided. It confuses compilers. It confuses static and dynamic analyzers that try to verify the correctness of code.

Moreover, there is the practical matter that casting such things confuses programmers. Programmers understand parsing external formats fairly well, but mixing internal/external endianness causes endless confusion. It causes no end of buggy code. It causes no end of ugly code. I read a lot of open-source code. Code that parses integers the right way is consistently much easier to read than code that uses macros like ntohs(). I've seen code where the poor confused programmer keeps swapping integers back and forth, not understanding what's going on, and simply adds another byte-swap whenever the input to the function was in the wrong order.


There is the right way to teach endianness: it's a parser issue, dealing with external data formats/protocols. You deal with it in C/C++ the same way as in JavaScript or C# or any other language.

Then there is the wrong way to teach endianness: that it's a CPU issue in C/C++, that you intermingle internal and external structures, that you swap bytes. This has caused no end of trouble over the years.

Those teaching endianness need to stop the old way and adopt the new way.

Bonus: alignment

The thing is that casting integers has never been a good solution. Back in the 1980s, on the first RISC processors like SPARC, integers had to be aligned on even byte boundaries or the program would crash. Formats and protocols would be defined to keep these things aligned most of the time. But every so often, an odd file would misalign things, and the program would mysteriously crash with a "bus error".

Thankfully, this nonsense has mostly disappeared, but even today a lot of processors have performance problems with unaligned data. In other words, casting a structure on top of data appears to cost zero CPU instructions, but this ignores the often considerable effort it took to align all the integers before this step was reached.

Bonus: sockets

The API for network programming is "sockets". In some cases, you have to use the ntohs() family of macros. For example, when binding to a port, you execute code like the following:

 sin.sin_port = htons(port);

You do this because the API defines it this way, not because you are parsing data.

Some programmers make the mistake of keeping the byte-swapped versions of IP addresses and port numbers throughout their code. This is wrong. Their code should keep these in host byte order, and only pass them through these byte-swapping macros at the interface to the sockets layer.

Sunday, November 06, 2016

Yes, the FBI can review 650,000 emails in 8 days

In today's news, Comey announces that the FBI has reviewed all 650,000 emails found on Anthony Weiner's computer and determined there's nothing new. Some have questioned whether this could be done in 8 days. Of course it could be -- those were 650,000 emails to Weiner, not Hillary.

Reading Weiner's own emails, those unrelated to his wife Huma or to Hillary, is unlikely to be productive. Therefore, the FBI is going to filter those 650,000 Weiner emails to get at those emails that were also sent to/from Hillary and Huma.

That's easy for automated tools to do. Just search the From: and To: fields for email addresses known to be used by Hillary and associates. For example, search for (Hillary's current email address) and (Huma Abedin's current email).

Below is an example email header from the Podesta dump:

From: Jennifer Palmieri <>
Date: Sat, 2 May 2015 11:23:56 -0400
Message-ID: <-8018289478115811964@unknownmsgid>
Subject: WJC NBC interview
To: H <>, John Podesta <>, 
 Huma Abedin <>, Robby Mook <>, 
 Kristina Schake <>

This is likely to filter down the emails to a manageable few thousand.

Next, filter out the emails already in the FBI's possession. The easiest way is using the Message-ID: header, a unique value generated for every email. If a Weiner email has the same Message-ID as an email already retrieved from Huma and Hillary, then the FBI can ignore it.

This is then likely to reduce the number of emails needing review to less than a thousand, or less than 100, or even all the way down to zero. And indeed, that's what NBC News reported.

The point is this. Computer geeks have tools that make searching the emails extremely easy. Given those emails, a list of known email accounts from Hillary and associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day.

The question isn't whether the FBI could review all those emails in 8 days, but why the FBI couldn't have reviewed them all in one or two days. Or even why they couldn't have reviewed them before Comey made that horrendous announcement that they were reviewing the emails.