Sunday, November 27, 2016

No, it’s Matt Novak who is a fucking idiot

I keep seeing this Gizmodo piece entitled “Snowden is a fucking idiot”. I understand the appeal of the piece. The hero worship of Edward Snowden is getting old. But the piece itself is garbage.

The author, Matt Novak, is part of the new wave of hard-core leftists intolerant of those who disagree with them. His position is that anyone who doesn’t agree with his views is an idiot: Libertarians, Republicans, moderate voters who chose Trump, and even fellow left-wingers who aren’t as hard-core.

If you carefully read his piece, you’ll see that Novak doesn’t actually prove Snowden is wrong. Novak doesn’t show how Snowden disagrees with facts, only how Snowden disagrees with the left-wing view of the world, "libertarian garbage" as Novak puts it. We arrive at the conclusion only by syllogism: those who aren’t left-wing are idiots, Snowden is not left-wing, therefore Snowden is an idiot.

The question under debate in the piece is:
technology is more important than policy as a way to protect our liberties
In other words, if you don’t want the government spying on you, then focus on using encryption (use Signal) rather than trying to change the laws so they can’t spy on you.

On a factual basis (rather than political), Snowden is right. If you live in Germany and don’t want the NSA spying on you there is little policy-wise that you can do about it, short of convincing Germany to go to war against the United States to get the US to stop spying.

Likewise, for all those dissenters in countries with repressive regimes, technology precedes policy. You can’t effect change until you first can protect yourselves from the state police who throw you in jail for dissenting. Use Signal.

In our own country, Snowden is right about “politics”. Snowden’s leak showed how the NSA was collecting everyone’s phone records to stop terrorism. Privacy organizations like the EFF supported the reform bill, the USA FREEDOM Act. But rather than stopping the practice, the “reform” opened up the phone records to all law enforcement (FBI, DEA, ATF, IRS, etc.) for normal law enforcement purposes.

Imagine the protestors out there opposing the Dakota Access Pipeline. The FBI is shooting down their drones and blasting them with water cannons. Now, because of the efforts of the EFF and other privacy activists, using the USA FREEDOM Act, the FBI is also grabbing everyone’s phone records in the area. Ask yourself who is the fucking idiot here: the guy telling you to use Signal, or the guy telling you to focus on “politics” to stop this surveillance.

Novak repeats the hard-left version of the creation of the Internet:
The internet has always been monitored by the state. It was created by the fucking US military and has been monitored from day one. Surveillance of the internet wasn’t invented after September 11, 2001, no matter how many people would like to believe that to be the case.
No, the Internet was not created by the US military. Sure, the military contributed to the Internet, but the majority of contributions came from corporations, universities, and researchers. The left-wing claim that the government/military created the Internet involves highlighting their contributions while ignoring everyone else’s.

The Internet was not “monitored from day one”, because until the 1990s, it wasn’t even an important enough network to monitor. As late as 1993, the Internet was dwarfed in size and importance by numerous other computer networks – until the web took off that year, the Internet was considered a temporary research project. Those like Novak writing the history of the Internet are astonishingly ignorant of the competing networks of those years. They miss XNS, AppleTalk, GOSIP, SNA, Novell, DECnet, Bitnet, Uunet, Fidonet, X.25, Telenet, and all the other things that were really important during those years.

And, mass Internet surveillance did indeed come only after 9/11. The NSA’s focus before that was on signals and telephone lines, because that’s where all the information was.  When 9/11 happened, they were still trying to catch up to the recent growth of the Internet. Virtually everything Snowden documents came after 9/11. Sure, they had programs like FAIRVIEW that were originally created to get telephone information in the 1970s, but these programs only started delivering mass Internet information after 9/11. Sure, the NSA occasionally got emails before 9/11, but nothing like the enormous increase in collection afterwards.

What I’ve shown here is that Matt Novak is a fucking idiot. He gets basic facts wrong about how the Internet works. He doesn’t prove Snowden’s actually wrong by citing evidence, only that Snowden is wrong because he disagrees with what leftists like Novak believe to be right. All the actual evidence supports Snowden in this case.

This doesn't mean we should avoid politics. Technology and politics are different things; it's not either-or. Doing one has no impact on deciding to do the other. But if you are a DAPL protester, use Signal instead of unencrypted messaging or phone calls, rather than waiting for activists to pass legislation.

Monday, November 21, 2016

The false-false-balance problem

Until recently, journalism in America prided itself on objectivity -- reporting the truth without taking sides. The premise is that big debates are complex and nuanced, and that both sides are equally reasonable. Therefore, when writing an article, reporters attempt to achieve balance by quoting people/experts/proponents on both sides of an issue.

But what about those times when one side is clearly unreasonable? You'd never try to achieve balance by citing those who believe in aliens and Bigfoot, for example. Thus, journalists have come up with the theory of false-balance to justify being partisan and one-sided on certain issues.

Typical examples where journalists cite false-balance are reporting on anti-vaxxers, climate-change denialists, and Creationists. More recently, false-balance has become an issue in the 2016 Trump election.

But this concept of false-balance is wrong. It's not that anti-vaxxers, denialists, Creationists, and white supremacists are reasonable. Instead, the issue is that the left-wing has reframed the debate. They've simplified it into something black-and-white, removing nuance, in a way that shows their opponents as being unreasonable. The media then adopts the reframed debate.


Let's talk anti-vaxxers. One of the policy debates is whether the government has the power to force vaccinations on people (or on people's children). Reasonable people say the government doesn't have this power. Many (if not most) people hold this opinion while agreeing that vaccines are both safe and effective (that they don't cause autism).

Consider this February 2015 interview with Chris Christie. He's one of the few politicians who have taken the position that government can override personal choice, such as in the case of an outbreak. Yet, when he said "parents need to have some measure of choice in things as well, so that's the balance that the government has to decide", he was broadly reviled as an anti-vaxxer throughout the media. The press reviled other Republican candidates the same way, even while ignoring almost identical statements made at the same time by the Obama administration. They also ignored clearly anti-vax comments from both Hillary and Obama during the 2008 election.

Yes, we can all agree that anti-vaxxers are a bunch of crazy nutjobs. In calling for objectivity, we aren't saying that you should take them seriously. Instead, we are pointing out the obvious bias in the way the media attacked Republican candidates as being anti-vaxxers, and then hiding behind "false-balance".


Now let's talk evolution. The issue is this: Darwinism has been set up as some sort of competing religion against belief in God(s). High-schools teach children to believe in Darwinism, but not to understand Darwinism. Few kids graduate understanding Darwinism, which is why it's invariably misrepresented in mass-media (X-Men, Planet of the Apes, Waterworld, Godzilla, Jurassic Park, etc.). The only movie I can recall getting evolution correct is Idiocracy.

Also, evolution has holes in it. This isn't a bad thing in science, every scientific theory has holes. Science isn't a religion. We don't care about the holes. That some things remain unexplained by a theory doesn't bother us. Science has no problem with gaps in knowledge, where we admit "I don't know". It's religion that has "God of the gaps", where ignorance isn't tolerated, and everything unexplained is explained by a deity.

The hole in evolution is how the cell evolved. The fossil record teaches us a lot about multi-cellular organisms over the last 400-million years, but not much about how the cell evolved in the roughly 4 billion years of Earth's history before that. I can point to radio isotope dating and fossil finds to prove dinosaurs existed 250 million to 60 million years ago, thus disproving your crazy theory of a 10,000 year-old Earth. But I can't point to anything that disagrees with your view that a deity created the original cellular organisms. I don't agree with that theory, but I can't disprove it, either.

The point is that Christians have a good point that Darwinism is taught as a competing religion. You see this in the way textbooks deny the holes in our knowledge, insisting that Darwinism explains even how cells evolved, and that doubting Darwin is blasphemy.

The Creationist solution is wrong, we can't teach religion in schools. But they have a reasonable concern about religious Darwinism. The solution there is to do a better job teaching it as a science. If kids want to believe that one of the deities created the first cells, then that's okay, as long as they understand the fossil record and radioisotope dating.


Now let's talk Climate Change. This is a tough one, because you people have lost your collective minds. The debate is over "How much change? How much danger? How much cost?" The debate is not over "Is it true?" We all agree it's true, even most Republicans. By keeping the debate on the black-and-white question "Is global warming true?", the left wing can avoid the harder debate over "How much warming?"

Consider this exchange from one of the primary debates:

Moderator: ...about climate change...
RUBIO: Because we’re not going to destroy our economy ...
Moderator: Governor Christie, ... what do you make of skeptics of climate change such as Senator Rubio?
CHRISTIE: I don’t think Senator Rubio is a skeptic of climate change.
RUBIO: I'm not a denier/skeptic of climate change.

The media (in this case CNN) is so convinced that Republicans deny climate change that they can't hear any other statement. Rubio clearly didn't deny Climate Change, but the moderator was convinced that he did. Every statement is heard as either outright denial or code words for denial. Thus, convinced of the falseness of false-balance, the media never sees that most Republicans are reasonable.

Similar proof of Republican non-denial is this page full of denialism quotes. If you actually look at the quotes, you'll see that, taken in context, virtually none of the statements deny climate change. For example, when Senator Dan Sullivan says there is "no concrete scientific consensus on the extent to which humans contribute to climate change", he is absolutely right. There is 97% consensus that mankind contributes to climate change, but there is widespread disagreement over how much.

That "97% consensus" is incredibly misleading. Whenever it's quoted, the speaker immediately moves the bar, claiming that scientists also agree with whatever crazy thing the speaker wants, like hurricanes getting worse (they haven't -- at least, not yet).

There's no inherent reason why Republicans would disagree with addressing Climate Change. For example, Washington State recently voted on a bill to impose a revenue-neutral carbon tax. The important part is "revenue neutral": Republicans hate expanding government, but they don't oppose policies that keep government the same size. Democrats opposed this bill, precisely because it didn't expand the size of government. That proves that Democrats are less concerned with a bipartisan approach to addressing climate change, and instead simply use it as a wedge issue to promote their agenda of increased regulation and increased spending.

If you are serious about addressing Climate Change, then accept that Republicans aren't deniers, and look for bipartisan solutions.


Conclusion

The point here is not to try to convince you of any political opinion. The point here is to describe how the press has lost objectivity by adopting the left-wing's reframing of the debate. Instead of seeing a balanced debate between two reasonable sides, they see a warped debate between a reasonable (left-wing) side and an unreasonable (right-wing) side. The idea that the opposing side is unreasonable is so incredibly seductive that they can never give it up.

That Christie had to correct the moderator in the debate should teach you that something is rotten in journalism. Christie understood Rubio's remarks, but the debate moderator could not. Journalists cannot even see the climate debate because they are wedded to the left-wing's corrupt view of the debate.

The concept of false-balance is wrong. In debates that evenly divide the population, the issues are complex and nuanced, and both sides are reasonable. That's the law, no matter what the debate is. If you see a debate simplified to the point where one side is obviously unreasonable, then it's you who has a problem.



Dinner with Rajneeshees

One evening I answered the doorbell to find a burgundy clad couple on the doorstep. They were followers of the Bhagwan Shree Rajneesh, whose cult had recently purchased a large ranch in the eastern part of the state. No, they weren't there to convert us. They had come for dinner. My father had invited them.

My father was a journalist who had been covering the controversies with the cult's neighbors. Yes, they were a crazy cult that would later break up after committing acts of domestic terrorism. But this couple was a pair of young professionals (lawyers) who, except for their clothing, looked and behaved like normal people. They would go on to live normal lives after the cult.

Growing up, I lived in two worlds. One was the normal world, which encourages you to demonize those who disagree with you. On the political issues that concern you most, you divide the world into the righteous and the villains. It's not enough to believe the other side is wrong; you must also believe them to be evil.

The other world was that of my father, teaching me to see the other side of the argument. I guess I grew up with my own Atticus Finch (from To Kill a Mockingbird), who set an ideal. In much the same way that Atticus told his children that they couldn't hate even Hitler, I was told I couldn't hate even the crazy Rajneeshees.

Monday, November 14, 2016

Comments for my biracial niece

I spent the night after Trump’s victory consoling my biracial niece, who was worried about the election. Here are my comments. You won’t like them, since you’ll be expecting the opposite given the title. But it’s what I said.


I preferred Hillary, but that doesn’t mean Trump is an evil choice.

Don’t give into the hate. You get most of your news via social media sites like Facebook and Twitter, which are at best one-sided and unfair. At worst, they are completely inaccurate. Social media posts are driven by emotion, not logic. Sometimes that emotion is love of cute puppies. Mostly it’s anger, fear, and hate. Instead of blindly accepting what you read, challenge it. Find the original source. Find a better explanation. Search for context.

Don’t give into the hate. The political issues that you are most concerned about are not simple and one-sided with obvious answers. They are complex and nuanced. Just because somebody disagrees with you doesn’t mean they are unreasonable or evil. In today’s politics, it has become the norm that we can’t simply disagree with somebody, but must also vilify and hate them. We’ve redefined politics to be the fight between the virtuous (whatever side we are on) and the villains (the other side). The reality is that both sides are equally reasonable, equally virtuous.

Don’t give into the hate. Learn “critical thinking”. Learn how “cherry picking” the fringe of the opposing side is used to tarnish the mainstream. Learn how “strawman arguments” make the other side sound dumb. Learn how “appeal to emotion” replaces logic. Learn how “ad hominem” statements attack an opponent’s credibility rather than their arguments. Learn how issues are simplified into “black vs. white” options rather than the nuance and complexity that actually exists.

Don’t give into the hate. The easy argument is that it’s okay to be hateful and bigoted toward Trump and his supporters because they are bigoted against you. No, it’s not okay to hate anybody, not even Hitler, as Atticus Finch explains in “To Kill A Mockingbird”. In that book, Atticus even tries to understand, and not hate, Robert Ewell, the racist antagonist in the book who eventually tries to stab Scout (Atticus’s daughter). Trump’s supporters may be wrong, but it’s a wrongness largely based on ignorance, not malice. Yes, they probably need to be kindly educated, but they don’t deserve punishment and hate.

America is the same country it was last week. Its citizens haven't changed; only one man in an office has changed. The President has little actual power, either to fix things (as his supporters want) or to break things (as his opponents fear). We have strong institutions, from Congress, to the Courts, to the military, that will hold him in check. The biggest worries are that he's the first President in history with no government experience, and that he's strongly "populist" (which historically has been damaging for countries). We should be watchful, and more willing to stand up and fight when Trump does something bad. However, we shouldn't give into hate.

How to teach endian

On /r/programming is this post about byte-order/endianness. It gives the same information as most documents on the topic. It is wrong. It's been wrong for over 30 years. Here's how it should be taught.

One of the major disciplines in computer science is parsing/formatting. This is the process of converting the external format of data (file formats, network protocols, hardware registers) into the internal format (the data structures that software operates on).

It should be a formal computer-science discipline, because it's actually a lot more difficult than you'd expect. The majority of vulnerabilities in software that hackers exploit are due to parsing bugs. Since programmers don't learn about parsing formally, they figure it out for themselves, creating ad hoc solutions that are prone to bugs. For example, programmers assume external data cannot be larger than their internal buffers, leading to buffer overflows.

An external format must be well-defined. What the first byte means must be written down somewhere, then what the second byte means, and so on. For Internet protocols, these formats are written in RFCs, such as RFC 791 for the "Internet Protocol". For file formats, these are written in documents, such as those describing GIF files, JPEG files, MPEG files, and so forth.

Among the issues is how integers should be represented. The definition must include the size, whether the integer is signed or unsigned, what the bits mean (almost always two's-complement), and the byte-order. Integers with values above 255 must be represented with more than one byte. Whether those bytes go left-to-right or right-to-left is known as byte-order.

We also call this endianness, where one form is big-endian and the other is little-endian. The name is a joke, referring back to Jonathan Swift's tale Gulliver's Travels, where two nations went to war arguing over whether an egg should be cracked on the big end or the little end. The joke refers to the Holy Wars in computing where two sides argued strongly for one byte-order or the other; the commentary buried in the term "endianness" is that neither format really matters.

However, big-endian is how humans naturally process numbers. If we have the hex value 0x2211, then we expect that representing this number in a file/protocol will consist of one byte with the value 0x22 followed by another byte with the value 0x11. In a little-endian format specification, however, the order of bytes will be reversed, with a value of 0x2211 represented with 0x11 followed by 0x22.

This is further confused by the fact that the nibbles within each byte are still written in conventional, big-endian order. In other words, the big-endian format for the number 0x1234 is 0x12 0x34. However, the little-endian format is 0x34 0x12 -- not 0x43 0x21 as you might naively expect from trying to swap everything around in your mind.

If little-endian is so confusing to the human mind, why would anybody ever use it? The answer is that it can be more efficient for logic circuits. Or at least, back in the 1970s, when CPUs had only a few thousand logic gates, it could be more efficient. Therefore, a lot of internal processing was little-endian, and this bled over into external formats as well.

On the other hand, most network protocols and file formats remain big-endian. Format specifications are written for humans to understand, and big-endian is easier for us humans.


So once you understand the byte-order issue in external formats, the next problem is figuring out how to parse it, to convert it into an internal data structure. Well, we first have to understand how to parse things in general.

There are two ways of parsing things: buffered or streaming. In the buffered model, you read in the entire input first (the entire file, or the entire network packet), then parse it. In the streaming model, you read a byte at a time, parse that byte, then read the next byte. Streaming is best for very large files or for data streaming across TCP network connections.

However, buffered parsing is the general way most people do it, so I'll assume that in this guide.
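
For completeness, here's roughly what streaming-style parsing looks like: a minimal sketch of my own (not from any real library) of a little state machine that accumulates a two-byte big-endian length field one byte at a time.

 struct stream_state {
     int bytes_seen;            /* how many bytes of the field we've consumed so far */
     unsigned short length;     /* the value being accumulated */
 };

 /* Feed each incoming byte to this function as it arrives off the wire. */
 void on_byte(struct stream_state *s, unsigned char c)
 {
     if (s->bytes_seen == 0)
         s->length = (unsigned short)(c << 8);   /* high byte arrives first */
     else if (s->bytes_seen == 1)
         s->length |= c;                         /* low byte completes the value */
     s->bytes_seen++;
 }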

Let's assume you've read the file (or network data) into a buffer we'll call buf. You parse that buffer at the current offset, advancing the offset until you reach the end.

Given that, the way you'd parse a two-byte big-endian integer x is the following line of code:

 x = buf[offset] * 256 + buf[offset+1];

Or, if you prefer logical operators, you might do it as:

 x = buf[offset]<<8 | buf[offset+1];

Compilers always translate multiplication by powers-of-2 into shift instructions, so either statement will perform the same. Some compilers are smart enough to recognize this pattern as parsing an integer, and might replace this with loading two bytes from memory and byte-swapping instead.

For a little-endian integer in the external data, you'd reverse how you parse this, like one of the following two statements.

x = buf[offset+1] * 256 + buf[offset];
x = buf[offset] + buf[offset+1] * 256;
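
Wrapped up as helper functions, the same one-liners might look like the following sketch (the names parse_be16, parse_le16, and parse_be32 are my own, not from any standard library). Note that these produce the same result regardless of the CPU's native byte-order, which is exactly the point.

 #include <stdint.h>
 #include <stddef.h>

 /* Parse a two-byte big-endian integer at the given offset. */
 static uint16_t parse_be16(const unsigned char *buf, size_t offset)
 {
     return (uint16_t)((buf[offset] << 8) | buf[offset + 1]);
 }

 /* Parse a two-byte little-endian integer at the given offset. */
 static uint16_t parse_le16(const unsigned char *buf, size_t offset)
 {
     return (uint16_t)(buf[offset] | (buf[offset + 1] << 8));
 }

 /* The same idea extends to wider integers, e.g. a four-byte big-endian value. */
 static uint32_t parse_be32(const unsigned char *buf, size_t offset)
 {
     return ((uint32_t)buf[offset]     << 24) |
            ((uint32_t)buf[offset + 1] << 16) |
            ((uint32_t)buf[offset + 2] <<  8) |
             (uint32_t)buf[offset + 3];
 }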

If we were talking about JavaScript, C#, or a bunch of other languages, the conversation about endianness would end here. But since we're talking about C/C++, we've got some additional wrinkles to deal with.


The problem with C is that it's a low-level language, which means it exposes the internal format of integers to the programmer. The code above, in contrast, deals only with the external representation of integers and doesn't care about the internal representation. It doesn't care whether you are running on a little-endian x86 CPU or some big-endian RISC CPU.

But in C, you can parse an integer by relying upon the internal CPU representation. It would look something like the following:

 x = *(short*)(buf + offset);

This code produces different results on a little-endian machine and a big-endian machine. If the two bytes are 0x22 and 0x11, then on a big-endian machine this produces a short integer with a value of 0x2211, but a little-endian machine produces the value of 0x1122.

If the external format is big-endian, then on a little-endian machine, you'll have to byte-swap the result. In other words, the code would look something like:

 x = *(short*)(buf + offset);
 #if BYTE_ORDER == LITTLE_ENDIAN
 x = ((x >> 8) & 0xFF) | ((x & 0xFF) << 8);
 #endif

Of course, you'd never write code that looks like this. Instead, you'd use a macro, as follows:

 x = ntohs(*(short*)(buf + offset));

The macro means network-to-host-short, where network byte-order is big-endian and host byte-order is whatever the local CPU happens to use. On a little-endian host CPU, the bytes are swapped as shown above. On a big-endian CPU, the macro is defined as doing nothing. This macro is defined in the standard sockets headers, like <arpa/inet.h>. There is a broad range of similar byte-swapping macros in other libraries.

In truth, this is not how it's really done, parsing an individual integer at a time. Instead, what programmers do is define a packed C structure that corresponds to the external format they are trying to parse, then cast the buffer into that structure.

For example, on Linux the include file <netinet/ip.h> defines the Internet Protocol header:

struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN 
u_char ip_hl:4, /* header length */
ip_v:4; /* version */
#endif
#if BYTE_ORDER == BIG_ENDIAN 
u_char ip_v:4, /* version */
ip_hl:4; /* header length */
#endif
u_char ip_tos; /* type of service */
short ip_len; /* total length */
u_short ip_id; /* identification */
short ip_off; /* fragment offset field */
u_char ip_ttl; /* time to live */
u_char ip_p; /* protocol */
u_short ip_sum; /* checksum */
struct in_addr ip_src,ip_dst; /* source and dest address */
};

To "parse" the header, you'd do something like:

 struct ip *hdr = (struct ip *)buf;
 printf("checksum = 0x%04x\n", ntohs(hdr->ip_sum));

This is considered the "elegant" way of doing things, because there is no "parsing" at all. On big-endian CPUs, it's also a no-op -- it costs precisely zero instructions in order to "parse" the header, since both the internal and external structures map exactly.

In C, though, the exact layout of structures is undefined. There is often padding between structure members to keep integers aligned on their natural boundaries. Therefore, compilers have directives to declare a structure as "packed" to get rid of such padding, thus forcing the internal structure to match the external structure.
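
For example, with GCC or Clang the directive looks something like the sketch below (my own illustrative struct, not a real protocol header; MSVC spells the same thing #pragma pack(1)):

 struct example_hdr {
     unsigned char  type;     /* offset 0 */
     unsigned short length;   /* offset 1 when packed, offset 2 with default padding */
     unsigned int   value;    /* offset 3 when packed, offset 4 with default padding */
 } __attribute__((packed));   /* sizeof() is 7 when packed, 8 with default padding */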

But this is the wrong wrong wrong way of doing it. Just because it's possible in C doesn't mean it's a good idea.

Some people think it's faster. It's not really. Even low-end ARM CPUs are super fast these days: multiple-issue, with deep pipelines. What determines their speed is more often things like branch mispredictions and long dependency chains; the number of instructions is almost an afterthought. Therefore, the difference in performance between the "zero overhead" mapping of a structure on top of external data and parsing a byte at a time is almost immeasurable.

On the other hand, there is the cost in "correctness". The C language does not define the result of casting an integer as shown in the above examples. As wags have pointed out, instead of returning the expected two-byte number, acceptable behavior is to erase the entire hard disk.

In the real world, undefined code has led to compiler problems as compilers try to optimize around such issues. Sometimes important lines of code are removed from a program because the compiler strictly interprets the rules of the C language standard. Using undefined behavior in C truly produces undefined results -- quite at odds with what the programmer expected.

The result of parsing a byte at a time is defined. The result of casting integers and structures is not. Therefore, that practice should be avoided. It confuses compilers. It confuses static and dynamic analyzers that try to verify the correctness of code.

Moreover, there is the practical matter that casting such things confuses programmers. Programmers understand parsing external formats fairly well, but mixing internal and external endianness causes endless confusion. It causes no end of buggy code. It causes no end of ugly code. I read a lot of open-source code. Code that parses integers the right way is consistently much easier to read than code that uses macros like ntohs(). I've seen code where the poor confused programmer keeps swapping integers back and forth, not understanding what's going on, simply adding another byte-swap whenever the input to a function turned out to be in the wrong order.

Conclusion

There is a right way to teach endianness: it's a parser issue, dealing with external data formats/protocols. You deal with it in C/C++ the same way as in JavaScript or C# or any other language.

Then there is the wrong way to teach endianness: that it's a CPU issue in C/C++, that you intermingle internal and external structures, that you swap bytes. This has caused no end of trouble over the years.

Those teaching endianness need to stop the old way and adopt the new way.




Bonus: alignment

The thing is that casting integers has never been a good solution. Back in the 1980s, on the first RISC processors like SPARC, integers had to be aligned on even byte boundaries or the program would crash. Formats and protocols would be defined to keep things aligned most of the time, but every so often an odd file would misalign them, and the program would mysteriously crash with a "bus error".

Thankfully, that nonsense has mostly disappeared, but even today a lot of processors have performance problems with unaligned data. In other words, casting a structure on top of data appears to cost zero CPU instructions, but that ignores the often considerable effort spent keeping all the integers aligned before this step was reached.
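
To make that concrete, here's a made-up fragment of mine showing the kind of cast that would die with a bus error on those strict-alignment CPUs, alongside the byte-at-a-time parse that works everywhere:

 unsigned char buf[6] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55};

 short x = *(short *)(buf + 1);              /* misaligned two-byte read: "bus error" on old SPARC */
 short y = (short)((buf[1] << 8) | buf[2]);  /* byte-at-a-time parse: works on everything */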




Bonus: sockets

The API for network programming is "sockets". In some cases, you have to use the ntohs() family of macros. For example, when binding to a port, you execute code like the following:

 sin.sin_port = htons(port);

You do this because the API defines it this way, not because you are parsing data.

Some programmers make the mistake of keeping the byte-swapped versions of IP addresses and port numbers throughout their code. This is wrong. Their code should keep these values in normal host-integer form, and only pass them through the byte-swapping macros at the interface to the sockets layer.
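
For example, here's a minimal sketch of my own (standard sockets calls, but my own function name) of binding a listening socket: the port lives as a normal host-order integer everywhere except the one line that hands it to the API.

 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <string.h>
 #include <sys/socket.h>
 #include <unistd.h>

 /* Bind a TCP listening socket to the given port, returning the socket or -1. */
 int listen_on(unsigned short port)
 {
     struct sockaddr_in sin;
     int fd = socket(AF_INET, SOCK_STREAM, 0);
     if (fd == -1)
         return -1;

     memset(&sin, 0, sizeof(sin));
     sin.sin_family = AF_INET;
     sin.sin_addr.s_addr = htonl(INADDR_ANY);
     sin.sin_port = htons(port);      /* byte-swap only here, at the API boundary */

     if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0 || listen(fd, 10) != 0) {
         close(fd);
         return -1;
     }
     return fd;
 }

Everywhere else in the program, the port stays a plain integer.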





Sunday, November 06, 2016

Yes, the FBI can review 650,000 emails in 8 days

In today's news, Comey announces the FBI has reviewed all 650,000 emails found on Anthony Weiner's computer and determined there's nothing new. Some have questioned whether this could be done in 8 days. Of course it could be -- those were 650,000 emails to Weiner, not Hillary.




Reading Weiner's own emails, those unrelated to his wife Huma or to Hillary, is unlikely to be productive. Therefore, the FBI is going to filter those 650,000 Weiner emails down to the ones that were also sent to/from Hillary and Huma.

That's easy for automated tools to do. Just search the From: and To: fields for email addresses known to be used by Hillary and associates. For example, search for hdr29@hrcoffice.com (Hillary's current email address) and ha16@hillaryclinton.com (Huma Abedin's current email).

Below is an example email header from the Podesta dump:

From: Jennifer Palmieri <jpalmieri@hillaryclinton.com>
Date: Sat, 2 May 2015 11:23:56 -0400
Message-ID: <-8018289478115811964@unknownmsgid>
Subject: WJC NBC interview
To: H <hdr29@hrcoffice.com>, John Podesta <john.podesta@gmail.com>, 
 Huma Abedin <ha16@hillaryclinton.com>, Robby Mook <re47@hillaryclinton.com>, 
 Kristina Schake <kschake@hillaryclinton.com>

This is likely to filter down the emails to a manageable few thousand.

Next, filter out the emails already in the FBI's possession. The easiest way is using the Message-ID: header, a unique value generated for every email. If a Weiner email has the same Message-ID as an email already retrieved from Huma and Hillary, then the FBI can ignore it.
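
To show how little work this is, here's a rough sketch of my own (not the FBI's actual tooling) of both filtering steps. It reads message headers from stdin, one header block per message separated by blank lines, keeps only messages whose From:/To: lines mention a known address, and skips any message whose Message-ID has already been seen. Real evidence lives in PST/mbox files and headers can be folded across lines, so a real tool would be more careful, but the idea is the same.

#include <stdio.h>
#include <string.h>
#include <strings.h>

#define MAX_SEEN 650000   /* naive table; a real tool would use a hash set */

static const char *known[] = {
    "hdr29@hrcoffice.com",        /* Hillary's address, from above */
    "ha16@hillaryclinton.com",    /* Huma's address, from above */
    NULL
};

static char seen[MAX_SEEN][128];
static size_t seen_count;

/* Return 1 if we've seen this Message-ID before; otherwise remember it. */
static int already_seen(const char *id)
{
    for (size_t i = 0; i < seen_count; i++)
        if (strcmp(seen[i], id) == 0)
            return 1;
    if (seen_count < MAX_SEEN)
        snprintf(seen[seen_count++], sizeof(seen[0]), "%s", id);
    return 0;
}

int main(void)
{
    char line[4096], msgid[128] = "";
    int relevant = 0, duplicate = 0;

    while (fgets(line, sizeof(line), stdin)) {
        if (line[0] == '\n' || line[0] == '\r') {        /* blank line ends this message's headers */
            if (relevant && !duplicate)
                printf("needs review: %s\n", msgid[0] ? msgid : "(no Message-ID)");
            relevant = duplicate = 0;
            msgid[0] = '\0';
        } else if (strncasecmp(line, "From:", 5) == 0 ||
                   strncasecmp(line, "To:", 3) == 0) {
            for (int i = 0; known[i]; i++)
                if (strstr(line, known[i]))
                    relevant = 1;                        /* mentions Hillary or Huma */
        } else if (strncasecmp(line, "Message-ID:", 11) == 0) {
            sscanf(line + 11, " %127s", msgid);          /* grab the identifier */
            if (already_seen(msgid))
                duplicate = 1;                           /* already retrieved elsewhere */
        }
    }
    return 0;
}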

This is then likely to reduce the number of emails needing review to less than a thousand, or less than a hundred, or even all the way down to zero. And indeed, that's what NBC News is reporting:




The point is this. Computer geeks have tools that make searching the emails extremely easy. Given those emails, a list of known email accounts for Hillary and her associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day.

The question isn't whether the FBI could review all those emails in 8 days, but why the FBI couldn't have reviewed them all in one or two days. Or even why they couldn't have reviewed them before Comey made that horrendous announcement that they were reviewing the emails.





Thursday, November 03, 2016

In which I have to debunk a second time

So Slate is doubling down on their discredited story of a secret Trump server. Tip for journalists: if you are going to argue against an expert debunking your story, try to contact that expert first, so they don't have to do what I'm going to do here, showing the obvious flaws. Also, pay attention to the data.


The experts didn't find anything

The story claims:
"I spoke with many DNS experts. They found the evidence strongly suggestive of a relationship between the Trump Organization and the bank".
No, he didn't. He gave experts limited information and asked them whether it was consistent with a conspiracy theory. He didn't ask whether the evidence was "suggestive" of the conspiracy theory, or whether this was the best theory that fit the data.

This is why "experts" quoted in the press need to go through "media training", to avoid getting your reputation harmed by bad journalists who try their best to put words in your mouth. You'll be trained to recognize bad journalists like this, and how not to get sucked into their fabrications.


Jean Camp isn't an expert

On the other hand, Jean Camp isn't an expert. I've never heard of her before. She gets details wrong. Take, for example, this blogpost where she discusses lookups for the domain mail.trump-email.com.moscow.alfaintra.net. She says:
This query is unusual in that is merges two hostnames into one. It makes the most sense as a human error in inserting a new hostname in some dialog window, but neglected to hit the backspace to delete the old hostname.
Uh, no. It's normal DNS behavior with non-FQDNs. If the lookup for a name fails, computers will try again, pasting the local domain on the end. In other words, when Twitter's DNS was taken offline by the DDoS attack a couple weeks ago, those monitoring DNS saw a zillion lookups for names like "www.twitter.com.example.com".

I've reproduced this on my desktop by configuring the suffix moscow.alfaintra.net.



I then pinged "mail1.trump-email.com" and captured the packets. As you can see, after the initial lookups fail, Windows tried appending the suffix.



I don't know what Jean Camp is an expert of, but this is sorta a basic DNS concept. It's surprising she'd get it wrong. Of course, she may be an expert in DNS who simply had a brain fart (this happens to all of us), but looking across her posts and tweets, she doesn't seem to be somebody who has a lot of experience with DNS. Sorry for impugning her credibility, but that's the way the story is written. It demands that we trust the quoted "experts". 

Call up your own IT department at Slate. Ask your IT nerds if this is how DNS operates. Note: I'm saying your average, unremarkable IT nerds can debunk an "expert" you quote in your story.

Understanding "spam" and "blacklists"

The new article has a paragraph noting that the IP address doesn't appear on spam blocklists:
Was the server sending spam—unsolicited mail—as opposed to legitimate commercial marketing? There are databases that assiduously and comprehensively catalog spam. I entered the internet protocol address for mail1.trump-email.com to check if it ever showed up in Spamhaus and DNSBL.info. There were no traces of the IP address ever delivering spam.
This is a profound misunderstanding of how these things work.

Colloquially, we call those sending mass marketing emails, like Cendyn, "spammers". But those running blocklists have a narrower definition. If  emails contain an option to "opt-out" of future emails, then it's technically not "spam".

Cendyn is constantly getting added to blocklists when people complain. They spend considerable effort contacting the many organizations maintaining blocklists, proving they honor "opt-outs", and getting "white-listed" instead of "black-listed". Indeed, the entire spam-blacklisting industry is a bit of a scam -- getting white-listed often involves a bit of cash.

The records maintained by blacklists only go back a few months. The article is in error saying there's no record ever of Cendyn sending spam. If an address comes up clean, it only means there's no record for the past few months. And if Cendyn is on the white-lists, there would be no record of "spam" at all, anyway.

As somebody who frequently scans the entire Internet, I'm constantly getting on/off blacklists. It's a real pain. At the moment, my scanner address "209.126.230.71" doesn't appear to be on any blacklists. Next time a scan kicks off, it'll probably get added -- but only by a few, because most have white-listed it.


There is no IP address limitation

The story repeats the theory, which I already debunked, that the server has a weird configuration that limits who can talk to it:
The scientists theorized that the Trump and Alfa Bank servers had a secretive relationship after testing the behavior of mail1.trump-email.com using sites like Pingability. When they attempted to ping the site, they received the message “521 lvpmta14.lstrk.net does not accept mail from you.”
No, that's how Listrak (who actually controls the server) configures all their marketing servers. Anybody can confirm this themselves by pinging all the servers in this range:


In case you don't want to do the scans yourself, you can look on Shodan and see that there are at least 4,000 servers around the Internet that give the same error message.


Again, go back to Chris Davis from your original story and ask him about this. He'll confirm that there's nothing nefarious or weird going on here, that it's just how Listrak has decided to configure all its spam-sending engines.

Either this conspiracy goes much deeper, with hundreds of servers involved, or this is a meaningless datapoint.


Where did the DNS logs come from?

Tea Leaves and Jean Camp are showing logs of private communications. Where did these logs come from? This information isn't public. It means somebody has done something like hack into Alfa Bank, or that researchers who monitor DNS (for maintaining DNS, and for doing malware research) have broken their NDAs and possibly the law.

The data is incomplete and inconsistent. Those who work at other companies, like Dyn, claim it doesn't match their own data. We have good reason to doubt these logs. There's a good chance the source doesn't have as comprehensive a view as "Tea Leaves" claims. There's also a good chance the data has been manipulated.

Specifically, I have a source who claims records for trump-email.com were changed in June, meaning either my source or Tea Leaves is lying.

Until we know more about the source of the data, it's impossible to believe the conclusions that only Alfa Bank was doing DNS lookups.

By the way, if you are a company like Alfa Bank, and you don't want the "research" community seeing leaked intranet DNS requests, then you should probably reconfigure your DNS resolvers. You'll want to look into RFC 7816 "query minimization", supported by the Unbound and Knot resolvers.


Do the graphs show interesting things?

The original "Tea Leaves" researchers are clearly acting in bad faith. They are trying to twist the data to match their conclusions. For example, in the original article, they claim that peaks in the DNS activity match campaign events. But looking at the graph, it's clear these are unrelated. It display the common cognitive bias of seeing patterns that aren't there.

Likewise, they claim that the timing throughout the day matches what you'd expect from humans interacting back and forth between Moscow and New York. No. This is what the activity looks like, graphing the number of queries by hour:

As you can see, there's no pattern. When workers go home at 5pm in New York City, it's midnight in Moscow. If humans were involved, you'd expect an eight-hour lull during that time. Likewise, when workers arrive at 9am in New York City, you'd expect a spike in traffic for about an hour until workers in Moscow go home. You see none of that here. What you instead see is a random distribution throughout the day -- the sort of distribution you'd expect if these were DNS lookups triggered by incoming spam.

The point is that we know the original "Tea Leaves" researchers aren't trustworthy, that they've convinced themselves of things that just aren't there.


Does Trump control the server in question?

OMG, this post asks the question after I've already debunked the original story, and still gets the answer wrong.

The answer is that Listrak controls the server. Not even Cendyn controls it, really, they just contract services from Listrak. In other words, not only does Trump not control it, the next level company (Cendyn) also doesn't control it.


Does Trump control the domain in question?

OMG, this new story continues to claim that the Trump Organization controls the domain trump-email.com, despite my having already shown that Cendyn controls the domain.

Look at the WHOIS info yourself. All the contact info goes to Cendyn. It fits the pattern Cendyn chooses for their campaigns.
  • trump-email.com
  • mjh-email.com
  • denihan-email.com
  • hyatt-email.com

Cendyn even spells "Trump Orgainzation" wrong.


There's a difference between a "server" and a "name"

The article continues to make trivial technical errors, like confusing what a server is with what a domain name is. For example:
One of the intriguing facts in my original piece was that the Trump server was shut down on Sept. 23, two days after the New York Times made inquiries to Alfa Bank
The server has never been shut down. Instead, the name "mail1.trump-email.com" was removed from Cendyn's DNS servers.

It's impossible to debunk everything in these stories because they garble the technical details so much that it's impossible to know what the heck they are claiming.


Why did Cendyn change things after Alfa Bank was notified?

It's a curious coincidence that Cendyn changed their DNS records a couple days after the NYTimes contacted Alfa Bank.

But "coincidence" is all it is. I have years of experience with investigating data breaches. I know that such coincidences abound. There's always weird coincidence that you are certain are meaningful, but which by the end of the investigation just aren't.

The biggest source of coincidences is that IT is always changing things and always messing things up. It's the nature of IT. Thus, you'll always see a change in IT that matches some other event. Those looking for conspiracies ignore the changes that don't match, and focus on the one that does, so it looms suspiciously.

As I've mentioned before, I have a source that says Cendyn changed things around in June. This makes me believe that "Tea Leaves" is selectively presenting changes to highlight the one in September.

In any event, many people have noticed that the registrar email "Emily McMullin" has the same last name as Evan McMullin running against Trump in Utah. This supports my point: when you do hacking investigations, you find irrelevant connections all over the freakin' place.


"Experts stand by their analysis"

This new article states:
I’ve checked back with eight of the nine computer scientists and engineers I consulted for my original story, and they all stood by their fundamental analysis
Well, of course they don't want to look like idiots. But notice the subtle rephrasing of the question: the experts stand by their analysis. That doesn't mean the same thing as standing behind the reporter's analysis. The experts made narrow judgments, which even I stand behind as mostly correct given the data they had at the time. None of them were asked whether the entire conspiracy theory holds up.

What you should ask is people like Chris Davis or Paul Vixie whether they stand behind my analysis in the past two posts. Or really, ask any expert. I've documented things in sufficient clarity. For example, go back to Chris Davis and ask him again about the "limited IP address" theory, and whether it holds up against my scan of that data center above.


Conclusion

Other major news outlets all passed on the story, because even non-experts could tell it was flawed. The data means nothing. The Slate journalist nonetheless went forward with the story, tricking experts and finding some non-experts.

But as I've shown, given a complete technical analysis, the story falls apart. Most of what's strange is perfectly normal. The data itself (the DNS logs) is untrustworthy. The story treats unknown things (like why the mail server rejects connections) as "unknowable" things that confirm the conspiracy, when they are in fact simply things unknown at the moment, which become knowable with a little research.

What I show in my first post, and in this post, is more data. This data provides context. This data explains the unknowns that Slate presents. Moreover, you don't have to trust me -- anybody can replicate my work and see for themselves.









Tuesday, November 01, 2016

Debunking Trump's "secret server"

According to this Slate article, Trump has a secret server for communicating with Russia. Even Hillary has piled onto this story.

This is nonsense. The evidence available on the Internet is that Trump neither (directly) controls the domain "trump-email.com", nor has access to the server. Instead, the domain was set up and is controlled by Cendyn, a company that does marketing/promotions for hotels, including many of Trump's hotels. Cendyn outsources the email portions of its campaigns to a company called Listrak, which actually owns/operates the physical server in a data center in Philadelphia.