Friday, October 28, 2016

Configuring Raspberry Pi as a router

I'm setting up a little test network for IoT devices, one isolated a bit from my home network. This is a perfect job for a computer like the Raspberry Pi (or similar computers, such as the Odroid-C2, which is what I'm actually using here). I thought I'd blog the setup details in case anybody else wanted to set up their own isolated home network.

Monday, October 24, 2016

Lamers: the problem with bounties

In my last two posts, I pointed out that the anti-spam technique known as "DKIM" cryptographically verifies emails. This can be used to verify that some of the newsworthy emails are, indeed, correct and haven't been doctored. I offer a 1 btc (one bitcoin, around $600 at current exchange rates) bounty if anybody can challenge this assertion.

Unfortunately, bounties attract lamers who think they deserve the bounty. 

This guy insists he wins the bounty because he can add spaces to the email, and add fields like "Cc:" that DKIM doesn't check. Since DKIM ignores extra spaces and only checks important fields, these changes pass. The guy claims it's "doctored" because technically, he has changed things, even though he hasn't actually changed any of the important things (From, Date, Subject, and body content).
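To make the whitespace point concrete, here is a minimal Python sketch of the "relaxed" body canonicalization that DKIM defines (RFC 6376), followed by the SHA-256 body hash. It assumes the signer chose the relaxed mode. It is an illustration, not a full verifier: the names `relaxed_body` and `body_hash` are made up for this example, and header signing and the RSA check are left out entirely.

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Simplified 'relaxed' body canonicalization in the style of RFC 6376."""
    lines = body.split(b"\r\n")
    # Collapse runs of spaces/tabs to one space, then drop whitespace at line ends.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    # Every remaining line ends with CRLF.
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 of the canonicalized body (what the bh= tag carries)."""
    return base64.b64encode(hashlib.sha256(relaxed_body(body)).digest()).decode()


original = b"I eat salad.\r\n"
padded = b"I  eat\tsalad.   \r\n\r\n"     # extra spaces and a trailing blank line
doctored = b"I eat kittens.\r\n"          # an actual change in content

print(body_hash(original) == body_hash(padded))    # whitespace changes are ignored
print(body_hash(original) == body_hash(doctored))  # content changes are not
```

Extra spaces and trailing blank lines leave the hash unchanged, which is why they pass; changing a word in the body does not.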

No. This doesn't qualify for the bounty. It doesn't call into question whether the Wikileaks emails say what they appear to say. It's so obvious that people have already contacted me and passed on it, knowing it wouldn't win the bounty. If I paid out this bounty for this lameness, one of the 10 people who came up with the idea before this lamer would get this bounty, not him. It'd probably go to this guy -- who knows it's lame, who isn't seeking the bounty, but would get it anyway before the lamer above.

Let me get ahead of the lamers and point to more sophisticated stuff that also doesn't count. The following DKIM verified email appears to show Hillary admitting she eats kittens. This would be newsworthy if true, and a winner of this bounty if indeed it could trick people.
This is in fact also very lame. I mean, it's damn convincing, but only to lamers. You can see my trick by looking at the email on pastebin and comparing it to the original.

The trick is that I've added extra From/Subject fields before the DKIM header, so DKIM doesn't see them. DKIM only sees the fields after. It tricks other validation tools, such as this online validator. However, email readers (Thunderbird, Outlook, Apple Mail) see the first headers, and display something that DKIM hasn't checked.

I've taken a screenshot of the raw email to show both From fields:

Since I don't rely upon the "magic" of tools to verify DKIM, but look at the whole package, I'll see the added From/Subject fields. Far from fooling anybody, such modifications will be a smoking gun that somebody has attempted illicit modifications. Not just me: almost anybody viewing the "raw source" that Wikileaks provides would instantly see shenanigans.
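Spotting this by machine is easy too. Here is a rough Python sketch using the standard `email` module: it lists header fields that should appear once but show up more than once in the raw source. The function name `duplicated_headers`, the field list, and the sample message (with placeholder signature values) are assumptions for illustration, not anything taken from a real validator.

```python
import email

# Fields a well-formed message carries at most once.
SINGLETON_FIELDS = ("From", "Subject", "Date", "To")


def duplicated_headers(raw_message: str) -> dict:
    """Return {field: [values...]} for singleton fields that appear repeatedly."""
    msg = email.message_from_string(raw_message)
    found = {}
    for name in SINGLETON_FIELDS:
        values = msg.get_all(name, [])
        if len(values) > 1:
            found[name] = values
    return found


# Made-up message: forged fields sit ABOVE the signature, the signed ones below.
SAMPLE = (
    "From: forged@example.com\n"
    "Subject: I eat kittens\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=sel; h=from:subject; bh=PLACEHOLDER; b=PLACEHOLDER\n"
    "From: real@example.com\n"
    "Subject: Lunch plans\n"
    "\n"
    "See you at noon.\n"
)

print(duplicated_headers(SAMPLE))
# A naive reader that asks for "the" From field gets the first (forged) one:
print(email.message_from_string(SAMPLE)["From"])
```

Any message that trips this check deserves a look at the raw source before anybody trusts a "verified" badge.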

The Wikileaks emails can be verified with crypto, using DKIM. Anybody who can doctor an email in a way that calls this into question, such that they could pass something incriminating through ("I eat kittens"), wins the full bounty. Any good attempt, with something interesting and innovative, wins a partial bounty.

Lamers, though, are unwelcome.

BTW, the same is true for bug bounties. Bounties are becoming standard in the infosec industry, whereby companies give people money if they can show ways hackers might hack products. Paying bounties allows companies to fix products before they actually get hacked. Even the U.S. military offers bounties if you can show ways to hack their computers. Lamers are a pox on bug bounty systems -- administrators must spend more time telling lamers to go away than they spend on dealing with real issues. No bounty rules are so tight that lamers can't find a way to subvert the rules without finding anything that matches the intent.

Sunday, October 23, 2016

Politifact: Yes we can fact check Kaine's email

This Politifact post muddles over whether the Wikileaks leaked emails have been doctored, specifically the one about Tim Kaine being picked a year ago. The post is wrong -- we can verify this email and most of the rest.

In order to block spam, emails nowadays contain a form of digital signature that verifies their authenticity. This is automatic; it happens on most modern email systems, without users being aware of it.

This means we can indeed validate most of the Wikileaks leaked DNC/Clinton/Podesta emails. There are many ways to do this, but the easiest is to install the popular Thunderbird email app along with the DKIM Verifier addon. Then go to the Wikileaks site and download the raw source of the email.

As you see in the screenshot below, the DKIM signature verifies as true.

If somebody doctored the email, such as changing the date, then the signature would not verify. I try this in the email below, changing the date from 2015 to 2016. This causes the signature to fail.

There are some ways to forge DKIM-signed emails, specifically if the sender uses short keys. When short keys are used, hackers can "crack" them, and sign fraudulent emails. This doesn't apply to GMail, which uses strong 2048-bit keys, as demonstrated in the following screenshot. (No, the average person isn't supposed to understand this screenshot, but experts can.)

What this means is that the only way this email could've been doctored is if there has been an enormous, nation-state level hack of Google to steal their signing key. It's possible, of course, but extraordinarily improbable. It's conspiracy-theory level thinking. Google GMail has logs of which emails went through its systems -- if there was a nation-state attack able to forge them, Google would know, and they'd be telling us. (For one thing, they'd be forcing password resets on all our accounts).

Since DKIM verifies this email and most of the others, we conclude that Kaine is "pants on fire" lying about this specific email, and "mostly untrue" in his claim that the Wikileaks emails have been doctored.

On the other hand, Wikileaks only shows us some of the emails. We don't see context. We don't see other staffers certain it's going to be somebody else for VP. We don't see related email discussions that cast this one in a different light. So of course whether this (verified) email means they'd firmly chosen Kaine is "mostly unproven". The purpose of this document isn't diagnosing what the emails mean, only the claims by Hillary's people that these emails have been "doctored".

As a side note, I offer a 1-BTC (one bitcoin, ~$600 at today's exchange rate) bounty to anybody who can prove me wrong. If you can doctor the above email, then you win the bounty. Some rules apply (i.e. it needs to be a real doctored email, not a trick). I offer this bounty because already people are trying to cast doubt on whether DKIM works, without offering any evidence. Put up or shut up. Lamers aren't welcome.

Friday, October 21, 2016

Yes, we can validate the Wikileaks emails

Recently, WikiLeaks has released emails from Democrats. Many have repeatedly claimed that some of these emails are fake or have been modified, that there's no way to validate each and every one of them as being true. Actually, there is, using a mechanism called DKIM.

Some notes on today's DNS DDoS

Some notes on today's DNS outages due to DDoS.

We lack details. As a techy, I want to know the composition of the traffic. Is it blindly overflowing incoming links with junk traffic? Or is it cleverly sending valid DNS requests, overloading the ability of servers to respond, and overflowing outgoing links (as responses are five times or more as big as requests)? Such techy details and more make a big difference. Was Dyn the only target? Why were non-Dyn customers affected?

Nothing to do with the IANA handover. So this post blames Obama for handing control of DNS to the Russians, or some such. It's silly, and not a shred of truth to it. For the record, I'm (or was) a Republican and opposed handing over the IANA. But the handover was a symbolic transition of a minor clerical function to a body that isn't anything like the U.N. The handover has nothing to do with either Obama or today's DDoS. There's no reason to blame this on Obama, other than the general reason that he's to blame for everything bad that happened in the last 8 years.

It's not a practice attack. A Bruce Schneier post created the idea of hackers doing "practice" DDoS. That's not how things work. Using a botnet for DDoS always degrades it, as owners of machines find the infections and remove them. The people getting the most practice are the defenders, who learn more from the incident than the attackers do.

It's not practice for Nov. 8. I tweeted a possible connection to the election because I thought it'd be self-evidently a troll, but a lot of good, intelligent, well-meaning people took it seriously. A functioning Internet is not involved in counting the votes anywhere, so it's hard to see how any Internet attack can "rig" the election. DDoSing news sources like CNN might be fun -- a blackout of news might make some people go crazy and riot in the streets. Imagine if Twitter went down while people were voting. With this said, we may see DDoS anyway -- lots of kids control large botnets, so it may happen on election day because they can, not because it changes anything.

Dyn stupidly uses BIND. According to "version.bind" queries, Dyn (the big DNS provider that is a major target) uses BIND. This is the most popular DNS server software, but it's wrong. It's 10x to 100x slower than alternatives, meaning that they need up to 100x more server hardware in order to deal with DDoS attacks. BIND is also 10x more complex -- it strives to be the reference implementation that contains all DNS features, rather than a simple bit of software that just handles this one case. BIND should never be used for Internet-facing DNS; packages like KnotDNS and NSD should be used instead.
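For reference, a "version.bind" query is just an ordinary DNS question for a TXT record in the CHAOS class, the same question `dig` asks when given `version.bind txt chaos`. Here is a small Python sketch that builds the packet bytes. It does not send anything, and the function name and query ID are arbitrary choices for illustration.

```python
import struct

QTYPE_TXT = 16   # TXT record type
QCLASS_CH = 3    # CHAOS class (normal lookups use class IN = 1)


def build_version_bind_query(query_id: int = 0x1234) -> bytes:
    """Build the raw bytes of a DNS query for version.bind TXT CH."""
    # Header: ID, flags (plain query), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Name: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label for label in (b"version", b"bind")
    ) + b"\x00"
    question = qname + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)
    return header + question


packet = build_version_bind_query()
print(len(packet), packet.hex())
```

Sent over UDP to port 53 of a name server, this asks the software to identify itself. Operators can disable or fake the answer, so a reply is a hint rather than proof.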

Fixing IoT. The persistent rumor is that an IoT botnet is being used. So everybody is calling for regulations to secure IoT devices. This is extraordinarily bad. First of all, most of the devices are made in China and shipped to countries outside the United States, so there's little effect our regulations can have. Except they would essentially kill the Kickstarter community coming up with innovative IoT devices. Only very large corporations can afford the regulatory burden involved. Moreover, it's unclear what "security" means. There's no real bug/vulnerability being exploited here other than default passwords -- something even the US government has at times refused to recognize as a security "vulnerability".

Fixing IoT #2. People have come up with many ways default passwords might be solved, such as having a sticker on the device with a randomly generated password. Getting the firmware to match a printed sticker during manufacturing is a hard, costly problem. I mean, they do it all the time for other reasons, but it starts to become a burden for cheaper devices. But in any event, the correct solution is connecting via Bluetooth. That seems to be the most popular solution these days from Wemo to Echo. Most of the popular WiFi chips come with Bluetooth, so it's really no burden to make devices this way.

It's not IoT. The Mirai botnet primarily infected DVRs connected to security cameras. In other words, it didn't infect baby monitors or other IoT devices inside your home, which are protected by your home firewall anyway. Instead, Mirai infected things that were outside in the world that needed their own IP address.

DNS failures cause email failures. When DNS goes down, legitimate email gets reclassified as spam, and dropped by spam filters.

It's all about that TTL. You don't contact a company's DNS server directly. Instead, you contact your ISP's "cache". How long something stays in that cache is determined by what's known as the TTL or "time to live". Long TTLs mean that if a company wants to move servers around, they'll have to wait until caches have finally aged out old data. Short TTLs mean changes propagate quickly. Any company that had 24 hours as their TTL was mostly unaffected by the attack. Twitter has a TTL of 205 seconds, meaning it takes under 4 minutes of DDoS against the DNS server to take Twitter offline. One strategy, which apparently Cisco OpenDNS uses, is to retain old records in its cache if it can't find new ones, regardless of the TTL. Using their servers, instead of your ISP's, can fix DNS DDoS for you.
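The "retain old records" strategy can be sketched in a few lines of Python. This is not OpenDNS's actual implementation, which isn't described here; the class name `StaleFriendlyCache`, the `lookup` callback, and the choice of `OSError` as the failure signal are all assumptions for illustration.

```python
import time


class StaleFriendlyCache:
    """Resolver cache that keeps serving expired records when lookups fail."""

    def __init__(self, lookup, clock=time.monotonic):
        # lookup(name) -> (value, ttl_seconds); assumed to raise OSError when
        # the authoritative server can't be reached.
        self._lookup = lookup
        self._clock = clock
        self._entries = {}  # name -> (value, expires_at)

    def resolve(self, name):
        entry = self._entries.get(name)
        now = self._clock()
        if entry and now < entry[1]:
            return entry[0]              # still fresh: answer from cache
        try:
            value, ttl = self._lookup(name)
        except OSError:
            if entry:
                return entry[0]          # expired, but better than nothing
            raise                        # never seen this name: nothing to serve
        self._entries[name] = (value, now + ttl)
        return value
```

A strict cache would fail as soon as the 205 seconds ran out. This one keeps answering with the last known address for as long as the authoritative server stays unreachable.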

Why not use anycast?

The attack took down only east-coast operations, attacking only part of Dyn's infrastructure located there. Other DNS providers, such as Google's famed resolver, do not have a single location. They instead use anycasting, routing packets to one of many local servers, in many locations, rather than a single server in one location. In other words, if you are in Australia and use Google's resolver, you'll be sending requests to a server located in Australia, and not in Google's headquarters.

The problem with anycasting is it technically only works for UDP. That's because each packet finds its own way through the Internet. Two packets sent back-to-back may, in fact, hit different servers. This makes it impossible to establish a TCP connection, which requires all packets be sent to the same server. Indeed, when I test it here at home, I get back different responses to the same DNS query done back-to-back, hinting that my request is being handled by different servers.

Historically, DNS has used only UDP, so that hasn't been a problem. It still isn't a problem for "root servers", which serve only simple responses. However, it's becoming a problem for normal DNS servers, which give complex answers that can require multiple packets to hold a response. This is true for DNSSEC and things like DKIM (email authentication). That TCP might sometimes fail therefore means things like email authentication sometimes fail. That it will probably work 99 times out of 100 means that 1% of the time it fails -- which is unacceptable.
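The mechanism behind this: when an answer doesn't fit in the UDP response, the server sets the TC ("truncated") bit in the DNS header, and the client is expected to retry the same query over TCP. A minimal Python sketch of that check follows. The function name is illustrative and the sample headers are hand-built, not captured traffic.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit within the 16-bit DNS flags field


def needs_tcp_retry(udp_response: bytes) -> bool:
    """Return True if a UDP DNS response says the full answer didn't fit."""
    if len(udp_response) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    _query_id, flags = struct.unpack("!HH", udp_response[:4])
    return bool(flags & TC_FLAG)


# Hand-built headers: a normal response, and the same with the TC bit set.
complete = struct.pack("!HHHHHH", 1, 0x8180, 1, 1, 0, 0)
truncated = struct.pack("!HHHHHH", 1, 0x8180 | TC_FLAG, 1, 0, 0, 0)
print(needs_tcp_retry(complete), needs_tcp_retry(truncated))
```

It's that TCP retry, not the original UDP query, that anycast can break.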

There are ways around this. An anycast system could handle UDP directly and pass all TCP to a centralized server somewhere, for example. This allows UDP at max efficiency while still correctly handling the uncommon TCP. The point is, though, that for Dyn to make anycast work requires careful thinking and engineering. It's not a simple answer.

Wednesday, October 19, 2016

Cliché: Security through obscurity (again)

This post keeps popping up in my timeline. It's wrong. The phrase "security through/by obscurity" has become such a cliché that it's lost all meaning. When somebody says it, they are almost certainly saying a dumb thing, regardless of whether they support it or are trying to debunk it.

Why cybersecurity certifications suck

Here's a sample question from a GIAC certification test. It demonstrates why such tests suck.
The important deep knowledge you should know about traceroute is how it sends packets with increasing TTLs to trace the route.

But that's not what the question is asking. Instead, it's asking superfluous information about the default behavior, namely about Linux defaults. It's a trivia test, not a knowledge test. If you've recently studied the subject, your course book probably tells you that Linux traceroute defaults to UDP packets on transmit. So, those who study for the test will do well on the question.

But those with either a lot of deep knowledge or practical experience will find this question harder. Windows and Linux use different defaults (Windows uses ICMP ECHOs, Linux uses UDP). Personally, I'm not sure which is which (well, I am now, 'cause I looked it up, but I'm likely to forget it again soon, because it's a relatively unimportant detail).

Those with deep knowledge have another problem with the word "protocol". This question uses "protocol" in one sense, where only UDP, TCP, and ICMP are valid "protocols".

But the word can be used in another sense, where "Echo" and "TTL" are also valid "protocols". A protocol is a set of rules that govern things. Thus we say phrases like "slow start protocol" for how TCP handles initial congestion, even though this "protocol" has no protocol header or particular fields. In much the same way, TTL is a "protocol" or "set of rules" for handling routing loops that traceroute exploits. That Linux uses the TTL protocol when transmitting packets is a perfectly valid answer to this question, albeit not the conventional one.
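The TTL rule that traceroute exploits is simple enough to simulate. The following Python sketch sends no packets at all: it models a path as a list of router names (all made up) and shows how probes with increasing TTLs reveal one hop at a time. Whether the probe is UDP or an ICMP echo doesn't enter into it, which is rather the point.

```python
def forward(routers, destination, ttl):
    """Return whichever node answers a probe sent with the given TTL."""
    for router in routers:
        ttl -= 1                # every router decrements the TTL
        if ttl == 0:
            return router       # TTL expired here: this router reports back
    return destination          # probe survived every hop


def simulate_traceroute(routers, destination, max_ttl=30):
    """Send probes with TTL 1, 2, 3... until the destination answers."""
    discovered = []
    for ttl in range(1, max_ttl + 1):
        responder = forward(routers, destination, ttl)
        discovered.append(responder)
        if responder == destination:
            break
    return discovered


print(simulate_traceroute(["home-router", "isp-edge", "backbone"], "target"))
```

The same decrement-and-drop rule is what stops a packet from circling forever in a routing loop.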

Exams suck because those writing the exams themselves often lack experience and deep knowledge. They are only one short step ahead of their students.

This leaves such tests prejudiced toward those who have recently read (and who are likely soon to forget) a textbook. The tests are prejudiced against those whom the tests are intended to highlight, those with experience and deep knowledge.

I'm not really trying to beat up on the GIAC tests here. I'm simply demonstrating the problem in our industry. We want to be able to certify people like doctors and lawyers, real "professions" where if things go wrong, people's lives can be ruined. We are far from that. All certification tests are entry-level only. Our trade has not existed long enough to become a full trustworthy "profession".

Tuesday, October 18, 2016

Trump on cybersecurity: vacuous and populist

Trump has published his policy on cybersecurity. It demonstrates that he and his people do not understand the first thing about cybersecurity.

Specifically, he wants “the best defense technologies” and “cyber awareness training for all government employees”. These are well known bad policies in the cybersecurity industry. They are the sort of thing the intern with a degree from Trump University would come up with.

Awareness training is the knee-jerk response to any problem. Employees already spend a lot of their time doing mandatory training for everything from environmentally friendly behavior, to sexual harassment, to Sarbanes-Oxley financial compliance, to cyber-security. None of it has proven effective, but organizations continue to force it, either because they are required to, or they are covering their asses. No amount of training employees to not click on email attachments helps. Instead, the network must be secure enough that reckless clicking on attachments poses no danger.

Belief in a technological Magic Pill that will stop hackers is common among those who know nothing about cybersecurity. Such pills don’t exist. The least secure networks already have “the best defense technologies”. Things like anti-virus, firewalls, and intrusion prevention systems do not stop hackers by themselves – but are instead tools that knowledgeable teams use in order to make their jobs easier. It’s like how a chisel doesn’t make a sculpture by itself, but is instead just a tool used by the artist. The government already has all the technology it needs. Its problems instead derive from the fact that they try to solve their problems the way Trump does – by assigning the task to some Trump University intern.

Lastly, Trump suggests that on the offensive side, we need to improve our offensive abilities, in order to create a cyber deterrence. We already do that. The United States is by far the #1 nation in offensive capabilities. In 2015, Obama forced China to the table, to sign an agreement promising they’d stop hacking us. Since then, China has kept the agreement, and has dropped out of the news as being the source of cyber attacks. Privately, many people in government tell me it’s because we did some major cyber attack in China that successfully deterred them.

Trump promises to be a strong leader who hires effective people. He demonstrates this nowhere. In my area of expertise, he and his people demonstrate a shocking ignorance of the issues. It's typical populist rhetoric: when China and Russia rape our computers, he'll blame it on some sort of rigged system, not his own incompetence.

Disclaimer: I don't care about Trump's locker room comments, or any of the other things that get the mass media upset. I oppose Trump because he's a vacuous populist, as I demonstrate here.

Wednesday, October 12, 2016

WTF Yahoo/FISA search in kernel?

A surprising detail in the Yahoo/FISA email search scandal is that they do it with a kernel module. I thought I’d write up some (rambling) notes.

What the government was searching for

As described in the previous blog post, we’ll assume the government is searching for the following string, and possibly other strings like it within emails:

### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###

I point this out because it’s a simple search that identifies things. It’s not natural language processing. It’s not searching for phrases like “bomb president”.

Also, it's not AV/spam/childporn processing. Those look at different things. For example, filtering messages containing childporn involves calculating a SHA2 hash of email attachments and looking up the hashes in a table of known bad content (or even more in-depth analysis). This is quite different from searching.
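The difference between the two is easy to see side by side. Here is a hedged Python sketch: the function names and the `known_bad_hashes` set are hypothetical, and a real filter would do far more than this.

```python
import hashlib

SIGNATURE = b"### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###"


def contains_signature(message_body: bytes) -> bool:
    """Searching: is the fixed string present anywhere in the content?"""
    return SIGNATURE in message_body


def is_known_bad(attachment: bytes, known_bad_hashes: set) -> bool:
    """Hash lookup: is this exact attachment in a table of known content?"""
    return hashlib.sha256(attachment).hexdigest() in known_bad_hashes


body = b"Hi,\r\n" + SIGNATURE + b"\r\nQUJDREVG...\r\n"
print(contains_signature(body))

table = {hashlib.sha256(b"some known file").hexdigest()}
print(is_known_bad(b"some known file", table))
print(is_known_bad(b"some known file!", table))  # one byte differs: no match
```

Searching finds a pattern wherever it sits inside arbitrary content. The hash lookup only matches a whole attachment that is byte-for-byte identical to something already in the table.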

The Kernel vs. User Space

Operating systems have two parts, the kernel and user space. The kernel is the operating system proper (e.g. the “Linux kernel”). The software we run is in user space, such as browsers, word processors, games, web servers, databases, GNU utilities [sic], and so on.

The kernel has raw access to the machine, memory, network devices, graphics cards, and so on. User space has virtual access to these things. The user space is the original “virtual machines”, before kernels got so bloated that we needed a third layer to virtualize them too.

This separation between kernel and user has two main benefits. The first is security, controlling which bit of software has access to what. It means, for example, that one user on the machine can’t access another’s files. The second benefit is stability: if one program crashes, the others continue to run unaffected.

Downside of a Kernel Module

Writing a search program as a kernel module (instead of a user space module) defeats the benefits of user space programs, making the machine less stable and less secure.

Moreover, the sort of thing this module does (parsing emails) has a history of big gaping security flaws. Parsing stuff in the kernel makes cybersecurity experts run away screaming in terror.

On the other hand, people have been doing security stuff (SSL implementations and anti-virus scanning) in the kernel in other situations, so it’s not unprecedented. I mean, it’s still wrong, but it’s been done before.

Upside of a Kernel Module

If doing this as a kernel module (instead of in user space) is so bad, then why does Yahoo do it? It’s probably due to the widely held, but false, belief that putting stuff in the kernel makes it faster.

Everybody knows that kernels are faster, for two reasons. First is that as a program runs, making a system call switches context, from running in user space to running in kernel space. This step is expensive/slow. Kernel modules don’t incur this expense, because code just jumps from one location in the kernel to another. The second performance issue is virtual memory, where reading memory requires an extra step in user space, to translate the virtual memory address to a physical one. Kernel modules access physical memory directly, without this extra step.

But everyone is wrong. Using features like hugepages gets rid of the cost of virtual memory translation. There are ways to mitigate the cost of user/kernel transitions, such as moving data in bulk instead of a little bit at a time. Also, CPUs have improved in recent years, dramatically reducing the cost of a kernel/user transition.
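The "move data in bulk" point can be illustrated by counting transitions. In this rough Python sketch, each `os.read` call corresponds to one `read` system call, meaning one trip from user space into the kernel and back. The function name is invented for the example, and it assumes a POSIX-style system and a regular file.

```python
import os


def count_read_calls(path: str, chunk_size: int) -> int:
    """Read a whole file and return how many read() system calls it took."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)   # one user/kernel transition
            calls += 1
            if not data:                     # empty result means end of file
                break
    finally:
        os.close(fd)
    return calls
```

Reading a 4096-byte file one byte at a time costs 4097 calls (the last one just reports end-of-file), while reading it in a single 4096-byte chunk costs 2. The data moved is identical; only the number of transitions changes.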

The problem we face, though, is inertia. Everyone knows moving modules into the kernel makes things faster. It's hard getting them to un-learn what they've been taught.

Also, following this logic, Yahoo may already have many email handling functions in the kernel. If they've already gone down the route of bad design, then they'd have to do this email search as a kernel module as well, to avoid the user/kernel transition cost.

Another possible reason for the kernel-module is that it’s what the programmers knew how to do. That’s especially true if the contractor has experience with other kernel software, such as NSA implants. They might’ve read Phrack magazine on the topic, which might have been their sole education on the subject.

How it was probably done

I don’t know Yahoo’s infrastructure. Presumably they have front-end systems designed to balance the load (and accelerate SSL processing), and back-end systems that do the heavy processing, such as spam and virus checking.

The typical way to do this sort of thing (search) is simply tap into the network traffic, either as a separate computer sniffing (eavesdropping on) the network, or something within the system that taps into the network traffic, such as a netfilter module. Netfilter is the Linux firewall mechanism, and has ways to easily “hook” into specific traffic, either from user space or from a kernel module. There is also a related user space mechanism of hooking network APIs like recv() with a preload shared library.

This traditional mechanism doesn’t work as well anymore. For one thing, incoming email traffic is likely encrypted using SSL (using STARTTLS, for example). For another thing, companies are increasingly encrypting intra-data-center traffic, either with SSL or with hard-coded keys.

Therefore, instead of tapping into network traffic, the code might tap directly into the mail handling software. A good example of this is Sendmail’s milter interface, that allows the easy creation of third-party mail filtering applications, specifically for spam and anti-virus.

But it would be insane to write a milter as a kernel module, since mail handling is done in user space, thus adding unnecessary user/kernel transitions. Consequently, we make the assumption that Yahoo’s intra-data-center traffic is unencrypted, and that for the FISA search thing, they wrote something like a kernel-module with netfilter hooks.

How it should’ve been done

Assuming the above guess is correct, that they used kernel netfilter hooks, there are a few alternatives.

They could do user space netfilter hooks instead, but they do have a performance impact. They require a transition from the kernel to user, then a second transition back into the kernel. If the system is designed for high performance, this might be a noticeable performance impact. I doubt it, as it’s still small compared to the rest of the computations involved, but it’s the sort of thing that engineers are prejudiced against, even before they measure the performance impact.

A better way of doing it is hooking the libraries. These days, most software uses shared libraries (.so) to make system calls like recv(). You can write your own shared library, and preload it. When the library function is called, you do your own processing, then call the original function.

Hooking the libraries then lets you tap into the network traffic, but without any additional kernel/user transition.

Yet another way is simple changes in the mail handling software that allows custom hooks to be written.

Third party contractors

We’ve been thinking in terms of technical solutions. There is also the problem of politics.

Almost certainly, the solution was developed by outsiders, by defense contractors like Booz-Allen. (I point them out because of the whole Snowden/Martin thing). This restricts your technical options.

You don’t want to give contractors access to your source code. Nor do you want the contractors to be making custom changes to your source code, such as adding hooks. Therefore, you are looking at external changes, such as hooking the network stack.

The advantage of a netfilter hook in the kernel is that it has the least additional impact on the system. It can be developed and thoroughly tested by Booz-Allen, then delivered to Yahoo!, who can then install it with little effort.

This is my #1 guess why this was a kernel module – it allowed the most separation between Yahoo! and a defense contractor who wrote it. In other words, there is no technical reason for it -- but a political reason.

Let’s talk search

There are two ways to search things: using an NFA and using a DFA.

An NFA is the normal way of using regex, or grep. It allows complex patterns to be written, but it requires a potentially large amount of CPU power (i.e. it’s slow). It also requires backtracking within a message, thus meaning the entire email must be reassembled before searching can begin.

The DFA alternative instead creates a large table in memory, then does a single pass over a message to search. Because it does only a single pass, without backtracking, the message can be streamed through the search module, without needing to reassemble the message. In theory, anything searched by an NFA can be searched by a DFA, though in practice some unbounded regex expressions require too much memory, so DFAs usually require simpler patterns.

The DFA approach, by the way, runs at about 4 Gbps per 2.x-GHz Intel x86 server CPU. Because no reassembly is required, it can tap directly into anything above the TCP stack, like netfilter. Or, it can tap below the TCP stack (like libpcap), but would require some logic to re-order/de-duplicate TCP packets, to present the same ordered stream as TCP.

DFAs would therefore require little or no memory for buffering messages, beyond the lookup table itself. In contrast, the NFA approach will require more CPU and memory just to reassemble email messages, and the search itself would also be slower.
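Here is a minimal Python sketch of the DFA idea, using the textbook Knuth-Morris-Pratt automaton for a single fixed string. It's an illustration under my own naming (`build_dfa`, `StreamSearcher`), not the code anybody actually deployed, and a real engine would compile many patterns into one table. What it demonstrates is the property that matters here: the only thing carried from one chunk to the next is a single state number, so nothing needs to be reassembled.

```python
SIGNATURE = b"### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###"


def build_dfa(pattern: bytes):
    """Build a table where dfa[state][byte] gives the next state."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    dfa = [[0] * 256 for _ in range(len(pattern))]
    dfa[0][pattern[0]] = 1
    restart = 0  # state we'd be in after feeding pattern[1:j]
    for j in range(1, len(pattern)):
        dfa[j] = dfa[restart][:]           # on mismatch, behave like restart state
        dfa[j][pattern[j]] = j + 1         # on match, advance
        restart = dfa[restart][pattern[j]]
    return dfa


class StreamSearcher:
    """Single-pass search over a stream delivered in arbitrary chunks."""

    def __init__(self, pattern: bytes):
        self._dfa = build_dfa(pattern)
        self._accept = len(pattern)
        self._state = 0
        self.found = False

    def feed(self, chunk: bytes) -> bool:
        if self.found:
            return True
        for byte in chunk:                 # one table lookup per byte, no backtracking
            self._state = self._dfa[self._state][byte]
            if self._state == self._accept:
                self.found = True
                break
        return self.found


searcher = StreamSearcher(SIGNATURE)
print(searcher.feed(b"hello ### Begin ASRAR El Moj"))            # pattern split mid-way
print(searcher.feed(b"ahedeen v2.0 Encrypted Message ### bye"))  # completed in next chunk
```

The table costs 256 entries per pattern byte, which is the "large table in memory"; the per-connection cost is one integer.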

The naïve approach to searching is to use NFAs. It’s what most people start out with. The smart approach is to use DFAs. You see that in the evolution of the Snort intrusion detection engine, where they started out using complex NFAs and then over the years switched to the faster DFAs.

You also see it in the network processor market. These are specialized CPUs designed for things like firewalls. They advertise fast regex acceleration, but what they really do is just convert NFAs into something that is mostly a DFA, which you can do on any processor anyway. I have a low opinion of network processors, since what they accelerate are bad decisions. Correctly designed network applications don’t need any special acceleration, except maybe SSL public-key crypto.

So, what the government’s code needs to do is a very lightweight parse of the SMTP protocol in order to extract the from/to email addresses, then a very lightweight search of the message’s content in order to detect if any of the offending strings have been found. When the pattern is found, it then reports the addresses it found.
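As a rough idea of how little code that is, here is a Python sketch that walks the client side of an SMTP session, notes the envelope addresses, and checks the message content for the signature. Everything about it is an assumption for illustration: the function name `scan_session`, the input format (lines already split, CRLF removed), and the simplifications (one message per session, no dot-stuffing, a signature that doesn't span lines).

```python
import re

SIGNATURE = b"### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###"
ENVELOPE = re.compile(rb"^(MAIL FROM|RCPT TO):\s*<([^>]*)>", re.IGNORECASE)


def scan_session(client_lines):
    """Return (sender, recipients) if the message contains the signature, else None."""
    sender = None
    recipients = []
    in_data = False
    hit = False
    for line in client_lines:
        if in_data:
            if line == b".":               # lone dot ends the message content
                in_data = False
            elif SIGNATURE in line:
                hit = True
            continue
        if line.upper() == b"DATA":        # message content follows
            in_data = True
            continue
        match = ENVELOPE.match(line)
        if match:
            if match.group(1).upper() == b"MAIL FROM":
                sender = match.group(2)
            else:
                recipients.append(match.group(2))
    return (sender, recipients) if hit else None


session = [
    b"EHLO mail.example.org",
    b"MAIL FROM:<alice@example.org>",
    b"RCPT TO:<bob@mail.example.com>",
    b"DATA",
    b"Subject: hi",
    b"",
    SIGNATURE,
    b"QUJDREVG",
    b".",
    b"QUIT",
]
print(scan_session(session))
```

A production version would swap the `in` test for the streaming DFA and handle the messy corners of SMTP, but the shape stays the same: a trivial parse, a trivial search, and a report.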


I don’t know Yahoo’s system for processing incoming emails. I don’t know the contents of the court order forcing them to do a search, and what needs to be secret. Therefore, I’m only making guesses here.

But they are educated guesses. In 9 times out of 10 in situations similar to Yahoo, I’m guessing that a “kernel module” would be the most natural solution. It’s how engineers are trained to think, and it would likely be the best fit organizationally. Sure, it really REALLY annoys cybersecurity experts, but nobody cares what we think, so that doesn’t matter.

Thursday, October 06, 2016

What the Yahoo NSA might've looked for

The vague story about Yahoo searching emails for the NSA was cleared up today with various stories from other outlets [1]. It seems clear a FISA court order was used to compel Yahoo to search all their customers' email for a pattern (or patterns). But there's an important detail still missing: what specifically were they searching for? In this post, I give an example.

The NYTimes article explains the search thusly:
Investigators had learned that agents of the foreign terrorist organization were communicating using Yahoo’s email service and with a method that involved a “highly unique” identifier or signature, but the investigators did not know which specific email accounts those agents were using, the officials said.
What they are likely referring to is software like "Mujahideen Secrets", which terrorists have been using for about a decade to encrypt messages. It includes a unique fingerprint/signature that can easily be searched for, as shown below.

In the screenshot below, I use this software to type in a secret message:

I then hit the "encrypt" button, and get the following, a chunk of random looking text:

This software encrypts, but does not send/receive messages. You have to do that manually yourself. It's intended that terrorists will copy/paste this text into emails. They may also paste the messages into forum posts. Encryption is so good that nobody, not even the NSA, can crack properly encrypted messages, so it's okay to post them to public forums, and still maintain secrecy.

In my case, I copy/pasted this encrypted message into an email message from one of my accounts and sent it to one of my Yahoo! email accounts. I received the message shown below:

The obvious "highly unique signature" the FBI should be looking for, to catch this software, is the string:
### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###
Indeed, if this is the program the NSA/FBI was looking for, they've now caught this message in their dragnet of incoming Yahoo! mail. This is a bit creepy, which is why I added a plea to the message, in unencrypted form, asking them not to rendition or drone strike me. Since the NSA can use such signatures to search traffic from websites, as well as email traffic, there's a good chance you've been added to their "list" simply for reading this blog post. For fun, send this blogpost to family or friends you don't particularly like, in order to get them on the watch list as well.

The thing to note about this is that the string is both content and metadata. As far as the email system is concerned, it is content like anything else you might paste into a message. As far as the terrorists are concerned, the content is encrypted, and this string is just metadata describing how the content was encrypted. I suspect the FISA court might consider content and metadata differently, and that they might issue such an order to search for this metadata while not being willing to order searches of patterns within content.

Regardless of what FISA decides, though, this is still mass surveillance of American citizens. All Yahoo! mail is scanned for such a pattern. I'm not sure how this can possibly be constitutional. Well, I do know how -- we can't get any details about what the government is doing, because national security, and thus we have no "standing" in the court to challenge what they are doing.

Note that one reason Yahoo! may have had to act in 2015 is because after the Snowden revelations, and at the behest of activists, email providers started to use STARTTLS encryption between email servers. If the NSA had servers passively listening to email traffic before, they'd need to be replaced with a new system that tapped more actively into the incoming email stream, behind the initial servers. Thus, we may be able to blame activists for this system (or credit, as the case may be :).

In any case, while the newer stories do a much better job of describing what details are available, no story is complete on this issue. This blogpost suggests one possible scenario that matches the available descriptions, to show more concretely what's going on.

If you want to be a troublemaker, add the above string as your email signature, so that it gets sent as part of every email you send. It's hard to imagine the NSA or GCHQ aren't looking for this string, so it'll jam up their system.

Tuesday, October 04, 2016

The Yahoo-email-search story is garbage

Joseph Menn (Reuters) is reporting that Yahoo! searched emails for the NSA. The details of the story are so mangled that it's impossible to say what's actually going on.

The first paragraph says this:
Yahoo Inc last year secretly built a custom software program to search all of its customers' incoming emails
The second paragraph says this:
The company complied with a classified U.S. government demand, scanning hundreds of millions of Yahoo Mail accounts
Well? Which is it? Did they "search incoming emails" or did they "scan mail accounts"? Whether we are dealing with emails in transit, or stored on the servers, is a BFD (Big Fucking Detail) that you can't gloss over and confuse in a story like this. Whether searches are done indiscriminately across all emails, or only for specific accounts, is another BFD.

The third paragraph seems to resolve this, but it doesn't:
Some surveillance experts said this represents the first case to surface of a U.S. Internet company agreeing to an intelligence agency's request by searching all arriving messages, as opposed to examining stored messages or scanning a small number of accounts in real time.
Who are these "some surveillance experts"? Why is the story keeping their identities secret? Are they some whistleblowers afraid for their jobs? If so, then that should be mentioned. In reality, they are unlikely to be real surveillance experts, but just some random person that knows slightly more about the subject than Joseph Menn, and their identities are being kept secret in order to prevent us from challenging these experts -- which is a violation of journalistic ethics.

And, are they analyzing the raw information the author sent them? Or are they opining on the garbled version of events that we see in the first two paragraphs?

The confusion continues:
It is not known what information intelligence officials were looking for, only that they wanted Yahoo to search for a set of characters. That could mean a phrase in an email or an attachment, said the sources, who did not want to be identified.
What the fuck is a "set of characters"??? Is this an exact quote from somewhere? Or something the author of the story made up? The clarification of what this "could mean" doesn't clear this up, because if that's what it "actually means", then why not say this to begin with?

It's not just technical terms, but also legal ones:
The request to search Yahoo Mail accounts came in the form of a classified edict sent to the company's legal team, according to the three people familiar with the matter.
What the fuck is a "classified edict"? An NSL? A FISA court order? What? This is also a BFD.

We outsiders already know about the NSA/FBI's ability to ask for strong selectors (email addresses). What we don't know about is their ability to search all emails, regardless of account, for arbitrary keywords/phrases. If that's what's going on, then this would be a huge story. But the story doesn't make it clear that this is actually what's going on -- just strongly implies it.

There are many other ways to interpret this story. For example, the government may simply be demanding that when Yahoo satisfies demands for emails (based on email addresses), that it does so from the raw incoming stream, before it hits spam/malware filters. Or, they may be demanding that Yahoo satisfies their demands with more secrecy, so that the entire company doesn't learn of the email addresses that a FISA order demands. Or, the government may be demanding that the normal collection happen in real time, in the seconds that emails arrive, instead of minutes later.

Or maybe this isn't an NSA/FISA story at all. Maybe the DHS has a cybersecurity information sharing program that distributes IoCs (indicators of compromise) to companies under NDA. Because it's a separate program under NDA, Yahoo would need to set up an email malware scanning system separate from their existing malware system in order to use those IoCs. (@declanm's stream has further variations on this scenario).

My point is this: the story is full of mangled details that really tell us nothing. I can come up with multiple, unrelated scenarios that are consistent with the content in the story. The story certainly doesn't say that Yahoo did anything wrong, or that the government is doing anything wrong (at least, wronger than we already know).

I'm convinced the government is up to no good, strong arming companies like Yahoo into compliance. The thing that's stopping us from discovering malfeasance is poor reporting like this.

Saturday, October 01, 2016

No, Trump's losses don't allow tax avoidance

The New York Times is reporting that Trump lost nearly a billion dollars in 1995, and this would enable tax avoidance for 18 years. No, it doesn't allow "avoidance". This is not how taxes work.

Let's do a little story problem:

  • You invest in a broad basket of stocks for $100,000
  • You later sell them for $110,000
  • Capital gains rate on this is 20%
  • How much taxes do you owe?

Obviously, since you gained $10,000 net, and tax rate is 20%, you then owe $2,000 in taxes.

But this is only because losses offset gains. All the stocks in your basket didn't go up 10%. Some went up more, some actually lost money. It's not unusual that the losing stocks might go down $50,000, while the gainers go up $60,000, thus giving you the 10% net return, if you are investing in high-risk/high-reward stocks.

What if instead we change the tax code to only count the winners, ignoring the losing stocks? Now, instead of owing taxes on $10,000, you owe taxes on $60,000. At 20% tax rate, this comes out to $12,000 in taxes -- which is actually more than you earned on your investments.

Taxing only investments that win, while ignoring losers, is bad tax policy. It would mean, essentially, taxing investments at greater than 100% rate. This would mean people would stop investing, because it would only lose money. It's a stupid tax policy, which is why no country does it. All countries tax the net gain on investments, gains minus losses.
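For anybody who wants to check the arithmetic in the story problem, here is a small sketch. The dollar figures and the 20% rate come straight from the example above. The function names are just made up for illustration, and this is not meant to model any actual tax code.

```python
RATE_PCT = 20  # capital gains rate from the story problem, in percent


def tax_on_net(gains, losses, rate_pct=RATE_PCT):
    """Tax when losses offset gains: only the net is taxed, never below zero."""
    net = max(gains - losses, 0)
    return net * rate_pct // 100


def tax_on_winners_only(gains, rate_pct=RATE_PCT):
    """Tax under the hypothetical policy that ignores the losing stocks."""
    return gains * rate_pct // 100


gains, losses = 60_000, 50_000   # winners went up $60,000, losers went down $50,000
net = gains - losses             # what you actually earned

print(net)                                      # 10000
print(tax_on_net(gains, losses))                # 2000
print(tax_on_winners_only(gains))               # 12000
print(100 * tax_on_winners_only(gains) // net)  # 120 (effective rate, in percent)
```

The last line is the punchline: counting only the winners means paying 120% of what you actually made.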

In the above story problem, we bought and sold the stock all at once. In the real world, people buy and sell a little bit at a time over the years. It doesn't change the basic math. For that reason, losses in one year can be carried forward to offset gains in later years. You can't (easily) do the reverse, offset previous years, because you've already paid the taxes. You don't want the government giving Trump a $200-million tax refund check when he loses $1-billion.

Thus, there's nothing wrong with offsetting $1 billion gains in later years with $1 billion in losses. He's not avoiding taxes on the gains for 18 years -- it instead means that he has no gains over that 18 year period (assuming after the loss, he fails to earn $1 billion to catch back up). That he might have been earning no money, net, for 20 years is the big story -- not that he's taking advantage of some loophole in the tax law.
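Carrying losses forward is the same net-gain math, just spread across years. The sketch below uses made-up yearly numbers and a hypothetical `apply_carryforward` function. It assumes a flat 20% rate and ignores real-world details such as limits on how long a loss can be carried.

```python
def apply_carryforward(yearly_results, rate_pct=20):
    """Given net gain/loss per year (negative = loss), return tax owed each year."""
    carry = 0   # losses from earlier years not yet used up
    taxes = []
    for result in yearly_results:
        if result < 0:
            carry += -result          # bank the loss for later years
            taxes.append(0)
        else:
            offset = min(carry, result)
            carry -= offset           # use up banked losses first
            taxes.append((result - offset) * rate_pct // 100)
    return taxes


# Made-up numbers: lose 1,000 in year one, then slowly earn it back.
years = [-1000, 300, 300, 600]
print(apply_carryforward(years))  # [0, 0, 0, 40]
print(sum(years) * 20 // 100)     # 40, the same tax as on the overall net gain of 200
```

No tax is owed until the earlier loss has been earned back, and the total tax paid is exactly the tax on the net gain over the whole period.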

Offsetting future gains with past losses is not a loophole. Everybody who invests, and hence sometimes has losses, does it. Every country's tax code, like France, Sweden, or any socialist paradise you care to name, works the same way.

That's why Trump is going to win this election. The press knows how taxes work, but they intentionally twist the story to make Trump look bad. The real story with these returns is that Trump is, in fact, a shitty investor, not that he's a tax cheat.

By the way, I am a tax cheat. I had losses in the 2009 crash. Instead of immediately using those losses to offset gains in 2010 and 2011, I waited until Obamacare came into effect, which raised my tax rates. Only then did I claim the losses against gains, saving an extra few percent on my tax bill, and screwing the government out of a few thousand dollars (in a totally legal way).

There are a few bad tax loopholes in the system, like the ones hedge fund managers use, but overall, you really can't avoid paying taxes. You can shift things around a bit to change which taxes you pay, such as the above example, but that the rich use tax loopholes to avoid taxes is a myth. Indeed, in terms of tax payments received by the government, most of them come from the rich -- at a higher rate than they come from the poor -- minus the odd hedge fund manager.