Friday, February 05, 2016

Twitter has to change

Today, Twitter announced that instead of the normal timeline of newest messages on top, they will prioritize messages they think you'll be interested in. This angers a lot of people, but my guess is it's something Twitter has to do.

Let me give you an example. Edward @Snowden has 1.4 million followers on Twitter. Yesterday, he retweeted a link to one of my blogposts. You'd think this would've caused a flood of traffic to my blog, but it hasn't. That post still has fewer than 5000 pageviews, and is only the third most popular post on my blog this week. More people come from Reddit and news.ycombinator.com than from Twitter.

I suspect the reason is that the older Twitter gets, the more accounts each user follows. I'm in that boat. If you tweeted something more than 10 minutes before the last time I checked Twitter, I won't have seen it. I read fewer than 5% of what appears in my timeline. That's something Twitter can actually measure, so they already know it's a problem.

Note that the Internet is littered with websites that were once dominant in their day, but which failed to change and adapt. Internet oldtimers will remember Slashdot as a good example.

Thus, Twitter has to evolve. There's a good chance their attempts will fail, and they'll shoot themselves in the foot. On the other hand, not attempting is guaranteed failure.

Wednesday, February 03, 2016

Lawfare thinks it can redefine π, and backdoors

There is a gulf between how people believe law works (from watching TV shows like Law and Order) and how law actually works. You lawyer people know what I'm talking about. It's laughable.

The same is true of cyber: there's a gulf between how people think it works and how it actually works.

This Lawfare blogpost thinks it's come up with a clever method to get its way in the crypto-backdoor debate, by making carriers like AT&T responsible only for the what ("deliver interpretable signal in response to lawful wiretap order") without defining the how (crypto backdoors, etc.). The pressure would come in the form of removing the liability protections they now enjoy for what their customers transmit across their network. Or as the post paraphrases the proposal:
Don’t expect us to protect you from liability for third-party conduct if you actively design your systems to frustrate government efforts to monitor that third-party conduct.
The post is proud of its own smarts, as if they've figured out how to outwit mathematicians and redefine pi (π). But their solution is nonsense, based on a hopelessly naive understanding of how the Internet works. It appears all they know about the Internet is what they learned from watching CSI:Cyber.

The Internet is end-to-end. End-to-end is the technology shift that made the Internet happen, as compared to alternative directions cyberspace might have taken.

What that means is that AT&T doesn't encrypt traffic. Apple's iPhones don't encrypt traffic. Instead, it's the app installed on the phone that does the encryption. Neither AT&T nor Apple can stop that encryption from happening.

You think that because most people use iMessage or Snapchat, all you have to do is turn the screws on those companies in order to force them to comply with backdoors. That won't work, because the bad guys will stop using those apps and install different encrypted apps, like Signal. You imagine that it's just a game of whack-a-mole, and eventually you'll pressure all apps into compliance. But Signal is open-source. If it disappeared tomorrow, I'd still have a copy of the source, which I could compile into my own app I'd call Xignal. I'd continue making encrypted phone calls with my own app. Even if no source existed today, I could write my own within a couple of months. Indeed, writing an encrypted chat app is a typical homework assignment colleges give computer science students. (You people still haven't come to grips with the fact that in cyberspace, we are living with the equivalent of physicists able to whip up a-bombs in their basements.)
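To give a sense of how low the bar is, here's a rough sketch of end-to-end message encryption using the open-source libsodium library (the names and the toy message flow are my own illustration, not Signal's code or anybody else's). Wrapping networking and a UI around this is the homework part.

   /* Sketch: end-to-end encryption with libsodium's crypto_box.
      The carrier in the middle sees only ciphertext. */
   #include <sodium.h>
   #include <stdio.h>

   int main(void)
   {
       if (sodium_init() < 0)
           return 1;

       /* Each party generates a public/secret keypair. */
       unsigned char alice_pk[crypto_box_PUBLICKEYBYTES], alice_sk[crypto_box_SECRETKEYBYTES];
       unsigned char bob_pk[crypto_box_PUBLICKEYBYTES], bob_sk[crypto_box_SECRETKEYBYTES];
       crypto_box_keypair(alice_pk, alice_sk);
       crypto_box_keypair(bob_pk, bob_sk);

       /* Alice encrypts a message to Bob's public key. */
       const unsigned char msg[] = "meet at noon";
       unsigned char nonce[crypto_box_NONCEBYTES];
       unsigned char cipher[crypto_box_MACBYTES + sizeof(msg)];
       randombytes_buf(nonce, sizeof(nonce));
       crypto_box_easy(cipher, msg, sizeof(msg), nonce, bob_pk, alice_sk);

       /* Bob decrypts with his secret key. */
       unsigned char plain[sizeof(msg)];
       if (crypto_box_open_easy(plain, cipher, sizeof(cipher), nonce, alice_pk, bob_sk) != 0)
           return 1;
       printf("decrypted: %s\n", plain);
       return 0;
   }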

Running arbitrary software is a loose end that will defeat every solution you can come up with. It's math. The only way to fix the "going dark" problem is to ban software code. But you can't do that without destroying the economy and converting the country into a dystopic, Orwellian police state.

You think that those of us who oppose crypto backdoors are hippies with a knee-jerk rejection of any government technological mandate. That's not true. The populists at the EFF love technological mandates in their favor, such as NetNeutrality mandates, or bans on exporting viruses to evil regimes (though they've recently walked back on that one).

Instead, we reject this specific technological mandate, because we know cyber. We know it won't work. We can see that you'll never solve your "going dark" problem, but in trying to, you'll cause a constant erosion of both the economic utility of the Internet and our own civil liberties.

I apologize for the tone of this piece, saying you are stupid about cyber, but that's what it always comes down to. The author of that piece has impressive Washington D.C. think-tanky credentials, but misfires on the basic end-to-end problem. And all think-tanky pieces on this debate are going to end up the same way, because as soon as they bring technologists in to consult on the problem, their desired op-eds become stillborn before anybody sees them.




Note: I get the π analogy from a tweet by @quinnorton; I don't know who came up with the analogy originally.

Tuesday, February 02, 2016

They are deadly serious about crypto backdoors

Julian Sanchez (@normative) has an article questioning whether the FBI is serious about pushing crypto backdoors, or whether this is all a ploy pressuring companies like Apple to give them access. I think they are serious -- deadly serious.

The reason they are only half-heartedly pushing backdoors at the moment is that they believe we, the opposition, aren't serious about the issue. After all, the 4th Amendment says that a "warrant of probable cause" gives law enforcement unlimited power to invade our privacy. Since the constitution is on their side, only irrelevant hippies could ever disagree. There is no serious opposition to the proposition. It'll all work itself out in the FBI's favor eventually. Among the fascist class of politicians, like the Dianne Feinsteins and Lindsey Grahams of the world, belief in this principle is rock solid. They have absolutely no doubt.

But the opposition is deadly serious. By "deadly" I mean this is an issue we are willing to take up arms over. If congress were to pass a law outlawing strong crypto, I'd move to a non-extradition country, declare the revolution, and start working to bring down the government. You think the "Anonymous" hackers were bad, but you've seen nothing compared to what the tech community would do if encryption were outlawed.

On most policy questions, there are two sides to the debate, where reasonable people disagree. Crypto backdoors isn't that type of policy question. To techies, it's the equivalent of what trying to ban guns would be to the NRA.

So the FBI trundles along, as if the opposition were hippies instead of ardent revolutionaries.

Eventually, though, things will come to a head where the FBI pushes forward. There will eventually be another major terrorist attack in the United States, and the terrorist will have been using encrypted communications. At that point, we are going to see the deadly seriousness of the FBI on the issue, and the deadly seriousness of the opposition. And by "deadly" I mean exactly that -- violence and people getting killed.

Julian Sanchez is probably right that at this point, the FBI isn't pushing too hard, and is willing to just pressure companies to get what they want (recovered messages from iCloud backups), and to give populist activists like the EFF easy wins (avoiding full backdoors) to take the pressure off. But in the long run, I believe this issue will become violent.

Is packet-sniffing illegal? (OmniCISA update)

In the news recently, Janet Napolitano (formerly head of DHS, now head of California's university system) had packet-sniffing software installed at the UC Berkeley campus to monitor all its traffic. This brings up the age-old question: is such packet-sniffing legal, or a violation of wiretap laws?

Setting aside the legality question for the moment, I should first point out that it's perfectly normal. Almost all organizations use "packet-sniffers" to help manage their network. Almost all organizations have "intrusion detection systems" (IDS) that monitor network traffic looking for hacker attacks. Learning how to use packet-sniffers like "Wireshark" is part of every network engineer's training.

Indeed, while the news articles describe this as some special and nefarious plot by Napolitano, the reality is that it's probably just an upgrade of packet-sniffer systems that already exist.

Ironically, much packet-sniffing practice comes from UC Berkeley itself. It's famous for having created "BPF", the eponymously named "Berkeley Packet Filter", a packet-sniffing standard included in most computers. Whatever packet-sniffing system Berkeley purchased to eavesdrop on its networks almost certainly includes Berkeley's own BPF software.
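To show how routine this stuff is, here's a rough sketch of packet-sniffing with the standard libpcap library and a BPF filter expression (the interface name and filter are made up for illustration; this is the sort of thing every IDS and network-management tool does under the hood):

   /* Sketch: ordinary packet-sniffing with libpcap and a BPF filter. */
   #include <pcap/pcap.h>
   #include <stdio.h>

   static void on_packet(unsigned char *user, const struct pcap_pkthdr *h,
                         const unsigned char *bytes)
   {
       (void)user; (void)bytes;
       printf("captured %u bytes\n", (unsigned)h->caplen);
   }

   int main(void)
   {
       char errbuf[PCAP_ERRBUF_SIZE];
       pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
       if (p == NULL) {
           fprintf(stderr, "pcap_open_live: %s\n", errbuf);
           return 1;
       }

       /* Compile and install a BPF filter -- only watch web traffic, say. */
       struct bpf_program prog;
       if (pcap_compile(p, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == 0)
           pcap_setfilter(p, &prog);

       pcap_loop(p, 10, on_packet, NULL);   /* capture 10 packets, then stop */
       pcap_close(p);
       return 0;
   }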


Now for the legal question. Even if everyone is doing it, it doesn't necessarily mean it's legal. But the wiretap law does appear to contain an exception for packet-sniffing. Section 18 U.S. Code § 2511 (2) (a) (i) says:
It shall not be unlawful ... to intercept ... while engaged in any activity which is a necessary incident to the rendition of his service or to the protection of the rights or property of the provider of that service
In other words, you can wiretap your own network in order to keep it running and protect it against hackers. There is a lengthy academic paper that discusses this in more detail: http://spot.colorado.edu/~sicker/publications/issues.pdf
 
At least, that's the state of things before OmniCISA ("Cybersecurity Act of 2015"). Section 104 (a) (1) says:
Notwithstanding any other provision of law, a private entity may, for cybersecurity purposes, monitor ... an information system of such private entity;
In other words, regardless of other laws, you may monitor your computers (including the network) for the purpose of cybersecurity.

As I read OmniCISA, I see that the intent is just this: to clarify that what organizations are already doing is in fact legal. When I read the text of the bill and translate legalese into technology, I see that what it's really talking about is just the standard practice of monitoring log files and operating IDSs, IPSs, and firewalls. It also describes the standard practice of outsourcing security operations to a managed provider (in the terms we would use, not how the bill describes it). Much of what we've been doing has been ambiguous under the law, since the law is confusing as heck, so OmniCISA clarifies this.

Thus, the argument about whether packet-sniffing was legal before is now moot: according to OmniCISA, you can now packet-sniff your networks for cybersecurity, such as using IDSs.

Monday, February 01, 2016

Some notes on the Norse collapse

Recently, the cybersec company "Norse Security" imploded. Their leaders and most of the employees were fired, and their website is no longer available. I thought I'd write up some notes on this.

All VC-funded startups are a scam

Here's how VCs think. They see that there is a lot of industry buzz around "threat intel". They'll therefore fund a company in that space. This company will spend 5% of that money to create a cool prototype, and 95% on marketing and sales. They'll have fancy booths at trade shows. They'll have a PR blitz to all the reporters who cover the industry. They'll bribe Gartner to be named a Cool Vendor or Magic Quadrant Leader. They'll win industry kudos. They'll have some early sales 'wins' with major customers. These customers will give glowing reviews of the product they bought -- even before turning it on.

In other words, it's a perfect "Emperor Has No Clothes" story, where neither customers, nor Gartner, nor the press is competent to realize the Emperor is not wearing clothes.

VCs know it's a scam, but they are hoping it'll become real. As a well-known leader in this space, the company will attract employees with the needed expertise. Or, the VCs will find another company (often started by engineers instead of sales/marketing people) that has a real product, and buy it out. What was once snake oil thus becomes something real, eventually.

The entire tech industry is built this way, not just infosec. VCs invest in sales, marketing, and operations people who can build a brand, channels, and competently manage people and money flows. They see those skills as the rare ones, and technical expertise as more a fungible quantity that can be acquired later, for simple wages rather than large amounts of stock in the company.

Norse was especially scammy-looking, with their real time map of attacks on the Internet. It was really cool, and everybody enjoyed looking at it, but nobody could figure out what value it had. It quickly obtained a reputation of snake oil.

It's rarely all snake oil

As a tech expert, I've looked into the details of infosec products. I usually find something really cool, something great.

But that "thing" is narrow. The market for that thing is too small to build a large company. The 'snake oil' bit comes from trying to make this small thing appear larger than it really is, to sell to a wider market.

Indeed, all companies do this, regardless of product. A great example is anti-virus companies. They each have great technologies, and are useful to some extent, but still cannot detect advanced viruses. Their hype overstates their efficacy. But it's not necessarily their fault. Their customers are unable to understand the technical features of the products or use them properly, exploiting what makes the technology great. You can't expect companies to communicate better with customers when the customers are unable to understand.

I don't know what technology Norse had. I assume there was something great underneath it all -- but not something useful to the larger market.

Threat intel is particularly hard to productize

All cybersecurity technologies are hard to productize, but threat intel even more so. The reality is that you can't see threats coming.

If it's somebody attacking the entire Internet, looking for low hanging fruit to exploit (mass port scanning, mass phishing attacks, etc.), then threat intelligence can certainly warn you of the impending attack. But the proper response to such intelligence is to ignore it. You can't get your underwear in a bunch over every such attack -- you'll just waste a lot of time and energy responding to attackers who aren't really a threat to you. I watch people get upset over my own mass scans, and I have to laugh, because they are doing infosec wrong. Scan yourself for low hanging fruit (indeed, use my tool), but ignore such attackers.

Conversely, when you are targeted, hackers come in low and slow, flying under the radar. As a pentester, I notice this. Even when targets have "appliances" designed to detect me, I still get away with silent penetration. Defending the network isn't a product you can buy. You should be managing your network, like getting an email and a page whenever a privileged domain-admin account is created. You shouldn't be buying magic pills that will somehow solve this "threat intelligence" problem for you. The intelligence you get from existing logs and firewalls is often enough.

I've been doing "threat intel" for 20 years. I still don't know how to make it into a product that will appeal to a large market, which is why I haven't founded a company trying to commercialize the technology.

All VC companies rush toward the cliff

Norse spectacularly imploded, suddenly firing a bunch of people and taking their website offline.

From one perspective, this is normal. It's how VC funding works. When VCs give you money, they want you to spend it all. They don't want you to save the money.

It's the hardest thing for people to understand about startups. They think in terms of their own finances. They want to save money for a rainy day, in case things don't go as planned. That's not the purpose of venture capital. Instead, it's a "venture" that will either succeed or fail. If, in the end, you can't figure out how to create a business out of the venture, then shut it down and sell off what few assets remain.

A zombie company remaining barely alive is no different than a failed company from an investor's point of view. Either way, it's not going to generate profits that can pay back the original investment.

You think, yeah, but maybe after a few years of zombie existence, they'll eventually get lucky. No, this isn't how business works. In a few years, technology changes, and will require a new investment, a new venture to promote that new technology. You would never give that new investment to a zombie company, which is weighed down by other concerns. Instead, you'd give that investment to a new company that can focus on it.

In mature markets, market share doesn't change very fast. You see that in the car industry, for example. Ventures are land grabs in new markets, trying to establish market share before the new market becomes mature. If your zombie company failed to get market share, then it's never going to win more.

Thus, in a new market, the goal is to invest money as fast as possible to achieve size and market share. If you fail, then fail quickly and move on. Don't linger.

Destiny is acquisition, not implosion

Norse imploded, abruptly firing their employees and shutting down their website. That's rare. It means the VCs weren't paying attention.

In the normal course of events, companies don't implode like this. If they run out of cash, they'll go back to the VCs for more -- enough to sell off the company to somebody else.

The VCs give companies a couple chances. The first chance will likely fail, but along the way, the company will have built up things like brand awareness and market share. A second round will come in, retool the company, replace the leadership, and try a second time.

Then the last round of investment comes. If the company was successful, the last round pays for all the costs needed to take the company public. More often, the company has failed and run out of money. At this point, the VCs invest just enough to slap on a new coat of paint and sell it off to some sucker.

Acquisitions aren't always for this reason. Sometimes it's a fast-growing company being wildly successful, so a larger company buys it out before a competitor can get bigger.

Sometimes companies are acquired for even stranger reasons. At larger companies, when an executive leaves, and a new executive takes power, they are always frustrated with the organization beneath them. The new executive is an outsider, and the organization underneath opposes their orders. Not outright, of course, but passive-aggressively. Therefore, what the executive does is buy a company, then use this "one time event" to replace the managers underneath them with managers from the new company. If you look at how a lot of acquisitions happen, it appears from the outside as if the smaller company acquired/hijacked the larger company.

The point is that companies should never actually implode. There's value there to be exploited. VCs should come in with a "down round" that takes the majority of ownership in the company, slap some lipstick onto the pig, and sell it off to some sucker.

By the way, as outsiders, we really can't see what's happening in acquisitions. Sometimes it's because the companies were successful, and it's an up-round where early employees profit heavily from their stock options. Sometimes it's a down-round, where, except for the founders, the options are worthless. When the company your friend works for gets acquired, you don't know what happened. It's usually announced in such a way that you think congratulations are in order, when in fact condolences are.

Conclusion

As you can see, I have a low opinion of cybersecurity products in general, and threat intel in particular. I see them all going the way of Norse -- not actually imploding, but being gobbled up by bigger companies and disappearing from the landscape as separate entities.






Wednesday, January 27, 2016

Net ring-buffers are essential to an OS

Even by OpenBSD standards, this rejection of 'netmap' is silly and clueless.

BSD is a Linux-like operating system that powers a lot of the Internet, from Netflix servers to your iPhone. One variant of BSD focuses on security, called "OpenBSD". A lot of security-related projects get their start on OpenBSD. In theory, it's for those who care a lot about security. In practice, virtually nobody uses it, because it makes too many sacrifices in the name of security.

"Netmap" is a user-space network ring-buffer. What that means is the hardware delivers network packets directly to an application, bypassing the operating system's network stack. Netmap currently works on FreeBSD and Linux. There are projects similar to this known as "PF_RING" and "Intel DPDK".

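To give a feel for what this looks like to the application, here's a rough sketch of a netmap receive loop using its helper API (the interface name is an example; a real app would add error handling and a transmit path):

   /* Sketch: a netmap receive loop. Packets appear in a ring-buffer shared
      with the NIC, and the app reads them directly -- no kernel stack. */
   #define NETMAP_WITH_LIBS
   #include <net/netmap_user.h>
   #include <poll.h>
   #include <stdio.h>

   int main(void)
   {
       struct nm_desc *d = nm_open("netmap:eth0", NULL, 0, NULL);
       if (d == NULL) {
           fprintf(stderr, "nm_open failed\n");
           return 1;
       }

       struct pollfd pfd = { .fd = d->fd, .events = POLLIN };
       for (;;) {
           poll(&pfd, 1, 1000);             /* wait for packets */
           struct nm_pkthdr h;
           unsigned char *buf;
           while ((buf = nm_nextpkt(d, &h)) != NULL) {
               /* 'buf' points directly into the ring; no copy was made */
               printf("got packet, %u bytes\n", (unsigned)h.len);
           }
       }
       /* not reached */
       nm_close(d);
       return 0;
   }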

The problem with things like netmap is that it means the network hardware no longer is a shareable resource, but instead must be reserved for a single application. This violates many principles of a "general purpose operating system".

In addition, it ultimately means that the application is going to have to implement its own TCP/IP stack. That means it's going to repeat all the same mistakes of the past, such as the "ping of death", where a packet reassembles to more than 65536 bytes. This introduces a security problem.

But these criticisms are nonsense.

Take "microkernels" like Hurd or IBM mainframes. These things already put the networking stack in user space, for security reasons. I've crashed the network stack on mainframes -- the crash only affects the networking process and not the kernel or other apps. No matter how bad a user-mode TCP/IP stack is written, any vulnerabilities affect just that process, and not the integrity of the system. User-mode isolation is a security feature. That today's operating-systems don't offer user-mode stacks is a flaw.

Today's computers are no longer multi-purpose, multi-user machines. While such machines do exist, most computers today are dedicated to a single purpose, such as supercomputer computations, or a domain controller, or memcached, or a firewall. Since single-purpose, single-application computers are the norm, "general purpose" operating systems need to be written to include that concept. There needs to be a system whereby apps can request exclusive access to hardware resources, such as GPUs, FPGAs, hardware crypto accelerators, and of course, network adapters.

These user-space ring-buffer network drivers operate with essentially zero overhead. You have no comprehension of how fast this can be. It means networking can operate 10 times to even 100 times faster than trying to move packets through the kernel. I've been writing such apps for over 20 years, and have constantly struggled against disbelief, as people simply cannot believe that machines can run this fast.

In today's terms, it means it's relatively trivial to use a desktop system (quad-core, 3 GHz) to create a 10-gbps firewall that passes 30 million packets/second (bidirectional), at wire speed. I'm assuming 10 million concurrent TCP connections here, with 100,000 rules. This is between 10 and 100 times faster than you can get through the OpenBSD kernel, even if you configured it to simply bridge two adapters with no inspection.

There are many reasons for the speed. One is hardware. In modern desktops, the 10gbps network hardware DMAs the packet directly into the CPU's cache -- actually bypassing memory. A packet can arrive, be inspected, then forwarded out the other adapter before the cache writes back changes to DRAM.

Another reason is the nature of ring-buffers themselves. The kernel's Ethernet hardware drivers also use ring-buffers. The problem is that the kernel must remove the packet from the driver's ring-buffer, either by making a copy of it, or by allocating a replacement buffer. This is actually a huge amount of overhead. You think it's insignificant, because you compare this overhead with the rest of kernel packet processing. But I'm comparing it against the zero overhead of netmap. In netmap, the packet stays within the buffer until the app is done with it.

Arriving TCP packets trigger a long "pointer walk", following a chain of pointers to get from the network structures to the file-descriptor structures. At scale (millions of concurrent TCP connections), these things no longer fit within cache. That means each time you follow a pointer, you cause a cache miss, and must halt and wait 100 nanoseconds for memory.

In a specialized user-mode stack, this doesn't happen. Instead, you put everything related to a TCP connection into a 1k control block, then pre-allocate an array of 32 million of them (using 32 gigs of RAM). Now there is only a single cache miss per packet. But actually, there are zero, because with a ring buffer, you can pre-parse future packets in the ring and issue "prefetch" instructions, such that the TCP block is already in the cache by the time you need it.
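Roughly sketched, the data structure side of that looks something like this (the layout, sizes, and names are illustrative; the connection index would come from hashing the address/port tuple of the packet):

   /* Sketch: pre-allocate every TCP control block in one flat array, so a
      lookup is one indexed access (one potential cache miss) instead of a
      chain of pointer dereferences. */
   #include <stdlib.h>
   #include <stdint.h>

   #define MAX_CONNECTIONS (32 * 1024 * 1024)   /* 32 million, ~32 GB at ~1 KB each */

   struct tcb {
       uint32_t src_ip, dst_ip;
       uint16_t src_port, dst_port;
       uint32_t seqno, ackno;
       unsigned char state;
       unsigned char app_data[1000];            /* pad out to roughly 1 KB */
   };

   static struct tcb *tcb_table;

   int tcb_init(void)
   {
       tcb_table = calloc(MAX_CONNECTIONS, sizeof(*tcb_table));
       return tcb_table ? 0 : -1;
   }

   /* While handling the current packet, hint the CPU to start fetching the
      control block for a packet further ahead in the ring, so it's already
      in cache by the time we get to it. */
   void tcb_prefetch(unsigned next_conn_index)
   {
       __builtin_prefetch(&tcb_table[next_conn_index]);
   }

   struct tcb *tcb_lookup(unsigned conn_index)
   {
       return &tcb_table[conn_index];           /* no pointer walk */
   }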

These performance issues are inherent to the purpose of the kernel. As soon as you think in terms of multiple users of the TCP/IP stack, you inherently accept processing overhead and cache misses. No amount of optimizations will ever solve this problem.

Now let's talk multicore synchronization. On OpenBSD, it rather sucks. Adding more CPUs to the system often makes the system go slower. In user-mode stacks, synchronization often has essentially zero overhead (again, that number zero). Modern network hardware will hash the addresses/ports of incoming packets, giving each CPU/thread its own stream. Thus, our hypothetical firewall would process packets as essentially 8 separate firewalls that only rarely need to exchange information (when doing deep inspection on things like FTP to open up dynamic ports).


Now let's talk applications. The OpenBSD post presumes that apps needing this level of speed are rare. The opposite is true. They are painfully common, increasingly becoming the norm.

Supercomputers need this for what they call "remote DMA". Supercomputers today are simply thousands of desktop machines, with gaming graphics cards, hooked up to 10gbps Ethernet, running in parallel. Often a process on one machine needs to send bulk data to a process on another machine. Normal kernel TCP/IP networking is too slow, though some systems now have specialized "RDMA" drivers trying to compensate. Pure user-space networking is just better.

My "masscan" port scanner transmits at a rate of 30 million packets per second. It's so fast it'll often melt firewalls, routers, and switches that fail to keep up.

Web services, whether the servers themselves or frontends like varnish and memcached, are often limited by kernel resources, such as the maximum number of connections. They would be vastly improved with user-mode stacks on top of netmap.

Back in the day, I created the first "intrusion prevention system" or "IPS". It ran on a dual core 3 GHz machine at maximum gigabit speeds, including 2 million packets-per-second. We wrote our own user-space ring-buffer driver, since things like netmap and PF_RING didn't exist back then.

Intel has invested a lot in its DPDK system, which is better than netmap, for creating arbitrary network-centric devices out of standard desktop/server systems. That we have competing open-source ring-buffer drivers (netmap, PF_RING, DPDK), plus numerous commercial versions, means that there is a lot of interest in such things.

Conclusion

Modern network machines, whether web servers or firewalls, have two parts: the control-plane where you SSH into the box and manage it, and the data-plane, which delivers high-throughput data through the box. These things have different needs. Unix was originally designed to be a control-plane system for network switches. Trying to make it into a data-plane system is 30 years out of date. The idea persists because of the clueless thinking as expressed by the OpenBSD engineers above.

User-mode stacks based on ring-buffer drivers are the future for all high-performance network services. Eventually OpenBSD will add netmap or something similar, but as usual, they'll be years behind everyone else.

How not to be a better programmer

Over at r/programming is this post on "How to be a better programmer". It's mostly garbage.


Don't repeat yourself (reuse code)


Trying to reuse code is near the top of the list of reasons why big projects fail. The problem is that while the needs of multiple users of a module may sound similar, they are often different in profound ways that cannot be reconciled. Trying to make the same bit of code serve divergent needs is often more complex and buggy than multiple modules written from the ground up for each specific need.

Yes, we adhere to code-cleanliness principles (modularity, cohesion) that make reuse easier. Yes, we should reuse code when the needs match closely enough. But that doesn't mean we should bend over backwards trying to shove a square peg through a round hole, on the principle that all pegs/holes are the same.


Give variables/methods clear names


Programmers hate to read other code because the variable names are unclear. Hence the advice to use "clear names" that aren't confusing.

But of course, programmers already think they are being clear. No programmer thinks to themselves "I'm going to be deliberately obtuse here so that other programmers won't understand". Therefore, telling them to use clear names won't work, because they think they already are doing that.

The problem is that programmers are introverts and fail to put themselves in another's shoes, trying to see their code as others might. Hence, they fail to communicate well with other programmers. There's no easy way to overcome this. Those of us who spend a lot of time reading code just have to get used to this problem.

One piece of advice is to make names longer. Cryptographers write horrible code, because they insist on using one letter variable names, as they do in mathematics. I've never had a problem with names being too long, but names being too short is a frequent problem.

One of the more clueful bits of advice I've heard is that variable names should imply their purpose, not the other way around. Too often, programmers choose a name that makes sense once you know what the variable is, but that tells you nothing about the variable if you don't already know what it is.


Don't use magic numbers or string literals


Wrong. There are lots of reasons to use magic numbers and literals.

If I'm writing code to parse external file-formats or network-protocols, then the code should match the specification. It's not going to change. When checking the IP (Internet Protocol) header to see if it's version 4, then using the number '4' is perfectly acceptable. Trying to create a constant, such as enum {IpVersionFour = 4}; is moronic and makes the code harder to read.
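To make this concrete, here's a rough sketch of spec-driven parsing where the literals come straight from the specification and need no names:

   /* Sketch: sanity-checking an IPv4 header. The '4', '5', and '20' are
      dictated by the spec; they will never change, and naming them would
      only obscure the correspondence with the RFC. */
   #include <stddef.h>

   int is_valid_ipv4_header(const unsigned char *buf, size_t length)
   {
       if (length < 20)                 /* minimum IPv4 header is 20 bytes */
           return 0;
       if ((buf[0] >> 4) != 4)          /* version field must be 4 */
           return 0;
       if ((buf[0] & 0x0F) < 5)         /* header length in 32-bit words, minimum 5 */
           return 0;
       return 1;
   }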

While it's true that newbie programmers often use magic numbers in the wrong way, that doesn't apply to good programmers. I see magic numbers all the time in Internet code, and they almost always make the code easier to understand. Likewise, I frequently see programmers bend over backwards to avoid magic numbers in ways that make the code harder to read.

In short, if you are an experienced programmer, ignore this dictum.


Don't be afraid to ask for help


Oh, god, the horror. Engineering organizations are divided into the "helper" and "helpee" sections. The "helpees" are chronically asking for help, to the point where they are basically asking better programmers to finish and debug their code for them.

Asking for help is a good thing if, when reading a book on a technical subject (networking, cryptography, OpenCL, etc.), you want the local expert in the subject to help overcome some confusion. Or, it's good to ask for help on how to use that confusing feature of the debugger.

But stop asking for others to do your work for you. It's your responsibility to debug your own code. It's your responsibility to be an expert in the programming language you are using. It's your responsibility for writing the code, unit tests, and documentation.


If you see some buggy or messy code, fix it


No, no, no, no.

This advice only makes sense in modules that already have robust unit/regression tests that will quickly catch any bugs introduced by such cleanups. But if the code is messy, then chances are the tests are messy too.

Avoid touching code that doesn't have robust tests. Instead, go in and write those unit tests first. Code that's unstable and prone to bugs can remain so as long as the tests are robust. The tests act as a safety net, preventing bugs from appearing.

Only once the unit/regression tests are robust can you start doing arbitrary cleanups.


Share knowledge and help others


This is bad for several reasons.

When programmers don't complete their code on schedule (i.e. the norm), one of their excuses is that they were helping others.

Engineering organizations are dominated by political battles as engineers fight for things. This often masquerades as "sharing knowledge", as you help others understand the power of LISP over C++, for example.

As pointed out above, the lazy/bad programmers will exploit your good nature to shift their responsibilities onto you. That's toxic.

The upshot is this. Your job is to complete your code on schedule. Only once you have done that do you have time to become a subject matter expert in something (networking, crypto, graphics), and time to share your expertise on these subjects with others.


Conclusion

Beware anything that boils programming down to simple rules like "don't use magic numbers". Code is more subtle than that.

The way to become a better programmer is this: (1) write lots of code, (2) work on big projects (more than 10kloc), (3) spend more time reading open-source. Over time, you'll figure out for yourself what to do, and what not to do.

Monday, January 18, 2016

Flawed From the Start & Missing the Mark: Georgia's Proposed Anti-Drone Legislation

Bad state laws can have the same chilling effect on technology as bad federal laws.  In this guest post, friend of Errata Elizabeth Wharton (@lawyerliz) discusses the latest anti-drone law introduced here in the Georgia legislature and how one bill manages to kill innovation across several key Georgia industries. 




By Elizabeth Wharton 
Georgia’s newly proposed anti-drone legislation is an economic and research buzz kill.  The bill, HB 779, through poorly crafted provisions, places unnecessary red tape on the use of drones by the film industry and by cellular, telephone, and cable utility companies.  It also completely shuts down Georgia's aerospace defense industry research (and related funding) conducted by universities including Georgia Tech, and all related manufacturing by companies such as Lockheed Martin.  Biting the industry hands that bring billions of dollars into Georgia’s economy seems a bold move for state legislators, particularly during an election year.

Gaps between technology policy and technology practice at the federal level, such as the Commerce Department’s proposed Wassenaar Arrangement rules, extend to the states as well.  With over 168 drone-related bills considered by 45 states in 2015, according to the National Conference of State Legislatures, 2016 is already off to a quick start.  California lawmakers want to require "tiny" drone license plates and for operators to leave their contact information behind after an "accident."  In the latest policy disconnect, the devil went down to Georgia but had to leave his Star Wars X-Wing Fighter drone at home (it included a replica of a weapon); he faced jail time for his school-sponsored drone research project, and he couldn't fly his other drones for fear of inadvertently capturing RF signals from a neighbor’s iPad.  When your legislative goal is to encourage economic development in the aerospace and technology industries and the end result has the exact opposite effect, this is a failure.

After spending four months of meetings and hearings on the "Use of Drones" in Georgia, members of a Georgia House Study Committee introduced House Bill 779.  The Committee's Final Report (copy available here) recommended that commercial uses of drones should not be "over regulated at the state level" and that the state should "avoid passing legislation which might ….. cause the process to be more onerous and thus drive business to other states."  Further, "Georgia's goal is to remain competitive and to allow for expansion of this industry…"

Georgia's film industry generated a $6 billion impact on Georgia's economy in 2015, and the aerospace industry had a total economic impact of $50.8 billion in 2013, accounting for 5.3% of the state's GDP.  Transportation logistics also plays a key part in Georgia's economy: Atlanta is home to the busiest passenger airport in the world, and Savannah boasts the 4th largest and fastest-growing container port in the U.S.  Georgia has heavily recruited telephone and cable service providers to roll out new products such as Google Fiber and Comcast's Ultra-Fast Internet within Georgia before doing so in other states.  Several of the key service providers and experts in each of these industries testified or otherwise met with the members of the Study Committee that crafted HB 779, explaining their existing uses and the potential beneficial applications of unmanned aircraft systems.

HB 779 provides that it will regulate the use of unmanned aircraft systems and the resulting captured images, prohibit operations in connection with hunting and fishing, and prohibit the possession, operation, manufacturing, and transportation of unmanned aircraft systems with a weapon attached.

In its current form, HB 779 will halt or chill use of drones for film projects and safety inspections, shut down ongoing university research projects, and drive out manufacturing and shipping of aerospace drone equipment. 

Using words without understanding their application within the technology spells trouble.    

At the heart of the main provisions in HB 779 is its definition of "image." "Image" is broadly defined to include electromagnetic waves and "other conditions existing on or about real property in this state or an individual located on such property."  HB 779 would prohibit using an unmanned aircraft system "to capture an image of a private place or an individual in a private place," knowingly using an image in a manner prohibited by the statute, possessing an image known to have been captured in violation of the statute, and disclosing, distributing, or otherwise using an image known to have been captured in violation of the statute. 

This definition of "image" and resulting application within the statute becomes problematic in part within the context of how unmanned aircraft systems, cell phones, and all other "connected devices" function.  In each instance, the devices use some form of electromagnetic wave to communicate and connect.  These radio frequency (RF) signals are constantly being sent and received.  The resulting communication data is automatically transmitted and saved by the devices.  The Federal Communications Commission (FCC) deems the RF signals from the fitness tracker around my wrist or the signals sent from an individual’s pacemaker, for example, to be one and the same as the individual.  Here, the RF signal from my fitness tracker captured by an unmanned aircraft system flying overhead could expose the drone's operator to civil penalties when they sync and send the flight data.  Each captured signal (image) equates to a separate offense under the language of HB 779.

Georgia isn't the only state to trip over this concept; legislation passed in Florida, Texas, and other states uses a similar definition of "image."  Cutting and pasting from other states' legislation does not always equal good policy; here it perpetuates the use of inaccurate technology terminology.

Adding Hurdles & Increasing Costs on Georgia's Film Industry and Technology-Related Utility Companies

In addition to missing the basic underlying technology mark, HB 779's definitions of "image" and "private place" create costly hurdles for film companies, cable utility companies, and telephone communication utility companies.  HB 779 carves out liability protections for images captured in connection with specific projects, but as with legislation passed in other states, exception lists always overlook a few.  The list of exemptions here includes law enforcement, electric or natural gas utility providers, fire-fighting operations, real estate sales and financing, and rescue operations.  Noticeably absent: television or film production uses, and inspection and maintenance operations by telephone, cable, or cell phone tower companies (all key industries in Georgia).

Use of unmanned aircraft systems outside of the exemption list requires the extra time and expense of tracking down every person and property owner whose image has been captured during the flight.  Without such consent, the image must be immediately deleted or the operator faces civil penalties.  The penalties accumulate for each image, quickly adding up.  For example, suppose a television crew or film company captures footage of a condominium high-rise.  Each condo unit within the building is a separate parcel of real property.  Under HB 779, the company would have to contact every single condominium owner whose property could be clearly seen in the footage, or risk civil penalties from the homeowners if they do not start all over and reshoot the footage without a clear picture of the building.  Cell phone tower operators and cable line operators would have to obtain permission from every property owner and person along their infrastructure lines or in the areas surrounding their towers at that particular flight time before using drones for inspections and repairs.

A weapon ban heard around the state, halting all research, manufacturing, and shipping throughout Georgia's aerospace defense industry.

Singlehandedly shutting down an entire (and growing) sector of the aerospace defense industry within the state should raise a few eyebrows, particularly for legislators who represent districts that count the research institutions, aviation manufacturers, or logistics hubs among their constituents or supporters.  Under HB 779, any sale, transportation, manufacturing, possession, or operation of unmanned aircraft systems that have been equipped with a "weapon" would constitute a felony, punishable by up to 3 years in prison and a fine of up to $100,000.  "Weapon" is defined to include a device or object that could cause, looks like it could cause, or is a replica of something that could cause serious bodily injury to a person.  Shipping a drone with a replica of a weapon (think the Star Wars themed X-Wing Fighter toy drones), or the mere perception that something on the drone could be a weapon, is enough to trigger jail under HB 779.  The proposed ban contains zero exceptions and zero exemptions.

Eight of the top 10 defense contractors in the country have operations within Georgia, according to the Georgia Department of Economic Development.  Georgia universities and colleges, including Georgia Institute of Technology and Middle Georgia State University, receive research funding grants for the development and testing of defense-related projects.  The Port of Savannah is a shipping hub; equipment arriving at the port is then transported through Georgia on its way to its final destination (civil or military).  Georgia Tech students use Fort Benning facilities for their drone research.  Moody Air Force Base in Valdosta, GA is home to several cutting-edge unmanned aircraft technology projects.  Contractors, students, and other civilian suppliers transporting unmanned aircraft systems to and from the military installations using Georgia roads, rail, or airways would be jailed and fined.  Lockheed Martin would be grounded from manufacturing or shipping most of its unmanned aircraft systems in and through Georgia.  Not exactly the welcome mat that the Georgia Center of Innovation in Aerospace has been marketing.

Go back to the drawing board, Georgia (and quit copying from other state's bad legislation).

When legislation harms your state’s economic drivers and grounds Star Wars toys, then aerospace manufacturers, research institutions, electric and communications providers, transportation logistics companies, and Georgia voters take notice.  HB 779 cuts off the hand that provides 5.3% of Georgia’s GDP and slices the fingers from the other hand that represent the state’s main economic development priorities all in one fell swoop.  Go back to the drawing board Georgia, and this time don't copy off the flawed legislative papers from surrounding states.


Elizabeth is a business and policy attorney specializing in information security and unmanned systems.  While Elizabeth is an attorney, nothing in this post is intended as legal advice.  If you need legal advice, get your own lawyer.

Saturday, January 16, 2016

Some notes on C in 2016

On r/programming was this post called "How to C (as of 2016)". It has some useful advice, but also some bad advice. I thought I'd write up comments on the topic. As somebody mentioned while I was writing this, only responsible programmers should be writing in C; irresponsible programmers should write in other languages that have more training wheels. These are the sorts of things responsible programmers do.


Use a debugger

The #1 thing you aren't doing, that you should be doing, is stepping through each line of code in a source level debugger as soon as you write it. If you only pull out the debugger to solve particularly difficult problems, then you are doing it wrong.

That means using an IDE like Visual Studio, XCode, or Eclipse. If you are only using an editor (without debugging capabilities), you are doing it wrong. I mention this because so many people are coding in editors that don't have debuggers. I don't even.

It's a concern for all languages, but especially with C. When memory gets corrupted, you need to be able to dump structures and memory in order to see it. Why is x some weird value like 37653? Using printf()-style debugging won't tell you, but looking at a hexdump of the stack will clearly show you how the entire chunk of memory was overwritten.


And debug your own code


Because C has no memory protection, a bug in one place can show up elsewhere, in an unrelated part of code. This makes debugging some problems really hard. In such cases, many programmers throw up their hands, say "I can't fix this", and lean on other programmers to debug their problem for them.

Don't be that person. Once you've gone through the pain of such bugs, you quickly learn to write better code. This includes better self-checking code that makes such bugs show up quicker, or better unit tests that cover boundary cases.


Code offensively

I once worked on a project where the leaders had decided to put "catch(...)" (in C++) everywhere, so that the program wouldn't crash. Exceptions, even memory corruption, would be silently masked and the program would continue. They thought they were making the code more robust. They thought it was defensive programming, a good principle.

No, that isn't defensive programming, just stupid programming. It masks bugs, making them harder to find in the long run.

You want to do the reverse. You want offensive code such that bugs cannot survive long undetected.

One way is assert(), double-checking assumptions that you know must always be true. This catches bugs before they have a chance to mysteriously corrupt memory. Indeed, when debugging mystery C bugs, I'll often begin by adding assert() everywhere I suspect there might be a problem. (Although, don't go overboard on asserts.)
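A trivial sketch of the idea (the structure and its invariants are made up for illustration):

   /* Sketch: assert() enforces assumptions at the point they're made, so
      corruption is caught immediately instead of thousands of instructions
      later in some unrelated piece of code. */
   #include <assert.h>
   #include <stddef.h>

   struct ring {
       unsigned char *buf;
       size_t size;
       size_t head;        /* invariant: head < size */
       size_t tail;        /* invariant: tail < size */
   };

   void ring_put(struct ring *r, unsigned char c)
   {
       assert(r != NULL && r->buf != NULL);
       assert(r->head < r->size);       /* fail loudly if the invariant broke */
       r->buf[r->head] = c;
       r->head = (r->head + 1) % r->size;
   }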

The best offensive coding is unit tests. Any time something is in doubt, write a unit test that stresses it. The C language has a reputation for going off in the weeds when things scale past what programmers anticipated, so write a test for such cases.


Code for quality


On a related note, things like unit tests, regression tests, and even fuzz testing are increasingly becoming the norm. If you have an open-source project, you should expect to have "make test" that adequately tests it. This should unit test the code with high code coverage. It's become the standard for major open-source projects. Seriously, "unit test with high code coverage" should be the starting point for any new project. You'll see that in all my big open-source projects, where I start writing unit tests early and often (albeit, because I'm lazy, I have inadequate code coverage).

AFL fuzzer is relatively new, but it's proving itself useful at exposing bugs in all sorts of open-source projects. C is the world's most dangerous language for parsing external input. Crashing because of a badly formatted file or bad network packet was common in the past. But in 2016, such nonsense is no longer tolerated. If you aren't nearly certain no input will crash your code, you are doing it wrong.

And, if you think "quality" is somebody else's problem, then you are doing it wrong.


Stop with the globals

When I work with open-source C/C++ projects, I tear my hair out over all the globals. The reason your project is tough to debug and impossible to make multithreaded is that you've littered it with global variables. The reason refactoring your code is a pain is that you overuse globals.

There's occasionally a place for globals, such as the debug/status logging system, but otherwise they're a bad, bad thing.


A bit OOP, a bit functional, a bit Java

As they say, "you can program in X in any language", referring to the fact that programmers often fail to use a language as it was intended, and instead try to coerce it into some other language they are familiar with. But that's just saying that dumb programmers can be dumb in any language. Sometimes counter-language paradigms are actually good.

The thing you want from object-oriented programming is the way a structure conceptually has both data and methods that act on the data. For struct Foobar, you create a series of functions that look like foo_xxxx(). You have a constructor foo_create(), a destructor foo_destroy(), and a bunch of functions that act on the structure.

Most importantly, when reasonable, define struct Foobar in the C file, not the header file. Make the functions public, but keep the precise format of the structure hidden. Everywhere else, refer to the structure using forward references. This is especially important for libraries, where exporting structure definitions in their headers destroys ABI compatibility (because structure sizes change). If you must export the structure, then put a version or size as its first member.
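A rough sketch of the pattern, using the Foobar placeholder names from above (not any real library):

   /* foobar.h -- the public interface. The struct is only forward-declared,
      so callers can't depend on its layout or size. */
   #include <stddef.h>
   struct foobar;                                /* opaque type */
   struct foobar *foobar_create(size_t initial_max);
   void foobar_destroy(struct foobar *foo);

   /* foobar.c -- the real definition stays private to this file. */
   #include <stdlib.h>

   struct foobar {
       size_t count;
       size_t max;
       char **names;
   };

   struct foobar *foobar_create(size_t initial_max)
   {
       struct foobar *foo = calloc(1, sizeof(*foo));
       if (foo == NULL)
           return NULL;
       foo->max = initial_max;
       foo->names = calloc(initial_max, sizeof(char *));
       if (foo->names == NULL) {
           free(foo);
           return NULL;
       }
       return foo;
   }

   void foobar_destroy(struct foobar *foo)
   {
       if (foo == NULL)
           return;
       while (foo->count)
           free(foo->names[--foo->count]);
       free(foo->names);
       free(foo);
   }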

No, the goal here isn't to emulate the full OOP paradigm of inheritance and polymorphism. Instead, it's just a good way of modularizing code that's similar to OOP.

Similarly, there are some good ideas to pull from functional programming, namely that functions shouldn't have "side effects". They consume their inputs, and return an output, changing nothing else. The majority of your functions should look like this. A function that looks like "void foobar(void);" is the opposite of this principle, being a side-effect-only function.

One area of side-effects to avoid is global variables. Another is system calls that affect the state of the system. Globals are similar to deep variables within structures, where you call something like "int foobar(struct Xyz *p);" that hunts deeply in p to find the parameters it acts on. It's better to bring them up to the top, such as calling "foobar(p->length, p->socket->status, p->bbb)". Yes, it makes the parameter lists long and annoying, but now the function "foobar()" depends on simple types, not a complex structure.

Part of this functional attitude to programming is being aggressively const-correct, where pointers are passed in as const so that the function can't change what they point to. It communicates clearly which parts are the outputs (the return value and non-const pointers), and which are the inputs.
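Here's a small sketch that combines these ideas (the function is invented for illustration): the inputs are const, the only output is the return value, no globals are touched, and iteration is by index rather than pointer arithmetic.

   /* Sketch: a side-effect-free, const-correct function. Easy to reason
      about, easy to unit test, trivially thread-safe. */
   #include <stddef.h>

   size_t count_matching(const unsigned char *buf, size_t length, unsigned char value)
   {
       size_t count = 0;
       size_t i;
       for (i = 0; i < length; i++) {
           if (buf[i] == value)
               count++;
       }
       return count;
   }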

C is a low-level systems language, but except in dire circumstances, you should avoid those C-isms. Instead of being C-specific, you should write your code in terms of the larger C-like language ecosystem, where the code could be pasted into JavaScript, Java, C#, and so on.

That means no pointer arithmetic. Yes, in the 1980s, this made code slightly faster, but since the 1990s, it provides no benefit, especially with modern optimizing compilers. It makes code hard to read. Whenever there is a cybersecurity vulnerability in open-source code (Heartbleed, Shellshock, etc.), it's almost always in pointer-arithmetic code. Instead, define an integer index variable, and iterate through arrays that way -- as if you were writing this in Java.

This ideal also means no more casting structs onto buffers for network-protocol/file-format parsing. Yes, that networking book you love so much taught you to do this, using things like "ntohs(*(short*)p)", but it was wrong when the book was written, and is wronger now. Parse the integers like you would have to in Java, such as "p[0] * 256 + p[1]". You think casting a packed structure on top of the data is the most "elegant" way of parsing it, but it's not.
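As a concrete sketch, here's big-endian parsing done byte-by-byte (the field offsets follow the TCP header layout; the helper names are my own). It works regardless of alignment, padding, or host byte order.

   /* Sketch: parse big-endian integers out of a buffer without casting a
      packed struct on top of it. */
   #include <stdint.h>

   static uint16_t read_be16(const unsigned char *p)
   {
       return (uint16_t)(p[0] * 256 + p[1]);
   }

   static uint32_t read_be32(const unsigned char *p)
   {
       return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
            | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
   }

   /* Example: pull the source port and sequence number out of a TCP header. */
   void parse_tcp(const unsigned char *hdr, uint16_t *src_port, uint32_t *seqno)
   {
       *src_port = read_be16(hdr + 0);
       *seqno    = read_be32(hdr + 4);
   }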


Ban unsafe functions


Stop using deprecated functions like strcpy() and sprintf(). When I find a security vulnerability, it'll be in these functions. Moreover, they make your code horribly expensive to audit, because I'm going to have to look at every one of these calls and make sure you don't have a buffer-overflow. You may know it's safe, but it'll take me a horrendously long time to figure that out for myself. Instead, use strlcpy()/strcpy_s() and snprintf()/sprintf_s().
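A small sketch of the difference (the buffer and names are invented for illustration):

   /* Sketch: the bounded version makes the destination size part of the
      call, so an auditor can see at a glance that no overflow is possible. */
   #include <stdio.h>
   #include <stddef.h>

   void format_greeting(char *dst, size_t dst_size, const char *name)
   {
       /* BAD:  strcpy(dst, name);  or  sprintf(dst, "hello %s", name);
          -- nothing stops a long 'name' from overflowing 'dst'. */

       /* GOOD: the output is truncated to fit, never overflowed. */
       snprintf(dst, dst_size, "hello %s", name);
   }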

More generally, you need to be really comfortable with what both a buffer-overflow and an integer-overflow are. Go read OpenBSD's reallocarray(), understand why it solves the integer-overflow problem, then use it instead of malloc() in all your code. If you have to, copy the reallocarray() source from OpenBSD and stick it in your code.
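A minimal sketch of why (the growth function is invented for illustration):

   /* Sketch: reallocarray() refuses to allocate if count * size would
      overflow, whereas realloc(p, count * size) silently wraps around and
      hands back a buffer that's far too small. */
   #define _DEFAULT_SOURCE   /* reallocarray() lives in <stdlib.h> on OpenBSD
                                and recent glibc; otherwise copy OpenBSD's */
   #include <stdlib.h>

   struct item { int id; char name[64]; };

   struct item *grow_items(struct item *items, size_t new_count)
   {
       /* BAD:  realloc(items, new_count * sizeof(*items));
          -- if new_count is attacker-controlled, the multiply can overflow. */
       return reallocarray(items, new_count, sizeof(*items));
   }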

You know how your code mysteriously crashes on some input? Unsafe code is probably the reason. Also, it's why hackers break into your code. Do the right things, and these problems disappear.

The "How to C" post above tells you to use calloc() everywhere. That's wrong, it still leaves open the integer overflow bug on many platforms. Also, get used to variable-sized thingies, which means using realloc() a lot -- hence reallocarray().

There's much more to writing secure code, but if you do these, you'll solve most problems. In general, always distrust input, even when it's from a local file or USB port you control.


Stop with the weird code


What every organization should do is organize an after-work meeting, where anybody can volunteer to come and hash out an agreement for a common style-guide for the code written in the organization. Then fire every employee who shows up. It's a stupid exercise.

The only correct "style" is to make code look unsurprisingly like the rest of the code on the Internet. This applies to private code, as well as any open-source you do. The only decision you have to make is to pick an existing, well-known style guide to follow, like Linux, BSD, WebKit, or Gnu.

The nice thing about other languages, especially Python, is that there isn't the plethora of common styles like there is in C. That was one of the shocking things about the Heartbleed vulnerability: OpenSSL uses the "Whitesmiths" style of braces, which was at one time common but is now rare and weird. LibreSSL restyled it to the BSD format. That's probably a good decision: if your C style is fringe/old, it may be worth restyling it to something common/average.

You know that really cool thing you've thought of, and think everyone will adopt once they see the beauty of it in your code? Yeah, remove that thing; it just pisses everyone off. Or, if you must use that technique (it happens sometimes), document it.


The future is multicore


CPUs aren't going to get any faster. Instead, we'll increasingly get more CPU cores on the chip. That doesn't mean you need to worry about making your code multithreaded today, but it means you should probably consider what that might look like in the future.

No, mutexes and critical sections don't make your code more multicore. Sure, they solve the safety problem, but at an enormous cost to performance. It means your code might get faster for 2 or 3 cores, but after that, adding cores will instead make your software go slower. Fixing this, achieving multicore scalability, is second only to security in importance for the C programmer today.

One of these days I'll write a huge document on multicore scalability, but in the meanwhile, just follow the advice above: get rid of global variables and the invisible sharing of deep data structures. When we have to go in later and refactor the code to make it scale, our jobs will be significantly easier.


Stop using true/false for success/failure

The "How to C" document tells you that success always means true. This is garbage. True means true, success means success. They don't mean each other. You can't avoid the massive code out there that returns zero on success and some other integer for failure.

Yea, it sucks that there is no standard, but there is never going to be one. Instead, the wrong advice of the "How to C" doc is a good example of the "Stop with the weird code" principle. The author thinks that if only we could get everyone to do it his way, if we just do it hard enough in our own code to enlighten others, then the problem will be solved. That's bogus; programmers aren't ever going to agree on the same way. Your code has to exist in a world where ambiguity exists, where both true and 0 are common indicators of success, despite being opposite values. The way to do that is to unambiguously define SUCCESS and FAILURE values.

When code does this:

   if (foobar(x,y)) {
      ...;
   } else {
      ...;
   }

There's no way that I, reading your code, can easily know which case is "success" and which is failure. There are just too many standards. Instead, do something like this:

   if (foobar(x,y) == Success) {
      ...;
   } else {
      ...;
   }
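
One way to do that, as a sketch (pick whatever names your project prefers), is a tiny enum:

   enum status {
      Success = 0,    /* matches the common zero-on-success convention */
      Failure = 1
   };

   enum status foobar(int x, int y);   /* the return value is now unambiguous */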


The thing about integers

The "How to C" guide claims there's no good reason to use naked "int" or "unsigned", and that you should use better defined types like int32_t or uint32_t. This is nonsense. The input to many library functions is 'int' or 'long' and compilers are getting to be increasingly type-savvy, warning you about the difference even when both are the same size.

Frankly, getting integers 'wrong' isn't a big source of problems. That even applies to 64-bit vs. 32-bit issues. Yes, using 'int' to hold a pointer will break 64-bit code (use intptr_t or ptrdiff_t or size_t instead), but I'm astonished how rarely this happens in practice. Just mmap() the first 4 gigabytes of the address space as invalid pages on startup, run your unit/regression test suite, and you'll quickly flush out any problems. I don't really need to tell you how best to fix them.
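
For the record, the fix for that particular bug looks something like this sketch:

   #include <stdint.h>

   void example(void *p)
   {
      /* int bad = (int)p; */       /* truncates the pointer on 64-bit builds */
      intptr_t ok = (intptr_t)p;    /* an integer wide enough to hold a pointer */
      void *back = (void *)ok;      /* and it round-trips safely */
      (void)back;
   }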

But the biggest annoyance in code is that programmers want to redefine integer types. Stop doing that. I know having "u32" in your code makes it pretty for you, but it just makes it more annoying for me, who has to read your code. Please use something standard, such as "uint32_t" or "unsigned int". Worse, don't arbitrarily create integer types like "filesize". I know you want to decorate this integer with additional meaning, but the point of C programming is "low level", and this just annoys the heck out of programmers.

Use static and dynamic analysis

The old hotness in C was "warning levels" and "lint", but in modern C we have "static analysis". Clang has dramatically advanced the state-of-the-art of what compilers can warn about, and gcc is busy catching up. Unknown to many, Microsoft has also had some Clang-quality static analysis in its compilers. Xcode's ability to visualize Clang's analysis is incredible, though you get the same sort of thing with Clang's web tools.

But that's just basic static analysis. There are many security-focused tools that take static analysis to the next level, like Coverity, Veracode, and HP Fortify.

All these things produce copious "false positives", but that's a misnomer. Fixing code to accommodate the false positive cleans up the code and makes it dramatically more robust. In other words, such things are often "this code is confusing", and the solution is to clean it up. Coding under the constraint of a static analyzer makes you a better programmer.
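
A hypothetical example of the kind of "false positive" I mean: an analyzer may warn that 'result' below could be used uninitialized, because it can't connect the two if-statements. The code happens to be correct, but the cleanup is better code anyway:

   /* Before: the analyzer can't easily prove 'result' is always set
      before use, and frankly neither can a human reader. */
   int before(int flag, int x)
   {
      int result;
      if (flag)
         result = x * 2;
      if (!flag)
         result = 0;
      return result;
   }

   /* After: the warning disappears, and the code is more obviously
      correct. */
   int after(int flag, int x)
   {
      int result = 0;
      if (flag)
         result = x * 2;
      return result;
   }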


Dependency hell

In corporations, after a few years, projects become unbuildable except on the current build system. There are just too many poorly documented dependencies. One company I worked for joked that they should just give their source to their competitors, because the competitors would never be able to figure out how to get the thing to build.

And companies perpetuate this for very good-sounding reasons. Another company proposed standardizing on compiler versions, to avoid the frequent integration problems caused by different teams using different compilers. But that's just solving a relatively minor problem by introducing a major one down the road. Solving integration problems is what keeps the code healthy.

Open-source has related problems. Dependencies are rarely fully documented, and indeed are often broken. Many is the time you end up having to install two incompatible versions of the same dependency just to get the code to finally compile.

The fewer the dependencies, the more popular the code. There are ways to achieve this.

  • Remove the feature. As a whole, most dependencies exist for some feature that only 1% of the users want, but which burdens the other 99% of the user base.
  • Include just the source file you need. Instead of depending on the entire OpenSSL library (and its dependencies), just include the sha2.c file if that's the only functionality from OpenSSL that you need.
  • Include their entire source in your tree. For example, Lua is a great scripting language in 25kloc that really needs no updates. Instead of forcing users to hunt down the right lua-dev dependency, just include the Lua source with your source.
  • Load libraries at runtime, through the use of dlopen(), and include their interface .h files as part of your project source. That means users aren't burdened by the dependency unless they use that feature. Or, if it's a necessary feature, you can spit out error messages with more help on fixing the dependency. (See the sketch after this list.)
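
A sketch of that last approach, assuming a hypothetical optional library "libfoo" with a foo_init() function (on Linux, link with -ldl):

   #include <dlfcn.h>
   #include <stdio.h>

   /* Hypothetical optional dependency "libfoo"; its foo.h, copied into
      our tree, would declare the function we look up here. */
   typedef int (*foo_init_fn)(void);

   int load_foo(void)
   {
      void *lib = dlopen("libfoo.so", RTLD_NOW);
      if (lib == NULL) {
         fprintf(stderr, "optional feature disabled; install libfoo (%s)\n",
                 dlerror());
         return -1;
      }

      foo_init_fn foo_init = (foo_init_fn)dlsym(lib, "foo_init");
      if (foo_init == NULL) {
         fprintf(stderr, "libfoo found but too old: %s\n", dlerror());
         dlclose(lib);
         return -1;
      }

      return foo_init();
   }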


Understand undefined C

You are likely wrong about how C works. Consider the expression (x + 1 < x). A valid result of this is '5'. That's because C doesn't define what happens when 'x' is the maximum value of a signed integer and you add 1 to it, causing it to overflow. One thing many compilers do is treat this as 2's-complement overflow, the way other languages (such as Java) define it. But some compilers have been known to treat the overflow as an impossibility, and remove the code that depends on it completely.
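
Here's a sketch of the classic trap: the overflow check below looks reasonable, but because signed overflow is undefined, an optimizing compiler is allowed to conclude it can never trigger and delete it:

   #include <limits.h>

   int increment(int x)
   {
      if (x + 1 < x)    /* undefined when x == INT_MAX; the compiler may
                           decide this is always false and delete it */
         return -1;
      return x + 1;
   }

   /* Check against the limit before doing the arithmetic instead. */
   int increment_safe(int x)
   {
      if (x == INT_MAX)
         return -1;
      return x + 1;
   }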

Thus, instead of relying upon how the current version of your C compiler happens to work, you really need to code according to the larger spec of how any conforming C compiler may behave.


Conclusion


Don't program in C unless you are responsible. Responsible means understanding buffer-overflows, integer overflows, thread synchronization, undefined behaviors, and so on. Responsible means coding for quality in a way that aggressively tries to expose bugs early. This is the nature of C in 2016.