Sunday, April 21, 2019

Programming languages infosec professionals should learn

Code is an essential skill of the infosec professional, but there are so many languages to choose from. What language should you learn? As a heavy coder, I thought I'd answer that question, or at least give some perspective.

The tl;dr is JavaScript. Whatever other language you learn, you'll also need to learn JavaScript. It's the language of browsers, Word macros, JSON, NodeJS server side, scripting on the command-line, and Electron apps. You'll also need a bit of bash and/or PowerShell scripting skills, SQL for database queries, and regex for extracting data from text files. Other languages are important as well; Python, for example, is very popular. Actively avoid C++ and PHP as they are obsolete.

Also tl;dr: whatever language you decide to learn, also learn how to use an IDE with visual debugging, rather than just a text editor. That probably means Visual Studio Code from Microsoft. Also, whatever language you learn, stash your code on GitHub.

Let's talk in general terms. Here are some types of languages.

  • Unavoidable. As mentioned above, familiarity with JavaScript, bash/PowerShell, and SQL is unavoidable. If you are avoiding them, you are doing something wrong.
  • Small scripts. You need to learn at least one language for writing quick-and-dirty command-line scripts to automate tasks or process data. As a tool-using animal, this is your basic tool. You are a monkey, this is the stick you use to knock down the banana. Good choices are JavaScript, Python, and Ruby. Some domain-specific languages can also work, like PHP and Lua. Those skilled in bash/PowerShell can do a surprising amount of "programming" tasks in those languages. Old timers use things like Perl or Tcl. Sometimes the choice of which language to learn depends upon the vast libraries that come with the languages, especially Python and JavaScript libraries.
  • Development languages. Those scripting languages have grown up into real programming languages, but for the most part, "software development" means languages designed for that task like C, C++, Java, C#, Rust, Go, or Swift.
  • Domain-specific languages. The language Lua is built into nmap, snort, Wireshark, and many games. Ruby is the language of Metasploit. Further afield, you may end up learning languages like R or Matlab. PHP is incredibly important for web development. Mobile apps may need Java, C#, Kotlin, Swift, or Objective-C.
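
To make the "small scripts" category above concrete, here is a minimal sketch of a quick-and-dirty data-processing script, written in Python only because it is one of the choices named. The log lines, their format, and the name `count_sources` are all made up for illustration; real logs will differ.

```python
from collections import Counter

# Made-up sample lines in an assumed sshd-style format.
SAMPLE_LOG = [
    "Apr 21 10:01:02 host sshd[811]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "Apr 21 10:01:05 host sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    "Apr 21 10:02:11 host sshd[902]: Accepted password for alice from 198.51.100.7 port 5151 ssh2",
    "Apr 21 10:03:40 host sshd[950]: Failed password for root from 192.0.2.44 port 6000 ssh2",
]


def count_sources(lines):
    """Count failed-login lines per source address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if "Failed password" not in line:
            continue
        words = line.split()
        # Assumed layout: "... from <address> port <n> ..."
        if "from" in words:
            idx = words.index("from")
            if idx + 1 < len(words):
                counts[words[idx + 1]] += 1
    return counts


# Print the noisiest sources first.
for address, hits in count_sources(SAMPLE_LOG).most_common():
    print(f"{hits:4d}  {address}")
```

The same dozen lines could be written in JavaScript or Ruby; the point is the shape of the task, not the language.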

As an experienced developer, here are my comments on the various languages, sorted in alphabetic order.


bash (and other Unix shells)

You have to learn some bash for dealing with the command-line. But it's also a fairly complete programming language. Peruse the scripts in an average Linux distribution, especially some of the older ones, and you'll find that bash makes up a substantial amount of what we think of as the Linux operating system. Actually, it's called bash/Linux.

In the Unix world, there are lots of other related shells that aren't bash, which have slightly different syntax. A good example is BusyBox which has "ash". I mention this because my bash skills are rather poor partly because I originally learned "csh" and get my syntax variants confused.

As a hard-core developer, I end up just programming in JavaScript or even C rather than trying to create complex bash scripts. But you shouldn't look down on complex bash scripts, because they can do great things. In particular, if you are a pentester, the shell is often the only language you'll get when hacking into a system, so good bash language skills are a must.


C

This is the development language I use the most, simply because I'm an old-time "systems" developer. What "systems programming" means is simply that you have manual control over memory, which gives you about 4x performance and better "scalability" (performance doesn't degrade as much as problems get bigger). It's the language of the operating system kernel, as well as many libraries within an operating system.

But if you don't want manual control over memory, then you don't want to use it. Its lack of memory protection, which leads to security problems, makes it almost obsolete.


C++

None of the benefits of modern languages like Rust, Java, and C#, but all of the problems of C. It's an obsolete, legacy language to be avoided.


C#

This is Microsoft's personal variant of Java designed to be better than Java. It's an excellent development language, for command-line utilities, back-end services, applications on the desktop (even Linux), and mobile apps. If you are working in a Windows environment at all, it's an excellent choice. If you can at all use C# instead of C++, do so. Also, in the Microsoft world, there is still a lot of Visual Basic. OMG avoid that like the plague that it is, burn in a fire burn burn burn, and use C# instead.


Go

Once a corporation reaches a certain size, it develops its own programming language. For Google, their most important language is Go.

Go is a fine language in general, but its main purpose is scalable network programs using goroutines. This does asynchronous user-mode programming in a way that's most convenient for the programmer. Since Google is all about scalable network services, Go is a perfect fit for them.

I do a lot of scalable network stuff in C, because I'm an oldtimer. If that's something you're interested in, you should probably choose Go over C.


Java

This gets a bad reputation because it was once designed for browsers, but has so many security flaws that it can't be used in browsers. You still find in-browser apps that use Java, even in infosec products (like consoles), but it's horrible for that. If you do this, you are bad and should feel bad.

But browsers aside, it's a great development language for command-line utilities, back-end services, apps on desktops, and apps on phones. If you want to write an app that runs on macOS, Windows, and on a Raspberry Pi running Linux, then this is an excellent choice.


JavaScript

As mentioned above, you don't have a choice but to learn this language. One of your basic skills is learning how to open Chrome developer tools and manipulate JavaScript on a web page.

So the question is whether you learn just enough of the language to hack around with it, or whether you spend the effort to really learn it for development or writing scripts. I suggest the latter. For one thing, you'll often encounter weird usages of JavaScript that will be unfamiliar unless you seriously learn the language, such as jQuery-style constructions that look nothing like the JavaScript you originally learned.

JavaScript has actually become a serious app development language with NodeJS and frameworks like Electron. If there is one language in the world that can do everything, from writing back end services (NodeJS), desktop applications (Electron), mobile apps (numerous frameworks), quick-and-dirty scripts (NodeJS again), and browser apps -- it's JavaScript. It's the lingua franca of the world.

In addition, remember that your choice of scripting language will often be based on the underlying libraries available. For example, if writing TensorFlow machine-learning programs, you need those libraries available to the language. That's why JavaScript is popular in the machine-learning field, because there are so many libraries available for it.

BTW, "JSON" is also a language, or at least a data format, in its own right. So you have to learn that, too.
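
As a rough illustration of how little there is to JSON as a data format, here is a minimal sketch using Python's standard `json` module. The record and its field names are invented for the example; they don't come from any particular tool.

```python
import json

# Hypothetical scan result, as some tool might emit it (field names are made up).
raw = '{"host": "192.0.2.10", "open_ports": [22, 80, 443], "tls": {"version": "1.2"}}'

# Text -> ordinary dicts, lists, strings, and numbers.
record = json.loads(raw)

# Once parsed, it's just a data structure you can inspect and change.
record["open_ports"].append(8080)
print(record["tls"]["version"])

# Data -> text again, pretty-printed.
print(json.dumps(record, indent=2, sort_keys=True))
```

Nearly every language has an equivalent pair of parse/serialize calls, which is much of why the format shows up everywhere.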


Lua

Lua is a language similar to JavaScript in many respects, with the big difference that arrays start at 1 instead of 0. The reason it exists is that it's extremely easy to embed in other programs as their scripting language, is lightweight in terms of memory/CPU, and is ultra-portable almost everywhere.

Thus, you find it embedded in security tools like nmap, snort, and Wireshark. You also see it as the scripting language in popular games. Like Go, it has extremely efficient coroutines, so you see it in OpenResty, the nginx-based web platform, for backend scripting of applications.


Perl

Perl was a popular scripting language in the early days of infosec (1990s), but has fallen behind the other languages in modern times. In terms of language design, it's a somewhat better language than shell languages like bash, yet not quite as robust as real programming languages like JavaScript, Python, and Ruby.

In addition, it was the primary web scripting language for building apps on servers in the 1990s before PHP came along.

Thus, it's a popular legacy language, but not a lot of new stuff is done in this language.


PHP

Surprisingly, PHP is a complete programming language. You can use it on the command-line to write scripts just like Python or JavaScript. You may have to learn it, because it's still the most popular language for creating webapps, but learning it well means being able to write backend scripts in it as well.

However, for writing web apps, it's obsolete. There are so many unavoidable security problems that you should avoid using it to create new apps. Also, scalability is still difficult. Use NodeJS, OpenResty/Lua, or Ruby instead.


PowerShell

The same comments above that apply to bash also apply to PowerShell, except that PowerShell is Windows.

Windows has two command-lines, the older CMD/BAT command-line, and the newer PowerShell. Anything complex uses PowerShell these days. For pentesting, there are lots of fairly complete tools for doing interesting things from the command-line written in the PowerShell programming language.

Thus, if Windows is in your field, and it almost certainly is, then PowerShell needs to be part of your toolkit.


Python

This has become one of the most popular languages, driven by universities which use it heavily as the teaching language for programming concepts. Anything academic, like machine learning, will have great libraries for Python.

A lot of hacker command-line tools are written in Python. Since such tools are often buggy and poorly documented, you'll end up having to read the code a lot to figure out what is going wrong. Learning to program in Python means being able to contribute to those tools.

I personally hate the language because of the schism between v2/v3, and having to constantly struggle with that. Every language has a problem with evolution and backwards compatibility, but this v2 vs v3 issue with Python seems particularly troublesome.

Also, Python is slow. That shouldn't matter in this age of JITs everywhere and things like WebAssembly, but somehow whenever you have an annoyingly slow tool, it's Python that's at fault.

Note that whenever I read reviews of programming languages, I see praise for Python's syntax. This is nonsense. After a short while, the syntax of all programming languages becomes quirky and weird. Most languages these days are multi-paradigm, a combination of imperative, object-oriented, and functional. Almost all are JITted. "Syntax" is the least reason to choose a language. Instead, it's the choice of support/libraries (which are great for Python), or specific features like tight "systems" memory control (like Rust) or scalable coroutines (like Go). Seriously, stop praising the "elegant" and "simple" syntax of languages.


Regex

Like SQL for database queries, regular expressions aren't a programming language as such, but still a language you need to learn. They are patterns that match data. For example, if you want to find all social security numbers in a text file, you look for that pattern of digits and dashes. Such pattern matching is so common that it's built into most tools, and is a feature of most scripting languages.
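
The social security number example can be sketched in a few lines. This is a minimal illustration using Python's `re` module, assuming numbers are written as three digits, two digits, and four digits separated by dashes; the sample text and the name `find_ssns` are made up.

```python
import re

# Assumed layout: 3 digits, dash, 2 digits, dash, 4 digits.
# \b keeps the pattern from matching inside a longer run of digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text):
    """Return every substring that looks like an SSN (hypothetical helper)."""
    return SSN_PATTERN.findall(text)


# Dummy data: one SSN-shaped value, one phone-like value, one too-long number.
sample = "Employee 000-12-3456 called from 555-0100; ref 1234-56-78901."
print(find_ssns(sample))
```

Note that this only matches the shape. It says nothing about whether a match is a real number, and anything written without dashes slips past it.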

One thing to remember from an infosec point of view is that they are highly insecure. Hackers craft content to incorrectly match patterns, evade patterns, or cause "algorithmic complexity" attacks that cause simple regexes to explode with excessive computation.
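
The "algorithmic complexity" problem is easiest to see with nested quantifiers. The following is a minimal sketch against Python's backtracking `re` engine; the two patterns and the helper `time_match` are illustrative choices, and the input sizes are kept small on purpose so the script finishes quickly. Actual timings depend on the machine and the regex engine.

```python
import re
import time


def time_match(pattern, text):
    """Return (match_result, elapsed_seconds) for one re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start


# Nested quantifiers: "(a+)+" can split a run of a's in exponentially many ways.
RISKY = r"(a+)+$"
# Accepts the same strings, but there is only one way to consume the a's.
SAFER = r"a+$"

# The trailing "b" forces the match to fail, so a backtracking engine
# tries every possible split before giving up.
for n in (10, 14, 18):
    text = "a" * n + "b"
    _, risky_time = time_match(RISKY, text)
    _, safer_time = time_match(SAFER, text)
    print(f"n={n:2d}  risky={risky_time:.4f}s  safer={safer_time:.6f}s")
```

Each extra character roughly doubles the work for the risky pattern, which is why an attacker only needs a short crafted string to tie up a service that applies such a regex to untrusted input.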

You have to learn regexes enough to be familiar with the basics, but the syntax can get unreasonably complex, so few master the full regex syntax.


Ruby

Ruby is a great language for writing web apps that makes security easier than with PHP, though like all web apps it still has some issues.

In infosec, the major reason to learn Ruby is Metasploit.

Like Python and JavaScript, it's also a great command-line scripting language with lots of libraries available. You'll find it often used in this role.


Rust

Rust is Mozilla's replacement language for C and especially C++. It supports tight control over memory structures for "systems" programming, but is memory safe, so it doesn't have all those vulnerabilities. One of these days I'll stop programming in C and use Rust instead.

The problem with Rust is that it doesn't have quite the support that other languages have, like Java or C# for apps, and isn't as tightly focused on network apps as Go. But as a language, it's wonderful. In a perfect world, we'd all use JavaScript for scripting tasks and Rust for the backend work. But in the real world, other languages have better support.


SQL

SQL, "Structured Query Language", isn't a programming language as such, but it's still a language of some sort. It's something that you unavoidably have to learn.

One of the reasons to learn a programming language is to process data. You can do that within a programming language, but an alternative is to shove the data into a database then write queries off that database. I have a server at home just for that purpose, with large disks and multicore processors. Instead of storing things as files, and writing scripts to process those files, I stick it in tables, and write SQL queries off those tables.
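
Here is a minimal sketch of that "tables instead of files" workflow. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; that is an assumption for illustration, not a description of the server setup mentioned above. The table, column names, and rows are made up.

```python
import sqlite3

# Hypothetical data: (source address, destination port) pairs from some log.
rows = [
    ("203.0.113.5", 22), ("203.0.113.5", 23), ("203.0.113.5", 80),
    ("192.0.2.44", 443), ("192.0.2.44", 443), ("198.51.100.7", 22),
]

db = sqlite3.connect(":memory:")  # throwaway in-memory database
db.execute("CREATE TABLE conns (src TEXT, dport INTEGER)")
# Use placeholders rather than pasting values into the SQL string.
db.executemany("INSERT INTO conns VALUES (?, ?)", rows)

# Which sources touched the most distinct ports?
query = """
    SELECT src, COUNT(DISTINCT dport) AS ports
    FROM conns
    GROUP BY src
    ORDER BY ports DESC, src
"""
results = db.execute(query).fetchall()
for src, ports in results:
    print(src, ports)
```

Once the data is in a table, the next question is usually just another query, not another script.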


Swift

Back in the day, when computers were new, before C++ became the "object oriented" language standard, there was a competing object-oriented version of C known as "Objective-C". Because, as everyone knew, object-oriented was the future, NeXT adopted this as their application programming language. Apple bought NeXT, and thus it became Apple's programming language.

But Objective-C lost the object-oriented war to C++ and became an orphaned language. Also, it was really stupid, essentially two separate language syntaxes fighting for control of your code.

Therefore, a few years ago, Apple created a replacement called Swift, which borrows ideas from Rust, among other languages. Like Rust, it's an excellent "systems" programming language that has more manual control over memory allocation, but without all the buffer-overflows and memory leaks you see in C.

It's an excellent language, and great when programming in an Apple environment. However, when choosing a "language" that's not particularly Apple focused, just choose Rust instead.


Conclusion

As I mentioned above, familiarity with JavaScript, bash/PowerShell, and SQL is unavoidable. So start with those. JavaScript in particular has become a lingua franca, able to do, and do well, almost anything you need a language to do these days, so it's worth getting into the finer details of JavaScript.

However, there's no One Language to Rule them all. There are good reasons to learn most languages in this list. For some tasks, the support for a certain language is so good it's just best to learn that language to solve that task. With the academic focus on Python, you'll find well-written libraries that solve important tasks for you. If you want to work with a language that other people know, that you can ask questions about, then Python is a great choice.

The exceptions to this are C++ and PHP. They are so obsolete that you should avoid learning them, unless you plan on dealing with legacy.

6 comments:

  1. Great post. Another benefit to learning Javascript is that you can write tools/apps for integrating with GSuite (mail/calendar/etc) and you can create custom functions and macros in Google sheets.

  2. > [C++ is] an obsolete, legacy language to be avoided

    let the 74th hunger games begin!

  3. I'm just completing a 4-year Cybersecurity degree and I've had to delve into C++ twice for related modules. A lot of networking simulators use it, also IoT (Cooja simulator) and Arduinos are programmed using C++ / C functions.

    As I learned java and python it was a bit of a pain at first, but it's still out there.

  4. The v2 vs v3 difference in Python is nowhere near as annoying as the difference between regex variants (one reason why very few master the full syntax is that that means learning a lot of non-portable things or remembering how Python regex is different from .NET/C# regex is different from Java regex-- quick, in which languages is a [ inside a [] character class a literal [, and in which languages does it represent a nested character class so that you can do character class intersections?), or the difference between SQL flavors.

    Speaking of obsolete formats, you missed an opportunity to mention how horrible and obsolete XML is in the JSON section (and how plenty of people do XML-like objects but generally don't do actual XML.)

  5. Nothing new happening in Perl? MetaCPAN begs to differ.
