Friday, January 20, 2017

The command-line, for cybersec

On Twitter I made the mistake of asking people about command-line basics for cybersec professionals. I got a lot of useful responses, which I summarize in this long (5k words) post. It's mostly driven by the tools I use, with a bit of input from the tweets I got in response to my query.

bash

By command-line this document really means bash.

There are many types of command-line shells. Windows has two, 'cmd.exe' and 'PowerShell'. Unix started with the Bourne shell 'sh', and there have been many variations of this over the years, 'csh', 'ksh', 'zsh', 'tcsh', etc. When GNU rewrote Unix user-mode software independently, they called their shell "Bourne Again Shell" or "bash" (cue "JSON Bourne" shell jokes here).

Bash is the default shell for Linux and macOS. It's also available on Windows, as part of their special "Windows Subsystem for Linux". The Windows version of 'bash' has become my most used shell.

For Linux IoT devices, BusyBox is the most popular shell. It's easy to learn, as it includes feature-reduced versions of popular commands.


man

‘Man’ is the command you should not run if you want help for a command.

Man pages are designed to drive away newbies. They are only useful if you are already mostly an expert with the command you desire help on. Man pages list all possible features of a program, but do not highlight examples of the most common features, or the most common way to use the commands.

Take ‘sed’ as an example. It’s used most commonly to do a search-and-replace in files, like so:

$ sed 's/rob/dave/' foo.txt

This usage is so common that many non-geeks know of it. Yet, if you type ‘man sed’ to figure out how to do a search and replace, you’ll get nearly incomprehensible gibberish, and no example of this most common usage.

I point this out because most guides on using the shell recommend ‘man’ pages to get help. This is wrong, it’ll just endlessly frustrate you. Instead, google the commands you need help on, or better yet, search StackExchange for answers.

You might try asking questions, like on Twitter or forum sites, but this requires a strategy. If you ask a basic question, self-important dickholes will respond by telling you to “rtfm” or “read the fucking manual”. A better strategy is to exploit their dickhole nature, such as saying “too bad command xxx cannot do yyy”. Helpful people will gladly explain why you are wrong, carefully explaining how xxx does yyy.

If you must use 'man', use the 'apropos' command to find the right man page. Sometimes multiple things in the system have the same or similar names, leading you to the wrong page.


apt-get install yum

Using the command-line means accessing that huge open-source ecosystem. Most of the things in this guide do not already exist on the system. You have to either compile them from source, or install via a package-manager. Linux distros ship with a small footprint, but have a massive database of precompiled software "packages" in the cloud somewhere. Use the "package manager" to install the software from the cloud.

On Debian-derived systems (like Ubuntu, Kali, Raspbian), type “apt-get install masscan” to install “masscan” (as an example). Use “apt-cache search scan” to find a bunch of scanners you might want to install.

On RedHat systems, use “yum” instead. On BSD, use the “ports” system, which you can also get working for macOS.

If no pre-compiled package exists for a program, then you'll have to download the source code and compile it. There's about an 80% chance this will work easily, following the instructions. There is a 20% chance you'll experience "dependency hell", for example, needing to install two mutually incompatible versions of Python.


Bash is a scripting language

Don’t forget that shells are really scripting languages. The bit that executes a single command is just a degenerate use of the scripting language. For example, you can do a traditional for loop like:

$ for i in $(seq 1 9); do echo $i; done

In this way, ‘bash’ is no different than any other scripting language, like Perl, Python, NodeJS, PHP CLI, etc. That’s why a lot of stuff on the system actually exists as short ‘bash’ programs, aka. shell scripts.

Few want to write bash scripts, but you are expected to be able to read them, either to tweak existing scripts on the system, or to read StackExchange help.
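
For instance, here's the kind of short script you'll run across, shown as a minimal sketch (the subnet and filename are made up for illustration):

#!/bin/bash
# ping-sweep.sh - print which hosts on a /24 answer a single ping
for i in $(seq 1 254); do
    if ping -c 1 -W 1 "192.168.1.$i" > /dev/null 2>&1; then
        echo "192.168.1.$i is up"
    fi
done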


File system commands

The macOS “Finder” or Windows “File Explorer” are just graphical shells that help you find files, open, and save them. The first commands you learn are for the same functionality on the command-line: pwd, cd, ls, touch, rm, rmdir, mkdir, chmod, chown, find, ln, mount.
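
A typical session with these commands might look like the following (the directory and file names are just illustrations):

$ pwd
/home/rob
$ mkdir scans
$ cd scans
$ touch notes.txt
$ chmod 600 notes.txt
$ ls -l
$ rm notes.txt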

The command "rm -rf /" removes everything starting from the root directory. This will also follow mounted server directories, deleting files on the server. I point this out to give an appreciation of the raw power you have over the system from the command-line, and how easily you can disrupt things.

Of particular interest is the “mount” command. Desktop versions of Linux typically mount USB flash drives automatically, but on servers, you need to do it manually, e.g.:

$ mkdir ~/foobar
$ mount /dev/sdb ~/foobar

You’ll also use the ‘mount’ command to connect to file servers, using the “cifs” package if they are Windows file servers:

# apt-get install cifs-utils
# mkdir /mnt/vids
# mount -t cifs -o username=robert,password=foobar123  //192.168.1.11/videos /mnt/vids


Linux system commands

The next commands you'll learn are for administering the Linux system: ps, top, who, history, last, df, du, kill, killall, lsof, lsmod, uname, id, shutdown, and so on.

The first thing hackers do when hacking into a system is run “uname” (to figure out what version of the OS is running) and “id” (to figure out which account they’ve acquired, like “root” or some other user).

The Linux system command I use most is "dmesg" (or 'tail -f /var/log/dmesg') which shows you the raw system messages. For example, when I plug in USB drives to a server, I look in 'dmesg' to find out which device was added so that I can mount it. I don't know if this is the best way, it's just the way I do it (servers don't automount USB drives like desktops do).
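
The workflow looks something like this (the device name /dev/sdb1 is just an illustration; check the 'dmesg' output for the real one):

# dmesg | tail
# mkdir -p /mnt/usb
# mount /dev/sdb1 /mnt/usb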


Networking commands

The permanent state of the network (what gets configured on the next bootup) is configured in text files somewhere. But there are a wealth of commands you’ll use to view the current state of networking, make temporary changes, and diagnose problems.

The 'ifconfig' command has long been used to view the current TCP/IP configuration and make temporary changes. Learning how TCP/IP works means playing a lot with 'ifconfig'. Use "ifconfig -a" for even more verbose information.

Use the “route” command to see if you are sending packets to the right router.

Use the 'arp' command to make sure you can reach the local router.

Use ‘traceroute’ to make sure packets are following the correct route to their destination. You should learn the nifty trick it’s based on (TTLs). You should also play with the TCP, UDP, and ICMP options.

Use 'ping' to see if you can reach the target across the Internet. It usefully measures the latency in milliseconds, and congestion (via packet loss). For example, ping Netflix throughout the day, and notice how the ping latency increases substantially during "prime time" viewing hours.

Use ‘dig’ to make sure DNS resolution is working right. (Some use ‘nslookup’ instead). Dig is useful because it’s the raw universal DNS tool – every time they add some new standard feature to DNS, they add that feature into ‘dig’ as well.
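
Some typical ways of running 'dig' (the names queried are just examples):

$ dig www.google.com
$ dig +short www.google.com
$ dig MX gmail.com
$ dig @8.8.8.8 www.google.com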

The 'netstat -tualn' command views the current TCP/IP connections and which ports are listening. I forget what the various options "tualn" mean, only that it's the output I always want to see, rather than the raw "netstat" command by itself.

You'll want to use 'ethtool -K' to turn off checksum and segmentation offloading ('-k', lowercase, just shows the current settings). These are features that sometimes break packet-captures.
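
A minimal sketch, assuming the interface is named 'eth0' (yours may differ): the first command shows the current offload settings, the second turns them off.

# ethtool -k eth0
# ethtool -K eth0 rx off tx off tso off gso off gro off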

There is this newfangled 'ip' system for Linux networking, replacing many of the above commands, but as an old timer, I haven't looked into that.

Some other tools for diagnosing local network issues are ‘tcpdump’, ‘nmap’, and ‘netcat’. These are described in more detail below.


ssh

In general, you’ll remotely log into a system in order to use the command-line. We use ‘ssh’ for that. It uses a protocol similar to SSL in order to encrypt the connection. There are two ways to use ‘ssh’ to login, with a password or with a client-side certificate.

When using SSH with a password, you type “ssh username@servername”. The remote system will then prompt you for a password for that account.

When using client-side certificates, use “ssh-keygen” to generate a key, then either copy the public-key of the client to the server manually, or use “ssh-copy-id” to copy it using the password method above.
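
The sequence looks something like this, reusing the placeholder 'username@servername' from above:

$ ssh-keygen -t rsa -b 4096
$ ssh-copy-id username@servername
$ ssh username@servername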

How this works is basic application of public-key cryptography. When logging in with a password, you get a copy of the server’s public-key the first time you login, and if it ever changes, you get a nasty warning that somebody may be attempting a man in the middle attack.

$ ssh rgraham@scanner2.erratasec.com
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!

When using client-side certificates, the server trusts your public-key. This is similar to how client-side certificates work in SSL VPNs.

You can use SSH for things other than logging into a remote shell. You can script 'ssh' to run commands remotely on a system from a local shell script. You can use 'scp' (SSH copy) to transfer files to and from a remote system. You can do tricks with SSH to create tunnels, which is a popular way to bypass the restrictive rules of your local firewall nazi.
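
Some sketches of those uses, with placeholder names: run commands remotely, copy a file, and forward local port 8080 to a web server ('internal-host') that's only reachable from the remote machine.

$ ssh username@servername "uname -a; id"
$ scp results.tar.gz username@servername:/tmp/
$ ssh -L 8080:internal-host:80 username@servername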


openssl

This is your general cryptography toolkit, doing everything from simple encryption, to public-key certificate signing, to establishing SSL connections.

It is extraordinarily user hostile, with terrible inconsistency among options. You can only figure out how to do things by looking up examples on the net, such as on StackExchange. There are competing SSL libraries with their own command-line tools, like GnuTLS and Mozilla NSS that you might find easier to use.

The fundamental use of the 'openssl' tool is to create public/private keys, certificate signing requests (CSRs), and self-signed certificates. All the web-site certificates I've ever obtained have been obtained using the openssl command-line tool to create the CSRs.
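
For example, generating a key and a CSR, or a quick self-signed certificate, looks something like this (the filenames are placeholders):

$ openssl genrsa -out example.key 2048
$ openssl req -new -key example.key -out example.csr
$ openssl req -x509 -newkey rsa:2048 -nodes -keyout selfsigned.key -out selfsigned.crt -days 365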

You should practice using the ‘openssl’ tool to encrypt files, sign files, and to check signatures.

You can use openssl just like PGP for encrypted emails/messages, but following the “S/MIME” standard rather than PGP standard. You might consider learning the ‘pgp’ command-line tools, or the open-source ‘gpg’ or ‘gpg2’ tools as well.

You should learn how to use the “openssl s_client” feature to establish SSL connections, as well as the “openssl s_server” feature to create an SSL proxy for a server that doesn’t otherwise support SSL.
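
For example, to negotiate a TLS connection with a web server by hand and dump its certificate chain (after connecting, you can type HTTP commands just like with netcat):

$ openssl s_client -connect www.google.com:443
$ openssl s_client -connect www.google.com:443 -showcerts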

Learning all the ways of using the ‘openssl’ tool to do useful things will go a long way in teaching somebody about crypto and cybersecurity. I can imagine an entire class consisting of nothing but learning ‘openssl’.


netcat (nc, socat, cryptocat, ncat)

A lot of Internet protocols are based on text. That means you can create a raw TCP connection to the service and interact with it using your keyboard. The classic tool for doing this is known as "netcat", abbreviated "nc". For example, connect to Google's web server at port 80 and type the HTTP HEAD command followed by a blank line (hit [return] twice):

$ nc www.google.com 80
HEAD / HTTP/1.0

HTTP/1.0 200 OK
Date: Tue, 17 Jan 2017 01:53:28 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=95=o7GT1uJCWTPhaPAefs4CcqF7h7Yd7HEqPdAJncZfWfDSnNfliWuSj3XfS5GJXGt67-QJ9nc8xFsydZKufBHLj-K242C3_Vak9Uz1TmtZwT-1zVVBhP8limZI55uXHuPrejAxyTxSCgR6MQ; expires=Wed, 19-Jul-2017 01:53:28 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding

Another classic example is to connect to port 25 on a mail server to send email, spoofing the “MAIL FROM” address.
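
A sketch of such an SMTP session follows; 'mail.example.com' and the addresses are placeholders, and any properly configured server will refuse to relay for you:

$ nc mail.example.com 25
HELO example.com
MAIL FROM:<somebody@example.com>
RCPT TO:<victim@example.com>
DATA
Hello from the command line.
.
QUIT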

There are several versions of ‘netcat’ that work over SSL as well. My favorite is ‘ncat’, which comes with ‘nmap’, as it’s actively maintained. In theory, “openssl s_client” should also work this way.


nmap

At some point, you’ll need to port scan. The standard program for this is ‘nmap’, and it’s the best. The classic way of using it is something like:

# nmap -A scanme.nmap.org

The '-A' option means to enable all the interesting features like OS detection, version detection, and basic scripts on the most common ports that a server might have open. It takes a while to run. "scanme.nmap.org" is a good site to practice on.

Nmap is more than just a port scanner. It has a rich scripting system for probing more deeply into a system than just a port, and to gather more information useful for attacks. The scripting system essentially contains some attacks, such as password guessing.

Scanning the Internet, finding services identified by ‘nmap’ scripts, and interacting with them with tools like ‘ncat’ will teach you a lot about how the Internet works.

BTW, if 'nmap' is too slow, use 'masscan' instead. It's a lot faster, though it has much more limited functionality.
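
A sketch of equivalent masscan usage (pick your own address range, ports, and rate):

# masscan 10.0.0.0/8 -p80,443 --rate 100000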


Packet sniffing with tcpdump and tshark

All Internet traffic consists of packets going between IP addresses. You can capture those packets and view them using “packet sniffers”. The most important packet-sniffer is “Wireshark”, a GUI. For the command-line, there is ‘tcpdump’ and ‘tshark’.

You can run tcpdump on the command-line to watch packets go in/out of the local computer. This performs a quick “decode” of packets as they are captured. It’ll reverse-lookup IP addresses into DNS names, which means its buffers can overflow, dropping new packets while it’s waiting for DNS name responses for previous packets (which can be disabled with -n):

# tcpdump -p -i eth0

A common task is to create a round-robin set of files, saving the last 100 files of 1-gig each. Older files are overwritten. Thus, when an attack happens, you can stop the capture, go backward in time, and view the contents of the network traffic using something like Wireshark:

# tcpdump -p -i eth0 -s65535 -C 1000 -W 100 -w cap

Instead of capturing everything, you’ll often set “BPF” filters to narrow down to traffic from a specific target, or a specific port.
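
For example (the address and port are placeholders; the -n also turns off the DNS lookups mentioned above):

# tcpdump -n -i eth0 'tcp port 443 and host 192.168.1.11'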

The above examples use the -p option, which turns off promiscuous mode, so they capture only traffic to and from the local computer. Sometimes you may want to look at all traffic going to other machines on the local network. You'll need to figure out how to tap into wires, or set up "monitor" ports on switches, for this to work.

A more advanced command-line program is 'tshark'. It can apply much more complex filters. It can also be used to extract the values of specific fields and dump them to a text file.
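
A sketch of that field-extraction use, assuming a capture file named 'capture.pcap':

$ tshark -r capture.pcap -Y 'http.request' -T fields -e ip.src -e http.host -e http.request.uri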


Base64/hexdump/xxd/od

These are some rather trivial commands, but you should know them.

The ‘base64’ command encodes binary data in text. The text can then be passed around, such as in email messages. Base64 encoding is often automatic in the output from programs like openssl and PGP.

In many cases, you’ll need to view a hex dump of some binary data. There are many programs to do this, such as hexdump, xxd, od, and more.
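
For example ('foo.bin' is a placeholder for whatever binary file you're looking at):

$ echo 'hello, world' | base64
aGVsbG8sIHdvcmxkCg==
$ echo 'aGVsbG8sIHdvcmxkCg==' | base64 -d
hello, world
$ xxd foo.bin | head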


grep

Grep searches for a pattern within a file. More importantly, it searches for a regular expression (regex) in a file. The fu of Unix is that a lot of stuff is stored in text files, and you use grep with regex patterns in order to extract stuff from those files.

The power of this tool really depends on your mastery of regexes. You should master enough that you can understand StackExchange posts that explain almost what you want to do, and then tweak them to make them work.

Grep, by default, shows only the matching lines. In many cases, you only want the part that matches. To do that, use the -o option. (This is not available on all versions of grep).

You'll probably want the better, "extended" regular expressions, so use the -E option.

You'll often want "case-insensitive" options (matching both upper and lower case), so use the -i option.

For example, to extract all MAC addresses from a text file, you might do something like the following. This extracts all strings that are twelve hex digits.

$ grep -Eio '[0-9A-F]{12}' foo.txt


Text processing

Grep is just the first of the various “text processing filters”. Other useful ones include ‘sed’, ‘cut’, ‘sort’, and ‘uniq’.

You'll be an expert at piping the output of one to the input of the next. You'll use "sort | uniq" as god (Dennis Ritchie) intended and not the heresy of "sort -u".

You might want to master 'awk'. It's another programming language to learn, but once you master it, it'll be easier than other mechanisms.

You’ll end up using ‘wc’ (word-count) a lot. All it does is count the number of lines, words, characters in a file, but you’ll find yourself wanting to do this a lot.
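
For example, a typical pipeline for summarizing a web log, assuming the client IP is the first space-separated field of 'access.log':

$ cut -d ' ' -f 1 access.log | sort | uniq -c | sort -rn | head
$ wc -l access.log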


csvkit and jq

You get data in CSV format and JSON format a lot. The tools 'csvkit' and 'jq' respectively help you deal with those formats: converting the files into other formats, sticking the data in databases, and so forth.

It'll be easier to extract data using these tools, which understand the text formats, than trying to write 'awk' commands or 'grep' regexes.
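
A couple of sketches, assuming 'scan.json' holds a JSON array of objects with an "ip" field, and 'results.csv' has "ip" and "port" columns:

$ jq -r '.[].ip' scan.json
$ csvcut -c ip,port results.csv | csvlook | head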


strings

Most files are binary with a few readable ASCII strings. You use the program ‘strings’ to extract those strings.

This one simple trick sounds stupid, but it’s more powerful than you’d think. For example, I knew that a program probably contained a hard-coded password. I then blindly grabbed all the strings in the program’s binary file and sent them to a password cracker to see if they could decrypt something. And indeed, one of the 100,000 strings in the file worked, thus finding the hard-coded password.
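
The command itself is trivial; something like this builds a candidate wordlist out of a binary (the filenames are placeholders):

$ strings suspicious.bin | sort -u > wordlist.txt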


tail -f

So 'tail' is just a standard Linux tool for looking at the end of files. If you want to keep checking the end of a live file that's constantly growing, then use "tail -f". It'll sit there waiting for something new to be added to the end of the file, then print it out. I do this a lot, so I thought it'd be worth mentioning.


tar -xvzf, gzip, xz, 7z

In prehistorical times (like the 1980s), Unix was backed up to tape drives. The tar command could be used to combine a bunch of files into a single “archive” to be sent to the tape drive, hence “tape archive” or “tar”.

These days, a lot of stuff you download will be in tar format (ending in .tar). You’ll need to learn how to extract it:

$ tar -xvf something.tar

Nobody knows what the "xvf" options mean anymore, but these letters must be specified in that order. I'm joking here, but only a little: somebody did a survey once and found that virtually nobody knows how to use 'tar' other than with canned formulas such as this.

Along with combining files into an archive you also need to compress them. In prehistoric Unix, the "compress" command would be used, which would replace a file with a compressed version ending in '.z'. This was found to be encumbered by patents, so everyone switched to 'gzip' instead, which replaces a file with a new one ending in '.gz'.

$ ls foo.txt*
foo.txt
$ gzip foo.txt
$ ls foo.txt*
foo.txt.gz

Combined with tar, you get files with either the “.tar.gz” extension, or simply “.tgz”. You can untar and uncompress at the same time:

$ tar -xvzf something.tar.gz

Gzip is always good enough, but nerds gonna nerd and want to compress with slightly better compression programs. They’ll have extensions like “.bz2”, “.7z”, “.xz”, and so on. There are a ton of them. Some of them are supported directly by the ‘tar’ program:

$ tar -xvjf something.tar.bz2

Then there is the “zip/unzip” program, which supports Windows .zip file format. To create compressed archives these days, I don’t bother with tar, but just use the ZIP format. For example, this will recursively descend a directory, adding all files to a ZIP file that can easily be extracted under Windows:

$ zip -r test.zip ./test/


dd

I should include this under the system tools at the top, but it’s interesting for a number of purposes. The usage is simply to copy one file to another, the in-file to the out-file.

$ dd if=foo.txt of=foo2.txt

But that's not interesting. What's interesting is using it to write to "devices". The disk drives in your system also exist as raw devices under the /dev directory.

For example, if you want to create a boot USB drive for your Raspberry Pi:

# dd if=rpi-ubuntu.img of=/dev/sdb

Or, you might want to erase an entire hard drive by overwriting it with random data:

# dd if=/dev/urandom of=/dev/sdc

Or, you might want to image a drive on the system, for later forensics, without stumbling on things like open files.

# dd if=/dev/sda of=/media/Lexar/infected.img

The ‘dd’ program has some additional options, like block size and so forth, that you’ll want to pay attention to.
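
For example, a larger block size speeds things up considerably, and recent versions of GNU dd can report progress (the 'status=progress' option needs a reasonably new coreutils):

# dd if=/dev/sda of=/media/Lexar/infected.img bs=4M status=progress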


screen and tmux

You log in remotely and start some long running tool. Unfortunately, if you log out, all the processes you started will be killed. If you want it to keep running, then you need a tool to do this.

I use ‘screen’. Before I start a long running port scan, I run the “screen” command. Then, I type [ctrl-a][ctrl-d] to disconnect from that screen, leaving it running in the background.

Then later, I type "screen -r" to reconnect to it. If there is more than one screen session, using '-r' by itself will list them all. Use "-r pid" to reattach to the proper one. If you can't, then use "-D pid" or "-D -RR pid" to force the other session to detach from whoever is using it.

Tmux is an alternative to screen that many use. It’s cool for also having lots of terminal screens open at once.


curl and wget

Sometimes you want to download files from websites without opening a browser. The 'curl' and 'wget' programs do that easily. Wget is the traditional way of doing this, but curl is a bit more flexible. I use curl for everything these days, except mirroring a website, in which case I just do "wget -m website".

The thing that makes ‘curl’ so powerful is that it’s really designed as a tool for poking and prodding all the various features of HTTP. That it’s also useful for downloading files is a happy coincidence. When playing with a target website, curl will allow you do lots of complex things, which you can then script via bash. For example, hackers often write their cross-site scripting/forgeries in bash scripts using curl.
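
A sketch of that kind of poking and prodding; the target address, paths, and parameters are all made up:

$ curl -s -o /dev/null -w '%{http_code}\n' http://192.168.1.11/
$ curl -s -c cookies.txt -d 'user=admin&pass=admin' http://192.168.1.11/login.php
$ curl -s -b cookies.txt http://192.168.1.11/admin.php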


node/php/python/perl/ruby/lua

As mentioned above, bash is its own programming language. But it’s weird, and annoying. So sometimes you want a real programming language. Here are some useful ones.

Yes, PHP is a language that runs in a web server for creating web pages. But if you know the language well, it’s also a fine command-line language for doing stuff.

Yes, JavaScript is a language that runs in the web browser. But if you know it well, it’s also a great language for doing stuff, especially with the “nodejs” version.

Then there are other good command line languages, like Python, Ruby, Lua, and the venerable Perl.

What makes all these great is the large library support. Somebody has already written a library that nearly does what you want, and it can be made to work with a little bit of extra code of your own.

My general impression is that Python and NodeJS have the largest libraries likely to have what you want, but you should pick whichever language you like best, whichever makes you most productive. For me, that’s NodeJS, because of the great Visual Code IDE/debugger.


iptables, iptables-save

I shouldn’t include this in the list. Iptables isn’t a command-line tool as such. The tool is the built-in firewalling/NAT features within the Linux kernel. Iptables is just the command to configure it.

Firewalling is an important part of cybersecurity. Everyone should have some experience playing with a Linux system doing basic firewalling tasks: basic rules, NATting, and transparent proxying for mitm attacks.

Use 'iptables-save' to dump your rules to a file that can be reloaded at boot (with 'iptables-restore'), so your changes persist.
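
A minimal sketch of basic rules plus NAT, then saving them; interface names and the save path depend on your distro:

# iptables -A INPUT -i lo -j ACCEPT
# iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# iptables -P INPUT DROP
# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# iptables-save > /etc/iptables/rules.v4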


MySQL

Similar to ‘iptables’, ‘mysql’ isn’t a tool in its own right, but a way of accessing a database maintained by another process on the system.

Filters acting on text files only go so far. Sometimes you need to dump the data into a database, and make queries on that database.

There is also the offensive side: learning how targets store things in a database, and how attackers get at the data.

Hackers often publish raw SQL data they've stolen in their hacks (like the Ashley Madison dump). Being able to stick those dumps into your own database is quite useful. Hint: disable transaction logging while importing mass data.
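
A sketch of importing such a dump (the database and file names are placeholders):

$ mysql -u root -p -e 'CREATE DATABASE leaks'
$ mysql -u root -p leaks < dump.sql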

If you don’t like SQL, you might consider NoSQL tools like Elasticsearch, MongoDB, and Redis that can similarly be useful for arranging and searching data. You’ll probably have to learn some JSON tools for formatting the data.


Reverse engineering tools

A cybersecurity specialty is “reverse engineering”. Some want to reverse engineer the target software being hacked, to understand vulnerabilities. This is needed for commercial software and device firmware where the source code is hidden. Others use these tools to analyze viruses/malware.

The ‘file’ command uses heuristics to discover the type of a file.

There’s a whole skillset for analyzing PDF and Microsoft Office documents. I play with pdf-parser. There’s a long list at this website:
https://zeltser.com/analyzing-malicious-documents/

There’s a whole skillset for analyzing executables. Binwalk is especially useful for analyzing firmware images.

Qemu is a useful virtual machine. It can emulate full systems, such as an IoT device based on the MIPS processor. Like some other tools mentioned here, it's more a full subsystem than a simple command-line tool.

On a live system, you can use ‘strace’ to view what system calls a process is making. Use ‘lsof’ to view which files and network connections a process is making.
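
For example, against a process with PID 1234 (a placeholder):

# strace -f -e trace=network -p 1234
# lsof -p 1234
# lsof -i :80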


Password crackers

A common cybersecurity specialty is "password cracking". There are two kinds: online and offline password crackers.

Typical online password crackers are 'hydra' and 'medusa'. They can take files containing common passwords and attempt to log on to various protocols remotely, like HTTP, SMB, FTP, Telnet, and so on. I used 'hydra' recently to find the default/backdoor passwords on many of the IoT devices I've bought for my test lab.
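
A sketch of typical hydra usage; the target address and password file are placeholders:

$ hydra -l admin -P passwords.txt telnet://192.168.1.1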

Online password crackers must open TCP connections to the target, and try to logon. This limits their speed. They also may be stymied by systems that lock accounts, or introduce delays, after too many bad password attempts.

Typical offline password crackers are ‘hashcat’ and ‘jtr’ (John the Ripper). They work off of stolen encrypted passwords. They can attempt billions of passwords-per-second, because there’s no network interaction, nothing slowing them down.

Understanding offline password crackers means getting an appreciation for the exponential difficulty of the problem. A sufficiently long and complex encrypted password is uncrackable. Instead of brute-force attempts at all possible combinations, we must use tricks, like mutating the top million most common passwords.
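
A sketch of that trick in hashcat: take a big wordlist and mutate it with a rule file. Here '-m 0' means unsalted MD5 hashes in 'hashes.txt', and 'rockyou.txt' is the usual big wordlist (both filenames are placeholders).

$ hashcat -m 0 -a 0 hashes.txt rockyou.txt -r rules/best64.rule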

I use hashcat because of the great GPU support, but John is also a great program.


WiFi hacking

A common specialty in cybersecurity is WiFi hacking. The difficulty in WiFi hacking is getting the right WiFi hardware that supports the features (monitor mode, packet injection), then the right drivers installed in your operating system. That’s why I use Kali rather than some generic Linux distribution, because it’s got the right drivers installed.

The 'aircrack-ng' suite is the best for doing basic hacking, such as packet injection. When the parents are letting the iPad babysit their kid with a loud movie at the otherwise quiet coffeeshop, use 'aircrack-ng' to deauth the kid.
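
A sketch of the usual aircrack-ng workflow for capturing and cracking a WPA2 handshake; the interface name, channel, and BSSID are placeholders (newer versions name the monitor interface 'wlan0mon'):

# airmon-ng start wlan0
# airodump-ng wlan0mon
# airodump-ng -c 6 --bssid AA:BB:CC:DD:EE:FF -w capture wlan0mon
# aircrack-ng -w wordlist.txt capture-01.cap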

The ‘reaver’ tool is useful for hacking into sites that leave WPS wide open and misconfigured.


Remote exploitation

A common specialty in cybersecurity is pentesting.

Nmap, curl, and netcat (described above) are useful tools for this.

Some useful DNS tools are 'dig' (described above) and dnsrecon/dnsenum/fierce, which try to enumerate and guess as many names as possible within a domain. These tools all have unique features, but also have a lot of overlap.

Nikto is a basic tool for probing for common vulnerabilities, out-of-date software, and so on. It’s not really a vulnerability scanner like Nessus used by defenders, but more of a tool for attack.

SQLmap is a popular tool for probing for SQL injection weaknesses.
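
A sketch of pointing sqlmap at a suspect parameter (the URL is made up):

$ sqlmap -u 'http://192.168.1.11/item.php?id=1' --dbs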

Then there is ‘msfconsole’. It has some attack features. This is humor – it has all the attack features. Metasploit is the most popular tool for running remote attacks against targets, exploiting vulnerabilities.


Text editor

Finally, there is the decision of text editor. I use ‘vi’ variants. Others like ‘nano’ and variants. There’s no wrong answer as to which editor to use, unless that answer is ‘emacs’.


Conclusion

Obviously, not every cybersecurity professional will be familiar with every tool in this list. If you don’t do reverse-engineering, then you won’t use reverse-engineering tools.

On the other hand, regardless of your specialty, you need to know basic crypto concepts, so you should know something like the ‘openssl’ tool. You need to know basic networking, so things like ‘nmap’ and ‘tcpdump’. You need to be comfortable processing large dumps of data, manipulating it with any tool available. You shouldn’t be frightened by a little sysadmin work.

The above list is therefore a useful starting point for cybersecurity professionals. Of course, those new to the industry won’t have much familiarity with them. But it’s fair to say that I’ve used everything listed above at least once in the last year, and the year before that, and the year before that. I spend a lot of time on StackExchange and Google searching the exact options I need, so I’m not an expert, but I am familiar with the basic use of all these things.

Friday, January 13, 2017

About that Giuliani website...

Rumors are that Trump is making Rudy Giuliani some sort of "cyberczar" in the new administration. Therefore, many in the cybersecurity community scanned his website "www.giulianisecurity.com" to see if it was actually secure from hackers. The results have been laughable, with out-of-date software, bad encryption, unnecessary services, and so on.

But here's the deal: it's not his website. He just contracted with some generic web designer to put up a simple page with just some basic content. It's there only because people expect if you have a business, you also have a website.

That website designer in turn contracted some basic VPS hosting service from Verio. It's a service Verio exited around March of 2016, judging by the archived page.

The Verio service promised "security-hardened server software" that they "continually update and patch". According to the security scans, this is a lie, as the software is all woefully out-of-date. According to the OS fingerprint, the FreeBSD image it uses is 10 years old. The security is exactly what you'd expect from a legacy hosting company that's shut down some old business.

You can probably break into Giuliani's server. I know this because other FreeBSD servers in the same data center have already been broken into, tagged by hackers, or are now serving viruses.

But that doesn't matter. There's nothing on Giuliani's server worth hacking. The drama over his security, while an amazing joke, is actually meaningless. All this tells us is that Verio/NTT.net is a crappy hosting provider, not that Giuliani has done anything wrong.

Monday, January 09, 2017

NAT is a firewall

NAT is a firewall. It's the most common firewall. It's the best firewall.

I thought I'd point this out because most security experts might disagree, pointing to some "textbook definition". This is wrong.

No, Yahoo! isn't changing its name

Trending on social media is how Yahoo is changing its name to "Altaba" and CEO Marissa Mayer is stepping down. This is false.

What is happening instead is that everything we know of as "Yahoo" (including the brand name) is being sold to Verizon. The bits that are left are a skeleton company that holds stock in Alibaba and a few other companies. Since the brand was sold to Verizon, that investment company could no longer use it, so it chose "Altaba". Since 83% of its investment is in Alibaba, "Altaba" makes sense. It's not like this new brand name means anything -- the skeleton investment company will be wound down in the next year, either as a special dividend to investors, sold off to Alibaba, or both.

Marissa Mayer is an operations CEO. Verizon didn't want her to run their newly acquired operations, since the entire point of buying them was to take the web operations in a new direction (though apparently she'll still work a bit with them through the transition). And of course she's not an appropriate CEO for an investment company. So she had no job left -- she made her own job disappear.


What happened today is an obvious consequence of Alibaba going IPO in September 2014. It meant that Yahoo's stake of 16% in Alibaba was now liquid. All told, the investment arm of Yahoo was worth $36-billion while the web operations (Mail, Fantasy, Tumblr, etc.) was worth only $5-billion.

In other words, Yahoo became a Wall Street mutual fund who inexplicably also offered web mail and cat videos.

Such a thing cannot exist. If Yahoo didn't act, shareholders would start suing the company to get their money back. That $36-billion in investments doesn't belong to Yahoo, it belongs to its shareholders. Thus, the moment the Alibaba IPO closed, Yahoo started planning on how to separate the investment arm from the web operations.

Yahoo had basically three choices.
  • The first choice is simply give the Alibaba (and other investment) shares as a one time dividend to Yahoo shareholders. 
  • A second choice is simply split the company in two, one of which has the investments, and the other the web operations. 
  • The third choice is to sell off the web operations to some chump like Verizon.

Obviously, Marissa Mayer took the third choice. Without a slushfund (the investment arm) to keep it solvent, Yahoo didn't feel it could run its operations profitably without integration with some other company. That meant it either had to buy a large company to integrate with Yahoo, or sell the Yahoo portion to some other large company.


Every company, especially Internet ones, has a legacy value. It's the amount of money you'd get from firing everyone, stopping investment in the future, and just raking in, year after year, a stream of declining revenue. It's the fate of early Internet companies like Earthlink and Slashdot. It's like what I documented with Earthlink [*], which continues to offer email to subscribers, but spends only enough to keep the lights on, not even upgrading to the simplest of things like SSL.

Presumably, Verizon will try to make something of a few of the properties. Apparently, Yahoo's Fantasy sports stuff is popular, and will probably be rebranded as some new Verizon thing. Tumblr is already its own brand name, independent of Yahoo, and thus will probably continue to exist as its own business unit.

One of the weird things is Yahoo Mail. It's permanently bound to the "yahoo.com" domain, so you can't do much with the "Yahoo" brand without bringing Mail along with it. Though at this point, the "Yahoo" brand is pretty tarnished. There's not much new you can put under that brand anyway. I can't see how Verizon would want to invest in that brand at all -- just milk it for what it can over the coming years.


The investment company cannot long exist on its own. Investors want their money back, so they can make future investment decisions on their own. They don't want the company to make investment choices for them.

Think about when Yahoo made its initial $1-billion investment for 40% of Alibaba in 2005: it did not do so because it was a good "investment opportunity", but because Yahoo believed it was a good strategic investment, such as providing an entry into the Chinese market, or providing an e-commerce arm to compete against eBay and Amazon. In other words, Yahoo didn't consider it a good way of investing its money, but a good way to create a strategic partnership -- one that just never materialized. From that point of view, the Alibaba investment was a failure.

In 2012, Marissa Mayer sold off 25% of Alibaba, netting $4-billion after taxes. She then lost all $4-billion on the web operations. That stake would be worth over $50-billion today. You can see the problem: companies with large slush funds just fritter them away keeping operations going. Marissa Mayer abused her position of trust, playing with money that belongs to shareholders.

Thus, Altaba isn't going to play with shareholders' money. It's a skeleton company, so there's no strategic value to investments. It can make no better investment choices than its shareholders can with their own money. Thus, the only purpose of the skeleton investment company is to return the money back to the shareholders. I suspect it'll choose the most tax efficient way of doing this, like selling the whole thing to Alibaba, which just exchanges the Altaba shares for Alibaba shares, with a 15% bonus representing the value of the other Altaba investments. Either way, if Altaba is still around a year from now, it's because its board is skimming money that doesn't belong to them.



Key points:

  • Altaba is the name of the remaining skeleton investment company, the "Yahoo" brand was sold with the web operations to Verizon.
  • The name Altaba sucks because it's not a brand name that will stick around for a while -- the skeleton company is going to return all its money to its investors.
  • Yahoo had to spin off its investments -- there's no excuse for 90% of its market value to be investments and 10% in its web operations.
  • In particular, the money belongs to Yahoo's investors, not Yahoo the company. It's not some sort of slush fund Yahoo's executives could use. Yahoo couldn't use that money to keep its flailing web operations going, as Marissa Mayer was attempting to do.
  • Most of Yahoo's web operations will go the way of Earthlink and Slashdot, as Verizon milks the slowly declining revenue while making no new investments in it.



Friday, January 06, 2017

Notes about the FTC action against D-Link

Today, the FTC filed a lawsuit[*] against D-Link for security problems, such as backdoor passwords. I thought I'd write up some notes.

The suit is not "product liability", but "unfair and deceptive" business practices for promising "security". In addition, they interpret "security" differently from the cybersecurity community.

This needs to be stressed because right now in our industry, there is a big discussion of product liability, insisting that everything attached to the Internet needs to be secured. People will therefore assume the FTC action is based on "liability".

Instead, all six counts are based upon the fact that D-Link offers its products for securing networks, and claims they are secure. Because they have backdoor passwords, clear-text passwords, command-injection bugs, and public private-keys, the FTC feels the claims of security to be untrue.

The key point I'm trying to make is that D-Link can resolve the suit (in theory) by simply removing all claims of "security". Sure, it can claim it supports stateful-inspection firewalls and WPA2, but not things like "WPA2 security". (Sure, the FTC may come back with a new lawsuit -- but it would solve the points raised in this one).

On the other hand, while "deception" is the law the FTC uses, their obvious real intent is to improve security. They intend for D-Link to remove its security weaknesses, not to change its claims. The lawsuit is also intended to scare all IoT makers into securing their products, not to remove claims of security.

We see this intent in other posts on the FTC website. They've long been talking about IoT security. Recently, they announced a contest giving out $25,000 to the best solution for patching out-of-date IoT devices [*]. It's a silly contest, but shows what their real intent is.

Thus, the language of the lawsuit is very much about improving security, while the actual counts are about unfair/deceptive practices.

This is nonsense for a number of reasons. Among their claims is that D-Link lied to their customers by saying "you need to change the default password to secure the device", because the device still had a command-injection bug. That's a shocking departure from common sense. We in the cybersecurity community repeatedly advise people to change passwords to make devices more secure, ignoring any other insecurity that might exist. It means I'm just as deceptive as D-Link is.

The FTC's action is a clear violation of "due process". They didn't create a standard, ahead of time, of which bugs would make a product "insecure", but instead arbitrarily punished D-Link for not meeting an unknown standard of "secure". They never published a document saying "you can't advertise your product as being 'secure' if it contains this list of problems".

More to the point, their idea of "secure" is at odds with the cybersecurity community. We would indeed describe WPA2 as secure, regardless of some other feature of the device that makes it insecure. Most IoT devices are intended to be used behind a firewall anyway, so the only attack surface is the WiFi network. In such cases, the device can have backdoor passwords up the ying-yang, and we in the cybersecurity community will still call this "secure".

This is important because no product will ever be perfectly secure. Ten years from now, hackers will still discover some bug in some IoT product that nobody considered before, and the FTC will come down on them and punish them for deceptive practice. This is also counterproductive to the FTC's goals: if they are going to be so unfair about it, they are going to create incentives for companies to produce the wrong solution, to stop advertising their products as "secure".


The consequence of this action against D-Link is that the FTC is going to create an enormous chilling effect on innovation. As apps and IoT devices proliferate, the FTC is going to punish those on the forefront creating new and innovative products. At the same time, it's going to have little impact on actual security. They'll raise the price of brand-name products, while still being unable to target the white-box/no-name products that contain most of the vulnerabilities.



D-Link makes a standard claim that we always make in the security industry:


...and then the FTC sues them for it.


Thursday, January 05, 2017

Profs: you should use JavaScript to teach Computer Science

Universities struggle with the canonical programming language they should teach students for Computer Science. Ideally, as they take computer science classes, all the homework assignments and examples will be in the same language. Today, that language is usually Java or Python. It should be JavaScript.

The reason for this is simple: whatever language you learn, you will also have to learn JavaScript, because it's the lingua franca of web browsers.

Python is a fundamentally broken language. Version 3 is incompatible with version 2, but after a decade, version 2 is still more popular. It's still unforgivably slow: other languages use JITs as a matter of course to get near native speed, while Python is still nearly always interpreted. Python isn't used in the real world; it's far down the list of languages programmers will use professionally. Python is primarily a middleware language, with neither apps nor services written in it.

Java is a fine language, but there's a problem with it: it's fundamentally controlled by a single company, Oracle, who is an evil company. Consumer versions of Java come with viruses. They sue those who try to come up with competing versions of Java. It's not an "open" system necessary for universities.

JavaScript has none of these problems. It's an open standard with many competing versions, two of which are completely open-source. New versions of the language are backwards compatible, but everyone stays closely up to date with the latest version anyway. It's extremely fast, as browsers vendors compete among themselves for the fastest JavaScript engine. It's used professionally everywhere, from writing phone apps to writing network services. And as mentioned above, everyone has to learn it eventually, because it's the language of web browsers.

It's a great "software engineering" language. Most IDEs support it, but especially Microsoft's "Visual Code", which provides the same IDE for Windows, Mac, and Linux for editing and debugging JavaScript. A cross-platform IDE that works the same for all students, regardless of desktop, is an enormous plus. All the other "software engineering" features work well with JavaScript as well, such as professional requirements of version control, bug tracking, and unit/regression testing.

It's an adequate "computer science" language. It supports all the major paradigms, like object-oriented and functional programming. It's perfect for teaching algorithms, data structures, complexity, boolean logic, number theory, and the like. Like most programming languages, it's got great library support for things like graphics, machine learning, robotics, cryptography, networking, databases, and so on.

One weakness is that it's not "multithreaded", but that's pretty much a weakness in every language except maybe Erlang. Even in C, people are taught to do it wrong (mutexes) instead of the right way (scalable).

JavaScript certainly has some quirks, but those are a feature for education, not a fault. Educators should go into a deep-dive with JavaScript, explaining how it differs from other programming languages. Explain how JavaScript pointers differ from C pointers, how its object-oriented features differ from Java/C++, how its functional features differ from LISP. A deep dive into things like AsmJS and JITs will teach you a lot about all languages.

It's not adequate to teach all computer science concepts, of course. If you are teaching scientific computing, then things like MATLAB and R will be better -- but those languages are impractical for other computer science topics.


In short, unlike any other language, everyone eventually has to learn JavaScript, in order to work within the browser. Given that, we might as well use it as a pedagogical language. For most computer science topics, it's at least as good as any other language, like C, Java, or Python.

Tuesday, January 03, 2017

Dear Obama, From Infosec

Dear President Obama:

We are more than willing to believe Russia was responsible for the hacked emails/records that influenced our election. We believe Russian hackers were involved. Even if these hackers weren't under the direct command of Putin, we know he could put a stop to such hacking if he chose. It's like harassment of journalists and diplomats. Putin encourages a culture of thuggery that attacks opposition, without his personal direction, but with his tacit approval.

Your lame attempts to convince us of what we already agree with have irretrievably damaged your message.

Saturday, December 31, 2016

Your absurd story doesn't make me a Snowden apologist

Defending truth in the Snowden Affair doesn't make one an "apologist", for either side. There are plenty of ardent supporters on either side that need to be debunked. The latest (anti-Snowden) example is the HPSCI committee report on Snowden [*], and stories like this one in the Wall Street Journal [*]. Pointing out the obvious holes doesn't make us "apologists".

As Edward Epstein documents in the WSJ story, one of the lies Snowden told was telling his employer (Booz-Allen) that he was being treated for epilepsy when in fact he was fleeing to Hong Kong in order to give documents to Greenwald and Poitras.

Well, of course he did. If you are going to leak a bunch of documents to the press, you can't do that without deceiving your employer. That's the very definition of this sort of "whistleblowing". Snowden has been quite open to the public about the lies he told his employer, including this one.

Rather than evidence that there's something wrong with Snowden, the way Snowden-haters (is that the opposite of "apologist"?) seize on this is evidence that they are a bit unhinged.


The next "lie" is the difference between the number of documents Greenwald says he received (10,000) and the number investigators claim were stolen (1.5 million). This is not the discrepancy that it seems. A "document" counted by the NSA is not the same as the number of "files" you might get on a thumb drive, which was shown the various ways of counting the size of the Chelsea/Bradley Manning leaks. Also, the NSA can only see which files Snowden accessed, not which ones were then subsequently copied to a thumb drive.

Finally, there is the more practical issue that Snowden cannot review the documents while at work. He'd have to instead download databases and copy whole directories to his thumb drives. Only away from work would he have the chance to winnow down which documents he wanted to take to Hong Kong, deleting the rest. Nothing Snowden has said conflicts with him deleting lots of stuff he never gave journalists, that he never took with him to Hong Kong, or took with him to Moscow.


The next "lie" is that Snowden claims the US revoked his passport after he got on the plane from Hong Kong and before he landed in Moscow.

This is factually wrong, in so far as the US had revoked his passport (and issued an arrest warrant) and notified Hong Kong of the revocation a day before the plane took off. However, as numerous news reports of the time reported, the US information [in the arrest warrant] was contradictory and incomplete, and thus Hong Kong did nothing to stop Snowden from leaving [*]. The Guardian [*] quotes a Hong Kong official as saying Snowden left "through a lawful and normal channel". Seriously, countries are much less concerned about checking passports of passenger leaving than those arriving.

It's the WSJ article that's clearly prevaricating here, quoting a news article where a Hong Kong official admits being notified, but not quoting the officials saying that the information was bad, that they took no action, and that Snowden left in the normal way.


The next item is Snowden's claim he destroyed all his copies of US secrets before going to Moscow. To debunk this, the WSJ refers to an NPR interview [*] with Frants Klintsevich, deputy chairman of the defense and security committee within the Duma at the time. Klintsevich is quoted as saying "Let's be frank, Snowden did share intelligence".

But Snowden himself debunks this:
The WSJ piece was written a week after this tweet. It's hard to imagine why they ignored it. Either it itself is a lie (in which case, it should've been added to the article), or it totally debunks the statement. If Klintsevich is "only speculating", then nothing after that point can be used to show Snowden is lying.

Thus, again we have proof that Epstein cannot be trusted. He clearly has an angle and bends evidence to service that angle, rather than being a reliable source of information.


I am no Snowden apologist. Most of my blogposts regarding Snowden have gone the other way, criticizing the way those like The Intercept distort Snowden disclosures in an anti-NSA/anti-USA manner. In areas of my experience (network stuff), I've blogged showing that those reporting on Snowden are clearly technically deficient.

But in this post, I show how Edward Epstein is clearly biased/untrustworthy, and how he adjusts the facts into a character attack on Snowden. I've documented it in a clear way that you can easily refute if I'm not correct. This is not because I'm biased toward Snowden, but because I'm biased toward the truth.