In order to grok the concept of scalability, I've drawn a series of graphs. Talking about "scalability" is hard because we translate those numbers into "performance". But the two concepts are unrelated. We say things like "NodeJS is 85% as fast as Nginx", but speed doesn't matter, scalability does. The important difference in those two is how they scale, not how they perform. I'm going to show these graphs in this post.
Consider the classic Apache web server. We typically benchmark the speed in terms of "requests per second". But, there is another metric of "concurrent connections". Consider a web server handling 1,000 requests-per-second, and that it takes 5 seconds to complete a request. That means at any point in time, on average, there will be 5,000 pending requests, or 5,000 concurrent connections. But Apache has a problem. It creates a thread to handle each connection. The operation system can't handle a lot of threads. So, once there are too many connections, performance falls off a cliff.
This is shown in the graph below. This is a hypothetical graph rather than a real benchmark, but it's about what you'll see in the real world. You'll see that around the 5000 connections point, performance falls off a cliff.
Let's say that you are happy with the performance at 5000 connections, but you need the server to support up to 10,000 connections. What do you do? The naive approach is to simply double the speed of the server, or buy a dual-core server.
But this naive approach doesn't work. As shown in the graph below, doubling performance doesn't double scalability.
While this graph shows a clear doubling of performance, it only shows an increase of about 20% in terms scalability, handling about 6000 connections.
The same is true if we increase performance 4 times, 8 times, or even 16 times, as shown in the graph below. Even using a server 16 times as fast as the original, we still haven't even doubled scalability.
The solution to this problem isn't faster hardware, but changing the software to scale. Consider some server software that is a lot slower than Apache, but whose performance doesn't drop off so quickly when there are lots of connections to the server. The graph would look like the orange line in the following graph:
This orange line could be a server running NodeJS on my laptop computer. Even though it's slower than a real beefy expensive 32-core server you spent $50,000, it'll perform faster when you've got a situation that needs to handle 10,000 concurrent connections.
The point I'm trying to make is that "performance" and "scalability" are orthogonal problems. When trying to teach engineers how to fix scalability, their most common question is "but won't that hurt performance". The answer is that it almost doesn't matter how much you hurt performance as long as it makes the application scale.
I've encapsulated all this text into the following picture. When I have a scalability problem, I can't solve it by increasing performance, as shown by all the curvey lines. Instead, I have to fix the problem itself -- even if it means lower performance, as shown by the orange line. The moral of the story is that performance and scalability are orthogonal. This is the one picture you need to remember when dealing with scalability.
I use the "notebook computer running NodeJS" as an example because it's frankly unbelievable. When somebody has invested huge amounts of money in big solutions, they flat out refuse to believe that a tiny notebook computer can vastly outperform them at scale. In the dumber market (government agencies, big corporations) the word "scale" is used to refer to big hardware, not smarter software.
In recent years, interest in scalable servers have started to grow rapidly. This is shown in the following Netcraft graph of the most popular web servers. Apache still dominates, but due to its scalability problems, it's dropping popularity. Nginx, the most popular scalable alternative, has been growing rapidly.
Nginx is represented by the green line the above graph, and you scan see how it's grown from nothing to becoming the second most popular web server in the last 5 years.
But even Nginx has limits. It scales to about 100,000 concurrent connections before operating system limits prevent further scalability. This is far below what the hardware is capable of. Today's hardware, even heap desktop computers, can scale to 10 million connections, or 100 times what Nginx is practically capable of. I call this the "The C10M Problem", referring to how software is still far below the capability of the hardware.
Talking about scalability is hard. It's easier to understand if you can visualize it. That's why I put together these graphs. I want to use these graphs in future presentations. But, if I store them on my hard disk, I'm likely to lose them. Therefore, I'm posting these to the intertubes so that I can just use the google to find them later. You are free to use these graphs too, if you want, with no strings attached, though I wouldn't mind credit when it's not too much trouble.