Sunday, November 18, 2018

Some notes about HTTP/3

HTTP/3 is going to be standardized. As an old protocol guy, I thought I'd write up some comments.

Google (pbuh) has both the most popular web browser (Chrome) and the two most popular websites (#1 Google.com #2 Youtube.com). Therefore, they are in control of future web protocol development. Their first upgrade they called SPDY (pronounced "speedy"), which was eventually standardized as the second version of HTTP, or HTTP/2. Their second upgrade they called QUIC (pronounced "quick"), which is being standardized as HTTP/3.


SPDY (HTTP/2) is already supported by the major web browsers (Chrome, Firefox, Edge, Safari) and major web servers (Apache, Nginx, IIS, CloudFlare). Many of the most popular websites support it (even non-Google ones), though you are unlikely to ever see it on the wire (sniffing with Wireshark or tcpdump), because it's always encrypted with SSL. While the standard allows for HTTP/2 to run raw over TCP, all the implementations only use it over SSL.

There is a good lesson here about standards. Outside the Internet, standards are often de jure, run by government, driven by getting all major stakeholders in a room and hashing it out, then using rules to force people to adopt it. On the Internet, people implement things first, and then if others like it, they'll start using it, too. Standards are often de facto, with RFCs being written for what is already working well on the Internet, documenting what people are already using. SPDY was adopted by browsers/servers not because it was standardized, but because the major players simply started adding it. The same is happening with QUIC: the fact that it's being standardized as HTTP/3 reflects that it's already being used, rather than marking some milestone after which people can start using it.

QUIC is really more of a new version of TCP (TCP/2???) than a new version of HTTP (HTTP/3). It doesn't really change what HTTP/2 does so much as change how the transport works. Therefore, my comments below are focused on transport issues rather than HTTP issues.

The major headline feature is faster connection setup and lower latency. TCP requires a number of packets being sent back-and-forth before the connection is established. SSL again requires a number of packets sent back-and-forth before encryption is established. If there is a lot of network delay, such as when people use satellite Internet with half-second ping times, it can take quite a long time for a connection to be established. By reducing round-trips, connections get set up faster, so that when you click on a link, the linked resource pops up immediately.
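To put rough numbers on that, here is a back-of-the-envelope sketch for the half-second satellite link. The round-trip counts are the usual textbook figures (my assumption, not measurements): one for the TCP handshake, two for a full TLS 1.2 handshake, one for TLS 1.3, one for a first-contact QUIC handshake, and zero for a resumed QUIC connection. It ignores packet loss and processing time.

```python
# Setup round trips before the first request can be sent.
# These are typical textbook figures, assumed for illustration only.
HANDSHAKE_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,         # TCP handshake, then a full TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,         # TCP handshake, then a TLS 1.3 handshake
    "QUIC (first contact)": 1,      # transport and crypto setup share one round trip
    "QUIC (0-RTT resumption)": 0,   # the request rides along with the first packet
}


def time_to_first_response(setup_round_trips, rtt_seconds):
    """Seconds until the first response byte arrives.

    Counts the setup round trips plus one more round trip for the
    request and its response.
    """
    return (setup_round_trips + 1) * rtt_seconds


if __name__ == "__main__":
    for name, round_trips in HANDSHAKE_ROUND_TRIPS.items():
        seconds = time_to_first_response(round_trips, rtt_seconds=0.5)
        print(f"{name}: {seconds:.1f} s")
```

Under those assumptions, the older stack waits four round trips before anything shows up, while first-contact QUIC waits two.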

The next headline feature is bandwidth. There is always a bandwidth limitation between source and destination of a network connection, which is almost always due to congestion. Both sides need to discover this speed so that they can send packets at just the right rate. Sending packets too fast, so that they'll get dropped, causes even more congestion for others without improving transfer rate. Sending packets too slowly means suboptimal use of the network.

How HTTP traditionally does this is bad. Using a single TCP connection didn't work for HTTP because interactions with websites require multiple things to be transferred simultaneously, so browsers opened multiple connections to the web server (typically 6). However, this breaks the bandwidth estimation, because each of your TCP connections is trying to do it independently as if the other connections don't exist. SPDY addressed this by its multiplexing feature that combined multiple interactions between browser/server with a single bandwidth calculation.

QUIC extends this multiplexing, making it even easier to handle multiple interactions between the browser/server, without any one interaction blocking another, but with a common bandwidth estimation. This will make interactions smoother from a user's perspective, while at the same time reducing the congestion that routers experience.
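A toy illustration of why independent estimation is a problem: if each connection starts in slow start and doubles its window every round trip, six connections together put six times as much data on the wire as one multiplexed connection would. The initial window of 10 segments and the loss-free doubling are my assumptions for the sketch, not anything specific to a particular browser or server.

```python
def aggregate_slow_start_window(connections, round_trips, initial_window=10):
    """Total segments in flight across all connections after some loss-free RTTs.

    Each connection doubles its own window every round trip, unaware of the
    others. initial_window=10 is an assumed, commonly used starting value.
    """
    per_connection = initial_window * 2 ** round_trips
    return connections * per_connection


if __name__ == "__main__":
    for rtt in range(4):
        six = aggregate_slow_start_window(6, rtt)
        one = aggregate_slow_start_window(1, rtt)
        print(f"after {rtt} RTTs: 6 connections = {six} segments, 1 connection = {one}")
```

Real congestion control is far more involved than this, but the sketch shows the shape of the problem: the path sees the sum, while each connection only reasons about its own share.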

Now let's talk user-mode stacks. The problem with TCP, especially on the server, is that TCP connections are handled by the operating system kernel, while the service itself runs in usermode. Moving things across the kernel/usermode boundary causes performance issues. Tracking a large number of TCP connections causes scalability issues. Some people have tried putting the services into the kernel, to avoid the transitions, which is bad because it destabilizes the operating system. My own solution, with the BlackICE IPS and masscan, was to use a usermode driver for the hardware, getting packets from the network chip directly to the usermode process, bypassing the kernel (see PoC||GTFO #15), using my own custom TCP stack. This has become popular in recent years with the DPDK kit.

But moving from TCP to UDP can get you much the same performance without usermode drivers. Instead of calling the well-known recv() function to receive a single packet at a time, you can call recvmmsg() to receive a bunch of UDP packets at once. It's still a kernel/usermode transition, but one amortized across a hundred packets received at once, rather than a transition per packet.

In my own tests, you are limited to about 500,000 UDP packets/second using the typical recv() function, but with recvmmsg() and some other optimizations (multicore using RSS), you can get to 5,000,000 UDP packets/second on a low-end quad-core server. Since this scales well per core, moving to the beefy servers with 64 cores should improve things even further.
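Turning those rates into per-packet time budgets makes the difference easier to feel. This is just arithmetic on the figures quoted above; treating the recv() figure as a single receive loop and projecting to 64 cores with perfectly linear scaling are both assumptions, so read the last number as an upper bound rather than a prediction.

```python
RECV_PPS = 500_000          # figure quoted above for plain recv()
RECVMMSG_PPS = 5_000_000    # figure quoted above for recvmmsg() plus RSS
QUAD_CORE = 4               # cores in the low-end server mentioned above


def per_packet_budget_us(packets_per_second):
    """Microseconds available to handle each packet at the given rate."""
    return 1_000_000 / packets_per_second


def projected_pps(per_core_pps, cores):
    """Throughput if scaling were perfectly linear (an assumption, not a result)."""
    return per_core_pps * cores


PER_CORE_PPS = RECVMMSG_PPS // QUAD_CORE

if __name__ == "__main__":
    print(f"recv(): {per_packet_budget_us(RECV_PPS):.1f} us per packet")
    print(f"recvmmsg(), per core: {PER_CORE_PPS:,} packets/second, "
          f"{per_packet_budget_us(PER_CORE_PPS):.1f} us per packet")
    print(f"64 cores, linear scaling: {projected_pps(PER_CORE_PPS, 64):,} packets/second")
```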

BTW, "RSS" is a feature of network hardware that splits incoming packets into multiple receive queues. The biggest problem with multi-core scalability is whenever two CPU cores need to read/modify the same thing at the same time, so sharing the same UDP queue of packets becomes the biggest bottleneck. Therefore, first Intel and then other Ethernet vendors added RSS, giving each core its own non-shared packet queue. Linux and then other operating systems upgraded UDP to support multiple sockets bound to the same port (SO_REUSEPORT) to handle the multiple queues. Now QUIC uses those advances, allowing each core to manage its own stream of UDP packets without the scalability problems of sharing things with other CPU cores. I mention this because I personally had discussions with Intel hardware engineers about having multiple packet queues back in 2000. It's a common problem and an obvious solution, and it's been fun watching it progress over the last two decades until it appears on the top end as HTTP/3. Without RSS in the network hardware, it's unlikely QUIC would become a standard.
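On the software side, the SO_REUSEPORT half of this looks roughly like the sketch below: several UDP sockets bound to the same port, one per worker. The helper name and the one-socket-per-core layout are my own illustration, and it only covers the socket setup, not the RSS configuration in the network card. It needs a platform that exposes SO_REUSEPORT, such as Linux.

```python
import socket


def open_reuseport_sockets(host, port, count):
    """Open `count` UDP sockets that all share one host/port via SO_REUSEPORT.

    Pass port=0 to let the OS pick a port for the first socket; the rest
    then bind to that same port. (Hypothetical helper, for illustration.)
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sockets.append(sock)
            # The option must be set before bind() on every socket sharing the port.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            sock.bind((host, port))
            port = sock.getsockname()[1]  # later sockets reuse the chosen port
    except OSError:
        for sock in sockets:
            sock.close()
        raise
    return sockets


if __name__ == "__main__":
    workers = open_reuseport_sockets("127.0.0.1", 0, 4)
    print("shared port:", workers[0].getsockname()[1])
    for sock in workers:
        sock.close()
```

In a real server, each worker thread or process would own one of these sockets and run its own receive loop, so no two cores contend for the same queue.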

Another cool solution in QUIC is mobile support. As you move around with your notebook computer to different WiFi networks, or move around with your mobile phone, your IP address can change. The operating system and protocols don't gracefully close the old connections and open new ones. With QUIC, however, the identifier for a connection is not the traditional concept of a "socket" (the source/destination port/address protocol combination), but a 64-bit identifier assigned to the connection.

This means that as you move around, you can continue with a constant stream uninterrupted from YouTube even as your IP address changes, or continue with a video phone call without it being dropped. Internet engineers have been struggling with "mobile IP" for decades, trying to come up with a workable solution. They've focused on the end-to-end principle of somehow keeping a constant IP address as you move around, which isn't a practical solution. It's fun to see QUIC/HTTP/3 finally solve this, with a working solution in the real world.
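The idea of keying connections on an identifier instead of the address/port tuple can be sketched in a few lines. The class and field names here are hypothetical, and the sketch skips everything a real implementation must do before trusting a packet from a new address (authenticating it and validating the new path); it only shows the lookup.

```python
import os


class ConnectionTable:
    """Toy server-side table that finds connections by ID, not by address."""

    def __init__(self):
        self._by_id = {}

    def open(self, peer_addr):
        """Create a connection and return its random 64-bit identifier."""
        conn_id = os.urandom(8)  # 8 bytes = 64 bits, as described above
        self._by_id[conn_id] = {"peer": peer_addr, "packets": 0}
        return conn_id

    def on_packet(self, conn_id, src_addr):
        """Look up the connection by ID and follow the client to its new address."""
        conn = self._by_id.get(conn_id)
        if conn is None:
            return None  # unknown connection ID
        conn["peer"] = src_addr  # the address may change; the ID does not
        conn["packets"] += 1
        return conn


if __name__ == "__main__":
    table = ConnectionTable()
    cid = table.open(("192.0.2.1", 40000))
    table.on_packet(cid, ("192.0.2.1", 40000))            # on home WiFi
    conn = table.on_packet(cid, ("198.51.100.7", 51000))  # switched to mobile
    print(conn)
```

With a table keyed on the address/port tuple, the second packet would have looked like a stranger; keyed on the connection ID, it lands on the same connection state.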

How can you use this new transport? For decades, the standard for network programming has been the transport layer API known as "sockets". That's where you call functions like recv() to receive packets in your code. With QUIC/HTTP/3, we no longer have an operating-system transport-layer API. Instead, it's a higher layer feature that you use in something like the Go programming language, or using Lua in the OpenResty nginx web server.

I mention this because one of the things that's missing from your education about the OSI Model is that it originally envisioned everyone writing to application layer (7) APIs instead of transport layer (4) APIs. There were supposed to be things like application service elements that would handle things like file transfer and messaging in a standard way for different applications. I think people are increasingly moving to that model, especially driven by Google with Go, QUIC, protobufs, and so on.

I mention this because of the contrast between Google and Microsoft. Microsoft owns a popular operating system, so its innovations are driven by what it can do within that operating system. Google's innovations are driven by what it can put on top of the operating system. Then there are Facebook and Amazon, which must innovate on top of (or outside of) the stack that Google provides them. The top 5 corporations in the world are, in order, Apple-Google-Microsoft-Amazon-Facebook, so where each one drives innovation is important.

Conclusion

When TCP was created in the 1970s, it was sublime. It handled things, like congestion, vastly better than competing protocols. For all that people claim IPv4 didn't anticipate things like having more than 4-billion addresses, it anticipated the modern Internet vastly better than competing designs throughout the 70s and 80s. The upgrade from IPv4 to IPv6 largely maintains what makes IP great. The upgrade from TCP to QUIC is similarly based on what makes TCP great, but extending it to modern needs. It's actually surprising TCP has lasted this long, and this well, without an upgrade.




20 comments:

Tiwy said...

Just a question: if QUIC is more TCP/3 than HTTP/3, could it be used for other purposes? For example multiplayer games, IoT protocols, streaming and such? Does it have to include the HTTP path, headers and body structure?

Unknown said...

How does the security look?

Unknown said...

How does QUIC work when the client is behind firewalls or IP translation gateways?

Unknown said...

@Tiwy Google already uses streaming (WebRTC) over QUIC for their Duo product.

Scott Brickey said...

I agree with Tiwy... seems that using QUIC as an alternative to TCP or UDP would have value in other spaces such as VoIP calling from cell phones, specifically during the transition between networks / network types, such as LTE to WiFi... aside from the implementation, is there something I'm missing in my understanding of the opportunity?

Sudsy said...

Minor typo:

"giving each core it's own non-shared packet queue." ->
"giving each core its own non-shared packet queue."

Sudsy said...

And two others:

"to manage it's own stream of UDP" ->
"to manage its own stream of UDP"

"so it's innovations are driven" ->
"so its innovations are driven"

Fazal Majid said...

The problem is fairness in the presence of network congestion. To a large extent it depends on most TCP implementations using the same congestion control algorithm, or at least algorithms that have the same general behavior. Google's developed a new algorithm called BBR that is robust, but also unfair. When a TCP connection implementing the NewReno algorithm shares a congested link with another one implementing BBR, the BBR grabs the lion's share of the bandwidth:

https://ripe76.ripe.net/presentations/10-2018-05-15-bbr.pdf

QUIC specifies NewReno as default and mentions CUBIC, but the choice of algorithm is left to the implementation. I can easily envision Google using BBR for connections between Chrome and Google properties, which means Google traffic would be prioritized over competitors'. Over time, more players would implement BBR in a race to the bottom (or a tragedy of the commons) and Internet brown-outs as in the 1980s and 1990s would come back.

Joe Klein said...

QUIC or HTTP/3, along with PIMv2 and IPv6's end-to-end model, would significantly improve 1-to-many and many-to-many audio, video, and VR distribution. Just saying.

Nils said...

To several commenters: QUIC uses UDP, which is already used in gaming and VoIP streaming. UDP has no delivery guarantees, it's fire-and-forget. QUIC adds delivery guarantees, but this is often not needed in the aforementioned scenarios, where real-time is more important. E.g. a lost packet is already stale when using VoIP or gaming.

Nithin said...

If it uses UDP, why is it mentioned as TCP/3 and not HTTP over UDP?

user9438 said...

So this is HTTP over UDP with a subset of TCP's features like guarantee of delivery etc? If so why call it TCP/2?

Unknown said...

"Why call it TCP/2?" Because whoever wrote this post likes to be misleading.

There's nothing new about reimplementing TCP on top of UDP and it's what many applications have done for a long time. VoIP does some of it (f.e. SIP reimplements the whole shebang for "control" packets - with ACKs and sequence numbers and so forth - but drops down to bare UDP for actual call data). Games generally have sequences and ACKs (or repeat-backs) for at least some operations.


Also, UDP makes you much more likely to run into NAT issues, particularly with longer-lived conversations which may outlast the state table entry on the router. This is why games, VoIP, etc so frequently have issues with NAT. However I don't expect that to really matter with QUIC given that the conversation will likely be forgotten by the server long before your NAT device.

TheThagenesis said...

QUIC solves problems that do not exist. The need for multiplexing is there if you need a lot of content from the same server, but HTTP pipelining already exists, and the reality of the Internet nowadays is that sites load a ton of external resources from somewhere else: ads, trackers, JavaScript libraries, Facebook buttons, you name it!

Kevin Doyon said...

@TheThagenesis Just because you can't envision what problems it solves, it doesn't mean it doesn't solve anything.

For example what about the packet loss that easily happens using WiFi/cell? Using TCP, packet loss could significantly slow down transfer since it needs to stop and wait for the dropped packets. It will also cause you and the server to think you are transferring too much (congestion) when it isn't the case, so packets will be sent more slowly.

Robert Kingston said...

Does using a connection identifier rather than sockets mean it's possible for a single stream to saturate 2 network gateways of a device?

Eg. Say you have 2 internet connections on your network or device. With tcp, connections persist over the gateway they were established through. What's stopping QUIC from using both simultaneously?

Zan Lynx said...

@Fazal Majid BBR does use more bandwidth more efficiently without the spiky drops that other TCP congestion algorithms end up with. They tend to see a packet drop and panic, while BBR knows the effective pipe size is not likely to change that dramatically so it adjusts at a slower rate.

But Google and others have done extensive testing of BBR and it does share mostly fairly with other TCP types. With other BBR flows it is fair. Using all BBR everywhere would not overload the Internet.

Aniketh Gireesh said...

Foremost, HTTP/3 is not QUIC. It is better described as HTTP carried over the QUIC transport. And for the second part of your question, QUIC was developed with the mindset of improving such streaming and protocol experiences, with the lower overhead of UDP rather than TCP.

Aniketh Gireesh said...

QUIC integrates TLS 1.3 as of now :)