March 4, 2024

How TCP congestion control saved the Internet • The Register

Systems Approach With the annual SIGCOMM conference taking place this month, we noticed that congestion control still occupies an hour on the agenda, 35 years after the first paper on TCP congestion control was published. So it seems like a good time to appreciate just how much the Internet’s success has depended on its approach to managing congestion.

After my recent talk and article on 60 Years of Networking, which focused almost exclusively on the Internet and the ARPANET, I received quite a bit of feedback about the several other networking technologies that were competing for dominance at the same time.

These included the OSI stack (anyone remember CLNP and TP4?), the Colored Book protocols (including the Cambridge Ring), and of course ATM (Asynchronous Transfer Mode), which was actually the first networking technology I worked on in depth. It’s hard to believe now, but in the 1980s I was one of many people who thought ATM could be the packet-switching technology to take over the world.

I rate congestion control as one of the key factors that allowed the Internet to progress from a moderate scale to a global scale.

ATM advocates used to refer to existing technologies such as Ethernet and TCP/IP as “legacy” protocols that could, if necessary, be carried over the global ATM network once it was established. One of my fond memories from those days is of Steve Deering (an IP networking pioneer) boldly (and correctly) stating that ATM would never be successful enough even to be a legacy protocol.

One of the reasons I skipped these other protocols when retelling the story of networking was simply to save space; it’s a little-known fact that my Systems Approach colleague Larry Peterson and I strive for brevity, especially since we received a one-star review on Amazon from a reader who called our book “a wall of text.” But I also wanted to focus on how we got to today’s Internet, where TCP/IP has effectively surpassed the other protocol suites to achieve global (or near-global) penetration.

There are many theories as to why TCP/IP was more successful than its contemporaries, and they are not easily verifiable. Most likely, there were many factors that influenced the success of Internet protocols. But I rate congestion control as one of the key factors that allowed the Internet to progress from a moderate scale to a global scale.

It is also an interesting study in how particular architectural choices made in the 1970s proved effective in the decades that followed.

Distributed resource management

In David Clark’s paper [PDF] “The Design Philosophy of the DARPA Internet Protocols,” one stated design goal is: “The architecture of the Internet should enable distributed management of its resources.” There are many different implications of that goal, but the way Jacobson and Karels [PDF] first implemented congestion control in TCP is a good example of taking that principle seriously.

Their approach also reflects another Internet design goal: accommodating many different types of networks. Together, these principles effectively rule out any type of network-based admission control, in stark contrast to networks such as ATM, which assumed that an end system would request resources from the network before any data could flow.

Part of the “adapt to many types of networks” philosophy is that you can’t assume that all networks have admission control. Add to this distributed resource management, and the result is that congestion control is something that end systems must handle, which is exactly what Jacobson and Karels did with their initial changes to TCP.

We are trying to get millions of end systems to cooperatively share the bandwidth of bottleneck links in some fair way.

The history of TCP congestion control is long enough to fill a book (and we did), but the work done at Berkeley from 1986 to 1988 casts a long shadow, and Jacobson’s 1988 SIGCOMM paper is among the most cited networking papers of all time.

Slow start, AIMD (additive increase, multiplicative decrease), RTT estimation, and the use of packet loss as a congestion signal were all in that paper, laying the foundation for the following decades of congestion control research. I think one of the reasons for the paper’s influence is that the foundation it laid was solid, while leaving plenty of room for future improvements, as we see in the continuing efforts to improve congestion control today.
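To make those mechanisms concrete, here is a minimal sketch of the window-adjustment logic the paper describes: slow start, additive increase, and multiplicative decrease on loss. The class name, constants, and segment-counting units are illustrative assumptions, not code from any real TCP stack.

```python
# Hypothetical sketch of 1988-era TCP window adjustment (not production code).
class CongestionWindow:
    def __init__(self, mss=1):
        self.mss = mss                 # window measured in segments for simplicity
        self.cwnd = 1 * mss            # congestion window starts at one segment
        self.ssthresh = 64 * mss       # slow-start threshold (illustrative value)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            # Slow start: grow by one segment per ACK, doubling roughly every RTT
            self.cwnd += self.mss
        else:
            # Congestion avoidance: additive increase of ~one segment per RTT
            self.cwnd += self.mss * self.mss / self.cwnd

    def on_loss(self):
        # Multiplicative decrease: remember half the window as the new threshold,
        # then restart from one segment (Tahoe-style behavior)
        self.ssthresh = max(self.cwnd / 2, self.mss)
        self.cwnd = self.mss
```

The point of the sketch is how little each host needs to know: every decision is driven by ACKs and losses observed at the sender, with nothing required from the network itself.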

And the problem is fundamentally hard: we are trying to get millions of end systems that have no direct contact with each other to cooperatively share the bandwidth of bottleneck links in some moderately fair way, using only the information that can be gleaned by sending packets into the network and observing whether, and when, they arrive at their destination.

Arguably one of the biggest advances after 1988 was the realization by Brakmo and Peterson (yes, that guy) that packet loss was not the only sign of congestion: increasing delay was too. This was the basis of the 1994 TCP Vegas paper, and the idea of using delay rather than loss alone was quite controversial at the time.

However, Vegas started a new trend in congestion control research, inspiring many other efforts to use delay as an early indicator of congestion before loss occurs. Data Center TCP (DCTCP) and Google’s BBR are two examples.
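The core of the delay-based idea fits in a few lines. The sketch below is in the spirit of Vegas: compare the throughput the uncongested (minimum) RTT would allow against the throughput actually achieved, and back off when the gap suggests queues are building. The function name and the alpha/beta thresholds are illustrative assumptions, not the constants of any real implementation.

```python
# Illustrative delay-based window adjustment in the spirit of TCP Vegas.
def adjust_window(cwnd, base_rtt, measured_rtt, alpha=1.0, beta=3.0):
    expected = cwnd / base_rtt               # rate if no queuing were occurring
    actual = cwnd / measured_rtt             # rate actually being achieved
    queued = (expected - actual) * base_rtt  # rough estimate of packets sitting in queues

    if queued < alpha:
        return cwnd + 1    # little queuing detected: probe for more bandwidth
    if queued > beta:
        return cwnd - 1    # delay rising: back off before any packet is lost
    return cwnd            # in between: hold the window steady
```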

One of the reasons I credit congestion control algorithms with much of the Internet’s success is that the path to Internet failure was clearly on display in 1986. Jacobson describes some of the early episodes of congestion collapse, in which throughput dropped by three orders of magnitude.

When I joined Cisco in 1995, we were still hearing customer stories about catastrophic congestion events. That same year, Bob Metcalfe, inventor of Ethernet and a recent Turing Award winner, predicted that the Internet would collapse as consumer access and the rise of the web drove rapid growth in traffic. It was not to be.

Congestion control has continued to evolve; the QUIC protocol, for example, offers better mechanisms for detecting congestion and the option to experiment with multiple congestion control algorithms. And some congestion control has moved up into the application layer, as in Dynamic Adaptive Streaming over HTTP (DASH).

An interesting side effect of the congestion episodes of the 1980s and 1990s was the observation that undersized buffers were sometimes the cause of congestion collapse. An influential paper by Villamizar and Song showed that TCP throughput suffered when the amount of buffering was less than the average bandwidth-delay product of the flows.
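To see what that rule of thumb implies, here is a back-of-the-envelope calculation with made-up numbers: a 10 Gbps link whose flows average a 100 ms round-trip time.

```python
# Bandwidth-delay product with illustrative numbers (not taken from the paper).
link_rate_bps = 10e9      # 10 Gbps bottleneck link
avg_rtt_s = 0.100         # 100 ms average round-trip time

bdp_bytes = link_rate_bps * avg_rtt_s / 8
print(f"Rule-of-thumb buffer: {bdp_bytes / 1e6:.0f} MB")   # ~125 MB
```

Numbers like that help explain why the rule pushed router designers toward large, expensive packet memories.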

Unfortunately, the result only held for a very small number of flows (as acknowledged in the paper), but it was widely interpreted as an inviolable rule that influenced subsequent years of router design.

This was finally debunked by the buffer-sizing work of Appenzeller et al in 2004, but not before the unfortunate phenomenon of bufferbloat (truly excessive buffer sizes causing massive queuing delays) had made its way into millions of low-end routers. It’s worth taking a look at the bufferbloat self-tests you can run on your home network.

So while we can’t go back and run controlled experiments to see exactly why the Internet succeeded while other protocol suites fell by the wayside, we can at least see that the Internet dodged a potential congestion-induced failure thanks to the timely addition of congestion control.

In 1986 it was relatively easy to experiment with new ideas by modifying the code on a couple of end systems and then rolling the effective solution out to a broad set of systems. Nothing inside the network had to change. It almost certainly helped that the set of operating systems that needed to change, and the community of people able to make those changes, was small enough that Jacobson and Karels’ initial BSD-based algorithms saw widespread deployment.

It seems clear that there is no perfect approach to congestion control, which is why we continue to see new papers on the topic 35 years after Jacobson’s. But the Internet’s architecture has fostered an environment in which effective approaches to the distributed management of shared resources can be tested and deployed.

In my opinion, that is a great testament to the quality of that architecture. ®
