Tuesday, March 22, 2011

The Internet as Infrastructure

Today, when one connects an application, system, or network to the public networks, one is adding to the "system of public works," that is to "infrastructure," of the nation and the world.

The standards for building infrastructure, such as bridges, tunnels, and dams, are different from those for other artifacts. Infrastructure must not fall of its own weight, must not fail in normal use or under normal load, and must resist "easily anticipated abuse and misuse." A suspension bridge must not fall because a driver falls asleep and an eighteen-wheeler goes over the side.

Notice that the abuse and misuse that can be easily anticipated today are much worse than when we began the Internet. Were it not so, we might have done many things differently.

We call the resultant necessary property of infrastructure resiliency, rather than security, but the properties are related.

For any artifact, there are limits to the complexity, scale, load, and simultaneous component failures that the mechanism can be expected to survive. How many simultaneous sleepy drivers and plunging eighteen-wheelers must a bridge be designed to survive?

When those limits are reached, what we want to happen is that the mechanism fail in such a way that damage is limited and the mechanism can be restored to operation as quickly as possible.

The three Great Northeastern Blackouts, of which August 14, 2003 was the latest, are examples. It is interesting that engineers see these blackouts as successes while the public and their surrogates, journalists and politicians, see them as failures.

All three were caused by multiple simultaneous and cascading component failures under conditions of heavy load. In all three cases the system failed in such a way that it was restored to a ninety percent service level in a day. While all three were spectacular and exciting, the damage was not nearly so severe as one might expect from a major ice storm.

This is the way that we would like the public networks to fail. In fact, so far, that is what we have seen. We have had massive local failures of the PSTN where it took days to weeks to restore to a ninety percent service level. Most of these were fire related and local. We have had one that was national and caused by a software change. We recovered from this one in hours.

To date, we have had a number of local failures of the Internet, all man-made (mostly caused by the infamous "cable-seeking backhoes or boat anchors"); most were accidental. We recovered from all of these in days. SQL/Slammer was man-made, malicious, and software related; it caused a noticeable drop in service for hours. However, there was not really a discontinuity of service.

It should be noted that SQL/Slammer was a homogeneous attack. That is, every instance of it looked the same. This made it relatively easy to construct and deploy filters that would resist its flow while not interfering with normal traffic. However, it is fairly easy to visualize a heterogeneous attack that might overwhelm this remedy.
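A minimal sketch may make the point about homogeneous attacks concrete. It assumes a drastically simplified packet with only three fields, and the signature values are illustrative, not a vetted filter rule; the names Packet, SIGNATURE, and filter_traffic are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """A drastically simplified packet: only the fields the filter inspects."""
    protocol: str
    dst_port: int
    payload_len: int


# Hypothetical signature for a homogeneous worm: every instance looks the same.
# The values are illustrative assumptions, not a vetted filter rule.
SIGNATURE = Packet(protocol="udp", dst_port=1434, payload_len=376)


def matches_signature(packet: Packet, signature: Packet = SIGNATURE) -> bool:
    """Return True when the packet is identical to the known attack shape."""
    return packet == signature


def filter_traffic(packets: List[Packet]) -> List[Packet]:
    """Drop packets that match the signature; pass everything else."""
    return [p for p in packets if not matches_signature(p)]


traffic = [
    Packet("udp", 1434, 376),  # identical worm instance: dropped
    Packet("tcp", 80, 512),    # ordinary traffic: passed
    Packet("udp", 1434, 377),  # slightly varied instance: slips through
]
passed = filter_traffic(traffic)
```

The third packet is the worry: an attack whose instances vary even slightly escapes a filter built on exact sameness.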

So, there is widespread concern that there might be a malicious software-based attack that would bring down the entire Internet. To some degree this is angst, an unfocused apprehension rooted in intuition or ignorance. However, it is shared by many who are knowledgeable. Their concern is rooted in the (often unidentified and unenumerated) facts that:

* the Internet evolved; it was not designed and deployed
* switching in the network is software-based
* operation of the components is homogeneous
* operation of network management controls is in-band
* users often have default access to management controls
* the topology is both open and flat
* paths in the network are ad hoc and adaptive
* connection policy is permissive
* most of the nodes in the network are untrusted and a large number are under malicious control
* access is open and cheap
* identity of both components and users is unreliable
* ownership and management is decentralized
* other

If the impact of these things on the resiliency of the Internet were as obvious prospectively as it is retrospectively, we might have done things differently. On the other hand, we might not have. A little discussion is in order.

Unlike the PSTN, the Internet is packet-switched rather than circuit-switched. The intent of this was to make the network more resilient in the face of node or link failures.

The routers and switches may be software running on von Neumann architecture general-purpose computers. This may make the network more resistant to component failure while making the components more vulnerable to malicious attack.

We have become accustomed to the idea that software processes are vulnerable to interference or contamination by their data, i.e., the software in the switch can be contaminated by its traffic. This exposes us to attacks intended to exploit, interfere with, or take control of switches and routers.

This may be aggravated by the fact that so many routers and switches look the same. While there are hundreds of products, most of them present controls that are operated via the Border Gateway Protocol (BGP). An attack that can take control of one might be able to take control of many.

Even most non-switch nodes in the network look the same, that is, like Windows or Unix (rather than, for example, MVS or OS/400). These two operating systems are open, historically broken, and have a commitment to backward compatibility that makes them difficult to fix. Historically they have shipped with unsafe defaults and have been corrupted within minutes of being connected to the Internet. The result has been that there are millions of corrupt nodes in the Internet that are under the control of malicious actors.

Operation of the routers and switches (and other network nodes) is via the network itself; they can be operated from almost any node in the network. Many are hidden, if at all, only by a password, often weak or even default. Thus, it might be possible to coordinate the simultaneous mis-operation of many nodes.

The Internet is open as to user, attachment, protocol, and application. The cost of a connection to the Internet is a function of the bandwidth or load, but the cost of a relatively fast persistent connection is in the tens of dollars per month, about the same as a dial connection a decade ago.

While one must demonstrate the ability to pay, usually with a credit card, the credit card may be stolen, and, depending on the provider, the name in which the connection is registered may not have to be the same as that on the credit card. In short, almost anyone can add a node to the Internet with minimal checks on their identity or bona fides. There will be bad actors.

The only thing that is required to add a new protocol or application to the Internet is that at least two nodes agree on it and that it can be composed from IP packets. Load-intensive protocols and applications for streaming audio and video were added alongside other protocols and applications with no changes to the underlying infrastructure. We have seen DoS attacks that relied upon minor changes to protocols and their use.

At least in theory, the topology of the Internet is "flat," as opposed to structured or hierarchical. That is, at least in theory and with few exceptions, any node in the Internet can send a packet to any other node in the Internet. The time and cost to send a packet between any two nodes chosen at random is roughly the same as for any other pair of nodes.

Said another way, both the time and cost to send a packet are independent of distance. One implication of this is that attacks are cheap, can originate anywhere, and can attack anything attached.

Paths in the Internet are determined late, possibly on a packet by packet basis, and adapt to changes in load or control settings. The intent is that there be so many potential paths between A and B that at least one will always be available and that it will be discovered and used. While the intent is to make the network resistant to node and link failures, an unintended consequence is that it is difficult to resist the flow of attack traffic.

The original policies of the Internet were promiscuous (as opposed to permissive or restrictive); not only was every packet and flow permitted but there were no controls in place to resist them. This was essential to its triumph over competitors like SNA and may have been necessary to its success.

While controls have been added as the scale has grown, the policy is still permissive, rather than restrictive, i.e., everything is allowed that is not explicitly forbidden.

Said another way, all traffic is presumed to be benign until shown otherwise. Attack traffic can flow freely until identified and restricted.
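The difference between a permissive and a restrictive policy can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any real firewall: a flow is reduced to a hypothetical (protocol, destination port) pair, and the function names and the FORBIDDEN and PERMITTED sets are invented for the example.

```python
from typing import FrozenSet, Tuple

# A flow is reduced to (protocol, destination port) for this illustration.
Flow = Tuple[str, int]


def permissive_allows(flow: Flow, forbidden: FrozenSet[Flow]) -> bool:
    """Permissive policy: everything is allowed that is not explicitly forbidden."""
    return flow not in forbidden


def restrictive_allows(flow: Flow, permitted: FrozenSet[Flow]) -> bool:
    """Restrictive policy: everything is forbidden that is not explicitly allowed."""
    return flow in permitted


# Hypothetical rule sets, invented for the example.
FORBIDDEN = frozenset({("udp", 1434)})
PERMITTED = frozenset({("tcp", 80), ("tcp", 443)})

# A flow nobody has seen before, and so nobody has written a rule for.
novel_flow = ("udp", 31337)

novel_under_permissive = permissive_allows(novel_flow, FORBIDDEN)
novel_under_restrictive = restrictive_allows(novel_flow, PERMITTED)
```

Under the permissive policy the novel flow passes until someone identifies it and adds it to the forbidden set; under the restrictive policy it is stopped from the start, at the cost of also stopping every legitimate flow not yet listed.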

Finally, while most of the nodes in the Internet are untrusted, and we know that many are corrupted and under hostile control, all are given the benefit of the doubt. To date there has been little effort to identify and eliminate those that have been corrupted. Therefore there remains a possibility that these corrupt systems can be marshaled in such a way as to deny the use of the network to all, or some targeted group, of users.

The Internet is robust, not fragile. It is resistant to both natural and accidental artificial events. However, to the extent that the above things are, and remain, true, the Internet, and indirectly the nations, economies, institutions, and individuals that rely upon it, are vulnerable to abuse and misuse; concern is justified, if not proportionate.

While these characteristics are pervasive and resistant to change, while they were often chosen for good reason, they are not fixed or required and can be changed. Understanding them and how they might be changed is key to making the Internet as resistant to abuse and misuse as it is to component failure or destruction.

This understanding suggests that the network must become both less open, not to say closed, and more structured. The management controls must be protected and taken out of band. The policy must become much more restrictive. We must identify our users and customers and hold them accountable for their traffic.

To bring the Internet to infrastructure standards, we must overcome not only inertia but also culture. Each of us must exercise our influence on our employers, clients, and vendors to move the Internet to the same standards that we expect of skyscrapers, bridges, tunnels, and dams. There is no one else to do it; that is why we are called professionals and are paid the big bucks.
