Tuesday, March 29, 2011

Lessons from the RSA Breach

As has already been reported on IGTV, RSA, the security division of EMC and the vendor of the SecurID two-factor authentication system and identity management services, has suffered a network breach. The limited disclosures suggest that this was a patient and resourceful attack intended to gain access to intellectual property. More specifically, RSA does not deny that at least one target was information about the SecurID system.

Whether or not this is a security problem, it has certainly been a public relations disaster. It is so bad that one government agency that is a SecurID customer has announced that it will switch to another product. Whether or not RSA has done the right thing in this case, it is clear that no one is happy with the way that they have handled it.

This is a case study in how difficult it is to handle a breach. The merely curious, who have no stake in the outcome, want full disclosure. On the other hand, the victim would like as little disclosure as possible. Customers want to know but do not want anyone else to know.

I am reminded of Franklin National Bank. A rogue trader lost about $50 million of the bank's money, painful but still only a fraction of the bank's capital. The bank managed to keep the loss "secret" for about ninety days. At that point, the Wall Street Journal reported it. In the next ninety days, the bank lost $2 billion in deposits and it failed. It could have survived the loss but was killed by the publicity.

As this case illustrates, the first concern that a victim has is to ensure that the publicity is not worse than the breach. What could be worse for a security company than to have to admit to ineffective security, or to a breach that reduces the effectiveness of products that they have already sold?

However, in fairness to RSA, they have other concerns. As a security company, they have an obligation to their customers to tell them about anything that diminishes the security that those customers believe they have purchased. They also have a responsibility not to make the situation worse through unnecessary disclosure.

They have a responsibility to cooperate with law enforcement. They want to protect the investigative process and the utility of the product of the investigation.

Now add to this that they really are not sure of the extent of the damage. The longer they delay disclosure, the more they know, and the more certain they are of what they know. However, with a sophisticated and patient attack, one may never be confident about the extent of its success or about what information has been compromised.

Note that, as a target, they owe a certain duty to peer target enterprises to share information that might be useful in protecting themselves. As a security company, they owe a certain duty to the security community at large to share information necessary to judge the effectiveness, or damage thereto, of the products and services that they offer.

As a vendor, they owe a duty to their customers. However, this duty may be different to those who purchase the SecurID tokens and servers and those to whom they also provide identity management and authentication services.

As security professionals, we can sympathize with this over-constrained problem. Few among us would like to be confronted with such a dilemma. None of this is to say that RSA has done the right thing or that this is not a PR disaster of epic proportions but only that we may never know enough to fairly judge what they have done. Microsoft has never divulged the details of the compromise of their development system.

If you are merely among the curious, a peer company, security professional, or prospective customer of RSA, you may never know what really happened.

You should know that:

Six pieces of special knowledge are necessary to successfully authenticate to the RSA system:

* the (address of the) system that will accept the credential
* the user ID
* the PIN or passphrase
* the seed value
* the algorithm
* the association, or binding, among the first four

RSA does not know all of these things. Therefore, while a compromise of its systems might reduce the cost of an attack, it cannot make the attack free or even trivial.

The algorithm has been reverse engineered and software that implements it is available for download.

The token is both a forgery-resistant artifact and a mechanism for resisting replay. Knowledge of the seed lowers the cost of forgery but does not lower the cost of replay.
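To make the interdependence of those six pieces concrete, here is a minimal sketch of a time-based one-time-password scheme of the same general shape. It is not the SecurID algorithm, which is proprietary; the seed store, PIN store, and function names are assumptions for illustration only. The point is that the verifier needs the seed, the user ID, the PIN, and the binding among them; a stolen seed file, by itself, does not authenticate anyone.

    # A sketch only, assuming a generic time-based OTP; the real SecurID
    # algorithm and record formats are proprietary and differ in detail.
    import hmac, hashlib, struct, time

    def token_code(seed: bytes, interval: int = 60) -> str:
        """Derive a six-digit code from a per-token seed and the current time."""
        counter = int(time.time() // interval)
        digest = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
        return format(int.from_bytes(digest[-4:], "big") % 1_000_000, "06d")

    # Hypothetical verifier state: an attacker needs all of it, not just the seeds.
    SEEDS = {"alice": bytes.fromhex("00112233445566778899aabbccddeeff")}  # seed bound to a user ID
    PINS  = {"alice": "7421"}                                             # PIN; never held by the token vendor

    def verify(user: str, passcode: str) -> bool:
        """Passcode is PIN plus token code; both halves must match for this user."""
        seed, pin = SEEDS.get(user), PINS.get(user)
        if seed is None or pin is None or not passcode.startswith(pin):
            return False
        return hmac.compare_digest(passcode[len(pin):], token_code(seed))

    print(verify("alice", "7421" + token_code(SEEDS["alice"])))  # True only with PIN, seed, and binding

Even with every seed in hand, an attacker still has to know which user and which target system a given seed is bound to, and still has to guess or phish the PIN.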

Since the breach, RSA has encouraged all of its customers to monitor their authentication servers for evidence of attacks against PINs and to encourage their users to employ strong PINs. This is good practice in any case but is more important if there is reason to believe that any seeds have been compromised.
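What that monitoring might look like depends entirely on your authentication server, but the idea is simple: count failed passcode attempts per user and investigate the outliers. The event tag, log format, and threshold below are assumptions; this is a rough sketch, not any vendor's tooling.

    # A sketch only; adapt the event tag, log format, and threshold to your
    # authentication server's actual audit log.
    from collections import Counter

    FAILED_TAG = "AUTH_FAILED"   # hypothetical event tag for a failed passcode
    THRESHOLD = 5                # failed attempts per user worth a closer look

    def suspicious_users(log_lines):
        """Flag users with enough failures to suggest PIN guessing."""
        failures = Counter()
        for line in log_lines:
            if FAILED_TAG not in line:
                continue
            # hypothetical format: "<timestamp> AUTH_FAILED user=<id> ..."
            for field in line.split():
                if field.startswith("user="):
                    failures[field.split("=", 1)[1]] += 1
        return {user: n for user, n in failures.items() if n >= THRESHOLD}

    sample = [
        "2011-03-20T04:12:09 AUTH_FAILED user=alice src=10.0.0.7",
        "2011-03-20T04:12:31 AUTH_FAILED user=alice src=10.0.0.7",
    ]
    print(suspicious_users(sample))   # empty until the threshold is reached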

Under NDA, RSA has told some customers more. If you are a customer and you are willing to agree not to share what they tell you, RSA may tell you more about the compromise. Note that, since you cannot discuss it with others, you cannot verify everything, perhaps anything, that they tell you.

Finally, if one is using strong authentication and one is compromised, the most likely cause is that someone took bait and compromised the network.

The bad news is that RSA may never know exactly what happened; the rest of us will definitely never know.

The good news is that we know enough.

Most of us need only get over the fact that we will never know.

Most users of the tokens need not do anything.

Those of you whose principals are peer targets of RSA must talk to RSA and request a remedy. On the low side the remedy may be nothing. On the high side it may be replacement and re-enrollment of any compromised tokens. Under normal circumstances, one might have weeks to months to get this done. However, since we do not know when the breach took place, days to weeks is a safer time-frame.

A colleague of mine, one who knows this space and this company better than most, wonders that there should be any doubt, indeed that token seeds would ever have been stored on any system connected to the enterprise network. Can you say hardened system?

I am uncomfortable with the expression "Advanced Persistent Threat," but the clear implication of it is that, at least for some identifiable set of enterprises, the threat environment has changed by an order of magnitude.

The heavy, not to say exclusive, reliance on perimeter security that we have used for a generation is no longer adequate. Real defense in depth must be the new order of the day. Defense in depth implies identification of the "crown jewels." It implies that the compromise of one, two, or even three or four defenses should not compromise them. It implies that no single insider can compromise them on purpose, much less by accident or error.
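As one illustration of the "no single insider" requirement, release of the crown jewels can be gated on a rule that requires more than one person to concur. The resource names and thresholds below are hypothetical; this is a minimal sketch of a k-of-n approval check, not a description of any particular product.

    # A sketch only: a k-of-n approval check, so that no single insider can
    # release a protected resource alone. Resource names and thresholds are
    # hypothetical.
    APPROVALS_REQUIRED = {"token-seed-records": 2, "source-repository": 2}

    def may_release(resource: str, approvers: set) -> bool:
        """Grant release only when enough distinct people have signed off."""
        needed = APPROVALS_REQUIRED.get(resource, 2)   # default to dual control
        return len(approvers) >= needed

    print(may_release("token-seed-records", {"alice"}))          # False: one insider is not enough
    print(may_release("token-seed-records", {"alice", "bob"}))   # True: dual control satisfied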

Since, based upon the Verizon data breach report, the time to detection of a compromise is measured in weeks to months, this data must be protected on the assumption that there are already compromised systems on the enterprise network. Some data must be behind an air-gap.

Systems and users that access external objects, for example, e-mail messages or web pages, may have to use application-only or locked-down systems to reduce compromise by taking bait. VPNs must terminate on the application, not on the perimeter, not on an operating system. These may be just some of the hard choices we will have to make.

Our choice is to adapt either our security strategy, to deal with the higher threat level, or our public relations strategy, to deal with the kind of breach that RSA is dealing with now. It is a difficult choice, but that is why we are called professionals and are paid the big bucks.

Tuesday, March 22, 2011

The Internet as Infrastructure

Today, when one connects an application, system, or network to the public networks, one is adding to the "system of public works," that is to "infrastructure," of the nation and the world.

The standards for building infrastructure, such as bridges, tunnels, and dams, are different from those for other artifacts. Infrastructure must not fall of its own weight, must not fail in normal use or under normal load, and must resist "easily anticipated abuse and misuse." A suspension bridge must not fall because a driver falls asleep and an eighteen wheeler goes over the side.

Notice that the abuse and misuse that can be easily anticipated today are much worse than what could be anticipated when we began the Internet. Were it not so, we might have done many things differently.

We call the resultant necessary property of infrastructure resiliency, rather than security, but the properties are related.

For any artifact, there are limits to the complexity, scale, load, and simultaneous component failures that the mechanism can be expected to survive. How many simultaneous sleepy drivers and plunging eighteen wheelers must a bridge be designed to survive?

When those limits are reached, what we want to happen is that the mechanism fail in such a way that damage is limited and the mechanism can be restored to operation as quickly as possible.

The three Great Northeastern Blackouts, of which August 14, 2003 was the latest, are examples. It is interesting that engineers see these blackouts as successes while the public and their surrogates, journalists and politicians, see them as failures.

All three were caused by multiple simultaneous and cascading component failures under conditions of heavy load. In all three cases the system failed in such a way that it was restored to a ninety percent service level in a day. While all three were spectacular and exciting, the damage was not nearly so severe as one might expect from a major ice storm.

This is the way that we would like the public networks to fail. In fact, so far, that is what we have seen. We have had massive local failures of the PSTN where it took days to weeks to restore a ninety percent service level. Most of these were local and fire-related. We have had one that was national and caused by a software change. We recovered from that one in hours.

To date, we have had a number of local failures of the Internet, all man-made (mostly caused by the infamous "cable-seeking backhoes or boat anchors"); most were accidental. We recovered from all of these in days. SQL/Slammer was man-made, malicious, and software-based; it caused a noticeable drop in service for hours. However, there was not really a discontinuity of service.

It should be noted that SQL/Slammer was a homogeneous attack. That is, every instance of it looked the same. This made it relatively easy to construct and deploy filters that would resist its flow while not interfering with normal traffic. However, it is fairly easy to visualize a heterogeneous attack that might overwhelm this remedy.
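For the record, every Slammer instance was a single small UDP datagram addressed to port 1434, the Microsoft SQL Server Resolution Service, which is why one static rule was enough to resist its flow. The toy predicate below only illustrates that point; it is not any particular operator's filter.

    # A sketch only: why a homogeneous worm is easy to filter. Every Slammer
    # packet was a small UDP datagram to port 1434, so one static predicate
    # covered all of it.
    SLAMMER_PORT = 1434   # MS SQL Server Resolution Service (UDP)

    def drop(protocol: str, dst_port: int) -> bool:
        """Return True if the packet matches the single, uniform worm signature."""
        return protocol == "udp" and dst_port == SLAMMER_PORT

    # A heterogeneous attack would vary protocols, ports, and payloads, so no
    # single static rule like this one would cover it.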

So, there is widespread concern that there might be a malicious software-based attack that would bring down the entire Internet. To some degree this is angst, an unfocused apprehension rooted in intuition or ignorance. However, it is shared by many who are knowledgeable. Their concern is rooted in the (often unidentified and unenumerated) facts that:


* the Internet evolved; it was not designed and deployed
* switching in the network is software-based
* operation of the components is homogeneous
* operation of network management controls is in-band
* users often have default access to management controls
* the topology is both open and flat
* paths in the network are ad hoc and adaptive
* connection policy is permissive
* most of the nodes in the network are untrusted and a large number are under malicious control
* access is open and cheap
* identity of both components and users is unreliable
* ownership and management are decentralized
* other


If the impact of these things on the resiliency of the Internet were as obvious prospectively as it is retrospectively, we might have done things differently. On the other hand, we might not have. A little discussion is in order.


Unlike the PSTN, the Internet is packet, rather than circuit, switched. The intent of this was to make the network more resilient in the face of node or link failures.

The routers and switches may be software running on von Neumann architecture general-purpose computers. This may make the network more resistant to component failure while making the components more vulnerable to malicious attack.

We have become accustomed to the idea that software processes are vulnerable to interference or contamination by their data, i.e., the software in the switch can be contaminated by its traffic. This exposes us to attacks intended to exploit, interfere with, or take control of switches and routers.

This may be aggravated by the fact that so many routers and switches look the same. While there are hundreds of products, most of them speak the same protocols, notably the Border Gateway Protocol (BGP), and present similar management controls. An attack that can take control of one might be able to take control of many.

Even most non-switch nodes in the network look the same, that is, like Windows or Unix (rather than, for example, MVS or OS/400). These two operating systems are open, historically broken, and have a commitment to backward compatibility that makes them difficult to fix. Historically they have shipped with unsafe defaults and have been corrupted within minutes of being connected to the Internet. The result has been that there are millions of corrupt nodes in the Internet that are under the control of malicious actors.

Operation of the routers and switches (and other network nodes) is via the network itself; they can be operated from almost any node in the network. Many are protected, if at all, only by a password, often weak or even a default. Thus, it might be possible to coordinate the mis-operation of many nodes at the same time.

The Internet is open as to user, attachment, protocol, and application. The cost of a connection to the Internet is a function of the bandwidth or load, but the cost of a relatively fast persistent connection is in the tens of dollars per month, about the same as a dial-up connection a decade ago.

While one must demonstrate the ability to pay, usually with a credit card, the credit card may be stolen, and, depending on the provider, the name in which the connection is registered may not have to be the same as that on the credit card. In short, almost anyone can add a node to the Internet with minimal checks on their identity or bona fides. There will be bad actors.

The only thing that is required to add a new protocol or application to the Internet is that at least two nodes agree on it and that it can be composed from IP packets. Load-intensive protocols and applications, such as streaming audio and video, were layered on top of existing protocols and applications with no changes to the underlying infrastructure. We have seen DoS attacks that relied upon minor changes to protocols and their use.

At least in theory, the topology of the Internet is "flat," as opposed to structured or hierarchical. That is, at least in theory and with few exceptions, any node in the Internet can send a packet to any other node in the Internet. The time and cost to send a packet between any two nodes chosen at random is roughly the same as for any other pair of nodes.

Said another way, both the time and cost to send a packet are independent of distance. One implication of this is that attacks are cheap, can originate anywhere, and can attack anything attached.

Paths in the Internet are determined late, possibly on a packet by packet basis, and adapt to changes in load or control settings. The intent is that there be so many potential paths between A and B that at least one will always be available and that it will be discovered and used. While the intent is to make the network resistant to node and link failures, an unintended consequence is that it is difficult to resist the flow of attack traffic.

The original policies of the Internet were promiscuous (as opposed to permissive or restrictive); not only was any packet and flow permitted but there were no controls in place to resist them. This was essential to its triumph over competitors like SNA and may have been necessary to its success.

While controls have been added as the scale has grown, the policy is still permissive, rather than restrictive, i.e., everything is allowed that is not explicitly forbidden.

Said another way, all traffic is presumed to be benign until shown otherwise. Attack traffic can flow freely until identified and restricted.

Finally, while most of the nodes in the Internet are untrusted, and we know that many are corrupted and under hostile control, all are given the benefit of the doubt. To date there has been little effort to identify and eliminate those that have been corrupted. Therefore there remains a possibility that these corrupt systems can be marshaled in such a way as to deny the use of the network to all users, or to some targeted group of them.

The Internet is robust, not fragile. It is resistant to both natural events and accidental man-made ones. However, to the extent that the above things are, and remain, true, the Internet, and indirectly the nations, economies, institutions, and individuals that rely upon it, are vulnerable to abuse and misuse; concern is justified, if not proportionate.

While these characteristics are pervasive and resistant to change, while they were often chosen for good reason, they are not fixed or required and can be changed. Understanding them and how they might be changed is key to making the Internet as resistant to abuse and misuse as it is to component failure or destruction.

It suggests that the network must become both less open, not to say closed, and more structured. The management controls must be protected and taken out of band. The policy must become much more restrictive. We must identify our users and customers and hold them accountable for their traffic.

To bring the Internet to infrastructure standards, we must overcome not only inertia but also culture. Each of us must exercise our influence on our employers, clients, and vendors to move the Internet to the same standards that we expect of skyscrapers, bridges, tunnels, and dams. Since there is no one else to do it, we are called professionals and are paid the big bucks.