Wednesday, February 24, 2010

The Worst Case Scenario

At the direction of the board of directors, the IT staff of a national property and casualty insurance company developed a backup and recovery contingency plan (as contrasted with a business continuity plan). They found themselves in a bind between the board, who said the plan cost too much, and the auditors, who said that it was inadequate.

Many of us have been there and I was called in to assist, i.e. to "consult." I was not terribly surprised by what I found. It seems that every time the staff thought that they had a plan the auditors would identify another case in which it would not work. The staff would add a new capability to address the new case.

The board tended to look less at the capabilities than at the total cost. Admittedly, the board of a property and casualty company looks at the cost a little differently than might a bank or a manufacturer. The insurers ask themselves, how much insurance must I write to cover that cost? How much coverage would I offer for that amount if it were paid to me as a premium? How much coverage could I buy for that much money? They could not even judge the capability in the plan but they "knew" that its cost was too high.

Of course, the problem was in the failure to properly identify the objectives of the plan. Allowing the auditors to hypothesize cases clearly was not working. No matter the plan, they were always clever enough to come up with a new case in which it would not work.

A plan that can deal with the "worst case" has infinite cost. What case then? What case must the backup and recovery plan of a national property and casualty insurer deal with?

We concluded that such a company would have to recover from any disaster that both it and the majority of its policy holders survived. Certainly it has an obligation to recover from the destruction of its own premises. It must survive a community disaster like an earthquake. It must survive a regional disaster like Katrina. Of course, these are far short of the "worst case," short of thermo-nuclear war or the end of the world. The scope of the event was not the only thing that had to be agreed upon; the expected rate of such events had to be agreed upon as well.

Finally, IT had to agree with the business as to the mean-time-to-recovery and the point of recovery for each application. The faster one wants the application back, the more one can expect to pay. The closer one wants to recover to the point of failure, e.g. close of business on the day before the event, the more one can expect to pay. More on these on another day.

While these things are difficult to agree upon, such agreements are essential to an effective and efficient plan. They are necessary to being able to satisfy both the auditors and the directors.

Thursday, February 11, 2010

You may enjoy this.

Bears, Brigands, and Dragons

A Fable

In medieval times, the populace was terrified of dragons. Everyone knew someone who knew someone who had seen one. Many knew someone who knew someone who had lost a relative to dragons.

However, when they built the castle, usually in stages over decades, getting stronger with time, they always stopped long before the castle became dragon-resistant, much less dragon-proof. After all, dragons are awesome creatures; they are very strong and they fly. How high would the walls of the castle have to be to keep the dragon from just flying over?

So, they built their walls to resist bears and brigands. They fully intended to get around to resisting dragons but it was so expensive that somehow it never got done. After all, bears and brigands were much more numerous than dragons.

In modern terms, we would call the dragon strategy "risk acceptance." This is sort of like our strategy for greater than Richter 7.0 seismic events and greater than Saffir-Simpson Category V storms. Needless to say, the watch mounted the walls every day, with their bows and arrows, ready to repel the dragons, but they never saw any.

Every twenty years or so we have a massive power blackout, embracing multiple states and tens of millions of homes, and lasting for several days. It is usually the result of the simultaneous failure of a highly unlikely number of components. The media and the politicians scream "There be dragons. Why weren't you prepared?" The industry says mea culpa and promises to do better next time.

Actually, they do do better the next time. They raise the walls. They replace older components with new ones that have a longer mean-time-to-failure. They add redundancy so that they are better able to tolerate component failures, and they automate the response to component failures. Of course, all of this adds cost. Long before the mean-time-to-failure of the system reaches infinity, they stop.

In fact, in about twenty years, their best efforts will be overwhelmed once more. The knee of the curve that plots mean-time-to-massive-failure against cost seems to be at about twenty years. I have now lived through three such blackouts and hope to live to see a fourth. Mitigating it will be expensive but not as expensive as preventing it.

No matter how high we build the walls, the damned dragons just fly over.

Tuesday, February 9, 2010

"Effective" Security

Nothing useful can be said about the effectiveness of a security mechanism except in the context of a specific application and environment. -- Robert H. Courtney

Perfect security has infinite cost.

Security people often reject novel security mechanisms "because they know how to break them." That is to say, they are not effective. On the other hand, they may continue to rely heavily on other mechanisms, like passwords, that they also know how to break. Most of this is simply habit. It does not really have anything to do with effectiveness.

Effectiveness has nothing to do with whether or not something can be broken. Anything and everything created by man can be broken by man; the real issue is the cost. No mechanism provides perfect security. (Indeed, the last thing anyone wants is perfect security; think about it.)

A security mechanism can be said to be effective if the cost of attack is higher than the value of success. The issue is not whether or not a mechanism can be broken but how much it costs to break it.

Since we may not know all the failure modes of a mechanism, we never really know the minimum cost of breaking the mechanism. On the other hand, we often know the maximum cost. The maximum cost of breaking an encryption mechanism is never higher than the cost of an exhaustive attack against the key. Similarly, the maximum cost of breaking a password is the cost of a "brute force" attack. Of course, the cost of attack against a well chosen password can be arbitrarily high. Note that the cost of an attack to the attacker is measured in terms of the resources available to him, their reusability, and how he values them. For example, he may value as cheap special knowledge that he already possesses and that is easily re-used, while he values as dear knowledge that he does not have and which would have limited application once obtained.
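The upper bound described above can be put into rough numbers. What follows is only a sketch: the function names are made up for illustration, and the cost per guess is an assumed figure, not a measurement. It counts the candidates an exhaustive attack must be prepared to try, prices that attack, and compares the price with the value of success.

```python
# Illustrative sketch only. The cost per guess and the value of success
# are assumptions chosen for the example, not measured figures.

def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidates an exhaustive attack must be prepared to try."""
    return alphabet_size ** length


def max_attack_cost(candidates: int, cost_per_guess: float) -> float:
    """Upper bound on the cost of attack: the attacker tries every candidate."""
    return candidates * cost_per_guess


def is_effective(attack_cost: float, value_of_success: float) -> bool:
    """Effective, in the sense used here, if attacking costs more than it pays."""
    return attack_cost > value_of_success


if __name__ == "__main__":
    cost_per_guess = 0.001   # assumed cost to the attacker of one guess
    value_of_success = 100.0  # assumed value of success to the attacker

    examples = {
        "4-digit PIN": keyspace(10, 4),
        "8 lowercase letters": keyspace(26, 8),
        "128-bit key": 2 ** 128,
    }
    for name, candidates in examples.items():
        bound = max_attack_cost(candidates, cost_per_guess)
        print(name, candidates, bound, is_effective(bound, value_of_success))
```

Keep in mind that this is the maximum cost. A mechanism with an unknown failure mode, or a badly chosen password, may be broken for far less, so the figure can show that a mechanism is not effective but cannot by itself show that it is.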

A security mechanism can be said to be effective if it behaves as expected in the intended application and environment. One of the possible expectations might be that it would resist a certain percentage, e.g. 80%, of attacks. It might be that it would take more than a certain amount of time to break it.

As with most things in security, we need not know the effectiveness of a measure with a high degree of accuracy or precision for the abstraction of effectiveness to be useful to us.

Sunday, February 7, 2010

"Advanced Persistent Threat"

Courtney's First Law says that, "Nothing useful can be said about the security of a mechanism except in the context of a specific application and environment." The "environment" is all about threat, natural and artificial. For the first three decades, while we talked a lot about the man-made threat, the threat that mattered was from "The" environment, that is, from nature, mostly fire and water but also earthquake.

While the natural threat has not changed much, the risk has changed. The risk was governed in part by scale. In the early days, the consequences were related to the fact that computers were scarce, large, expensive, and we thought that we were very dependent on some of their applications. In a world in which computers are a commodity, small, and cheap, the risk is not to the property but to the information, the loss of confidentiality, integrity, or availability. Man, not nature, has become the threat of interest.

In 2006 the US Air Force began to use the term "Advanced Persistent Threat" to describe the role of nation states in attacking users of the Internet. The expression has surfaced in both the industry and popular press during the past two weeks.

The use of words is how we "think about security." Expressions like this one influence what and how we think about security. If the expressions are not carefully crafted, they may distort or mislead. If we are to use them, we should examine them carefully.

Of course a nation state is not a threat; threats have rate. Rather a nation state, like organized crime, is a threat source; threats have rate and source. Persistent can clearly modify a threat source. One must assume that nation states are persistent.

It is hard to see how "advanced" can modify either threat or persistent. In context, it clearly modifies the attack method. Fundamental attack methods have not changed since I wrote about them in a side-bar for an article in IEEE Spectrum in the early seventies. What has changed is the implementation, both the art and the craft.

Nation states and organized crime may exploit vulnerabilities that are not widely known but what is significant about the methods in these attacks is how they are used in steps and stages, from target selection, to exploitation of the product.

For example, while Operation Aurora used other elements after the bait was taken, getting someone to take the bait was the key to success. While crafting of the bait included forging the origin address, and while resistance to this could have been automated, we need to be more skilled at recognizing bait. Today, it is all too easy to get someone to "click" on the bait and that is often sufficient to compromise a system or domain. Apparently, the higher up the "food chain" one is, the easier. Similarly, the higher up the food chain the origin appears to be, the more likely the target is to take the bait.

I am reminded of my South African colleague who said that his demonstration bait message had a subject line of "big teats." He argued that its attraction was gender dependent, but its appeal was to both genders. One gender wanted them while the other wanted to look at them.

The key word is "persistent." Right now that means fishing every day and throwing out a lot of bait. History suggests that artfully crafted bait, sufficiently replicated and spread, will work. Of course, the key word in all of this is "sufficiently." However, "sufficiently" implies brute force. Since the adversary is not going away, one must recognize bait and force early, while there is still time to mitigate or resist it. One must decrease the size of the domain that can be compromised by a single "click."

Every "large" enterprise is a target but surprisingly so are some small ones. We will save this discussion for another day.

Friday, February 5, 2010

The Total Cost of Security

In the context of this curve, which illustrates that the total cost of security is equal to the sum of the cost of security measures plus the cost of the losses after those measures, Fred asks,


“But is it obvious where you should operate? Is lowest cost necessarily the best?”

The best answer to the question is “close to the middle”; at either extreme, the cost of error goes up exponentially. Ideally, one wants to plan and operate at the minimum but that is not knowable in any real sense. That is why one does risk assessments and other attempts to estimate the annualized cost of losses and the value of security measures.

One implication of the curve is that at the far right, it is very expensive to achieve small reductions in losses. Security measures increase dramatically in cost for small reductions in already small losses. Said another way, as the cost of losses approaches zero, the cost of security approaches infinity.

Note that in the middle, the sum curve tends to be fairly flat. That says that any place in the middle is “OK.” There is little danger that one will overspend on security. Long before it becomes inefficient, other limits on available resources will kick in. The real danger is in underspending. The cost of underspending is not so obvious; indeed one may underspend for several years and “get away with it.” It may be only across a few years that it becomes obvious that it is inefficient.
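The shape of the sum curve can be sketched numerically. The example below is an illustration under stated assumptions, not a model of any real enterprise: it assumes that losses fall off exponentially as spending on measures rises, and the figures (an unprotected loss of 1000 and a decay scale of 100, in arbitrary units) are invented. It locates the minimum of the total on a coarse grid and then measures how wide the "OK" band is around it.

```python
import math

# Illustrative sketch. The exponential loss curve and both of its
# parameters are assumptions chosen only to reproduce the general shape.


def loss_cost(spend: float, unprotected_loss: float = 1000.0,
              scale: float = 100.0) -> float:
    """Assumed cost of losses remaining after spending `spend` on measures.

    Losses shrink as spending grows but never reach zero.
    """
    return unprotected_loss * math.exp(-spend / scale)


def total_cost(spend: float) -> float:
    """Total cost of security: cost of measures plus cost of remaining losses."""
    return spend + loss_cost(spend)


def near_minimum_band(spends, tolerance: float = 0.10):
    """Return (low, high) spending levels whose total is within `tolerance`
    of the minimum total found on the grid."""
    best_total = min(total_cost(s) for s in spends)
    ok = [s for s in spends if total_cost(s) <= best_total * (1 + tolerance)]
    return min(ok), max(ok)


if __name__ == "__main__":
    grid = range(0, 1001, 10)
    best = min(grid, key=total_cost)
    low, high = near_minimum_band(grid)
    print("lowest total at spend =", best)
    print("within 10% of the minimum from", low, "to", high)
```

Under these assumptions the band within ten percent of the minimum is wide compared with the spending level at the minimum itself, which is the flat middle described above. The width and the position of the minimum depend entirely on the assumed loss curve.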

Note that spending on security is balanced by risk. The tolerance for risk is different for different enterprises. For example, small new enterprises are inherently more risky than large mature ones; it is not efficient to pursue low security risk in the face of high business risk. Within an enterprise, risk tolerance may be different in different periods. In some periods, management may tolerate a higher level of risk in an attempt to move net income from one period to another.

The curve can be used to illustrate Courtney’s Second Law, “Do not spend more mitigating a problem than tolerating it will cost you.” However, it is an abstraction. The two curves do not have the same time scale. In any period the cost of security is more predictable than the cost of losses. We plan and measure the cost of security mechanisms annually while the cost of losses may only be known with confidence across decades. On the other hand, one can estimate the cost of losses well enough to avoid gross errors, or, in the vernacular, “Close enough for government work.”
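Courtney's Second Law can be applied as a back-of-the-envelope check. This is a minimal sketch, assuming that the annualized cost of a loss is estimated as its expected rate times its cost per event. The function names and every figure in the example are hypothetical.

```python
# Illustrative sketch. Rates and dollar figures are invented for the example;
# in practice they are rough estimates, good enough to avoid gross errors.


def annualized_loss(events_per_year: float, loss_per_event: float) -> float:
    """Estimated annualized cost of a loss: expected rate times consequence."""
    return events_per_year * loss_per_event


def worth_mitigating(annual_cost_of_measure: float,
                     loss_before: float, loss_after: float) -> bool:
    """True if the measure costs less per year than the losses it avoids."""
    return annual_cost_of_measure < (loss_before - loss_after)


if __name__ == "__main__":
    # Hypothetical event expected about once in twenty years.
    before = annualized_loss(1 / 20, 2_000_000)  # no mitigation
    after = annualized_loss(1 / 20, 500_000)     # with mitigation in place
    for cost in (50_000, 90_000):
        print(cost, worth_mitigating(cost, before, after))
```

With these invented numbers the measure avoids about 75,000 a year in expected losses, so it is worth 50,000 a year but not 90,000. Because the rate is known only across decades, the result is best read as a guard against gross error rather than as a precise answer.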

Donn Parker warns about “risk assessment:” it is a blunt tool that can cost more than making the decision wrong will cost. He argues for what he calls “baseline controls” and what Peter Tippett calls essential practices. In combination these low cost controls are very effective and so efficient as to require little justification. This is a subject for another day.

Actually, taken across a large enough enterprise, one can measure losses pretty accurately. However, I have only encountered one enterprise, Nortel, that does it. They have a budget for losses; it is not meaningful at the departmental level, but it works pretty well at the business unit level. In the first year that they did it, the variance between what was budgeted and actual was pretty high. However, after a few years of experience, variance was much more within a normal range.

In the long run, the cost of security simply is what it is. It is unavoidable. In the words of the mechanic in the Fram oil filter ad, “You can pay me now, or you can pay me later,” but the implication is that one cannot escape this cost. The advantage of the cost of security over that of losses is that it is both knowable and predictable. As long as one avoids gross over- or underspending, one is likely to be within the efficient range.