01 June 2013

Hannover/Tripwire metrics part 2


"Number of security incidents" was the second most popular metric according to the Hannover Research/Tripwire CISO Pulse/Insight Survey.  As literally stated in the survey report, the metric is simply an integer but in mocking-up the illustration above, I've made the not unreasonable assumption that it might be presented graphically, the number of incidents accumulating over time.

Imagine you're a manager presented with a graph like that: what does it tell you?  

Well, for a start, the graph keeps climbing.  Security incidents are evidently occurring repeatedly which doesn't exactly sound good.  But is it OK, bad, terrible or absolutely devastating news?  Without more information, it's impossible to say.  The real question is whether the rate at which security incidents are occurring is acceptable or unacceptable - which the metric, taken in isolation, doesn't address.  There's no context.

Talking of rate, it's simple enough to tell from the steepness of the graph that security incidents were occurring faster than normal in the fourth quarter of 2011, prompting the obvious question: "Why?"  Again, the metric itself doesn't answer that, but maybe the presenter can.
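To make that point about rate concrete, here's a minimal sketch, entirely of my own making rather than anything from the survey, of how the cumulative chart and the underlying per-quarter rate might be produced from raw incident dates using pandas and matplotlib.  The dates are invented purely for illustration:

    # Minimal sketch: cumulative incident count vs. per-quarter rate.
    # The incident dates are made up; pandas/matplotlib is just one way to do this.
    import pandas as pd
    import matplotlib.pyplot as plt

    incident_dates = pd.to_datetime([
        "2011-02-14", "2011-03-02", "2011-05-19", "2011-08-07",
        "2011-10-11", "2011-10-25", "2011-11-30", "2011-12-13",   # the Q4 bulge
        "2012-02-01", "2012-04-22",
    ])
    incidents = pd.Series(1, index=incident_dates)

    per_quarter = incidents.resample("Q").sum()   # new incidents each quarter (the slope)
    cumulative = per_quarter.cumsum()             # the ever-climbing line managers get shown

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 6))
    cumulative.plot(ax=ax1, drawstyle="steps-post", title="Cumulative security incidents")
    per_quarter.plot(ax=ax2, kind="bar", title="New incidents per quarter")
    plt.tight_layout()
    plt.show()

The per-quarter bars carry the information a manager actually needs (is the rate acceptable, and is it changing?), yet it's the cumulative line that tends to get presented.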

Thinking about it, figuring out the right questions to ask is the real art of metrics.  So what question or questions does this metric help us answer?  "How many security incidents have there been?" might be facile but, seriously, it's hard to think of anything much better.  "Is our information security getting better or worse?" maybe?

It's probably clear by now that I'm struggling with this metric.  Time to see how it fares under the glare of the PRAGMATIC spotlight:
  • Predictiveness: the numbers are historical, and the trends not very meaningful.  Who can say whether the number of incidents will climb gently or steeply over the next reporting period?  All we know for certain is that the cumulative number will be higher than today, unless something has gone wrong in the measurement process.  There's a more fundamental question too about whether the numbers predict the organization's information security status, an issue I'll discuss further below.
  • Relevance: security incidents are the outcome of inadequate information security risk management, so this is a measure of the overall effectiveness of the security arrangements.  It is relevant to information security, but is it relevant to information security management?  I'm not so sure about that.
  • Actionability: if there are 'too many' incidents (whatever that means), security patently needs to be improved, but how?  In what way?  The metric remains strangely silent on that, and may even be distinctly misleading.
  • Genuineness: if for some reason someone in power wanted to manipulate this metric, they could do so by meddling with the definition of what constitutes an incident, or meddling with the numbers directly. 
  • Meaningfulness: at face value, this is an extremely simple and obvious metric: more incidents bad, fewer incidents good.  However, digging deeper, it might in fact mean the very opposite.  
  • Accuracy: exactly what does and does not qualify as 'a security incident'?  Unless this is clearly defined, leaving no room for interpretation, there's little hope of this ever being an accurate, scientific measure (see the sketch just after this list).  Even then, as with any metric, there is a dependence on the availability of the raw data.
  • Timeliness: no problems here, provided the systems and processes for reporting and counting incidents are working smoothly, a reasonable assumption in most cases.
  • Independence: it would be easy for an auditor to double-check that the incident numbers spewing out of an incident management/problem ticketing system are being correctly reported, but not nearly so simple to confirm that the numbers are correct.  It would take some digging to check whether incident reports are being correctly received and recorded in the first place, and to confirm that security-related incidents are being consistently coded as such in the incident management/problem ticketing system.  Going even further upstream, it would be harder still to determine that security incidents are being reliably identified and reported.  The process involves several subjective decisions along the way, implying a lot of work for an auditor to examine those decisions, given the information available at the time they were made.  In practice, second-guessing decisions made by business people, help-deskers, security pros and middle managers is not a productive audit approach, consequently there's little alternative to accepting them at face value.
  • Cost-effectiveness: as with the first metric in this series "Vulnerability scan coverage", the metric may be fairly cheap to collect, analyze, report and use, but unfortunately the business benefits are also low, hence the net value is close to zero.  Read on to find out why.
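To illustrate the Accuracy and Genuineness concerns above, here's a toy sketch, mine rather than anything from the survey or from Tripwire, showing how the same pile of help-desk tickets yields quite different headline numbers depending on which categories someone decides to treat as 'security incidents'.  The tickets and categories are invented for the purpose:

    # Toy illustration: the headline count depends entirely on the definition
    # of 'a security incident'.  All data here is hypothetical.
    tickets = [
        {"id": 101, "category": "malware"},
        {"id": 102, "category": "password reset"},
        {"id": 103, "category": "phishing"},
        {"id": 104, "category": "lost laptop"},
        {"id": 105, "category": "spam"},
        {"id": 106, "category": "policy violation"},
    ]

    def count_security_incidents(tickets, security_categories):
        """Count tickets whose category falls within the chosen definition."""
        return sum(1 for t in tickets if t["category"] in security_categories)

    narrow = {"malware", "phishing"}
    broad = {"malware", "phishing", "lost laptop", "spam", "policy violation"}

    print(count_security_incidents(tickets, narrow))   # 2
    print(count_security_incidents(tickets, broad))    # 5

Same organization, same period, same tickets: the 'number of security incidents' is whatever the person maintaining the category list says it is.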

With deeper analysis, it turns out the metric could be distinctly misleading, perhaps even counterproductive.  Under some circumstances, an apparent increase in the number of security incidents may perversely mean that information security has in fact improved whereas in normal circumstances the opposite would be true: more incidents means worse security.  


The paradox occurs because people are naturally reluctant to report bad news to management, security incidents for example, especially if they fear being blamed or held accountable for them.  If on top of that the metric is used naively by management as a big stick to drive down the incident rate, the added pressure will make people even more reluctant to report incidents Accurately and Genuinely, which is not good for security.  If management wants to see the rate go down, the slope of the graph will no doubt decrease, but part of that decrease will be due to incidents going unreported, or to specific incidents being aggregated into nondescript reports that downplay their effects.  Management may appear to have succeeded in improving information security, but there's a distinct possibility that the quality or integrity of the management information has degraded instead.

Conversely, if management recognizes the issue and instead uses the metric to drive up the reporting of security incidents, reminding people of their reporting obligations will push the numbers up, but it also applies a subtle pressure to split single incidents into several smaller ones simply to satisfy the demand for more incidents.  Worse still, some people may be tempted to let security slip so that more incidents actually occur, driving the numbers up that way.  Hardly a shining example of information security management practice!

That said, I rather suspect the sponsors and authors of the survey, and perhaps the respondents also, may have interpreted this metric in the very specific technical context of Tripwire's system.  It's not unreasonable to assume that, once correctly configured, the system will identify and count what it considers to be security incidents, in a mechanistic and relatively objective way, hence the number of incidents reported by the system would be a fairly reliable guide to the occurrence of those particular incidents.  However, there are probably still opportunities for mischief if someone decides, for whatever reason, to tweak the system's configuration settings, perhaps even meddle with the internal records, making it appear that fewer incidents have occurred.  The bigger issue is that Tripwire's system sees only a small fraction of the information security incidents affecting the organization.  It doesn't know about John in Accounts putting a decimal point in the wrong place and overstating the tax bill by $1m, and it hasn't a clue about Julie in Marketing accidentally publishing a list of sales prospects on the web server, or leaving behind a sheaf of confidential marketing plans and customer details in a local cafe ...  

Even being generous, the bottom line is a PRAGMATIC score of just 46% by my calculation, given all the issues and concerns noted above.  If you work through the thinking and scoring process for this metric in the context of your organization, you will almost certainly come up with a different figure ... and that's fine.  The analysis is the important bit.  I'd be surprised if you still think this is a good information security metric.
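If you do want to work the numbers yourself, the arithmetic is simple enough to sketch: score each of the nine criteria as a percentage and take the mean.  The ratings below are placeholders of my own choosing to show the mechanics, not the precise figures behind my 46%:

    # Sketch of the PRAGMATIC scoring arithmetic: rate each criterion 0-100%
    # and average them.  These ratings are illustrative placeholders only.
    ratings = {
        "Predictiveness":     30,
        "Relevance":          60,
        "Actionability":      35,
        "Genuineness":        45,
        "Meaningfulness":     50,
        "Accuracy":           45,
        "Timeliness":         75,
        "Independence":       40,
        "Cost-effectiveness": 50,
    }

    score = sum(ratings.values()) / len(ratings)
    print(f"PRAGMATIC score: {score:.0f}%")   # prints 48% with these placeholder ratings

Plug in your own ratings for your own organization and see where the metric lands.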

More to come: if you missed them, see the introduction and parts one, three, four and five of this series.
