28 August 2013

SMotW #70: incident detection time lag

Security Metric of the Week #70: delay between incident occurrence and detection


While some information security incidents are immediately obvious, others take a while to come to light ... and an unknown number are never discovered. Compare, say, a major storm that knocks out the computer suite with an APT (Advanced Persistent Threat) incident.  During the initial period between occurrence and detection, and subsequently between detection and resolution, incidents are probably impacting the organization. Measuring how long it takes to identify incidents that have occurred therefore sounds like it might be a useful way of assessing and, if necessary, improving the efficiency of incident detection, reducing the time lag.

When ACME's managers scratched beneath the surface of this candidate security metric, thinking more deeply about it as they worked methodically through the PRAGMATIC analysis, it turned out to be not quite so promising as some initially thought:

P     R     A     G     M     A     T     I     C     Score
80    70    72    30    75    50    50    65    65    62%
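(For anyone wondering how the overall Score is arrived at: in these worked examples it works out to the simple, unweighted arithmetic mean of the nine criterion ratings, rounded to the nearest whole percent.  A minimal, purely illustrative Python sketch:)

    # Purely illustrative: the Score above is the simple (unweighted) mean of the
    # nine PRAGMATIC criterion ratings, rounded to a whole percent.
    ratings = [80, 70, 72, 30, 75, 50, 50, 65, 65]   # P R A G M A T I C
    score = round(sum(ratings) / len(ratings))
    print(f"PRAGMATIC score: {score}%")              # -> 62%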

Management was concerned that, in practice, while the time that an incident is detected can be ascertained from incident reports (assuming that incidents are being reliably and rapidly reported - a significant assumption), it is harder to determine, with any accuracy, exactly when an incident first occurred.  Root cause analysis often discovers a number of control failures that contributed to or led to an incident, while in the early stages of many deliberate attacks the perpetrators are gathering information, passively at first then more actively but often covertly. Forensic investigation might establish more objectively the history leading up to the discovery and reporting of incidents, but at what cost?

For the purposes of the metric, one might arbitrarily state that an incident doesn't exist until the moment it creates an adverse impact on the organization, but that still leaves the question of degree.  Polling the corporate website for information to use in a hacking or phishing attack has a tiny - negligible, almost unmeasurable - impact on the organization, so a better definition of the start of an incident might involve 'material impact' above a specified dollar value: fine if the costs are known, otherwise not so good.
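To make that measurement problem concrete, here is a minimal sketch (in Python, with entirely hypothetical incident records and threshold) of how the occurrence-to-detection lag might be computed once a 'material impact' definition of the start of an incident has been agreed:

    from datetime import datetime
    from statistics import mean

    MATERIAL_IMPACT_USD = 10_000   # assumed policy threshold - purely illustrative

    # Hypothetical incident register: dated impact estimates plus the detection date.
    incidents = [
        {"impacts": [("2013-07-01", 500), ("2013-07-04", 25_000)],
         "detected": "2013-07-20"},
        {"impacts": [("2013-08-02", 40_000)],
         "detected": "2013-08-03"},
    ]

    def occurrence_date(incident):
        """Date on which the cumulative impact first became 'material'."""
        total = 0
        for day, cost in sorted(incident["impacts"]):
            total += cost
            if total >= MATERIAL_IMPACT_USD:
                return datetime.strptime(day, "%Y-%m-%d")
        return None   # never became material, so excluded from the metric

    lags = []
    for inc in incidents:
        start = occurrence_date(inc)
        if start:
            detected = datetime.strptime(inc["detected"], "%Y-%m-%d")
            lags.append((detected - start).days)

    print(f"Mean occurrence-to-detection lag: {mean(lags):.1f} days "
          f"across {len(lags)} incidents")

Of course, the sketch simply dodges the hard part: those dated impact figures have to come from somewhere, which is exactly the concern management raised above.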

The 30% rating for Genuineness highlights management's key concern with this metric.  The more they discussed it, the more issues, pitfalls and concerns came out of the woodwork, leaving an overriding impression that the numbers probably couldn't be trusted.  On the other hand, the 62% score means the metric has some potential: the CISO was asked to suggest other security incident-related metrics, perhaps variants of this one that would address management's reservations.

[This is one of eight possible security incident metrics discussed in the book, two of which scored quite a bit higher on the PRAGMATIC scale.  There are many many more possibilities in this space: how would you and your colleagues choose between them?]

18 August 2013

SMotW #69: incident root causes

Information Security Metric of the Week #69: proportion of information security incidents for which root causes have been diagnosed and addressed


'Learning the lessons' from information security incidents is the important final phase of the incident management lifecycle that also involves preventing, detecting, containing and resolving incidents.  Its importance is obvious when you think about it:
"Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it."
George Santayana

This week's example metric picks up on three crucial aspects:
  1. Root causes must be determined.  Addressing the evident, immediate or proximal causes of incidents is generally a superficial and unsatisfactory approach since problems upstream (e.g. other threats and vulnerabilities) are likely to continue causing trouble if they remain unidentified and unresolved.
  2. Diagnosis of root causes implies sound, competent, thorough analysis in the same way that doctors diagnose illnesses.  Casual examinations are more likely to lead to misdiagnoses, increasing the probability of failing to identify the true causes and perhaps then making things even worse by treating the wrong ailments, or implementing the wrong treatments.
  3. Addressing root causes means treating them appropriately such that, ideally, they will never recur. The fixes need to be both effective and permanent.
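As a purely illustrative aside (not from the book), computing the proportion itself is trivial once the incident register records two hypothetical flags set during the post-incident review:

    # Minimal sketch, assuming each incident record carries two hypothetical flags
    # maintained by the post-incident review process.
    incidents = [
        {"id": "INC-101", "root_cause_diagnosed": True,  "root_cause_addressed": True},
        {"id": "INC-102", "root_cause_diagnosed": True,  "root_cause_addressed": False},
        {"id": "INC-103", "root_cause_diagnosed": False, "root_cause_addressed": False},
    ]

    # The metric counts only incidents whose root causes were BOTH diagnosed AND addressed.
    qualifying = [i for i in incidents
                  if i["root_cause_diagnosed"] and i["root_cause_addressed"]]

    proportion = len(qualifying) / len(incidents) * 100
    print(f"Root causes diagnosed and addressed: {proportion:.0f}% of incidents")

The arithmetic is the easy bit; whether those flags can be trusted is another matter, as the PRAGMATIC analysis below makes plain.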

Before you read ahead, think for a moment about what we've just said.  Given the positive nature of that analysis, you might be tempted to implement this metric immediately ... but systematically applying the PRAGMATIC criteria reveals a number of concerns:

P     R     A     G     M     A     T     I     C     Score
85    85    67    40    77    40    48    16    40    55%

Aside from the undeniable Costs of analysing in depth and fully addressing root causes, it seems there are issues with the metric's Genuineness, Accuracy and most of all its Independence ratings.

One of the ACME managers who scored this metric expressed concern that the people most likely to be measuring and reporting the metric (meaning ACME's information security professionals) would have a vested interest in the outcome.  While hopefully such professionals could be trusted not to play political games with the numbers, the fact remains that they are actively involved in determining, diagnosing and addressing root causes, hence there is a distinct possibility that they might be mistaken, especially given the practical difficulties in this domain.  Information security incidents often have multiple causes, including contributory factors (such as the corporate culture) that are both hard to identify and difficult to resolve.  They may well believe that they have eliminated root causes whereas in fact even deeper issues remain unaddressed.

Given the promising introduction above, the metric's disappointing 55% score led ACME management to put this one on the watch list for now, preferring to implement higher-scoring metrics in this domain first.  The CISO was asked to think of ways to address the independence and trust issues that might put this metric back on the agenda for ACME's next security metrics review meeting.

08 August 2013

Security lessons from a car park worm

A news item about an NZ car park card payment system being infected with the Conficker worm, and potentially compromising customers' credit cards, is a classic example of the potential fallout from an incident that probably would not have occurred if the company concerned had been proactively tracking the appropriate information security metrics.

According to Stuff.co.nz's article "Car park hack puts credit cards at risk":
"Hundreds of parking machines used by thousands of motorists a week may be infected with a virus allowing hackers to harvest credit card numbers.  A compromised machine in Wilson Parking's Alexandra St car park in Hamilton prompted security experts to warn motorists to check their credit card statements if they've recently used a machine at one of the company's 276 car parks across the country.  But Wilson Parking says there is no problem with the system.  The Hamilton machine was displaying an error message on Monday and Tuesday warning it was infected with the Conficker virus, the same virus which disabled Waikato District Health Board's 3000 computers in 2009."
The article makes a right muddle of virus, worm, hacking and identity theft/credit card fraud, but that aside, it seems clear that Wilson Parking has fallen seriously behind with its patching of public-access systems that process credit cards and therefore handle personal and financial data.  In information security risk terms, this was an eminently predictable incident that should have been avoided. 

The incident might have been avoided, or at least ameliorated, using controls such as:
  • Patch management, keeping all their distributed and office systems reasonably up-to-date with patches and especially the critical security patches;
  • Periodic security reviews, audits and tests of the systems, which should have identified this issue long ago;
  • Antivirus software, again maintained up-to-date with virus signatures and periodically checked/tested; and
  • Strong incident identification, management and response policies and procedures, reinforced by security awareness and training (their management should probably have known about the incident before the journalist called, should already have been well on top of the response, and should have known to refer inquiries to someone trained to deal with the press).
I would have thought all of those would be a routine part of their information security arrangements, particularly if they are subject to the requirements of PCI-DSS and other compliance obligations (e.g. privacy laws), let alone good security practices such as ISO27k.

At a higher level, Wilson Parking's management should have known there was trouble brewing through a number of management-level controls and information flows (including suitable metrics, naturally!), hinting at a possible governance failure here.  A simple metric such as 'patch status' or even 'unpatched vulnerabilities' should have indicated in bright red that they were way behind, and the security and risk people should have been clamouring for attention and corrective action as a result.  In theory.
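To illustrate what such a metric could look like - a purely hypothetical sketch, since I know nothing of Wilson Parking's actual systems or patch levels - a 'patch status' report can be as simple as counting how far each system is behind on critical security patches:

    from datetime import date

    # Hypothetical inventory: for each system, the release date of the oldest
    # critical security patch not yet applied (None = fully patched).
    systems = {
        "pay-machine-01": date(2008, 10, 23),  # e.g. MS08-067, the hole Conficker exploits
        "pay-machine-02": None,
        "office-pc-17": date(2013, 6, 11),
    }

    today = date(2013, 8, 8)
    overdue = {name: (today - oldest).days
               for name, oldest in systems.items() if oldest}

    print(f"Systems missing critical patches: {len(overdue)} of {len(systems)}")
    for name, days in sorted(overdue.items(), key=lambda kv: -kv[1]):
        print(f"  {name}: oldest missing critical patch is {days} days old")

A figure of nearly five years overdue ought to light up any dashboard in bright red.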

However, let me be clear: I am only guessing at what really went on in this case, based on the very limited and perhaps misleading and inaccurate information in the news article.  I have no further knowledge of Wilson Parking's security arrangements, metrics, controls or risks, nor should I - it's not my business.  It is conceivable that they have simply been caught out by a one-off fluke and a journalist prone to hyperbole.  As far as I can tell, Conficker is more likely to have been sending spam than stealing credit card numbers.  It is vaguely possible that management deliberately and consciously accepted this risk for genuine business reasons (such as the practical difficulties of managing, updating, testing and maintaining physically distributed and hardened IT systems) ... although that begs further awkward questions about their risk/security management and contingency planning!

The real point is to learn what we can from incidents like this, the better to avoid ending up in the same situation ourselves.

Would YOUR security controls have avoided something along these lines?

Would YOUR security metrics have made the accumulating risk obvious to management?

What do YOU need to do to check and perhaps update YOUR information security arrangements?

Think on.

07 August 2013

ISO/IEC 27004 back on track?

At long last: a glimmer of hope on the ISO27k metrics front!

ISO/IEC JTC1/SC27 respondents to a questionnaire circulated by the editors responsible for revising ISO/IEC 27004:2009 acknowledge that the current published standard is wordy, academic, perhaps even unworkable, which is probably why it has achieved such a low uptake, despite the obvious need for measurements as part of an Information Security Management System.  No surprise there. 
However, there are encouraging signs that the editors and project team are prepared to consider a markedly different approach, although there is some concern that the new version ought to be backward compatible with the old (one might ask “Why?” given that it is hardly being used!).  I hope publication of the current version of 27004 has not, in fact, set the field back, which was the fear expressed to SC27 in the formal comments accompanying NZ’s vote against publishing the standard.

Given that the editors feel “ISMS standards are practical standards, not university textbooks”, the rather academic and unhelpful measurement modelling content of the current version will hopefully be dropped like a stone, toned down, or at least relegated to a dark and dusty annex.

Other security measurement standards are being trawled for more pragmatic guidance in relation to ISO27k. NIST SP800-55 Revision 1 certainly merits a closer look, as do ISO/IEC 15939, BSI’s BIP 0074 and perhaps IT-Grundschutz. The idea of ‘categorizing’ metrics seems to have taken hold, although there is no agreement yet on the nature of those categories, while maturity metrics are also of interest (in the sense that an organization’s infosec metrics will change as its approach to and experience of infosec matures). Meanwhile, for those who simply can’t wait for the 27004 update, we recommend the PRAGMATIC approach which, we believe, addresses many of the shortcomings of 27004 - for example, how to select or design worthwhile security metrics, those being workable measures that support both business/strategic and information security management objectives.
I will be doing my level best to help the SC27 project team exploit the PRAGMATIC ideas and other concepts from the book, where appropriate.

06 August 2013

Your authors need you!


Have you read PRAGMATIC Security Metrics yet?  What did you make of it? Does it make good sense?  Is it understandable?  Are the tips and suggestions helpful?  Is it interesting, well written, approachable, stimulating?  Is it a worthwhile addition to your bookshelf, a valuable contribution to the field - something you are already using in earnest, or that you definitely intend to put to good use?  A book you are happy to recommend to your colleagues - your peers and managers - and to the likes of (ISC)2, ISACA and SANS perhaps?  

 - OR - 

Have you skimmed it in the bookshop or website and put it straight back on the (virtual) shelf?  Is it gibberish?  Did you buy it but wish you'd not wasted your money on it?  Is it a pathetic attempt, not a patch on the other excellent security metrics books and standards out there?  Does the casual writing style annoy you, and the footnotes distract you?  Is the PRAGMATIC approach completely misguided and misleading?  

We are very keen to hear back from you either way.  So far, apart from two five-star customer reviews on Amazon and some words of encouragement from Professor Kabay (who kindly wrote the preface for us), we are surprised and somewhat disheartened by the lack of reader feedback, whether positive or negative. Nice comments are welcome for obvious reasons, but even complaints have their uses!  Most helpful of all are your constructive criticisms and improvement suggestions, especially those that make us think and perhaps stimulate us to tackle new angles or new topics.

The thing is, to you this book represents an investment of 50-odd bucks, a few hours' reading and a few more contemplating, interpreting and then applying the PRAGMATIC method.  To us, it represents literally hundreds, maybe thousands of hours of intense focus, an enormous effort over the two years it took to write and publish.  Don't get me wrong, both Krag and I enjoy our writing.  The question is: do you?  Should we continue, or give it up as a bad job?

We are also very keen to add to our stock of 150+ example metrics that have been put through the mill, and we are looking for case study materials, anecdotes and feedback on the method to use in PRAGMATIC training courses.  While it might be interesting to know your organization's industry, size, maturity etc., we don't need to know its name and we are very happy to maintain your privacy if you would rather not be identified.

Please get in touch by email (Gary@isect.com or Kragby@gmail.com) or by commenting here on the blog.  Thank you in advance for your trouble.

Kind regards,
Gary Hinson

PS  If you feel strongly about it, how about writing and publishing your own book review?

04 August 2013

SMotW #68: continuity plan maintenance

Security Metric of the Week #68: business continuity plan maintenance status


Business continuity plans that are out of date may be a liability rather than an asset.  Ostensibly the organization may appear ready to cope with business interruption when, in fact, the plans may be unworkable in practice due to substantial changes in the business and/or the technology and/or the people since they were written or last updated.

Furthermore, valid questions about the suitability of the continuity plans at the time they were originally prepared or updated become still more pressing if the organization is failing to maintain the plans. Did the inevitable assumptions and constraints involved in their preparation invalidate them?  Did they pass their tests with flying colors?  Were they ever adequately tested, in fact?  Could they be trusted to work properly?  If they are not being properly maintained (which could be taken to imply their being systematically reviewed and improved), the quality of the organization's processes for managing the plans is seriously in doubt.

ACME's senior managers quite rightly want assurance that the organization's business continuity arrangements are sound and ready to keep things going when it all turns to custard, which raises the question: how should ACME measure its business continuity plans?

Possible business continuity metrics include:

  • Measuring the breadth of coverage of the plans, particularly of course those business processes (and the associated IT systems and relationships and people and other vital assets or components ...) deemed business-critical, but also miscellaneous supporting processes that could become critical if they failed irrecoverably;
  • Measuring the quality of the plans, perhaps by assessing compliance with ACME's business continuity plan quality standards, or against some external arbiter such as BS 25999, ISO 22301 or the Business Continuity Institute's recommendations;
  • Testing the plans to the appropriate level of assurance (corresponding to the criticality of the associated processes etc.), and measuring the test results (hopefully with something more useful than crude pass/fail!);
  • Counting the number of plans that have not been reviewed or tested when planned;
  • Counting the number of days overdue for the plan reviews - easier if all the plans have a "test before" date;
  • Proportion of plans that BOTH passed their last test AND are not overdue for the next planned test (a minimal sketch of this one follows the list below);
  • A maturity metric looking at the overall quality and suitability of ACME's business continuity planning;
  • Measure and rank the residual risks associated with the failure of business processes etc., taking into account their inherent risks and the risk treatments, including business continuity plans;
  • Measuring component parts of the business continuity arrangements e.g. resilience, recovery and contingency aspects;
  • Benchmarking e.g. comparing the business continuity arrangements made by various parts of ACME against each other, and/or against acknowledged good practices, and using the ranking to encourage the weakest to emulate the strongest.
[Some of these metrics have been or will be discussed and scored separately in this blog and the book, but feel free to apply the PRAGMATIC approach to them yourself, in the context of your organization, if they strike you as worth considering.  By all means score other business continuity metrics on the same basis, including any that you favor or are already using.  For bonus marks, tell us what you make of them and share your PRAGMATIC scores with us and our readers.  Seriously, we'd be fascinated.]
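By way of illustration, here is a minimal sketch (hypothetical plan register and field names) of the 'passed their last test AND not overdue' proportion mentioned in the list above:

    from datetime import date

    # Hypothetical plan register: last test result plus a "test before" date for each plan.
    plans = [
        {"name": "Payroll",      "last_test_passed": True,  "test_before": date(2013, 12, 1)},
        {"name": "Order intake", "last_test_passed": False, "test_before": date(2013, 9, 1)},
        {"name": "Data centre",  "last_test_passed": True,  "test_before": date(2013, 7, 1)},
    ]

    today = date(2013, 8, 4)
    healthy = [p for p in plans
               if p["last_test_passed"] and p["test_before"] >= today]

    print(f"{len(healthy)} of {len(plans)} plans "
          f"({len(healthy) / len(plans):.0%}) passed their last test and are not overdue")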

Anyway, faced with a proposal to implement a metric that reported the status of the business continuity plans across ACME using a red-amber-green map representation as shown above, ACME management rated the metric as follows:


P     R     A     G     M     A     T     I     C     Score
75    75    90    73    84    76    80    77    93    80%


80% is a very respectable score with no serious concerns, making this a strong candidate for incorporation into ACME's "Executive Management Metrics Dashboard" (well, OK, an intranet page and perhaps a simple display app to help justify those shiny new iPads!).  However, since there are four even-higher-scoring business continuity metric examples in chapter 7 of the book, plus a further five metrics scoring over 70%, it's not an automatic decision to adopt this one.