Author Archives: enicaise

The intention-behaviour gap in cybersecurity

More and more, we see cybersecurity professionals using surveys about attitudes and intentions as performance indicators for their interventions. While a question like “Do you think it is important to use complex passwords?” might give some insight into someone’s attitude towards password complexity, it is not a good indicator of our human risks. Values, attitudes, intentions and behaviours are concepts that are easy to confuse. Here is a quick summary of the differences between them, and of what we should do to reduce the gap.

Phishing: KPI or KRI?

Some questions seem to have no definitive answer. The chicken and the egg is one of them and, within the small world of phishing, KPI or KRI is another.
The question seems trivial. Do we consider the risk or the performance? Do we want to measure how many of our people will likely fail a phishing test, or how many will detect it? It is the kind of question most people would dismiss: just pick one, one could say. However, there is more to it than meets the eye.

We use Key Risk Indicators and Key Performance Indicators to help steer our company. They must provide relevant information allowing us to decide if we are on the right track, at the right speed and in the right direction or with the right level of protection. The burn rate and the net profit margin are standard financial KPIs. They allow us to know where we are going financially. What kind of indicator would best achieve the same objective when it comes to Phishing?

Phishing is a risk for most, if not all, companies. If we take the risk approach, using a KRI makes sense. We often use the click ratio to measure how vulnerable a company is to a phishing attack. Risk officers usually calculate it as the number of people clicking on a phishing link divided by the number of people who received the email. It makes sense, no? No! Not entirely, at least. First, it does not measure the actual risk. Second, it is not an accurate measure of that risk.
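The click ratio described above is a simple division; here is a minimal sketch of it, with illustrative numbers (the function name and example figures are not from any specific platform):

```python
# Minimal sketch of the click-ratio KRI: clicks divided by delivered emails.
# Field names and numbers are illustrative; adapt them to your platform's export.

def click_ratio(clicks: int, recipients: int) -> float:
    """Share of recipients who clicked the phishing link."""
    if recipients == 0:
        raise ValueError("no recipients, ratio is undefined")
    return clicks / recipients

# Example: 120 clicks out of 1000 delivered emails.
print(f"{click_ratio(120, 1000):.0%}")  # 12%
```

As the rest of this post argues, being easy to compute does not make this number a good measure of the actual risk.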

Let us take a closer look at the question.

First, clicking on a link in an email will most probably cause no harm in itself. The danger comes afterwards, when users disclose credentials on a phishing website or open a file they have just downloaded from it. The other possible threat with a phishing email is opening a malicious attachment, which allows the propagation of ransomware or the installation of malware.

If we want to measure phishing-related risks, these three behaviours are our most relevant candidates. One could say that clicking on an attachment or a link is still clicking. True, but not quite. Our research, confirming others', shows that we can reduce the likelihood of users clicking a malicious link and still have many people opening attachments. If we do not train our users specifically to be vigilant with files, they will not be as cautious with them as with links. Consequently, we should have multiple risk indicators: one for credential disclosure, one for downloading and opening files, and one for opening attachments. A KRI can then be built from the average, or the worst, of these three indicators.
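The aggregation just described can be sketched in a few lines. This is an assumption about how one might combine the three indicators; the function names and example percentages are invented for illustration:

```python
# Hypothetical sketch: one indicator per risky behaviour, combined into a
# single KRI using either the average or the worst (highest) value.

def composite_kri(credential: float, download: float, attachment: float,
                  method: str = "worst") -> float:
    """Combine three behaviour ratios (each between 0 and 1) into one KRI."""
    indicators = [credential, download, attachment]
    if method == "worst":
        return max(indicators)  # most pessimistic view of the risk
    return sum(indicators) / len(indicators)  # average view

# Illustrative numbers only: 5% disclosed credentials, 8% opened a
# downloaded file, 15% opened a malicious attachment.
print(composite_kri(0.05, 0.08, 0.15, method="worst"))  # 0.15
print(composite_kri(0.05, 0.08, 0.15, method="average"))
```

Using the worst value is the more conservative choice: a single risky behaviour left untrained will keep the KRI high.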

Still, we would not have an accurate measure of these risks. As discussed in a previous post, the variance between scenarios is far too high to extrapolate a risk exposure from one situation only. The only reliable result we will get from a test using one scenario is the likelihood of our users falling for this particular scenario at that moment. Is that enough to make an informed decision? Unlikely.

Worse, we cannot predict with certainty which scenario will have more impact. In other words, the margin of error of such a measurement is probably around 40%. We can easily agree that we cannot rely on such an uncertain result to make any decision. We should probably accept that measuring our risk exposure is difficult and move to another indicator.


Instead of the risk of failing, we could use our performance in detecting, and reporting, phishing emails as an indicator. We could think of detection performance as the opposite of failure. That would be a mistake. Let us have a look at the split of possible behaviours when people receive an email. We can see on the pie chart that a large part of it is neither green (detection and reporting) nor red (failure).

Analysis of a typical phishing exercise

The number of people reporting the phishing email is not the complement of the number of people failing the exercise. First, we can fail the test and still report it. We should even make that mandatory: it shows that, despite having failed the exercise, we have understood that it is essential to pay attention and to report. It emphasizes that accidents may happen, but that we still have to perform the expected behaviour.

Regarding the unknowns, keep in mind that the subject could have opened the email and simply deleted or ignored it. They may also believe it is a genuine email and plan to process it later.

The scenario will also have a significant impact on the result. When the scenario is more relevant to the targeted population, more people will open the email, and therefore more people will either fail or detect it.

We can summarize these results in two simple graphs focusing on the risk (clicking) and the performance (reporting):

We should also measure performance as the ratio of the number of people reporting the phishing email to the number of people opening it. That gives us an accurate view of the percentage of people in our organization performing the expected behaviour, whether or not they were able to detect the phishing exercise.
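This performance ratio differs from the click ratio in its denominator: openers, not recipients. A minimal sketch, with made-up numbers:

```python
# Sketch of the reporting-performance indicator: reporters divided by
# openers, not by recipients, so unopened emails do not dilute the result.

def reporting_performance(reports: int, opens: int) -> float:
    """Share of people who opened the email and performed the expected
    behaviour (reporting it)."""
    if opens == 0:
        raise ValueError("no opened emails; the scenario never got a chance")
    return reports / opens

# Example with made-up numbers: 600 of 1000 recipients opened the email,
# and 90 of those 600 reported it.
print(f"{reporting_performance(90, 600):.0%}")  # 15%
```

Dividing by recipients instead would mix two different questions: whether the scenario got read at all, and whether readers behaved as expected.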

The email opening ratio will also provide insights into the scenario's relevance. The first challenge for a phishing email is to stand out from the possibly hundreds of emails received by the target and get read. The fate of many phishing emails is simply to be ignored. Unfortunately, in such cases, the email neither trains the user nor provides us with any good indication of their phishing susceptibility. So the failure is more on the phisher / trainer side. Let us also keep in mind that we might have issues with false positives and false negatives, as the usual technique used to measure email opening (the white-pixel trick) is not always reliable.

There is no perfect way of measuring the risk related to phishing. However, the four-scenario protocol discussed in our previous post gives us a reliable measure of the effectiveness of training. It is the right candidate for a key indicator. Measuring the ratio of phishing reports also provides relevant information regarding phishing education and cybersecurity culture. We should use both indicators while keeping in mind what they measure. If we read an instrument in a plane and misinterpret the value, it can lead to an accident. The same can happen with our key indicators.

Click ratio is a useless metric for phishing!

I do not think it is still necessary to explain that phishing is a major threat to businesses and individuals. By now, most companies have one type of phishing training or another. But are we sure these exercises work?

To measure our training's efficiency, we often perform regular phishing exercises and measure the results. If our phishing education were effective, we should see a negative trend. Right? If we perform exercises every quarter, we should obtain something like this:

Typical phishing metrics

Looks good, doesn't it? Except we don't know why there is a bump in the numbers in Q4. Is our training not working? Maybe it is due to end-of-year exhaustion. Who knows? Or maybe the scenario we used in Q4 is more relevant to our context. Context is a key factor influencing phishing susceptibility. Unfortunately, it is hard to measure, so we can neither accurately predict nor define a level of efficacy for our phishing scenarios. Basically, comparing click ratios between different scenarios is utterly useless for measuring progress and phishing risk reduction. So, what do we do?

Siadati et al. published an excellent article in 2017 highlighting this very issue. As the variance between scenarios can be as high as 40% (our research showed that it could be up to 60%), we cannot rely on inter-scenario measurements to evaluate the efficiency of our training. To put it differently, the difference in the percentage of people clicking on a phishing link between two phishing scenarios sent to the same people at the same time can be as high as 60%.

Instead, they suggested a protocol using multiple scenarios in parallel. The same scenarios are used repeatedly with different groups of the population (groups are randomized). In our example, this would give the following:

As you can see, we now have the same four scenarios sent to four groups of people in our population. Notice the 27% gap between scenario C and scenario D in Q1, just like in our first example. Now, we don't really care about the click ratio itself. What we want to see is a downward trend for each scenario. And that's what we've got. Same scenarios, same people, and a totally different, more accurate measurement of our progress.

This protocol requires a yearly plan (which we should have anyway) and a sufficiently large population to have at least 30 persons in each group (for statistical significance).
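The group-assignment step of this protocol can be sketched as follows. This is only an illustration of one way to randomize the groups; the function names, group count, and population size are assumptions, not the authors' implementation:

```python
# Rough sketch of the multi-scenario protocol's setup: randomly split the
# population into four groups, pair each group with a fixed scenario, and
# reuse the same scenarios each quarter to track per-scenario trends.
import random

def assign_groups(population: list[str], n_groups: int = 4,
                  seed: int = 42) -> list[list[str]]:
    """Shuffle the population once, then deal it into n_groups of
    roughly equal size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = population[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# 200 people -> 50 per group, comfortably above the ~30-person minimum.
people = [f"user{i:03d}" for i in range(200)]
groups = assign_groups(people)
for scenario, group in zip("ABCD", groups):
    print(f"scenario {scenario}: {len(group)} people")
```

With the groups fixed, each quarter's exercise compares a scenario only against its own earlier results, which is what makes the trend meaningful.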

There are, unfortunately, other pitfalls in our metrics that we have to take into account but that will be the subject of another post (and included in a short document we will publish very soon).

Reference:
Siadati, H., Palka, S., Siegel, A., & McCoy, D. (2017). Measuring the effectiveness of embedded phishing exercises. 10th USENIX Workshop on …. https://www.usenix.org/conference/cset17/workshop-program/presentation/siadatii