Risk management - Basic statistical terms

Let us assume we have the following values.

Values: 4, 5, 5, 5, 7, 9, 11, 12, 12, 15, 16

Mean or average (expected value)

This is the sum of these values divided by the number of values, that is:

(4 + 5 + 5 + 5 + 7 + 9 + 11 + 12 + 12 + 15 + 16) / 11 = 101 / 11 = 9.2 (to 1 decimal place)

The ‘expected value’ of a sum of variables is equal to the sum of the individual expected values.
Expected values are ‘additive’.
The expected value above is 9.2. If the expected values of, say, 5 individual costs were 9.2, 6.3, 5.4, 11.6 and 4.8, the overall expected value for the cost would be:

9.2 + 6.3 + 5.4 + 11.6 + 4.8 = 37.3

The calculation of the mean above assumes that each of the 11 values is equally likely to occur.
In practice the values will follow a probability distribution, so each value is weighted by its probability.

For the simpler system we have seen before:

Delay (weeks)   Probability   Contribution
7               0.3           7 x 0.3 = 2.1
9               0.5           9 x 0.5 = 4.5
11              0.2           11 x 0.2 = 2.2

Expected value = 2.1 + 4.5 + 2.2 = 8.8 weeks

If we looked at 5 such activities, the expected total delay would be 5 x 8.8 = 44 weeks.
This follows because expected values are additive.
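
The probability-weighted sum above can be checked with a few lines of Python. This is only a minimal sketch: the names `outcomes` and `expected_value` are chosen for illustration and are not part of the original method.

```python
# Delay (weeks) and probability of each outcome, taken from the table above.
outcomes = [(7, 0.3), (9, 0.5), (11, 0.2)]


def expected_value(dist):
    """Return the probability-weighted sum of the outcomes."""
    return sum(value * prob for value, prob in dist)


ev = expected_value(outcomes)
print(round(ev, 1))       # 8.8 weeks for one activity

# Expected values are additive, so 5 such activities give 5 x 8.8.
total_ev = 5 * ev
print(round(total_ev, 1))  # 44.0 weeks
```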

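The ‘most likely’ value (‘best estimate’) would be a delay of 9 weeks, the outcome with the highest probability (0.5).
The difference of 0.2 weeks between this and the expected value is 2.2% of the most likely value. As the distribution becomes more asymmetric this difference will become larger.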

If the distribution had been symmetrical, for example:

Delay (weeks)   Probability   Contribution
7               0.25          7 x 0.25 = 1.75
9               0.5           9 x 0.5 = 4.5
11              0.25          11 x 0.25 = 2.75

Then the expected value would be 1.75 + 4.5 + 2.75 = 9, the same as the ‘most likely’ value.
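
To see the effect of asymmetry side by side, the sketch below compares the expected value with the most likely value for the two three-point distributions above. The helper names (`expected_value`, `most_likely`) and the labels `skewed` and `symmetric` are illustrative assumptions, not terms from the original text.

```python
def expected_value(dist):
    """Probability-weighted sum of the outcomes."""
    return sum(value * prob for value, prob in dist)


def most_likely(dist):
    """Outcome with the highest probability."""
    return max(dist, key=lambda pair: pair[1])[0]


# (delay in weeks, probability) pairs from the two tables above.
skewed = [(7, 0.3), (9, 0.5), (11, 0.2)]
symmetric = [(7, 0.25), (9, 0.5), (11, 0.25)]

for name, dist in (("skewed", skewed), ("symmetric", symmetric)):
    gap = most_likely(dist) - expected_value(dist)
    print(f"{name}: expected {expected_value(dist):.2f}, "
          f"most likely {most_likely(dist)}, gap {gap:.2f}")
```

The gap is 0.2 weeks for the asymmetric case and zero for the symmetrical one.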

The ‘median’ value (below) will typically lie somewhere between the ‘expected value’ and the ‘most likely’ value.
Naturally, for a symmetrical distribution, the median value will be the same as the ‘expected value’ and the ‘most likely’ value.

Where a combination of risk distributions contains many low-likelihood but high-impact components, the asymmetry produces a long tail on the right-hand side of the probability density curve (i.e. towards the high-impact end, where the probabilities are small). This situation may lead to large differences between the expected and most likely values.

Median

This is the middle value of a range of values such that 50% are higher and 50% are lower.
If we line up the values in order of size we get:

4, 5, 5, 5, 7, 9, 11, 12, 12, 15, 16

The middle value is 9, so the median is 9: there are 5 values lower and 5 higher.
In the case of an even number of values the central 2 are averaged. Had 16 not been a value we would have had:

4, 5, 5, 5, 7, 9, 11, 12, 12, 15
The median would be (7 + 9) / 2 = 16 / 2 = 8

The above calculation is again based upon all values being equally likely to occur.

Mode

This is the value that occurs most often. In the above case the value 5 occurs most often (three times).
Thus the mode = 5.
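
The mean, median and mode of the 11 values can be reproduced with Python's standard `statistics` module. This is a small sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

# The 11 values used throughout this section, each treated as equally likely.
values = [4, 5, 5, 5, 7, 9, 11, 12, 12, 15, 16]

mean = statistics.mean(values)      # 101 / 11
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value

print(round(mean, 1), median, mode)  # 9.2 9 5

# With an even number of values (16 removed) the two central values,
# 7 and 9, are averaged.
median_even = statistics.median(values[:-1])
print(median_even)  # 8.0
```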

Variance

Variance is also additive, provided the variables are independent: the overall variance is then equal to the sum of the individual variances.

The variance of a distribution is the probability-weighted average of the squared differences between each value and the mean (expected value).

For the simpler system we have seen before:

Delay (weeks)   Probability   Contribution
7               0.3           7 x 0.3 = 2.1
9               0.5           9 x 0.5 = 4.5
11              0.2           11 x 0.2 = 2.2

Expected value = 2.1 + 4.5 + 2.2 = 8.8 weeks

The overall variance will be:

(7 – 8.8)(7 – 8.8) x 0.3 + (9 – 8.8)(9 – 8.8) x 0.5 + (11 – 8.8)(11 – 8.8) x 0.2
= (-1.8) (-1.8) x 0.3 + (0.2) (0.2) x 0.5 + (2.2) (2.2) x 0.2
= (3.24 x 0.3) + (0.04 x 0.5) + (4.84 x 0.2)
= 0.972 + 0.02 + 0.968
= 1.96

The variance indicates an ‘average amount of spread’ in the distribution.
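
The variance calculation above is a probability-weighted sum of squared deviations from the mean, which can be sketched as follows. The names `outcomes`, `mean` and `variance` are illustrative only.

```python
# (delay in weeks, probability) pairs for the three-point distribution above.
outcomes = [(7, 0.3), (9, 0.5), (11, 0.2)]

# Expected value: probability-weighted sum of the delays.
mean = sum(value * prob for value, prob in outcomes)

# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum(prob * (value - mean) ** 2 for value, prob in outcomes)

print(round(mean, 2), round(variance, 2))  # 8.8 1.96
```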

Standard deviation

The standard deviation is the square root of the variance.

For a combination of 100 such activities, assumed independent, the overall variance would be 100 x 1.96 = 196, with an expected value of 8.8 x 100 = 880.

The standard deviation would be the square root of 196 = 14.

In theory (assuming the total is approximately normally distributed), there is a 68% confidence that the actual value lies within one standard deviation of the expected value,
that is 880 + or – 14.

There is a confidence of more than 99% (about 99.7%) that the actual value lies within three standard deviations of the expected value,
that is 880 + or – 42.

This appears to be a very high degree of accuracy, but it arises only because the calculations assume ‘independence’ between the activities. It therefore shows how important it is to consider dependence between activities.

The standard deviation for one activity is the square root of 1.96 = 1.4, which is 15.9% of the expected value of 8.8.
For a combination of 100 activities the standard deviation is 14, just 1.6% of the expected value of 880! This is a reduction by a factor of 10, the square root of 100.
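
The way the relative spread shrinks as activities are combined can be sketched as below. The function `combined_spread` is a hypothetical helper written for this illustration, and it relies on the same independence assumption as the text: means and variances are simply multiplied by the number of activities.

```python
import math


def combined_spread(mean, variance, n):
    """Return (total mean, total standard deviation, relative spread)
    for n independent activities with the same mean and variance."""
    total_mean = n * mean
    total_sd = math.sqrt(n * variance)  # variances add, then take the root
    return total_mean, total_sd, total_sd / total_mean


one = combined_spread(8.8, 1.96, 1)
hundred = combined_spread(8.8, 1.96, 100)

print(f"1 activity:     sd {one[1]:.1f}, {one[2]:.1%} of the expected value")
print(f"100 activities: sd {hundred[1]:.1f}, {hundred[2]:.1%} of the expected value")

# The relative spread falls by the square root of the number of activities.
print(round(one[2] / hundred[2], 1))  # 10.0
```

If the activities are not independent, the variances no longer simply add and the combined spread will generally be wider than this sketch suggests.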

The application of appropriate software must be considered carefully by a skilled analyst with the necessary expertise to interpret the data.

The success of the evaluation phase depends on the ability to combine risks. This can only be done with a numerical, quantitative system, not with a crude qualitative scale such as HIGH, MEDIUM and LOW.