Quantitative Evidence - Avoiding Statistical Pitfalls

Quantitative evidence, i.e. evidence containing numbers or probabilities, arises in many cases. The consequences of failing to consult an expert can be unfortunate. Sally Clark and Angela Cannings provide the most dramatic examples: they might not have been wrongly imprisoned for the murder of their children if a statistical expert had been called at their original trials. However, the outcome of many civil cases can also turn on the correct interpretation of this type of evidence.

My career as a quantitative evidence expert came about because I am an experienced market researcher. However, my knowledge of statistics and my experience in analysing and interpreting numerical data have led to my being consulted in many other types of case as well as those involving market research.

In this article I shall therefore describe how both marketing research and other types of quantitative evidence have been used.

Planning and Licensing

Here is an example of the importance of correctly interpreting quantitative evidence. I was contacted by the market research consultant for a casino group, who knew of my experience as an expert witness. An applicant for a casino licence, which the casino group was opposing, had at the last minute submitted a report on a market research survey that the applicant had commissioned. I examined this report and found that the questionnaire was not objective: it contained phrases that suggested to those interviewed that they should give favourable answers. More importantly, when I re-analysed the results I found that the number who might actually visit the casino was much smaller than claimed, once account was taken of the interviewees’ past experience of casinos as well as their stated likelihood of visiting the new one. My report enabled the person responsible for the survey to be cross-examined very effectively; the Judge preferred my evidence to theirs and the licence was refused.

Passing Off

Research agencies may not appreciate that surveys conducted to provide evidence in legal cases have to be designed even more carefully than routine commercial surveys. This is because those who commission commercial work usually intend to act on the results and are therefore prepared to accept them. By contrast, our adversarial system means that in legal cases one party will always take a sceptical view of any type of evidence.

An example of the need to design surveys carefully was a passing off action on whether a Scottish-sounding name could be used for a whisky that was not distilled in Scotland. A survey had shown that most people thought that the whisky had been produced in Scotland, but the Defendants argued that this was not surprising. They claimed that Scotland was such a dominant producer that most people would think that any whisky was made there. I was only consulted after this point had been raised. Fortunately, the questionnaire had asked people to give reasons for their answers, and my detailed reanalysis of these reasons enabled the Defendants’ argument to be refuted. The case was settled satisfactorily.

Later I was asked to plan a survey for another passing off action. Although the ‘product’ in question was a web page, the same difficulty could have arisen because the Claimant was well known and might have been thought to be dominant. I avoided any possibility of a similar defence being raised by using an experimental design which showed that confusion arose with the Defendant’s page but not with that of another similar company. The results were unambiguous and enabled the case to be settled quite readily.
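The logic of a test-and-control design can be illustrated by comparing confusion rates between the two cells. The sketch below is a standard pooled two-proportion z-test; the function name and the counts are hypothetical, and it is not necessarily the analysis used in the case described.

```python
import math


def two_proportion_z(confused_test, n_test, confused_control, n_control):
    """Pooled two-proportion z-test.

    Compares the share of respondents confused by the test stimulus (e.g. the
    Defendant's page) with the share confused by a control stimulus (e.g. a
    similar company's page). Returns z and a one-sided p-value for the
    alternative that confusion is higher in the test cell.
    """
    p_test = confused_test / n_test
    p_control = confused_control / n_control
    pooled = (confused_test + confused_control) / (n_test + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return z, p_one_sided


# Hypothetical counts: 60 of 200 confused in the test cell, 12 of 200 in the control cell.
z, p = two_proportion_z(60, 200, 12, 200)
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```

Because the control cell picks up any confusion that arises simply from the Claimant being well known, it is the difference between the cells, rather than the raw confusion rate, that carries the evidential weight.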

Quantum

Survey results are not always as unambiguous as they appear. When the Law Commission was reviewing the appropriate level of damages for non-pecuniary loss for physical injury, it commissioned the Office for National Statistics (ONS) to ask members of the general public what level they thought appropriate in four typical cases. The Law Commission believed that the results supported its view that, where the current award would be more than £3,000, damages for non-pecuniary loss should be increased by a factor of at least 1.5 but not more than 2. This would have had a severe effect on those paying these awards and could have led to a substantial increase in insurance premiums. I was, therefore, asked to examine the Law Commission report on the research and give my opinion.

Among the conclusions in my detailed report were the following:

• The value of the information obtained from the ONS survey is limited by the way in which it was designed. (This included the questionnaire as well as other aspects of the survey design.)

• The method of interpretation that was adopted did not reveal the instability caused by respondents’ tendency to give round-number answers.

• Even for the four cases studied, the results of the survey did not support the conclusion that non-pecuniary damages should be increased by a factor of at least 1.5 but not more than 2.

• The range of cases studied would in any case have been too limited to allow the results to be extrapolated to other cases.

Furthermore, I was able to show that, contrary to the Law Commission’s interpretation, the results did not support a uniform increase for all levels of damages. Instead, they suggested that the proportionate increase should be greater for the most severe cases than for less severe ones.

The matter was decided by means of an appeal in specimen cases. As in other cases, the contribution of the legal team was crucial. I was, however, gratified that the comprehensive judgment reflected my findings. It included the following: ‘At the highest level, we see a need for awards to be increased by in the region of one third. We see no need for an increase in awards which are at present below £10,000. It is our view that between those awards at the highest level, which require an upwards adjustment of one third, and those awards where no adjustment is required, the extent of the adjustment should taper downwards’.
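The judgment did not specify the shape of the taper. One reading, assumed here purely for illustration, is a straight-line taper in the percentage uplift between £10,000 and the highest level. The `top_level` figure below is a hypothetical placeholder, not a figure taken from the judgment.

```python
def tapered_award(award, floor=10_000, top_level=150_000, max_uplift=1 / 3):
    """Adjusted award under an ASSUMED linear taper.

    floor:      awards at or below this are unchanged (the judgment saw no need
                to increase awards below £10,000).
    top_level:  HYPOTHETICAL award at and above which the full uplift applies.
    max_uplift: one third at the highest level, as stated in the judgment.
    """
    if award <= floor:
        return award
    if award >= top_level:
        return award * (1 + max_uplift)
    share = (award - floor) / (top_level - floor)  # 0 at floor, 1 at top_level
    return award * (1 + max_uplift * share)


for current in (8_000, 40_000, 80_000, 150_000):
    print(current, round(tapered_award(current)))
```

Unlike a uniform factor of 1.5 to 2, the uplift in this sketch grows with the severity of the case, which is the pattern the survey results suggested.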

Sampling

Sampling is, of course, crucial to marketing research. The first question to be decided is which population is to be sampled, and the answer is not always as obvious as it seems. Apart from geography, there are questions such as whether people or visiting occasions should be sampled, i.e. should the answers of those who use a facility more often be given greater weight? Those who do not consider these questions at the outset may find themselves trapped into having sampled a population that is not well suited to their needs.

Even when the population has been selected, the practicalities of sampling may impose constraints. It is therefore advisable to incorporate quality controls to check that these constraints have not unduly distorted the representativeness of the sample. The size of the sample also needs to be decided; this is determined by the accuracy with which results are required. If results are needed for subgroups, e.g. users of a particular brand, then steps have to be taken to ensure that each subgroup is adequately represented in the sample.
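The link between accuracy and sample size can be sketched with the usual formula for estimating a proportion. This assumes simple random sampling; clustered or quota designs generally need larger samples.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Simple random sample size needed to estimate a proportion to within
    +/- margin at the given confidence level. p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

If results are needed for a subgroup forming, say, 20% of the population, then about 385 / 0.2 = 1,925 interviews would be needed to obtain 385 members of that subgroup.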

An example of the importance of these issues is a case in which I was only consulted after results for a sample of stores had been obtained. The value of the claim for each store in the sample had been assessed, and the question was how these claims should be grossed up to evaluate the total claim. Problems arose because one type of store had been deliberately omitted and the characteristics of the stores in the sample did not match those of the population the sample was intended to represent. I was able to show that these deficiencies made little difference to the size of the claim by demonstrating that different assumptions produced similar results and that the sampling error was not excessive. Nonetheless, it is preferable to obtain expert advice before a sample is drawn rather than to delay seeking it until after the results have been obtained.
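Grossing up a sample to a total, with an indication of sampling error, can be sketched as follows. The sketch assumes a simple random sample, which is exactly the assumption that was in doubt in the case described; re-running such a calculation under different assumptions (for example re-weighting by store type) is one way of checking that the answer is robust. The claim values are hypothetical.

```python
import math
from statistics import mean, stdev


def gross_up(sample_claims, population_size):
    """Expansion estimate of a population total from a simple random sample.

    Returns the estimated total and its standard error, including the finite
    population correction (useful when the sample is a sizeable share of stores).
    """
    n = len(sample_claims)
    total = population_size * mean(sample_claims)
    fpc = 1 - n / population_size
    se = population_size * stdev(sample_claims) / math.sqrt(n) * math.sqrt(fpc)
    return total, se


# Hypothetical claim values for 8 sampled stores out of 120.
claims = [1200, 950, 1800, 1100, 1400, 1000, 1650, 1300]
total, se = gross_up(claims, population_size=120)
print(f"estimated total {total:,.0f}, "
      f"approx. 95% range {total - 1.96 * se:,.0f} to {total + 1.96 * se:,.0f}")
```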

Sampling issues also often arise with other types of legal evidence. One example is when the quantity of evidence is large, e.g. millions of envelopes, a proportion of which had been misaddressed, or tens of thousands of cases of mineral water, some of which had been contaminated. With these quantities the storage cost can be high, so both parties will wish to reduce the volume of evidence by sampling. The sample preserved as evidence has to be designed not just to enable the average level of defects to be assessed but also to determine how that level varies, e.g. whether defects are confined to a particular period of production or to items of a particular type. I have found designing sampling schemes that are practicable, economical and meet the evidential needs of all parties in a case to be a fascinating exercise.
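Preserving a sample that can show how defects vary, and not just their average level, usually means stratifying it. The following is a minimal sketch, assuming strata by production period and a hypothetical floor of 50 items per stratum so that each stratum can be assessed on its own; the stratum sizes are invented for illustration.

```python
def allocate_sample(strata_sizes, total_sample, minimum=50):
    """Proportional allocation of a sample across strata, with a floor.

    The floor ensures every stratum (e.g. production period or item type) has
    enough items to be assessed on its own; it can push the overall sample
    slightly above total_sample. No stratum is allocated more items than it has.
    """
    grand_total = sum(strata_sizes.values())
    allocation = {}
    for name, size in strata_sizes.items():
        proportional = round(total_sample * size / grand_total)
        allocation[name] = min(size, max(minimum, proportional))
    return allocation


# Hypothetical cases of mineral water by quarter of production.
cases = {"Q1": 40_000, "Q2": 35_000, "Q3": 20_000, "Q4": 5_000}
print(allocate_sample(cases, total_sample=600))
```

Here the smallest quarter would receive only 30 items under strictly proportional allocation, so the floor raises it to 50 at the cost of a slightly larger overall sample.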

Sampling issues arise in other disputes too, e.g. when the volume of paper records is too large to examine in full or, as in one case in which I acted, when some of the records have been damaged or destroyed. In these circumstances it is necessary to assess whether the remainder can be treated as representative of the whole, either by treating it as a single sample or by making appropriate statistical adjustments.

Statistical Misunderstandings

Sometimes an expert is needed to correct misunderstandings about statistics. For example, it is often thought that a sample that represents only a small percentage of the population must be inadequate. In fact, unless the sample forms a substantial fraction of the population, the sample size required hardly varies with the size of the population.
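The point can be checked with the standard finite population correction. In this sketch, which assumes simple random sampling and a target of plus or minus 3 percentage points at 95% confidence, the required sample barely changes between a population of ten thousand and one of sixty million.

```python
import math
from statistics import NormalDist


def required_n(population, margin=0.03, confidence=0.95, p=0.5):
    """Sample size for a proportion, with the finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # size needed for an unlimited population
    return math.ceil(n0 / (1 + (n0 - 1) / population))


for population in (10_000, 1_000_000, 60_000_000):
    print(f"{population:>11,}: {required_n(population)}")
```

The three sizes come out at roughly 965, 1,066 and 1,068: a sample of about a thousand serves a population of sixty million almost as well as one of a million, even though it is a far smaller percentage of it.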

Misunderstandings about samples can be even more extreme. I was asked to give expert evidence in a fraud case in which a sample of 97 customers had been interviewed, none of whom had received the service promised for the fee that they had paid. I asked why my services were required in what seemed to be an open and shut case. My client explained that they were concerned because of a similar case in which a sample of calls had been analysed. The judge in that case had said that, if it proceeded, he would have to tell the jury to assume that all the unsampled calls were legitimate. This, of course, would have meant that only a small proportion of customers could have been found to have been misled, so the case had to be discontinued. I therefore took care that my report explained clearly why the results could be extrapolated to all those who had received the calls. Nonetheless, the Defendants called their own expert. We held an experts’ meeting and produced an agreed statement, which concluded that, apart from some minor points, the main findings discussed in my report remained unchanged. The Defendants were convicted.
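One way of expressing why a sample of 97 with no legitimate cases can be extrapolated is an upper confidence bound on the proportion of legitimate cases among all customers. The sketch assumes the 97 customers were a random sample of all those called; it is a standard exact calculation, not necessarily the one used in the report described.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence bound for a population proportion
    when 0 of n randomly sampled items show the attribute (here: a customer
    who did receive the promised service)."""
    return 1 - (1 - confidence) ** (1 / n)


print(f"{upper_bound_zero_events(97):.3f}")  # exact bound
print(f"{3 / 97:.3f}")                        # 'rule of three' approximation
```

With 95% confidence, fewer than about 3% of all customers would have received the promised service, which is very different from assuming that every unsampled call was legitimate.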

Technical Language

Another source of misunderstanding is that certain words have a different meaning from their everyday one when they are used for statistical concepts. The statistical meaning of ‘significance’ or ‘significant’ is often misunderstood. If a sample is used to test a hypothesis, say that a value exceeds a particular figure, then a result is significant if it would be unlikely (e.g. with less than 5% probability) to have been obtained by chance if the hypothesis were correct. This need not mean that the difference between the sample result and the hypothesis is large enough to be described as ‘significant’ in the everyday sense. The value investigated may not be relevant or, if the sample is large, the difference may be too small to matter in practice despite being statistically significant.
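The distinction between statistical and practical significance is easy to demonstrate. In this hypothetical sketch the same tiny difference (0.1 on a scale with a standard deviation of 10) is far from significant with 100 observations but overwhelmingly significant with a million.

```python
import math


def z_test(sample_mean, hypothesised_mean, sd, n):
    """Two-sided one-sample z-test, assuming the population sd is known."""
    z = (sample_mean - hypothesised_mean) / (sd / math.sqrt(n))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


for n in (100, 1_000_000):
    z, p = z_test(50.1, 50.0, sd=10.0, n=n)
    print(f"n = {n:>9,}: z = {z:.1f}, p = {p:.2g}")
```

The difference itself is identical in both lines; only the sample size has changed.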

The term ‘normal distribution’ is another source of misunderstanding. It refers to a statistical distribution with a particular bell-shaped form, i.e. one specified by a certain mathematical equation. The misunderstanding arises because the terminology suggests that data will normally have this form. However, this is not necessarily so; for example, incomes and body weights are not normally distributed.

I was asked to act as an expert in a case that concerned whether a manufacturer’s products had failed a quality control test. None of those sampled had actually failed the test, but it was claimed that the test results showed so much variability that an unacceptable proportion of the products would have failed if they had been tested. This argument was based on the assumption that the test results followed the normal distribution. I carried out an analysis showing that this was not the case, i.e. the test results would have been extremely unlikely if the sample had been drawn from a normally distributed population of products. In my report I said that I did not know why or how this had happened, but that one possible explanation was that steps were being taken to make sure that the products did not fall below the critical level.
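A simple check of the kind of argument described, offered as an illustration rather than the analysis actually used, is to fit a normal distribution to the sample, work out how many failures it predicts, and ask how likely it would be to see none at all. The data and the critical level below are hypothetical. Formal normality tests, such as the Shapiro-Wilk test available as `scipy.stats.shapiro`, are the more usual tool.

```python
from statistics import NormalDist, mean, stdev


def zero_failure_check(results, critical_level):
    """Compare a normal model fitted to the results with the observed absence of failures.

    Returns the fitted probability that a product falls below critical_level,
    the number of failures that implies for a sample of this size, and the
    probability of observing no failures at all if the normal model were true.
    (Rough: the mean and sd are estimated from the same sample.)
    """
    fitted = NormalDist(mean(results), stdev(results))
    p_fail = fitted.cdf(critical_level)
    n = len(results)
    return p_fail, n * p_fail, (1 - p_fail) ** n


# Hypothetical test results bunched just above a critical level of 100.
results = [100.5, 101.0, 100.8, 101.2, 100.3, 102.0, 104.0, 107.0, 110.0, 115.0,
           100.6, 101.5, 103.0, 108.0, 112.0, 100.9, 101.1, 105.0, 109.0, 118.0]
p_fail, expected_failures, p_none = zero_failure_check(results, critical_level=100)
print(f"fitted failure rate {p_fail:.1%}, expected failures {expected_failures:.1f}, "
      f"chance of none {p_none:.1%}")
```

With these figures the normal model predicts several failures and makes a clean sample unlikely, so either the model or the process differs from what the argument assumed. A pile-up of results just above the critical level, as here, is the kind of pattern that would be consistent with products being kept above that level.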

Another statistical term worth knowing is the Likelihood Ratio. This can be defined in different ways; I think the simplest is to say that it is the probability of certain evidence being found if a specified hypothesis is correct divided by the probability of its being found if the hypothesis is not correct or, in some cases, if an alternative hypothesis applies. The relevance of this statistic is that it measures the ‘weight’ of the evidence: it is the factor by which the odds in favour of the hypothesis before the evidence is considered (the prior odds) are multiplied when the evidence is taken into account, provided that the prior odds have been assessed independently of that evidence. The Likelihood Ratio may be particularly useful in a civil case, which is decided on the balance of probabilities. In a criminal case there is no statistical equivalent of the test of ‘beyond reasonable doubt’. Nonetheless, it may still be worth calculating the Likelihood Ratio when considering a piece of quantitative evidence.
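The odds form of the calculation is short enough to write out. The sketch below uses hypothetical probabilities; it multiplies the prior odds by the Likelihood Ratio and converts the result back to a probability, which in a civil case could then be compared with the 50% balance-of-probabilities threshold.

```python
def likelihood_ratio(p_evidence_if_h, p_evidence_if_not_h):
    """P(evidence | H) divided by P(evidence | not H)."""
    return p_evidence_if_h / p_evidence_if_not_h


def update_odds(prior_odds, lr):
    """Posterior odds = prior odds x Likelihood Ratio."""
    return prior_odds * lr


def odds_to_probability(odds):
    return odds / (1 + odds)


# Hypothetical: evidence found 60% of the time if H is true, 3% of the time if not.
lr = likelihood_ratio(0.60, 0.03)
posterior = update_odds(prior_odds=0.25, lr=lr)  # prior odds of 1 to 4, i.e. 20%
print(f"LR = {lr:.0f}, posterior odds = {posterior:.1f}, "
      f"probability = {odds_to_probability(posterior):.2f}")
```

In this example a prior probability of 20% becomes about 83% once evidence with a Likelihood Ratio of 20 is included.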

In some cases the only real evidence against an accused person may be the evidence that is being evaluated by the Likelihood Ratio. In these circumstances it is difficult to know how small the prior odds should be taken to be, i.e. what is the statistical equivalent of ‘innocent until proven guilty’?

An alternative approach is to calculate the number of innocent people who could have similar evidence brought against them. I was asked by the defence to consider the case of a man who had been present at four different locations, each at the time a robbery took place there. Although it was accepted that the man had taken no part in the robberies, it was suggested that this coincidence was so unlikely that he must have been an accessory. It turned out that the man spent a considerable amount of time in these places. Even taking this into account, I calculated that the probability of a particular person like him being present by chance when four crimes were committed could be as low as 1 in 250,000, although, according to the assumptions that were made, it could be higher: 1 in 600 or even 1 in 140. I then pointed out that, although these probabilities might look low at first sight, they still meant that two or more innocent people in a year could be prosecuted on similar evidence. This was on the assumption that 1% of the UK population spent as much time as the accused in locations of the type where the robberies occurred.
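The arithmetic behind this kind of argument is an expected count: the number of people in a comparable position multiplied by the chance that any one of them shows the same coincidence. The sketch below assumes a UK population of about 60 million (an assumption for illustration) together with the 1% figure from the text. On the lowest of the three probabilities it gives an expected 2.4 people, which is consistent with the ‘two or more’ figure, although it is not necessarily the exact calculation used.

```python
import math


def innocent_matches(population, share_comparable, p_coincidence):
    """Expected number of innocent people showing the same coincidence by chance,
    and the Poisson probability that there are at least two of them."""
    expected = population * share_comparable * p_coincidence
    p_two_or_more = 1 - math.exp(-expected) * (1 + expected)
    return expected, p_two_or_more


UK_POPULATION = 60_000_000  # ASSUMPTION for illustration only
for p in (1 / 250_000, 1 / 600, 1 / 140):
    expected, p2 = innocent_matches(UK_POPULATION, 0.01, p)
    print(f"p = 1 in {1 / p:,.0f}: expected {expected:,.1f}, P(two or more) = {p2:.2f}")
```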

Probability and Coincidence in Criminal Cases

The above example shows that it is easy to be mistaken when examining quantitative evidence. The prosecutor’s fallacy, i.e. treating the probability that the accused is innocent (or not liable) as if it were the same as the probability of the evidence having arisen by chance, is, or ought to be, well known, and I have already referred to the cases of Sally Clark and Angela Cannings.

In one case in which I was involved, an expert with a limited knowledge of statistics examined 91 records, picked the 10 most extreme and calculated, for each of them, the probability of its being found if it had been chosen at random from a population with a specified mean and standard deviation. Because the 10 most extreme of 91 records will inevitably look unusual when each is judged in isolation, such probabilities are misleading; my own tests showed that the results taken as a whole were not abnormal. Other cases have included the one already described, which concerned whether the circumstances of a set of crimes were sufficiently alike for them to be used as similar fact evidence and, if so, whether mere presence at a crime scene on multiple occasions is evidence of guilt.
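Why the most extreme records look unusual even when nothing is wrong can be shown directly. The first function below gives the chance that at least one of n genuinely typical records falls beyond a given tail probability; the second simulates 91 records drawn from the reference normal population itself (a hypothetical set-up) and reports the tail probabilities of the 10 most extreme. A Bonferroni-style adjustment, multiplying the smallest tail probability by the number of records examined, is one simple way of allowing for this selection.

```python
import random
from statistics import NormalDist


def chance_at_least_one(n_records, tail_prob):
    """P(at least one of n independent typical records has tail probability < tail_prob)."""
    return 1 - (1 - tail_prob) ** n_records


def extreme_tail_probs(n_records=91, k=10, seed=1):
    """Simulate records from the reference N(0, 1) population and return the
    two-sided tail probabilities of the k most extreme, smallest first."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    values = [rng.gauss(0, 1) for _ in range(n_records)]
    tails = sorted(2 * (1 - std_normal.cdf(abs(v))) for v in values)
    return tails[:k]


print(f"{chance_at_least_one(91, 0.01):.2f}")  # chance of a '1 in 100' result somewhere
print([round(t, 3) for t in extreme_tail_probs()])
```

Among 91 perfectly ordinary records there is roughly a 60% chance that at least one looks like a ‘1 in 100’ event when judged on its own.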

Probability of Incorrect Records in Civil Cases

A question that can arise in civil cases is whether apparently contemporaneous records were actually created at the time or were constructed after the event. Statistics can help to answer this question, since artificially created records often have characteristics that differ from those of records produced in the normal way. For example, I was asked to examine sets of time sheets that were being used to substantiate a civil claim. I found a number of regularities that would have been extremely unlikely to occur by chance if the time sheets had been produced in the manner claimed. My clients received a satisfactory settlement.
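The regularities in the time sheets are not described in detail, so the following is only an example of the kind of test that can be used: a chi-square test of whether the final digits of recorded times are spread evenly. It rests on the assumption, which would need justifying in any real case, that genuine records would show roughly uniform final digits, whereas invented or reconstructed entries often cluster on 0 and 5. The data are hypothetical.

```python
from collections import Counter

CRITICAL_5PCT_9DF = 16.92  # chi-square 5% critical value, 9 degrees of freedom


def last_digit_chi_square(values):
    """Chi-square statistic for the hypothesis that final digits 0-9 are equally likely.

    Expected counts should ideally be at least about 5 per digit (n >= 50).
    """
    counts = Counter(abs(int(v)) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical minute values from 60 time sheet entries, all ending in 0 or 5.
minutes = [0, 30, 15, 45, 0, 30, 0, 15, 30, 45] * 6
stat = last_digit_chi_square(minutes)
print(f"chi-square = {stat:.0f}; exceeds 5% critical value: {stat > CRITICAL_5PCT_9DF}")
```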

Summary

I hope that this article has shown that quantitative evidence can arise in many different types of case and that, where it does, it can be advantageous to obtain advice from an expert at as early a stage as possible. It is also helpful to have an expert who can explain the issues clearly and be a good witness. The work carried out will depend on whether the expert is commenting on evidence produced by someone else or producing quantitative evidence themselves. In the first situation the expert will consider such issues as:

1) The method by which the evidence was obtained. This includes:

a) If a questionnaire was used, the question wording and the likelihood of this having biased the results.

b) Whether the statistics presented were appropriate to the case.

c) Whether there were other statistics that might be relevant. If so, the expert will, of course, calculate them or explain how they can be produced.

d) If a sample was used, whether it correctly represented the population in relevant respects, and the likely accuracy of any population estimates.

e) The likely impact of any deficiencies noted.

2) Alternative sources of data that might substantiate or refute the evidence presented.

3) The method by which the data was analysed and interpreted. This includes:

a) The reasonableness of any assumptions that had been made. This refers both to those that had been stated and to those that were implied but had not been described.

b) Whether there were any alternative methods that might lead to a different conclusion.

Where the expert is producing their own evidence, they need to consider similar points to those set out above but from the opposite point of view. In other words, they need to make sure that the study will provide the information that is required in a manner that is open to as little reasonable objection as possible.

In all cases they need to ensure that their report is as comprehensible as possible.
