REUTERS

Predictive coding: the rise of the machines?

In the ongoing case of Pyrrho Investments v MWB Property, Master Matthews has handed down the first reported English High Court decision on the topic of predictive coding. Judicial approval for the use of predictive coding during disclosure has been long awaited and has now finally arrived.

Does the Pyrrho judgment mark the start of, effectively, mandatory machine document review? Does it also mean, perhaps counter-intuitively, that disclosure will now be a job for more senior fee earners? Our view is “yes” and “sort of”, in that order. Either way, parties will now need to ensure that they are au fait with predictive coding and be ready to use it far more frequently.

Why would you use predictive coding?

Master Matthews stressed a few important points about disclosure in his judgment. Broadly speaking, he said that human review of documents is itself far from perfect, and that the yardstick for a properly conducted disclosure exercise is a reasonable search, not (despite what some practitioners think) perfection. His point, it seems, was that if machines are at least as accurate as human reviewers, then they should be used to review the rafts of electronic documents found in so many cases today, because machine review can cope with that task at both acceptable levels of accuracy and proportionate cost.

There is little, if any, scope to disagree with this logic, subject to understanding that predictive coding:

  • Relies on human training (so there is still scope for user error).
  • Relies heavily on statistical sampling and achieving an acceptable agreed error rate.

Predictive coding review process

The process begins with an initial sample of documents, known as the teaching set. A senior lawyer familiar with the proceedings and the underlying issues in dispute manually reviews this set, categorising each document as relevant or not relevant. The categorised teaching set is then used to train the software to classify the remaining documents in the dataset.

After the initial sample set has been categorised by the human reviewer, the predictive coding software analyses the documents within it for language and common concepts. Based on this analysis, it then reviews the rest of the dataset for similar language and document types, categorising each document as relevant or not relevant.

A random selection of the machine-categorised documents is then manually reviewed for relevance, to either confirm or correct the decisions made by the predictive coding software. The software learns from the reviewers’ decisions and uses this information to re-review all of the documents in the dataset. This exercise is repeated until the proportion of documents in each review batch that the reviewer identifies as incorrectly categorised by the software falls below an agreed error threshold.

In addition, a review is performed on a batch of documents that have been reviewed manually, but where the software disagrees with the decision made by the human reviewer. Where the human decisions are deemed correct, these are fed back into the system for further learning.
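The iterative loop described above can be sketched as a toy simulation. Everything here is hypothetical: the document texts, the error threshold, and the crude keyword “classifier” are illustrative stand-ins, and real predictive coding software uses far more sophisticated text analytics. The shape of the loop, however, follows the steps set out in this section: train on a lawyer-reviewed teaching set, classify the dataset, manually check a random sample, feed corrections back, and repeat until the error rate is below the threshold.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

ERROR_THRESHOLD = 0.05  # hypothetical agreed error rate
SAMPLE_SIZE = 40        # hypothetical documents manually checked per round

# Toy dataset: in this imaginary dispute, a document is relevant
# if it concerns the "service charge".
RELEVANT = "service charge dispute over maintenance"
REPORT = "annual report and accounts"
PARKING = "penalty charge notice for parking"
dataset = [RELEVANT] * 100 + [REPORT] * 100 + [PARKING] * 100
random.shuffle(dataset)

def human_review(doc):
    """Stand-in for the senior lawyer's relevance judgment."""
    return "service charge" in doc

def train(teaching_set):
    """Toy 'learning': keep words that appear only in relevant documents."""
    relevant_words, irrelevant_words = set(), set()
    for doc, label in teaching_set:
        (relevant_words if label else irrelevant_words).update(doc.split())
    return relevant_words - irrelevant_words

def classify(doc, signal_words):
    return any(word in signal_words for word in doc.split())

# Step 1: the senior lawyer manually categorises an initial teaching set.
initial_sample = [RELEVANT] * 5 + [REPORT] * 5
teaching_set = [(doc, human_review(doc)) for doc in initial_sample]

# Steps 2-4: train, classify, manually check a random sample, feed the
# corrected decisions back, and repeat until under the error threshold.
for round_number in range(1, 11):
    signal_words = train(teaching_set)
    sample = random.sample(dataset, SAMPLE_SIZE)
    errors = [(doc, human_review(doc)) for doc in sample
              if classify(doc, signal_words) != human_review(doc)]
    error_rate = len(errors) / SAMPLE_SIZE
    print(f"round {round_number}: error rate {error_rate:.0%}")
    if error_rate < ERROR_THRESHOLD:
        break
    teaching_set.extend(errors)  # corrections fed back for further learning
```

In this sketch the first round misclassifies the “penalty charge” documents (the word “charge” looks like a relevance signal), the corrections are fed back, and the retrained model then distinguishes the two uses of the word, mirroring how the feedback rounds refine the software’s decisions.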

The various rounds of review and analysis described above are based on statistical sampling. The number of documents contained in each round of review will depend on the required level of confidence and margin for error, which are agreed in advance between the parties. The higher the desired level of confidence, and the lower the acceptable margin for error, the larger the sample for review must be. This, in turn, means the review takes more time and costs more.
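The trade-off between confidence, margin of error, and sample size can be illustrated with the standard statistical formula for estimating a proportion, n = z²·p(1−p)/e². This is a simplification for illustration only; e-disclosure platforms may apply more refined sampling methods.

```python
import math

def sample_size(z_score, margin_of_error, p=0.5):
    """Minimum sample size to estimate a proportion.

    p=0.5 is the most conservative assumption, giving the
    largest required sample when the true proportion is unknown.
    """
    return math.ceil(z_score**2 * p * (1 - p) / margin_of_error**2)

# z-scores for common confidence levels: 90% -> 1.645, 95% -> 1.96, 99% -> 2.576
print(sample_size(1.96, 0.05))   # 95% confidence, 5% margin: 385 documents
print(sample_size(2.576, 0.02))  # 99% confidence, 2% margin: 4148 documents
```

As the examples show, tightening the confidence level and margin for error inflates the sample more than tenfold, which is why these parameters are agreed between the parties in advance.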

The number of documents that must be manually reviewed before the system categorises documents below the agreed error threshold may be high. In cases with a large quantity of documents to review for disclosure, however, it will tend to be a small proportion of the number that would otherwise be reviewed in an exclusively manual, human review.

How will predictive coding change the disclosure process?

In contrast to the common, cost-driven approach of having junior fee earners undertake the disclosure exercise, predictive coding requires the initial involvement of a senior lawyer fully familiar with the pleaded issues in the case to establish the relevant parameters. This generates a short-term increase in cost in return for a potential long-term saving. It may also have longer-term implications for how litigation practices shape their teams.

In light of the emphasis that the Jackson review placed upon the need to reduce the costs of litigation, as well as recent advances in technology, we are sure to see predictive coding becoming much more prevalent in the coming years. In the increasingly common world of multi-million document review populations, the judgment of Master Matthews has effectively made predictive coding mandatory. It is difficult, to put it mildly, to conceive of a situation where an exclusively manual review of such a large population (even in conjunction with, for example, keyword searching) could now be considered proportionate, given the apparent accuracy of predictive coding. Where the review population is in the low thousands, it might still be proportionate to undertake an exclusively manual review, but as predictive coding becomes cheaper, no doubt only exceptionally small cases will forgo it.

Nicholas Scott and Nick West, Memery Crystal
