In our article Interrogation of big data, we outlined some of the key tools and search methods used to interrogate big data effectively and efficiently. In this article we review one of the most effective interrogation tools: predictive coding.
Predictive coding is the coming together of human reasoning and advanced technology. It is a two-step process that takes place once the data has been collected and uploaded onto a review database. The first step involves an ‘expert reviewer’ manually reviewing a sample of documents from the wider document pool and categorising them as he or she thinks fit. This expert reviewer will be an individual (or a small group of individuals) who knows the matter and the focus of the review in detail. It is therefore highly likely that this person will be able to accurately categorise the sample data.
In the second step, the decisions made during the discrete manual review ‘train’ the review database to apply the expert’s pattern of categorisation. The database creates models, or algorithms, based on the content and metadata of the expert-categorised documents. These models are then applied across the entire document set, categorising all documents and highlighting those relevant to the case.
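The two steps can be illustrated with a deliberately simplified sketch. It assumes a basic word-frequency (naive Bayes) classifier, which is a stand-in for illustration and not necessarily the model any particular review platform uses. The sample documents, the category names and the function names train and predict are all hypothetical, and the sketch looks only at document text, not metadata.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_docs):
    """Step 1: build per-category word counts from the expert-coded sample."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of sample documents
    for text, category in labelled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Step 2: assign the most probable category to an unreviewed document."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: how common the category was in the expert sample.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in the sample
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical expert-coded sample (invented for illustration only).
sample = [
    ("Please approve the revised supply contract pricing", "relevant"),
    ("Contract breach notice regarding late supply", "relevant"),
    ("Pricing dispute under the supply agreement", "relevant"),
    ("Team lunch on Friday at noon", "not relevant"),
    ("Office car park closed this weekend", "not relevant"),
    ("Reminder to book your holiday leave", "not relevant"),
]

model = train(sample)

# The wider, unreviewed document pool (also invented).
unreviewed = [
    "Notice of breach of the supply contract",
    "Lunch in the office on Friday",
]
predictions = [predict(model, doc) for doc in unreviewed]
print(predictions)
```

On this toy data the first unreviewed document is categorised as relevant and the second as not relevant. In practice the expert sample would be far larger, and the results would be checked against further manual review before being relied upon.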
There are many advantages to using predictive coding in a large disclosure exercise. The accuracy and consistency of the review is a key one. Once the system is trained, it can detect nuances in documents that may not be visible to the human eye (such as hidden text). It also helps prevent mis-categorisation, particularly by more junior colleagues who may not have much experience or understanding of the subject matter.
Using predictive coding will also save parties time and money. It may take a veritable army of junior solicitors to trawl through masses of data on a review platform, which can lead to high costs being incurred. By allowing one or two senior or ‘expert’ staff members to review a far smaller subset of data, these costs can be reduced dramatically. Following the Jackson reforms, proportionality must be considered in the course of a disclosure exercise. Using predictive coding to make the process more economical therefore makes it more defensible before the Court, which will allow higher cost recovery for successful parties.
Predictive coding is also a defensible method of document interrogation. The parties may agree, between themselves and with the Court and/or Regulator, who will review the documents and the size of the sample that the ‘expert reviewer’ must review. Any fear that the computer may miss a document, or surface one that it should not, is therefore allayed by the fact that the Court or Regulator has sanctioned the approach.
There are, of course, those within the profession who doubt the effectiveness of replacing human insight with artificial intelligence. However, given the great difficulty of balancing disclosure obligations against increasing volumes of data and pressures of cost and time, predictive coding provides a defensible way to meet these obligations.