E-Discovery – Using Keywords to Find the Needle in the Haystack

28th April 2015

Since the emergence of Google as the world’s first port of call for “searching”, the importance of “keywords” as a tool for finding information has grown enormously. Faced with an inconceivable volume of diverse data, it would be almost impossible to track down specific information without keywords and the search engines that use them.

The purpose of keywords in the field of e-disclosure is much the same, and they can help the reviewer in two ways. The first is to narrow down the data set to be reviewed by limiting it to documents containing a specific keyword (or keywords). The second is to help the reviewer find a specific piece of information. It should be clear from the outset which of these two purposes the reviewer is trying to achieve, as this will influence the choice of keywords.

To narrow “big data” down to a more manageable number of documents for review, a simple form of keyword searching can, and often should, be used. This is usually done at the outset, so that only potentially relevant documents are taken through to the review phase. It can reduce costs considerably by cutting down the man hours needed to sift through a multitude of documents.

Typically, the first step in the e-disclosure process is a scoping meeting between the provider of the e-discovery review platform, the client and the client’s external solicitors. Together, they devise a list of relevant keywords and apply it across the entire document set, picking out every document that contains any word on the list. Generally speaking, a wide range of keywords is used at this stage so that vital documents do not slip through the net, which also helps make the choice of keywords defensible.
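As an illustration only, this first-pass culling can be sketched in Python, assuming the documents are held as plain text keyed by a document ID. The `tokenize` and `cull` functions and the sample documents are hypothetical; a real review platform indexes documents and applies its own tokenisation rules.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def cull(documents: dict[str, str], keywords: list[str]) -> list[str]:
    """Return the IDs of documents containing ANY keyword on the agreed list."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in documents.items()
            if tokenize(text) & wanted]  # non-empty intersection = at least one hit


docs = {
    "DOC-001": "Please review the supply contract before Friday.",
    "DOC-002": "Lunch order for the team.",
    "DOC-003": "The invoice for the contract is attached.",
}
print(cull(docs, ["contract", "invoice"]))  # ['DOC-001', 'DOC-003']
```

Because a document is kept if it matches any term, adding keywords can only widen the set taken through to review, which is why a broad list is the cautious choice at this stage.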

However, if the reviewer is looking for a specific “needle in a haystack” within the data set, the keywords used should be very specific and narrow. In the same way that an online shopper may choose a very specific set of words to find the exact pair of shoes they want (black shoes patent size 7 Marks & Spencer), a reviewer will combine keywords to find exactly what they are looking for. These may include, for example, the custodian’s name, a specific date and the subject matter of an e-mail. This technique is usually employed in the later stages of an investigation, once the document set has already been narrowed down.
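A narrow search of this kind can be sketched as a filter that every criterion must pass, assuming each e-mail has already been extracted into a record with custodian, sent-date and subject fields. The `Email` class, the `narrow_search` function and the sample data are hypothetical; review platforms expose metadata fields under their own names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    """Hypothetical record of the metadata extracted for one e-mail."""
    custodian: str
    sent: date
    subject: str


def narrow_search(emails: list[Email], custodian: str, sent: date,
                  subject_terms: list[str]) -> list[Email]:
    """Keep e-mails from this custodian, sent on this date, whose subject
    contains every given term (case-insensitive substring match)."""
    terms = [t.lower() for t in subject_terms]
    return [
        e for e in emails
        if e.custodian == custodian
        and e.sent == sent
        and all(t in e.subject.lower() for t in terms)
    ]


emails = [
    Email("J. Smith", date(2014, 3, 12), "Revised pricing for Project Alpha"),
    Email("J. Smith", date(2014, 3, 13), "Revised pricing for Project Alpha"),
    Email("A. Jones", date(2014, 3, 12), "Project Alpha kick-off"),
]
matches = narrow_search(emails, "J. Smith", date(2014, 3, 12), ["pricing", "alpha"])
print(len(matches))  # 1
```

Each extra criterion can only remove results, so the search becomes more precise as criteria are added.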

In terms of the searches themselves, one of the most common forms of keyword search is what is known as “Boolean” searching. A reviewer uses Boolean operators such as “AND”, “OR” and “NOT” to link keywords (e.g. “green” AND “apple” NOT “banana”), and only documents that satisfy the whole expression are returned as hits. In our example, the only results produced will be those that contain both “green” and “apple” but do not contain “banana”. This is particularly useful when trying to find specific documents.
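To show how the “green” AND “apple” NOT “banana” example is evaluated, here is a minimal Python sketch. For simplicity it takes the AND, OR and NOT terms as separate lists rather than parsing a query string; the `words` and `boolean_match` functions are hypothetical, and each platform has its own query syntax and parser.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case the text and return the set of words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def boolean_match(text: str, all_of=(), any_of=(), none_of=()) -> bool:
    """True if the text contains every AND term, at least one OR term
    (when any are given) and none of the NOT terms."""
    w = words(text)
    return (
        all(t.lower() in w for t in all_of)
        and (not any_of or any(t.lower() in w for t in any_of))
        and not any(t.lower() in w for t in none_of)
    )


docs = [
    "A green apple fell from the tree.",
    "A green apple and a banana.",
    "A red apple.",
]
# "green" AND "apple" NOT "banana"
hits = [d for d in docs if boolean_match(d, all_of=["green", "apple"], none_of=["banana"])]
print(hits)  # ['A green apple fell from the tree.']
```

The second document is excluded by NOT “banana” and the third by the missing “green”, matching the result described above.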

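“Fuzzy” keyword searching can also be used. This method looks not just for the exact word the reviewer has entered, but also for words with a similar spelling, which helps catch typing errors and variant spellings. For example, a fuzzy search for “guarantee” would also bring back documents containing the misspelling “guarentee”. Finding words with a similar meaning, such as “giant” for a search on “huge”, is a different technique (synonym or concept searching) that some review platforms also offer.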
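Fuzzy matching is commonly built on edit distance: the number of single-character insertions, deletions or substitutions needed to turn one word into another. The Python sketch below uses Levenshtein distance with a hypothetical threshold of one edit; the `edit_distance` and `fuzzy_hits` functions are illustrative, and real platforms choose their own distance measure and tolerance. Note that this catches similar spellings, not similar meanings: “giant” is nowhere near “huge” by this measure.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute if different
        prev = curr
    return prev[-1]


def fuzzy_hits(term: str, text: str, max_edits: int = 1) -> list[str]:
    """Return the words in text within max_edits of the search term."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if edit_distance(term.lower(), w) <= max_edits]


text = "The guarentee was signed. See guarantee terms. Guard the gate."
print(fuzzy_hits("guarantee", text))  # ['guarentee', 'guarantee']
```

A larger `max_edits` catches more variants but also more unrelated words, so the tolerance is a trade-off between recall and noise.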

Words can also be searched based on their proximity to one another. For example, a reviewer can request that the only results returned are those that contain the word “green” w/3 (within three words) of “apple”. This can be extremely helpful in narrowing down a data set.
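A proximity search such as “green” w/3 “apple” can be sketched by recording the word positions of each term and checking whether any pair is close enough. Platforms differ on whether w/3 counts intervening words or word positions, and on whether word order matters; this Python sketch assumes the two terms must be at most three word positions apart, in either order. The `within` function is hypothetical.

```python
import re


def within(text: str, first: str, second: str, n: int) -> bool:
    """True if first and second occur within n word positions
    of each other, in either order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == first.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


print(within("A green and shiny apple", "green", "apple", 3))               # True
print(within("Green paint was spilled on the apple", "green", "apple", 3))  # False
```

Both documents would match “green” AND “apple”; only the proximity condition tells them apart, which is why it narrows results further.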

Our experience in using these tools can help you find that needle in a haystack more efficiently, saving both time and cost, and turn a once laborious task into a much more manageable one.

If we can help with your disclosure issues, please contact John Mackenzie, Guy Harvey or Hayley Pizzey.
