The following is Part VIII in a multi-part series on how to draft and leverage an ESI protocol in any litigation. Part I of our series discussed the When, How and Why in planning for and creating your ESI protocol. Part II addressed the Key Components of an ESI Protocol, Part III walked through the Top 10 Situations You Can Avoid with a Protocol, and Part IV discussed Planning for the Production of Social Media. Part V covered the importance of including Manner of Production in your protocols, Part VI discussed the value of metadata and what to ask for, and Part VII covered Form of Production, Why it’s Crucial and What to Include.
In conjunction with this series, eDiscovery Assistant has created a new section in Checklists and Forms titled ESI Protocols that will include new content with each part of this series. That section includes sample ESI protocols, checklists on what to include and a list of metadata fields for inclusion in your protocol.
With all the talk of artificial intelligence (AI), and acquisitions and inclusions of AI into ediscovery review platforms, the reality is that search terms are still the first place we start when trying to identify a scope of data for collection. Note I said collection, because it’s my view that using search terms to identify data for preservation is a slippery slope that should be limited only to situations where 1) you have vast numbers of custodians and data and specific agreements negotiated using the process defined below, or 2) you are responding to a government inquiry that necessitates larger swaths of responsive material than targeted litigation requests should entail. For purposes of today’s post, I’m going to focus on using search terms in responding to discovery in litigation, not for broader investigations.
The Key Takeaway Regarding Search Terms
How you negotiate the process for deciding on search terms in your ESI protocol requires careful consideration and an understanding of the overall approach to discovery for a case. Failing to adequately consider how this process plays out can leave you and your client stuck with terms that do not cover the gamut of what you need for your case, or with overly broad terms that cost your client too much money to review non-responsive information. Your ESI protocol needs to include a process for identifying search terms, where the search terms will be run (i.e., in what system), what the responding party needs to do to validate that those search term strings will provide the most responsive information, and how the parties will iterate on the process going forward.
Think too about HOW you will use this data at trial and how you will want to physically present it to the jury. That impacts the outputs you will want from the sources of ESI at issue. For example, a Cellebrite output for text messages is an Excel file. Is that how you want to display a key text string to a jury? Most jurors will struggle to absorb the information when it’s visually different from what they are used to, and you’ll lose the value of the data.
The key to search terms is remembering that the party with the data has the power to know what the best search terms are to be used. The absolute worst way to identify search terms is for a lawyer to sit in his or her office and guess at search terms, or to allow the other side to suggest that the requesting party propose terms first to be tested. The most effective process starts with the requesting party issuing narrowly tailored requests for production, and the responding party proposing search terms on a request by request basis. Not a custodian by custodian basis, a request by request basis. The proposed terms will have to take into account custodians, different sources of data and proportionality. Behind the scenes, the responding party will have already interviewed custodians (or you’ll have to get on that immediately) and asked them specifically what terms they use or others use to address the various issues in the case. If the discovery requests have already been issued at the time of interviews, ask the custodians specifically what information exists on a topic, who has it, and what terms would be used to describe it.
Once the responding party has requests, information from custodians and collected data, it should load the data and start using the technology to run various iterations of search terms to identify the strings that are most responsive. As you are iterating, keep track of results so that you can explain why a proximity connector of /50 is too broad and /10 gives you more responsive results. The responding party provides the proposed strings and the parties meet and confer on whether each one is appropriate or whether further iteration is needed (PRO TIP: It almost always is).
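To make the /50 versus /10 comparison concrete, here is a minimal Python sketch of tracking hit counts across proximity windows. It is an illustration only: the `within` and `iteration_log` helpers, the sample documents and the word-counting rule are all assumptions on my part, and every review platform defines "within N words" in its own way, so check your platform's syntax before relying on numbers like these.

```python
import re

def within(text, term_a, term_b, n):
    """Return True if term_a and term_b appear within n words of each other.

    Simplified, assumed proximity rule; real platforms tokenize differently.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def iteration_log(docs, term_a, term_b, windows):
    """Count documents hitting at each proximity window, for side-by-side comparison."""
    return {n: sum(within(d, term_a, term_b, n) for d in docs) for n in windows}

# Hypothetical sample documents
docs = [
    "The pricing memo went out Monday and the contract was signed that day.",
    "Pricing was discussed at lunch. " + "filler " * 20
    + "Separately, the contract for parking renewed.",
    "No relevant content here.",
]

log = iteration_log(docs, "pricing", "contract", [10, 50])
print(log)
```

In this toy set the wider window picks up a second document where the two words are unrelated, which is the kind of result worth writing down so you can explain it at the meet and confer.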
Search Term Hit Reports (often referred to as STRs) are one tool to help iterate on search terms, but they are ONLY a tool. STRs are not the holy grail of what search terms should be for a request or matter generally, and believing that they are, or not understanding what STRs mean, can steer the process in the wrong direction. We’ll talk more about STRs and how to read them in a subsequent post. For now, if you don’t know how to use them, consider that before including them in your defined process.
When you have agreed upon search terms for each request, the responding party runs those searches, reviews the data and produces the responsive data. We always ask to have the search terms included as a metadata field in the list of metadata to be provided so that we can filter on the search terms and know what produced documents were responsive to that string. It’s an easy filter to do in any ediscovery review platform (as long as you ask for the metadata field) and lets you analyze whether that search gave you the right documents.
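For readers who have never seen one, the sketch below shows roughly the kind of numbers a hit report can contain: for each term, how many documents hit, and how many of those hit on no other term. This is a simplified illustration under my own assumptions. The `hit_report` function, the column names and the sample documents are hypothetical, and real platforms report more (family counts, for example) and match terms differently.

```python
import re

def hit_report(docs, terms):
    """Build a simple per-term report: documents with hits and unique hits.

    docs maps a document ID to its text; matching is whole-word, case-insensitive.
    """
    hits = {
        t: {doc_id for doc_id, text in docs.items()
            if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)}
        for t in terms
    }
    report = []
    for t in terms:
        # Documents hit by any OTHER term, used to compute unique hits
        others = set().union(*(hits[o] for o in terms if o != t))
        report.append({
            "term": t,
            "docs_with_hits": len(hits[t]),
            "unique_hits": len(hits[t] - others),
        })
    return report

# Hypothetical sample documents
docs = {
    "D1": "Rebate schedule attached to the contract.",
    "D2": "The contract is final.",
    "D3": "Lunch on Friday?",
}

report = hit_report(docs, ["rebate", "contract"])
for row in report:
    print(row)
```

The counts say nothing about whether the documents are actually responsive, which is why the report is only a starting point for iteration.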
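As a rough illustration of that filter, the Python sketch below groups produced documents by the search string that pulled them in. The field name `search_term_hit`, the semicolon separator, the Bates numbers and the sample strings are all assumptions for the example; use whatever field name and delimiter your protocol actually specifies.

```python
from collections import defaultdict

# Hypothetical load-file rows; field names and values are illustrative only
produced = [
    {"bates": "ABC0001", "search_term_hit": "pricing w/10 contract"},
    {"bates": "ABC0002", "search_term_hit": "pricing w/10 contract; rebate*"},
    {"bates": "ABC0003", "search_term_hit": "rebate*"},
]

def group_by_term(records, field="search_term_hit", sep=";"):
    """Map each search string to the Bates numbers of the documents it hit."""
    by_term = defaultdict(list)
    for rec in records:
        for term in rec.get(field, "").split(sep):
            term = term.strip()
            if term:  # skip empty values from blank or trailing separators
                by_term[term].append(rec["bates"])
    return dict(by_term)

by_term = group_by_term(produced)
for term, bates_numbers in by_term.items():
    print(term, len(bates_numbers), bates_numbers)
```

With the documents grouped this way, you can pull a sample from each string and judge whether that search gave you what the request was after.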
What Systems are Being Searched
Where the agreed upon terms will be run is the next piece to focus on because there is an enormous disparity in results between systems. For example, the same search string run against multiple custodians in O365 will give you substantially different results than pulling the custodians’ mail into a review platform and running the search there. Think carefully about where the search terms will be applied and include it in the protocol. It varies by system and the type of data.
Next, consider the different types of ESI that you’ll be searching. You wouldn’t use the same search terms for email that you want for text messages or instant messaging. Frankly, search terms for text messages are fraught and I suggest steering clear of them. Instead, consider text strings among custodians and date ranges as a way to filter texts. Custodians will use entirely different language in instant messaging applications (think Slack, Teams, WhatsApp, etc.) than they use in email or texts. Have the responding party review the data and propose a methodology that’s reasonable, usually channels (for Slack) dedicated to the project with some carve outs for other non-responsive business information, or chats for WhatsApp with specific custodians. There are inexpensive ways to collect this data via searchable PDF that allow you to OCR the data and make it available for search if necessary. We prefer to review these types of data separately, so think about that as you are drafting the protocol.
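The custodian and date range approach to text messages can be sketched in a few lines of Python. This is only an illustration: the message layout, the names and the `filter_threads` helper are hypothetical, and I have assumed "among custodians" means every participant on the message is a named custodian. Your protocol may define that differently.

```python
from datetime import date

# Hypothetical exported text messages
messages = [
    {"sent": date(2023, 3, 1), "participants": {"Alice", "Bob"}, "body": "call me"},
    {"sent": date(2023, 3, 5), "participants": {"Alice", "Carol"}, "body": "lunch?"},
    {"sent": date(2023, 6, 1), "participants": {"Alice", "Bob"}, "body": "done"},
]

def filter_threads(msgs, custodians, start, end):
    """Keep messages whose participants are all named custodians and that fall in the date range."""
    return [
        m for m in msgs
        if m["participants"] <= custodians and start <= m["sent"] <= end
    ]

selected = filter_threads(
    messages, {"Alice", "Bob"}, date(2023, 1, 1), date(2023, 4, 30)
)
for m in selected:
    print(m["sent"], sorted(m["participants"]), m["body"])
```

No keyword is involved at all, which is the point: the filter relies on who was talking and when, not on guessing what words they used.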
Keep in mind that the process is iterative. Once you receive data, review it immediately to understand whether you need to ask for additional terms, additional custodians, etc. based on your review. And make sure your protocol includes language contemplating an iterative process. This is not a one and done process, but many producing parties will argue it is once search terms are agreed on initially, and you need to protect your client from that argument.
Pulling it All Together
Search terms and how to leverage them continue to be a key aspect of the ediscovery process. To use search and search terms effectively, you need the overall picture of the case and what you will want to do with your evidence in front of a jury to guide each step of the process. This isn’t simple, and it requires a great deal of thought. If you don’t think you have what it takes to undertake it for your client, find someone who does to assist. Missing out on key search terms can mean the difference between getting the documents you need and losing your case altogether.
Part IX in this series covers negotiating a privilege log you can live with.