
Episode 141: Finally! Well Reasoned Guidance on the eDiscovery Issues in Hyperlinked Files

In Episode 141, Kelly Twigger discusses the latest decision on hyperlinked files in which the court undertook, for the first time, a discussion and analysis of the issues of versioning and whether hyperlinked files are attachments and provided language for the parties to govern the issue in the case. In re Uber Techs., Inc. Passenger Sexual Assault Litig., 2024 WL 1772832 (N.D. Cal. 2024).


Introduction

Welcome to this week’s episode of our Case of the Week series brought to you by eDiscovery Assistant in partnership with ACEDS. My name is Kelly Twigger. I am the CEO and founder at eDiscovery Assistant, your GPS for ediscovery knowledge and education. Thanks so much for joining me today.

A couple of announcements before we get started. I am working with the Midwest Chapters of ACEDS on a four-part series on ESI protocols. There are four webinars in total; two have already aired but are available to stream here. The third installment will be a judges panel moderated by David Horrigan on June 6th, and I’ll be hosting a workshop on July 18th. You can use the link to sign up for more details and review the previous sessions. It is free, and today’s decision will have a huge impact on those ESI protocols and all subsequent additions.

Second, our team from eDiscovery Assistant will be at the Masters Conference in Chicago on May 15th, and we’ll be conducting a breakout session in which we invite you to play our eDiscovery Explorer Trivia Game for lots of fun and surprises. If you’re in the area or planning to attend the Masters, please add our session to your agenda. You can email us at support@ediscoveryassistant.com for a free pass if you are at a law firm, government agency or in-house counsel and would like to attend.

On with the show.

Each week on the Case of the Week I choose a recent decision in ediscovery and talk to you about the practical implications. This week’s decision involves an incredibly well-reasoned decision from Judge Cisneros that finally evaluates the technical issues associated with hyperlinked files in the context of an ESI protocol. If you remember Must See TV, this is as close as we get in ediscovery case law.

Let’s dive into this week’s decision. It comes to us from In re Uber Techs., Inc. Passenger Sexual Assault Litig. This is a decision from United States Magistrate Judge Lisa Cisneros from April 23, 2024, so just from a week ago. We also covered an earlier decision in this case on Episode 138 just a couple of weeks ago. Judge Cisneros has 25 decisions in our eDiscovery Assistant database and, of course, sits in the Northern District of California, one of the courts most recognized for its expertise on eDiscovery issues, particularly when they pertain to technology, as we see here.

As always, we identify each of the issues associated with a decision in our eDiscovery Assistant database, and this week’s issues include hyperlinked files, cooperation of counsel, proportionality, manner of production, ESI protocol, Slack, instant messaging, forensic examination, cloud computing, and metadata.

Facts

We are before the Court on an order resolving outstanding ESI protocol disputes. This is the second order on the parties’ ESI protocol. All of the previous rulings in this case are included in eDiscovery Assistant. In a previous ruling, Judge Cisneros entered Pretrial Order No. 9 and instructed the parties to meet and confer regarding cloud-stored document issues, metadata fields, and related provisions of the ESI protocol. Pretrial Order No. 9 also required the following:

Uber shall direct an employee with knowledge and expertise regarding Google Vault and Uber’s data and information systems to investigate in detail the extent to which Google Vault’s API, macro readers, Metaspike’s FEC or other programs may be useful to automate, to some extent, the process of collecting the contemporaneous version of the document linked to a Gmail or other communication within Uber’s systems, whether the email or communication is stored in Google Vault, or outside. This investigation shall not be limited to documents referenced by URL or hyperlinks in emails or Google documents stored in Google Vault, but shall also include other cloud-based messages such as Slack. Uber’s designated employee may consult with Uber’s e-discovery experts. Likewise, Plaintiffs shall also more thoroughly investigate these potential solutions.

Following that Order and the parties’ meet and confers, the parties submitted a joint discovery letter outlining to the Court the issues that they could not agree upon, together with affidavits or declarations from employees and experts. Uber submitted declarations from Uber’s eDiscovery Manager, the Vice President of Global Advisory Services at Lighthouse, Uber’s vendor, and Uber’s eDiscovery expert. Plaintiffs submitted an additional declaration also from their expert.

Uber uses Google Workspace, which includes Gmail, Google Chat, Google Drive, and Vault. Uber uses Google Vault as its information governance and ediscovery tool for its Google Workspace data. Through Google Vault, Uber retains and holds, among other things, users’ Gmail messages and Google Drive files. Uber’s standard discovery process involves exporting a custodian’s Gmail messages and Google Drive files from Google Vault. Data that is not stored using Google Vault is sometimes referred to as “active data.”

At issue here are the dreaded hyperlinked files that reference a Google Drive document that “may still be evolving”, meaning that a recipient or others may modify that reference document from a central location that allows multiple people to edit it. Some important points about how Google Drive and Vault function with regard to exporting versions of hyperlinked files came from affidavits submitted by Uber and include the following:

Google Vault does not export, collect, or connect the contemporaneous versions of the hyperlinked documents with the corresponding emails or messages in which they are found. Rather, when a hyperlinked Google Drive document is exported from Google Vault, the current version of that document is exported. If a Google Drive document archived using Google Vault was edited after the email with the hyperlink to the document was sent, then the Google Vault export will not reflect the version of the document that existed at the time of the email. For data archived using Google Vault, and no longer in the active Google Workspace, there is a manual process in place to identify a historic version of a hyperlinked Google Drive document contemporaneous with the email communication.

If that makes your head spin, what it effectively means is you have to do a manual review of each one of these documents to be able to provide the contemporaneous version. That’s a heck of a lot of burden when we’re talking about an MDL-type litigation here.
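To make that manual step a little more concrete, here is a minimal Python sketch of the selection logic only: given a document's revision history and the time an email was sent, pick the latest revision saved at or before that time. The `Revision` type, its fields, and the sample data are hypothetical. The sketch assumes the revision history has already been retrieved somehow, which is exactly the part that is hard for data archived in Google Vault.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """Hypothetical record of one saved version of a document."""
    revision_id: str
    modified_at: datetime


def contemporaneous_revision(
    revisions: Sequence[Revision], sent_at: datetime
) -> Optional[Revision]:
    """Return the latest revision saved at or before sent_at, or None."""
    candidates = [r for r in revisions if r.modified_at <= sent_at]
    return max(candidates, key=lambda r: r.modified_at, default=None)


# Illustrative data only: three saved versions and one email.
history = [
    Revision("r1", datetime(2024, 1, 5, 9, 0, tzinfo=timezone.utc)),
    Revision("r2", datetime(2024, 1, 12, 15, 30, tzinfo=timezone.utc)),
    Revision("r3", datetime(2024, 2, 1, 8, 45, tzinfo=timezone.utc)),
]
email_sent = datetime(2024, 1, 20, 10, 0, tzinfo=timezone.utc)

match = contemporaneous_revision(history, email_sent)
print(match.revision_id if match else "no revision predates the email")  # prints r2
```

Note that this only yields the version "likely present" when the email was sent. It says nothing about which version a recipient actually opened.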

The Court then reviewed the technology that’s currently available to link email and chat messages to Google Drive documents and the limitations of the two tools that were raised to do it. The first, Metaspike’s Forensic Email Collector (“FEC”) program, was the tool that was initially raised by the plaintiffs in the Nichols v. Noom Inc. case back in 2021. That was the first decision from Judge Katherine Parker in the Southern District of New York on this hyperlinked files issue. FEC can retrieve active Google email and contemporaneous versions of linked Google Drive documents, but it does not have the same ability to do that with Google email and Drive documents that are archived using Google Vault. So, big difference here between what’s active in Google Drive and FEC’s ability to grab versions of hyperlinked documents versus what’s archived in Google Vault. Important distinction.

Uber’s ediscovery vendor, Lighthouse, developed another tool that’s being considered called Google Parser that extracts specific links to Google Drive documents from email and chat messages and certain metadata. Google Parser facilitates the grouping together of a message and document stored in Google for purposes of review and production, and it contains certain metadata fields relevant to search, review, and production of the message. However, there’s no evidence that this technology — which is an extraction tool — has been refined and deployed to collect contemporaneous versions of hyperlinked documents archived within Google Vault. So we don’t know whether Google Parser will work to collect the contemporaneous versions from Google Vault. We know that FEC does not.

Following a review of the facts and evidence submitted by affidavit from the parties and citing Nichols v. Noom Inc., the Court stated:

In sum, the briefing and evidence, as well as related case law, have made clear that cloud computing and document retention through Google Drive and Google Vault introduce a host of challenges to producing hyperlinked documents from Google Drive and other sources.

But the Court also recognized the importance of having contemporaneous versions associated with a message or email:

[C]ontemporaneous versions of hyperlinked documents can support an inference regarding ‘who knew what, when.’ An email message with a hyperlinked document may reflect a logical single communication of information at a specific point in time, even if the hyperlinked document is later edited. Thus, important evidence bearing on claims and defenses may be at stake, but the ESI containing that evidence is not readily available for production in the same manner that traditional email attachments could be produced.

So what we have is the Court recognizing the technological limitations on the one hand, but also the value of the evidence to the receiving party on the other hand.

Analysis

There are three issues for analysis:

  1. Whether Uber can and is required to collect contemporaneous versions of hyperlinked documents,
  2. What metadata fields the parties must provide related to the hyperlinked files, and
  3. What is the definition of attachment for purposes of this proceeding.

The Court begins its analysis with a review of the parties’ proposed language on hyperlinked files and whether Uber can collect contemporaneous versions of those files, either in an automated fashion or manually. As to an automated fashion, Uber took the position that there was no technical stable solution available that successfully automates the process of collecting contemporaneous versions of hyperlinked documents. Uber proposed language that allowed them to use the Google Parser tool from Lighthouse and take reasonable steps to preserve the relationship between email messages and hyperlinked files and for the parties to continue to meet and confer on the issue. So the language that they said was, let us use Google Parser and we’ll do whatever we can do in an automated fashion, but that’s the best we can do because that’s what’s available right now.

The plaintiffs claimed to have created a “proof of concept” program, which supposedly demonstrates that there is a method available to programmatically retrieve contemporaneous versions of linked Google Drive documents from Google Drive. Plaintiffs’ proposed methodology provides that Uber is to create such a program based on plaintiffs’ proof of concept to produce contemporaneous versions of the documents within Google Drive. Plaintiffs proposed language for the ESI protocol that required full compliance with preservation of all hyperlinked files and required Uber to produce all contemporaneous versions where possible and to meet and confer as necessary.

The plaintiffs’ proposed language did not limit Uber’s obligations to produce the contemporaneous version attached to an email message or corresponding with an email message. I encourage you to go into this decision and view the full language that’s proposed by both parties. Both sides’ proposals are thoughtful and well constructed on these issues, but they take very different positions.

After reviewing the submissions, the Court was not persuaded that the plaintiffs’ proposed methodology was commercially ready to automate the collection of hyperlinked files, or as the Court calls it, not a “reasonably available option” for a couple of reasons.

First, the plaintiffs’ expert’s firm created the proof of concept program based on a post on Stack Overflow, which is a well-known and widely used forum for developers, and that post contains scripts purportedly for retrieving a date-specific revision of a Google Drive, Google-native document identified by its document ID. If you’re not familiar with Stack Overflow, it’s essentially a question and answer website for computer programmers. But Uber pointed out that the anonymous user who posted that script on Stack Overflow admitted that it didn’t work. And according to Uber, even “a functioning version of the script would not address the issues presented here, in part because the script was designed for a single document using the Google Drive API, restoring a non-Vault document, with owner access” and was not designed for Google Vault.

Again, remember that the big issue here is that Uber has stored its documents in Google Vault, and the tools that are available to go and get these contemporaneous versions when it’s active data don’t work with Google Vault. Uber also objected to many other aspects of the proof of concept program, which it claims ignores necessary steps in the collection process and disregards numerous points of manual intervention.

Relying on Nichols v. Noom Inc., in which Judge Parker rejected a proposal for Noom to develop a similar program in 2021, the Court here noted that while the proof of concept program may eventually allow for the creation of a program that automates the process of collecting the contemporaneous versions of Google Drive hyperlinked documents from archived Google Vault data, the Court would not order Uber to develop such a program to produce discovery in this case. No surprise there.

The Court also found, based on the submissions, that there is currently no technological solution available to automate the process of collecting contemporaneous versions of hyperlinked documents. Now, that’s not surprising again. We all know there is no commercially available solution to solve this problem, or we wouldn’t be having this discussion today. But having a Court as influential as the Northern District of California on ediscovery issues state it specifically for us is key to ediscovery case law.  

The Court then looked at the manual effort required to provide contemporaneous versions of hyperlinked documents because there is no automated solution available. Uber argued that the process was too burdensome, but the Court, citing Shenwick v. Twitter, Inc., recognized that Uber was aware of the limitations and pitfalls with respect to production of hyperlinked files from Google Vault, and yet it still chose Google Vault as its storage method. The irony here, of course, is that there’s no way to provide contemporaneous versions of hyperlinked documents from the other most widely used tool — Microsoft Teams — either. So electing to use Google Vault cannot be viewed as a way to preclude this discovery, only a reality of the technology as it currently exists.

Based on its findings, the Court ordered the following three provisions on metadata, hyperlinked files, and contemporaneous versions:

On metadata, Uber was required to preserve the metadata relationship between emails and hyperlinked files “to the extent feasible with existing technology” and “preserve and produce” all standard metadata fields for all cloud-based documents, and that the metadata be applied properly where that document exists in a load file. That’s a pretty good statement for the plaintiffs here, and that gets them back to a little bit more of a level playing field with regard to metadata, because they will at least be able to conduct a review using metadata — searching, filtering — to be able to identify where they don’t have contemporaneous versions.

On hyperlinked documents, the Court required the producing party — so both sides — to “make all reasonable efforts to maintain and preserve the relationship between any message or email and any cloud-hosted document hyperlinked or referenced within the message or email.” For example, where a collected email links to or references the URL of a Google Drive document, the Court required that “the metadata for that message or email shall include the URLs and Google Document ID of all hyperlinked documents.” Not may, shall. Pay attention to that language — “make all reasonable efforts.” It’s always the same standard that we come back to in ediscovery. Reasonableness is our floor.
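As a rough illustration of what populating that metadata could look like, here is a minimal Python sketch that pulls Google Drive URLs and document IDs out of a message body. The field names `LinkedUrls` and `LinkedDocIds` are hypothetical, and the pattern covers only a few common link shapes. It is not how Google Parser or any other tool named in the decision works.

```python
import re
from typing import Dict, List

# A few common Google Docs/Drive link shapes; assumed, not exhaustive.
_DRIVE_LINK = re.compile(
    r"https://(?:docs|drive)\.google\.com/"
    r"(?:(?:document|spreadsheets|presentation|file)/d/|open\?id=)"
    r"([A-Za-z0-9_-]+)[^\s\"'<>]*"
)


def link_metadata(message_body: str) -> Dict[str, List[str]]:
    """Collect each distinct linked document's URL and ID, in order of appearance."""
    urls: List[str] = []
    doc_ids: List[str] = []
    for m in _DRIVE_LINK.finditer(message_body):
        doc_id = m.group(1)
        if doc_id not in doc_ids:
            urls.append(m.group(0))
            doc_ids.append(doc_id)
    return {"LinkedUrls": urls, "LinkedDocIds": doc_ids}


# Illustrative message text with made-up document IDs.
body = (
    "Draft is at https://docs.google.com/document/d/1AbC_dEf-123/edit "
    "and the budget is at https://drive.google.com/open?id=9ZyX_654 for review"
)
print(link_metadata(body)["LinkedDocIds"])  # prints ['1AbC_dEf-123', '9ZyX_654']
```

The document ID matters because it identifies the same file however the link was formatted, which makes it a steadier key than the URL for tying a message to a produced document.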

As to contemporaneous versions of hyperlinked documents, the Court required Uber to produce

“to the extent feasible on an automated, scalable basis with existing technology, the contemporaneous document version, i.e. the document version likely present at the time an email message was sent, of Google Drive documents referenced by URL or hyperlinks therein. For hyperlinked Google Workspace data archived using Google Vault, Uber is not required to produce the contemporaneous document version at the time the email or message was sent, as this is not possible through an automated process with existing technology. However, Plaintiffs may identify up to 200 hyperlinks for which they seek the contemporaneous referenced document even though the email or message has been archived with Google Vault. Uber shall identify and produce the likely contemporaneous versions that Plaintiffs have requested. The scope of this production does not exempt Uber from any obligation that it preserve historic versions or revision history of any document referenced by URL or hyperlink.”

The Court did allow for the parties to stipulate or ask for a Court order to seek additional contemporaneous documents that are relevant. So above and beyond the 200 documents that it ordered, anything that’s relevant and proportional to the needs of the case can be requested by stipulation after the parties meet and confer, or by request to the Court.

The Court then turned to the two metadata fields that the parties could not agree on. This is the second issue. Those two metadata fields are LINKGOOGLEDRIVEURLS and the account field, both of which the Court proposed to be excluded in its original Pretrial Order No. 9. LINKGOOGLEDRIVEURLS is a custom-created metadata field designed to identify which Google Drive documents have not been produced.

Uber argued that the plaintiffs dropped this on them at the last minute and that it was overly burdensome. Plaintiffs then proposed an alternative approach to add two metadata fields called missing Google Drive attachments and non-contemporaneous. Both of those are really long metadata field names. Missing Google Drive attachments identifies the Google Drive documents that were not produced, providing links to all linked Google Drive documents that could not be retrieved. You can see the value of that for the receiving party. Non-contemporaneous identifies the produced Google Drive documents that were not the versions in existence at the time the document was hyperlinked, i.e., the non-contemporaneous version of the document that was produced — again, a metadata field that will really assist in review and allow the plaintiffs to request contemporaneous versions of the documents.

The Court granted the plaintiffs’ request for the metadata fields, holding:

Plaintiffs’ new metadata proposals appear helpful to streamline review of Uber’s productions because the metadata will identify which hyperlinked documents are missing from the production and which documents produced are the non-contemporaneous versions. Streamlining review of the production of hyperlinked documents will advance the speedy and less expensive determination of this action.

And with that, the Court cites Federal Rule of Civil Procedure 1.
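To show how those two fields help a reviewer, here is a minimal Python sketch that derives them for a single email from the links it contains and a record of what was produced. The exact field spellings, the `ProducedDoc` type, and the sample URLs are assumptions for illustration. The decision does not prescribe an implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProducedDoc:
    """Hypothetical production record for one Google Drive document."""
    url: str
    is_contemporaneous: bool


def review_fields(linked_urls: List[str], produced: Dict[str, ProducedDoc]) -> Dict[str, str]:
    """Build the two review fields for one email from its linked URLs."""
    # Links whose documents were never produced.
    missing = [u for u in linked_urls if u not in produced]
    # Links whose produced version is not the one in existence when the email was sent.
    stale = [
        u for u in linked_urls
        if u in produced and not produced[u].is_contemporaneous
    ]
    return {
        "MissingGoogleDriveAttachments": ";".join(missing),
        "NonContemporaneous": ";".join(stale),
    }


# Illustrative data: one email linking three documents, two of which were produced.
url_a = "https://docs.google.com/document/d/doc-A/edit"
url_b = "https://docs.google.com/document/d/doc-B/edit"
url_c = "https://docs.google.com/document/d/doc-C/edit"
produced = {
    url_a: ProducedDoc(url_a, is_contemporaneous=True),
    url_b: ProducedDoc(url_b, is_contemporaneous=False),
}

fields = review_fields([url_a, url_b, url_c], produced)
print(fields["MissingGoogleDriveAttachments"])  # prints the doc-C URL
print(fields["NonContemporaneous"])             # prints the doc-B URL
```

With fields like these in the load file, the receiving party can filter for emails where either field is populated and use that list to decide which of its 200 requests to spend.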

The final issue before the Court was the definition of the word attachment, which is fraught, given that with cloud computing applications like Google Mail, files are no longer physically attached to a message. Uber’s definition excluded hyperlinked files as attachments, whereas the plaintiffs’ proposal included them. The Court adopted plaintiffs’ proposal with a caveat consistent with its order on contemporaneous versions, and adopted this language for the parties:

Attachment(s) shall be interpreted broadly and includes, e.g., traditional email attachments and documents embedded in other documents (e.g., Excel files embedded in PowerPoint files) as well as modern attachments, pointers, internal or non-public documents linked, hyperlinked, stubbed or otherwise pointed to within or as part of other ESI (including but not limited to email, messages, comments or posts, or other documents). This definition does not obligate Uber to produce the contemporaneous version of Google Drive documents referenced by URL or hyperlinks if no existing technology makes it feasible to do so.

That’s where we leave the Court in terms of what it said. We’ve got distinct definitions, we’ve got metadata fields agreed upon, and we’ve got an order with regard to contemporaneous versions of documents.

Takeaways

This decision from Judge Cisneros is the first time a court has truly waded into the technological issues involving hyperlinked files, weighed the considerations, and provided effective guidance on how to handle hyperlinked files and associated metadata based on the legal standards in ediscovery. It’s what we’ve all been waiting for in hoping to understand a requesting party’s ability to ask for this information and the scope of a party’s obligation to produce this information. While this issue of hyperlinked files weighs most heavily on the party leveraging the technology, the reality is two-fold: that every party now leverages cloud-based applications that involve hyperlinked files, and under the Federal Rules, a receiving party is entitled to the same value of the evidence that a producing party has.

This is not an issue that is limited to large MDL cases like the one before us. It’s inherent in almost every litigation of every size. The issue is the technological limitations to provide that evidence in the form of linked files, and those limitations effectively impose a greater burden on the receiving party to review information and identify where a hyperlinked file or contemporaneous version is not produced that is relevant or key to the issues in the case. Mitigating that burden is what the court’s decision does here by allowing for additional metadata fields and identifying that hyperlinked files are attachments.

So what do we take away as lessons from this decision? Well, several things.

First, you need effective representation that understands the nuances of these technological issues and can present them to the court effectively. If you don’t have it, get it or you will lose. There’s case law proving that.

Second, you need to understand the capabilities and limits of the technology that exists at the time of your dispute, I mean right at the time of your dispute, because it changes very quickly.

This decision is from April 23, 2024, just a week ago as of this broadcast, and it is based on Pretrial Order No. 9, which was issued just five weeks earlier on March 15, 2024. The latest technological advances from Microsoft and Google to try and address these versioning issues with hyperlinked files were released in early 2024, so those would be reflected in the declarations discussing the technological capabilities that the Court relied on here in issuing its decision.

In two weeks, six weeks, six months, or a year, we will almost certainly see technological advancements either by Microsoft or Google or by third parties to solve this versioning problem and create metadata fields for hyperlinked files. This is not dissimilar to the process that we had to develop way back with emails and physical attachments when we first started in ediscovery. Some of us are old enough to remember that.

Key to this Court’s decision are the declarations submitted on this discovery letter, but also what were obviously productive and thoughtful meet and confer sessions among counsel. I don’t often get to talk about cases in which that’s happened on the Case of the Week, so hats off to counsel for providing an example of how to engage and bring a thoughtful and productive motion to the court. There’s nothing wrong with bringing discovery disputes to the court, particularly of this specific nature. The more thought out and well-reasoned they are, the better the result and the better the law that is made for all of us. Here, Judge Cisneros did an excellent job of balancing the parties’ needs for relevant and proportional evidence with the technological limitations of providing it.

The language adopted by the Court here is what I consider to be an excellent blend of the technological capabilities available with the requirements to preserve and produce discovery under the Rules, and the corresponding limits of relevance and proportionality. Add in a dash of counsel’s obligations under Rule 26(g) that its investigation is reasonable, and the receiving party’s ability to come back and ask for more where warranted, and you have a great recipe for success on this issue with the current technological limitations.

Well done, Judge Cisneros, and thank you.

Conclusion

That’s our Case of the Week for this week. Thanks so much for joining me. We’ll be back again next week with another decision from our eDiscovery Assistant database.

As always, if you have suggestions for a case to be covered on the Case of the Week, drop me a line. If you’d like to receive the Case of the Week delivered directly to your inbox via our weekly newsletter, you can sign up on our blog. If you’re interested in doing a free trial of our case law and resource database, you can sign up to get started.


