Ride the Lightning

Cybersecurity and Future of Law Practice Blog
by Sharon D. Nelson Esq., President of Sensei Enterprises, Inc.

Predictive Coding: Will it Gain Widespread Acceptance?

October 12, 2010

Thanks to Joe Howie for sending me his recent article with Anne Kershaw from Law Technology News on predictive coding. This was not a subject I knew a great deal about, so I enlisted Jesse Lindmar, one of Sensei's computer forensics examiners, to help me separate the wheat from the chaff. Thanks for the assist, Jesse!

First, what is predictive coding? Here's one definition: "A combination of technologies and processes in which decisions pertaining to the responsiveness of records gathered or preserved for potential production purposes … are made by having reviewers examine a subset of the collection and having the decisions on those documents propagated to the rest of the collection without reviewers examining each record."

So can you reliably produce e-discovery documents that have never been reviewed? It sounds unsettling, doesn't it? But Joe and Anne make a good case for it in their article.

They surveyed 11 e-discovery vendors who use predictive coding. On average, respondents reported that predictive coding saved 45% of the cost of normal review, beyond the savings that could be obtained by duplicate consolidation and e-mail threading. Seven respondents reported that in individual cases the savings were 70% or more.

And how does it work? Survey respondents described their processes. Six used queries as a component of predictive coding, and five used clustering, with some differences on whether terms could be inferred (e.g., whether a document that contained "Ford and Toyota" could find or associate documents that contained only the words "Chevy and Honda").
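For readers who want to see the basic idea in miniature, here is a rough sketch in Python of propagating reviewer decisions from a small reviewed subset to the rest of a collection. It is not any surveyed vendor's method; the article does not describe their algorithms. The function names, the sample documents, the word-overlap (Jaccard) similarity measure and the 0.2 threshold are all assumptions chosen for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word overlap between two token sets: shared words / all words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propagate(seed, unreviewed, threshold=0.2):
    """Give each unreviewed document the decision of its most similar
    reviewed document. If nothing is similar enough, send it to a human.

    seed: list of (text, decision) pairs coded by reviewers (hypothetical).
    unreviewed: dict mapping document id to text (hypothetical).
    threshold: assumed cutoff, not a recommended value.
    """
    seed_tokens = [(tokens(text), decision) for text, decision in seed]
    results = {}
    for doc_id, text in unreviewed.items():
        doc_tokens = tokens(text)
        # Find the reviewed document with the highest word overlap.
        score, decision = max(
            ((jaccard(doc_tokens, st), d) for st, d in seed_tokens),
            key=lambda pair: pair[0],
        )
        results[doc_id] = decision if score >= threshold else "needs_review"
    return results


# Made-up documents for illustration only.
seed = [
    ("Ford and Toyota pricing agreement for dealers", "responsive"),
    ("Office holiday party lunch menu", "not_responsive"),
]
unreviewed = {
    "doc1": "Toyota dealers pricing update",
    "doc2": "Lunch menu for the holiday party",
    "doc3": "Chevy and Honda",
}

decisions = propagate(seed, unreviewed)
for doc_id, decision in decisions.items():
    print(doc_id, decision)
```

In this toy run, doc1 and doc2 inherit the decisions of the reviewed documents they resemble, while doc3 falls below the threshold and is routed to a reviewer. That is the "Ford and Toyota" versus "Chevy and Honda" problem in a nutshell: a purely literal word-overlap approach cannot infer that the two are related, so any such association would have to come from a system that models concepts rather than exact terms.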

I asked Jesse for his opinion about predictive coding and he summarized it this way:

"It seems to be a human-computer categorizing method that is a hybrid of other already existing methods. The expected result is to limit what information needs to be reviewed. Pluses are that the process is audited and reproducible (if the audit trail is followed) and, if done correctly, it should be more effective than linear review (although any of the other methods, if used correctly, are better than linear review). Negatives are that there has been no judicial acceptance and if the "coding" methodology is flawed you could be ignoring relevant data. Of course every vendor offering the service calls it a different name and uses a different process, so who knows if results would be reproducible across different platforms (most likely no)? Across the board, the vendors agree its best use is on large scale, limited time-frame cases."

It doesn't appear that many vendors employ predictive coding yet, perhaps partly (as Jesse notes) because judicial acceptance of it remains uncertain.

I had the good fortune to have lunch last week with Craig Ball (who generously gave me a lightning-quick tour of his iPad, enough to engender iPad envy), so I asked Craig what he thought of predictive coding. He is, in his words, "a fan." As Craig observed, there are no silver bullets in e-discovery. And it is also true that GIGO (garbage in, garbage out) applies to predictive coding: you are at the mercy of the programmer's skills. But done correctly, it can save time and money, with quality results. He expects more widespread adoption and believes the bench will welcome predictive coding, so long as it is "done the right way."

Predictive coding is still in its infancy and I would continue to be wary of vendor claims until the baby has walked for a while. But as we struggle to manage vast amounts of electronic data, it looks like predictive coding may prove to be an invaluable tool.

Phone: 703-359-0700

www.senseient.com

http://twitter.com/sharonnelsonesq