Ride the Lightning

Cybersecurity and Future of Law Practice Blog
by Sharon D. Nelson Esq., President of Sensei Enterprises, Inc.

Two Rival Predictive Coding Books: Doubledown for Dummies

December 5, 2012

Thanks to attorney Jeff Reed, a litigation and e-discovery veteran, for penning his thoughts on the two Predictive Coding for Dummies books published by Clearwell (Symantec) and Recommind. You can learn more about Jeff at the end of the post.

*****************

I’m usually a big fan of the John Wiley & Sons “. . . For Dummies” series. I had one for my tiny Honda Civic hatchback back in the early 1980s. The recipe for manifold chicken (a whole chicken, onions, carrots and potatoes wrapped in foil and bungee-corded to the engine for a long trip) was a revelation, and the maintenance tips were really helpful as well. In my experience, these guides follow one of two models. The first is a rather thin volume that skims through the basics and maybe alludes to some advanced material, but aspires to little, if anything, more. The Honda guide I had fell into the other group: a rather burly thing that had tips on simple stuff like topping off the coolant or oil reservoirs, but could also walk you step by step through the complex process of tearing down and rebuilding your entire engine.

Symantec and Recommind both recently published guidebooks titled “Predictive Coding for Dummies.”[1] Two guides to predictive coding with almost identical names seem, on their face, needlessly confusing. And considering that Recommind devoted 25% less space to the topic even though its guide was published well after the courts started to parse the issues, my confusion deepened: Had the issues become simpler? Was Symantec’s approach needlessly complex? Oh well, I guess I had to read the guides after all.

Symantec’s Version (2012)

Unfortunately, Symantec confused me more, right from the outset, as the introductory paragraph told me that it was “perfect” for me, regardless of whether I was an expert or novice. Huh? I thought this book was for Dummies (like me). Now I had to worry about deciphering “expert” terminology? I kept reading, only slightly daunted.

I needn’t have worried. The pamphlet seems to have been written by a Wiley staff writer who was carefully avoiding mistakes, but whose prose jerks this way and that as he (or she) assembles the basic facts of discovery, ediscovery and our now-digital world. Unfortunately, the result would leave a high school student bored or laughing out loud. It also wastes space and time explaining what the pamphlet will explain in this or that chapter, and then follows with a typical topic sentence that again, in more adult fashion, explains what it is about to explain. A reader with a professional interest might well be insulted by the condescending tone, or by the lack of any deep content across all 51 pages.

One example of the poor organization and writing is the misuse of the content icons. As with most of the “For Dummies” series, the pamphlet states that it uses a set of four visual icons to mark various kinds of important content (“Remember,” “Technical Stuff,” “Tip,” and “Warning!”). At 2. The text associated with the second “Technical Stuff” icon relates to tagging, but the next paragraph receives no special treatment even though it attempts to deal, superficially, with the “iterative” process (not explained, although certainly not Dummy terminology) of training the software through the use of “document training sets” to achieve “appropriate performance levels.” That paragraph gets no icon even though, for many lawyers, knowing how many training sets to use, how to determine the appropriate performance levels, and even how to select relevant or determinative performance parameters are possibly the most difficult technical issues in establishing a defensible predictive coding workflow. Similarly, Chapter 3 uses no “Technical Stuff” icons at all, even though it purports to define technical terms such as yield, confidence level, precision and recall. At 12-14. These kinds of things continue, but I doubt a complete catalogue of them would serve any purpose.

Chapter 3 fails to work for me on another level as well. At 11-20. It sets out a sample document collection followed by six steps of a general workflow. In these steps it talks about a number of different calculations and how important they are to the process, then simply states that the “calculations are beyond the scope of this pamphlet.” Then why the effort to set up the sample document collection? Showing the calculations is essential. Math-averse lawyers and judges need to get over it. The essence of predictive coding is math. It is the only way to verify that the entire system – software, training, performance parameters, and the workflows used to provide the input to the system – provides a defensible result. Some guide for the experts.
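For readers wondering what the missing calculations look like at their simplest, here is a minimal Python sketch of three of the measures the pamphlet names: precision, recall and yield. It is an illustration under stated assumptions, not the pamphlet’s method: the function `review_metrics` and all the counts are hypothetical, “yield” is assumed to mean the share of relevant documents in the whole collection (often called prevalence or richness), and the counts are assumed to cover the entire collection rather than a statistical sample. Real validation protocols estimate these figures from samples and attach confidence levels, which this sketch does not attempt.

```python
def review_metrics(true_pos: int, false_pos: int, false_neg: int,
                   collection_size: int) -> dict:
    """Compute precision, recall, and yield for a predictive coding run.

    Hypothetical helper: assumes the counts cover the whole collection,
    not a statistical sample of it.
    """
    # Precision: of the documents the software flagged, how many were relevant?
    precision = true_pos / (true_pos + false_pos)
    # Recall: of all relevant documents, how many did the software find?
    relevant = true_pos + false_neg
    recall = true_pos / relevant
    # Yield (assumed here to mean prevalence): relevant share of the collection.
    yield_rate = relevant / collection_size
    return {"precision": precision, "recall": recall, "yield": yield_rate}


# Hypothetical run: 100,000 documents, 12,000 flagged, 9,000 of them relevant,
# and 1,000 relevant documents the software missed.
metrics = review_metrics(true_pos=9_000, false_pos=3_000, false_neg=1_000,
                         collection_size=100_000)
print(metrics)  # {'precision': 0.75, 'recall': 0.9, 'yield': 0.1}
```

Even this toy run makes the point: a recall of 0.9 means roughly one relevant document in ten was never found, and that is the kind of figure a defensible workflow has to be able to explain.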

The rest of the pamphlet, with one exception, limps along in the same manner. There is a decent amount of information presented and, again, there are no glaring mistakes. But the information is pitched at a general-interest level suited to press releases rather than at a level that would make sense to either a technical or a legal participant in the ediscovery process. The one exception is the section titled “Asking the Right Questions.” At 39-40. The questions set out there should help anyone think more deeply about their ediscovery needs in general and their predictive coding needs in particular, as well as about any vendor they are considering or already using.

The problem with the pamphlet, ultimately, is not that it is merely simplistic or introductory. Any pamphlet with this particular title should not aspire to anything more. But the text does. And it disappoints. Moreover, there is no consistent or reasonably complete attempt to send the introductory reader (or the advanced reader, for that matter) to more sophisticated resources that will help them grapple with the details of the technical and legal issues inherent in the predictive coding process.

Recommind’s Version (2013)

The Recommind pamphlet is open to some similar criticism. For example, the claim that the “costs and scope of eDiscovery started to escalate in 2006” with the adoption of the Federal Rules amendments specifically targeting ediscovery, at 4, is made without citing any supporting data. It also flies in the face of the Rules’ commentary, which states that the amendments were adopted in response to cost and scope escalations that were already taking place without any guidance from the Federal Rules.

For the most part, the information presented flows better than in the Symantec version. This pamphlet also sticks to providing a very general, non-technical overview of predictive coding and where it fits within the general processes of discovery and ediscovery. Because it consistently stays within these limits, this guide probably serves its intended ends better (that is, until the last six pages, at 23-28, which are devoted to a somewhat superficial explanation of why Recommind’s products are superior). Admittedly, this pamphlet misses as many of the deep technical and legal issues as the Symantec pamphlet does, or more. But unlike Symantec, Recommind apparently never intended to include that material. As with the longer Symantec version, the Recommind version could also have benefited from a bibliography, footnotes, or endnotes pointing to more sophisticated resources.

Finally, as with the Symantec version, the Recommind folks include a two-page section on finding a predictive coding solution. At 21-22. Although the questions listed are not as probing, the section still provides some guidance to those mere mortals daring to go where angels fear to tread. Again, it would have benefited from a reference to more thorough treatments of the topic.

Recommind picked the “thin” model from the Wiley line-up and stuck to it, turning out a coherent, readable guide. It is not particularly useful as a complete how-to manual, but that was not its purpose or design. Symantec’s version suffers from a number of problems. It is not particularly well written, fails to use its own tools (like the icons) consistently, and fails to fulfill one of its primary objectives (to be a guide for the expert) because it omits the calculations and other information on the technical issues it raises. Both are suitable primers for the uninitiated, but neither should find a permanent place on anyone’s “ready-reference” bookshelf.

Jeff Reed is a 30-year litigator currently based in the Washington, D.C., area. He has handled litigation as an in-house attorney for Fortune 50 companies and as a first-chair attorney for large and medium-sized law firms. For the last 10 years he has concentrated on electronic discovery issues all across the EDRM continuum for both private- and public-sector clients. Jeff is alternately amazed and abashed by developments in the ediscovery space. Learn more at www.linkedin.com/in/jeffreyreedesq. Contact Jeff at


[1] Predictive Coding for Dummies, Recommind Special Edition, John Wiley & Sons, Inc., 2013; and Predictive Coding for Dummies, Symantec Special Edition, John Wiley & Sons, Inc., 2012.

Further comments on these two books? Feel free to e-mail me!

E-mail: www.senseient.com

http://twitter.com/sharonnelsonesq