Tags: eDiscovery
ROI: Machine Learning in Small-Scale Review
The question has persisted since technology-assisted review (TAR) got its start: How big does a case need to be before it makes sense to load it into a TAR system? Is 30,000 documents enough? How about 300,000? In this recent case, the collection was just 33,000 documents, yet TAR enabled the company to cut its review by 72% and finish in just a week.
Facing a tight budget and a looming production deadline, counsel chose Catalyst Predict, an industry-leading Continuous Active Learning (CAL) tool. With CAL, the system continuously updates its document rankings as reviewers make judgments. As training continues, the rankings improve, so the review team finds relevant documents faster. The pool of unfound relevant documents quickly shrinks, allowing the team to test the results and defensibly end the review.
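Catalyst Predict's model is proprietary, but the core CAL mechanic described above can be sketched in a few lines of plain Python. This toy example (hypothetical documents, a trivial word-overlap scorer standing in for a real classifier) shows the loop: re-rank the unreviewed pool after every batch of reviewer judgments, always serve the top of the ranking next, and stop when batch relevance dries up.

```python
# Generic CAL loop sketch. The word-overlap "scorer" is a stand-in for
# a real machine-learning model; the loop structure is the point.

# Toy collection (hypothetical): 5 relevant docs among 45 junk docs.
docs = (["breach of contract damages clause"] * 5
        + ["fantasy football lunch invite"] * 45)
labels = [1] * 5 + [0] * 45          # ground truth the reviewers supply

def score(doc, reviewed):
    """Rank by word overlap with reviewed-relevant minus reviewed-junk."""
    words = set(doc.split())
    s = 0
    for i in reviewed:
        overlap = len(words & set(docs[i].split()))
        s += overlap if labels[i] else -overlap
    return s

reviewed = [0, 5]                    # seed judgments: one relevant, one junk
BATCH = 5
while len(reviewed) < len(docs):
    pool = [i for i in range(len(docs)) if i not in reviewed]
    # Re-rank the unreviewed pool using all judgments so far.
    pool.sort(key=lambda i: score(docs[i], reviewed), reverse=True)
    batch = pool[:BATCH]             # serve the highest-ranked docs next
    reviewed += batch                # reviewer tags come back as labels[batch]
    if sum(labels[i] for i in batch) == 0:
        break                        # batch relevance hit zero: stop reviewing

found = sum(labels[i] for i in reviewed)
print(f"found {found}/{sum(labels)} relevant after reviewing {len(reviewed)} docs")
```

Because the ranking is refreshed every batch, the reviewers see relevant documents early and the loop ends well before the whole collection is touched.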
Only 4% of the entire collection was materially relevant. By prioritizing these records and defensibly culling the junk, we saved 53% in review costs.
Using Legility workflows, an experienced review team, and the CAL algorithm, five reviewers performed the work of 20 attorneys and completed the entire review in just seven days.
With CAL, there is no need for a control set or initial training by a subject-matter expert, so the review team was able to start right away. Review is training, and training is review.
As the team reviewed documents, Predict continuously learned from their tagging and presented increasingly relevant batches to the reviewers. Relevance in the batches quickly rose to as high as 80%.
In just days, batch relevance dropped to single digits as the team depleted the relevant population. The team stopped the review and drew a systematic sample to measure the recall achieved. The sample showed, with 98% confidence, that recall was between 94.3% and 100%. After reviewing just 28% of the collection, the team had reached a defensible stopping point.
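One common way to turn a stopping-point sample into a recall interval is to sample the culled (unreviewed) pile, estimate the relevant documents left behind, and propagate that through the recall formula. The article doesn't describe the exact sampling design used here, so all the counts below are hypothetical; the sketch uses a standard Wilson score interval for the sampled proportion.

```python
# Hypothetical elusion-sample calculation; not the case's actual numbers.
import math

def wilson(k, n, z):
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

found_relevant = 1330           # relevant docs found during review (hypothetical)
discard_size = 23760            # docs culled without review (72% of 33,000)
sample_n, sample_hits = 600, 2  # discard-pile sample: 2 relevant found (hypothetical)

z98 = 2.326                     # z-score for a 98% confidence level
lo, hi = wilson(sample_hits, sample_n, z98)
missed_lo, missed_hi = lo * discard_size, hi * discard_size
# Recall = found / (found + missed); the interval on "missed" flips direction.
recall_lo = found_relevant / (found_relevant + missed_hi)
recall_hi = found_relevant / (found_relevant + missed_lo)
print(f"recall between {recall_lo:.1%} and {recall_hi:.1%} at 98% confidence")
```

The fewer relevant documents the sample turns up in the discard pile, the tighter and higher the recall interval, which is what makes a low-elusion sample a defensible basis for stopping.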
13x Faster Review
With a prioritized review, we not only found the vast majority of the relevant documents, we found them faster. In the first two days of review, 90% of all relevant documents were identified; in seven days, the review was complete. By comparison, a linear review of these same records would have achieved a mere 28% recall after seven days of review.
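The linear-review comparison follows from a simple observation: when documents are reviewed in an arbitrary order, recall roughly tracks the fraction of the collection reviewed. This toy simulation (hypothetical richness matching the case's ~4%, and an idealized perfect ranker standing in for a real one) makes that concrete.

```python
# Toy comparison of recall at a fixed review budget: linear order vs
# prioritized order. Numbers are illustrative, not the case's data.
import random

random.seed(7)
N, RELEVANT = 33000, 1320                 # ~4% richness, as in the case
labels = [1] * RELEVANT + [0] * (N - RELEVANT)
random.shuffle(labels)                    # linear order: relevance scattered

budget = int(0.28 * N)                    # review 28% of the collection

linear_recall = sum(labels[:budget]) / RELEVANT
# Prioritized order: an idealized ranker front-loads the relevant docs.
# A real CAL ranker is imperfect but approaches this as training continues.
prioritized = sorted(labels, reverse=True)
prioritized_recall = sum(prioritized[:budget]) / RELEVANT

print(f"linear: {linear_recall:.0%}  prioritized: {prioritized_recall:.0%}")
```

Under a random linear order, reviewing 28% of the collection yields roughly 28% recall, while a well-trained prioritized queue exhausts the relevant population within the same budget.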
Does Machine Learning work on smaller datasets? Based on these results, the answer is a resounding YES!
By leveraging Continuous Active Learning and our workflows, we:
- Found 94.3-100% of all relevant documents
- Defensibly culled 72% of the collection without review
- Saved 53% in overall review spend