Ride the Lightning

Cybersecurity and Future of Law Practice Blog
by Sharon D. Nelson Esq., President of Sensei Enterprises, Inc.

MD5 COLLISIONS – ARE THEY REAL? A PRACTICAL THREAT?

October 6, 2008

There seems to be a lot of confusion over the validity and use of MD5 hashes, especially as they relate to electronic discovery. In a previous post, I mentioned that identical files will have the same hash value. What about the reverse? Can a single MD5 hash value represent two different files? The scientific answer is yes. An MD5 hash is always 128 bits long, no matter how large the file, so there are far more possible files than there are possible hash values.
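For readers who want to see the "identical files, identical hash" point for themselves, here is a minimal sketch in Python using the standard hashlib module. The helper names md5_of_file and md5_of_bytes, and the sample text, are made up for illustration; this is not how any particular forensic tool does it.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hash of a file as a hex string, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large evidence files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_bytes(data):
    """Return the MD5 hash of in-memory bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical sample content, used only for this demonstration.
original = b"Exhibit A: signed agreement"
exact_copy = bytes(original)
altered = b"Exhibit A: signed agreement."  # one extra character

print(md5_of_bytes(original) == md5_of_bytes(exact_copy))  # True
print(md5_of_bytes(original) == md5_of_bytes(altered))     # False
```

The hex string is always 32 characters (128 bits) regardless of input size, which is why the reverse question has the answer it does.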

In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash. This is commonly termed an MD5 collision in the computer forensics field. Does this negate the usage of MD5 hash values as an authenticity mechanism? Certainly not. There have been several examples of different files having the same MD5 value. Generally, these examples are just groupings of electronically stored information that serve no useful purpose other than proving that an MD5 collision is possible. These sample files are contrived in order to prove the point. Since the original discovery, several enterprising individuals have actually created usable files to demonstrate the collision. As an example, there are samples of two different PDF files (with the same MD5 hash value) that have different contents. All of these collision examples use the same 128-byte technique used by Xiaoyun Wang and Hongbo Yu.
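To get a feel for what a "contrived" collision looks like, here is a toy sketch in Python. It does NOT use the Wang and Yu technique and does not collide full MD5 hashes. Instead it assumes a deliberately weakened hash, keeping only the first four hex characters (16 bits) of the MD5 value, and tries made-up messages until two of them share that shortened value. The function names and the "message-N" inputs are hypothetical.

```python
import hashlib
from itertools import count


def truncated_md5(data, hex_chars=4):
    """Return only the first hex_chars characters of the MD5 hex digest."""
    return hashlib.md5(data).hexdigest()[:hex_chars]


def find_truncated_collision(hex_chars=4):
    """Brute-force two different messages with the same truncated MD5."""
    seen = {}  # truncated hash -> first message that produced it
    for i in count():
        message = f"message-{i}".encode()
        tag = truncated_md5(message, hex_chars)
        if tag in seen:
            return seen[tag], message, tag
        seen[tag] = message


first, second, shared = find_truncated_collision()
print(first, second, shared)
# The shortened values match, but the full 128-bit hashes still differ.
print(hashlib.md5(first).hexdigest())
print(hashlib.md5(second).hexdigest())
```

With only 16 bits to work with, a match turns up after a few hundred tries. Like the published demos, the two inputs are meaningless; they exist only to show that a collision can be forced.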

There has yet to be a documented case in real life where two “naturally occurring” files of different contents have the same MD5 hash. All of the samples so far have been forced demos. Don’t throw away the MD5 hash as a validation mechanism just yet. Understand what an MD5 hash can and can’t show. You’ll certainly want your expert to be able to explain an MD5 collision. If your expert tells you it can never happen, it’s probably time to look for another expert.
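A rough back-of-the-envelope calculation helps explain why no natural collision has turned up. The Python sketch below uses the standard "birthday" approximation and assumes that the MD5 values of unrelated files behave like random 128-bit numbers. That is an assumption made for illustration, it says nothing about deliberately crafted files, and the function name and the one-billion-file figure are hypothetical.

```python
import math


def birthday_collision_probability(num_files, hash_bits=128):
    """Approximate chance that any two of num_files random hashes match.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    pairs = num_files * (num_files - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2**hash_bits)


# Hypothetical collection of one billion unrelated files.
print(birthday_collision_probability(10**9))  # on the order of 1e-21
```

Under that assumption, even a billion-file collection has a vanishingly small chance of an accidental match. The chance is not zero, though, which is the point an expert should be able to explain.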

Hat tip to Don Lewis, a forensic computer analyst from the Lakewood Police Department in Colorado, for suggesting the post.

E-mail: snelson@senseient.   Phone: 703-359-0700