Ride the Lightning

Cybersecurity and Future of Law Practice Blog
by Sharon D. Nelson Esq., President of Sensei Enterprises, Inc.

MD5 COLLISIONS . . . THE ODDS ARE AN "UNDECILLION"? WHO KNEW?

October 13, 2008

Guess I struck a nerve with the recent MD5 collision posts. Lots of mail – here’s one more, reprinted by the kind permission of my close friend Craig Ball, who is always a generous colleague. My husband/partner John can keep up with Craig, and indeed they seem to keep each other on their toes, but I’m afraid I am always eating the collective dust of these extraordinary forensics technologists. Ah, well, humility is a virtue, right?

As always, Craig took me to school in the nicest of ways and expanded both my knowledge base – and my vocabulary. I had never heard the word “undecillion.” Should you too be in the back of the class, listen and learn from one of EDD’s true rock stars:

“There is actually a name for the 36 zeros (10 to the 36th power) that follow the 340 in the estimate of the likelihood of collision.  It’s called (I swear) an undecillion, and the spontaneous collision value for MD5 is one-in-340 undecillion.  You used ‘billion’ beautifully, but personally I prefer the almost-as-alliterative 340 trillion trillion trillion.  Sadly, thanks to Bush, banks, brokers and borrowers, we are coming to regard trillion as a prosaic number.

Since I was a boy, the enormous value of a billion has stayed with me by noting that if you had a billion dollars and counted it out at a dollar per second (24 hours a day, seven days a week until you were through), it would take you nearly 32 years.

MD5 is truly "broken" for certain applications, but not for the way we use it in forensics, leastwise not to an extent that should cause us to stop using it near term.

In a nutshell, if you can control both sides of the input data set, you can achieve a collision between functional, intelligible data objects.  You can, for example, create two different executable programs that do different things (e.g., benign and malicious) but that hash identically.  There are some (harmless-but-troubling) Windows examples from Peter Selinger at http://www.mathstat.dal.ca/~selinger/md5collision/.  You can do the same for security certificates with, e.g., two names. 

Consequently, the "breaking" of MD5 has some genuine consequences for network and transactional security.  Imagine getting an engineered certificate approved by a registrar and then using the collision certificate.  Networks would see them as the same because they would hash identically.  Alternatively, if you could get the major antivirus applications to identify your first engineered application code as benign, you could subsequently promulgate colliding malware and achieve deeper destructive inroads into systems.

So why do I say it doesn’t matter much for forensics?  Because with current computing resources you have to control both sides of the equation; that is, you have to vary both collision candidates to secure the match while still producing intelligible content.  In forensics, we don’t have the luxury of modifying both the original evidence and an alternate set of data to engineer a collision.  Instead, we confront a block of data (e.g., hard drive or file) that has a certain MD5 hash value.  Engineering a collision to match a particular MD5 value is not what is happening out there.  The researchers are (if I am understanding the work) engineering two files with different content so that both arrive at the same MD5 value, whatever that value turns out to be.

We are still unable to take a data set of a known MD5 value and create a colliding dataset that contains alternate intelligible content.  As to how far we are from that achievement, I can only say that commentators routinely underestimate codebreakers, especially those who have an Xbox network or several high-end graphics cards they can throw at the problem.  Amazing, isn’t it, that the hacker who rocks our world won’t be using a supercomputer by Cray or IBM–it’s more likely to be branded by NVIDIA or Sony PlayStation!”
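
For readers who want to check the numbers in Craig's note, here is a minimal Python sketch. It is only an illustration: the function names (undecillions, years_to_count, verify_copy) are made up for this example, and it assumes nothing beyond Python's standard hashlib module. It shows where 340 undecillion comes from (MD5 is a 128-bit hash, so there are 2 to the 128th power possible values), how long the dollar-a-second count works out to, and the everyday forensic use of MD5, which is confirming that a copy still matches the original.

```python
import hashlib

# MD5 produces a 128-bit digest, so there are 2**128 possible hash values.
MD5_BITS = 128
POSSIBLE_VALUES = 2 ** MD5_BITS

# Short-scale name for 10**36.
UNDECILLION = 10 ** 36


def undecillions(n: int) -> int:
    """Return n expressed in whole undecillions (hypothetical helper)."""
    return n // UNDECILLION


def years_to_count(dollars: int, per_second: int = 1) -> float:
    """Years needed to count out `dollars` nonstop at `per_second` dollars a second."""
    seconds_per_year = 365.25 * 24 * 60 * 60
    return dollars / per_second / seconds_per_year


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a block of data."""
    return hashlib.md5(data).hexdigest()


def verify_copy(original: bytes, copy: bytes) -> bool:
    """True when the copy hashes to the same MD5 value as the original."""
    return md5_of(original) == md5_of(copy)


if __name__ == "__main__":
    print(undecillions(POSSIBLE_VALUES))         # 340
    print(round(years_to_count(10 ** 9), 1))     # 31.7
    print(verify_copy(b"evidence", b"evidence")) # True
    print(verify_copy(b"evidence", b"Evidence")) # False
```

Note that verify_copy covers only the situation Craig describes, where the evidence already exists and someone would have to match its hash after the fact. It says nothing about two files that were engineered in advance as a colliding pair.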

I am always happy to have Mr. Ball in my court – bowing deeply in the general direction of Texas . . . thank you Craig.

Phone: 703-359-0700