Understanding Hash Values

What is that MD5/SHA-1/SHA-256 thing I see when I go to download an application?

When you download an application installer, you will often see file information listed beside the download link. Usually you will see one or more hash values in the information. So, what’s that all about?

Let’s say you want to download the latest version of LibreOffice. On the download page you will see a link to “Info” which I have highlighted below.

Clicking on the “Info” link brings up a page containing file information. As you can see from the screenshot, there’s quite a bit of information given. Most of it is routine, such as filename, file size, modification date etc. But what are those MD5, SHA-1, and SHA-256 values, and what are they used for?

The answer is that they are examples of one-way cryptographic hashes, which sounds really complicated. In plainer terms, they are a digital fingerprint for a file. We use them to verify that a file has not been tampered with. Essentially, we can use them to verify the integrity of the file we have downloaded and are about to install on our computer.

The person making the file available has calculated its hash values and published them. We can run the same calculation on the copy we downloaded and compare the results. If the values match, the file we have is exactly the same as the one that was published and has not been interfered with. Even the slightest change to the file will result in completely different hash values.

So, how do we check the hash values for a file?

There are many tools available to check hash values, some command line, some GUI, and every operating system is well catered for. I like to use HashMyFiles, which is freely available from http://www.nirsoft.net/ and which runs as a portable app – so no installation is required. In this case I open HashMyFiles and drag the LibreOffice installer file into the main window (or navigate to it using the File menu) and it runs the calculation and displays the results to me.
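If you would rather script the check than use a GUI tool, the same calculation can be sketched in a few lines of Python using the standard hashlib module. This is only a minimal illustration, not how HashMyFiles works internally; the function names file_hashes and matches_published are my own, made up for this example.

```python
import hashlib


def file_hashes(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1024 * 1024):
    """Return a dict of {algorithm name: hex digest} for the file at `path`."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read in chunks so a large installer never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, algorithm, published_hex):
    """True if the file's digest equals the published value.

    The published value is trimmed and lower-cased first, because download
    pages show hashes in either upper or lower case.
    """
    actual = file_hashes(path, (algorithm,))[algorithm]
    return actual == published_hex.strip().lower()
```

To use it, you would call something like matches_published("installer.msi", "sha256", "value copied from the Info page"), where the filename is a placeholder for whatever you downloaded. A result of True means the file matches what the publisher made available.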

I can see from the results that the hash values for the file I downloaded match the values that the uploader has made available, so I now have an extra element of certainty that the file I’m about to install has not been interfered with by a third party.

Let’s see what happens if I make one tiny change to the installer. I’ve opened the file in a hex editor, and I’m going to change just one byte out of the millions that are in there.

I’ve gone to a random place in the file and edited one value from 9C to 00. A tiny change in a 300 MB file.

With the change made, I save the file and run the hash calculation again, and you can see that I now receive a completely different result. This is how I can tell that the file has been interfered with.
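The same experiment can be reproduced on a small scale in Python. The sketch below is an illustration only: it uses a made-up 1 KB block of bytes as a stand-in for the installer, changes a single byte from 9C to 00, and compares the SHA-256 values before and after. The helper names are my own.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_digits(a, b):
    """Count the positions at which two equal-length hex strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Stand-in for the installer: 1024 bytes, all with the value 9C.
original = bytearray(b"\x9c" * 1024)

# Copy it and change exactly one byte from 9C to 00, as in the hex editor.
modified = bytearray(original)
modified[500] = 0x00

before = sha256_hex(original)
after = sha256_hex(modified)

print("before:", before)
print("after: ", after)
print("hex digits that differ:", differing_hex_digits(before, after), "of", len(before))
```

Running it shows two digests with nothing visibly in common, even though only one byte out of 1,024 was altered.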

Hashing has many uses. We use it regularly in digital forensics to verify the integrity of files, particularly forensic images of devices. It is used in databases of child exploitation images and videos to assist investigators in identifying the presence of known child abuse material. This is done using software to hash all the files being investigated and check the values against a database of hash values previously calculated for known child abuse material. Automating that process greatly speeds it up, and also spares a human being from having to look at the same disturbing material over and over again.

It is also heavily used by anti-virus and anti-malware products to identify malicious files and can also be used to de-duplicate or identify multiple copies of data being held by organisations.
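Both ideas, matching files against a list of known hash values and finding duplicate copies, come down to the same operation: hash every file and look the value up. The following Python sketch shows the general shape under simple assumptions. The function names and the use of a plain set for the known hashes are my own choices for illustration; real forensic and anti-malware products use their own databases and formats.

```python
import hashlib
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def flag_known(paths, known_hashes):
    """Return the paths whose SHA-256 value appears in `known_hashes`.

    `known_hashes` is assumed to be a set of lower-case hex strings.
    """
    return [p for p in paths if sha256_of(p) in known_hashes]


def find_duplicates(paths):
    """Group paths by SHA-256 value and return the groups with more than one file."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(p)
    return [group for group in groups.values() if len(group) > 1]
```

Because only hash values are compared, nobody has to open or view the files themselves, which is exactly why the approach suits the investigative work described above.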

The reason that hashing can be used in these ways is that there is an extremely high degree of certainty that two files with the same hash value are exactly the same. Hash collisions (two or more different files generating the same hash value) are vanishingly unlikely to happen by accident. Researchers have, however, deliberately constructed collisions for MD5 and SHA-1, so where deliberate tampering is a concern it is best to rely on the SHA-256 value, for which no collision has been found.

I hope you find the above informative, and happy hashing!


Colm Gallagher is the head of CommSec’s forensics business practice.