Finding Similar Pictures with dHash Values
When photos are shared between people using apps such as WhatsApp, Facebook, or Skype, the apps compress the files and strip the original metadata. Because the file contents have changed, the hash value that uniquely identified the original file no longer matches, so the picture cannot be found using a hash list.
To combat that, the dHash feature finds similar images, whether they have been resized, stripped of metadata, or altered using filters.
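To illustrate why a traditional hash list misses a re-encoded picture, here is a minimal Python sketch. The byte strings are made-up stand-ins for a picture before and after an app alters it, not real image data; only the standard hashlib module is used.

```python
import hashlib

# Hypothetical stand-ins for a picture's bytes before and after an app
# re-encodes it. These are not real image files.
original = b"\xff\xd8\xff\xe0" + b"example image payload" * 4
reencoded = original[:-1] + b"\x00"  # only the final byte differs

md5_original = hashlib.md5(original).hexdigest()
md5_reencoded = hashlib.md5(reencoded).hexdigest()

print(md5_original)
print(md5_reencoded)
print(md5_original == md5_reencoded)  # False: the MD5 values no longer match
```

Even a one-byte change produces an unrelated MD5 value, so a recompressed copy of a known picture will not appear in a hash list built from the original.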
Examiners are very familiar with the traditional Message Digest 5 (MD5) and Secure Hash Algorithm 1 (SHA-1) hashes for identifying a known file. Hash values are helpful for finding known child sexual abuse images or known files that come with the device, such as icons. Users of the XAMN mobile forensic data analysis tool may have noticed a new picture file metadata attribute named dHash Value.
So what is dHash?
I reached out to fellow employees at MSAB for more clarification. I received the following response from one of our developers.
“It’s a visual hash which can be used to quickly find similar pictures. Creating a dHash value doesn’t change the original picture. It differs somewhat from normal hash algorithms such as SHA-1 in that it doesn’t require an exact match to find similar pictures. The amount of bits that differ between 2 dHash values are a measurement of how similar the images are.” – Chris R.
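The "bits that differ" measurement in the quote is usually called the Hamming distance. Below is a minimal Python sketch of it. It assumes the dHash values are written as hexadecimal strings of equal length, which this article does not confirm for XAMN; the function name and the two sample values are made up for illustration.

```python
def hamming_distance(dhash_a: str, dhash_b: str) -> int:
    """Count the bits that differ between two hex-encoded hash values."""
    if len(dhash_a) != len(dhash_b):
        raise ValueError("hash values must have the same length")
    # XOR leaves a 1 bit wherever the two values disagree.
    return bin(int(dhash_a, 16) ^ int(dhash_b, 16)).count("1")


# Made-up 64-bit values, for illustration only.
picture_a = "3a6c6565498da525"
picture_b = "3a6c6565498da527"

print(hamming_distance(picture_a, picture_b))  # 1
```

A distance of 0 means the two values are identical; the more bits differ, the less alike the pictures are likely to be.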
So how does this process work?
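The idea is to take any picture, convert it to grayscale, and shrink it to a very small size (don't worry, this is done in RAM and does not change your picture). The hash is then built from that reduced image: in the commonly described form of the algorithm (see the links at the end), each bit records whether a pixel is brighter than its neighbor. Because the value reflects the picture's overall structure rather than its exact bytes, resized or recompressed copies tend to produce the same or nearly the same dHash.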
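The process can be sketched in a few lines of Python. This is a simplified illustration of the commonly described difference hash, not XAMN's implementation: the 9x8 reduced size, the nearest-neighbor resize, the "left pixel brighter than right" convention, and all function names are assumptions made for the example. A real tool would typically use an image library with smoother resampling.

```python
def to_gray(pixel):
    """Convert an (R, G, B) tuple to one brightness value (BT.601 weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def resize_nearest(gray, width, height):
    """Shrink a 2D grid of brightness values with nearest-neighbor sampling."""
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(rgb_rows, hash_size=8):
    """Return a hex difference hash for a picture given as rows of RGB tuples."""
    gray = [[to_gray(p) for p in row] for row in rgb_rows]
    # One extra column so every row yields hash_size left/right comparisons.
    small = resize_nearest(gray, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            # Assumed convention: bit is 1 when the left pixel is brighter.
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return f"{bits:0{hash_size * hash_size // 4}x}"


# Synthetic 18x16 test pictures: a left-to-right gradient, a brighter copy
# of it, and its mirror image.
gradient = [[(x * 10,) * 3 for x in range(18)] for _ in range(16)]
brighter = [[(x * 10 + 20,) * 3 for x in range(18)] for _ in range(16)]
mirrored = [list(reversed(row)) for row in gradient]

print(dhash(gradient))  # 0000000000000000
print(dhash(brighter))  # 0000000000000000
print(dhash(mirrored))  # ffffffffffffffff
```

The brighter copy gets the same value as the original because only the relative brightness of neighboring pixels matters, while the mirrored picture, which looks quite different, differs in every bit.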
As an Examiner, how does dHash in XAMN help me find similar pictures?
Select a picture. In the Details pane below the picture you can see the picture's Exchangeable Image File Format (EXIF) data.
Scroll down to find the dHash Value and select it. Right-click and select Create filter (choose either in current tab or new tab).
Here you can see that the original picture is shown along with a second one (selected below). The second picture's file name does not match the original's, and neither do its MD5 and SHA-1 hashes.
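Conceptually, a similarity filter of this kind can be thought of as keeping every picture whose dHash is within a few bits of the selected one. How XAMN matches internally is not described here, so the Python sketch below is only an assumption: the threshold of 10 bits, the file names, and the hash values are all hypothetical.

```python
def differing_bits(a, b):
    """Count the bits that differ between two hex-encoded hash values."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def find_similar(reference, catalog, max_bits=10):
    """Return names of pictures whose hash is within max_bits of reference."""
    return sorted(
        name
        for name, value in catalog.items()
        if differing_bits(reference, value) <= max_bits
    )


# Hypothetical file names and dHash values, for illustration only.
catalog = {
    "IMG_0001.jpg": "3a6c6565498da525",        # the selected picture
    "received_4471.jpeg": "3a6c6565498da527",  # differs by 1 bit
    "screenshot.png": "c5939a9ab6725ada",      # differs in all 64 bits
}

print(find_similar("3a6c6565498da525", catalog))
# ['IMG_0001.jpg', 'received_4471.jpeg']
```

The second file is returned even though its name and bytes differ from the selected picture, which mirrors the result described above.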
Note: dHash works less well for pictures with transparency.
More information on dHash may be found at either of these two websites:
https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html
https://cotin.tech/Algorithm/ImageSimilarityComparison/