I’ve previously written about why I dislike the academic publication process. In particular, the publication process is not fast; papers can take years to become published. Peer review means almost nothing with regard to completeness, accuracy, or applicability. Most papers are far too complicated for anyone but the author to understand. References are usually incomplete and heavily biased toward tight circles of friends. When papers are published, the cost to read a paper is prohibitively expensive; only the wealthy can access these papers (and given the errors and oversights, most papers are not worth the cost). And finally, there’s the entire one-upmanship issue, where one minor change is deemed worthy of a full paper and not just a footnote.
Along these same lines, I think the term “novel” is overused in academic papers. According to Webster’s dictionary, novel refers to something “new and not resembling something formerly known or used”, or something that is “original or striking especially in conception or style”. However, academic articles typically use “novel” to mean “I couldn’t find someone else who found this same minor result.” When I see ‘novel’ in a paper’s title, I associate it with a three-year-old yelling “Mom! Mom! Mom! Look! Look! Look at what I can do!”
I searched Google Scholar for academic articles that used the word “novel” in their title. There were about 737,000 results. Then I began to read some of the PDF documents. A few did appear to be novel, but the vast majority were minor changes to existing approaches that had minimal differences in the results and were applicable to specific corner-case situations. I would hardly call them “novel”.
One of my loyal blog readers pointed me to an article in the Journal of Forensic Sciences: “A Novel Forged Image Detection Method Using the Characteristics of Interpolation” (Hwang and Har in J Forensic Sci, January 2013, Vol. 58, No. 1). He noted that the results look similar to Error Level Analysis and wanted my thoughts on the algorithm.
The paper’s basic approach: rather than using a JPEG resave to reduce noise in the picture, it uses a scaling algorithm. For example, you take your 4000×3000 picture and scale it to 72% of the original size (to 2880×2160). Then you scale it back up to 4000×3000 and compare the difference with the original picture. Digital alterations should appear as an altered area in the picture.
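For readers who want to experiment, here is a minimal sketch of the scale-compare idea, assuming Pillow and NumPy are available. The file name, the 72% factor, and the bicubic resampling are illustrative choices; this is my reading of the approach, not the authors’ code.

```python
# A minimal sketch of the scale-compare idea, assuming Pillow and NumPy.
# "photo.jpg", the 0.72 factor, and bicubic resampling are illustrative only.
from PIL import Image
import numpy as np

def scale_compare(path, factor=0.72, resample=Image.BICUBIC):
    original = Image.open(path).convert("RGB")
    w, h = original.size
    # Scale the picture down, then back up to its original dimensions.
    small = original.resize((int(w * factor), int(h * factor)), resample)
    rescaled = small.resize((w, h), resample)
    # Per-pixel absolute difference; altered regions should stand out.
    diff = np.abs(np.asarray(original, dtype=np.int16) -
                  np.asarray(rescaled, dtype=np.int16)).astype(np.uint8)
    return Image.fromarray(diff)

scale_compare("photo.jpg").save("scale_compare.png")
```

The raw difference image is usually very dark, so in practice you would amplify it before viewing.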
In the paper, they even discuss using different scaling algorithms: nearest neighbor, bilinear, and bicubic. (However, their paper makes it sound like these are the only scaling algorithms.)
My criticisms of this paper are not related to their approach. Rather, I’m more concerned with the lack of any discussion of limitations, the missing details on how they determined parameters, the failure to mention applicability, and the question of whether this is really “novel”.
But before discussing the limitations, let’s focus on the good parts. First off, the paper is relatively well written and easy to understand. The algorithm is a logical extension to existing knowledge.
How it works
Digital pictures contain high frequency and low frequency components. The low frequencies define the overall structure of the image. The high frequencies fill in details. And camera sensors introduce extremely high frequencies that we may not visually notice.
There are lots of different algorithms that measure signal noise. Error Level Analysis (ELA) is one such example. It measures the lossiness from repeated JPEG compressions. A picture with more high frequency noise will change more during a JPEG save than a lower quality picture. And when a picture is first altered, the modified areas will usually have a significantly different error level potential.
There are two JPEG properties that permit ELA to work. The first is JPEG’s use of integer values. Each JPEG save converts the picture to frequencies using floating-point values. But then the values are stored as integers, resulting in fractional truncations. These lost fractions cause JPEG to be a lossy format since some information is removed each time the picture is saved.
The second JPEG property is the use of discrete cosine transforms (DCTs) with weighted quantization matrices (Q tables). The Q tables identify how large the truncated fractions can be. In general, high frequencies are cut off more than low frequencies — this is because low frequencies define the picture’s shape. If a little of the fine detail is removed, most people won’t notice the impact to the picture. (For more details on how JPEG compression works, see “How I Met Your Mother Through Photoshop“.)
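For comparison, here is a minimal sketch of the JPEG-resave step behind ELA, again assuming Pillow and NumPy. The 95% resave quality and the brightness multiplier are illustrative values, not the exact parameters that FotoForensics uses.

```python
# A minimal sketch of Error Level Analysis: resave as JPEG at a known quality
# and measure how much each pixel changed. The quality (95) and the multiplier
# (20) are illustrative values only.
import io
from PIL import Image
import numpy as np

def ela(path, quality=95, scale=20):
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Regions with a different compression history change at a different rate.
    diff = np.abs(np.asarray(original, dtype=np.int16) -
                  np.asarray(resaved, dtype=np.int16))
    return Image.fromarray(np.clip(diff * scale, 0, 255).astype(np.uint8))

ela("photo.jpg").save("ela.png")
```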
Any algorithm that reduces high frequencies and uses integer truncation should be able to identify similar high-frequency edits.
JPEG compression is not the only way to remove high frequencies from a picture. As I mentioned in “Quality is Job One”, simply scaling the picture smaller will remove high frequencies. If you scale a picture smaller and then scale it back up, it will look blurrier than the original picture because the high-frequency fine details are removed. Specifically, the scaling removes values (a lossy transformation) and the integer truncation makes the loss permanent.
In this example, I scaled the picture to 25% of the full size, then scaled it back up. You can see how the rescaled picture lost high frequencies and looks blurrier than the source picture.
The main difference between the JPEG resave and image scaling approaches concerns where the cutoffs occur. With JPEG, each DCT value represents a frequency band. Every frequency band loses a little, but the higher frequencies lose more than the low frequencies. With scaling, you only lose the high frequencies and not a little from every frequency band.
Enough theory! Show me pictures!
So here’s an example from a picture uploaded to FotoForensics. The source picture was scaled smaller, to 80% of the original size using a bicubic algorithm. When it was scaled larger, high frequency details were lost. Then I compared the difference between the rescaled picture and the source:
In this example, you can see how the algorithm highlights the areas that were modified in the source picture. This happens because the unmodified areas lost high frequencies at a different rate compared to the modified areas.
And the bad news?
The first thing to notice is that every sample picture in their paper shows that their algorithm works. But every one of those samples was created by the authors of the paper. In effect, they selectively chose/manufactured pictures that would give the appearance of a high-performance algorithm.
I ran the algorithm against 100 pictures from FotoForensics. I used both bilinear and bicubic scaling, and scale values between 50% and 100%. The scale-compare algorithm was able to highlight modifications in ten pictures. It missed all of the others: a 90% failure rate. Moreover, the cases where it worked were all extreme pictures: large images where the modifications are so obvious that they were caught by every frequency analysis algorithm I have.
Here is the same example of a modified image (left), along with the ELA (middle) and their scaling comparison (using 80% reduction with bicubic scaling) on the right.
And here’s a typical case where scale-compare missed it. Again, ELA is the second image and scale-compare is on the right:
And here’s another example from Reddit’s Photoshopbattles: “Guy riding a dinosaur“. Yes, they removed two people riding a dinosaur… ELA shows the modified areas, but scale-compare failed to highlight anything.
ELA certainly isn’t perfect. If a picture undergoes multiple resaves, scaling, or significant recoloring, then the noise patterns are all normalized and ELA will not highlight anything abnormal. In every instance where ELA was unable to highlight modifications, scale-compare also failed.
There are other limitations with this research. For example:
- They suggest using bilinear, bicubic, and nearest neighbor for scaling. However, they don’t discuss the circumstances when one will work and another will fail.
- The paper says to use “1/N” and “N” as the reduction and enlargement scaling factors. However, they never explain how to find a good “N” value. Their samples use values such as 86%, 74%, and 53%, but with no explanation of how they came up with those values. (I suspect brute-force trial and error until they found one that looked good for their paper; see the sketch after this list.)
- This scale-compare algorithm can highlight significant differences in noise quality. However, it is mostly limited to detecting smudges and blurs. It typically misses splices.
- The last paragraph in the paper does mention one limitation. It says that scale-compare does not work well for pictures saved using “compression quality <6 in Photoshop”. However, it also does not work with higher compression levels, pictures that have been repeatedly resaved, most web-size pictures (those less than 800×800), and anything from Facebook, Instagram, Twitter, and most online dating sites. Scaling a picture smaller will lessen the ability to highlight modifications, and significant scaling will completely remove this algorithm’s ability to detect alterations. (I’m ignoring the issue about the values from Photoshop Quality 6 changing based on the version of Photoshop.)
- There’s also a lack of information about performance. No metrics about the ability to highlight modifications from a random selection of images, and nothing about speed. Scaling a large picture twice can be significantly slower than saving and loading the image.
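As for how the authors might have picked their “N” values, a brute-force search could look something like the sketch below. This is purely hypothetical; the paper gives no procedure, and the variance-based score is just one crude way to measure how strong the difference image looks.

```python
# A hypothetical brute-force search for a "good" scale factor N: try a range of
# factors and keep whichever produces the strongest difference signal.
# This is a guess at a procedure the paper never describes.
from PIL import Image
import numpy as np

def best_scale_factor(path, factors=np.arange(0.50, 1.00, 0.02)):
    original = Image.open(path).convert("RGB")
    w, h = original.size
    orig = np.asarray(original, dtype=np.float32)
    best_factor, best_score = None, -1.0
    for f in factors:
        small = original.resize((int(w * f), int(h * f)), Image.BICUBIC)
        rescaled = np.asarray(small.resize((w, h), Image.BICUBIC), dtype=np.float32)
        # Variance of the difference image as a crude measure of "contrast".
        score = float(np.abs(orig - rescaled).var())
        if score > best_score:
            best_factor, best_score = round(float(f), 2), score
    return best_factor

print(best_scale_factor("photo.jpg"))
```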
Finally, they did not mention any other variations to this approach. For example, you can apply a Gaussian blur, box filter blur, or FFT low-pass filter to the picture in order to remove high frequencies. Each of these will work just as well as their scale-compare algorithm.
Here’s that same sample picture, compared against a box-filter blur of itself. The results are almost the same as the scale-compare algorithm, but took a fraction of the time to compute:
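For completeness, here is a minimal sketch of that blur-and-compare variation, again assuming Pillow and NumPy. The three-pixel radius is an arbitrary choice for illustration; swap in a Gaussian blur and the results are comparable.

```python
# A minimal sketch of the blur-and-compare variation: blur the picture with a
# box filter (or Gaussian), then diff against the original. The 3-pixel radius
# is an arbitrary choice for illustration.
from PIL import Image, ImageFilter
import numpy as np

def blur_compare(path, radius=3, use_gaussian=False):
    original = Image.open(path).convert("RGB")
    blur = ImageFilter.GaussianBlur(radius) if use_gaussian else ImageFilter.BoxBlur(radius)
    blurred = original.filter(blur)
    diff = np.abs(np.asarray(original, dtype=np.int16) -
                  np.asarray(blurred, dtype=np.int16)).astype(np.uint8)
    return Image.fromarray(diff)

blur_compare("photo.jpg").save("blur_compare.png")
```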
Mom! Mom! Mom! Look! Look! Look at what I can do!
This scale-and-compare method described by Hwang and Har is yet another approach to detecting changes in high-frequency noise. However, there are no situations where this approach will detect something that other, faster algorithms would miss. Moreover, Hwang and Har left out a significant amount of information, such as how to tune parameters. Finally, their approach used well-known artifacts from scaling and blurring, and the common method of comparing against the original size; none of this is new or “novel”.
Back in 2005, Hsiao and Pei wrote “Detecting digital tampering by blur estimation“. This paper uses a variation of the Gaussian blur and box filter blur comparison that I described. And in 2008, Mahdian and Saic wrote “Blind Authentication Using Periodic Properties of Interpolation“, where they actually used bilinear and bicubic scaling to identify alterations (see the Mahdian/Saic PDF pages 6-7) and extended it to handle rotations and skewing. (And Hwang/Har knew about Mahdian/Saic, since they listed that paper as citation #18.) To put it bluntly, the work by Hwang and Har is an interesting rewrite of existing technologies, but it is incomplete, heavily biased, and certainly not “novel”.