This post was syndicated from: The Hacker Factor Blog and was written by: The Hacker Factor Blog. Original post: at The Hacker Factor Blog
I am very particular about the image libraries I use. The libgif and libpng libraries are nice and solid, but they didn’t have all of the features I wanted. I ended up writing my own libraries. I also wrote my own TIFF library because libtiff has too many security holes. Frankly, if you see source code that increments pointers without boundary checking, like b1 = *pb++;, then you’re looking at a serious exploit potential. (Just because C permits it does not mean it is a good coding practice.)
For JPEGs, I do rely on libjpeg. Specifically, libjpeg6b (sometimes called libjpeg6-b, libjpeg6-2, or libjpeg62). The Independent JPEG Group (IJG) stopped supporting this version back in 2006, but most applications still use this version. IJG had wanted people to migrate to libjpeg8, but that didn’t happen. Between version 6 and version 8, the entire API changed — nobody wanted to go back and rewrite their code. Version 8 also has a more complicated API, so developers who don’t know better would use version 6 because it was available and easier to program.
Personally, I have a different reason for sticking with version 6. Version 8 does not render JPEGs the same way as version 6. V6 follows the JPEG standard to the letter, while V8 does a few enhancements that can impact some sensitive analysis algorithms. For digital image analysis, standard is better than better.
Unfortunately, libjpeg6b really isn’t maintained anymore, and I’ve found dozens of ways to crash the library. One of these days I’ll write my own complete JPEG decoder and just move away from the IJG code. But for now, I have a pre-parser that sanitizes JPEGs prior to calling libjpeg, just so they won’t crash.
Since I am dependent on libjpeg, I pay close attention whenever the library wants to be patched. Last month, my Ubuntu system identified that libjpeg62 wanted to be patched. “Hell no!” I don’t know what the patch does, but I’m not installing anything until I make sure it isn’t going to negatively impact my analysis tools.
The first thing I did was download the source code for the new libjpeg62 (apt-get source libjpeg62). I looked at the list of changes as well as compared code against the original libjpeg6b (the one from IJG, not the one from apt-get). The update added support for a cropping transform and better compiler support. Nothing that impacts rendering, so it’s safe for me to install. In addition, these patches are from 2010… nothing looks new.
Then I looked for ‘why’ it was updated…
Last November, the Full Disclosure mailing list posted a warning about a specific type of corrupted JPEG. It is possible to construct a specialized JPEG that results in an information leak from uninitialized memory. These exploits were given Common Vulnerabilities and Exposures (CVE) identifiers: CVE-2013-6629 and CVE-2013-6630. The posting to Full Disclosure even includes links to some fun sample images.
Warning for non-programmers: If really low-level programming details go over your head, then skip to the next section.
Here’s a technical description regarding how the exploit works…
JPEG is a horrible format that was clearly created by a committee. There’s one section that defines the type of JPEG and the components involved. The common ones are baseline, progressive, and lossless, but there are a dozen others that nobody ever uses. So that everyone with a hex editor can do this from home, let’s all use the same picture. This JPEG is one of the most popular images uploaded to FotoForensics. (We’ve seen over 200 variants of this picture. I think the guy may be from One Direction, but I’m not sure. I don’t follow boy-bands.)
wget -O image.jpg 'http://fotoforensics.com/analysis.php?id=fe81eaebc6c294bc8af1d1e9412f1af94d19c455.101587&fmt=orig'
JPEG uses “0xff” to denote tags. The tag “ffc0” (found at offset 0xf8 in the JPEG) denotes a baseline JPEG “Start of Frame” tag (SOF). This is where the components are defined.
000000F0 0B 0B 0B 0B 0B 0B 0B 0B FF C0 00 11 08 01 E5 01 ................
00000100 F4 03 01 22 00 02 11 01 03 11 01 FF C4 00 1F 00 ..."............
The SOF is followed by a two-byte length (00 11 = 17 bytes, including the 2 bytes that specify the length). The “08” identifies the precision: this is an 8-bit deep JPEG. (Most JPEGs are 8 bits deep.) Then comes the image size, stored as height and then width (01 E5 = 485 and 01 F4 = 500, so the picture is 500×485). Finally, we have the component definitions; each component is a color channel. In this picture, there are 3 (03) components. Each component has three bytes: the first defines the identifier (because you can never have enough indirection), the second identifies the subsampling, and the third identifies the quantization table for decoding. In this case, we have “01 22 00”, “02 11 01”, and “03 11 01”. So what does this mean?
The first component (array position 0 with values “01 22 00”) will use the identifier “01”. It will have 2×2 definitions per minimum coding unit (MCU). (The MCU is the JPEG grid size, so the grid for this element will be four 8×8 grids in one 16×16 grid.) Anything that references component id “01” will be decoded with quantization table 00.
A little earlier in the file (offset 0x6E) is the quantization table definition (FFDB). I’m not going to dive into that structure because it is not relevant to CVE-2013-6629 or CVE-2013-6630. The structure defines table 00 as being the luminance definition. So the component with id “01” is the luminance. The subsampling identifies that this picture will use 16×16 chrominance subsampling, also called “4:2:0”.
The second component uses identifier “02”, has 1×1 records per MCU (i.e., 1 instance per grid) and uses quantization table 01 (chrominance). The third component, identifier “03”, also defines 1 chrominance record.
Different JPEG libraries use different component identifiers. Usually the identifiers are either 00, 01, 02 or 01, 02, 03. However, I’ve also seen them named (A,B,C), (Y,U,V), and (R,G,B). The byte value doesn’t really matter; it’s just an identifier.
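For readers who would rather let a script do the byte counting, here is a minimal sketch in Python that decodes the SOF bytes from the hex dump above. The function name parse_sof and the dictionary layout are made up for this illustration; they are not part of libjpeg or any other library.

```python
import struct

# Bytes copied from the hex dump above, starting at the FF C0 marker (offset 0xF8).
SOF0 = bytes.fromhex("FFC0 0011 08 01E5 01F4 03 012200 021101 031101")

def parse_sof(segment):
    """Decode a baseline Start of Frame segment (hypothetical helper)."""
    # marker, length, precision, height, width, component count (big-endian)
    marker, length, precision, height, width, count = struct.unpack(
        ">HHBHHB", segment[:10])
    if marker != 0xFFC0:
        raise ValueError("not a baseline SOF segment")
    components = []
    for i in range(count):
        # Each component definition is three bytes: id, sampling, quant table.
        ident, sampling, qtable = segment[10 + 3 * i:13 + 3 * i]
        components.append({
            "id": ident,
            "h": sampling >> 4,    # horizontal sampling factor (upper nibble)
            "v": sampling & 0x0F,  # vertical sampling factor (lower nibble)
            "qtable": qtable,
        })
    return {"length": length, "precision": precision,
            "width": width, "height": height, "components": components}

frame = parse_sof(SOF0)
print(frame["width"], "x", frame["height"], "at", frame["precision"], "bits")
for component in frame["components"]:
    print(component)
```

The first component comes back with h=2 and v=2 (the “22” byte), which is the 2×2 luminance layout described above; the other two come back as 1×1 with quantization table 1.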
In this file, we have a couple of Huffman table definitions (FFC4) and then the really key part: the start of scan (SOS), denoted by the FFDA tag at file offset 0x2BB.
000002B0 E9 EA F2 F3 F4 F5 F6 F7 F8 F9 FA FF DA 00 0C 03 ................
000002C0 01 00 02 11 03 11 00 3F 00 EE B4 DF D9 47 C3 9F .......?.....G..
The SOS works with the SOF to define the binary data stream. It is followed by a 2-byte length (000C = 12 bytes), the number of components in the scan (3 in this example), and two bytes per component. (I know, 3 components × 2 bytes + 2 byte length + 1 byte number of components ≠ 12 bytes. The remaining three bytes, “00 3F 00”, are spectral selection and successive approximation settings that don’t impact CVE-2013-6629 and CVE-2013-6630.)
Each two-byte component in the SOS header contains the component identifier and the Huffman table identifier (from all of those FFC4 tags). In this case, the first entry says “01 00”. That means identifier “01” will use DC table 0 (upper nibble of 00) and AC table 0 (lower nibble). The “02 11” means identifier “02” will use DC table 1 and AC table 1. And identifier “03” looks just like identifier 02. Putting the SOF and SOS definitions together, we now know:
- The JPEG uses 16×16 chrominance subsampling because the luminance defines four 8×8 entries per MCU.
- Identifier 01 is the luminance, and it uses quantization table 0 with Huffman tables DC 0 and AC 0.
- Identifier 02 is the first chrominance (chrominance-blue or U), and it uses quantization table 1 with Huffman tables DC 1 and AC 1. There is one U entry per MCU.
- Identifier 03 is the other chrominance (chrominance-red or V), and it uses quantization table 1 with Huffman tables DC 1 and AC 1. There is one V entry per MCU.
This means the binary data stream that comes after the SOS header will define a series of MCU elements. Each MCU will be in the format “YYYYUV”. The series of MCU elements in the data stream looks like YYYYUVYYYYUVYYYYUV…
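The same bookkeeping can be sketched in Python. This is only an illustration: parse_sos, SAMPLING, and LABELS are names invented for this example, the sampling factors are copied by hand from the SOF walk-through, and the Y/U/V labels are just the nicknames used in this post.

```python
# Bytes copied from the hex dump above, starting at the FF DA marker (offset 0x2BB).
SOS = bytes.fromhex("FFDA 000C 03 0100 0211 0311 00 3F 00")

# Sampling factors (h, v) per component id, copied by hand from the SOF.
SAMPLING = {1: (2, 2), 2: (1, 1), 3: (1, 1)}
# Nicknames used in this post; the JPEG file itself only stores the ids.
LABELS = {1: "Y", 2: "U", 3: "V"}

def parse_sos(segment):
    """Decode a Start of Scan header (hypothetical helper)."""
    if segment[:2] != b"\xff\xda":
        raise ValueError("not an SOS segment")
    length = int.from_bytes(segment[2:4], "big")
    count = segment[4]
    entries = []
    for i in range(count):
        # Two bytes per component: the id, then the DC/AC Huffman table nibbles.
        ident, tables = segment[5 + 2 * i:7 + 2 * i]
        entries.append({"id": ident, "dc": tables >> 4, "ac": tables & 0x0F})
    # Whatever remains inside the declared length: the three extra bytes.
    trailer = segment[5 + 2 * count:2 + length]
    return {"length": length, "entries": entries, "trailer": trailer}

scan = parse_sos(SOS)
# Each component contributes h*v blocks to every MCU, in scan order.
mcu = "".join(LABELS[e["id"]] * (SAMPLING[e["id"]][0] * SAMPLING[e["id"]][1])
              for e in scan["entries"])
print(scan["entries"])
print(mcu)  # YYYYUV
```

The arithmetic from the length field also falls out: 2 length bytes + 1 count byte + 6 component bytes + 3 trailing bytes = 12.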
I surrender! Show me the exploit!
The JPEG library has some idiot checking for corruptions. An invalid identifier in the SOS header will lead to a corrupt component abort. Similarly, an invalid Huffman table definition will lead to an abort. But… what if we have a valid definition but an undefined component? For example, rather than defining identifiers (1,2,3), what if we defined (1,2,1) and left 3 as undefined? Now we have two problems: we have one component defined twice (CVE-2013-6629), and an undefined component (CVE-2013-6630).
Now we can test this condition. Simply change the component identifier definition. In this example, I changed the definition from (1,2,3) to (3,2,3). My bad header looks like
FF DA 00 0C 03 03 00 02 11 03 11.
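If you would rather script the change than use a hex editor, the edit amounts to overwriting one byte per component. The sketch below works on the SOS header bytes alone; set_scan_ids and scan_problems are hypothetical helper names, and the check is my own illustration rather than the test libjpeg performs.

```python
# The original SOS header from the sample picture.
GOOD_SOS = bytes.fromhex("FFDA 000C 03 0100 0211 0311 00 3F 00")

def set_scan_ids(sos, new_ids):
    """Return a copy of the SOS header with its component ids replaced."""
    out = bytearray(sos)
    count = out[4]
    if len(new_ids) != count:
        raise ValueError("need exactly one id per scan component")
    for i, ident in enumerate(new_ids):
        out[5 + 2 * i] = ident  # the id is the first byte of each 2-byte entry
    return bytes(out)

def scan_problems(frame_ids, sos):
    """Compare the ids in the SOS header against the ids defined in the SOF."""
    scan_ids = [sos[5 + 2 * i] for i in range(sos[4])]
    return {
        "duplicated": sorted({i for i in scan_ids if scan_ids.count(i) > 1}),
        "unknown": sorted(set(scan_ids) - set(frame_ids)),
        "missing": sorted(set(frame_ids) - set(scan_ids)),
    }

bad_sos = set_scan_ids(GOOD_SOS, [3, 2, 3])
print(bad_sos.hex(" ").upper())
print(scan_problems([1, 2, 3], bad_sos))
```

In the sample file the FFDA marker sits at offset 0x2BB, so the first identifier byte is at 0x2C0. Changing that single byte from 01 to 03 produces the header shown above: identifier 3 is now listed twice and identifier 1 is never referenced.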
If you’re using a vulnerable version of libjpeg, then you should see colorful garbage right above this line. (And the 16×16 blocks are likely very visible.) If you’re not vulnerable, then you should see a blank space or a broken image icon (because your browser would not render a corrupted JPEG).
For real fun, save the JPEG and view it under different programs, like OpenOffice, Gimp, Image Viewer, ImageMagick (display), and Gnome’s nautilus on Linux, or Safari and Preview on a Mac, or other programs… Every program should show a different colorful picture.
wget -O image-bad.jpg 'http://fotoforensics.com/analysis.php?id=d9eb122455e9b02959deffb55e15f9e1384cc0dc.101587&fmt=orig'
Even if you patched your system, lots of applications include their own copy of libjpeg (or an alternate JPEG library) rather than rely on the system library. Even if one application won’t render the image, other programs will.
The real fun with this picture happens when you reload it. (NOTE: FotoForensics forces your browser to cache the picture, so just reloading this page in your browser will not show you anything different with these pictures.) To see it change, save the corrupt JPEG and open it in various programs, then ‘revert’ or ‘reload’ the picture. Watch the picture and see if it changes. The changes may be minor (different noise patterns) or major (cool new colors!). This happens because one of the components defined in the SOF is undefined in the SOS.
So… what happens when it is undefined? The JPEG library ends up with uninitialized memory. It’s allocated, but it’s not set. You end up with random data. The unset data can change with each reload because you have new uninitialized memory.
NOTE: Some programs cache the rendered image. Loading the corrupted JPEG in Microsoft Word or PowerPoint will render it once. Deleting and inserting the picture won’t change the rendering. However, closing the program and restarting it will render a different colorful pattern.
In the Hello Kitty example that was posted to Full Disclosure, the author forces the browser to reload the picture a dozen times. First he loads a good picture, then he loads the corrupted version. He’s hoping that the uninitialized memory will align with a previous deallocation of the good picture. When this happens, the corruption shows part of the previous picture. (Try his example. If you don’t see the duplication corruption, then reload the page or hold down shift to force a reload.)
How bad is it?
As vulnerabilities go, this is a featherweight. The worst that can happen? The library will allocate uninitialized memory. Let’s say the dirty memory happens to include a plain-text password. Those bytes are passed through the inverse-DCT function and the results are converted from YUV to RGB. Both of these steps include lossy and non-reversible calculations. With the best of luck, I might be able to narrow it down to a couple of dozen potential values per character in the password.
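To make the “non-reversible” part concrete, here is a small sketch of just the color conversion step. It uses the floating-point YCbCr-to-RGB formulas from the JFIF specification; libjpeg uses its own fixed-point approximation, so treat the exact numbers as an assumption. The inverse DCT is left out entirely.

```python
def clamp(value):
    """Round to the nearest integer and force the result into 0..255."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb(y, cb, cr):
    """JFIF-style conversion (floating point; an approximation of libjpeg)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)

first = ycbcr_to_rgb(255, 128, 128)
second = ycbcr_to_rgb(255, 129, 128)
print(first, second, first == second)

# How many distinct colors survive for one luminance value?
outputs = {ycbcr_to_rgb(255, cb, cr) for cb in range(256) for cr in range(256)}
print(len(outputs), "distinct RGB values from", 256 * 256, "inputs")
```

Two different inputs land on the same white pixel because of rounding and clamping, so seeing the pixel does not tell you which bytes produced it. That is before the inverse DCT smears each byte across an 8×8 block.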
On a more realistic attack vector, this could be used to profile the computer’s memory management structure and potentially identify the back-end operating system. However, there are other profiling methods that would be easier and more reliable.
As threats go, this is really a low risk.
Patching vs Fixing
As cool as this bug is, it is not new. It actually dates back to at least June 2004. Yet, CVE did not release an advisory until late 2013. That’s 9 years! While I’ve complained about the slow academic publication cycle, even academic journals publish faster than this. Compared to CVE, Congress’s ability to pass a budget seems streamlined.
There are a few ways to fix this issue…
- One option is to load the SOS header and then check for invalid or omitted entries prior to use and set a reasonable default. The June 2004 advisory included a patch that attempts to repair the JPEG. Since the SOF and SOS usually list identifiers in the same order, the patch replaces the corrupt SOS record with the same order entry from the SOF.
- Libraries can be modified to detect this condition and abort. For example, Chrome implemented a detector that aborts if the corruption is identified.
Adobe implemented a similar abort years ago. Photoshop CS5 pops up an error message that says, “Could not complete your request because no JPEG frame component ID was found equal to an already read scan component ID.” While overly technical for your average Photoshop user, it is completely correct.
- Another option is to detect the duplicate identifier and change it to match the next unused identifier. (I chose this option for my own code.) While this will probably render a lot of garbage, junk is better than aborting when doing forensics on digital pictures. (You cannot analyze anything, including metadata, if the library aborts or the picture won’t load.)
- In every JPEG I have seen, standard libraries list the SOS entries in the same order as the component definitions in the SOF. Here’s a thought: if you detect this situation, then forget the SOS ordering and use the same ordering as the component definitions in the SOF block. (This is similar to the 2004 patch, but it replaces all entries rather than just the one that was detected.)
- Probably the best option is to use a smarter initialization sequence. Replace every malloc() with calloc() to clear the memory, and set pointers with default values before loading the pointer settings. This way, even if the memory pointers are invalid, they still point to allocated memory that has been initialized. This gets rid of the uninitialized memory leak and ensures that reloading the image will not change the image.
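Three of the options above (abort, remap the duplicate, or fall back to the SOF ordering) can be sketched as one small function. This is a rough illustration in Python, not the code from Chrome, Photoshop, the 2004 patch, or my own library; fix_scan_ids, CorruptScan, and the policy names are all invented for the example.

```python
class CorruptScan(ValueError):
    """Raised when the scan header cannot be trusted (hypothetical type)."""

def fix_scan_ids(frame_ids, scan_ids, policy):
    """Return a usable list of scan component ids, or raise CorruptScan."""
    clean = (len(set(scan_ids)) == len(scan_ids)
             and set(scan_ids) <= set(frame_ids))
    if clean:
        return list(scan_ids)  # nothing to repair
    if policy == "abort":
        raise CorruptScan("scan ids %r do not match frame ids %r"
                          % (scan_ids, frame_ids))
    if policy == "sof-order":
        # Ignore the SOS ordering and reuse the ordering from the SOF.
        return list(frame_ids[:len(scan_ids)])
    if policy == "remap":
        # Keep the first use of each id; swap later repeats (or unknown ids)
        # for the next frame id that the scan has not referenced.
        unused = [i for i in frame_ids if i not in scan_ids]
        fixed, seen = [], set()
        for ident in scan_ids:
            if ident in seen or ident not in frame_ids:
                if not unused:
                    raise CorruptScan("no unused component id left")
                ident = unused.pop(0)
            seen.add(ident)
            fixed.append(ident)
        return fixed
    raise ValueError("unknown policy: %r" % policy)

print(fix_scan_ids([1, 2, 3], [3, 2, 3], "remap"))      # [3, 2, 1]
print(fix_scan_ids([1, 2, 3], [3, 2, 3], "sof-order"))  # [1, 2, 3]
```

With “remap”, the (3,2,3) header becomes (3,2,1): every component is now defined, but the luminance and one chrominance channel are swapped, so expect garbage that at least stays the same between reloads. With “sof-order” the header is restored to (1,2,3).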
Although there are many solutions, Canonical’s approach is definitely the wrong answer. The Ubuntu security forums decided to not fix it because libjpeg6b is “ancient”.
sarnold> Michal suggests libjpeg6b will not be updated from upstream
mdeslaur> upstream bug and proposed patch is ancient. Chromium contains
mdeslaur> a patch.
I guess the Ubuntu community missed the part about libjpeg6b being used EVERYWHERE. It may be old, it may have been ignored by IJG since 2006, but it is widely used.
Speaking of 2006… The official code released by IJG back in 2006 does not include any patches for this 2004 exploit. Call it oversight or incompetence, your choice.
There are other variations of this bug. For example, rather than changing the component IDs, change the Huffman table IDs. In the SOS header, changing the second byte for any of the components (e.g., changing the luminance from “00” to 01, 10, or 11) will create a different type of corruption that still accesses uninitialized memory. However, there is no CVE for this bug. (And don’t expect one for the next few years; CVE assignments are not fast.)
There are a couple of things that bother me about this latest update to libjpeg. First, the source code from apt-get says that nothing changed since 2010… so why was a patch pushed out?
Second, nothing in this patch addresses the recent CVE exploits. The description for this latest libjpeg62 update explicitly says it is for CVE-2013-6629 and CVE-2013-6630, but I don’t see that in the source code. I know what code changes should be there, but they are not in the patch that was pushed out. After applying the patch, I checked every libjpeg on the system (find /usr -name 'libjpeg*') and none of them had recent timestamps. Even though I applied the patched version to my system, applications that still use the system-wide libjpeg6b act as if there is no patch.
More problematic than the bugs themselves is the lack of ownership. For example, CVE-2013-6629 says that there are patches for Chromium, Thunderbird, and a few other packages, but not for libjpeg in general. And even though libjpeg6b is included on virtually every Linux distribution as well as many widely-used open source projects, there does not appear to be any maintainer who has taken ownership of this library. I can see who the Ubuntu maintainer is. I can find the RedHat maintainer. I can even find the maintainers for SuSE and Debian and other Linux versions. They all report that they applied the same patch, but I don’t see who provided the initial solution to all of these vendors.
Maybe I’m just missing something… (Shouldn’t apt-get update and apt-get upgrade be enough to patch this?) Or maybe there are bigger problems than uninitialized memory in libjpeg.