A researcher has demonstrated that lossy image compressors can be used to hide arbitrary code inside PDF documents. The technique could be highly effective for malicious actors because security products typically ignore such data.
It’s not uncommon for cybercriminals to hide malicious code in PDF files. The code is usually designed to exploit vulnerabilities in the application used to open the document, most often Adobe Reader.
Exploits can be hidden inside PDF files by using data compressors, such as Lempel–Ziv–Welch (LZW) and Deflate, and even image compressors, such as CCITTFaxDecode and JBIG2Decode. Security products are designed to decompress streams encoded with these algorithms and scan the result for payloads.
On the other hand, antivirus products and PDF forensic tools usually ignore data compressed with lossy compressors such as JPXDecode and DCTDecode. Lossy compression represents the encoded content with inexact approximations, so some of the original information is discarded.
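To make this blind spot concrete, here is a minimal Python sketch of a signature scanner that undoes a stream's filter chain before matching. It assumes a single hypothetical signature and handles only FlateDecode (via the standard `zlib` module); streams tagged DCTDecode or JPXDecode are skipped outright, mirroring the behavior the article describes rather than any particular product.

```python
import zlib

# Hypothetical byte signature used only for illustration; real products
# ship large, regularly updated signature databases.
SIGNATURES = [b"MALICIOUS_MARKER"]

# Filters this toy scanner knows how to undo before matching.
DECODERS = {
    "FlateDecode": zlib.decompress,
}

# Lossy image filters that, per the article, scanners commonly skip.
SKIPPED_FILTERS = {"DCTDecode", "JPXDecode"}


def scan_stream(raw: bytes, filters: list[str]) -> bool:
    """Return True if a signature is found after undoing the filter chain.

    Filters are applied in the order listed, as in a PDF /Filter array.
    """
    data = raw
    for name in filters:
        if name in SKIPPED_FILTERS:
            # Blind spot: the stream is treated as opaque image data.
            return False
        decoder = DECODERS.get(name)
        if decoder is None:
            # Unknown filter: this sketch simply gives up.
            return False
        data = decoder(data)
    return any(sig in data for sig in SIGNATURES)
```

Calling `scan_stream(zlib.compress(b"header MALICIOUS_MARKER trailer"), ["FlateDecode"])` finds the marker, while the same bytes labeled `["DCTDecode"]` are never inspected. Those bytes are not a real JPEG; the point is only that the filter label alone decides whether scanning happens.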
Lossy compression works well for images, where small deviations are barely visible, but not for code, where changing a single byte can break it. This is why security solutions assume that lossily compressed data can’t contain malicious code.
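A toy model shows why that assumption seemed safe. The sketch below is not JPEG (a real encoder quantizes DCT coefficients, not raw bytes); it is a hypothetical `quantize` function that snaps each byte to the nearest multiple of a step size, the simplest form of lossy coding.

```python
def quantize(data: bytes, step: int) -> bytes:
    """Toy lossy 'codec': snap every byte to the nearest multiple of step.

    A real JPEG encoder quantizes DCT coefficients, not raw bytes, but the
    effect is similar: detail finer than the step size is thrown away.
    """
    out = bytearray()
    for b in data:
        snapped = step * round(b / step)
        out.append(min(snapped, 255))  # clamp to the byte range
    return bytes(out)


code = b"print('hello')"
exact = quantize(code, 1)     # step 1 keeps every byte
lossy = quantize(code, 16)    # coarse step alters most bytes
```

With `step=1` the input survives unchanged, but with `step=16` the string `print('hello')` comes back altered, which for code generally means it no longer runs as written.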
However, CSIS researcher Dénes Óvári has demonstrated that hiding malicious code in a JPEG image compressed with the DCTDecode lossy compressor is possible. The expert determined that while encoding a color JPEG image results in data loss that corrupts the code, a high-quality grayscale JPEG image can do the trick.
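The article does not detail where color encoding loses data. One plausible contributor, and the only one modeled here, is the RGB-to-YCbCr conversion applied to color JPEGs (chroma subsampling, another common source of loss, is not shown). This Python sketch assumes the standard JFIF conversion formulas with 8-bit rounding; gray pixels round-trip exactly because their chroma values land on 128, while many color pixels do not.

```python
def _to_byte(v: float) -> int:
    # Round to the nearest integer and clamp to the 8-bit range.
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    # JFIF full-range RGB -> YCbCr conversion.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _to_byte(y), _to_byte(cb), _to_byte(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    # JFIF full-range YCbCr -> RGB conversion.
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _to_byte(r), _to_byte(g), _to_byte(b)


def roundtrip(r: int, g: int, b: int) -> tuple[int, int, int]:
    return ycbcr_to_rgb(*rgb_to_ycbcr(r, g, b))


def count_lossy(n: int) -> int:
    """Count RGB triples in the cube [0, n)^3 that do not survive a round trip."""
    return sum(
        1
        for r in range(n)
        for g in range(n)
        for b in range(n)
        if roundtrip(r, g, b) != (r, g, b)
    )
```

`count_lossy(40)` is nonzero: the conversion maps the 64,000 RGB triples in that cube onto far fewer distinct YCbCr triples, so some inputs must come back changed. A gray value such as `roundtrip(200, 200, 200)` returns `(200, 200, 200)`.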
“Although this is not a security breach in itself (an exploit still needs to be used inside the stream for malicious activity), the fact that the usage of DCTDecode for this purpose has seemingly been ruled out by the industry means that even known threats could be hidden in this way from anti-virus scanners and/or researchers,” Óvári wrote in a research paper published by Virus Bulletin.
“In order to provide users with maximum protection, the DCTDecode stream must no longer be overlooked: in PDF reader implementations, the referencing of uncompressed image data as parameters from objects expecting binary data should be prohibited. We should also perhaps re-examine the handling of other file formats in which data in JPEG format is assumed always to be lossily compressed, while a greyscale mode is still available,” the expert added.
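The proposed reader-side rule can be sketched as a simple policy check. The consumer names below are hypothetical labels invented for this example, not PDF specification terms; the idea is only that an object expecting binary data (such as script source) should refuse a stream whose filter chain produces image data.

```python
# Filters whose decoded output is image data (the lossy filters named above).
IMAGE_FILTERS = {"DCTDecode", "JPXDecode"}

# Hypothetical consumer categories that expect arbitrary binary data rather
# than pixels; these names are illustrative, not taken from the PDF spec.
BINARY_CONSUMERS = {"javascript", "embedded_file", "font_program"}


def may_reference(consumer: str, stream_filters: list[str]) -> bool:
    """Return False when a binary-data consumer points at image-filtered data."""
    if consumer in BINARY_CONSUMERS:
        return not any(f in IMAGE_FILTERS for f in stream_filters)
    # Ordinary image consumers may use lossy image streams as usual.
    return True
```

Under this policy a script stream decoded through DCTDecode is rejected, while the same filter on an ordinary image is still allowed.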