Research: Security Agencies Expose Information via Improperly Sanitized PDFs

Most security agencies fail to properly sanitize Portable Document Format (PDF) files before publishing them, thus exposing potentially sensitive information and opening the door for attacks, researchers have discovered.

An analysis of roughly 40,000 PDFs published by 75 security agencies in 47 countries has revealed that these files can be used to identify employees who use outdated software, according to Supriya Adhatarao and Cédric Lauradoux, two researchers with the University Grenoble Alpes and France’s National Institute for Research in Computer Science and Automation (Inria).

The analysis also revealed that the adoption of sanitization within security agencies is rather low, as only 7 of them used it to remove hidden sensitive information from some of their published PDF files. What’s more, 65% of the sanitized files still contained hidden data.

“Some agencies are using weak sanitization techniques: it requires to remove all the hidden sensitive information from the file and not just to remove the data at the surface. Security agencies need to change their sanitization methods,” the academic researchers say.

PDF files, the researchers note, are collections of indirect objects of eight types (arrays, booleans, dictionaries, names, numbers, streams, strings, and the null object) that are used to store data. These objects may include hidden data that is not visible when the PDF is viewed.

Per the NSA, there are 11 main types of hidden data in PDF files, namely metadata; embedded content and attached files; scripts; hidden layers; embedded search index; stored interactive form data; reviewing and commenting; hidden page, image and update data; obscured text and images; PDF comments that are not displayed; and unreferenced data.

Metadata associated with images embedded in a PDF file can be used to gather information about the author, as can the PDF’s own metadata and any comments and annotations that were not removed before publishing.
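As a rough illustration of how easily such metadata can be read, the sketch below pulls a few fields out of a PDF's document information dictionary with a regular expression. The sample bytes, the field list, and the function name `extract_info_fields` are assumptions made for this example, not anything taken from the research. A real PDF may store these values in compressed object streams, as hex or UTF-16 strings, with nested parentheses, or in XMP metadata, none of which this sketch handles.

```python
import re

# Hypothetical, hand-written PDF fragment used only for illustration.
SAMPLE_PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Author (Jane Doe) /Creator (Microsoft Word 2010) "
    b"/Producer (Acrobat Distiller 9.0 \\(Windows\\)) >>\n"
    b"endobj\n"
    b"trailer\n<< /Info 1 0 R >>\n"
    b"%%EOF\n"
)

# Matches "/Key (literal string)", allowing backslash escapes inside the string.
# The set of keys is an assumption chosen for this example.
FIELD_RE = re.compile(
    rb"/(Author|Creator|Producer|Title)\s*\(((?:\\.|[^\\()])*)\)"
)


def extract_info_fields(pdf_bytes: bytes) -> dict:
    """Return the metadata fields found in uncompressed literal strings."""
    fields = {}
    for key, raw in FIELD_RE.findall(pdf_bytes):
        # Undo the escapes for parentheses and backslashes.
        value = re.sub(rb"\\([()\\])", rb"\1", raw)
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields


if __name__ == "__main__":
    for name, value in extract_info_fields(SAMPLE_PDF).items():
        print(f"{name}: {value}")
```

Even this naive scan recovers an author name and the producing software from the sample, which is the kind of information the researchers describe as leaking.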

Several tools can be used to sanitize PDF files, including Adobe’s Acrobat, and there are four levels of sanitization: Level-0 (full metadata, no sanitization), Level-1 (partial metadata), Level-2 (no metadata), and Level-3 (properly cleaned files, meaning full sanitization with all objects holding hidden data removed).
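The first three levels can be approximated from metadata alone, as in the minimal sketch below. The field list and the rule that "full metadata" means every listed field is present are assumptions for illustration; the article does not give the researchers' exact criteria. Telling Level-2 from Level-3 would require inspecting the remaining objects in the file, so the sketch does not attempt it.

```python
# Metadata fields checked here are an assumption made for this example.
METADATA_FIELDS = (
    "Author", "Creator", "Producer", "Title", "CreationDate", "ModDate",
)


def metadata_level(info: dict) -> str:
    """Roughly classify a file by how much metadata it still carries."""
    present = [field for field in METADATA_FIELDS if info.get(field)]
    if len(present) == len(METADATA_FIELDS):
        return "Level-0"  # full metadata, no sanitization
    if present:
        return "Level-1"  # partial metadata
    # No metadata left. Whether hidden objects remain cannot be decided
    # from the metadata, so Level-2 and Level-3 are not distinguished.
    return "Level-2 or Level-3"


if __name__ == "__main__":
    print(metadata_level({"Producer": "Example Writer 1.0"}))
```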

For their research, the academics used a set of 39,664 PDF files. Of these, 1,783 (4%) were found to include the author’s name, 30,155 (76%) contained metadata on the PDF producer tool, and 16,805 (42%) revealed the operating system used.

The files also leaked email addresses, including official ones (52 files), hardware brands (581 files), and file paths (1,814 files).

“During our analysis we observed that many agencies include more than one author publishing the PDF files. It is possible to download all the PDF files published on a security agency’s website and observe the author habits, OS trends,” the researchers note.

The analysis also allowed for the identification of 159 employees at 19 agencies who had not updated their tools over a period of two years, which threat actors could abuse in targeted attacks, especially since 42% of the PDF files leaked operating system data.
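One way such employees might be spotted is to group harvested metadata by author and check whether the producer string stays the same across a long span of documents. The sketch below does this under stated assumptions: the record layout, the names, the 730-day threshold, and the function `stale_authors` are hypothetical, and the article does not describe the researchers' actual procedure. An unchanged producer string is only a hint, since it does not by itself prove that the software is outdated.

```python
from collections import defaultdict
from datetime import date


def stale_authors(records, min_span_days=730):
    """Return authors whose producer string never changes over the span.

    `records` is an iterable of (author, producer, creation_date) tuples.
    """
    by_author = defaultdict(list)
    for author, producer, created in records:
        by_author[author].append((created, producer))

    flagged = []
    for author, items in by_author.items():
        producers = {producer for _, producer in items}
        dates = [created for created, _ in items]
        span_days = (max(dates) - min(dates)).days
        if len(producers) == 1 and span_days >= min_span_days:
            flagged.append(author)
    return sorted(flagged)


# Hypothetical records, invented for this example.
RECORDS = [
    ("author-a", "Example Writer 9.0", date(2017, 1, 10)),
    ("author-a", "Example Writer 9.0", date(2019, 3, 1)),
    ("author-b", "Example Writer 9.0", date(2017, 1, 10)),
    ("author-b", "Example Writer 11.0", date(2019, 3, 1)),
    ("author-c", "Example Writer 9.0", date(2019, 1, 1)),
    ("author-c", "Example Writer 9.0", date(2019, 4, 11)),
]

if __name__ == "__main__":
    print(stale_authors(RECORDS))
```

Only `author-a` is flagged here: `author-b` changed versions, and the documents from `author-c` span too short a period to say anything.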

While 9,509 (24%) of the analyzed PDF files had been sanitized before publishing, only 3,313 (8%) were sanitized to Level-3. The researchers note that only 3 of the 7 agencies that appear to care about sanitization are doing it properly.
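The percentages quoted in the article follow from the reported counts and the 39,664-file total, rounded to whole numbers. The short check below recomputes them; the dictionary keys are labels chosen for this example.

```python
TOTAL_FILES = 39_664

# Counts as reported in the article; the keys are illustrative labels.
COUNTS = {
    "author name": 1_783,
    "producer tool": 30_155,
    "operating system": 16_805,
    "sanitized (any level)": 9_509,
    "sanitized (Level-3)": 3_313,
}


def share(count: int, total: int = TOTAL_FILES) -> int:
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count} of {TOTAL_FILES} = {share(count)}%")
```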

“The issue is that popular PDF producer tools are keeping metadata by default with many other information while creating a PDF file. They provide no option for sanitization or it can only be achieved by following a complex procedure. Software producing PDF files need to enforce sanitization by default. The user should be able to add metadata only as an option,” the academics conclude.

Written By

Ionut Arghire is an international correspondent for SecurityWeek.
