Now on Demand Ransomware Resilience & Recovery Summit - All Sessions Available
Connect with us

Hi, what are you looking for?

SecurityWeekSecurityWeek

Compliance

Research: Security Agencies Expose Information via Improperly Sanitized PDFs

Most security agencies fail to properly sanitize Portable Document Format (PDF) files before publishing them, thus exposing potentially sensitive information and opening the door for attacks, researchers have discovered.

Most security agencies fail to properly sanitize Portable Document Format (PDF) files before publishing them, thus exposing potentially sensitive information and opening the door for attacks, researchers have discovered.

An analysis of roughly 40,000 PDFs published by 75 security agencies in 47 countries has revealed that these files can be used to identify employees who use outdated software, according to Supriya Adhatarao and Cédric Lauradoux, two researchers with the University Grenoble Alpes and France’s National Institute for Research in Computer Science and Automation (Inria).

The analysis also revealed that the adoption of sanitization within security agencies is rather low, as only 7 of them used it to remove hidden sensitive information from some of their published PDF files. What’s more, 65% of the sanitized files still contained hidden data.

“Some agencies are using weak sanitization techniques: it requires to remove all the hidden sensitive information from the file and not just to remove the data at the surface. Security agencies need to change their sanitization methods,” the academic researchers say.

PDF files, the researchers note, represent collections of indirect objects (eight types of objects: arrays, boolean, dictionaries, names, numbers, streams, strings, and the null object) that are used to store data. These objects may include hidden data not visible when viewing the PDF.

Per the NSA, there are 11 main types of hidden data in PDF files, namely metadata; embedded content and attached files; scripts; hidden layers; embedded search index; stored interactive form data; reviewing and commenting; hidden page, image and update data; obscured text and images; PDF comments that are not displayed; and unreferenced data.

Metadata associated with images within a PDF file can be used to gather information about the author, the same as comments and annotations that haven’t been removed before publishing, and PDF metadata.

There are several tools that can be used for sanitizing PDF files, including Adobe’s Acrobat, and there are four levels of sanitization: Level-0: full metadata (no sanitization), Level-1: partial metadata, Level-2: no metadata, and Level-3: properly cleaned files (full sanitization, with all objects having been removed).

Advertisement. Scroll to continue reading.

For their research, the academics used a set of 39,664 PDF files. Of these, 1,783 (4%) were found to include author name, 30,155 (76%) contained metadata on the PDF producer tool, and 16,805 (42%) revealed the operating system used.

The files also leaked email addresses – including official ones – (in 52 files), hardware brand (581 files), and paths (1,814 PDFs).

“During our analysis we observed that many agencies include more than one author publishing the PDF files. It is possible to download all the PDF files published on a security agency’s website and observe the author habits, OS trends,” the researchers note.

The analysis also allowed for the identification of 159 employees at 19 agencies that haven’t updated tools over a period of two years, which could be abused by threat actors in targeted attacks, especially since nearly half of the PDF files leaked operating system data.

While 9,509 (24%) of the analyzed PDF files have been sanitized before publishing, only 3,313 (8%) were sanitized with Level-3. The researchers note that only 3 agencies out of 7 that appear to care about sanitization are doing it properly.

“The issue is that popular PDF producer tools are keeping metadata by default with many other information while creating a PDF file. They provide no option for sanitization or it can only be achieved by following a complex procedure. Software producing PDF files need to enforce sanitization by default. The user should be able to add metadata only as an option,” the academics conclude.

Related: Adobe Open Sources Tool for Sanitizing Logs, Detecting Exposed Credentials

Related: Researchers Disclose New Methods for Replacing Content in Signed PDF Files

Written By

Ionut Arghire is an international correspondent for SecurityWeek.

Click to comment

Trending

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Join the session as we discuss the challenges and best practices for cybersecurity leaders managing cloud identities.

Register

SecurityWeek’s Ransomware Resilience and Recovery Summit helps businesses to plan, prepare, and recover from a ransomware incident.

Register

People on the Move

MSSP Dataprise has appointed Nima Khamooshi as Vice President of Cybersecurity.

Backup and recovery firm Keepit has hired Kim Larsen as CISO.

Professional services company Slalom has appointed Christopher Burger as its first CISO.

More People On The Move

Expert Insights

Related Content

Application Security

Cycode, a startup that provides solutions for protecting software source code, emerged from stealth mode on Tuesday with $4.6 million in seed funding.

Data Protection

The cryptopocalypse is the point at which quantum computing becomes powerful enough to use Shor’s algorithm to crack PKI encryption.

Artificial Intelligence

The CRYSTALS-Kyber public-key encryption and key encapsulation mechanism recommended by NIST for post-quantum cryptography has been broken using AI combined with side channel attacks.

Compliance

The three primary drivers for cyber regulations are voter privacy, the economy, and national security – with the complication that the first is often...

Compliance

Government agencies in the United States have made progress in the implementation of the DMARC standard in response to a Department of Homeland Security...

Artificial Intelligence

Two of humanity’s greatest drivers, greed and curiosity, will push AI development forward. Our only hope is that we can control it.

Data Protection

While quantum-based attacks are still in the future, organizations must think about how to defend data in transit when encryption no longer works.

Application Security

Virtualization technology giant VMware on Tuesday shipped urgent updates to fix a trio of security problems in multiple software products, including a virtual machine...