Compliance

Research: Security Agencies Expose Information via Improperly Sanitized PDFs

Most security agencies fail to properly sanitize Portable Document Format (PDF) files before publishing them, thus exposing potentially sensitive information and opening the door for attacks, researchers have discovered.

Ionut Arghire

March 15, 2021

An analysis of roughly 40,000 PDFs published by 75 security agencies in 47 countries has revealed that these files can be used to identify employees who use outdated software, according to Supriya Adhatarao and Cédric Lauradoux, two researchers with the University Grenoble Alpes and France’s National Institute for Research in Computer Science and Automation (Inria).

The analysis also revealed that the adoption of sanitization within security agencies is rather low, as only 7 of them used it to remove hidden sensitive information from some of their published PDF files. What’s more, 65% of the sanitized files still contained hidden data.

“Some agencies are using weak sanitization techniques: it requires to remove all the hidden sensitive information from the file and not just to remove the data at the surface. Security agencies need to change their sanitization methods,” the academic researchers say.

PDF files, the researchers note, represent collections of indirect objects (eight types of objects: arrays, boolean, dictionaries, names, numbers, streams, strings, and the null object) that are used to store data. These objects may include hidden data not visible when viewing the PDF.

Per the NSA, there are 11 main types of hidden data in PDF files, namely metadata; embedded content and attached files; scripts; hidden layers; embedded search index; stored interactive form data; reviewing and commenting; hidden page, image and update data; obscured text and images; PDF comments that are not displayed; and unreferenced data.

Metadata associated with images within a PDF file can be used to gather information about the author, the same as comments and annotations that haven’t been removed before publishing, and PDF metadata.

There are several tools that can be used for sanitizing PDF files, including Adobe’s Acrobat, and there are four levels of sanitization: Level-0: full metadata (no sanitization), Level-1: partial metadata, Level-2: no metadata, and Level-3: properly cleaned files (full sanitization, with all objects having been removed).

Advertisement. Scroll to continue reading.

For their research, the academics used a set of 39,664 PDF files. Of these, 1,783 (4%) were found to include author name, 30,155 (76%) contained metadata on the PDF producer tool, and 16,805 (42%) revealed the operating system used.

The files also leaked email addresses – including official ones – (in 52 files), hardware brand (581 files), and paths (1,814 PDFs).

“During our analysis we observed that many agencies include more than one author publishing the PDF files. It is possible to download all the PDF files published on a security agency’s website and observe the author habits, OS trends,” the researchers note.

The analysis also allowed for the identification of 159 employees at 19 agencies that haven’t updated tools over a period of two years, which could be abused by threat actors in targeted attacks, especially since nearly half of the PDF files leaked operating system data.

While 9,509 (24%) of the analyzed PDF files have been sanitized before publishing, only 3,313 (8%) were sanitized with Level-3. The researchers note that only 3 agencies out of 7 that appear to care about sanitization are doing it properly.

“The issue is that popular PDF producer tools are keeping metadata by default with many other information while creating a PDF file. They provide no option for sanitization or it can only be achieved by following a complex procedure. Software producing PDF files need to enforce sanitization by default. The user should be able to add metadata only as an option,” the academics conclude.

Written By Ionut Arghire

Ionut Arghire is an international correspondent for SecurityWeek.

Latest News

Click to comment

CIEM Chat: How to Reduce Cloud Identity Risk

March 26, 2024

Join the session as we discuss the challenges and best practices for cybersecurity leaders managing cloud identities.

Virtual Event: Ransomware Resilience & Recovery Summit

April 17, 2024

SecurityWeek’s Ransomware Resilience and Recovery Summit helps businesses to plan, prepare, and recover from a ransomware incident.

You Against the World: The Offenders Dilemma

Foreign attackers have many more toolsets at their disposal, so we need to make sure we’re selective about our modeling, preparation and how we assess and fortify ourselves. (Tom Eston)

Why Intelligence Sharing Is Vital to Building a Robust Collective Cyber Defense Program

With automated, detailed, contextualized threat intelligence, organizations can better anticipate malicious activity and utilize intelligence to speed detection around proven attacks. (Marc Solomon)

Know Your Audience When Speaking to Security Practitioners

How can security practitioners make sense of the vendor landscape and separate those who talk a good game from those who can execute, perform, and solve real problems for enterprises? (Joshua Goldfarb)

Cybersecurity Mesh: Overcoming Data Security Overload

A significant cybersecurity challenge arises from managing the immense volume of data generated by numerous IT security tools, leading organizations into a reactive rather than proactive approach. (Torsten George)

The OODA Loop: The Military Model That Speeds Up Cybersecurity Response

The OODA Loop can be used both by defenders and incident responders for a variety of use cases such as threat assessment, threat monitoring, and threat hunting. (Etay Maor)

Application Security

Source Code Security Firm Cycode Launches With $4.6 Million in Funding

Cycode, a startup that provides solutions for protecting software source code, emerged from stealth mode on Tuesday with $4.6 million in seed funding.

Eduard KovacsSeptember 24, 2019

Quantum computing and the cryptopocalypse

Data Protection

Cyber Insights 2023 | Quantum Computing and the Coming Cryptopocalypse

The cryptopocalypse is the point at which quantum computing becomes powerful enough to use Shor’s algorithm to crack PKI encryption.

Kevin TownsendFebruary 2, 2023

Artificial Intelligence

AI Helps Crack NIST-Recommended Post-Quantum Encryption Algorithm

The CRYSTALS-Kyber public-key encryption and key encapsulation mechanism recommended by NIST for post-quantum cryptography has been broken using AI combined with side channel attacks.

Kevin TownsendFebruary 21, 2023

Compliance

Cyber Insights 2023 | Regulations

The three primary drivers for cyber regulations are voter privacy, the economy, and national security – with the complication that the first is often...

Kevin TownsendFebruary 2, 2023

Compliance

DMARC Implemented on Half of U.S. Government Domains

Government agencies in the United States have made progress in the implementation of the DMARC standard in response to a Department of Homeland Security...

Eduard KovacsJanuary 3, 2018

Artificial Intelligence

ChatGPT, the AI Revolution, and the Security, Privacy and Ethical Implications

Two of humanity’s greatest drivers, greed and curiosity, will push AI development forward. Our only hope is that we can control it.

Kevin TownsendApril 3, 2023

Data Protection

How Quantum Computing Will Impact Cybersecurity

While quantum-based attacks are still in the future, organizations must think about how to defend data in transit when encryption no longer works.

Marie HattarAugust 30, 2023

Application Security

VMware Patches VM Escape Flaw Exploited at Geekpwn Event

Virtualization technology giant VMware on Tuesday shipped urgent updates to fix a trio of security problems in multiple software products, including a virtual machine...

Ryan NaraineDecember 13, 2022

SECURITYWEEK NETWORK:

ICS:

SecurityWeek

Compliance

Research: Security Agencies Expose Information via Improperly Sanitized PDFs

More from Ionut Arghire

Latest News

Trending

CIEM Chat: How to Reduce Cloud Identity Risk

Virtual Event: Ransomware Resilience & Recovery Summit

People on the Move

Expert Insights

You Against the World: The Offenders Dilemma

Why Intelligence Sharing Is Vital to Building a Robust Collective Cyber Defense Program

Know Your Audience When Speaking to Security Practitioners

Cybersecurity Mesh: Overcoming Data Security Overload

The OODA Loop: The Military Model That Speeds Up Cybersecurity Response

Related Content

Application Security

Source Code Security Firm Cycode Launches With $4.6 Million in Funding

Data Protection

Cyber Insights 2023 | Quantum Computing and the Coming Cryptopocalypse

Artificial Intelligence

AI Helps Crack NIST-Recommended Post-Quantum Encryption Algorithm

Compliance

Cyber Insights 2023 | Regulations

Compliance

DMARC Implemented on Half of U.S. Government Domains

Artificial Intelligence

ChatGPT, the AI Revolution, and the Security, Privacy and Ethical Implications

Data Protection

How Quantum Computing Will Impact Cybersecurity

Application Security

VMware Patches VM Escape Flaw Exploited at Geekpwn Event

SECURITYWEEK NETWORK:

ICS:

More from Ionut Arghire

Latest News

Trending

Daily Briefing Newsletter

CIEM Chat: How to Reduce Cloud Identity Risk

Virtual Event: Ransomware Resilience & Recovery Summit

People on the Move

Expert Insights

You Against the World: The Offenders Dilemma

Why Intelligence Sharing Is Vital to Building a Robust Collective Cyber Defense Program

Know Your Audience When Speaking to Security Practitioners

Cybersecurity Mesh: Overcoming Data Security Overload

The OODA Loop: The Military Model That Speeds Up Cybersecurity Response

Related Content

Application Security

Source Code Security Firm Cycode Launches With $4.6 Million in Funding

Data Protection

Cyber Insights 2023 | Quantum Computing and the Coming Cryptopocalypse

Artificial Intelligence

AI Helps Crack NIST-Recommended Post-Quantum Encryption Algorithm

Compliance

Cyber Insights 2023 | Regulations

Compliance

DMARC Implemented on Half of U.S. Government Domains

Artificial Intelligence

ChatGPT, the AI Revolution, and the Security, Privacy and Ethical Implications

Data Protection

How Quantum Computing Will Impact Cybersecurity

Application Security

VMware Patches VM Escape Flaw Exploited at Geekpwn Event