Data Protection

Hadoop Data Encryption: “P.S. Find Robert Langdon”

August 10, 2016

“P.S. Find Robert Langdon” and an out-of-order Fibonacci sequence are part of the cryptic message in the opening scenes of “The Da Vinci Code.” From there, cryptographer Sophie Neveu and Professor Langdon embark on a hunt to recover and unlock the secrets of the cryptex keystone. Cryptography plays a role in many other great movies I have enjoyed, from “A Beautiful Mind” to “The Imitation Game.” These films remind us that cryptography comes in many forms and has been used to protect secrets since long before the invention of computers. Even so, encryption was not built into Apache Hadoop from the start; it was added over time and implemented across components. Today it is a common method of protecting big data at financial institutions, healthcare organizations, telecommunications companies and government agencies.

HDFS Encryption

HDFS natively supports encryption of data through a mechanism called Encryption Zones. An Encryption Zone is an HDFS directory that has been associated with an encryption key. Once that association is made, all files written to the directory and its subdirectories are encrypted automatically.
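As a rough sketch of what setting up a zone looks like in practice, the Python below only assembles the command lines an administrator would typically run; it does not contact a cluster. The key name "finance_key" and the path "/data/finance" are hypothetical, and the exact flags should be checked against the documentation for your Hadoop version.

```python
import shlex


def create_key_cmd(key_name, size_bits=128):
    """argv that asks the configured key provider to create a zone key."""
    return ["hadoop", "key", "create", key_name, "-size", str(size_bits)]


def create_zone_cmd(key_name, path):
    """argv that associates an existing, empty HDFS directory with a key."""
    return ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path]


def zone_setup_commands(key_name, path, size_bits=128):
    """Ordered commands: create key, create directory, create zone, list zones."""
    return [
        create_key_cmd(key_name, size_bits),
        ["hdfs", "dfs", "-mkdir", "-p", path],
        create_zone_cmd(key_name, path),
        ["hdfs", "crypto", "-listZones"],
    ]


if __name__ == "__main__":
    # Hypothetical key name and path, printed for review rather than executed.
    for argv in zone_setup_commands("finance_key", "/data/finance"):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; on a real cluster the same lists could be passed to subprocess.run by a user who holds the necessary HDFS and key permissions.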

When using HDFS encryption, not all data in HDFS needs to be encrypted; you can have some directories with public or non-sensitive data in cleartext while sensitive data gets encrypted. Hadoop users can have their own Encryption Zones to protect their data from other users, and I will go more into that in the next section.

A common misconception about native HDFS encryption is that the data is encrypted when it is written to disk on the data nodes, as with most disk encryption solutions. In fact, the data is encrypted before it is sent to the data node. That architecture has two nice side effects: the data is also protected in transit, and the keys are never exposed on the data nodes where the data is stored.

The cryptographic algorithm used to encrypt HDFS data is the industry-proven AES. The default is AES-128, but it is configurable for organizations that have standardized on the stronger AES-256. Each individual file in the Encryption Zone is assigned its own randomly generated key. This is good because if a key were compromised, it would be usable on only a single file.

There are some caveats to HDFS Encryption, however, that you should be aware of:

1) You cannot create an Encryption Zone on a non-empty directory. In other words, if you were expecting to take an existing directory with data and set it up as an encryption zone to have it all encrypted… No can do. That’s just not how it works. You need an empty directory to create an encryption zone. From there you can use the distcp command to copy all the data from its current directory into the new, empty and now encrypted directory (and then delete the old files).

2) You cannot make your entire “/” root directory an encryption zone.

3) Nested encryption zones are not currently supported, but they are on the roadmap. You may want to set up the entire /user directory as an encryption zone and then allow users to create their own nested encryption zones, but this is not yet supported. We will hopefully support this soon – perhaps by the time this article is published.

4) Test your applications. There are many scenarios where applications can break due to user permissions to the encryption keys, and restrictions on how files may be moved and copied in/out of encryption zones.
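The first three caveats can be expressed as a small pre-flight check, followed by the copy-into-a-new-zone workaround from caveat 1. This is a minimal sketch, not how HDFS itself enforces the rules: the function names are made up for illustration, the nesting rule reflects the state of things as described in this article, and the -update and -skipcrccheck flags on distcp are an assumption based on the common recommendation for copies between cleartext and encrypted locations (checksums differ once the data is encrypted). Verify the flags against your Hadoop version.

```python
import posixpath


def zone_creation_problem(path, is_empty, existing_zones):
    """Return a reason the zone cannot be created, or None if the caveats pass."""
    path = posixpath.normpath(path)
    if path == "/":
        return "the root directory cannot be an encryption zone"
    if not is_empty:
        return "the directory must be empty"
    for zone in existing_zones:
        zone = posixpath.normpath(zone)
        nested = path.startswith(zone + "/") or zone.startswith(path + "/")
        if path == zone or nested:
            return f"would overlap or nest with existing zone {zone}"
    return None


def migration_commands(src, dst, key_name):
    """Commands to copy an existing directory into a fresh encryption zone."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", dst],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", dst],
        # Assumed flags: skip CRC comparison because ciphertext checksums differ.
        ["hadoop", "distcp", "-update", "-skipcrccheck", src, dst],
        # Run only after verifying the copy. Files moved to trash stay in
        # cleartext until the trash is emptied.
        ["hdfs", "dfs", "-rm", "-r", src],
    ]


if __name__ == "__main__":
    # Hypothetical paths: /user is already a zone, so /user/alice is rejected.
    print(zone_creation_problem("/user/alice", True, ["/user"]))
    print(zone_creation_problem("/data/secure", True, ["/user"]))
```

The fourth caveat has no shortcut: running the application against a test zone with the intended key permissions is the only reliable check.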

Hadoop Key Management Server (KMS)

The key to encryption (pun intended) is the actual management of the encryption keys. Each Encryption Zone will have its own unique key with associated user permissions to use that key. Each individual file will also have its own unique and randomly generated key. So where do those keys get stored, how are they managed, and how are they protected? This is where Hadoop KMS comes into play. Apache Hadoop KMS is a pluggable key management service that serves three purposes. It:

● generates and stores keys for Encryption Zones

● generates and encrypts/decrypts keys for files

● protects and manages the permissions to these keys
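The relationship between zone keys and per-file keys is a form of envelope encryption, and a toy model may make it easier to picture. The sketch below is an illustration only: ToyKMS and its methods are hypothetical names, and the cipher is a stand-in built from HMAC-SHA256 so that the example runs with the Python standard library alone. It is not AES and must not be used to protect real data. In Hadoop's own documentation the wrapped per-file key is referred to as an encrypted data encryption key (EDEK).

```python
import hashlib
import hmac
import secrets


def toy_cipher(key, nonce, data):
    """XOR data with an HMAC-SHA256 keystream. Toy stand-in for AES.

    Applying it twice with the same key and nonce returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        stream += block.digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds zone keys and hands out per-file keys wrapped under them."""

    def __init__(self):
        self._zone_keys = {}  # zone key name -> key bytes, kept inside the KMS

    def create_zone_key(self, name):
        self._zone_keys[name] = secrets.token_bytes(16)

    def generate_file_key(self, zone_key_name):
        """Return (nonce, wrapped_key): a fresh file key encrypted under the zone key."""
        file_key = secrets.token_bytes(16)
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self._zone_keys[zone_key_name], nonce, file_key)

    def unwrap_file_key(self, zone_key_name, nonce, wrapped_key):
        """Decrypt a wrapped file key; a real KMS would check permissions first."""
        return toy_cipher(self._zone_keys[zone_key_name], nonce, wrapped_key)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_zone_key("finance_key")  # hypothetical zone key name
    nonce, wrapped = kms.generate_file_key("finance_key")
    file_key = kms.unwrap_file_key("finance_key", nonce, wrapped)
    data_nonce = secrets.token_bytes(16)
    ciphertext = toy_cipher(file_key, data_nonce, b"account,balance\n")
    print(toy_cipher(file_key, data_nonce, ciphertext))
```

The point of the design is that only the wrapped file key needs to be stored alongside the file's metadata; the zone key that can unwrap it stays inside the key management service.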

KMS is an independent service that runs separately from the Hadoop cluster and typically is managed by the Information Security team and not the Hadoop administrators. It is important to create a separation of duties when it comes to managing permissions to encryption keys. Hadoop administrators may have access to the hdfs user and all the data but not to the keys.

Permissions to keys are defined in KMS as Access Control Lists (ACLs). These ACLs define which users and groups have access to encryption keys (the whitelist). ACLs can also define which users or groups are blocked from accessing keys (the blacklist). Using ACLs, users can block privileged and administrative accounts, such as the hdfs user, from accessing user data.
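To illustrate how a whitelist and a blacklist interact, here is a deliberately simplified model in Python. It is an assumption-laden sketch rather than the KMS evaluation logic: KeyAcl is a hypothetical class, users and groups share one namespace, and there is a single permission instead of the per-operation ACLs a real deployment has. In an actual cluster these rules live in the KMS ACL configuration (for example, properties such as hadoop.kms.blacklist.DECRYPT_EEK in kms-acls.xml); consult the KMS documentation for the exact semantics.

```python
from dataclasses import dataclass, field


@dataclass
class KeyAcl:
    """Simplified access rule for one key: blacklist overrides whitelist."""

    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)

    def allows(self, user, groups=()):
        principals = {user, *groups}
        if principals & self.blacklist:
            return False  # blocked, even if a group membership would allow it
        return bool(principals & self.whitelist)


if __name__ == "__main__":
    # Hypothetical users and groups.
    acl = KeyAcl(whitelist={"alice", "analysts"}, blacklist={"hdfs"})
    print(acl.allows("alice"))                      # whitelisted user
    print(acl.allows("bob", groups=["analysts"]))   # whitelisted via group
    print(acl.allows("hdfs", groups=["analysts"]))  # blacklisted superuser
```

Checking the blacklist first is what lets a data owner keep the hdfs superuser away from the keys even though that account can read the encrypted bytes.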

By default, KMS uses an implementation backed by the Java KeyStore (JKS). The JKS stores its secrets in a file, which is typically protected only by filesystem permissions. The JKS implementation, however, has downsides and is not recommended for production systems. One of those downsides is security. You can put a password on a JKS file, but then where do you store that password – in a text file on the server? There are also scalability and availability challenges: the JKS file has no built-in replication, redundancy or backup mechanisms.

In this article I only touched on HDFS encryption of data at rest and key management. But not all data resides in HDFS. There is sensitive data that may be outside of HDFS, there are temporary files and, of course, there is also data that is in transit. In my next column I will detail those other areas of encryption.

I leave you with the now-infamous anagram from “The Da Vinci Code” for your deciphering pleasure:

“O, Draconian devil!

Oh, lame saint!

So dark the con of Man”

###

Written By Eddie Garcia
