Securing Apache Hadoop – How Enterprises Can Pass Internal and External Security Requirements and Audits
Over the past several years, thousands of IT products and solutions have emerged to solve every imaginable enterprise business problem. We’ve seen a non-stop procession of apps, clouds, platforms, services, infrastructures and more spring to life. Many of these emergent technologies and solutions are showing promise and in some cases even ROI — with Apache Hadoop among them.
Apache Hadoop, the standard framework for Big Data processing, has swiftly gained momentum across the enterprise. Enterprises are already running production-grade Hadoop clusters along with a large ecosystem of tools built around them, including Pig, Hive, Sqoop and YARN, to name a few. The Hadoop technology stack is moving so quickly that some early technology, such as MapReduce, is already being displaced by newer technology such as Apache Spark.
As always, new challenges arise amid innovation. With security and compliance now dominating concern lists, many organizations are struggling to figure out how to scale security alongside their digital footprints. No IT security and risk professional wants to stall progress, nor do any want to allow their organizations to move at a pace that can lead to breaches, violations, lawsuits and job loss at even the highest leadership levels.
As a result, many enterprises — especially within highly regulated industries — aren’t able to move as quickly as they would like towards implementing Big Data projects and Hadoop. They don’t have to hesitate though, as many of the security and compliance challenges are now surmountable.
This article looks at how advancements in Hadoop security help the professionals responsible for security and compliance reduce the associated risks and shrink the overall size of the problem.
Laying the Security Groundwork
While the Hadoop platform continues to evolve, there are many security capabilities enterprises can implement today. To ensure secure and compliant Hadoop usage, security and risk professionals should make sure that they start with these basics:
1. Implement basic security measures. Most standard security measures apply to the Hadoop platform as well. Always create users and groups, map users to groups, assign and lock down permissions by group and enforce strong passwords. Build user onboarding and off-boarding processes with periodic audit reports. Limit super users, apply fine-grained permissions on a need-to-know basis and avoid coarse-grained, broad-stroke permissions.
2. Seek executive sponsorship. Executive sponsorship is crucial — security professionals need to demonstrate why security is a good investment to reduce risk. Security projects are often not backed until after an incident or failed audit. Prepare well and present the security initiatives needed to lock down the enterprise’s Big Data platform.
3. Harden the OS and lock down the Java VM. Don’t forget that Hadoop runs on an operating system and that most of the software runs in a Java VM. Lock down the OS and Java VM according to security best practices. For example: enable the built-in Linux firewall (iptables), disable remote root access, require SSH key-pair login, limit sudo privileges, restrict local root access and shut down and remove services that are not required, just to name a few.
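As one illustration of the OS hardening step, the two SSH-related items (disabling remote root access and requiring key-pair login) can be checked automatically. The following is a minimal sketch in Python, not a complete hardening audit. It assumes the standard OpenSSH directives PermitRootLogin and PasswordAuthentication; the baseline dictionary, the function names and the sample configuration text are hypothetical and exist only for illustration. It also ignores Match blocks and treats a missing directive as a violation, since defaults differ between OpenSSH versions.

```python
# Hypothetical baseline for this sketch: directive (lowercased) -> required value.
EXPECTED_SSHD = {
    "permitrootlogin": "no",         # disable remote root access
    "passwordauthentication": "no",  # require SSH key-pair login
}


def parse_sshd_config(text):
    """Collect directives from sshd_config-style text.

    sshd uses the first value it finds for a keyword, so later
    duplicates are ignored here as well. Match blocks are not handled.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value; ignore in this sketch
        key, value = parts[0].lower(), parts[1].strip().lower()
        settings.setdefault(key, value)
    return settings


def find_violations(text, expected=EXPECTED_SSHD):
    """Return the sorted directives that are missing or differ from the baseline."""
    settings = parse_sshd_config(text)
    return sorted(
        key for key, want in expected.items() if settings.get(key) != want
    )


# Invented sample input, not taken from a real host.
sample = """\
# illustrative sshd_config fragment
PermitRootLogin yes
PasswordAuthentication no
"""
print(find_violations(sample))  # ['permitrootlogin']
```

In practice the text would come from the node's actual sshd_config file, and the same pattern could be repeated for other items in the list, such as confirming that unneeded services are disabled.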
Taking Hadoop Security to the Next Level
Once the security basics are covered, security and risk professionals should dig deeper into securing Hadoop. New security capabilities are being added all the time, and many of the features already available can help enterprises pass internal and external security requirements and audits. Dive into these main security areas to further secure Hadoop: perimeter, data, access and monitoring.
1. Build a perimeter. Hadoop now supports industry-standard Kerberos to block access by unauthenticated users. And, through integration with LDAP and Active Directory, Hadoop can tie into centralized user and identity management systems.
2. Encrypt data. Because of compliance regulations such as HIPAA and PCI, as well as internal policies, security professionals need to protect data from more than just unauthorized users. Protect data in transit by using SSL so that it never crosses the wire in clear text, and protect data at rest using Linux-level encryption or the soon-to-be-available HDFS encryption.
3. Configure users and permissions. Set permissions for users, groups or roles by defining access control lists. Project Rhino, an open source effort started by Intel, has merged efforts with Apache Sentry, the project originated by Cloudera for Hadoop security and role-based access control.
4. Monitor, audit, detect and resolve issues. A crucial component of any security model is the capability to monitor, measure and audit the security process. Ensure that the enterprise’s security model is working as expected and that any suspected or actual security breach or compliance violation is quickly detected and resolved.
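Part of confirming that the security model is working as expected is verifying that the perimeter settings are actually switched on. The sketch below, in Python, reads core-site.xml-style text and reports properties that deviate from a baseline. The property names hadoop.security.authentication, hadoop.security.authorization and hadoop.rpc.protection are standard Hadoop settings; the required values, the function names and the sample XML are assumptions of this sketch and stand in for whatever an organization's own policy requires. Note that hadoop.rpc.protection covers Hadoop RPC traffic and is a separate mechanism from the SSL mentioned above.

```python
import xml.etree.ElementTree as ET

# Assumed policy for this sketch: property name -> required value.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # Kerberos instead of "simple"
    "hadoop.security.authorization": "true",       # service-level authorization on
    "hadoop.rpc.protection": "privacy",            # protect RPC traffic on the wire
}


def read_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def audit_core_site(xml_text, expected=EXPECTED):
    """Return {property: actual value or None} for every deviation from the baseline."""
    props = read_properties(xml_text)
    findings = {}
    for name, want in expected.items():
        got = props.get(name)
        if got is None or got.lower() != want:
            findings[name] = got  # None means the property is not set at all
    return findings


# Invented sample configuration, not taken from a real cluster.
sample = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
</configuration>
"""
for name, actual in audit_core_site(sample).items():
    print(f"{name}: found {actual!r}")
```

Run on a schedule against each node's configuration, a check like this could feed the periodic audit reports described earlier. It only inspects configuration text, so it complements monitoring of actual access activity and does not replace it.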
Hadoop is helping enterprises analyze and derive insights from data in ways they couldn’t before. Tapping into the benefits of Hadoop requires enterprises to secure their information assets to reduce risks that might cause problems down the road. The good news is, it’s possible today to ensure security and compliance in Hadoop, and continued innovation in the platform will let enterprises strengthen that security over time.
In the following columns I will explore these security layers in depth, covering the current, upcoming and future security capabilities of Hadoop.