Hadoop Audit and Logging “Back in Time”

In my previous contributions I covered authentication and authorization in Hadoop. This time I will be covering Audit, the third of the three As of Information Security. Audit and monitoring are critical to data security. Through audit, we can ensure that the security controls in place are working correctly and identify attempts to circumvent them.

Logs are a common method to record the actions of an application and allow administrators and auditors to go “Back in Time” to review a user’s actions. Much like your credit card or bank statement, these logs provide evidence of transactions performed. In the absence of a time machine, these logs may be the only means to provide a historical view of what took place in a Hadoop cluster at a given moment in time.

As you all know by now, Hadoop has many different components and it just so happens that they have different types of audit logs. I will cover the auditing capabilities of several components in this article.

HDFS Audit Logs

HDFS is at the core of Hadoop, providing the distributed file system that makes Hadoop so successful. HDFS has two different audit logs: hdfs-audit.log for user activity and SecurityAuth-hdfs.audit for service activity. Both of these logs are implemented with Apache Log4j, a common and well-known logging mechanism in Java. The log4j properties can be configured in the log4j.properties file with:

log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit

log4j.category.SecurityLogger

Below is an example log for user Marty McFly (Kerberos realm shown as a placeholder) after a listing of files/directories and an attempted move to the /user/doc directory, which was denied.

2015-07-01 12:15:10,123 INFO FSNamesystem.audit: allowed=true  ugi=martymcfly@EXAMPLE.COM (auth:KERBEROS) ip=/192.168.2.10 cmd=getfileinfo src=/user/martymcfly dst=null perm=null

2015-07-01 12:15:10,125 INFO FSNamesystem.audit: allowed=true  ugi=martymcfly@EXAMPLE.COM (auth:KERBEROS) ip=/192.168.2.10 cmd=listStatus src=/user/martymcfly dst=null perm=null

2015-07-01 12:15:46,167 INFO FSNamesystem.audit: allowed=false ugi=martymcfly@EXAMPLE.COM (auth:KERBEROS) ip=/192.168.2.10 cmd=rename src=/user/martymcfly/delorean dst=/user/doc perm=null
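These audit entries are plain key=value pairs, so reconstructing a user's activity from them is straightforward. Below is a minimal parsing sketch in Python, assuming the field layout shown above; it is not part of any Hadoop distribution, and the log path is a placeholder.

import re

# Marker identifying HDFS audit entries; everything after it is a series
# of key=value fields (allowed, ugi, ip, cmd, src, dst, perm).
AUDIT_MARKER = "FSNamesystem.audit:"

def parse_audit_line(line):
    """Return a dict of audit fields, or None for non-audit lines."""
    if AUDIT_MARKER not in line:
        return None
    record = {"timestamp": line[:23]}  # e.g. 2015-07-01 12:15:46,167
    fields = line.split(AUDIT_MARKER, 1)[1]
    record.update(re.findall(r"(\w+)=(\S+)", fields))
    return record

with open("hdfs-audit.log") as audit:  # placeholder path
    for line in audit:
        event = parse_audit_line(line)
        # Surface denied operations, such as Marty's rename into /user/doc above.
        if event and event.get("allowed") == "false":
            print(event["timestamp"], event.get("ugi"), event.get("cmd"),
                  event.get("src"), event.get("dst"))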

MapReduce Audit Logs

Like HDFS, MapReduce also has two logs: mapred-audit.log for user activity and SecurityAuth-mapred.audit for service activity. The log4j configuration can be found in the log4j.properties file with:

log4j.logger.org.apache.hadoop.mapred.AuditLogger

log4j.category.SecurityLogger

YARN Audit Logs

For YARN, the user audit log events are not in a separate file but are instead mixed into the daemon log files. To enable service logging in YARN, as with HDFS and MapReduce, enable the log4j property:

log4j.category.SecurityLogger

Hive Audit Logs

Hive is a bit different and uses the Hive Metastore for service logging. To identify the Hive audit events amongst the other logged events, you can filter for lines containing org.apache.hadoop.hive.metastore.HiveMetaStore.audit. Hive log events will also contain information to identify which database or table is being operated on.
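As a quick sketch, that filter can be as simple as the Python below; the Metastore log file name is a placeholder, since the actual location depends on your distribution.

# Placeholder path; use the Hive Metastore log location for your distribution.
METASTORE_LOG = "hivemetastore.log"
AUDIT_CLASS = "org.apache.hadoop.hive.metastore.HiveMetaStore.audit"

with open(METASTORE_LOG) as log:
    for line in log:
        if AUDIT_CLASS in line:
            print(line.rstrip())  # audit events name the database/table operated on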

HBase Audit Logs

HBase has a separate file for audit logs, though playing back the activity for a user is a bit trickier as the events can be spread amongst the HBase nodes. The events will contain information about the column family, column, table and action performed. The log4j configuration can be found in the log4j.properties file with:

log4j.logger.SecurityLogger
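One way to play back a single user's trail is to collect the audit files from every HBase node and merge them by timestamp. The Python sketch below assumes the logs have already been copied to one machine; the file pattern, file name and user are placeholders.

import glob

LOG_PATTERN = "collected-hbase-audit/*/hbase-audit.log"  # placeholder layout and file name
USER = "martymcfly"                                       # placeholder user

events = []
for path in glob.glob(LOG_PATTERN):
    with open(path) as audit:
        events.extend(line.rstrip() for line in audit if USER in line)

# The leading timestamp makes a lexical sort good enough to rebuild one
# time-ordered view of the user's actions across all HBase nodes.
for line in sorted(events):
    print(line)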

Sentry Audit Logs

While logging user operations is important, logging admin operations and changes to user permissions is extremely important. Apache Sentry also uses log4j and has a dedicated file that is configured with:

log4j.logger.sentry.hive.authorization.ddl.logger 

Cloudera Impala Audit Logs

Each Cloudera Impala daemon will have its own audit log file. The format is a bit different and uses JSON for easier parsing of events. Like Hive, Impala will log information about the database, table and even SQL statement performed.
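A small sketch of reading those JSON events with Python follows; the file name and the field names (user, db_name, sql_statement) are illustrative assumptions rather than the exact Impala audit schema, so check the audit files your own daemons produce.

import json

AUDIT_FILE = "impala_audit_event_log"  # placeholder; each impalad writes its own files

with open(AUDIT_FILE) as audit:
    for line in audit:
        if not line.strip():
            continue
        event = json.loads(line)  # assumes one JSON object per line
        # Field names below are illustrative, not the authoritative schema.
        print(event.get("user"), event.get("db_name"), event.get("sql_statement"))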

Monitoring and Log Analysis for Event Analysis and Alerts

Once you have set up all the Hadoop logging, an equally important step is to monitor the cluster proactively for security events, breaches and suspicious activity. And what better place to do this than Hadoop itself!

Among the many other great use cases for big data, one is to use Hadoop itself for log ingestion and security analytics. In the past, important information contained in log files was discarded during log rotations, but now, with Hadoop, smart organizations are storing all log data for active archiving. Organizations can then take advantage of the large ecosystem of tools built on Hadoop for advanced persistent threat (APT) analytics, security forensics, cyber intelligence and user behavior machine learning.
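As a simple illustration of that active archiving pattern, the sketch below pushes rotated audit logs into HDFS with the standard hdfs dfs -put command; the local and HDFS paths are placeholders.

import glob
import subprocess

LOCAL_PATTERN = "/var/log/hadoop-hdfs/hdfs-audit.log.*"  # placeholder: rotated audit logs
HDFS_ARCHIVE_DIR = "/data/logs/hdfs-audit/"              # placeholder: HDFS archive directory

for rotated in glob.glob(LOCAL_PATTERN):
    # hdfs dfs -put copies the local file into HDFS, where it can later be
    # queried by Hive, Impala or other analytics tools on the cluster.
    subprocess.run(["hdfs", "dfs", "-put", rotated, HDFS_ARCHIVE_DIR], check=True)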

Stay tuned for upcoming articles on new methods and approaches to capture network, packet and DNS data on Apache Hadoop to detect potential threats using machine learning.

It is always a good idea to make sure you have enabled logging correctly even on existing clusters or after performing upgrades. And if you are not currently storing logs in Hadoop you should definitely start now. 
