
Hadoop Audit and Logging “Back in Time”


In my previous contributions I covered authentication and authorization in Hadoop. This time I will cover audit, the third of the three AAAs of Information Security. Audit and monitoring are critical to data security. Through audit, we can verify that the security controls in place are working correctly and identify attempts to circumvent them.

Logs are a common method to record the actions of an application and allow administrators and auditors to go “Back in Time” to review a user’s actions. Much like your credit card or bank statement, these logs provide evidence of transactions performed. In the absence of a time machine, these logs may be the only means of providing a historical view of what took place in a Hadoop cluster at a given moment.

As you all know by now, Hadoop has many different components and it just so happens that they have different types of audit logs. I will cover the auditing capabilities of several components in this article.

HDFS Audit Logs

HDFS is at the core of Hadoop, providing the distributed file system that makes Hadoop so successful. HDFS has two different audit logs: hdfs-audit.log for user activity and SecurityAuth-hdfs.audit for service activity. Both of these logs are implemented with Apache Log4j, a common and well-known mechanism for logging in Java. The log4j properties can be configured in the file with:


Below is an example log for user Marty McFly after a listing of files/directories and an attempted move into directory /user/doc, which was denied (recorded as cmd=rename with allowed=false).

2015-07-01 12:15:10,123 INFO FSNamesystem.audit: allowed=true  [email protected] (auth:KERBEROS) ip=/ cmd=getfileinfo src=/user/martymcfly dst=null perm=null

2015-07-01 12:15:10,125 INFO FSNamesystem.audit: allowed=true  [email protected] (auth:KERBEROS) ip=/ cmd=listStatus src=/user/martymcfly dst=null perm=null

2015-07-01 12:15:46,167 INFO FSNamesystem.audit: allowed=false [email protected] (auth:KERBEROS) ip=/ cmd=rename src=/user/martymcfly/delorean dst=/user/doc perm=null
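Audit lines in this key=value layout are easy to turn into structured records for review. The following is a minimal Python sketch, not an official Hadoop tool: the function names are made up for illustration, and the principal, realm and IP address in the sample line are hypothetical placeholders.

```python
import re

# Layout assumed from the sample lines above: timestamp, level, logger name,
# then whitespace-separated key=value fields.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>\w+)\s+FSNamesystem\.audit:\s+(?P<fields>.*)$"
)
# A value runs until the next "key=" token, so "ugi=... (auth:KERBEROS)"
# stays in one piece. Paths containing " word=" would confuse this sketch.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_hdfs_audit_line(line):
    """Return a dict of audit fields, or None if the line is not an audit event."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {"timestamp": match.group("timestamp"), "level": match.group("level")}
    for key, value in FIELD_RE.findall(match.group("fields")):
        record[key] = value.strip()
    return record


def denied_events(lines):
    """Keep only the events that HDFS refused (allowed=false)."""
    records = (parse_hdfs_audit_line(line) for line in lines)
    return [r for r in records if r is not None and r.get("allowed") == "false"]


# Hypothetical sample: the principal, realm and IP address are placeholders.
SAMPLE = (
    "2015-07-01 12:15:46,167 INFO FSNamesystem.audit: allowed=false "
    "ugi=martymcfly@EXAMPLE.COM (auth:KERBEROS) ip=/10.0.0.1 cmd=rename "
    "src=/user/martymcfly/delorean dst=/user/doc perm=null"
)

if __name__ == "__main__":
    print(parse_hdfs_audit_line(SAMPLE))
```

Filtering on allowed=false, as denied_events does, is a quick way to surface the refused operations an auditor usually wants to see first.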

MapReduce Audit Logs

Like HDFS, MapReduce also has two logs: mapred-audit.log for user activity and SecurityAuth-mapred.audit for service activity. The log4j configuration can be found in the file with:


YARN Audit Logs

For YARN, the user audit log events are not in a separate file but are mixed into the daemon log files. To enable service logging in YARN, as with HDFS and MapReduce, you enable the log4j property with:


Hive Audit Logs

Hive is a bit different and uses the Hive Metastore for service logging. To identify the Hive audit events among the other logged events, you can filter for lines containing org.apache.hadoop.hive.metastore.HiveMetaStore.audit. Hive log events also contain information identifying which database or table is being operated on.
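The filtering step can be sketched in a few lines of Python. Only the logger name comes from the text above; the function name and the sample lines are hypothetical and are not meant to reproduce the exact metastore log format.

```python
# Logger name quoted in the article; everything else here is illustrative.
HIVE_AUDIT_LOGGER = "org.apache.hadoop.hive.metastore.HiveMetaStore.audit"


def hive_audit_lines(lines):
    """Yield only the lines written by the metastore audit logger."""
    for line in lines:
        if HIVE_AUDIT_LOGGER in line:
            yield line.rstrip("\n")


# Hypothetical metastore log excerpt mixing audit and non-audit events.
SAMPLE_LOG = [
    "2015-07-01 12:20:00,001 INFO some.other.Logger: starting up\n",
    "2015-07-01 12:20:05,417 INFO " + HIVE_AUDIT_LOGGER + ": cmd=get_table db=default tbl=flux\n",
]

if __name__ == "__main__":
    for event in hive_audit_lines(SAMPLE_LOG):
        print(event)
```

Because the function is a generator, it can be fed an open file object directly and will not load a large metastore log into memory.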

HBase Audit Logs

HBase has a separate file for audit logs, though playing back the activity for a user is a bit trickier as the events can be spread amongst the HBase nodes. The events will contain information about the column family, column, table and action performed. The log4j configuration can be found in the file with:
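On the playback problem described above: one way to reconstruct a single user's activity is to merge the per-node audit files by timestamp before filtering. This is only a sketch under stated assumptions: each line is assumed to start with a log4j-style timestamp like the HDFS sample earlier, each node's file is assumed to be in chronological order already, and the function names and sample lines are hypothetical.

```python
import heapq

# Assumed timestamp prefix, e.g. "2015-07-01 12:15:10,123". This fixed-width
# format sorts correctly as plain text, so no date parsing is needed.
TIMESTAMP_WIDTH = len("2015-07-01 12:15:10,123")


def timestamp_key(line):
    """Sort key: the leading timestamp of a log line."""
    return line[:TIMESTAMP_WIDTH]


def merge_node_logs(*node_logs):
    """Merge already-chronological per-node logs into one ordered list."""
    return list(heapq.merge(*node_logs, key=timestamp_key))


def user_timeline(user, *node_logs):
    """Return the merged events mentioning `user` (a crude substring match)."""
    return [line for line in merge_node_logs(*node_logs) if user in line]


# Hypothetical events from two nodes; the field layout is illustrative only.
NODE_A = [
    "2015-07-01 12:00:01,000 node-a user=martymcfly action=get",
    "2015-07-01 12:00:05,000 node-a user=doc action=put",
]
NODE_B = [
    "2015-07-01 12:00:03,000 node-b user=martymcfly action=scan",
]

if __name__ == "__main__":
    for event in user_timeline("martymcfly", NODE_A, NODE_B):
        print(event)
```

heapq.merge reads its inputs lazily, so the same approach works on open file handles when the per-node files are too large to hold in memory.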


Sentry Audit Logs

While logging user operations is important, logging admin operations and changes to user permissions is even more so. Apache Sentry also uses log4j and has a dedicated file that is configured with:


Cloudera Impala Audit Logs

Each Cloudera Impala daemon has its own audit log file. The format is a bit different and uses JSON for easier parsing of events. Like Hive, Impala logs information about the database, the table and even the SQL statement executed.
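Since the events are JSON, a standard JSON parser is enough to read them. The sketch below assumes one flat JSON object per line with "user" and "sql_statement" keys. Those key names and the sample records are assumptions for illustration, so check them against your own audit files, which may nest each event differently.

```python
import json


def parse_json_audit_lines(lines):
    """Parse one JSON object per line, skipping blank or corrupt lines."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line truncated while the file was being written
        if isinstance(event, dict):
            events.append(event)
    return events


def statements_by_user(events, user):
    """List the SQL statements recorded for one user (assumed key names)."""
    return [e.get("sql_statement") for e in events if e.get("user") == user]


# Hypothetical records; the key names are assumptions, not a documented schema.
SAMPLE_LINES = [
    '{"user": "martymcfly", "sql_statement": "SELECT * FROM timeline"}',
    '{"user": "doc", "sql_statement": "SHOW TABLES"}',
    '{"user": "martymcfly", "sql_sta',
]

if __name__ == "__main__":
    events = parse_json_audit_lines(SAMPLE_LINES)
    print(statements_by_user(events, "martymcfly"))
```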

Monitoring and Log Analysis for the added benefit of Event Analysis and Alerts

Once you have set up all the Hadoop logging, an equally important step is to monitor the cluster proactively for security events, breaches and suspicious activity. And what better place to do this than Hadoop itself!

Among the many great use cases for big data is using Hadoop for log ingestion and security analytics. In the past, important information contained in log files was discarded during log rotations, but now with Hadoop, smart organizations are storing all log data for active archiving. Organizations then take advantage of the large ecosystem of tools built on Hadoop for advanced persistent threat (APT) analytics, security forensics, cyber intelligence and user behavior machine learning.
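As a toy illustration of the kind of rule such monitoring might start from, the sketch below counts denied operations per user and flags anyone at or above a threshold. It assumes audit events have already been parsed into dictionaries with "ugi" and "allowed" keys, mirroring the HDFS fields shown earlier. The function name, the threshold and the sample records are hypothetical, and a real deployment would run this kind of logic inside the cluster's analytics tooling rather than in a standalone script.

```python
from collections import Counter


def users_over_denial_threshold(records, threshold=3):
    """Return {user: denial_count} for users with at least `threshold` denials."""
    denials = Counter(
        record["ugi"]
        for record in records
        if record.get("allowed") == "false" and "ugi" in record
    )
    return {user: count for user, count in denials.items() if count >= threshold}


# Hypothetical parsed audit records.
RECORDS = [
    {"ugi": "biff", "allowed": "false", "cmd": "rename"},
    {"ugi": "biff", "allowed": "false", "cmd": "delete"},
    {"ugi": "biff", "allowed": "false", "cmd": "open"},
    {"ugi": "martymcfly", "allowed": "false", "cmd": "rename"},
    {"ugi": "martymcfly", "allowed": "true", "cmd": "listStatus"},
]

if __name__ == "__main__":
    print(users_over_denial_threshold(RECORDS))
```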

Stay tuned for upcoming articles on new methods and approaches to capture network, packet and DNS data on Apache Hadoop to detect potential threats using machine learning.

It is always a good idea to verify that logging is enabled correctly, even on existing clusters or after performing upgrades. And if you are not currently storing logs in Hadoop, you should definitely start now.
