Big data installations often have almost no built-in security to protect the data being collected and stored, and there are very few “relevant and usable security tools,” according to a recent whitepaper.
All big data installations are built on the Web services model, with few or no capabilities to foil common Web threats such as SQL injection and cross-site scripting attacks, research group Securosis wrote in “Securing Big Data: Security Recommendations for Hadoop and NoSQL.” Most big data APIs are vulnerable to “everything on the OWASP Top Ten,” the paper claimed.
The report, sponsored by Vormetric, identified the security problems big data users face, described how third-party security products work with big data clusters, and recommended ways to secure these systems.
One of the critical problems facing big data projects is that most security tools cannot scale with the amount of data being inserted, Securosis said.
Database security in general means both the actual data and the application managing the data have to be protected. Many big data platforms lack common security controls that are part of other data management platforms, such as configuration management, access control, auditing, and security gateways, Securosis wrote. NoSQL variants generally offer a single security control rather than a comprehensive set of tools. Security is usually not turned on by default.
“Even Hadoop web consoles allow access without any form of authentication,” Securosis wrote.
Data at rest needs to be encrypted so that unauthorized users can’t access the data, but only “one or two obscure NoSQL variants” provide encryption for data at rest, and most do not, according to the paper. To make matters worse, most available encryption products are not scalable or transparent enough to work with big data, Securosis found.
Patch management can also be a challenge since not all the servers in the cluster may be running the same software. Administrators are also reluctant to reboot or change the software in order to minimize user complaints. Big data systems also need their own tools to handle auditing, logging, monitoring, filtering and blocking to look for errors and malicious activity.
“Security in typical big data implementations is largely an afterthought,” said Derek Tumulak, vice-president of product management at Vormetric.
Big data projects are common, almost the norm, among the enterprises Securosis spoke with for the report. The enterprises may have embraced the technology and pushed vast amounts of data into the clusters, but many have only user passwords in place to protect the data.
Hadoop can use Kerberos to authenticate users and add-on services to the cluster, but if a Kerberos ticket is stolen or duplicated, a rogue client can be added onto the network, according to the paper. “We know it is a pain to set up, but strong authentication of nodes is a principal security tool for keeping rogue servers and requests out of your cluster,” Securosis wrote.
Regardless of the kind of cluster it is, several critical security issues can be addressed by a “handful of security measures,” such as using file-layer encryption to protect data at rest, Tumulak said.
Securosis recommended that administrators use Kerberos to authenticate nodes and client applications, use file-layer or operating-system-layer encryption to protect data at rest, validate nodes as they are deployed, and use SSL or TLS to authenticate all connections.
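As a rough illustration of the last recommendation, the sketch below builds a server-side TLS configuration that rejects any peer that does not present a certificate, using Python's standard ssl module. It is not taken from the Securosis paper: the function name, the file-path parameters and the TLS 1.2 minimum are assumptions made for the example, and a real cluster would configure this in the platform's own settings.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """Return a server-side TLS context that requires client certificates.

    The function name and all three path parameters are hypothetical;
    they stand in for wherever a deployment keeps its certificates.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Assumption: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require every connecting peer to present a certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # This node's own certificate and private key.
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        # The certificate authority trusted to sign peer certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


context = make_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

A listening socket wrapped with this context would fail the handshake for any node or client that cannot present a certificate signed by the trusted authority.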
All transactions, anomalies and administrative activity should be logged, and administrators need to consider a central key management server to protect encryption keys and other keys.
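The logging recommendation could be sketched along the following lines, with each request or administrative action written as one JSON record so that a central collector can parse it. This is only an illustration under stated assumptions: the logger name, the field names and the sample values are invented for the example and do not come from the paper or from any big data platform.

```python
import io
import json
import logging
from datetime import datetime, timezone


def make_audit_logger(stream):
    """Create a logger that writes one JSON record per line to `stream`.

    The logger name "cluster.audit" is a hypothetical choice.
    """
    logger = logging.getLogger("cluster.audit")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    logger.propagate = False      # keep audit records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def audit(logger, user, action, resource, allowed):
    """Log who did what to which resource, and whether it was permitted."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


# In-memory demonstration; a deployment would write to a file or a collector.
buffer = io.StringIO()
audit_log = make_audit_logger(buffer)
audit(audit_log, user="etl-service", action="read",
      resource="/warehouse/events", allowed=True)
audit(audit_log, user="unknown-node", action="join-cluster",
      resource="namenode", allowed=False)
print(buffer.getvalue(), end="")
```

Recording denied requests alongside permitted ones is what makes anomalies, such as a rogue node trying to join, visible after the fact.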
Despite how common these projects have become, coming up with a definition of big data was a significant challenge, wrote Adrian Lane, CTO of Securosis. Big data clusters often have a few essential characteristics, but there are hundreds of possible permutations, making a concrete definition “elusive.” Big data is often thought of as a NoSQL database because big data can be unstructured and not have traditional relational constraints, but relational big data clusters do exist, Lane said.
Securosis decided that big data referred to self-organizing clusters, built on a distributed file model such as Hadoop, that could handle the insertion and analysis of massive amounts of data. Anything beyond that “gets a bit fuzzy” and the range of potential uses was “nearly limitless,” Lane said.
The full report is available here in PDF format.