The IT world is quickly embracing Big Data. Huge data stores are the next big step in analyzing the massive amounts of data being collected in the name of analytics. For example, startups are using these systems to analyze trillions of DNA strands to gain an understanding of our genealogy. Well-established companies are adopting the technology to map and time transportation systems across the world to make our traveling cheaper and easier. But while Big Data (and its underlying technology, NoSQL) is becoming a buzzword in information systems, there has not been much investigation into the security implications.
Big Overview of Big Data
NoSQL is a common term for data stores that house different types of structured and unstructured data in high quantities. Because of this diversity, these data stores are not accessed through the standard SQL language. Until recently, we sorted data stores into two groups: relational databases (RDBMS) and file servers. The new kid in town, NoSQL, opened our minds to a database that, unlike the conventional relational concepts, does not follow a fixed structural form. The main advantages of this approach are the scalability and availability of the data, together with the flexibility of the data storage. Because each data store is mirrored across different locations to guarantee constant up-time and no loss of data, these systems are commonly used to analyze trends. They are not suitable for financial transactions requiring a real-time update, but could be employed at a financial institution to find the most efficient or the busiest branch.
NoSQL = NoSecurity?
Many may claim that the developers of the various NoSQL systems have purposefully pushed security out of their products. For instance, Cassandra has only basic built-in authentication procedures. This lack of security is presented as a feature, built on the idea that database administrators should not need to trouble themselves with security concerns. Security, then, becomes an offloaded process to be dealt with by a dedicated team. Any way we look at it, NoSQL presents its share of security challenges:
• Model maturity. Very little thought has been given to the security model of Big Data. Current standard SQL technologies include strict access controls and privacy management tools, and it is not obvious that the same controls are required in the NoSQL model. In fact, NoSQL should have its own new model. For example, column- and row-level security are much more important in NoSQL data stores than in traditional SQL data stores. Further, because NoSQL allows attributes to be added to data records at any time, forward-looking security becomes very important: organizations need to define the security of these future attributes in advance.
• Software maturity. Looking back, database and file servers have seen their share of security woes, and these are systems that have gained mileage over the years. The same cannot be said for NoSQL. Even if some lessons have been learned, and some complexities have been removed from NoSQL data stores (while others have been added), we can certainly expect at least five years of vulnerability turmoil in those platforms, simply because it is new code.
• Staff maturity. Even the most experienced DBAs are new to NoSQL. This means that these individuals will first focus on getting it to work – which is already hard enough – and maybe only later have time to consider security. If and when they get to that stage, they are bound to make integration mistakes.
• Client software. Since not enough security has been built into the server software, security needs to be built into the applications that are accessing the software. This in turn leads to a plethora of security issues:
o Adding authentication and authorization processes to the application. This requires more security considerations, which make the application much more complex. For example, the application would need to define users and roles. Based on this type of data, the application can decide whether to grant the user access to the system.
o Input validation. Once again, we are seeing issues that have haunted RDBMS applications return to haunt NoSQL databases. For example, at last year’s BlackHat conference, researchers showed how a hacker could use a “NoSQL Injection” to access restricted information. The schedule for BlackHat 2012 is still pending, so it remains to be seen what the next conference holds in store for NoSQL. In the meantime, you can catch some more vulnerabilities in “The Web Application Hacker’s Handbook: Finding and Exploiting Security Flaws”, which contains a new, separate chapter focused solely on the security of programming frameworks used for NoSQL.
o Application awareness. In the case where each application needs to manage the security, it will have to be aware of every other application. This is required in order to disable access to any non-application data.
o When new data types are added to the data store, the data store administrator has to work out which applications must not be able to access that data.
o Vulnerability-prone code. There is a limited number of NoSQL products, but an order of magnitude more applications and application server products. The more applications, the more code, and more code generally means more bugs.
• Data redundancy and dispersion. RDBMS security 101 talks about data normalization – storing a piece of data in a single location. But Big Data systems have totally shifted this paradigm. Inherent to these systems is the duplication of data to many tables in order to optimize query processing. Data is dispersed across different data repositories in different servers, in different parts of the world. It will be very difficult for organizations to actually locate and secure all these pieces of confidential information.
• Privacy. Privacy problems will be driven not by security flaws but by legitimate use cases for Big Data, where data is correlated across different activities, different applications and different systems. Take, for example, Google’s change of privacy terms from just a couple of months ago, which allows Google to consolidate its information across all services. Such uses have the potential to severely impact our ability, as individuals, to evade tracking by enterprises, even if we use multiple online identities. Ironically, these enterprises are now in a bind. On the one hand, they are trying to keep this data within their boundaries, mainly due to proprietary and regulatory concerns. On the other hand, scientists have recently started to raise concerns about this practice, requesting that enterprises disclose these datasets so that research results can be validated.
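To make the input-validation item above more concrete, here is a minimal Python sketch of the kind of check an application might perform before building a query. It assumes a document store that accepts MongoDB-style filter documents, where a nested object such as {"$ne": null} is interpreted as a query operator rather than as a value. The field names and the helper names (require_plain_string, build_login_filter) are hypothetical and chosen only for illustration. It is not a reconstruction of the attack shown at BlackHat.

```python
from typing import Any, Dict, Mapping


class UnsafeInputError(ValueError):
    """Raised when user input could change the shape of a query."""


def require_plain_string(value: Any, field: str, max_len: int = 256) -> str:
    # Anything that is not a plain string is rejected. A dict or list that
    # arrives in a JSON request body is how an operator object such as
    # {"$ne": None} would otherwise be smuggled into the filter.
    if not isinstance(value, str):
        raise UnsafeInputError(
            f"{field}: expected a string, got {type(value).__name__}"
        )
    if len(value) > max_len:
        raise UnsafeInputError(f"{field}: longer than {max_len} characters")
    return value


def build_login_filter(payload: Mapping[str, Any]) -> Dict[str, str]:
    """Build an equality-only filter document from untrusted input."""
    return {
        "username": require_plain_string(payload.get("username"), "username"),
        "password_hash": require_plain_string(
            payload.get("password_hash"), "password_hash"
        ),
    }


if __name__ == "__main__":
    # A well-formed request produces a simple equality filter.
    print(build_login_filter({"username": "alice", "password_hash": "abc123"}))

    # An injection attempt replaces a value with an operator object.
    try:
        build_login_filter({"username": "admin", "password_hash": {"$ne": None}})
    except UnsafeInputError as exc:
        print("rejected:", exc)
```

The design choice is to validate the type of every value rather than to search for suspicious characters: the filter can then only ever express "field equals string", whatever the client sends.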
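The "forward-looking security" point in the model-maturity item, together with the authorization burden that falls on client software, can be sketched as a default-deny attribute filter in the application layer. The Python below is only an illustration under stated assumptions: the roles, the attribute names and the READABLE_FIELDS policy table are all hypothetical, and a real deployment would load the policy from a managed source rather than hard-code it.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical policy: each role lists the attributes it may read.
# Anything not listed, including attributes added to records later,
# is hidden by default.
READABLE_FIELDS: Dict[str, FrozenSet[str]] = {
    "teller": frozenset({"branch", "customer_name"}),
    "analyst": frozenset({"branch", "transaction_count"}),
}


def visible_view(record: Mapping[str, Any], role: str) -> Dict[str, Any]:
    """Return only the attributes of `record` that `role` is allowed to read."""
    # Unknown roles get an empty allow-list, so they see nothing.
    allowed = READABLE_FIELDS.get(role, frozenset())
    return {name: value for name, value in record.items() if name in allowed}


if __name__ == "__main__":
    record = {
        "branch": "Downtown",
        "customer_name": "A. Smith",
        "transaction_count": 412,
        # An attribute added after the policy was written.
        "ssn": "000-00-0000",
    }
    print(visible_view(record, "analyst"))
    print(visible_view(record, "intern"))
```

Because the policy is an allow-list, a newly added attribute stays invisible until someone decides who may read it, which is one way of defining the security of future attributes in advance.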
Sizing Up for Big Data
NoSQL is still in its infancy, and unfortunately we cannot anticipate any general NoSQL security solutions within the next year or so. In the meantime, organizations that want to take the leap forward should first choose their development teams carefully. The teams should include industry veterans: very capable and experienced people who have already proved that they can deploy new projects with a security mindset. Relying heavily on code reviews to ensure that the software is solid in terms of security is another must. Finally, it is important to reduce direct exposure of the platform to end-users as much as possible, through intensive input validation and network segregation.
A Big Thanks
We have only just entered the era of Big Data. The cost of storage is decreasing, and technologies are advancing to let us easily access and analyze the data, which has become the number one resource for enterprises and consumers. This can take us anywhere, and I’m looking forward to seeing where it does.
In the meantime, after nearly 40 (!) columns, it’s time for me to take a break while I seek new opportunities. I’d like to deeply thank Mike Lennon, SecurityWeek’s editor, who listened to me ramble on about security during BlackHat 2010 and decided that I could probably put all that on paper. I have yet to prove him right. And of course, thanks to Brian Prince, who helped make the impossible happen by taking the garble and turning it into words.