Securing Big Data: More Data Needs More Protections

Big Data Means Exposing More Data to Internal Misuse or Accidental Exposure, and Exposing More Data to a Successful Attacker

Jon-Louis Heimerl

June 18, 2012

As these stories often go, a friend who recently started a new job asked me if I had any thoughts about the security of “Big Data.” My first thought was, to some extent, that big data is more a buzzword than anything else. Big data is just more data, so it faces the same types of issues as any data, right?

But this is an oversimplification of the issues. To some extent, managing big data is kind of like having kids. Two kids are not twice as much work as one kid – it is more like an exponential relationship. Two kids is work². As your big data store grows, do the potential control issues grow at least as fast?

Importance of Protecting Big Data

Obviously “sizing” is the big issue. But, beyond just “more,” you have to appreciate that big data not only means more data, it also means more complicated data, more sensitive data, and a correspondingly greater chance of exposure due to errors or vulnerabilities. It also means exposing more data to internal misuse or accidental exposure, and exposing more data to an attacker who succeeds in penetrating your perimeter. At a very basic level, if a potential attacker knows that you have large volumes of high quality data, it may very well elevate your attack profile, since you are more likely to be viewed as an attractive target. You are accounting for that when you do your organizational risk analysis, right?

But when we talk about big data we are not really just talking about volume or quantity. Most people who have the big data discussion will also talk about Velocity, which is a function of the speed at which data enters your environment. Personally, I find the concept of Variety more interesting. What kind of data is it? It is much easier to manage large volumes of all PHI data than it is to manage PHI data, PCI data, medical telemetry data, and demographic data all mixed together. How contextually alike the data is can have a huge impact on how that data is managed. How similar or dissimilar is the data? It is much easier to manage large volumes of database files than it is to manage a complex combination of database files, flat text files, system logs, application specific data, customized format data, graphics, etc. The more diverse the specific pieces of information, the more complex the infrastructure required to support it.

IT implementations supporting big data have a whole host of throughput, availability, and data access controls that are more related to the operations supporting the data than to security of that data. Scalable infrastructures, parallel processing, data replication, and massive in-memory processing are just a few of the discussions to be had about big data operations. But, for us security geeks, what does a classical view of big data tell us about security?

1. Understand the data. Obviously, this is the best place for me to do my standard “Have you done a Business Impact Assessment (BIA)” rant. The question is pretty simple – you have lots of data, but do you know “what” you have? Do you have PCI data, or PHI data, or private corporate data, or private customer/consumer data? Before you worry about anything else, you have to understand what is actually in the big data. For one thing, understanding your data helps you add context to the data early in the process of managing it, and, probably just as importantly, lets you identify aberrant data as you see it. Cleansing big data of irrelevant, erroneous, or toxic data is not a task that you should take lightly.

2. Understand the size constraints. Yes, availability really is a security issue, so make sure you size your infrastructure appropriately. Is your network fast enough to support the throughput demands of the data you are managing, both Velocity and consumption? Do you have enough CPU capacity that you can support the movement and management of the data through any required applications, databases, and storage devices? Do you have enough disk space that you can easily store the data? Do you have a robust enough drive management process that you are not single threading a drive and trying to write too much active data to the same storage device(s)? These are all the standard IT issues on the functions required to just manage high volumes of data. Appreciate that higher volumes of data are, in many ways, just harder to protect, and that your solution has to scale with the data and demands on the data. Can you encrypt petabytes or exabytes of data in a real-time enough manner to make the data consumable while meeting operational constraints, including meeting timing requirements?

3. Understand the timing constraints. Again, this relies heavily on your having done your job in step #1. But timing is très important. Does your data have a lifespan? In more straightforward English, clinical medical information obviously has a more time-sensitive lifespan than typical manufacturing metrics. Said even more plainly, some data is not as valuable if it cannot be managed and analyzed in a timely manner. Do you think the Phalanx anti-missile system would have any value if it took five minutes to evaluate a threat and respond? (The correct answer is, of course, “no.”) This obviously drives IT capacity and throughput requirements, and supports availability security objectives. Sometimes it just does not matter, but in many instances old data may not be relevant, so the lifespan of the data is often more important than we appreciate.

4. Understand the appropriate level of data context. This is a direct expansion of the above three issues, and really what helps us make sense of the big data. When the data has context we can manage it as information as opposed to bits and bytes. Is it PHI data, or PCI, or personal information or something else? Contextual data can be mined for details and correlation information, and can be actively managed with meaning, instead of just as “data.” Treating the data in an intelligent manner also lets us treat data with similar context in a similar manner – we build contextual relationships in the data.
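One way to make the four questions above concrete is to keep a small inventory entry per data store and ask the sizing and timing questions of it directly. The following is only a minimal sketch: the `Dataset` fields, the helper names, and every number (1 PB at rest, 2 GB/s ingest, 10 nodes at 1 GB/s each, a 24-hour lifespan) are hypothetical values chosen for illustration, and the encryption estimate assumes perfect parallel scaling, which real clusters rarely achieve.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Dataset:
    name: str
    category: str              # steps 1 and 4: context label such as "PHI" or "PCI"
    size_bytes: int            # step 2: volume at rest
    ingest_bytes_per_s: float  # step 2: Velocity
    lifespan: timedelta        # step 3: how long the data stays useful


def hours_to_encrypt(ds: Dataset, node_bytes_per_s: float, nodes: int) -> float:
    """Hours to encrypt the data at rest once, assuming perfect parallel scaling."""
    return ds.size_bytes / (node_bytes_per_s * nodes) / 3600


def keeps_up(ds: Dataset, node_bytes_per_s: float, nodes: int) -> bool:
    """True if aggregate encryption throughput is at least the ingest rate."""
    return node_bytes_per_s * nodes >= ds.ingest_bytes_per_s


def is_expired(ds: Dataset, created: datetime, now: datetime) -> bool:
    """True once a record is older than the dataset's useful lifespan."""
    return now - created > ds.lifespan


# Hypothetical data store; all figures are illustrative assumptions.
clinical = Dataset(
    name="clinical-records",
    category="PHI",
    size_bytes=10**15,             # 1 PB
    ingest_bytes_per_s=2 * 10**9,  # 2 GB/s
    lifespan=timedelta(hours=24),
)

# Hypothetical cluster: 10 nodes, each encrypting 1 GB/s.
print(round(hours_to_encrypt(clinical, 10**9, 10), 1))  # 27.8
print(keeps_up(clinical, 10**9, 10))                    # True
```

Even with these generous assumptions, a single pass over one petabyte takes more than a day, which is the kind of result that should feed back into the sizing and timing discussion.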

Security for Big Data

And there’s the rub. While big data may be “data”, we really don’t want it to be “data” as much as we want it to be “information” (which is data with context). Big data is more valuable as a source of analytics than it is as just “the data.” That is why context and correlation are so very important when you talk about big data – we need to make the data intelligent by using the available context to help ensure that we can consume relevant information. So, you are not just talking about “medical data,” you are talking about unique patient identifiers, pre-existing conditions, allergies, current prescriptions, contra-indications, and a whole host of demographic information about the patient as well as the provider. You are not just talking about “manufacturing data,” you are talking about specific inventory items, in-stock, re-order/manufacture points, required supplies, vendors, price of goods, selling price, buyer (and all their information, like industry, geographic location, volumes, discounts, and specific/customized delivery contracts). You are not just talking about security event data, you are talking about the detail that your IDS and internal systems are reporting attacks against a system named Mordor, which is a Windows Server 2008 R2 SP1 machine running Oracle 11g Enterprise, sitting in the Princeton, N.J., data center in row 3, rack A12, and which holds all of your clinical patient records and so does indeed fall under HIPAA and HITECH.
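The Mordor example amounts to joining a raw security event against what you already know about the target. A minimal sketch of that idea follows, assuming a hypothetical in-memory asset inventory; the `ASSETS` table, the `enrich` function, and the event fields are illustrative names, not the interface of any particular IDS or SIEM product.

```python
# Hypothetical asset inventory keyed by lower-cased host name.
# The values mirror the Mordor example in the text.
ASSETS = {
    "mordor": {
        "os": "Windows Server 2008 R2 SP1",
        "software": "Oracle 11g Enterprise",
        "location": "Princeton, N.J., row 3, rack A12",
        "data": "clinical patient records",
        "regulations": ["HIPAA", "HITECH"],
    },
}


def enrich(event: dict) -> dict:
    """Return a copy of the event with asset context attached, if known."""
    asset = ASSETS.get(event.get("target", "").lower())
    enriched = dict(event)
    enriched["asset"] = asset  # None when the host is not in the inventory
    enriched["regulated"] = bool(asset and asset["regulations"])
    return enriched


# Hypothetical raw alert as an IDS might report it.
alert = enrich({"source": "IDS", "target": "Mordor", "signature": "sql-injection attempt"})
print(alert["regulated"], alert["asset"]["regulations"])  # True ['HIPAA', 'HITECH']
```

The raw alert only says that something touched a host; the enriched one says that regulated clinical records are at stake, which is what turns the event from data into information.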

In some ways, this all just exacerbates the problem. We are taking large quantities of potentially valuable, dynamic and complex data, and attaching contextual analytics to that data – assigning even greater value to the big data because of the context, and the information we can glean from the data. The fact is the analytics themselves, and even the process used to create those analytics, are also highly valuable. This highlights the need to protect the analytics modeling and results, as well as access to them.

After all, without the relevant intelligence we can get from good analytics of the big data, it really is just a bunch of data.

Related Reading: Examining The Security Implications of Big Data
