Adding Context Around Data Can Make a Huge Difference in How You Manage and Protect Information.
Why is context important? Because without context, information is not really information; it is just data. Data is just “stuff”, while information is what that stuff means.
Would you like some plain language context for my point?
42
African or European?
What are you, some kind of Pyjak?
My short list above just has “stuff”, unless you know how the data is used – unless you have context. Is “42” simply 6x7, or is it really the answer to life, the universe and everything? And if that is still insufficient, you will need the context of Douglas Adams (just Google “42”). If “African or European” are just words to you, you will need the context of Monty Python (Google “African or European”). “Pyjak” is probably the most obscure, and you won’t recognize it unless you know what video game I am currently playing (Google “Pyjak”).
So, you have three pieces of data: “42”, “African or European”, and “Pyjak”. Those pieces of data do not tell you much if you consider them as just data points. If you add context to each of them, you know more about what those data points mean. If you had that context, chances are that you at least smiled internally for each one on the list, but if you had no context, you probably read the list and thought “huh?”
Your management of security data follows the exact same rules. Data is more valuable if viewed in context. If you have an IDS reporting that IP 188.8.131.52 is experiencing a port scan, that is simply a piece of data. You still have to figure out what that data means to you. Is it important or is it noise? You need that context.
Cool Data Rules
That is where your Business Impact Assessment (BIA) comes in. Basically, you have to identify your cool data and the systems that support it. The rest of your information security program follows the first step. What does a BIA tell you?
1. What is my cool data? What kind of internal data do I generate and use? What kind of client data do I gather and use? Do I have PHI data or financial data for credit cards or bank accounts? What data do I have that is not as important?
2. What are my regulatory requirements? If you don’t know that you have PHI, how would you know you are responsible for HIPAA and HITECH compliance? If you don’t know that you have credit card data, how would you know you are responsible for PCI compliance? I can tell you for a fact that it is a pretty sickening feeling for the CISO of a retailer to discover for the first time, in the middle of a PCI audit, that his company does, in fact, store track two and three data.
3. Where is my cool data? Where do I process and store my cool data? What are my alternate site and backup options for resilient operations? Everything else being equal, would you rather have your cool data stored in Miami or Denver? (For the safety of the data, not for purposes of a beach/ski vacation).
4. What systems support my cool data? Not just the fact that you have a computer, but the fact that it is a Windows server, and which version of operating system and patch level it has. What application software and utilities support your cool data, with, again, version numbers?
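The four questions above can be captured as one inventory record per system. What follows is only a minimal sketch in Python: the record layout, the field names, the data-type labels, and the data-type-to-regulation mapping are all assumptions made for illustration, not a standard BIA format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from a data type to the regulations it triggers.
# The labels "PHI" and "cardholder" are illustrative assumptions.
REGULATIONS_BY_DATA_TYPE = {
    "PHI": ["HIPAA", "HITECH"],
    "cardholder": ["PCI"],
}


@dataclass
class BiaRecord:
    """One system in the inventory; every field name here is an assumption."""
    system_name: str
    data_types: List[str]       # question 1: what cool data lives here
    location: str               # question 3: where it is processed and stored
    operating_system: str       # question 4: platform, version, patch level
    applications: List[str] = field(default_factory=list)

    def regulations(self) -> List[str]:
        """Question 2: derive the regulations from the data the system holds."""
        found = set()
        for data_type in self.data_types:
            found.update(REGULATIONS_BY_DATA_TYPE.get(data_type, []))
        return sorted(found)
```

The point of deriving the regulations from the data types, rather than typing them in by hand, is that you cannot record one without confronting the other.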
The context gives you information on how to build your entire security program. You go from supporting “data” to supporting “PCI data”, along with all that means to be PCI compliant. Simply put, you know that the PCI data at BigBlueBank, and the environment that supports it, is going to receive more advanced security controls than the inventory control system at Joe’s Hat, Boot, and Shoe Company. Well, at least it probably deserves more advanced controls. While the two data sets are both important to their respective companies, the specific regulatory requirements placed on the PCI data are likely to result in enhanced controls at BigBlueBank. The number of size 10 boots in stock is just not as sensitive as credit card data. PCI has elevated requirements for a variety of technical controls, including data segregation and encryption, as well as incident response, policy, procedure, and training. If you add St. Mary’s Hospital to the mix, you can imagine that their trauma center has slightly stronger availability/resiliency requirements than they do at Joe’s Hat, Boot, and Shoe Company. The context within which the data works shapes the entire environment.
In the same way, the supporting information adds context to the raw security data. Your IDS alert that was previously just “data” gets a whole new meaning if you have the context to know whether 184.108.40.206 is the system that holds your card database or an internal website that has no real value. Without security context, you might know only that you have an alert, from which you can gather that you are being attacked. With good context, you can tell that the server being attacked is named “Mordor”, that it is a Windows Server 2008 R2 SP1 machine running Oracle 11g Enterprise, that it sits in the Princeton, N.J., data center in row 3, rack A12, and that it holds all of your clinical patient records and so falls under HIPAA and HITECH. That information, and context, should make a huge difference in how you manage and protect the information.
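To make the “Mordor” example concrete, here is a rough sketch of what attaching context to a raw alert might look like. The asset details are taken from the paragraph above; the dictionary layout, the `enrich` function, and the alert format are hypothetical and do not reflect how any particular IDS product works.

```python
# Hypothetical asset inventory keyed by IP address. The values come from the
# "Mordor" example in the text; the structure itself is an assumption.
ASSETS = {
    "184.108.40.206": {
        "name": "Mordor",
        "os": "Windows Server 2008 R2 SP1",
        "application": "Oracle 11g Enterprise",
        "location": "Princeton, N.J., data center, row 3, rack A12",
        "data": "clinical patient records",
        "regulations": ["HIPAA", "HITECH"],
    },
}


def enrich(alert, assets=ASSETS):
    """Return a copy of the alert with whatever asset context is on file."""
    context = assets.get(alert["ip"])
    enriched = dict(alert)
    enriched["asset"] = context  # None means "no context on file"
    enriched["regulated"] = bool(context and context["regulations"])
    return enriched


# A raw alert (assumed format) before and after context is attached.
raw = {"ip": "184.108.40.206", "event": "port scan"}
print(raw)
print(enrich(raw))
```

An alert for an address that is not in the inventory comes back with `asset` set to `None`, which is itself useful: it tells you where your BIA has gaps.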
What’s up, Joe?
Joe’s Hat, Boot, and Shoe Company is a real company. They don’t make hats, boots, and shoes, but they are a manufacturing company. Their focus is on operations management – produce and ship product in a timely and efficient manner, while maintaining quality, and minimizing excess inventory in finished product and unused supplies. Nowhere in their priorities do they have any focus on “Information Security”. Their most effective security measures are the things that they do to keep the floor running. To that end, their command and control environment monitors performance as well as basic security functions. Their IDS reports alerts by IP address for all systems in the environment. They have a completely different system that monitors performance metrics and status. In a heartbeat, they can tell you how many hats machine FLOOR301 has produced in the last hour.
However, if their IDS reports an alert, they have to go check. Many of the IT staff know which IP is which system, but for the most part, they are all just “systems”. As a result, every alert is initially treated with the same level of urgency, whether it actually comes from a critical system or not.
Three consecutive “login failure” messages come in as low/moderate importance, but the context matters:
• If the three messages come from external client server world01, which has thousands of users, then on face value this is not really a concern.
• If the three “login failure” messages come from the operations assembly server FLOOR301, it is a different story. FLOOR301 only talks to an internal application, and the user ID and password for the assembly server are stored in the application’s database. In Joe’s case, this means that the “login failure” is one of two things: either the application server data is corrupt, or someone is trying to break into FLOOR301, both of which are bad.
Without context, the alerts from FLOOR301 and world01 are equal. With context, they would be able to ignore the alerts for world01 and immediately escalate FLOOR301. As it is, Joe’s IT/IS staff is overworked. Consider that they have thousands of internal servers and workstations, and a couple dozen assembly line systems. More often than not, alerts from critical assembly systems get lost in the sheer flood of data. Staff spend time looking at massive amounts of trivial data, and all too often stumble across the important stuff in the process of flushing the tripe from their queue.
If Joe’s team can add even a little context to their data management process, they can gain better intelligence about what they are seeing. Adding context around data makes it information, and with that context, the information is easier to manage. Alerts are more meaningful. Critical issues can be identified more quickly, then managed and resolved more efficiently. Staff have a workload that is more clearly prioritized, and non-critical issues can be dealt with in a more time-appropriate process.
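As a rough illustration of the FLOOR301 versus world01 distinction, the sketch below routes the same “login failure” alert differently depending on what is known about the host. Only the two host names come from Joe’s story; the criticality labels, the three-failure threshold, and the `triage` function are assumptions invented for the example.

```python
# Hypothetical context table; only the host names come from the text.
CRITICALITY = {
    "FLOOR301": "critical",  # assembly server; only one internal application logs in
    "world01": "routine",    # external client server with thousands of users
}

# Assumed threshold: a handful of failures on a busy client server is noise.
ROUTINE_FAILURE_LIMIT = 3


def triage(host, event, count):
    """Return 'escalate', 'log', or 'review' for an alert on the given host."""
    level = CRITICALITY.get(host)
    if level is None:
        return "review"  # no context on file, so a person has to go check
    if event == "login failure":
        if level == "critical":
            return "escalate"  # corrupt application data or a break-in attempt
        if count <= ROUTINE_FAILURE_LIMIT:
            return "log"  # expected background noise on a busy server
    return "review"


for host in ("world01", "FLOOR301", "unknown-host"):
    print(host, triage(host, "login failure", 3))
```

Note that the default for an unknown host is “review”, which is exactly where Joe’s team is today for every alert.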
One of the most valuable things about contextualized data is that it can be used to add context to other data. Earlier in this article, we talked about 42, African or European, and Pyjaks. I included some context about each of them. When looked at in context, you can gather that I read, that I watch irreverent British comedies, and that I play video games. If you look at the common context of all three, you can infer other information about how I spend my time as well as my socioeconomic status (I can, for example, afford a gaming computer good enough to play Mass Effect 3 at the highest quality settings). But that sounds a lot like “correlation”, which is a subject for another time.