Hi, my name is Mike, and I’m a big data skeptic – especially when it comes to security. It’s pretty clear this puts me in the minority, particularly among the noise-makers – the Rolling Thunder Big Data Revue is in full swing, passing through airports and filling billboards all over town.
I don’t mean to be Mr. Grinch, but I can’t shake the feeling that our industry is like a dog chasing a car – heaven knows what we’ll do if we catch it! I don’t want to overplay my hand, though. Big data isn’t a bad idea – I just see a mismatch between the hopes and the likely near-term delivery. Just look at recent history – it’s hardly controversial to suggest that SIEM deployments haven’t been the source of joy and endless security boon that was hoped for. (Too much doggle in the boon, perhaps? No, that’d be a step too cynical.)
Here’s the problem: data mountains need data mountaineers. The data won’t analyze itself.
At the risk of stating the obvious, we’re in an industry with negative unemployment – the skills shortage pervades everything. We can build out sensors to gather more and more data, because it’s “obvious” that more data is better, but we rapidly build mountains bigger than we can climb. The problem isn’t the storage – indeed, there’s a whole industrial complex that is nothing short of delighted to facilitate our spooky desire to keep all the bits. The problem is the thinking.
From the academic side, I’m reminded of the mania a while back for “genetic algorithms”. The appeal is obvious – we know evolution works, so let’s steal that idea, mutating algorithms and getting them to compete in little computing arenas we can watch but don’t have to drive. There’s no denying that it’s sexy (in a manner of speaking – sidebars possible on what it means to “breed” algorithms, but that’s a different topic). Better yet, it appeals to the “inspired laziness” of the best technologists – in theory, I get the machines to do the work, and I get to sit back and look like a genius. It looks like a no-brainer – sexy, easy, let the machines do the work. Unfortunately, it can too easily slide into being a “no-brainer” of the other sort. Genetic algorithms have produced a few modest successes, but no Nobel Prize breakthroughs. It’s just not that easy.
For security, we can’t just assume that if we pile up the data, someone else will figure out the analytics needed to find the signal we need. That’s what keeps me skeptical – I’m still hearing too much talk about the ability to store and retrieve, and not enough about what we’ll do with the data pile. After all, what do we even mean by “big data”? As far as I can tell, “big” in this context just means “of a size humans can’t handle”. We’ve moved the bar up quite a bit over the last few years, giving human analysts better and more scalable tool belts so they can handle tougher and tougher problems. Those promoting big data for security analytics seem to be specifically targeting the stage after that – the world where humans can no longer do the work. Fair enough, but what replaces them exactly?
As we’ve found in other areas, algorithms are rugged but dumb. (Think of them as the male models of the data analytics world.) They can crunch on immense amounts of data, but the technical accomplishment of crunching is not the point – targeting the search is. By way of analogy, consider the Mars rovers. It’s prohibitively expensive to send people to Mars today, so we send robots. But let’s be careful – exactly how autonomous are these “robots”? Loosely speaking, they can operate for about a day without checking back in with human operators here on the blue planet. The rovers are smarter than RC cars, but not truly autonomous in a way that would allow us to cut the cord. (Visiting Europa, while a thrilling prospect, will require us to cross the minor technical speed bump of true autonomy, and as a result, I’m afraid I’m not holding my breath waiting for a free-diving autonomous sub. It’s another of those problems, like getting robots to walk, that proves to be harder than we initially thought.)
People ask me if the security analytics we do is a form of big data. I generally say no – we’re only “medium data”. By that, I’m joking about disk sizes (you don’t need CEO sign-off for the disk array for my kind of analytics), but also suggesting that there are interesting problems to solve within the existing data silos, before we build the super-silo where we pile everything into a great mountain and just assume that doing so will bring about spontaneous enlightenment.
I’m not saying big data is a bad path for the industry – my point is just that the road looks longer and harder than the hype would indicate. People can get pretty frustrated when promises go unfulfilled. (Where are the jet packs and flying cars we were promised?) I do believe in security analytics, and I do believe in scaling up our search for the clues that allow us to prevent, detect, respond and predict. (Tip of the hat to Brett Wahlin for the quadfecta.) I just wish I heard more focus on the goals we will unleash our robot mountaineers to accomplish, with some sense of how they will achieve them, and not so much about the girth of the rock pile.