AI in Cyber - Judging the Efficacy of Machine Learning in Cyber Applications

By Ken Phelan
Posted in Security
On March 20, 2019

I’m fresh back from RSA this week, which means that in the last 10 days I’ve seen approximately one billion new cyber security applications. Many of them make claims regarding AI and its value to their platform. It’s my job to make some judgement about the reality of that claim. Here’s what’s going on in the back of my head when someone tells me about their great AI.

First of all, when people talk about AI in this context, what they generally mean is machine learning. Machine learning is a process where computers take large amounts of data, tie it to specific outcomes and eventually “learn” to predict outcomes based on new data as it comes in. So for your AI to predict breaches, you need a lot of data leading up to breaches which you then correlate to the actual breaches. Theoretically, you learn to predict breaches from data.
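
To make "tie data to outcomes" concrete, here is a minimal sketch in Python. The signal names, the toy history, and the `train` and `predict` helpers are all hypothetical, invented for illustration. A real product would use far more data and a proper learning library; this only shows the shape of the idea.

```python
from collections import defaultdict

# Hypothetical labeled history: (observed signals, did a breach follow?)
HISTORY = [
    ({"failed_logins_spike", "new_admin_account"}, True),
    ({"failed_logins_spike"}, False),
    ({"new_admin_account", "off_hours_transfer"}, True),
    ({"off_hours_transfer"}, False),
    ({"failed_logins_spike", "off_hours_transfer"}, False),
    ({"new_admin_account"}, True),
]

def train(history):
    """Learn, per signal, the fraction of past cases that ended in a breach."""
    seen = defaultdict(int)
    breached = defaultdict(int)
    for signals, outcome in history:
        for s in signals:
            seen[s] += 1
            if outcome:
                breached[s] += 1
    return {s: breached[s] / seen[s] for s in seen}

def predict(model, signals):
    """Score new data as the average learned breach rate of its known signals."""
    rates = [model[s] for s in signals if s in model]
    return sum(rates) / len(rates) if rates else 0.0

model = train(HISTORY)
score = predict(model, {"new_admin_account", "failed_logins_spike"})
```

Nothing in `train` knows what a breach looks like ahead of time. Everything it "knows" comes from the outcome column, which is why the outcomes matter so much.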

On a high level, this is an excellent idea. Let’s dig into it.

First of all, how much data is enough? No self-respecting data scientist will ever give a straight answer to this question, but as a general rule of thumb, it’s more than you have. Hospitals often ask me about applying AI to their operations. An individual hospital just doesn’t have enough scale to start this kind of project. You need data from thousands of patients (preferably more), not hundreds.

But data isn’t the real problem. Everybody has data. The real problem is events. How many breaches can one client possibly see? You can’t correlate data to outcomes without a significant number of events to learn from.
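
A bit of arithmetic shows why events are the bottleneck. The numbers below are assumptions picked for illustration, not measurements, and `years_to_collect` is a hypothetical helper.

```python
import math

def years_to_collect(target_events, events_per_year, clients=1):
    """Years of observation needed to see target_events labeled outcomes."""
    return math.ceil(target_events / (events_per_year * clients))

# Assumed for illustration: one confirmed breach per client per year,
# and a model that wants 1,000 labeled breaches to learn from.
single_client = years_to_collect(1000, events_per_year=1)
pooled = years_to_collect(1000, events_per_year=1, clients=500)
```

Under these assumptions a single client would wait 1,000 years for enough events, while 500 pooled clients would get there in 2.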

So what makes for an effective cyber security machine learning story? First of all, it’s generally cloud-based and aggregates data across many clients. No product is going to sit in your environment, looking only at your data and your outcomes, and do meaningful machine learning. Sorry. Secondly, make sure the product has breaches and outcomes that it can map. Many vendors claim to see 98.7 percent of internet traffic or something to that effect. But data without outcomes is pretty much useless from a machine learning perspective. Without outcomes, all you can map is frequency. “Sure are seeing a lot of port 80 on the internet today.” Not particularly interesting.
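
The gap between frequency and outcomes can be sketched in a few lines of Python. The traffic sample and the outcome flags here are hypothetical, made up only to show the contrast.

```python
from collections import Counter

# Hypothetical traffic sample: destination ports only, no outcomes attached.
unlabeled_ports = [80, 443, 80, 22, 80, 443, 3389, 80]

# Without outcomes, counting is about all that can be done.
port_counts = Counter(unlabeled_ports)
busiest_port, hits = port_counts.most_common(1)[0]

# The same traffic with a hypothetical outcome flag per connection.
labeled = [(80, False), (443, False), (80, False), (22, False),
           (80, False), (443, False), (3389, True), (80, False)]

# With outcomes, each port can be tied to how often it preceded an incident.
incident_rate = {
    port: sum(1 for p, bad in labeled if p == port and bad)
          / sum(1 for p, _ in labeled if p == port)
    for port in {p for p, _ in labeled}
}
```

The first half can only say that port 80 is busy. The second half can say which port was actually associated with trouble, and only because someone recorded the outcome.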

It’s also important to note that most of what’s sold as AI in cyber is just plain I. When your SIEM or UBA vendor explains that the product looks for a certain behavior or pattern of behaviors and then alerts, that’s not machine learning. That’s a programmer coding to look for a data pattern. That doesn’t make it bad; in fact, it’s probably the exact reason you bought your SIEM or UBA, because it has a number of those kinds of features built in. It’s just not AI.
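
For contrast, here is roughly what "a programmer coding to look for a data pattern" looks like. This is a hypothetical rule, not taken from any real SIEM or UBA product; the event format and the threshold are assumptions.

```python
# Hypothetical hand-written rule: alert when one user has too many
# failed logins. The threshold is chosen by a person, not learned from data.
FAILED_LOGIN_THRESHOLD = 5

def rule_based_alerts(events, threshold=FAILED_LOGIN_THRESHOLD):
    """Return users whose failed-login count meets or exceeds the threshold."""
    failures = {}
    for user, action in events:
        if action == "login_failed":
            failures[user] = failures.get(user, 0) + 1
    return sorted(u for u, n in failures.items() if n >= threshold)

events = (
    [("alice", "login_failed")] * 6
    + [("bob", "login_failed")] * 2
    + [("bob", "login_ok")]
)
alerts = rule_based_alerts(events)
```

The rule is useful on day one because the intelligence was put there by whoever wrote it. No training data or outcomes are involved.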

I guess that’s the bottom line. You want products that are intelligent. On some level it doesn’t matter if that intelligence comes from machine learning or other sources. But you should definitely look for and expect that intelligence out of the box. The story that a product will come into my environment stupid and gain intelligence over time just doesn’t play with me. Maybe someday, but not today.

Ken Phelan

Ken is one of Gotham’s founders and its Chief Technology Officer, responsible for all internal and external technology and consulting operations for the firm. A recognized authority on technology and operations, Ken has been widely quoted in the technical press, and is a frequent presenter at various technology conferences. Ken is the Chairman of the Wall Street Thin Client Advisory Council.