Thursday, January 08, 2015

Securing Big Data - Part 3 - Security through Maths

In the first two parts of this series I talked about how securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data.  The next part is "what do you do with all that data?".  This is where Machine Learning and Mathematics come in; in other words, it's about how you use Big Data analytics to secure Big Data.

What you want to do is build up a picture of what represents reasonable behaviour, which is why you want all of that history and range of information.  It's the full set of that information, across not single actions but millions of actions and interactions, that builds the picture of reasonable.  It's reasonable for a sys-admin to access a system; it's not reasonable for them to download classified information to a USB stick.

A single request is something you control using an ACL, but that doesn't include the context of the request (it's 11pm, why is someone accessing that information at all that late?).
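To make the gap concrete, here is a minimal sketch of an ACL check sitting next to a context check.  Everything in it is an assumption for illustration: the `ACL` table, the `Request` fields and the 7am to 8pm "usual hours" window are hypothetical, not taken from any real product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical ACL: which roles may read which resources.
ACL = {"sales_forecast": {"sales", "finance"}}


@dataclass
class Request:
    user: str
    role: str
    resource: str
    timestamp: datetime


def acl_allows(req: Request) -> bool:
    """Classic per-request check: looks at role and resource, nothing else."""
    return req.role in ACL.get(req.resource, set())


def context_flags(req: Request, work_start: int = 7, work_end: int = 20) -> List[str]:
    """Context the ACL never sees. The hour window is an illustrative assumption."""
    flags = []
    if not (work_start <= req.timestamp.hour < work_end):
        flags.append("outside_usual_hours")
    return flags


# An 11pm request from someone who is perfectly entitled to the data.
late = Request("alice", "sales", "sales_forecast", datetime(2015, 1, 8, 23, 0))
print(acl_allows(late), context_flags(late))
```

The ACL says yes, and it is right to, but the only thing that notices the odd hour is the context check.  A fixed hour window is of course still a hand-written rule; the point of the rest of this post is that the "usual" part should be learnt from history rather than typed in.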

You also need to look at the aggregated requests - they've looked at the next quarter's sales forecast while also browsing external job hunting sites and typing up a resignation letter.

Then you need to look at the history of that - oh, it's normal for someone to be doing that at quarter end, all the sales people tend to do that.

This gives us the behaviour model for those requests which leads to us understanding what is considered reasonable.  From reasonable we can then identify anomalous behaviour (behaviour that isn't reasonable).
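As a rough sketch of "reasonable first, then anomalous", the snippet below builds a per-role baseline from history and scores a new observation against it.  It uses a plain z-score, which is a deliberately simple stand-in rather than the Machine Learning models discussed in this post, and the names (`build_baseline`, `anomaly_score`) and the downloads-per-day feature are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """history: iterable of (role, downloads_per_day) observations.

    Returns {role: (mean, standard deviation)} as the picture of 'reasonable'.
    """
    by_role = defaultdict(list)
    for role, count in history:
        by_role[role].append(count)
    return {role: (mean(values), pstdev(values)) for role, values in by_role.items()}


def anomaly_score(baseline, role, count):
    """How many standard deviations 'count' sits from what is normal for the role."""
    mu, sigma = baseline[role]
    if sigma == 0:
        # No variation in the history: anything different is maximally unusual.
        return 0.0 if count == mu else float("inf")
    return abs(count - mu) / sigma


# Illustrative, made-up history for one role.
history = [("sales", 10), ("sales", 12), ("sales", 11), ("sales", 9), ("sales", 13)]
baseline = build_baseline(history)

for todays_count in (11, 40):
    print(todays_count, round(anomaly_score(baseline, "sales", todays_count), 1))
```

Because the baseline is keyed by role, behaviour that is normal for all the sales people scores low, while the same count from someone whose peers never do it would score high.  A real system would use far more features than one counter, which is exactly where hand-maintained rules stop scaling.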

No human-defined and human-managed system can handle this amount of information, but Machine Learning algorithms just chomp up this sort of data and create the models for you.  This isn't a trivial task and it's certainly massively more complex than the sorts of ACLs, encryption criteria and basic security policies that IT is used to.  These algorithms need tending, they need tuning and they need monitoring.

Choosing the right type of algorithm (and there are LOTS of different choices) is where Data Scientists come in.  They can not only select the right type of algorithm but also tune and tend it so it consistently produces the most effective results.

What this gives you, however, is business-centric security, that is, security that looks at how a business operates.  Anomalous Behaviour Detection therefore represents the way to secure Big Data by using Big Data.

The final challenge is then how to alert people so they actually react.
