The tech news has been abuzz recently with stories about how machine learning is making impressive strides in areas like autonomous vehicles, face recognition, and language translation. You can now be automatically tagged in Facebook photos of events you don’t even remember attending. While these are impressive achievements that may enhance your personal experiences, they may seem to have little bearing on your business. But those headline-grabbing applications are only the most visible examples; the truth is that very few businesses can’t benefit from machine learning.
The term “machine learning” is a broad concept covering any algorithm that analyzes past information and uses it to add context to current information. A simple linear regression is a form of machine learning: it analyzes past data to predict future data and to provide context for the current values. In a sense, the machine has learned how this metric “behaves” and can put parameters around the current value (standard deviation, probability bands, etc.), adding knowledge that isn’t obvious if all you have is the single, current metric value. Knowing these parameters also allows the computer to examine a period of data and detect anomalous values. The complexity of applying statistics and probability to business data makes this a rich field for analysis and has given rise to the position of “data scientist”: someone who helps the business extract relevant data and analyze it in a way that returns actionable results.
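To make that concrete, here is a minimal sketch of this simplest kind of machine learning: an ordinary least-squares line fitted to past metric values, with anomalies flagged as points falling more than three standard deviations off the trend. The hourly request counts are invented example data, and the function names are my own.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit: returns slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def find_anomalies(xs, ys, n_sigma=3.0):
    """Flag points whose residual from the trend exceeds n_sigma std devs."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = statistics.pstdev(residuals)
    return [x for x, r in zip(xs, residuals) if abs(r) > n_sigma * sigma]

# Invented data: hourly request counts trending upward, one spike at hour 12.
hours = list(range(24))
requests = [100 + 5 * h for h in hours]
requests[12] += 400  # simulated anomaly

print(find_anomalies(hours, requests))  # → [12]
```

The model has “learned” how the metric behaves (its trend and normal spread), so the spike stands out even though its raw value alone tells you nothing.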
There is nothing new about statistics and probability. Many of us suffered through those classes in school (the data scientist was the kid who kept correcting the teacher). What is new is the availability of vast stores of data collected from business processes and powerful computers that can dig deep and reveal facts about the business that are often completely unexpected. For the business, big data and machine learning go hand-in-hand.
So if statistics and probability aren’t new, what is new in machine learning? The breakthrough that has garnered so much attention recently is the success of neural networks. These software frameworks attempt to replicate some of the function of the brain by breaking tasks into small subtasks governed by adjustable parameters. The subtasks analyze the data and reach a conclusion. When training a neural network, the outcome of the analysis, success or failure, is fed back to the network so that it can adjust the parameters and try again (a feedback process known as backpropagation). This process is repeated over and over, further tuning the parameters to minimize the error in the model.
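That feedback loop can be sketched in a few lines. This toy example, with invented training data following the rule y = 2x + 1, trains a single “neuron” with two adjustable parameters: make a prediction, measure the error, nudge each parameter to shrink the error, repeat.

```python
import random

def train(samples, epochs=1000, lr=0.01):
    """Tune the parameters (w, b) of a single neuron: output = w * x + b."""
    w, b = random.random(), random.random()   # start with random guesses
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x + b
            error = pred - target             # how wrong was the guess?
            w -= lr * error * x               # nudge each parameter in the
            b -= lr * error                   # direction that reduces error
    return w, b

random.seed(0)
# Invented training data for the hidden rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = train(data)
print(round(w, 2), round(b, 2))  # parameters converge near 2 and 1
```

After many iterations the parameters settle on values that minimize the error, which is the same principle, at vastly larger scale, behind training real networks.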
The mathematics at the core of neural networks was worked out in the 1980s and 1990s, but it wasn’t effective until computers could handle much larger models with the processing speed to run them in a reasonable amount of time. The biggest advance came with the adaptation of the frameworks to use the fast mathematical processing cores on graphics processing unit (GPU) co-processors. These cards have advanced from dozens of cores to thousands of cores on a single PCIe card, and software libraries have been developed to offload neural network calculations onto the cards. With this increased processing power, the frameworks have expanded to support multiple layers of “neurons”, each layer specializing in a particular task and passing down its findings as context for the next layer. These multi-layer models are called deep neural networks (DNNs); a widely used variant whose layers scan their input with small filters is the convolutional neural network (CNN). The approach as a whole is known as deep learning.
It is deep learning that is at the root of the sensational stories about machine learning. Its ability to approach problems that are too complex for a traditional hand-written algorithm is revolutionary in artificial intelligence. Imagine trying to write rules to identify a dog in any photograph regardless of the dog’s breed, size, or orientation in the picture; assembling that many rules would take too long to be practical. Deep learning instead analyzes thousands or millions of pictures of dogs, learning from each one and building a model of how to identify a dog, effectively writing its own generalized dog-detection algorithm. Each layer of the CNN looks for progressively more complex structures: the first layer might find edges, the next detects simple shapes, the next responds to combinations of shapes such as a pair of circles for eyes, and so on. The output of each layer is passed down as context for the next, and at the end a classification emerges. If it is correct, the parameter weights that produced it are strengthened; if it is wrong, the parameters are adjusted slightly, and the process runs again. Thousands of iterations over thousands or millions of images repeat until the misclassifications are minimized. All the network needs is a curated collection of photos that sufficiently covers the range of images it will be expected to analyze. That’s where the data scientist comes in: the output of the neural network is only as good as the training data it is fed and the design of its layers. When it’s done correctly, the results can be truly amazing.
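To illustrate what that first edge-finding layer is doing, here is a small sketch of a convolution: a filter slides across an image and responds strongly where the pixels change. In a real CNN the filter values are learned from data; this vertical-edge kernel and the tiny image are hand-coded for illustration.

```python
# A 2x2 filter that responds where pixel values change left-to-right,
# i.e. a vertical edge. Real CNNs learn many such filters automatically.
EDGE_KERNEL = [[-1, 1],
               [-1, 1]]

def convolve(image, kernel):
    """Slide the kernel over the image, summing element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        out.append(row)
    return out

# Invented 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1]] * 4
for row in convolve(image, EDGE_KERNEL):
    print(row)  # strongest response at the dark/bright boundary
```

The filter output is near zero over flat regions and peaks at the boundary; stacking layers of such filters (plus the training loop sketched earlier) is what lets a CNN build up from edges to shapes to whole objects.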