Machine learning is a broad term covering algorithms that receive input data and use statistical analysis to predict an output value within an acceptable range. These algorithms analyze past data to predict future values and to provide context for current ones. In a sense, the machine has learned how a metric “behaves” and can put parameters around its current value (standard deviation, probability bands, and so on), adding knowledge that isn’t obvious when all you have is the single, current value. Knowing these parameters also allows the computer to examine a period of data and detect anomalous values.
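To make the idea of “parameters around the current value” concrete, here is a minimal sketch of one of the simplest versions: learn the mean and standard deviation of a metric from a baseline period, then flag new values that fall outside a band of k standard deviations. The function names, the choice of k = 3, and the sample numbers are illustrative assumptions, not a specific product’s method.

```python
from statistics import mean, stdev


def learn_bands(history, k=3.0):
    """Learn how a metric behaves: return (lower, upper) = mean -/+ k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two points
    return mu - k * sigma, mu + k * sigma


def find_anomalies(history, window, k=3.0):
    """Return (index, value) pairs from `window` that fall outside the bands learned from `history`."""
    lower, upper = learn_bands(history, k)
    return [(i, v) for i, v in enumerate(window) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical metric: requests per minute during a quiet baseline period.
    baseline = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
    latest = [100, 104, 150, 98]
    print(learn_bands(baseline))
    print(find_anomalies(baseline, latest))  # [(2, 150)]
```

A plain mean and standard deviation are easily skewed if the baseline itself contains outliers; real systems often use more robust statistics or seasonal baselines, but the principle of learning a band and checking against it is the same.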
That all sounds very technical, so you may be wondering how it relates to your business. Getting business value from these techniques requires identifying data in your organization that can be analyzed to yield new insights into products, customers, employees, or business processes. Fortunately, most industries already have examples, and those examples may be directly applicable or may suggest similar areas of investigation for your business.
Here is a quick summary of some of the areas where machine learning is being used today:
- Brand visibility analysis: analyze photos, videos, and web pages for logos or product placement
- Content analysis: find trends in social media posts or photos
- Sentiment analysis: detect changing public opinion of a brand in social media posts based on key terms
- Medical imaging: amazing strides are being made in the analysis and detection of abnormalities in X-rays, CAT scans, MRIs, etc.
- Diagnosis (decision support): an approach made famous by IBM’s Watson, in which deep learning applies vast stores of medical diagnostic data to recognize obscure ailments
- Asset utilization: hospitals and medical centers have a limited inventory of expensive diagnostic equipment; machine learning analyzes how the equipment is being used and can suggest relocating existing assets or purchasing more
- Pharmacological research: neural networks have been applied to protein folding, molecular energy models, and many other aspects of drug exploration
- Portfolio management: although past performance is no guarantee of future returns, a rigorous statistical analysis with machine learning can inform trade decisions
- Algorithmic trading: in this case, the model is actually making the trades using a deep learning model trained on historical trades
- Fraud detection: financial institutions are always looking for a better way of detecting fraud without inconveniencing their honest customers; machine learning can compare a single customer with a similar class of customers and look for anomalies
- Loan/insurance decisions: the same type of machine learning model can compare a loan applicant to a similar pool of applicants and look at the historical outcomes of those loans
- Inventory management: reallocate inventory in stores or warehouses based on trends in buying behaviors or weather predictions
- Recommendation engine: online stores in particular try to suggest items based on a customer’s past purchases
- Buyer preference tracking: similar to the recommendation engine, this tracks customers’ purchases online and in-store and targets customers for discount mailings and other offers
- Sales associate placement: in a brick-and-mortar store, analysis of customer shopping patterns can suggest the best places for sales associates to make themselves available
- Rapid response to changing market conditions: customer purchases can be analyzed for trends across the market to suggest new products that fit the latest customer preferences
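The loan/insurance example above (“compare a loan applicant to a similar pool of applicants and look at the historical outcomes of those loans”) maps naturally onto k-nearest neighbors. The sketch below is a minimal illustration under stated assumptions: the two features, their pre-scaled values, and the tiny loan history are all hypothetical, and a production model would need far more data, careful feature scaling, and fairness review.

```python
import math

# Hypothetical toy history: ((income_score, debt_score), outcome), where outcome 1 = defaulted, 0 = repaid.
# Features are assumed to be pre-scaled to comparable ranges so Euclidean distance is meaningful.
PAST_LOANS = [
    ((5.0, 2.0), 0),
    ((5.5, 2.5), 0),
    ((6.0, 1.5), 0),
    ((2.0, 6.0), 1),
    ((2.5, 5.5), 1),
    ((3.0, 6.5), 1),
]


def nearest_neighbors(applicant, history, k):
    """Return the k past loans whose features are closest to the applicant's."""
    return sorted(history, key=lambda rec: math.dist(applicant, rec[0]))[:k]


def estimate_default_rate(applicant, history, k=3):
    """Share of the k most similar past loans that defaulted."""
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of past loans")
    neighbors = nearest_neighbors(applicant, history, k)
    return sum(outcome for _, outcome in neighbors) / k


if __name__ == "__main__":
    print(estimate_default_rate((5.2, 2.2), PAST_LOANS))  # 0.0
    print(estimate_default_rate((2.4, 5.9), PAST_LOANS))  # 1.0
```

The same peer-comparison idea underlies the fraud detection example: instead of looking up outcomes, you measure how far one customer’s behavior sits from the customers most similar to them.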
Many of those examples rely on data collected from business processes (web store, financial history, patient symptoms) that is essentially text-based. So the first step is to find a convenient tool to ingest large amounts of text and then extract and analyze subsets of it. This is what Splunk, the platform for machine data, was designed for. It scales horizontally to index and search hundreds of terabytes of text per day. It has built-in commands for trending and anomaly detection, and you can install the free Splunk Machine Learning Toolkit for more sophisticated algorithms such as forecasting and data clustering. If you have already collected data in a Hadoop environment but are finding it difficult to sift through all that information, Splunk Analytics for Hadoop lets you search and extract that data using the agile, interactive Splunk interface without the data leaving HDFS. If you need the power of Tableau, QlikView, or another analysis and visualization tool, the Splunk ODBC Driver provides dynamic access to Splunk data from any ODBC-compatible program.
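As a rough illustration of what “trending” means, one common way to express a trend is a trailing simple moving average. The sketch below shows the idea in plain Python; it is an assumption-level illustration of the technique, not Splunk’s implementation, and the function name is hypothetical.

```python
from collections import deque


def simple_moving_average(values, window):
    """Return the trailing mean of the last `window` values at each point (None until the window fills)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)  # the oldest value drops out automatically once full
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


if __name__ == "__main__":
    print(simple_moving_average([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

Smoothing like this hides point-to-point noise so the underlying direction of a metric is easier to see; larger windows give smoother but slower-reacting trends.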
If you want to use deep learning to explore your data, you can use the tools above to extract the subset of data for analysis. All of the major deep learning frameworks are open source, so you can pick the one you think is most appropriate. Although the frameworks run on ordinary CPUs, a GPU can accelerate the compute-intensive training phase by 50 times or more for many workloads. If you’re starting on a limited budget, begin with an NVIDIA GeForce consumer card or a Quadro workstation card. If you need more reliability than consumer-grade GPUs offer, look at the NVIDIA Tesla cards, which are engineered to run 24x7 in data center conditions.
As your models grow, you are going to need more processing speed. When a model runs a few iterations overnight on your desktop workstation and looks promising, it’s time to move it to a more powerful server and let it run for a few thousand iterations. The right hardware for those tasks is NVIDIA DGX. DGX Station is a workstation tower with four NVIDIA Tesla V100 GPUs, and the DGX-1 is a 3U rackmount server with eight. That’s some very powerful hardware: the V100-based DGX-1 has 40,960 CUDA cores with a theoretical peak of 960 TFLOPS of deep learning (Tensor Core) performance in a 3U package, which can save a lot of power, cooling, and space in the data center compared with building the same capacity from many smaller servers.
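The DGX-1 figures above are aggregates across eight GPUs. A quick breakdown shows what each V100 contributes; the only inputs are the numbers quoted in the paragraph, and the DGX Station line is a derived estimate from the same per-GPU share, not a quoted specification.

```python
def per_gpu_share(total: float, gpu_count: int) -> float:
    """Split an aggregate system figure evenly across its GPUs."""
    return total / gpu_count


DGX1_GPUS = 8
DGX1_CUDA_CORES = 40_960   # aggregate figure quoted above
DGX1_PEAK_TFLOPS = 960     # theoretical aggregate peak quoted above

cores_per_v100 = per_gpu_share(DGX1_CUDA_CORES, DGX1_GPUS)    # 5120.0
tflops_per_v100 = per_gpu_share(DGX1_PEAK_TFLOPS, DGX1_GPUS)  # 120.0

# Derived estimate only: the same per-GPU peak applied to the 4-GPU DGX Station.
dgx_station_estimate = 4 * tflops_per_v100                     # 480.0

if __name__ == "__main__":
    print(cores_per_v100, tflops_per_v100, dgx_station_estimate)
```

Theoretical peaks assume every core is busy every cycle; real training jobs reach some fraction of that, depending on the model and the data pipeline.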
Just as important is the DGX software package. Much of a data scientist’s time goes into maintaining neural network frameworks and all of their dependencies, which are updated often. On top of that, optimal performance comes only from versions of the tools optimized for your specific hardware. So instead of tuning models, the researcher spends time searching for and installing software and drivers to stay up to date. Your annual DGX maintenance includes access to the NVIDIA Cloud Management Portal:
- A cloud environment where you manage all of your DGX hardware
- Submit jobs on the portal and let the software handle scheduling and allocation of the GPUs across one or more DGX devices
- NVIDIA maintains all of the popular deep learning frameworks in Docker containers, optimized for DGX hardware, in a cloud repository accessible by your DGX devices
- The appropriate Docker container is delivered on the fly to the DGX device and your job runs
- A DGX-1 or DGX Station becomes a stateless device that can be deployed or rebuilt in about an hour with no deep learning expertise
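To show what “let the software handle scheduling and allocation of the GPUs” involves, here is a hypothetical first-fit allocator. It is not NVIDIA’s scheduler; the `Device` and `Job` types, the device names, and the policy are assumptions for illustration, and it simplifies by placing each job on a single device rather than spanning several.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    free_gpus: int


@dataclass
class Job:
    name: str
    gpus_needed: int


def schedule(jobs, devices):
    """First-fit: place each job on the first device with enough free GPUs; queue the rest.

    Mutates each Device's free_gpus to reflect the allocations made.
    """
    placements = {}
    queued = []
    for job in jobs:
        for dev in devices:
            if dev.free_gpus >= job.gpus_needed:
                dev.free_gpus -= job.gpus_needed
                placements[job.name] = dev.name
                break
        else:  # no device had room for this job
            queued.append(job.name)
    return placements, queued


if __name__ == "__main__":
    cluster = [Device("dgx1-a", 8), Device("station-a", 4)]
    work = [Job("resnet", 8), Job("lstm", 2), Job("gan", 4), Job("bert", 4)]
    print(schedule(work, cluster))
    # ({'resnet': 'dgx1-a', 'lstm': 'station-a'}, ['gan', 'bert'])
```

Queued jobs would be retried as running jobs finish and release GPUs; adding another device to the list is all it takes for the same loop to use the extra capacity.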
Fully managed software maintenance combined with unprecedented processing power makes this an efficiency win on both fronts. To scale your models even further, add more DGX devices to the managed cluster and let the scheduler handle the rest.
News of the remarkable capabilities of machine learning is more common every week; a simple search will find you examples from your specific industry. Gotham Technology Group can help you with selecting and sizing the tools you need to begin using machine learning to unlock the insights in your business data.