How new standards can help businesses operationalize analytics
Blog Posts: Mu Sigma
Published On: 26 February 2015
At most organizations, big data analytics breakthroughs come in a lab-like setting, with staff who are steeped in analytics knowledge and have access to specialized tools. But when the time comes to operationalize those breakthroughs – to make them an ongoing part of business operations – things often fall apart.
Why? In order to successfully operationalize analytics, decision scientists have to translate their work into formal business requirements. These become the IT team’s starting point for implementing the solution. This spawns a sequential, waterfall software implementation project that typically takes several months to execute. In the meantime, the fast pace of change in most businesses can render the original models or KPIs irrelevant.
So how can businesses accelerate the process? New standards are emerging to address this issue. One of the most promising we've seen in our work with clients is PMML, the Predictive Model Markup Language, an XML-based standard developed by the Data Mining Group. PMML lets teams port data transformations and predictive models between environments, and it is becoming the de facto standard for representing data mining models.
Because a model is described in PMML rather than in the code of the tool that built it, it can be deployed for decision automation without code rewrites or additional infrastructure. The modeling and production environments can each use the technologies best suited to them, without either side compromising its requirements. PMML supports many kinds of predictive models, including regression, Naïve Bayes, clustering, support vector machines, decision trees, neural networks, k-nearest neighbors, time series and random forests.
For example, one could develop a churn propensity model in R and export it as a PMML file that is then deployed to a production environment for scoring. The target platform can be a proprietary platform such as SAS or an open source one such as Openscoring. These platforms consume PMML and expose APIs that existing business applications (e.g., a CRM system) can call to recalculate and update churn propensity scores in real time or in batch. Another example is the open source project Cascading Pattern, which allows PMML files to be deployed on Hadoop to score large data volumes.
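To make the churn example concrete, here is a minimal sketch in Python of what the scoring side does with a PMML file. The PMML document is hand-written for illustration: the field names (`tenure_months`, `support_calls`), the coefficients and the `churn_probability` helper are all assumptions, not output from a real model. The scorer handles only this one shape of model (a binary logistic regression); a production deployment would use a full PMML engine such as the platforms mentioned above.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML document. Field names and coefficients are
# made up for this sketch; a real file would be exported by the modeling tool.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Illustrative churn propensity model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure_months" optype="continuous" dataType="double"/>
    <DataField name="support_calls" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="tenure_months"/>
      <MiningField name="support_calls"/>
      <MiningField name="churn" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0" targetCategory="yes">
      <NumericPredictor name="tenure_months" coefficient="-0.05"/>
      <NumericPredictor name="support_calls" coefficient="0.4"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="no"/>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def churn_probability(pmml_text, record):
    """Score one record against the first RegressionTable (category "yes").

    Simplified on purpose: supports only numeric predictors and the logit
    link, which is all the illustrative document above uses.
    """
    root = ET.fromstring(pmml_text.strip())
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)

    # Linear predictor: intercept + sum(coefficient * field value).
    y = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        y += float(predictor.get("coefficient")) * float(record[predictor.get("name")])

    # Logit normalization turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-y))


if __name__ == "__main__":
    customer = {"tenure_months": 10, "support_calls": 5}
    print(f"{churn_probability(PMML_DOC, customer):.4f}")  # prints 0.6225
```

The point of the sketch is that the scoring code knows nothing about R or whichever tool trained the model. Everything it needs (inputs, coefficients, link function) travels in the PMML file, so the model can be swapped without changing the application.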
Businesses struggling with operationalizing data analytics services – and in our experience, that’s basically all businesses – need to continually evaluate emerging standards like PMML. These new technologies have the potential to make a big impact on our ability to scale analytics efforts.