muHPC™ is a suite of popular statistical algorithms for Big Data analysis,
delivered as R packages. Built on MapReduce, muHPC™
algorithms leverage the power of parallel computation.
Want to accelerate your Big Data analysis?
Expedite your advanced Big Data analytics initiatives
Simple, easy-to-use functionalities for complex processes
Gain access to the latest in Big Data research through a flexible support model
Before analyzing any data, it is crucial to first understand that data. At the most basic level, this entails knowing what rows and columns exist in the data, and what values are missing. Going beyond this, an analyst might wish to find out how columns in the data are distributed: for continuous variables, this means measures such as the mean, median and mode, and the location of quartiles, quintiles and percentiles.
The muEDA package lets users do all this and more on their distributed Hadoop data. It runs natively on Hadoop using Java MapReduce, and using it is as simple as calling a function in R.
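To illustrate why summaries like these suit MapReduce, here is a minimal Python sketch. It is not the muEDA API, which is not shown here; the function names and the sample data are assumptions made for illustration. Each partition is summarized independently (map), and the partial summaries are merged (reduce). Counts, missing values, mean, variance, minimum and maximum combine this way; exact medians and percentiles do not, and typically need a sort or an approximation.

```python
from functools import reduce

def summarize_partition(values):
    """Map step: summarize one partition of a numeric column (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "total": sum(present),
        "total_sq": sum(v * v for v in present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

def merge(a, b):
    """Reduce step: combine two partial summaries into one."""
    def pick(func, x, y):
        candidates = [v for v in (x, y) if v is not None]
        return func(candidates) if candidates else None
    return {
        "n": a["n"] + b["n"],
        "missing": a["missing"] + b["missing"],
        "total": a["total"] + b["total"],
        "total_sq": a["total_sq"] + b["total_sq"],
        "min": pick(min, a["min"], b["min"]),
        "max": pick(max, a["max"], b["max"]),
    }

def finalize(s):
    """Turn the merged sums into readable statistics (population variance)."""
    mean = s["total"] / s["n"]
    return {
        "count": s["n"],
        "missing": s["missing"],
        "mean": mean,
        "variance": s["total_sq"] / s["n"] - mean * mean,
        "min": s["min"],
        "max": s["max"],
    }

# Hypothetical column split across three partitions.
partitions = [[1.0, 2.0, None], [3.0, 4.0], [None, 5.0]]
summary = finalize(reduce(merge, map(summarize_partition, partitions)))
print(summary)
```

Because the merge step is associative, the partitions can be processed in any order and on any number of nodes.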
Being able to find clusters in your data is important, be it for customer segmentation or market basket analysis. The muKMeans package allows users to run the k-means algorithm, a centroid-based method that partitions n data points into k clusters by minimizing the within-cluster sum of squared distances to each cluster's centroid. muKMeans is built on Java MapReduce and can run natively on Hadoop.
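The k-means iteration itself can be sketched in a few lines of Python. This is an illustration of the algorithm under simple assumptions, not muKMeans code: the function names, the sample points and the choice of starting centroids are all hypothetical. The assignment step plays the role of a map (each point is tagged with its nearest centroid) and the update step the role of a reduce (points are averaged per cluster).

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assignment/update rounds from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment ("map"): group every point under its nearest centroid.
        groups = {}
        for point in points:
            groups.setdefault(nearest(point, centroids), []).append(point)
        # Update ("reduce"): each centroid becomes the mean of its group.
        # A centroid that attracted no points is left where it was.
        centroids = [
            tuple(sum(axis) / len(groups[i]) for axis in zip(*groups[i]))
            if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
    return centroids

def within_cluster_ss(points, centroids):
    """The quantity k-means tries to minimize."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[nearest(point, centroids)]))
        for point in points
    )

# Hypothetical two-dimensional data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]
result = kmeans(points, centroids=[points[0], points[3]])
wss = within_cluster_ss(points, result)
print(result, wss)
```

The starting centroids are passed in explicitly because k-means only finds a local optimum: a poor start can leave it stuck in a worse clustering, which is why practical implementations usually try several initializations.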
Building regression models for linear and non-linear relationships in data is an analytical staple, be it logistic regression for clickstream attribution or a Poisson GLM for insurance claims frequency. The muGLM package allows users to build linear and generalized linear models on their Hadoop data using the rmr package.
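As a small illustration of what fitting a GLM involves, the Python sketch below fits a Poisson regression with the usual log link and a single predictor using Newton's method. It is not muGLM code; the function name and the claim counts are made up for the example. The sums inside the loop are the part that distributes naturally: each node can compute them on its own rows and the results can be added together.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # 2x2 information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: a risk score and the number of claims observed.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1, fitted)
```

At the fitted coefficients the predicted counts sum to the observed counts, a standard property of a Poisson model with an intercept.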
Recommender systems can be used to personalize user experience. From recommending restaurants based on customer eating preferences to suggesting movies based on customer viewing history, recommender systems find application in a variety of areas including music, news, books, search queries and social media. The muRecommender package is built to run natively on Hadoop, using Java MapReduce. It leverages Alternating Least Squares, Singular Value Decomposition and other complex algorithms to recommend items based on latent factors.
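The idea behind Alternating Least Squares can be shown with a deliberately tiny Python sketch that learns a single latent factor per user and per item. It is not muRecommender code; the function name, the regularization value and the ratings are assumptions chosen for illustration. With the item factors held fixed, each user's factor has a closed-form ridge regression solution, and vice versa, so the two sides are solved in turn.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=100):
    """ratings maps (user, item) -> rating; returns one latent factor per user and item."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed and solve for each user independently.
        for u in range(n_users):
            seen = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            user_f[u] = (sum(r * item_f[i] for i, r in seen)
                         / (reg + sum(item_f[i] ** 2 for i, _ in seen)))
        # Hold user factors fixed and solve for each item independently.
        for i in range(n_items):
            seen = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            item_f[i] = (sum(r * user_f[u] for u, r in seen)
                         / (reg + sum(user_f[u] ** 2 for u, _ in seen)))
    return user_f, item_f

# Hypothetical ratings: user 1 rates everything twice as high as user 0.
# The rating of user 1 for item 2 is withheld.
ratings = {(0, 0): 2.0, (0, 1): 3.0, (0, 2): 4.0, (1, 0): 4.0, (1, 1): 6.0}
user_f, item_f = als_rank1(ratings, n_users=2, n_items=3)
predicted = user_f[1] * item_f[2]
print(predicted)
```

On this data the withheld rating is predicted close to 8, the value implied by the pattern in the observed ratings. Since every user (and every item) is solved independently within a half-step, the work parallelizes well.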
Being able to categorize customer purchase intent, customer sentiment and customer complaints is important for marketers to personalize promotions and target the right audience. Mu Sigma’s muRandomForest package is built on Java MapReduce and runs natively on Hadoop. It constructs multiple decision trees while training a model and combines their individual predictions, an ensemble learning approach, to make the final prediction.
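The ensemble idea can be sketched in Python with a heavily simplified stand-in for a random forest: each "tree" is a one-split decision stump trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. This is not muRandomForest code, and a real random forest grows deeper trees and samples features at every split; the function names, features and labels below are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """One-split tree: pick the threshold on `feature` with the best training accuracy."""
    values = sorted(set(row[feature] for row in rows))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in thresholds:
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= t]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > t]
        left_label = Counter(left).most_common(1)[0][0] if left else labels[0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, t, left_label, right_label)
    _, t, left_label, right_label = best
    return feature, t, left_label, right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature choice decorrelates the trees.
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical features (e.g. visits, basket value) for two intent classes.
rows = [(0.1 * i, 0.1 * i + 0.05) for i in range(10)] + \
       [(5.0 + 0.1 * i, 5.05 + 0.1 * i) for i in range(10)]
labels = ["low"] * 10 + ["high"] * 10
forest = train_forest(rows, labels)
print(predict(forest, (0.5, 0.5)), predict(forest, (5.5, 5.5)))
```

Because every tree is trained independently of the others, tree construction is a natural fit for parallel execution.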
Hidden Markov Models are used to model Markov processes with unobserved or hidden states. They find use in speech and gesture recognition as well as in making stock predictions based on latent market factors. Built on R and Spark, the muHMM package identifies the hidden states underlying recorded observations to build models that help in understanding the transitions between those states.
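Recovering hidden states from observations can be illustrated with the Viterbi algorithm, which finds the most probable sequence of hidden states for a model whose probabilities are already known. The Python sketch below is not muHMM code, and whether muHMM uses Viterbi is not stated here; the state names, observation names and probabilities are invented for the example.

```python
def viterbi(observations, states, start, trans, emit):
    """Return the most probable hidden-state path and its probability."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the state that path was in at time t - 1)
    best = [{s: (start[s] * emit[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            prob, back = max((prev[r][0] * trans[r][s], r) for r in states)
            layer[s] = (prob * emit[s][obs], back)
        best.append(layer)
    # Start from the most probable final state and follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical market model: hidden regimes and observed daily price moves.
states = ["calm", "volatile"]
start = {"calm": 0.6, "volatile": 0.4}
trans = {
    "calm": {"calm": 0.7, "volatile": 0.3},
    "volatile": {"calm": 0.4, "volatile": 0.6},
}
emit = {
    "calm": {"small": 0.5, "medium": 0.4, "large": 0.1},
    "volatile": {"small": 0.1, "medium": 0.3, "large": 0.6},
}
path, prob = viterbi(["small", "medium", "large"], states, start, trans, emit)
print(path, prob)  # ['calm', 'calm', 'volatile'] with probability about 0.01512
```

In practice the transition and emission probabilities are not known in advance and have to be estimated from data first; this sketch covers only the decoding step.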