Built a scalable analytics framework to anticipate effects of weather on sales for a leading home improvement retailer

What We Did: Developed a scalable analytical framework to identify and quantify effects of weather on sales of different product categories for better inventory management
The Impact We Made: A highly accurate fulfillment forecast that reduced stock-outs by 12% and decreased stock holding costs
Summary – Analytics at scale
Mu Sigma helped a leading home improvement retailer build a scalable analytics framework to anticipate the effects of weather on sales of different product categories for better forecasting and inventory management. Our client wanted to reduce or eliminate instances of weather-related stock-outs or overstocks.
The Mu Sigma solution provided a highly accurate forecast of demand and of the quantity to stock, which reduced stock-outs by 12% and decreased stock holding costs. The technology platform on which we built the solution enabled the client to scale it across geographies at minimal cost and with quick turnaround.
About The Client – A leading retailer
The client is a Fortune 500 retailer, operating more than 1,500 stores across multiple countries.
The Challenge – Massive data volumes
The complexity of this problem was compounded by the volume of data involved and the large number of variables: 200+ micro-regions, 5000+ product categories per region, and many independent weather-related variables (temperature, humidity, precipitation, etc.). Together with time and cost constraints, this volume required an innovative approach to problem solving.
The Approach – Hypothesis-driven forecast mechanism
Our approach involved three key steps:
Step 1: Utilize a hypothesis-driven approach to maintain focus
We started with a standard set of hypotheses, such as “Does the effect of weather on product sales vary by region?” and “Does the effect vary by season?” We then enriched this list through secondary research, interviews with multiple business stakeholders, and store visits, which kept the hypothesis-driven approach focused and relevant to the business.
Step 2: Identify significant relationships among variables to maintain scope
We then went about identifying significant relationships between weather (independent variables) and product sales (dependent variables) for millions of combinations. To manage this scale, we leveraged the following concepts:

Simultaneously, we took a discovery-driven approach to identify additional factors that the framework needed to consider, such as time leads and combinations of independent variables (e.g. low temperature and low humidity). We did this by leveraging graph theory: an analyst could mine through previously determined correlations and identify patterns and new connections.
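The kind of screening described in Step 2 can be illustrated with a minimal sketch. It is not the implementation used for the client: the toy data, the helper names (`pearson`, `lagged_correlation`, `best_lead`), the cold-and-dry cut-offs and the 0.8 threshold are all assumptions made for illustration, and a simple threshold on the Pearson coefficient stands in for a proper significance test. The sketch correlates each weather series with a sales series at several time leads, then stores the strong relationships as edges of a graph that an analyst could browse.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(weather, sales, lead):
    """Correlate weather on day t with sales on day t + lead."""
    if lead == 0:
        return pearson(weather, sales)
    return pearson(weather[:-lead], sales[lead:])


def best_lead(weather, sales, max_lead=3):
    """Return (lead, r) with the largest |r| among leads 0..max_lead."""
    scored = [(lead, lagged_correlation(weather, sales, lead))
              for lead in range(max_lead + 1)]
    return max(scored, key=lambda item: abs(item[1]))


# Hypothetical toy data: heater sales mirror temperature two days earlier.
temperature = [20, 18, 15, 9, 4, 6, 12, 17, 19, 21]
humidity = [55, 50, 45, 35, 30, 38, 48, 52, 56, 60]
heater_sales = [6, 5, 5, 7, 10, 16, 21, 19, 13, 8]

# A combined variable: 1 on days that are both cold and dry (assumed cut-offs).
cold_and_dry = [int(t < 10 and h < 40) for t, h in zip(temperature, humidity)]

THRESHOLD = 0.8  # assumed cut-off for a "strong" relationship

series = {"temperature": temperature, "cold_and_dry": cold_and_dry}
graph = defaultdict(set)  # undirected graph: variable <-> product category
for name, values in series.items():
    lead, r = best_lead(values, heater_sales)
    if abs(r) >= THRESHOLD:
        graph[name].add("heater_sales")
        graph["heater_sales"].add(name)
        print(f"{name} -> heater_sales: lead={lead} days, r={r:.2f}")
```

At real scale the same check would run once per region, product category and weather variable, which is why the number of combinations reaches the millions.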
Step 3: Leverage cloud computing for scale
Due to the cost constraints and sheer volume of data, we recommended that the client take advantage of Amazon Web Services (AWS), and perform the analysis in a cloud environment. This saved the client months of development time and money – there was no sense building up expensive internal infrastructure that had far more capacity than they needed for their day-to-day analytics work.
Within AWS, we used Amazon EMR (Elastic MapReduce) to take advantage of distributed frameworks such as Spark and Hadoop. EMR distributed the data and processing across multiple cluster nodes in an automated fashion and allowed us to run analysis code written in languages such as R, Python and PHP.
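To illustrate why a map-reduce framework suits this workload, here is a minimal sketch in plain Python. It does not use Spark, Hadoop or any AWS API, and the records, key layout and helper names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical. The point it shows is that each (region, category) pair can be processed independently of the others, which is what lets a framework spread the pairs across many machines.

```python
from collections import defaultdict

# Hypothetical daily records: (region, category, units_sold)
records = [
    ("north", "heaters", 12),
    ("north", "heaters", 18),
    ("north", "fans", 3),
    ("south", "fans", 25),
    ("south", "fans", 31),
    ("south", "heaters", 2),
]


def map_phase(record):
    """Emit a (key, value) pair; the key decides how records are grouped."""
    region, category, units = record
    return (region, category), units


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Summarize one group without looking at any other group."""
    return key, sum(values) / len(values)


grouped = shuffle(map(map_phase, records))
mean_units = dict(reduce_phase(key, values) for key, values in grouped.items())

for key, mean in sorted(mean_units.items()):
    print(key, mean)
```

In a real deployment the reduce step would hold the per-pair correlation or forecasting logic rather than a simple mean, and the framework would run the groups in parallel instead of in a single loop.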
The Outcome – Improved performance in product availability
- 12% improvement in product availability, along with a reduction in stock holding costs
- A stable and scalable analytical framework used across geographies with minimal cost involvement
- Performance metrics identified and now used as a benchmark for goal setting as well as identifying outliers for increased focus
- Robust feedback mechanism ensuring that the framework stays up to date as scenarios change