Why Big Data is a marathon, not a sprint


  • July 30th, 2019

Big Data, in all shapes and forms, has been an integral part of headlines over the past few years, as more and more organizations look to capitalize on its potential. Despite the hype, however, many organizations have yet to uncover the true potential of their Big Data projects. Is it because most think of Big Data as a short-distance sprint that requires large investments? Is it because the finish line isn’t quite clear?

One example of a Big Data sprint: A CIO suddenly orders his staff to acquire hundreds of large servers. He wants to proclaim to the world that his company has built the largest Hadoop cluster on the planet. Despite the staff asking, “Where’s the business case?” procurement and installation proceed. Within 24 months, the CIO departs, leaving the company with a large mass of hardware, no business case and no Big Data business value.

Start only if you can envision the finish line
All too often, analytics firms jumping headfirst into Big Data fall short due to a lack of business vision. Typically, in such cases, the focus is on getting more data faster rather than figuring out where and how the data will be used and which business questions it will answer. This is analogous to starting a sprint without a finish line in sight, assuming it is just around the corner, and then stopping short, exhausted, because you can’t find the end.

Running a marathon is hard, and it requires discipline. An American College of Sports Medicine study found that the average dropout rate among first-time marathoners was 70 percent. Many runners get caught up in the mystique and allure of the marathon, and then they never run another.

The same principle applies to Big Data. As an organization, are you choosing Big Data because you’re caught up in the hype and mystique, or do you have complete clarity on how Big Data will benefit your business? If you can’t envision the finish line, then think again!

Break it down into incremental goals
Mark Wetmore, head coach of the University of Colorado cross country team, said in an interview, “Distance runners are experts at pain, discomfort, and fear. You’re not coming away feeling good. It’s a matter of how much pain you can deal with, on those days. Any serious runner bounces back. That’s the nature of their game: taking pain.”

Similarly, organizations that are planning Big Data Analytics need to get comfortable with the possibility of encountering discomfort, pain and failure.

Most marathoners break an overall goal down into much smaller goals in order to better endure discomfort. So, instead of stressing about running the full 26.2 miles, the better strategy is to start small: run 2 miles at first, then add 1-2 miles every week during training.

The Big Data parallel to this is starting small from a technology and analytical infrastructure perspective and scaling up over time as necessary. Experiments that help determine which infrastructure is best for the organization can be key to building a scalable and cost-effective Big Data solution. A detailed perspective on this experimental approach to choosing the right infrastructure can be found in The Big Data Sailboat.

Start slow
“Keep it very slow, as slow as you can jog, and you’ll soon adapt,” said Barry Magee, the bronze medalist in the 1960 Rome Olympic marathon. First-time marathoners are advised to run very, very slowly when starting out. The reason is that your body needs to adapt to the new stress being put on it. Running fast from day one brings the risk of sickness or injury.

As an organization adopts Big Data, a lot of stress is put on various stakeholders: the technology teams who are bringing the infrastructure together, the business owners who are providing the use case, the analytics practitioners who are helping answer the business questions, and so on. It is very important to start slow and articulate clear use cases; define roles, responsibilities and outcomes; and identify consumption channels. Slow in this case means taking the time to make sure that the end result is clear business actions.

Add variety
Most coaches suggest that running in exactly the same environment, the same path, the same time of day, etc. is not a good strategy. Variety in training schedules in terms of pace, path, terrain, time, intensity and weather can build more endurance and better prepare runners for what’s to come.

In analytics consulting projects, variety in use cases across functional areas such as marketing, supply chain, e-commerce, etc. can help test three things: the robustness of the infrastructure, the ability to execute a use case under the constraints of its functional area, and how successfully the actionable results are consumed.

Ensure purposeful practice
Preparation for a marathon cannot be shortchanged. Diet, running and exercise regimens require tremendous planning, cadence and discipline. So does a Big Data implementation. We have seen many Big Data implementations go awry because of improper planning and program management.

The potential of Big Data is highly dependent on the amount of thought put in before implementation.