Next Waves of Big Data: Version 2.0 and Beyond


What we have seen thus far in the world of big data is version 1.0.   The key components are:

  • Distributed data storage and computational ecosystem:  We have the infrastructure and tools for processing very large volumes of data: Hadoop/MapReduce and variations thereof such as Dryad, plus NoSQL databases (HBase, Cassandra, AsterixDB, MongoDB, CouchBase to name a few; do we really need so many open source NoSQL variants?).
  • Integration of structured, semi-structured and unstructured data:   We see the beginning of investments in integrating “internal” data such as operations and supply chain, finance, sales, marketing, and customer service across the organization.
  • Cloud computing and easier deployment/monitoring of platforms:  Cloud-based X-as-a-service where X includes platform, infrastructure, database, and analytics (still nascent). Reducing IT costs is a powerful driver for firms, but the value from cost reductions is short-lived. 
  • Offline/batch processing:  Although there are a few real-time applications such as real-time bidding (RTB) in advertising, fraud detection, internet security, personalization and recommendation systems, big data and analytics are primarily linked to trend analyses and “slow” decision-making environments (daily, weekly, monthly, quarterly).
  • Data Scientists and Engineers:  The users of the frameworks are primarily tech-savvy software engineers, computer scientists, mathematicians, and statisticians.

In version 2.0, over the next 1-2 years, we will see many companies in the ecosystem help businesses explore big data (primarily analyzed offline) and start the broader use of real-time analytics and optimization.
  • Data visualization and search tools for data discovery:  Companies such as Panopticon, Datameer, KISSmetrics, Domo, Cloudera and many others provide simple visualization and data discovery tools to ask questions through SQL-like queries and search paradigms.
  • Data integration facilitators:   Startups such as DataTamer, Data Gravity, Cloudant, and many others will ease the pain of continuously integrating data from disparate sources – both internal and external. 
  • Development of a commercial ecosystem around real-time analytics platforms:  New real-time computational frameworks such as Spark, Storm, Graphlab, and others form the next wave of real-time, distributed, in-memory analytics in multicore and multi-GPU clusters. 
  • Data-driven business analyst:   Business analysts start using data visualization and search tools to develop hypotheses about the business and markets and to ask pointed questions of the data.

In the next 3 to 5 years, we expect version 3.0.  Key changes:

  • Commoditization of the platform and the cloud providers:  Consolidation of many players in the Hadoop-like ecosystem into a few “end-to-end” platform providers:  Hadoop and another framework for batch analytics, and a couple of players for real-time analytics.
  • Shift in value to big data applications:  The platform market becomes a commodity and the value shifts to innovative business applications such as drug discovery, operational efficiency improvements, marketing and sales automation, etc.  This shift in value is similar to the substantial shift in value from wireless operators (with substantial capex to build out the network and improve reliability) to smartphone makers and application developers in the mobile sector.  The real winners in big data will be creative application-focused firms whose customers realize measurable benefits from big data investments, leveraged through identification of patterns and strategic anticipation, optimization of business operations, predictive analytics, and what-if simulations.

Where do you think the big data ecosystem is heading?

Buzz of Social Media but Value from Email Marketing


Despite all the buzz around social media, big data analytics, and social network analysis tools, a study sponsored by Lyris (a digital marketing solutions provider) and conducted by the Economist Intelligence Unit identifies the following top three critical skills needed for a marketer’s success:

  1. Ability to use data analysis to extract predictive findings from ‘Big Data’
  2. Understanding of best practices of email delivery
  3. Ability to generate insights about drivers of consumer behavior from multiple data sources

It is surprising that email marketing – a tool in the marketer’s toolkit for 15-20 years – is in the top 3 critical skill gaps.  Why?

  • Email marketing works, and small percentage improvements have a substantial impact:  Revenue per thousand (RPM) for email marketing is 10X to 1000X the RPM for digital ads.  A small percentage improvement in email performance can beat an order-of-magnitude improvement in the performance of digital ads.  Recent investments in start-ups such as MovableInk (supporting “live” or real-time personalized emails) substantiate its effectiveness and the incremental opportunity hidden in improving email marketing.
  • Prior relationship matters:  Having a conversation with consumers who have expressed interest in what you have to offer or bought from you in the past is more valuable and effective than looking for a prospect.  Even Facebook, with its custom audience ad product, permits marketers to leverage house email lists.  You upload your house file of email addresses and Facebook “matches” the email addresses to its internal list of email addresses, and voila, you have a custom audience segment on Facebook.
  • Email marketing is cheap to execute:  Compared to retargeting ads, email marketing is an order of magnitude cheaper.  Why look around the web for a consumer who already has a relationship with you when you can contact them directly via email?
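
The RPM comparison in the first point above can be checked with a little arithmetic. The sketch below is only illustrative: the base ad RPM of $1, the 5% email improvement, and the 10X ad improvement are all hypothetical figures chosen for the example, not numbers from the study.

```python
# Illustrative arithmetic only: all figures below are hypothetical assumptions.

AD_RPM = 1.0              # assumed revenue per thousand ad impressions, in dollars
EMAIL_IMPROVEMENT = 0.05  # assumed "small %" improvement to email performance (+5%)
AD_IMPROVEMENT = 9.0      # a 10X improvement in ad performance is a +900% gain


def rpm_gain(base_rpm: float, relative_improvement: float) -> float:
    """Extra revenue per thousand messages or impressions after an improvement."""
    return base_rpm * relative_improvement


def break_even_ratio(email_improvement: float, ad_improvement: float) -> float:
    """Email-to-ad RPM ratio at which both improvements add the same revenue."""
    return ad_improvement / email_improvement


def compare(ratio: float) -> tuple[float, float]:
    """Return (email gain, ad gain) per thousand for a given email/ad RPM ratio."""
    email_gain = rpm_gain(AD_RPM * ratio, EMAIL_IMPROVEMENT)
    ad_gain = rpm_gain(AD_RPM, AD_IMPROVEMENT)
    return email_gain, ad_gain


if __name__ == "__main__":
    for ratio in (10, 100, 1000):
        email_gain, ad_gain = compare(ratio)
        print(f"ratio {ratio}X: email gain ${email_gain:.2f}, ad gain ${ad_gain:.2f}")
    print("break-even ratio:", round(break_even_ratio(EMAIL_IMPROVEMENT, AD_IMPROVEMENT)))
```

Under these assumed figures, the email improvement adds more revenue per thousand only when email RPM is more than roughly 180 times the ad RPM, so the claim holds toward the upper end of the 10X to 1000X range. The comparison is per thousand messages or impressions and ignores differences in volume and cost.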

Getting Value from Big Data

In the last decade, new concepts such as big data, the expertise of so-called data scientists, and distributed computational frameworks such as Hadoop/MapReduce have captured a large share of mind. Further, substantial investments have been made in leveraging large quantities of data with the hope (and prayer) of improving short-term and long-term business performance. But the returns on such investments are either slow to materialize or non-existent. Why? Here are some plausible reasons and suggestions:
  1. There is no substitute for creativity and intuition: The hypothesis that collecting and analyzing large amounts of data moving at warp speed uncovers new insights is hyped. Patterns will emerge from such analyses, but the patterns need to be valuable to the business, and timely actions need to be taken to change the status quo for value to be realized. Human intuition, creativity, and interpretation are critical to convert big data analyses into recognizable patterns and data intelligence.
  2. Executing is more important than just knowing: Identifying a pattern, such as that positive customer experiences (and quick resolution when there is a negative experience) improve the loyalty of your most valuable customers, is one thing. How the company changes its core customer service processes and incentive structures, and leverages real-time data to manage and influence customer dialogues, is another matter altogether. It requires ad-hoc and real-time decisioning tools and intelligence on top of big data, plus changes in human behavior, to follow through on the identified pattern and realize value. It is no different from the limited value realized when insights garnered from small data (market research, customer satisfaction surveys and the like) in yesteryears gather dust on the shelves.
  3. Don’t ignore the art of story-telling: Ultimately, the success of a business depends on sound strategies, anticipating the future, and impeccable execution. Decision-makers, managers and employees need simple and easily understandable stories that emanate from big data and answer “what-ifs” to change behavior and prioritize future investments and actions.
  4. Small data plus big data: Big data and patterns thereof need to be synthesized with small data for making strategic business decisions. Big data with supporting offline and real-time decision engines have natural strengths in tactical, granular, and operational aspects of running a business, such as recommendations, real-time bids for advertising, etc. But they need to be synthesized into coarser-grained patterns to support business strategy refinements and identification of new opportunities, where small data typically have excelled.
  5. Focus on data that (potentially) matter to your business: You don’t need to gather and store every bit for eternity and then expect patterns to emerge for business decisions. One needs to prioritize types of bits based on potential value, and invest technology and analytical resources in concert with that potential value and the business strategy. For example, for managing customers in CRM systems, we don’t need a 360-degree view of a customer but just the right view to help make smarter customer-level investment decisions.