
Defining Big Data: Another “spin”

Big Data remains so ambiguous to many that researchers and experts continue to refine and update its definition. A recent research paper by Machina Research gives another spin: the 5S of Big Data.

[Figure: 5S of Big Data]

The first four dimensions used to characterize big data are variations of the 3Vs or 5Vs. Significance is the dimension that caught my attention, since in my experience the relative value or importance of disparate data sources varies significantly across types of business decisions: long-term versus short-term versus real-time, reactive versus proactive, marketing versus operational, human versus machine, strategic versus tactical, predictive versus causal, and business model shift versus continuous improvement.

   ***************************

The debate continues on how to define Big Data, while from my vantage point the discussion should center on the business value of data as an aid to improving the speed and quality of practical business decisions. My mantra:

Ask not what it is.  Ask what it can do for you.

Big Data Generates Business Value – Outside Usual Suspects

Outside of Amazon, Facebook, Netflix, Google, Twitter, and the like, there are very few demonstrated successes of big data. A recent InformationWeek article, Big Data Success: 3 Companies Share Secrets, highlights three companies (MetLife, British Airways, and TiVo Research Analytics) developing and implementing big data initiatives. Some common themes:

  1. Size of data doesn’t matter.  Integrating data from different sources and “connecting the dots” can generate substantial business value.
  2. Time to value can be short.
  3. Big data projects should be driven by business goals, not by the data.
  4. Business sense is as important as data sense.
  5. Perfection is the enemy of the good.
  6. Simple analytics with creative data enrichment/fusion may beat advanced analytics on limited data.
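
Themes 1 and 6 can be illustrated with a toy sketch. The records, field names, and the `fuse` helper below are all hypothetical and are not taken from the article or from any of the three companies. The sketch only shows the general idea: two small sources joined on a shared key, followed by a very simple count.

```python
from collections import defaultdict

# Hypothetical records from two separate systems; all values are made up.
policies = [
    {"customer_id": 1, "product": "life"},
    {"customer_id": 2, "product": "auto"},
    {"customer_id": 3, "product": "home"},
]
support_calls = [
    {"customer_id": 1, "topic": "billing"},
    {"customer_id": 1, "topic": "claim"},
    {"customer_id": 3, "topic": "billing"},
]


def fuse(policies, support_calls):
    """Attach each customer's support-call count to their policy record."""
    calls_per_customer = defaultdict(int)
    for call in support_calls:
        calls_per_customer[call["customer_id"]] += 1
    # Customers with no calls get a count of 0 from the defaultdict.
    return [{**p, "calls": calls_per_customer[p["customer_id"]]} for p in policies]


fused = fuse(policies, support_calls)

# Simple analytics on the fused view: which customers called more than once?
frequent_callers = [row["customer_id"] for row in fused if row["calls"] > 1]
print(frequent_callers)
```

Neither source answers the question on its own; the answer only appears once the two are connected, and nothing more advanced than a count is needed.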

It is heartening to read about such successes and business value generation instead of getting bogged down by how to define Big Data.

Defining “Big Data”

Given the confusing and varied interpretations of Big Data, a couple of academics from the University of St. Andrews conducted a meta-analysis of extant definitions in a recent paper, Undefined By Data: A Survey of Big Data Definitions. The definitions they surveyed:

1) Gartner: A threefold definition encompassing the “three Vs”: Volume, Velocity, Variety.

2) Oracle: Derivation of value from traditional relational databases augmented with new unstructured data sources.

3) Intel:  Links big data to organizations generating a median of 300TB of data weekly.

4) Microsoft: Process of applying serious computation power – latest in machine learning and AI – to seriously massive and complex sets of information.

5) Method for an Integrated Knowledge Environment project:  The high degree of permutations and interactions within a data set defines big data.

6) National Institute of Standards and Technology: Data which exceed(s) the capacity or capability of current or conventional methods and systems.

The authors then attempt to coalesce these definitions and venture a new one: Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce, and machine learning.

   ***************************

Instead of trying to define Big Data (a pointless exercise), focusing on what it can do (the value to businesses, consumers, and governments) is a more fruitful path to pursue.