Big Data - Is it? 

Wherever you go these days, you hear the phrase 'Big Data': in the news, on the web, at work and even in the pub.

Every day the volume of data around us grows. As the IoT (Internet of Things) becomes the daily norm*, as we post more and more to social media and as streaming services expand, that volume grows at an exponential rate.

In 2010 the estimated global data volume was 1.2 trillion gigabytes, and by 2012 it had grown to 2.8 trillion gigabytes. By the end of 2020 it is predicted to reach around 40 trillion gigabytes, and an estimated 90% of all data ever created has been produced in the last two years alone. By 2025, global data volume is predicted to reach an estimated 165 trillion gigabytes.

So what is "Big Data"?

Big data is a combination of structured, semi-structured and unstructured data that organisations collect and use for advanced analytics, data mining and machine learning.

What is structured, semi-structured and unstructured data?

  • Structured data is what many of us are familiar with from work, such as spreadsheets and databases.
  • Semi-structured data comes from sources such as streaming sensor data (IoT) or server log files.
  • Unstructured data includes text, images and documents (e.g. marketing data held in Word documents).
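
For the more technically minded, here is a minimal Python sketch of how the three kinds of data differ once you actually try to read them. The sample records and function names are hypothetical, invented purely for illustration, and JSON lines stand in for semi-structured data (a common format for logs and sensor feeds, though by no means the only one).

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample records, invented purely for illustration.

# Structured: fixed, named columns, like a spreadsheet or database table.
STRUCTURED = "store_id,product_id,quantity\n101,A-17,3\n102,B-42,1\n"

# Semi-structured: each record names its own fields (JSON lines here, as a
# sensor or server log might emit), but fields can vary between records.
SEMI_STRUCTURED = [
    '{"sensor": "t-01", "temp_c": 21.4}',
    '{"sensor": "t-02", "temp_c": 19.8, "battery": 0.76}',
]

# Unstructured: free text with no schema at all.
UNSTRUCTURED = "Our spring campaign targets loyal customers. Loyal customers get early access."


def read_structured(text):
    """Every row shares the same header, so csv.DictReader maps columns to values."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(lines):
    """Each record carries its own keys; they may differ from record to record."""
    return [json.loads(line) for line in lines]


def read_unstructured(text):
    """No schema: we have to impose one ourselves, here a crude word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w)


if __name__ == "__main__":
    print(read_structured(STRUCTURED))
    print(read_semi_structured(SEMI_STRUCTURED))
    print(read_unstructured(UNSTRUCTURED).most_common(2))
```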

OK, so big data is just lots of data from different sources. Or is it?
You could spend all night in the pub discussing this and never reach a consensus. Many would argue there are far more fun topics for the pub, and despite being a bit of a data geek, I'd probably agree with them.

Take, as an example, EPOS (Electronic Point of Sale) data from a large retailer with a few thousand stores and over 2,500 products sold each day. Around two years' worth of this data comes to roughly 17 billion rows. Is this big data, or just lots of data?

If it's not big data, what is it? And what is the difference?

It's 'small data'. Relatively simple, really. What differentiates the two? How we use it.

We use 'small data' to answer a distinct question or deal with a specific problem. For example, suppose an email is sent to a group of customers in error. We use small data (the list of customers who received that email) to build a mailing list for the apology.

By contrast, we may have used 'big data' to generate the original distribution list: combining customers' purchase history from our website with historical behaviour data on how they have interacted with our past emails and social media activity from people who have mentioned our brand or products, then tying in demographic data to filter the list down to customers in a specific region or country.
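
To make the contrast concrete, here is a toy Python sketch of the two tasks. The customer records, field names, thresholds and filtering rule are all hypothetical; a real pipeline would pull each source from a separate system. Note that both datasets here are tiny: in keeping with the argument of this post, what separates the two tasks is how the data is used, not its size. The small-data task reads one list; the big-data task joins several sources per customer before filtering.

```python
# All data, field names and thresholds below are hypothetical, for illustration only.

# --- Small data: one list answers one specific question. ---
sent_in_error = ["ann@example.com", "bob@example.com", "ann@example.com"]


def apology_list(recipients):
    """Everyone who got the erroneous email gets an apology (duplicates removed, order kept)."""
    return list(dict.fromkeys(recipients))


# --- Big data: several sources combined to decide who to target. ---
purchases = {"ann@example.com": 12, "bob@example.com": 1, "cat@example.com": 7}  # website orders
email_opens = {"ann@example.com": 0.8, "bob@example.com": 0.1, "cat@example.com": 0.6}  # past open rate
social_mentions = {"cat@example.com": 3}  # times the customer mentioned our brand
region = {"ann@example.com": "UK", "bob@example.com": "UK", "cat@example.com": "FR"}  # demographics


def campaign_list(target_region, min_orders=5, min_open_rate=0.5):
    """Join the sources per customer, then filter by purchase history, engagement and region."""
    selected = []
    for customer, orders in purchases.items():
        engaged = email_opens.get(customer, 0.0) >= min_open_rate
        advocate = social_mentions.get(customer, 0) > 0
        if region.get(customer) == target_region and orders >= min_orders and (engaged or advocate):
            selected.append(customer)
    return selected


if __name__ == "__main__":
    print(apology_list(sent_in_error))
    print(campaign_list("UK"))
```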

So, back to the original question: Big Data, does size matter? In my opinion, no! There may be lots of it, from various sources and in various formats, but it all boils down to this: data is data.

Feel free to comment below; I'm always interested to hear your thoughts on my posts. This post is intended to generate conversation as much as to inform, so I look forward to hearing from you.

* 3.8 billion connected devices globally by the end of 2019.


