perpetuo the ever present now...

Oh-oh! Our crumbs are data holing! - (or has current tech reached 'peak' Big data)...

by clif high, Thorsday, July 10, 2014 9:30am

with respect..

i have worked as a software engineer with special expertise in debugging complex system interactions (further specialized in those systems that are SQL (standard and variants) based) since desktop (client-server) computing started its spread through the corptocracy (early 1980s).

At one point i was brought in to correct some flaws in what was billed as the 'largest, non-military, client-server software system on the planet'. This was pre-SAP 'enterprise-wide systems management' software that operationally spanned 4/four continents, over 100 thousand desktops, and 20 thousand servers.

i have been a member of SQL standards development committees, and have written articles (in several languages) detailing such interesting topics as "Ghosting data in tempdb", and "record level lock conflict resolution strategies".

This is by way of saying that i have some considerable number of decades in the field now known as 'BIG DATA'.

My expertise of recent years in predictive linguistics has brought me to the conclusion that our species has reached Peak Big Data.

By this i mean that Big Data is failing as the 'crumbs' issue develops into full-blown 'data holing'. So, a definition of the 'crumbs' issue in Big Data would have you understand that

1) all data is not equal in 'meaning' (to any system, or to any question/association being sought),

b) data is HUGELY, mind-bogglingly diverse in form and format (including such aspects as 'age', 'maturity', and other metadata layers; any given datum can have thousands of 'qualifying attributes'), and can be in any of hundreds of forms/formats,

iii) data is 'structured' by its representation within the database housing it. In other words, not all database designers are equal in skill.

IV) those who design the queries against Big Data databases vary widely in skill. Further, recent trends in automated query software are also constraining the potential range of queries.

All of which leads to the issue of 'crumbs' of data that 'leak out' of the system. By which the IT guys mean that your (their) query was not sufficiently well designed to locate data that does exist somewhere in the vast virtual space of your combined database systems. Thus these 'data not seen' become 'crumbs' in the system.

Some of these crumbs of data may actually be both meaningful and, had they been located, decision-altering.
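As a minimal sketch of how a crumb gets made, consider the toy example below. Everything in it is an assumption invented for illustration: the `customers` table, the spellings of the country, and the two queries. It uses an in-memory SQLite database only because that is the easiest thing to run; the same effect shows up in any SQL variant.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers (country) VALUES (?)",
    [("USA",), ("usa",), ("U.S.A.",), (" USA ",), ("United States",), ("Canada",)],
)

# The query as the analyst wrote it: an exact match on one spelling.
naive = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE country = 'USA'"
).fetchone()[0]

# A more forgiving query: trim whitespace, upper-case, strip periods.
normalized = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE REPLACE(UPPER(TRIM(country)), '.', '') = 'USA'"
).fetchone()[0]

# We know the true count only because we built the toy data ourselves.
total_us = 5
crumbs_naive = total_us - naive            # rows the exact match never saw
crumbs_normalized = total_us - normalized  # rows still missed after cleanup

print(naive, normalized, crumbs_naive, crumbs_normalized)
```

Even the 'better' query still drops the row spelled 'United States'. In the toy case we can count what was missed because we wrote the rows; in a real system there is no such answer key, and that is the whole problem.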

Get that concept. Big Data is fundamentally flawed by crumbs. If you cannot exhaustively query the data, then you cannot truly debug the query itself. That makes the returned data sets suspect in quality, as the crumbs they miss are essentially unknown in quantity. And, given the complexity of data, these may be 'unknow-able'.

So... the mere presence of the 'crumbs' issue has now devalued Big Data. Basically it is a case of 'you don't know what you don't know', as you cannot assess the data you can't locate.

But your IT guys know it is there.

And they are very concerned about the crumbs as they are both growing, and getting 'sticky'. The new phenomenon of 'sticky crumbs' is where data from one system is scrubbed out/altered, only to reappear later in a previous state due to being 'restored' from some other, inter-connected system that is not quite in sync. Basically it is system level conflicts that produce 'sticky crumbs'. The real problem here is that it can lead to 'data holes', where one IT group recognizes a sticky crumb situation, and uses either a masking technique or direct purging to 'protect' their system. The result is a 'data hole' that prevents their system from reacting properly to that datum, or class of data, thus making a 'hole' in their data vision similar to cataracts in humans.
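The sticky crumb and the data hole can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real product: the two 'systems' are plain dictionaries, and the names `crm`, `billing`, `naive_sync`, and `visible` are all hypothetical.

```python
# Two hypothetical inter-connected systems holding the same customer record.
# All names and values are invented for illustration.
crm = {"cust-42": {"email": "old@example.com"}}
billing = {"cust-42": {"email": "old@example.com"}}

# System A scrubs the record; system B never hears about it.
del crm["cust-42"]


def naive_sync(target, source):
    """Copy anything the target lacks from the source.

    With no record of deletions, a deliberate delete looks exactly
    like missing data, so the old state gets 'restored'.
    """
    for key, value in source.items():
        target.setdefault(key, dict(value))


naive_sync(crm, billing)
sticky = "cust-42" in crm  # the scrubbed datum is back: a sticky crumb

# One IT group's defensive fix: mask the key so their system ignores it.
masked_keys = {"cust-42"}


def visible(system, masked):
    """Return the records a system can still 'see' after masking."""
    return {k: v for k, v in system.items() if k not in masked}


# A legitimate update now arrives for the same key, but the mask hides it.
crm["cust-42"] = {"email": "new@example.com"}
hole = "cust-42" not in visible(crm, masked_keys)  # a data hole

print(sticky, hole)
```

The mask cures the sticky crumb and blinds the system in the same stroke: the corrected record is present, yet nothing downstream of `visible` can react to it.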

Recently the meme of 'crumbs' and other data anomalies has taken very large jumps in the consciousness of those who serve in the Big Data farms. The rise of language expressing concern over increasing amounts of 'crumbs', and over the huge costs to locate and clean them up, is the first clue that Big Data has reached some form of threshold moment. Further in this vein are the language choices being expressed in these postings (fora, articles, et al). They mirror those seen in Big Oil as the 'Peaking problem' was developing in the consciousness of the workers in that industry.

Given the language, and absent any fundamental change addressing the problems, the current technology may indeed have reached its limits in the area of BIG DATA.

You will know when Peak Big Data has reached your life, as one day soon a service tech on a phone near your ear will say those spine-chilling words, 'Sorry, but i think you're lost down a data hole.'

And that will ruin your day.
