Is There a Single Version of the Truth?

The “single version of the truth” has been a BI cliché for quite a while. In particular, there has been a long and noble quest in search of “the single view of customer”: a Holy Grail, lost in the profusion of customer databases that grow like weeds in the data center. Bold and valiant BI practitioners go out in search of it. But is there any sense in their quest?

“Is there a single version of the truth?” is not just an IT question. It’s a question that has beguiled epistemologists through the ages. When someone thinks of a tree, the abstract concept or picture they hold in their mind is informed by their life experience of trees. Someone from the tropics might think mainly of palms. Someone from the Northwest would likely think of a Douglas Fir. But are they both thinking of the same conceptual “tree?”

Definitions are slippery. “Tree” is metadata. The actual tree you might be talking about is real: an instance of a tree. From a philosophic perspective, “tree” is essentially a Platonic form, referring to a class which individuals enrich via their experiences. If we were to show a giant Redwood to Plato himself, would he register it as a tree at all? Maybe he would coin a new word, for example, “Super-Tree.”

The BI Context

If we apply all this thinking to digital data, the same problem emerges. Imagine a customer who always shops online for a product. One day she decides to break the mold, visits a retail outlet of the same company she buys from online and, for a change, buys her chosen product there. We can easily imagine how the company would create two customer records. The company likely sees the two contexts as utterly different channels to market. One Jane Doe inhabits a browser, clicks on discount offers, likes red shoes and buys a great deal of nail polish. Another Jane Doe carries a platinum card, likes fashion accessories, spends two hours in the store trying on clothes and also buys some nail polish. Are they really the same person? Where’s the overlap?

But the truth is that they are the same person. And from this, two questions naturally present themselves: 

  • Is complete data integration desirable?
  • Is complete data integration possible? 

Let’s dispose of the first question quickly. From a BI perspective, complete data integration is obviously desirable. Unless Jane Doe is an irretrievably split personality (unlikely in the extreme), we can learn how to instill customer satisfaction in both aspects of her, and we can cross-sell to both of them too, down either channel. If there are many such Jane Does, we are missing a significant business opportunity if we don’t do this. The only obstacle would be if data integration technology were too expensive to justify the investment, and it isn’t.

The second question is a little thornier. Turn your attention to the diagram below. This is intended to illustrate the full universe of data that exists in most organizations. It shows the various possible sources of data:

  • Transactional Systems: These are the corporate applications that tend to use databases and from which data warehouses and data marts are usually populated.
  • Non-transactional Systems: These are the email systems, content management and document management systems and all those systems that gather and contain unstructured or partly structured data.
  • External Systems: Data from outside the organization, whether structured or unstructured.
  • Guerilla Systems: These are systems created by IT users, often to get around difficulties in the transactional systems that they are obliged to use, but sometimes simply to extend the capabilities of those systems. They are labeled guerilla systems primarily because they are “below the radar.” Often IT departments are unaware of their existence. Nevertheless they can be important systems, and if you can find them you can also get at their data.

In the illustration, these data sources all point to a magic technology box labeled MDM (Master Data Management). The goals of MDM are to collect, consolidate, aggregate, match, control, distribute and make sense of the information that flows into, through, and out of an organization. The MDM theory is that we can tame the corporate universe of data with a single point of reference:

  • By surveying all systems and data sources, so that we can define the data universe.
  • By analyzing these data sources and exposing all the metadata they contain, so that we know its meaning in a unified way.

This is, of course, a Herculean labor. However, MDM technology exists and has been applied beneficially in many situations. It’s just that a whole corporate universe of data is a big thing and nobody has climbed that mountain yet (as far as I know).
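To make the second of those steps a little more concrete, here is a minimal Python sketch of what “exposing the metadata in a unified way” might look like for our two Jane Doe records, followed by a naive matching rule. It is an illustration under stated assumptions, not how any particular MDM product works: the source names (`web_shop`, `store_pos`), all field names, and the rule that a shared, normalized email address identifies the same customer are hypothetical.

```python
# Hypothetical per-source mapping from local field names to a shared
# ("master") vocabulary. In practice this mapping would come from
# analyzing each source's metadata.
FIELD_MAP = {
    "web_shop": {"cust_email": "email", "full_name": "name", "fav_colour": "preferred_color"},
    "store_pos": {"email_addr": "email", "customer_name": "name", "card_tier": "card_tier"},
}

def to_master_schema(source: str, record: dict) -> dict:
    """Rename a source record's fields into the master vocabulary.

    Fields with no mapping are kept under a source-prefixed name
    so that nothing is silently dropped.
    """
    mapping = FIELD_MAP[source]
    unified = {}
    for field, value in record.items():
        key = mapping.get(field, f"{source}.{field}")
        unified[key] = value
    return unified

def match_key(record: dict) -> str:
    """Naive match rule (an assumption): same normalized email means same customer."""
    return record["email"].strip().lower()

web = to_master_schema(
    "web_shop",
    {"cust_email": "Jane.Doe@example.com", "full_name": "Jane Doe", "fav_colour": "red"},
)
store = to_master_schema(
    "store_pos",
    {"email_addr": "jane.doe@example.com ", "customer_name": "Jane Doe",
     "card_tier": "platinum", "visit_minutes": 120},
)

print(store["store_pos.visit_minutes"])   # 120 (unmapped field, kept with a prefix)
print(match_key(web) == match_key(store))  # True
```

Keeping unmapped fields under a source-prefixed name, rather than discarding them, is a deliberate choice in this sketch: it leaves room for the new customer attributes that inevitably turn up later.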

What Is So Good About A Single Version Of The Truth Anyway?

We can change the question we rode in on to: “Is there a single version of the truth that emerges if we unite all our data together?”

Well, logic suggests there could be. Metadata contains no contradictions; it only describes data and doesn’t apply hard and fast rules to it. And if we successfully unite a wide variety of data sources without encountering contradictions, we will get a very rich set of data around any given entity, whether it is a product, customer, business partner or whatever. But if we gathered all the data surrounding any given entity, we would never have reason to use it all at once in any given context. The benefit would lie not in having a single version of the customer but in being able to generate many different views of the customer. We would have a data rich version of the customer.
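The point that the benefit lies in many views, rather than one, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records are plain dictionaries already matched to one customer, attribute values are simple hashable scalars, and any disagreement between sources is treated as a contradiction to be resolved rather than silently overwritten. The records and the view definitions are hypothetical.

```python
# Hypothetical records for one customer, already matched across two channels.
web = {"email": "jane.doe@example.com", "name": "Jane Doe", "preferred_color": "red"}
store = {"email": "jane.doe@example.com", "name": "J. Doe", "card_tier": "platinum"}

def merge_records(records: list[dict]) -> tuple[dict, dict]:
    """Union the attributes of records that describe the same customer.

    Attributes on which all sources agree go into `merged`. Attributes whose
    sources disagree are left out of `merged` and reported in `conflicts`
    with every distinct value seen. Values must be hashable (plain scalars).
    """
    seen: dict[str, set] = {}
    for rec in records:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(value)
    merged = {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}
    conflicts = {k: sorted(v, key=str) for k, v in seen.items() if len(v) > 1}
    return merged, conflicts

# Each context gets its own view of the same data rich record (hypothetical).
VIEWS = {
    "online_marketing": ["email", "preferred_color"],
    "store_service": ["name", "card_tier"],
}

def view(merged: dict, context: str) -> dict:
    """Project the merged record onto the attributes one context needs."""
    return {k: merged[k] for k in VIEWS[context] if k in merged}

merged, conflicts = merge_records([web, store])
print(conflicts)                         # {'name': ['J. Doe', 'Jane Doe']}
print(view(merged, "online_marketing"))  # {'email': 'jane.doe@example.com', 'preferred_color': 'red'}
print(view(merged, "store_service"))     # {'card_tier': 'platinum'}
```

Note that the unresolved name conflict leaves a gap in the store view: in this sketch, the merged record is only as unambiguous as the matching and conflict resolution behind it.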

And let’s be clear that gathering together all the data we hold on the customer will not provide us with everything that could be known about the customer. Our data rich version of the customer will never be truly complete.

Just as the definition of “tree” is inherently slippery from a philosophical perspective, the definition of “customer” is slippery from a data perspective. Definitions are moving targets anyway. Just as we could surprise Plato with a giant Redwood, we could surprise a BI analyst with new attributes for a customer – and we regularly do.

Beside The Point

Entertaining though it may be, ultimately all the philosophizing is beside the point. The IT user doesn’t want a “single version of the truth,” she wants “an unambiguous and data rich version of the truth.” The “Internet” version of Jane Doe lacks some important data, and so does the “in store” version of Jane Doe. Far better would be a richer and consistent set of data about Jane Doe. That’s what MDM seeks to deliver and, clearly, it’s a sensible idea.

A single version of the truth? That’s just a slogan.

7 Responses

[...] Is There a Single Version of the Truth? ‘The IT user doesn’t want a “single version of the truth,” she wants “an unambiguous and data rich version of the truth.”’ [...]

09.02.10

Many times, I have been chastised for suggesting that a Functional Business Model or a Business Plan be done from the Business side. Imagine IT doing the customer’s business plan for them. Our own management refuses to do any, if possible, because “we know their business.” Arrogant? Yes. Why not them, the business? Simply because, by their own admission, they don’t trust themselves to embark on a project of that magnitude. But when we take it over, we produce an IT version of the truth. It takes common understanding and logic, and experience in doing business definition, to attain a valid reflection of the business. Traditionally, our anxiety to automate a business obscures the intended purpose of defining the system. With this in hand, strategies can be put in place to introduce a moderated approach with confidence. I mention confidence because the results of this study promote trust from the business. This is foundational. The payoff goes a long way. We are so geared to being repairmen, like in a mechanic’s shop, that our focus is on making it work with little functionality. Is this why we are so quick to repair a web site and lose system integrity? There are two types of professionals, and they could not be more distant from each other: the technicians and the thinkers. The technicians’ focus is ‘get it done.’ The thinkers struggle to gain a foothold and to create (the art portion) a system of trust and substance worth the expense.

09.02.10

Hi Robin

Thanks for an excellent post. Sadly, I feel that in far too many enterprises, talk of “richer data” and “various versions of the truth” is mostly hyperbole to cover the fact that too many practitioners do not know how to eradicate duplication.

To me “rich data” is data that makes the enterprise rich!

One version of the truth is easily achieved by managers, analysts and DBAs knowing, understanding and properly implementing Unique Identifiers (UIDs) for all data entities.

I explain in more detail why and how in http://www.integrated-modeling-method.com/data-modeling/data-quality-one-version-of-the-truth

Once again, thanks.

Regards
John

[...] of data and turning them into one single, perfect source of high quality data. A noble goal, but Robin Bloor made the point that there is no one single version of the truth, and no one really cares: “The IT [...]

09.02.10

Hi, regarding the codified enterprise (what we can call transactional data), I am sure we can achieve a single version of the truth as long as people manage the data lifecycle appropriately and share an understanding of the importance of data and associated data quality at the enterprise level, even though our application portfolio is made of silos. It’s not that difficult. It is mostly an education issue and a stupid management issue. Regards

[...] some of our community to ponder on its viability or even if it exists. Robin Bloor’s ‘Is there a single version of the Truth’ and Beyond a single version of the truth in the Obsessive Compulsive Data Quality blog are [...]

Welcome to Pervasive Software's Data Integration Blog