What is Data?

Let’s try to answer this question in a top-down manner.

A great deal of data lives in computers – exabytes of it. Recent estimates published by UC Berkeley suggest that the world generates 2 exabytes of data per year, and that figure is itself growing. The amount of data we store grows by about 60 percent a year and there’s no end in sight. Every now and then we have to invent a new “largest measure of data”: megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte and the very latest word in data volumes, yottabyte. A yottabyte is a billion billion megabytes. Right now there are no yottabytes of data, but there probably will be in a few decades.
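To make the units and the growth rate concrete, here is a small Python sketch of the arithmetic. It assumes decimal (SI) units, applies the 60 percent annual growth quoted above, and, purely for illustration, treats the 2-exabyte figure as a starting volume. The function name years_to_reach is hypothetical, not taken from any library.

```python
import math

# Decimal (SI) byte units; each is 1000 times the previous one.
UNITS = ["megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte"]
BYTES = {name: 10 ** (6 + 3 * i) for i, name in enumerate(UNITS)}


def years_to_reach(target_bytes, start_bytes, annual_growth=0.60):
    """Whole years of compound growth needed to grow from start to target."""
    if start_bytes >= target_bytes:
        return 0
    ratio = target_bytes / start_bytes
    return math.ceil(math.log(ratio) / math.log(1 + annual_growth))


# A yottabyte is a billion billion (10**18) megabytes.
megabytes_per_yottabyte = BYTES["yottabyte"] // BYTES["megabyte"]

# Assumption: start from 2 exabytes and compound at 60 percent a year.
years = years_to_reach(BYTES["yottabyte"], 2 * BYTES["exabyte"])
print(megabytes_per_yottabyte, years)
```

Under those assumptions the calculation gives 28 years, which is consistent with “a few decades”. A different starting volume or growth rate would shift the answer.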

Before computers existed we kept information mainly on paper, but also stored it in photographs or on film. In those days we stored a lot less data, partly because storing it was expensive. Go back centuries, to before printing was invented, and we stored even less data. Books had to be written by hand, so data storage was really expensive. There were probably only a few gigabytes of stored data in the whole world, even counting copies of books. And there were monasteries whose only purpose was to write out new copies of the Bible – Xerox machines of a kind.

When storing information is expensive, you naturally store only the most valuable information. So holy books and historic texts are what was preserved – books of knowledge, rather than books of information, and certainly not books of raw data. Data has come a long, long way since then. There are vast amounts of it and it’s valuable.

The Spectrum of Data

There is clearly a spectrum of value (and quality) that embraces all data and it is worth examining. Take a look at the diagram. It divides data into four “levels of value”: data, information, knowledge and understanding. I’ll define what I mean by these words just to be clear:

  • Think of data as information that has no context and hence transmits very little meaning. When we add context to data it acquires more informational value. So, for example, the number five is data. “Five people arrested in a brawl last night in central Austin” is information.
  • Think of information as simple statements. If you group data together you get information. “Rome is the capital of Italy” is information. A record in a database table can be information. If you organize and analyze information you can get knowledge – and greater value.
  • Think of knowledge mainly as a way of doing something, for example, baking a cheesecake. You need a recipe (a set of information) plus instructions in order to make cheesecake. However, you can also mine information to create “knowledge”, which has value if you can exploit it.
  • Understanding provides the potential to gain or display insight and act on it. By applying knowledge you achieve understanding. It is beyond computers, but only just.

From the computer’s perspective, data, information and knowledge are its domain. It doesn’t really do “understanding.” It has computational power that far exceeds that of any human, and it can combine this with algorithms to, say, beat the best chess player in the world. But even so it doesn’t really understand the game of chess. Computers have learning capability which, in some applications, can lead them to refine their behavior. However, in all instances, they are simply calculating in a sophisticated way. They don’t understand a damn thing.

Data is the stuff that’s stored. Add metadata (or mark-up information) and you get information. With programs we add logic and we get data as knowledge. Bring these three things together and we have computing in all its variety.
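As a rough illustration of that progression, the Python sketch below attaches metadata to a raw value to turn it into information, then applies a small rule as a stand-in for knowledge. The Information class, its field names and the needs_extra_patrols rule are all hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Information:
    """A raw value plus the metadata that gives it context."""
    value: int
    metadata: dict

    def describe(self) -> str:
        return f"{self.value} {self.metadata['unit']} in {self.metadata['place']}"


# Data: a bare number with no context.
raw = 5

# Information: the same number with metadata attached.
info = Information(raw, {"unit": "people arrested", "place": "central Austin"})


def needs_extra_patrols(reports, threshold=3):
    """Hypothetical rule: program logic applied to information."""
    return any(report.value >= threshold for report in reports)


print(info.describe())
print(needs_extra_patrols([info]))
```

The number 5 on its own says almost nothing. The metadata makes it a statement, and the rule shows how a program can act on such statements without understanding them.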

Systems and Applications

Computers process data. They act as systems or they run applications. There’s a difference between the two. An application normally has a single purpose in relation to data, whether it’s to manipulate a photograph, build a spreadsheet or play a game. A system is a group of interacting and interdependent elements that act as a whole for a specific purpose. Systems are more complex than applications and, of course, they may be made up of many applications.

Telephone systems are primarily computer systems, nuclear reactors are partly computer systems and corporations are, in the main, run by computer systems. In most cases the computer systems are the engine of the organization: they automate it and contain most, if not all, of its essential data. Although the analogy is a little overused, computer systems really do form the nervous system of the organization, carrying out and recording a good deal of corporate communications.

They also serve as the memory of the organization, formalizing and storing a good deal of the knowledge of the organization and acting on it. They are the prime analytical capability of the organization, providing the ability to aggregate, manipulate and investigate all of the data that they store.

If an organization loses all its data, or even a good part of it, it usually goes out of business. Roughly speaking, about half the companies that lose their data in a disaster never reopen, and about a further 40 percent will be out of business within two years. It’s easy to understand why. For most organizations the computer systems are the vital organs of the body. If they shut down permanently, it’s over.

What is Data?

The words “data”, “information” and “knowledge” don’t quite convey the utter importance of data within an organization. It is, in many ways, the life blood of an organization. In truth organizations only circulate and process three things:

  • Raw materials – to create a finished product (or service) of some kind.
  • Money – to finance all activities.
  • Data – in order to automate and control all operations

Cut off the supply of any of these and you kill the organization. Damage the supply and you hobble the organization. Manage the processing of these things well and the organization thrives.

In its simplest form, data may be just a few binary digits of information, but as it aggregates into fields within records within databases within the whole data fabric of an organization, it becomes a huge enabler of the organization itself. Add to it the understanding of the people that are employed by the organization and you get a whole economic ecosystem – a living and functioning organization.
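The nesting of bits into fields, fields into records and records into databases can be sketched in a few lines of Python. This is only an illustrative layout: the record format, the field names and the tiny in-memory “database” are assumptions made for the example, not a description of any real product.

```python
import struct

# One record = three fields packed into 14 bytes (little-endian, no padding):
# customer id (unsigned 32-bit), balance in cents (signed 64-bit),
# and a 2-byte country code.
RECORD_FORMAT = "<Iq2s"
FIELD_NAMES = ("customer_id", "balance_cents", "country")


def pack_record(customer_id, balance_cents, country):
    """Turn three field values into the raw bytes of one record."""
    return struct.pack(RECORD_FORMAT, customer_id, balance_cents,
                       country.encode("ascii"))


def unpack_record(raw):
    """Turn the raw bytes of one record back into named fields."""
    record = dict(zip(FIELD_NAMES, struct.unpack(RECORD_FORMAT, raw)))
    record["country"] = record["country"].decode("ascii")
    return record


# A table is a list of records; a database maps table names to tables.
database = {
    "customers": [pack_record(1, 12500, "US"), pack_record(2, -4000, "IT")],
}

rows = [unpack_record(raw) for raw in database["customers"]]
print(rows)
```

Each stored record is just 14 bytes. It only becomes usable once the field layout, which is itself a form of metadata, is applied to it.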
