1
Comments

The Joys and Sorrows of Third Normal Form: A Metadata Story

Long before metadata became a problem, it was a blessing. Prior to the invention and proliferation of databases, the contents of any given file were defined only within the programs that used them. Files never declared anything about the data they contained in any standard way. This meant that you had to depend upon a program library – or worse the program that wrote the data – to have any way of reading a file.

So databases when they emerged, were a blessing with their easily accessible schemas providing a map of all the data. Life got even better with the advent of SQL; not a perfect data language by any means, but at least a standard one, for reading and writing data.

So let’s salute all of that, and discuss schema design.

What’s wrong with this picture?

The schema of any given database holds a list of the database tables and items held within each row of the table, and it provides enough definition information for any programmer or program to use the data. That’s fine as it stands. But, as we all know, we frequently compromise the logical design of a database for the sake of performance. So instead of having, for example, a third normal form (3NF) definition of the data, our schema might define a star schema. In other words, for pragmatic reasons the database designer squashed 50 or so tables into a 7 table logical design with a long fact table in the middle.

In practice that means that we lost logical information. There is no automatic process that allows us to simply take that star schema and deduce what the 3NF model looked like. It would be useful if the database held two schemas, the 3NF schema and the schema that actually got implemented. But few databases do that. It’s worth noting here, by the way, that this I something that Teradata’s database accommodates. You provide a 3NF schema and it works out how it’s going to store the data, but Teradata is a shining exception.

That isn’t the only problem with schemas. Conceptually, a schema is a logical-to-physical translation, but it doesn’t distinguish between what is logical and what is physical. Take for example the situation where you want to improve database performance and you particularly want to do so when you process queries relating to eye color. Let’s say that the values for eye color in the database table PERSON can be blue, grey, green, brown or dark brown. To speed up access we define eye_color as an index.

For performance reasons that proves to be a good idea. But what does it mean logically. Well we could have set up a table called EYE_COLOR and defined eye_color as a foreign key in our PERSON table. And that would be logically correct, since there is a one-to-many relationship between a person and their eye color. But if we follow that line of logic, we’ll end up with a table for every non-numeric item in the database – and that doesn’t make any sense either logically or physically. 3NF doesn’t require that. The point here is that the database schema doesn’t identify which constructs are are logical and which are physical.

 

The Limitations of Relational Database

Quite a while ago, I remember being told in a seminar on database, that 3NF was a mathematically correct definition of the structure of data. It isn’t and it’s important to understand that. Indeed, there are quite a few things in relational database that are not mathematical correct at all; for example, relational algebra isn’t a true algebra. But let’s not worry about that here.

Designing a data model to 3NF for a relational database is a good idea. This is because 3NF tables are (with very few exceptions) free of update, insert and delete anomalies. Also, once you get the idea, arriving at a 3NF design isn’t too difficult.

So, 3NF designs are a good starting point for transactional databases and it works very well in many circumstances. But sadly, it works very poorly in others – or to be honest it doesn’t work at all in some circumstances. Indeed, the whole relational database idea works badly for quite a lot of data. You can deduce this from the fact that there are many applications that don’t use databases, in fact the majority of applications don’t.

If relational database expressed a universal model of data, then files wouldn’t exist. All data would be stored in databases and programmers would use them all the time – they’d use free open source databases. This doesn’t happen because a variety of data types and data structures don’t fit well into relational databases.

There was a big move in the 1990s to fix this issue with the invention of the object relational database (ORDBMS). The idea was to accommodate inconvenient (complex) data types – and thus broaden what a database could store, but this attempt to stretch the model complicated the situation. It made it possible to store geographic data, documents, photographs and other such objects within a relational structure – as specific data types.  You could even define a table as a data type. The database became extensible, which was a good thing.

It was workable, but it was a kludge by any reasonable definition. It required program code to be added to the database to unpack the various objects that were stored there, which was partly a good idea, because it ensured that the same code was always used to access the data. But it broke the simplicity of the relational view of data and it made the ORDBMS schema more complex to understand. As for applying 3NF for designing the data model for such a database – well you had to simply treat the “objects” as data types and ignore what they contained.

In practice, the ORDBMS wasn’t simple enough for most people’s needs and tastes so while such databases were used (Oracle has ORDBMS features) they were rarely used for their ORDBMS features – and, of course they never came anywhere close to eliminating files systems.

The conclusion of the industry remains that 3NF is a good modeling technique for data that fits well into a relational database. Beyond that you are dealing with objects, so you may as well use UML, an object modeling technique, and model the objects. When you need them to persist you store them, and if you have to store them in a relational database (and it may even be a good idea) you work from the object model to define the relational database schema. And that, of course, is a kludge too.

But never mind 3NF modeling has its place and can be useful. It’s just not universally applicable.

Share and Enjoy:
  • Print
  • LinkedIn
  • Facebook
  • Twitter
  • Digg
  • Technorati
  • StumbleUpon

One Response

11.21.11

Hi Robin

I don’t believe that the problem with MDM and relational databases is that the method and techniques are inadequate or have been pushed to their limits, rather that few people actually know how to analyse, model, design and build quality systems in a structured manner.

Because they have never mastered the fundamentals of logical data modelling, they espouse to have abandoned it for a “better” solution and grasp something more “chic” like UML. By this one action they have abandoned all analysis and logical modelling and have begun misusing a misnamed design technique in the belief that they are taking a more “technical” approach to analysis.

This is made evident by looking at the type of data quality errors that are commonplace in so many databases today. These are not errors brought about the limitations of relational databases. They demonstrate a fundamental lack of understanding of all normal forms (from 1NF to 5NF) and key fundamentals such as unique identifiers, etc.

Instead of looking for more “sophisticated” techniques and methods, practitioners need to go back to basics and learn the fundamentals. Then, when they truly know how to analyse and model data at the logical level, they will be able to implement the modelled structures using the most appropriate database structures, be they relational or hierarchical databases or flat files.

Regards
John

Leave Your Response

You must be to post a comment.

Search

Welcome to Pervasive Software's Data Integration Blog

Log in

Lost your password?

Register For This Site

Join

Join us as we spread the word.