Lorraine Lawson noticed a large number of articles and discussions on various strategies for building metadata management solutions popping up in the blogosphere, and asked the question, “Why the Big Push for Do-It-Yourself Metadata?”
That struck me as a darn good question. And I’d like to take a shot at answering it.
I believe that metadata management projects inevitably require a certain amount of customization. Well-built ETL tools take that need into consideration, and offer a simple means to extend functionality. The metadata management needs of individual enterprises vary so widely that trying to create a one-size-fits-all metadata solution borders on the impossible. But writing the whole thing from scratch isn’t the answer either. Custom-coding huge chunks of technology that a dozen companies have already built is like picking up a hammer and chisel to invent a wheel because you want a custom car.
Tom Spitzer at EC Wise had the right idea in the Information Management article, “Roll Your Own Metadata with ETL Tools.” Start with ETL tools that have already done a fair amount of the work for you, then add the functionality that you need, just like you would with any other ETL project. He gives a good set of reasons why starting with an ETL-based metadata strategy makes sense:
“ETL products were built to address the various dimensions of the metadata repository population problem. They come equipped with connections to myriad systems and typically are already programmed to interrogate whatever internal metadata those systems contain. They tend to support a wide range of communications and connectivity protocols so that they can track down information wherever it happens to be located…. Finally, these products can be automated to periodically visit the systems they have cataloged and look for new or changed information.”
Another good reason is simply that a fair amount of the metadata you need is embedded in the ETL processes themselves. For example, one kind of metadata you frequently need is data lineage: the source, target, and transformation information from those ETL processes. And nothing is better suited to capturing that data than the ETL tool itself.
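To make the lineage point concrete, here is a minimal Python sketch of an ETL step that records its own source, target, and transformation as it runs. Everything in it (`LineageRecord`, `LineageLog`, `run_step`, and the table names) is a hypothetical illustration, not the API of any particular ETL product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LineageRecord:
    """One lineage fact: where data came from, where it went, and how it changed."""
    source: str
    target: str
    transformation: str


@dataclass
class LineageLog:
    """Hypothetical stand-in for the lineage store inside an ETL tool."""
    records: list = field(default_factory=list)

    def run_step(self, source, target, transformation, func, rows):
        """Apply func to every row and record the step's lineage as a side effect."""
        result = [func(row) for row in rows]
        self.records.append(LineageRecord(source, target, transformation))
        return result

    def upstream_of(self, target):
        """Return the sources that feed a given target directly."""
        return [r.source for r in self.records if r.target == target]


log = LineageLog()
cleaned = log.run_step(
    source="crm.customers",          # assumed table name, for illustration only
    target="staging.customers",      # assumed table name, for illustration only
    transformation="trim whitespace from names",
    func=str.strip,
    rows=["  Ada ", "Grace  "],
)
```

Because the step that moves the data is also the step that writes the lineage record, the metadata cannot drift out of sync with the process it describes. That is the advantage of capturing lineage inside the tool rather than documenting it separately.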
So, starting with an ETL tool gives you a huge head start over coding everything from scratch, but it doesn’t get you to the finish line, because there are guaranteed to be semantic matching or business rule requirements that are unique to your business.
The key is extensibility. If you reach the end of the capabilities of the ETL tool, and you need more, then what? If the tool is not extensible, then the answer is: give up, start over, and build it all from scratch. That’s the frustrating problem with a lot of ETL metadata management “solutions,” and it leads a lot of people to think they have to build it themselves. They’re proprietary, and not extensible. That makes them just another freakin’ data silo. You would think that ETL companies would know better.
On the other hand, if the tool lets you modify, add to, and use its metadata storage for managing additional metadata, and lets you customize business rules, add code modules, and so on, then all you need to code is the bit that’s special: the aspects that are unique to your needs.
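As a rough sketch of what that kind of extensibility can look like, the Python below models a metadata store that accepts user-supplied business rules. The `MetadataStore` class, its `register_rule` hook, and the `flag_personal_data` rule are all assumptions made for illustration; a real ETL product would expose its own extension mechanism.

```python
class MetadataStore:
    """Hypothetical stand-in for an ETL tool's extensible metadata storage."""

    def __init__(self):
        self.entries = {}  # column name -> metadata dictionary
        self.rules = []    # custom business rules supplied by the user

    def register_rule(self, rule):
        """Register a custom rule; returns it so this works as a decorator."""
        self.rules.append(rule)
        return rule

    def catalog(self, column, **metadata):
        """Store the stock metadata, then let every custom rule enrich it."""
        entry = dict(metadata)
        for rule in self.rules:
            entry.update(rule(column, entry))
        self.entries[column] = entry
        return entry


store = MetadataStore()


@store.register_rule
def flag_personal_data(column, entry):
    """The business-specific part: a made-up rule that tags likely personal data."""
    sensitive = any(word in column.lower() for word in ("ssn", "email"))
    return {"contains_pii": sensitive}


entry = store.catalog("customer_email", data_type="varchar")
print(entry)
```

The store and the cataloging loop play the role of the stock framework. The only code written for this particular business is the one small rule function.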
The best custom metadata management solutions start with the collected best practices, pre-built code, and sensible structural framework that an experienced ETL vendor accumulates over years of doing thousands of implementations, just as a car manufacturer brings years of experience to how a stock car needs to be built in order to run smoothly. To that base, smart ETL vendors add the ability to extend the framework by bolting on new code modules, adding custom business rules and semantic clarification, and querying with best-of-breed analytic and business intelligence tools.
If West Coast Customs had to build a new car from the ground up every time someone wanted a unique paint job, engine, or body style, they’d go out of business. Don’t fall into that trap. The process of customizing a metadata solution is the same. Start with the stock model, soup it up, modify it to your heart’s content, and get your dream solution fast.