The heart of an IOA is what we have chosen to call the middleware components. This set of software components connects the BI applications to the various BI data sources in a reliable way, delivering data to the applications in a timely manner. This is not a trivial task to achieve; it requires more than just connectivity software, as can be seen in the diagram below, which shows most of the IOA.
We can think of this middleware layer as having three separate elements:
1. Mapping software
2. Performance management software
3. Integration software
Ultimately there has to be a map of some kind available to IT users (or programs) which describes the available data resources in a useful way. So here we have defined three components: an IOA registry, a master data management component and a semantic data map. Taking these one by one:
• IOA registry: The IOA registry is an SOA registry that declares and describes generally available data services. It would abide by corporate standards and be accessed in the same manner as any other data service. It ought to be possible for BI tools that want nothing more than ODBC access to a data source to simply connect via the IOA registry to achieve it. Some data integration companies already offer this kind of capability. The registry is a catalog of the data services available.
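To make the registry idea concrete, here is a minimal sketch of a data-service catalog. It is an illustrative assumption, not a real product's API: the class names, fields and the sample connection string are all invented for the example.

```python
# Hypothetical sketch of an IOA registry: a catalog mapping logical
# data-service names to connection details, so a BI tool can resolve
# an ODBC-style endpoint without knowing where the data physically lives.
from dataclasses import dataclass

@dataclass
class DataService:
    name: str          # logical service name published in the registry
    protocol: str      # e.g. "odbc", "xmla"
    endpoint: str      # connection string or URL
    description: str   # human-readable catalog entry

class IOARegistry:
    def __init__(self):
        self._services = {}

    def register(self, service: DataService) -> None:
        self._services[service.name] = service

    def lookup(self, name: str) -> DataService:
        return self._services[name]

    def catalog(self):
        """Return the catalog of available data services."""
        return [(s.name, s.description) for s in self._services.values()]

registry = IOARegistry()
registry.register(DataService("sales_dw", "odbc",
                              "Driver={SQL};Server=dw01;Database=sales",
                              "Corporate sales data warehouse"))

# A BI tool wanting plain ODBC access resolves the endpoint via the registry:
svc = registry.lookup("sales_dw")
```

The point is the indirection: the tool asks the registry for a named service rather than hard-coding where the data sits.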
• Master data management (MDM): MDM is much broader and deeper than simply defining data. First, MDM tries to arrive at a single consistent unambiguous definition of the data of the organization. Multiple, potentially inconsistent definitions of data will likely remain in use in operational systems, since data definitions often embody an element of context. There may be no “single version of the truth” in some areas, but there will likely be “the most useful general description” of the data. MDM is best thought of as a process, since new data stores and data records are defined all the time, and for some organizations mergers and acquisitions are common and can be rather disruptive to any MDM effort in progress. Such developments may necessitate major new MDM projects. It could be said that the MDM project is never finished. As such, MDM might be thought of as a vain effort, but this is not so. Certainly, an effective IOA requires a reliable and usable map of corporate data, and MDM is the best hope of having such a map. One outcome of successful MDM can be agreement on critical data descriptions and what the corporate data resource contains at the level of the business. This in turn offers additional value in BPM.
• Semantic data map: MDM as conceived by most vendors is mostly about structured data. Structured data has metadata which describes its meaning to some degree, but not with a great deal of sophistication. Above and beyond such metadata, there is a kind of business vocabulary which expresses some basic truths about an organization. Take a simple example. The data record that describes an insurance policy will tell you many of the important attributes of the policy: object(s) insured, the term of the policy, etc. But it will not tell you what insurance actually is. And you may not be able to deduce from the information the full range of insurance claims that might be made and which ones are valid. That’s because the systems either don’t hold that information at all or don’t hold it in a convenient form. For this reason, an MDM map of corporate data can be usefully complemented with a semantic map of corporate data which embodies a kind of business vocabulary. Such semantic information can be useful in the analysis of some unstructured data and in the setting of governance policies.
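The insurance example above can be sketched as a tiny semantic map: a vocabulary that records business facts ordinary metadata omits, such as which perils a kind of policy covers. The concept names and relations here are invented purely for illustration.

```python
# Illustrative sketch of a semantic map: business vocabulary encoded as
# concepts and relations, letting a program answer a question that the
# policy record alone cannot -- is this kind of claim valid at all?
vocabulary = {
    "HomePolicy": {
        "is_a": "InsurancePolicy",
        "covers": {"fire", "theft", "flood"},
    },
    "AutoPolicy": {
        "is_a": "InsurancePolicy",
        "covers": {"collision", "theft"},
    },
}

def claim_is_covered(policy_type: str, peril: str) -> bool:
    """Check the business vocabulary, not the policy record, to judge
    whether a claim of this kind is even possible under this policy."""
    return peril in vocabulary.get(policy_type, {}).get("covers", set())
```

In practice such a map would be far richer (an ontology rather than a dictionary), but the principle is the same: the knowledge lives outside any one system's metadata.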
Two components, shown in the diagram, focus specifically on managing the performance of requests for data: data virtualization and the enterprise service bus/data flow component. We’ll consider data virtualization first. In the past we tried to meet the demand for data by building a data warehouse and then siphoning off subsets of data into data marts. That’s a very slow and, to some extent, manual process for determining where to locate data. Rather than creating data marts, it would be better for an intelligent software component to automatically build caches of data to satisfy demand. That’s what data virtualization does.
Data virtualization federates data from multiple sources (ODSs, data warehouses and possibly operational systems) and learns where to place data for optimum performance by analyzing query traffic to determine which data is commonly queried. It then caches that data in a place that is as local as possible to the querying applications. Exactly where that place should be will be determined by several factors, including available server resources, network speeds and the need for resilience. Such a capability is central to an IOA.
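The caching behavior described above can be sketched in a few lines: watch query traffic, count which requests recur, and promote popular results into a local cache. The threshold and the fetch function are assumptions for illustration; real products use far more sophisticated placement logic.

```python
# Minimal sketch of traffic-driven caching, the core idea behind data
# virtualization: analyze query frequency and cache hot data locally
# instead of repeatedly going back to the remote source.
from collections import Counter

class VirtualizedSource:
    def __init__(self, fetch_remote, hot_threshold=3):
        self._fetch = fetch_remote      # callable that hits the real source
        self._counts = Counter()        # query-traffic analysis
        self._cache = {}                # "local" cache for hot data
        self._threshold = hot_threshold

    def query(self, key):
        if key in self._cache:          # hot data served locally
            return self._cache[key]
        self._counts[key] += 1
        result = self._fetch(key)
        # Once a key proves popular, place its data as locally as possible.
        if self._counts[key] >= self._threshold:
            self._cache[key] = result
        return result

source = VirtualizedSource(fetch_remote=lambda k: f"rows for {k}")
for _ in range(4):
    source.query("top_customers")   # promoted to the cache after 3 hits
```

The decision of *where* to cache (which server, which tier) is exactly the part the text says depends on server resources, network speeds and resilience needs; this sketch collapses it to a single dictionary.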
A second aspect of performance management is the management of data flows within the IOA, whether they are caused by individual queries or by batch loads of data from one database to another. Some data flows, such as query responses, have a higher priority than others. So there is a need to balance competing workloads over the resources available. This is the kind of task to which an enterprise service bus is suited, although the performance management activity is distinctly different from that in an SOA environment. An ESB within an SOA is all about the messages passing to and from application interfaces. This is more about getting data flows to happen when they need to, at the required speed and with a guaranteed level of service.
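The prioritization described here amounts to a priority queue over competing data flows. A hedged sketch, assuming just two priority classes (the class names and flow labels are invented for the example):

```python
# Sketch of priority-based data-flow scheduling for the ESB/data-flow
# component: query responses outrank batch loads when both are pending.
import heapq
import itertools

class DataFlowScheduler:
    QUERY_RESPONSE, BATCH_LOAD = 0, 1   # lower number = higher priority

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order
                                        # within a priority class

    def submit(self, priority, flow_name):
        heapq.heappush(self._queue, (priority, next(self._seq), flow_name))

    def next_flow(self):
        """Dispatch the most urgent pending data flow."""
        return heapq.heappop(self._queue)[2]

sched = DataFlowScheduler()
sched.submit(DataFlowScheduler.BATCH_LOAD, "nightly ODS -> warehouse load")
sched.submit(DataFlowScheduler.QUERY_RESPONSE, "dashboard query response")
```

A real ESB would also enforce bandwidth guarantees and deadlines, but the ordering decision, which flow goes next, is the heart of it.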
Finally there is data integration, which will be strongly linked to performance management. Note that we included data federation, which is a kind of data integration, under data virtualization. Aside from that, data integration can involve:
• Support for connectivity standards (ODBC, MDX, XML, etc.)
• Connections to non-standard data sources
• Extract, transform and load (ETL)
• Connection to data archives
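Of the capabilities listed, ETL is the easiest to illustrate. A minimal sketch of the three stages, assuming an invented source record layout and a trivial normalization rule (neither comes from the text):

```python
# Compact sketch of extract, transform and load (ETL): read raw records
# from a source, normalize them to the target schema's conventions, and
# append them to the target store.
def extract(source_rows):
    """Extract: read raw records from the source system."""
    return list(source_rows)

def transform(rows):
    """Transform: rename fields and normalize values for the target."""
    return [
        {"customer_id": r["id"], "country": r["country"].strip().upper()}
        for r in rows
    ]

def load(rows, target):
    """Load: append transformed rows into the target store."""
    target.extend(rows)

warehouse = []
raw = [{"id": 1, "country": " us "}, {"id": 2, "country": "de"}]
load(transform(extract(raw)), warehouse)
```

Production ETL adds batching, error handling and restartability, but the shape of the pipeline is this simple.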
Functionality beyond this level of connectivity, for example in the analysis of unstructured data, will most likely be part of the associated BI application, rather than included in this middleware layer.
Data integration will likely be dependent on data maps, which may exist in mapping software (i.e., a separate MDM component) or could just as easily be part of the data integration software itself. Ultimately there needs to be a common metadata store (a metadata warehouse, if you like) for the sake of standardization, for data integration and for evolving a business vocabulary of data.
Aside from the practicalities of data integration, there are also the issues of data governance and data quality. We can think of these as separate components, but it is worth noting that there is a logical and a physical aspect to both of these.
Data governance is the more complex of the two, in theory at least, as it may involve both the declaration and implementation of company policy in many areas including data usage policies, data management policies, business process policies in respect of data and even data risk management. It could also involve data quality rules, but we will discuss these separately.
It can be an intelligent move to define such policies even though there may be no way to automate the implementation of some of them. At least users and IT staff will know what the policy is. The difficulty with the implementation of data governance is that for many of its aspects the implementation is widely distributed. The implementation of data risk management, for example, may involve the deployment of data audit capabilities and many other data security components as well as the imposition of data usage rules.
The same kind of distributed deployment is likely to be involved in data quality. Some data quality activity could involve the cleansing of data at the source. In some instances it may be better to record the fact that data was wrong and the time period for which it was incorrect. In yet another case, it may be better to cleanse the data at the point of delivery — or prior to delivery — since that might be the point at which errors were identified.
And then there is the issue of unclean data that is used by the organization but does not belong to the organization. Clearly the organization has no ability to correct the data at the source, but it will have no wish to use incorrect information either. So, as with data governance, although in a less complex way, there can be a central store of rules to apply to data quality and a distributed set of software components that implement the rules.
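The central-rules/distributed-enforcement arrangement can be sketched as follows. The rule names and record fields are assumptions for illustration; note that the rules flag violations rather than silently correcting data, which fits the case of data the organization does not own.

```python
# Sketch of a central store of data quality rules applied by distributed
# components at any point of delivery. Each rule is a predicate; a record
# that fails a rule is flagged, not fixed.
central_rules = {
    "email_present": lambda rec: bool(rec.get("email")),
    "amount_non_negative": lambda rec: rec.get("amount", 0) >= 0,
}

def apply_rules(record, rules=central_rules):
    """Run the centrally defined rules against one record and return
    the names of the rules it violates."""
    return [name for name, check in rules.items() if not check(record)]

violations = apply_rules({"email": "", "amount": -5})
```

Each consuming component can run `apply_rules` locally while the rule definitions stay in one place, which is the separation the text describes.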
And that’s it – the final chapter in my description of the Information Oriented Architecture. Note that if you want a copy of the IOA diagram and the associated white paper, you can download it from The Virtual Circle.