Many don’t consider performance when they design and deploy a data integration solution, but they should. What’s typically misunderstood is that data integration is a holistic solution with many links in a chain, and the data integration engine is only one link. Thus, the performance of any data integration solution is limited by its overall design, as well as by its slowest component.
When I’m called about data integration performance issues, the caller’s natural tendency is to blame the data integration technology itself. These days, however, the data integration technology is rarely the bottleneck, so you must look at other aspects of the design to find the issue. The best predictor of overall performance is not the speed of the data integration engine; it is the experience of those doing the overall data integration architecture and deployment.
When considering the data integration design, you have to think in terms of links in a chain: the adapter on the source invokes an API to produce the data; the data is consumed into the integration engine; the structure and content of the data are altered to meet the needs of the target; a log is updated; and the data is pushed out of the integration engine to an adapter that invokes an API to load the data into the target.
Therefore, the performance of that “chain” depends on all components working efficiently. A poorly performing database or application API, or a poorly designed transformation, can be the cause of your latency. Those are just a few of the potential performance issues (discussed below), and they are not the only places to look.
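To make that chain concrete, here’s a minimal sketch of the links described above. All of the names (SourceAdapter, Transformer, TargetAdapter, IntegrationEngine) are hypothetical and illustrative, not any particular product’s API; the point is simply that every record passes through each link, so end-to-end throughput is capped by the slowest one.

```java
// Hypothetical names for the links in the chain; no specific product implied.
interface SourceAdapter { String read() throws Exception; }           // invokes the source API
interface Transformer   { String apply(String in); }                  // alters schema/content
interface TargetAdapter { void write(String out) throws Exception; }  // invokes the target API

final class IntegrationEngine {
    private final SourceAdapter source;
    private final Transformer transformer;
    private final TargetAdapter target;

    IntegrationEngine(SourceAdapter s, Transformer t, TargetAdapter d) {
        source = s; transformer = t; target = d;
    }

    // Each record moves through every link; the slowest link sets the pace.
    void run() throws Exception {
        String in;
        while ((in = source.read()) != null) {  // link 1: produce from the source
            String out = transformer.apply(in); // link 2: transform for the target
            log(out);                           // link 3: update the log
            target.write(out);                  // link 4: push into the target
        }
    }

    private void log(String record) { /* write to the engine's audit log */ }
}
```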
Let’s review the top three things to consider with data integration and performance issues. They are:
- API Latency
- Poorly Designed Transformations
- Overeating
API latency, as introduced above, is the most common performance issue, whether it’s a database API, such as a call-level interface (CLI) like JDBC or ODBC, or a proprietary enterprise application API such as SAP’s BAPI. The typical problem is that the API is not configured correctly and thus cannot produce or consume the data at the required rate. Or, in some instances, the poor performance is engineered into the API itself, which requires you to work directly with those who produced and maintain the API.
In the case of a configuration problem, you need to become intimate with the API’s configuration parameters; cache and queue allocation are the usual culprits. Make sure the API rarely goes directly to physical disk to read and write its queue, which typically (though not always) means maximizing the physical memory allocated to the API.
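As one concrete illustration, JDBC (named above) exposes a fetch size that controls how many rows the driver buffers in memory per round trip, and leaving it at the default is a classic misconfiguration. Here’s a minimal sketch; the connection string, credentials, and table are placeholders, and the auto-commit note applies to the PostgreSQL driver specifically.

```java
import java.sql.*;

public class FetchSizeSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; substitute your own source.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/sales", "user", "pass")) {
            conn.setAutoCommit(false); // the PostgreSQL driver only streams rows with auto-commit off
            try (Statement stmt = conn.createStatement()) {
                // Buffer rows in reasonably large batches so the driver isn't
                // round-tripping per row, or spooling the entire result set
                // into memory at once, as some drivers do by default.
                stmt.setFetchSize(5_000);
                try (ResultSet rs = stmt.executeQuery("SELECT id FROM transactions")) {
                    while (rs.next()) {
                        process(rs.getString("id"));
                    }
                }
            }
        }
    }

    static void process(String id) { /* hand the row to the integration engine */ }
}
```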
Moreover, watch out for Web service interfaces. While they’re based on open standards and it’s much easier to make them work and play well with others, they tend to underperform their native API counterparts. Thus, where performance is a requirement and you don’t need Web services access, the native API is your best bet (if there is a choice).
Poorly designed transformations are a common problem. The technology these days is so flexible, and able to accommodate so many different requirements, that in some instances you could be writing transformations that are far more complex than required. In the world of data integration, the ability to alter both schemas and content from a source to a target is a core requirement, but in doing so you need to optimize how the transformation executes.
This typically means working with your data integration technology vendor to determine the best approach to deal with complex transformation requirements. If performance is a core consideration, then you need to understand how to optimize the schema and content transformation, and in doing so remove a common bottleneck.
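One pattern that comes up repeatedly: a transformation that performs a lookup (say, a reference-table query) for every row it touches. Here’s a hedged sketch of the fix, under the assumption that the reference data is small enough to cache; the class, field, and value names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative names only; the technique is simply to hoist a per-row
// lookup out of the transformation's hot path.
public class RegionLookupTransform {
    private final Map<String, String> codeToRegion = new HashMap<>();

    public RegionLookupTransform() {
        // Load the reference data once, up front, instead of issuing a
        // query against the source for every row the engine processes.
        codeToRegion.put("US-W", "Western Region");
        codeToRegion.put("US-E", "Eastern Region");
    }

    public String transform(String regionCode) {
        // An in-memory lookup per row, rather than a network round trip per row.
        return codeToRegion.getOrDefault(regionCode, "Unknown");
    }
}
```

The same idea applies to compiled expressions, parsed schemas, and anything else that can be prepared once rather than once per record.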
Overeating is really a design issue around the consumption of too much data at one time into the adapter and integration engine. This is typically not a problem with the integration engine itself, but simply a design mistake: instead of consuming many small portions of data, such as individual sales transactions, huge chunks of data are pulled in at once. Sometimes there are good reasons to pull in huge chunks of data, but in many cases it’s not a requirement.
Of course, as with all design mistakes, the data integration engine will accommodate you. However, getting the granularity wrong when you consume source data, or push to a target, typically means your performance is less than optimal. Again, work with the data integration technology provider to determine the best practices and design guidelines here, but as a rule of thumb, smaller and many beats larger and few.
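To illustrate “smaller and many,” here’s a sketch that pages through a source table in fixed-size batches using keyset pagination rather than one enormous SELECT. The table, columns, and connection details are assumptions for illustration.

```java
import java.sql.*;

public class ChunkedReader {
    static final int BATCH_SIZE = 1_000; // many small portions, not one huge pull

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/sales", "user", "pass");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT id, payload FROM transactions WHERE id > ? ORDER BY id LIMIT ?")) {
            long lastId = 0;
            while (true) {
                stmt.setLong(1, lastId);
                stmt.setInt(2, BATCH_SIZE);
                int rows = 0;
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                        lastId = rs.getLong("id"); // remember where this batch ended
                        process(rs.getString("payload"));
                    }
                }
                if (rows == 0) break; // the source is drained
            }
        }
    }

    static void process(String payload) { /* transform and push to the target */ }
}
```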
Performance is often overlooked when those charged with driving data integration solutions just want to get things up and running. That’s understandable. Moreover, the data integration technology is so powerful these days that it’s very forgiving and able to work around many design inefficiencies. However, with a bit of forethought, performance won’t be an issue within your data integration solution.