Information systems performance and SOA performance are key concerns for architects who plan to implement an enterprise data virtualization layer. Data virtualization is a form of data integration that typically spans many data sources, both relational and non-relational, and organizes them in a logical, virtualized manner rather than consolidating them physically. Performance matters most in environments with widely distributed data sources, where network latency cannot be controlled, and in real-time or near real-time environments, which require fast responses from the participating data services. In both cases, the data virtualization layer’s performance depends mostly on response latency. High-performance caching in the data virtualization layer can reduce the bottlenecks caused by network latency and by an overloaded virtualization layer.
Three factors contribute to the data virtualization layer’s response latency: the network, the middleware, and the data sources. When all three are located on the same subnet, network latency is roughly constant. In this scenario, architects can reduce latency for the entire solution by reducing the response latency of the slowest data source. Another approach is high-performance caching. With a caching system in place, many client requests are fulfilled from cached data, which improves performance and reduces the number of requests that reach the production data sources.
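The effect of the slowest data source can be illustrated with a rough latency model. This is a minimal sketch, not a measurement: it assumes the middleware queries all sources in parallel, so the slowest source sets the pace, and every name and figure in it (response_latency, the source names, the millisecond values, the cache_ms default) is hypothetical.

```python
def response_latency(network_ms, middleware_ms, source_ms, cached=(), cache_ms=2.0):
    """Estimate end-to-end latency for one request against the virtualization layer.

    Assumption: sources are queried in parallel, so the slowest one dominates.
    Sources named in `cached` are answered by the cache in `cache_ms` instead.
    """
    effective = [cache_ms if name in cached else ms for name, ms in source_ms.items()]
    return network_ms + middleware_ms + max(effective)


# Hypothetical per-source latencies in milliseconds.
sources = {"crm": 40.0, "orders": 55.0, "legacy_erp": 900.0}

baseline = response_latency(5.0, 10.0, sources)
with_cache = response_latency(5.0, 10.0, sources, cached={"legacy_erp"})

print(f"no cache: {baseline} ms, slowest source cached: {with_cache} ms")
```

Under these assumptions, caching only the slowest source moves the bottleneck to the next-slowest one, which is why the choice of what to cache matters more than the size of the cache.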
Single Cache Instance
A Single Cache is the most basic implementation in the data virtualization layer, and it is preferred for small or medium projects with low or moderate client load. The implementation team should consider putting the cache on the same subnet as the data virtualization middleware to minimize the network latency between the middleware and the cache. If the cached data is frequently accessed but relatively small, it might be even better to put the caching system on the same blade server as the middleware, which eliminates network latency between the two entirely.
Caching raw table data is a good choice for environments where one data source is significantly slower than the rest. This improves the performance of the overall solution because the middleware no longer sits idle waiting for the slow source. Materialized-view caching is best when many clients send identical requests and clog production systems with queries that produce identical responses. In this case, the data virtualization middleware executes the first client request against the production systems and caches the returned result-set instead of discarding it, so subsequent client requests are fulfilled by the cache system rather than the production systems. Procedural caching should be used if one of the data sources is a web service with long or unpredictable response latency. In this solution, the data virtualization middleware optimizes overall performance by caching the result-sets from the web service, keyed by the parameters passed with each call.
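Procedural caching, as described above, can be sketched as a small wrapper that stores result-sets by the parameters passed to the call. This is an illustrative sketch only: ProceduralCache, slow_web_service, and the time-to-live expiry are assumptions added for the example (the text does not specify an expiry policy), and parameter values are assumed to be hashable.

```python
import time


class ProceduralCache:
    """Caches results of a slow callable, keyed by the parameters it was given."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # the slow web-service call being wrapped
        self._ttl = ttl_seconds      # assumed expiry policy, not from the article
        self._clock = clock
        self._entries = {}           # parameter key -> (expires_at, result)

    def call(self, **params):
        # Sort the parameters so argument order does not change the key.
        key = tuple(sorted(params.items()))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: the web service is not called
        result = self._fetch(**params)
        self._entries[key] = (now + self._ttl, result)
        return result


calls = []  # records every call that actually reaches the web service


def slow_web_service(customer_id, region):
    # Hypothetical stand-in for a web service with unpredictable latency.
    calls.append((customer_id, region))
    return {"customer_id": customer_id, "region": region, "balance": 100}


cache = ProceduralCache(slow_web_service, ttl_seconds=30.0)
first = cache.call(customer_id=7, region="EU")
second = cache.call(region="EU", customer_id=7)   # same parameters: served from cache
third = cache.call(customer_id=8, region="EU")    # new parameters: reaches the service

print(f"requests: 3, web service calls: {len(calls)}")
```

Materialized-view caching follows the same pattern with the query text as the key instead of the call parameters.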
Cluster Cache
The Cluster Cache is an implementation for more complex deployments. It supports heavy client request loads by clustering the data virtualization middleware into multiple nodes. Be aware that middleware clustering on its own increases the load on the production data sources, because every client request that any node handles is executed against them. A caching system in a clustered environment can therefore significantly improve the solution’s performance and offload stress from the production systems.
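The difference a shared cache makes in a clustered deployment can be shown with a small simulation. It is a sketch under simplifying assumptions: a plain dictionary stands in for the shared caching system, requests are distributed round-robin, nothing is ever invalidated, and all class and function names are hypothetical.

```python
class ProductionSource:
    """Counts how many queries actually reach the production system."""

    def __init__(self):
        self.queries_executed = 0

    def execute(self, query):
        self.queries_executed += 1
        return f"rows for {query}"


class MiddlewareNode:
    """One node of the clustered middleware, optionally backed by a shared cache."""

    def __init__(self, source, shared_cache=None):
        self.source = source
        self.cache = shared_cache

    def handle(self, query):
        if self.cache is not None and query in self.cache:
            return self.cache[query]
        result = self.source.execute(query)
        if self.cache is not None:
            self.cache[query] = result
        return result


def run_cluster(node_count, requests, use_cache):
    source = ProductionSource()
    cache = {} if use_cache else None   # one cache shared by every node
    nodes = [MiddlewareNode(source, cache) for _ in range(node_count)]
    for i, query in enumerate(requests):
        nodes[i % node_count].handle(query)   # round-robin load balancing
    return source.queries_executed


requests = ["SELECT * FROM orders", "SELECT * FROM customers"] * 50
uncached = run_cluster(4, requests, use_cache=False)
cached = run_cluster(4, requests, use_cache=True)

print(f"production queries without cache: {uncached}, with shared cache: {cached}")
```

Because the cache is shared, a result fetched by one node is available to all the others; a per-node cache would still send each distinct query to production once per node.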
Distributed Cache
A Distributed Cache is best for environments with one or more clients located remotely. These systems usually have a central cache repository with multiple remote edge caches that service requests from the remote clients without added latency. The edge caches don’t need to hold a full copy of the central cache, because each edge cache monitors its remote clients’ requests and replicates only the portions of the central cache that are relevant to those requests. Changes to the central cache are copied dynamically to the edge caches without a wholesale re-sync.
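One way to picture the edge-cache behavior is a central cache that remembers which edge asked for which key and pushes later changes only to those edges. The sketch below is an in-process illustration with hypothetical names (CentralCache, EdgeCache, the sample keys); a real deployment would replicate over the network and handle failures, which this does not attempt.

```python
class CentralCache:
    """Central repository that tracks which edges have replicated each key."""

    def __init__(self):
        self._data = {}
        self._subscribers = {}   # key -> set of edge caches holding that key

    def get(self, key, edge):
        # Record the requesting edge so later changes to this key reach it.
        self._subscribers.setdefault(key, set()).add(edge)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        # Push the change only to edges that replicated this key.
        for edge in self._subscribers.get(key, ()):
            edge.receive(key, value)


class EdgeCache:
    """Remote cache holding only the entries its own clients have requested."""

    def __init__(self, central):
        self._central = central
        self.local = {}

    def get(self, key):
        if key not in self.local:
            self.local[key] = self._central.get(key, self)   # replicate on demand
        return self.local[key]

    def receive(self, key, value):
        self.local[key] = value


central = CentralCache()
central.put("orders:2024", [1, 2])
central.put("customers", ["a"])
central.put("inventory", [9])

edge = EdgeCache(central)
edge.get("orders:2024")                 # the remote client asks for one key only

central.put("orders:2024", [1, 2, 3])   # propagated to the edge
central.put("inventory", [9, 10])       # not propagated: the edge never asked for it

print(edge.local)
```

The edge ends up with a current copy of the one key its clients use and nothing else, which is the partial, incrementally updated replica the paragraph describes.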
Using specialized data virtualization middleware with high-performance caching is quickly becoming a popular method for reducing performance bottlenecks.