Overview

In the IT industry, large-scale applications often have to stay live 24/7. Managing such an application and its millions of customers properly is important, because the risk of internal or external failure is high. For critical applications, the risk of a disaster of any kind can't be ignored, and recovery plans must be considered very carefully. A disaster can be of any type, such as a natural disaster or an internal hardware or software failure.

Large-scale applications use distributed caches to improve performance, reliability and runtime scalability, so a distributed cache can be a very important part of a disaster recovery plan. One of the most frequently used disaster recovery plans is live data replication to a backup site, so that live users can be redirected to the backup site without errors when needed. For that purpose, the active and backup caches should always be synchronized. If they are not, the application's cache clients can be affected.

Data replication also helps when, in addition to disaster recovery, the application needs to be deployed in geographically separated regions to serve widely spread customers. In such a case, there are two or more active sites. Each site serves the users of its own region and can also act as a backup for the sites in other regions.

Data can be replicated in two ways: synchronously or asynchronously. For WAN replication, asynchronous replication is usually preferred.

NCache provides WAN replication through its Bridge feature. A bridge is created between two clustered caches, and data is replicated from the source site to the other site through that bridge.

Pluggable Caching Architecture: Caches are not aware of each other; they only know about the bridge and replicate their data to it. Because of this loose coupling, a bridge can be configured between any two caches irrespective of their cache topology. Caches configured with a bridge can also be removed from it freely.

Data Integrity: Operations performed on the source cache are enqueued by the bridge in the order in which they were performed. The bridge then performs the operations on the target cache in the same order. Conflicts are resolved on the target cache. In this way, the caches become eventually consistent.
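The ordering guarantee described above can be illustrated with a small, generic sketch. This is not NCache code or the NCache API: the ToyBridge class, the put and remove helpers, and the plain dictionaries standing in for caches are all hypothetical, and conflict resolution is left out because the text does not say how it is performed.

```python
from collections import deque

class ToyBridge:
    """Hypothetical FIFO bridge: queues source operations, replays them in order."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, op, key, value=None):
        # Operations are appended in the order the source cache performed them.
        self.queue.append((op, key, value))

    def replicate(self, target):
        # Replay on the target in the same (first-in, first-out) order.
        while self.queue:
            op, key, value = self.queue.popleft()
            if op == "put":
                target[key] = value
            elif op == "remove":
                target.pop(key, None)

source, target = {}, {}   # plain dicts stand in for the two caches
bridge = ToyBridge()

def put(key, value):
    source[key] = value
    bridge.enqueue("put", key, value)

def remove(key):
    source.pop(key, None)
    bridge.enqueue("remove", key)

put("order:1", "pending")
put("order:1", "shipped")   # later update must win on the target as well
put("order:2", "pending")
remove("order:2")

bridge.replicate(target)
```

Because the operations are replayed in their original order, the target ends up in the same state as the source. Replaying them out of order could leave "order:1" as "pending" or resurrect "order:2".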

Dedicated Bridge Service: The bridge is a stand-alone, dedicated service like the cache service, so cache operations are not affected if a bridge operation is delayed by network latency.

Configuring Bridge: The bridge can be configured either on the same server where the clustered cache resides or on a separate server node. Once both clustered caches are added to the bridge, data is replicated between them.

Disaster Recovery: A bridge can be configured between an active and a passive data center for disaster recovery.

Serving Geographically Spread Customers: There can be two active sites, each serving the users of its own region and also acting as a backup for the other region's site.

Asynchronous Replication: For WAN replication, asynchronous replication is used so that cache operations do not suffer when bridge operations are delayed.
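The effect of asynchronous replication can be sketched generically with a background worker. This is only an illustration under stated assumptions, not NCache's implementation: the dictionaries, the put helper, the replicator function and the 10 ms simulated WAN delay are all hypothetical.

```python
import queue
import threading
import time

replication_queue = queue.Queue()
source_cache = {}   # stand-in for the local (source) cache
target_cache = {}   # stand-in for the remote (target) cache

def replicator():
    """Background worker that applies queued operations to the target."""
    while True:
        item = replication_queue.get()
        if item is None:              # sentinel: stop the worker
            replication_queue.task_done()
            break
        key, value = item
        time.sleep(0.01)              # simulated WAN latency (assumed value)
        target_cache[key] = value
        replication_queue.task_done()

def put(key, value):
    # The local write completes immediately; replication happens later.
    source_cache[key] = value
    replication_queue.put((key, value))

worker = threading.Thread(target=replicator, daemon=True)
worker.start()

for i in range(5):
    put(f"k{i}", i)                   # returns without waiting for the WAN

replication_queue.join()              # wait until the target has caught up
replication_queue.put(None)
worker.join()
```

The calls to put return before the slow link has delivered anything, which is the property the paragraph above describes. The cost is that the target lags behind the source until the queue drains.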

Queue Backup: The bridge is essentially a two-node clustered queue in which one node is active and the other is passive. The passive node holds a backup of the active queue to avoid data loss on the bridge.
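A rough sketch of the active/passive idea follows. It is a toy model, not how NCache implements its bridge: the BridgeNode and BridgePair classes, the node names and the fail_over method are hypothetical, and mirroring is done synchronously in one process for simplicity.

```python
from collections import deque

class BridgeNode:
    """Hypothetical bridge node holding one copy of the queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

class BridgePair:
    """Toy two-node queue: every change on the active node is mirrored."""

    def __init__(self):
        self.active = BridgeNode("node-1")
        self.passive = BridgeNode("node-2")

    def enqueue(self, operation):
        self.active.queue.append(operation)
        self.passive.queue.append(operation)   # keep the backup in step

    def dequeue(self):
        operation = self.active.queue.popleft()
        self.passive.queue.popleft()           # drop it from the backup too
        return operation

    def fail_over(self):
        # The passive node takes over with its backup of the queue, and a
        # fresh standby (hypothetical "node-3") receives a copy of it.
        survivor = self.passive
        standby = BridgeNode("node-3")
        standby.queue = deque(survivor.queue)
        self.active, self.passive = survivor, standby

pair = BridgePair()
for op in ("put a", "put b", "put c"):
    pair.enqueue(op)

first = pair.dequeue()     # "put a" is replicated before the failure
pair.fail_over()           # the active node is lost
pending = list(pair.active.queue)
```

After the failover, the operations that had not yet been replicated are still available on the new active node, so nothing queued on the bridge is lost.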

Connection Retries: When a connection failure occurs, the bridge retries in an attempt to replicate all operations.
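Retrying on connection failure can be sketched as a simple loop. The function name replicate_with_retry, the attempt limit and the fixed delay are assumptions made for illustration; the text does not state which retry policy the bridge actually uses.

```python
import time

def replicate_with_retry(send, operation, max_attempts=5, delay_seconds=0.01):
    """Call send(operation), retrying on ConnectionError.

    Returns the number of attempts that were needed. The limit and the
    fixed delay are illustrative assumptions, not NCache settings.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(operation)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise                  # give up after the last attempt
            time.sleep(delay_seconds)  # wait before trying again

# A fake link to the target site that fails twice before succeeding.
delivered = []
failures_left = [2]

def flaky_send(operation):
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("target site unreachable")
    delivered.append(operation)

attempts = replicate_with_retry(flaky_send, ("put", "k", 1))
```

Here the operation is delivered on the third attempt. A production retry policy would usually add backoff and an upper bound on how long operations may wait in the queue.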

Configurable Queue Size: The sizes of the bridge queue and the target replicator queue are configurable. Size the bridge queue according to the cache load: every update is replicated to the bridge queue, and if the queue is too small it fills up, which results in data loss.
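One back-of-the-envelope way to reason about queue size is to multiply the update rate by the average operation size and by the longest interruption the queue should absorb. The helper below is a hypothetical estimate, not an NCache formula, and the figures in the example (2,000 updates per second, 1 KB per operation, a 60-second outage) are invented for illustration. Expressing the result in megabytes is also an assumption.

```python
def required_queue_mb(updates_per_second, avg_operation_bytes, max_outage_seconds):
    """Estimate the queue capacity (in MB) needed to buffer updates
    while the target site is unreachable. Illustrative only."""
    total_bytes = updates_per_second * avg_operation_bytes * max_outage_seconds
    return total_bytes / (1024 * 1024)

# Hypothetical load: 2,000 updates/s, 1 KB each, 60 s of tolerated outage.
estimate_mb = required_queue_mb(2000, 1024, 60)
```

With these assumed numbers the estimate is about 117 MB, so a queue much smaller than that would fill up during such an outage. Real sizing should be based on measured load and leave some headroom.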

Bridge Caches: Any topology can be used for the clustered caches that are part of a bridge. The caches at the two sites of one bridge can even use different topologies. The convention, however, is to use the same topology on both sites.

It is recommended that bridge caches have the same configuration, apart from topology, to avoid issues. For example, if a data source is configured on one cache, configure it on the other cache too, because operations are replicated from one site's cache to the other with the same specifications.
