Today's applications handle high transaction volumes, and they scale by running in load-balanced, multi-server deployments. Many of these applications also need to run in multiple data centers. One reason is disaster recovery: an active data center is backed by a passive data center, usually in a different geographical location. Another reason is that geographically distributed users get faster response times when each user accesses the data center in their own region.
Cache Needs WAN Replication Just Like Database
When applications run in multiple data centers, the cache needs to be replicated as well. The cache holds transient data, and if the cache ever goes down, that data is lost. Transient data could be your ASP.NET sessions, data the application has created, or aggregate data. Because transient data is not permanent, it is never saved in the database and is kept only in the cache.
The other data NCache keeps, for performance reasons, is application data. If you lose that data, it is reloaded from the database, but the reload carries a performance cost that many businesses are not willing to incur given the high traffic on their applications. This is why even application data has to be replicated across the WAN in a multi-datacenter deployment.
Solution: NCache WAN Replication through Bridge
To address the aforementioned situation, NCache provides a WAN replication feature through a bridge that allows you to deploy NCache in multiple data centers in an active-passive, active-active or 3+ data center active-active configuration.
2-Site Active-Passive
In an active-passive configuration, NCache is deployed on both the active and passive sites, and a bridge topology is created on the active site. Figure 1 shows how this is laid out. All application updates are sent from the active site cache to the bridge, which then sends them to the passive site asynchronously, typically within milliseconds. The only delay is the network latency between the data centers, which grows when they are far apart. Because the bridge topology queues updates asynchronously, the active site can keep accepting updates without waiting on the WAN.

2-Site Active-Active
The other configuration NCache supports is active-active, where both sites are active: one hosts the cache and the bridge topology, and the other hosts only the cache. Figure 2 shows how that is laid out. Just like active-passive, each cache sends its updates to the bridge, and the bridge sends them to the other cache. The difference is that a conflict is now possible if the same item is updated on both sites at the same time. The bridge therefore has to reconcile updates from both sites. It resolves conflicts asynchronously, so resolution does not hurt cache performance at either active site.

3+ Site Active-Active
The third situation is where you have three or more active data centers. Here, one of the active sites has the cache plus the bridge, and all the other sites have only the caches. This means that the bridge is local to one of the sites, and the other sites access the bridge remotely.
Just like the active-active scenario, in this 3+ active-active scenario, all the caches send their updates to the bridge, and the bridge propagates the updates to all the caches that are connected to it. At the same time, it does conflict resolution to ensure that the same data being updated by the caches does not cause any data integrity violation.
Figure 3 shows three active-active data centers. NCache replicates the cache seamlessly and asynchronously across the WAN. No application code changes are required, and your application does not even know that the cache is being replicated across data centers.

Parallel & Bulk Async WAN Replication
All updates to the caches are replicated asynchronously. The reason is the large distance between data centers, which adds latency. When the application and the cache are in the same data center, access across servers is very fast, but access across the WAN is usually much slower. With synchronous replication, whoever makes an update request has to wait until the update is applied remotely, and the application slows down noticeably.
With asynchronous replication, the application and the cache in each site do not wait for data to be replicated to the other data centers. Instead, the data is queued in the bridge. The bridge is itself a 2-node cluster, and it queues the update requests from all the caches. In an active-passive configuration, it queues requests from the active cache only and then applies them asynchronously to the passive cache. With 3 or more data centers, the bridge applies the updates to the other active sites in parallel.
Moreover, the bridge does bulk updates: it combines multiple data items into one bulk request to the other site, drastically reducing network trips. This combination of parallel, bulk, and async replication results in faster replication of the cache across multiple datacenters.
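The queue-then-flush behavior described above can be sketched in a few lines of C#. This is only an illustrative model, not NCache's implementation: `Update`, `BulkReplicatorSketch`, the batch size, and the delegate used to represent a target site are all hypothetical names and assumptions made for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// One queued cache update (hypothetical shape, not an NCache type).
public sealed record Update(string Key, string Value);

// Hypothetical model of the bridge queue; not NCache's implementation.
public sealed class BulkReplicatorSketch
{
    private readonly ConcurrentQueue<Update> _queue = new();
    private readonly int _batchSize;

    public BulkReplicatorSketch(int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
    }

    // Write path: only enqueues, so the caller never waits on the WAN.
    public void Enqueue(Update update) => _queue.Enqueue(update);

    // Removes up to one batch of queued updates, in arrival order.
    public List<Update> TakeBatch()
    {
        var batch = new List<Update>(_batchSize);
        while (batch.Count < _batchSize && _queue.TryDequeue(out var update))
            batch.Add(update);
        return batch;
    }

    // Drains the queue, sending each batch to all target sites in parallel.
    // Returns how many bulk requests each target received.
    public async Task<int> FlushAsync(
        IReadOnlyList<Func<IReadOnlyList<Update>, Task>> targets)
    {
        int bulkRequests = 0;
        while (true)
        {
            List<Update> batch = TakeBatch();
            if (batch.Count == 0) return bulkRequests;
            await Task.WhenAll(targets.Select(send => send(batch)));
            bulkRequests++;
        }
    }
}
```

In this model, five queued updates with a batch size of two cross the WAN as three bulk requests per target site instead of five individual ones, and the code that called `Enqueue` never blocked on any of them.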
Conflict Resolution
When you have multiple active sites, the same data could be updated on each of those sites at the same time. Because updates are ordered by timestamp, make sure the clocks on your machines are synchronized and correctly set for their local time zones. If the updates do not happen at the same time, there is no problem. Say you update an item at time T1 on one site and update the same item at a later time T2 on another site; the T2 update is the latest, so it is the one applied. However, if both updates are made at the same time, the bridge has to resolve the conflict in one of two ways:
- Default “Last Update Wins” Logic: The bridge gets the timestamp of the update from each of the caches, which is why all the caches must have synchronized clocks. Whichever update is the latest is applied to all the caches. The goal of conflict resolution is that the same version of the update is applied to all the caches, which prevents data integrity problems.
- Conflict Resolution Handler: Alternatively, you can provide a conflict resolution handler that applies your own logic to the content of the updates. This resolver is configured in NCache. The bridge gives it both copies of the object, and the resolver decides which version makes more sense according to your logic. It returns the chosen version to the bridge, which applies that update to all of the caches.
The following code snippet gives you a sample of the conflict resolver implementation which is deployed on the cache:
    public class Resolver : IBridgeConflictResolver
    {
        public void Init(System.Collections.IDictionary parameters) {. . .}

        public ConflictResolution Resolve(ProviderBridgeItem oldEntry, ProviderBridgeItem newEntry)
        {
            var conflictResolution = new ConflictResolution();
            switch (oldEntry.BridgeItemVersion)
            {
                case BridgeItemVersion.OLD:
                    { /* Replace item with new entry */ }
                    break;
                case BridgeItemVersion.LATEST:
                    { /* Keep old entry */ }
                    break;
                case BridgeItemVersion.SAME:
                    { /* Your logic to replace entry if needed */ }
                    break;
            }
            return conflictResolution; // Configure this implementation on cache
        }

        public void Dispose() {. . .}
    }
Hence, with a conflict resolution mechanism in NCache, you can rest assured that your caches stay in sync: they do not end up holding two different versions of the same update, and the data remains consistent across multiple datacenters.
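To make the default “Last Update Wins” rule concrete, here is a minimal sketch of timestamp-based resolution. It is not NCache's internal implementation: `VersionedUpdate` and `LastUpdateWins` are hypothetical names, timestamps are assumed to be UTC values taken from synchronized clocks, and the tie-break on site name is an assumption added so that exact ties resolve the same way on every site.

```csharp
using System;

// A replicated update as a bridge might see it (hypothetical shape).
public sealed record VersionedUpdate(
    string Key, string Value, DateTime UpdatedUtc, string Site);

public static class LastUpdateWins
{
    // Returns the update with the later timestamp. Exact ties are broken by
    // site name (an assumption of this sketch) so the result does not depend
    // on the order in which the two updates arrive.
    public static VersionedUpdate Resolve(VersionedUpdate a, VersionedUpdate b)
    {
        if (a.Key != b.Key)
            throw new ArgumentException("Updates must target the same key.");

        int byTime = a.UpdatedUtc.CompareTo(b.UpdatedUtc);
        if (byTime != 0) return byTime > 0 ? a : b;

        return string.CompareOrdinal(a.Site, b.Site) >= 0 ? a : b;
    }
}
```

Whatever rule is used, it needs to be deterministic: every site has to end up with the same winner, otherwise the caches diverge, which is exactly what conflict resolution is meant to prevent.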
Handling Disaster at Runtime
Now let’s talk about what happens in case of a disaster situation where one site goes down.
Active-Passive
Scenario 1: The passive site goes down
The active-passive configuration is intended for disaster recovery. So, if the passive site goes down, the active site keeps working and the application keeps running. The bridge cannot replicate data to the passive site until you intervene and bring the passive site back up. Once you do, the passive site reconnects with the bridge and re-syncs itself: it throws away its existing cache and gets an entire copy of the active site cache through the bridge. Once synchronization completes, it returns to normal WAN replication mode.
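The re-sync step is a full replacement rather than a merge. A minimal sketch of that idea, using plain dictionaries as stand-ins for the two caches (`ResyncSketch` is a hypothetical name, and the real transfer happens through the bridge, not by direct copy):

```csharp
using System.Collections.Generic;

public static class ResyncSketch
{
    // Replaces the recovering site's contents with a full copy of the
    // active site's contents. Stale local entries are discarded, not merged.
    public static void Resync(
        IDictionary<string, string> recovering,
        IReadOnlyDictionary<string, string> active)
    {
        recovering.Clear();
        foreach (var entry in active)
            recovering[entry.Key] = entry.Value;
    }
}
```

Discarding the old contents first means an entry that was removed on the active site while the passive site was down cannot survive on the recovered site.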
Scenario 2: The active site goes down
If the active site goes down, you have some sort of disaster: the bridge is down and the application is probably down too. You now have to send all traffic to the passive site, which becomes active. Your users do not see any interruption, because all the data was being replicated from the original active site to the original passive site, and all updates are now made there.
Once the original active site is up again, it connects to the new active site (the original passive site) and synchronizes itself completely. Once the sync is done, both sites are active-active, even though all of the traffic is still going to the original passive site. You then have the flexibility of shifting the traffic back to the original active site and changing the status of the acting active site back to passive in the bridge. NCache allows you to do all of this at runtime, without any interruption.
Active-Active
If the active site with the bridge goes down, WAN replication stops because the bridge has stopped. However, the other active site continues working and all your traffic is routed to that site. Once you bring the down site back up, you can start the bridge and connect the active site to it so that both sites are synchronized. This is all done at runtime, so no downtime is required, and once the sites are synced, the bridge ensures that both can propagate updates to each other. Hence NCache ensures fault tolerance is achieved.
3+ Active Sites
The third situation is when you have 3 or more active sites where one site has the bridge, and the others do not. Two scenarios can take place in this case:
Scenario 1: The non-bridge active site goes down
In this case, the other sites keep replicating with each other and the traffic of this site is re-routed to the other connected sites without the users being interrupted. Once the site is up, it reconnects to the bridge, re-synchronizes itself and thus gets the latest copy of the cache. This is the cue to start sending traffic to it again just like you had originally.
Scenario 2: The active site with the bridge goes down
In the case where the bridge goes down, the other active sites keep working, but they cannot replicate their updates to each other because there is no bridge. What you can do at this point is start a bridge in one of the remaining active sites.
To start the bridge, you need 2 nodes as a cluster. Ideally, you should have a dedicated set of servers for the bridge topology on that site, but you can also use two of the cache servers as a cluster since it’s most likely temporary. However, it may have some performance impact on those cache servers because now the bridge also consumes some resources like CPU and memory.
Start the bridge on one of the active sites that is still running. The running caches can then connect to this bridge and synchronize by replicating their updates to each other. Users do not experience any interruption, and because the sites are synchronized again, you do not lose data if another site goes down.
Now once the original site with bridge comes back up, you can simply:
- Take the bridge down on the temporary site
- Remove the bridge from the existing cache
- Bring the bridge up on the original site
- Connect all the caches to this new bridge
Since caches can be connected to a bridge at runtime, NCache allows you to automate all of this work through scripting, so you can seamlessly handle the situation where a bridge site goes down and later has to be brought back up.
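As an illustration of what such automation might look like, the sketch below runs the four fail-back steps in order against an abstract admin interface. Everything here is hypothetical: `IBridgeAdmin`, its method names, and `DryRunBridgeAdmin` are invented for this example and do not correspond to NCache's actual management tools, which you would call from your own implementation of the interface.

```csharp
using System.Collections.Generic;

// Hypothetical admin abstraction; not an NCache API. Implement it on top of
// whatever management tooling you actually script against.
public interface IBridgeAdmin
{
    void StopBridge(string bridgeId);
    void StartBridge(string bridgeId);
    void RemoveCache(string bridgeId, string cacheId);
    void AddCache(string bridgeId, string cacheId);
}

// Dry-run implementation that only records the calls it receives.
public sealed class DryRunBridgeAdmin : IBridgeAdmin
{
    public List<string> Log { get; } = new();
    public void StopBridge(string bridgeId) => Log.Add($"stop {bridgeId}");
    public void StartBridge(string bridgeId) => Log.Add($"start {bridgeId}");
    public void RemoveCache(string bridgeId, string cacheId) =>
        Log.Add($"remove {cacheId} from {bridgeId}");
    public void AddCache(string bridgeId, string cacheId) =>
        Log.Add($"add {cacheId} to {bridgeId}");
}

public static class BridgeFailback
{
    // Moves all caches from the temporary bridge back to the original one,
    // following the order of the four steps listed above.
    public static void Run(IBridgeAdmin admin, string temporaryBridge,
                           string originalBridge, IReadOnlyList<string> caches)
    {
        admin.StopBridge(temporaryBridge);                // step 1
        foreach (var cache in caches)
            admin.RemoveCache(temporaryBridge, cache);    // step 2
        admin.StartBridge(originalBridge);                // step 3
        foreach (var cache in caches)
            admin.AddCache(originalBridge, cache);        // step 4
    }
}
```

Running the sequence against `DryRunBridgeAdmin` first lets you review the exact order of operations in its `Log` before pointing the same code at real infrastructure.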
Conclusion
NCache gives you a very powerful WAN replication mechanism which allows you to handle a lot of different advanced scenarios. Moreover, it performs WAN replication in a very fast and efficient manner and handles active-passive, active-active or 3 or more active-active data centers.