Distributed Cache Architecture - NCache

Learn about the distributed cache architecture of NCache. NCache is a distributed cache for .NET and Java. It is an open source product released under the Apache 2.0 license. NCache helps applications linearly scale to handle extreme transaction loads.

NCache Architecture

Common Uses of NCache

There are three common uses of NCache as follows:

  1. Application Data Caching
  2. Web Application Caching (ASP.NET and Java)
  3. Runtime Data Sharing (RTDS)

Application Data Caching

The first use is application data caching. A distributed cache like NCache processes application data transactions in-memory and across multiple cache servers, so it can handle extreme transaction loads at very fast speeds. A database cannot do this, so it easily becomes an application data bottleneck. NCache is faster than a database (being in-memory) and scalable for any load (as it scales linearly), addressing both shortcomings of using a database without caching.
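
The usual way to apply this is the cache-aside pattern: read from the cache first and fall back to the database only on a miss. Below is a minimal sketch in Java. The calls (`CacheManager.getCache`, `cache.get`, `cache.insert`) are modeled on the shape of NCache's client API, but treat the exact names and signatures as assumptions and verify them against the NCache documentation; `loadFromDatabase` and the cache name are placeholders.

```java
// Cache-aside sketch: check the distributed cache first, hit the database only on a miss.
// API names below are assumptions modeled on NCache's client API; verify in the docs.
import com.alachisoft.ncache.client.Cache;
import com.alachisoft.ncache.client.CacheManager;

public class CustomerRepository {
    private final Cache cache;

    public CustomerRepository() throws Exception {
        cache = CacheManager.getCache("demoCache"); // cache name is hypothetical
    }

    public Customer getCustomer(String customerId) throws Exception {
        String key = "Customer:" + customerId;

        // 1. Try the in-memory distributed cache first: no database round trip.
        Customer customer = cache.get(key, Customer.class);
        if (customer != null) {
            return customer; // cache hit
        }

        // 2. Cache miss: load from the database (loadFromDatabase is a placeholder).
        customer = loadFromDatabase(customerId);

        // 3. Store it in the cache so subsequent reads are served from memory.
        cache.insert(key, customer);
        return customer;
    }

    private Customer loadFromDatabase(String customerId) {
        return new Customer(customerId); // stand-in for real data-access code
    }

    public record Customer(String id) {}
}
```

With this pattern in place, roughly 80% of reads end up being served from memory, which is where the database traffic reduction discussed later comes from.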

Web Applications Caching

The second common use case is caching for web applications, whether ASP.NET or Java (JSP/servlets). For ASP.NET apps you can cache the session state in either a single-site or a multi-site configuration. Multi-site means that if you have two data centers, a user can move from one data center to the other without losing their session; this is a feature that NCache provides.

For ASP.NET you can also cache ViewState. ViewState is a string that the server sends to the browser, and it can consume a lot of bandwidth, sometimes slowing response times significantly. Caching it on the server side improves performance and reduces bandwidth consumption, which translates into cost savings.

Another ASP.NET feature is output caching, where the page output is cached in NCache. The next time the page is requested, the cached output is returned instead of re-executing the page.

For a Java web application, you can cache or persist the Java servlet sessions in NCache the same way you would cache ASP.NET session state. None of these features require any programming; you just make configuration changes and NCache plugs in seamlessly.

Runtime Data Sharing (RTDS)

The third use case is something a lot of people do not know about: you can use NCache as a very powerful runtime data sharing (RTDS) platform. NCache provides an event-driven Pub/Sub data sharing model with powerful event notification and continuous query features. The NCache Product Features page contains more information on Runtime Data Sharing (RTDS).
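
To make the Pub/Sub model concrete, here is a minimal sketch modeled on NCache's messaging feature. Treat every name in it (`getMessagingService`, `getTopic`, `createSubscription`, `publish`, `Message`, `DeliveryOption`, the topic name, the callback shape) as an assumption based on the published shape of the NCache API; check the NCache Pub/Sub documentation for the exact Java signatures.

```java
// Pub/Sub sketch over NCache's messaging feature. All names and signatures here
// are assumptions; consult the NCache Pub/Sub docs before using them.
import com.alachisoft.ncache.client.Cache;
import com.alachisoft.ncache.client.CacheManager;
import com.alachisoft.ncache.runtime.caching.DeliveryOption;
import com.alachisoft.ncache.runtime.caching.Message;
import com.alachisoft.ncache.runtime.caching.Topic;

public class OrderEvents {
    public static void main(String[] args) throws Exception {
        Cache cache = CacheManager.getCache("demoCache"); // hypothetical cache name

        // Publishers and subscribers rendezvous on a named topic inside the cluster.
        Topic topic = cache.getMessagingService().getTopic("order-events");

        // Subscriber: the callback fires whenever any client publishes to the topic.
        topic.createSubscription((sender, msgArgs) ->
                System.out.println("Received: " + msgArgs.getMessage().getPayload()));

        // Publisher: every registered subscriber receives this message.
        topic.publish(new Message("order#1001 shipped"), DeliveryOption.All);
    }
}
```

The point is that the cache cluster itself acts as the message broker, so applications share data at runtime without any direct connections to each other.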

High Availability Through Self-Healing Dynamic Cluster

Let's now look at the self-healing dynamic clustering aspect of NCache. Figure 1 shows how NCache is deployed in the enterprise. NCache usually forms a cluster of two or more cache servers. Two is the minimum we recommend for redundancy, although NCache can also run on a single server.

Within the cache cluster, NCache pools the memory and CPU resources of all cache servers into one logical capacity. This means that if you add more servers you are increasing the memory and CPU capacity. This is how NCache scales in a linear fashion (which a database cannot).

In addition to having a minimum of two cache nodes, we also recommend a 4-to-1 or 5-to-1 ratio between the application tier and the caching tier. This is not a hard requirement but a recommendation based on what we have seen over the years as the optimal deployment configuration.

Figure 1: NCache in the Enterprise

When you start using NCache for application data caching, you should expect to reduce your database traffic by about eighty percent (80%), because that is roughly how much of your data you will end up caching.

And because the cache never becomes a bottleneck (you can always add more servers as traffic grows), your application never hits a scalability ceiling.

Let's look at the self-healing dynamic cluster details in Figure 2. By 'cluster' I do not mean Windows clustering. NCache has its own TCP-based cluster with a peer-to-peer architecture: there is no single point of failure, no master/slave arrangement, and no majority rule. NCache allows cache servers to be added or removed at runtime without stopping either the cache or the application. NCache also provides connection failover support, which means that if any cache server goes down, none of the clients connected to it stop; they all connect to other servers seamlessly. Moreover, the cluster membership is adjusted automatically, which is why the cluster is called 'self-healing'.

The dynamic configuration aspect of NCache means you do not have to hard-code configuration information in the clients. As soon as a client connects to any cache server, the cluster sends it the cluster membership, the topology, and various other configuration information. In this way the client always has the most up-to-date information when anything changes, and all of this happens at runtime.

Figure 2: High Availability Through Self-Healing Dynamic Clustering

For example, in Figure 2, if you add a new server (the third server at the bottom left of Figure 2) to the cluster, the cluster membership information changes because there is now a third server in the cluster. The new membership information is propagated to all connected clients (NCache clients), and the clients can connect to the newly added server if they want to. The word 'client' here does not mean your application but rather the NCache client API, which has this intelligence built into it.

Caching Topologies

Let’s look at the caching topologies, which determine the data storage, client connection, and replication strategies. There are four topologies we will be talking about.

  1. Mirrored Cache Topology
  2. Replicated Cache Topology
  3. Partitioned Cache Topology
  4. Partition-Replica Cache Topology

Mirrored Cache (2-Node Active/Passive)

As the name suggests, this is a two-node active/passive topology in which the two nodes mirror each other. All the clients connect to the active node, which holds the entire cache; the passive node holds a copy of it. The clients perform their read and write operations on the active node, and whatever they update on the active node is asynchronously backed up to the passive node. If the active node ever goes down, the passive node automatically becomes active and all the clients automatically move to the new active node. This is how high availability is provided.
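
The following is a conceptual sketch of that failover behavior, not NCache code (the real NCache client handles all of this internally). It only illustrates the idea: operations go to the active node, writes are backed up to the passive node, and when the active node dies the passive node is promoted.

```java
// Conceptual active/passive failover sketch (not NCache code).
interface CacheNode {
    boolean isAlive();
    void put(String key, Object value);
    Object get(String key);
}

class MirroredPair {
    private CacheNode active;   // serves all client operations
    private CacheNode passive;  // keeps a copy of the cache

    MirroredPair(CacheNode a, CacheNode p) { active = a; passive = p; }

    void put(String key, Object value) {
        if (!active.isAlive()) failover();   // clients move to the new active node
        active.put(key, value);
        passive.put(key, value);             // asynchronous backup in the real topology
    }

    Object get(String key) {
        if (!active.isAlive()) failover();
        return active.get(key);
    }

    private void failover() {
        // The passive node already holds the full cache, so it simply becomes active.
        CacheNode promoted = passive;
        passive = active;   // the failed node can rejoin as the new passive later
        active = promoted;
    }
}
```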

The limitation of this topology is that you cannot add more than two nodes, but it has its uses when you do not have many servers available; both reference and transactional data can be served quickly. Transactional data means a lot of updates (write operations) are being performed, and even in that case it is a very fast topology.

Figure 3: Mirrored Cache

Replicated Cache

The second topology is the Replicated Cache. In this topology you can have more than two servers in the cluster, so it does not have the two-node limitation of the Mirrored Cache. All nodes are active, which means each has its own client connections, and every node holds a complete copy of the entire cache. So more servers mean more copies of the entire cache.

This topology is great for read performance, but the storage capacity does not increase as you add more servers. Moreover, since every node is active, updates have to be applied synchronously. NCache uses a token- or sequence-based algorithm to ensure that all updates are applied in the same sequence throughout the cluster before control is handed back to the client. This synchronous nature is what slows updates down, so this topology is not recommended for highly scalable update workloads; if you need more read scalability, however, it is a very good topology.

Let me give you an example of this topology from Figure 4. If a client (say, the top-left cache client in Figure 4) wants to update item number three on Server 1, it asks Server 1 to update item number three. Server 1 first obtains a unique sequence number from within the cluster.

Figure 4: Replicated Cache

This sequence number determines the update's place in the queue of all updates to be applied on the other servers. Server 1 then notifies all the other servers, passing along the sequence number, and those servers in turn apply the same update in the same sequence.

That way, if another client's update reaches a server first but carries a later sequence number, it still has to wait, even though it arrived first. This is how consistency of updates is ensured. Once all the servers return success, the operation is considered successful; if any server returns a failure, the operation fails. The client waits for this entire process, which is why it is called a synchronous update and why updates are not as fast as in other topologies.
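
Here is a conceptual sketch of that ordering idea, not NCache internals: a cluster-wide sequencer hands out increasing numbers, and every node applies updates strictly in that order, buffering any update that arrives early.

```java
// Sequence-ordered update application (conceptual sketch, not NCache internals).
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class SequencedReplica {
    // Stand-in for the cluster-wide sequence source (one per cluster, not per node).
    static final AtomicLong SEQUENCER = new AtomicLong();

    private final Map<String, Object> store = new ConcurrentHashMap<>();
    private final TreeMap<Long, Runnable> pending = new TreeMap<>(); // early arrivals
    private long nextToApply = 1; // next sequence number this node may apply

    // Called on every node for every update, possibly out of arrival order.
    synchronized void applyUpdate(long seq, String key, Object value) {
        pending.put(seq, () -> store.put(key, value));
        // Drain in strict sequence order; a later-numbered update waits even
        // if it arrived first, exactly the behavior described above.
        while (!pending.isEmpty() && pending.firstKey() == nextToApply) {
            pending.pollFirstEntry().getValue().run();
            nextToApply++;
        }
    }
}
```

In this sketch, the server coordinating an update would call `SEQUENCER.incrementAndGet()` once and broadcast the resulting `(seq, key, value)` triple to every node, so all copies converge to the same state in the same order.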

Partitioned Cache and Partition-Replica Cache

The third and fourth topologies are the Partitioned Cache and the Partition-Replica Cache. They are identical, except that Partition-Replica keeps a replica of every partition.

Partitioned Cache means that every server holds a partition of the cache, one partition per server. There are about a thousand buckets in the cache cluster, and every partition holds an equal share, i.e. 1/nth of the buckets; if you have two partitions, each has 500 buckets. A distribution map (you can think of it as a bucket map) is created at runtime and records which buckets live in which partition. This distribution map is propagated to all clients, and they use it to connect to all the servers. Every client keeps a connection with every server, so it can reach data in a single hop, going directly to where the data is.

Using Figure 5 as a reference, if a client (say, the top-left cache client in Figure 5) wants item number one, it goes to partition one on Server 1. If it wants item number three, it goes directly to partition two on Server 2. The distribution map is what enables clients to do this.
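
A small sketch makes the mechanism clear. This is not NCache internals, just the idea: keys hash into a fixed number of buckets (about a thousand, per the text above), and the distribution map tells the client which server owns each bucket.

```java
// Distribution-map lookup sketch (conceptual, not NCache internals).
import java.util.HashMap;
import java.util.Map;

class DistributionMap {
    static final int BUCKET_COUNT = 1000;
    private final Map<Integer, String> bucketToServer = new HashMap<>();

    // Two partitions: buckets 0-499 on Server1, buckets 500-999 on Server2.
    DistributionMap() {
        for (int b = 0; b < BUCKET_COUNT; b++) {
            bucketToServer.put(b, b < 500 ? "Server1" : "Server2");
        }
    }

    // The client hashes the key to a bucket and goes straight to that server.
    String serverFor(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKET_COUNT);
        return bucketToServer.get(bucket);
    }
}
```

For instance, `new DistributionMap().serverFor("Customer:1001")` tells the client exactly which server to contact, so every read or write is a single network hop.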

Figure 5: Partitioned Cache

The distribution map does not change unless you are adding or removing partitions or performing data load balancing; in other words, it only changes when buckets move from one partition to another.

Partitioned Cache is extremely fast; it is our fastest topology. Partition-Replica is equally fast for read operations, just like Partitioned Cache, but write operations are slightly slower because the replica on another node also has to be updated.

In Partition-Replica, every partition is replicated onto a different server. The replicated partitions ('replicas') are passive, which means no clients connect to them. All clients talk to the active partitions, and the active partitions propagate updates to their replicas. Replicas are created automatically in this topology: as soon as you add a new partition, a replica for it is created somewhere else in the cluster.

Figure 6: Partition-Replica Cache

Asynchronous replication between the active and replica nodes is the default for this topology. The copying from partition to replica is done as bulk operations, which is really fast. However, async replication uses a queue, so if a server were to go down, the queue would be lost along with any updates in it that had not yet made it to the replica. If you have financial data where you cannot afford to lose anything, that risk matters.

With synchronous replication you do not have that risk: as part of each operation, the partition and its replica are updated together, and the operation completes only when both copies have been updated; otherwise an exception is thrown back to the client. So synchronous replication guarantees operation completion, and data loss is one less thing to worry about.
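
The contrast between the two modes can be sketched as follows. This is conceptual code, not NCache's implementation: async queues updates and flushes them in bulk (fast, but updates still in the queue are lost if the active node dies), while sync updates the replica as part of the operation itself.

```java
// Async vs. sync replication to a replica (conceptual sketch, not NCache code).
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class PartitionWithReplica {
    private final Map<String, Object> partition = new ConcurrentHashMap<>();
    private final Map<String, Object> replica   = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<String, Object>> replQueue =
            new LinkedBlockingQueue<>();

    // Default mode: control returns before the replica is updated.
    void putAsync(String key, Object value) {
        partition.put(key, value);
        replQueue.add(Map.entry(key, value)); // drained in bulk by a background task;
                                              // lost if this node dies before the flush
    }

    // Sync mode: the operation completes only after both copies are updated.
    void putSync(String key, Object value) {
        partition.put(key, value);
        replica.put(key, value); // in reality a network call; failure -> exception
    }
}
```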

Another thing to keep in mind is that the data balancing we discussed for the Partitioned Cache is also applied to the replicas in this case. When buckets in the partitions are rearranged, the replica buckets have to be rearranged too, because a replica is a copy of its partition.

The Caching Topologies page here provides more information on this topic. Also, for WAN replication via the Bridge Topology, read more details here.

Data Load Balancing

Let’s quickly see how data load balancing is done. Suppose I am adding a new server to an existing two-node NCache cluster, which means there are currently two partitions and two replicas. A data load balancing request can be issued on a node in a partitioned cluster without waiting for the automatic load-balancing task to trigger it. This brings the data load on that node close to the average data per node. Candidate nodes for accepting the load are selected based on the amount of data currently present on them; a node with less data gets a larger share of the load.


Adding a third server at runtime creates a new partition and allocates one third of the buckets to it, taking some from Server 1 and some from Server 2 (assuming we are adding Server 3). After this, a replica for the new partition is created on Server 1, and the replica of Server 2's partition is relocated from Server 1 to Server 3. All of this happens behind the scenes, and none of your operations are interrupted while it does. The movement of buckets is also highly optimized to avoid duplicated effort: if some data ultimately belongs in Replica 3, it does not move back and forth between Replica 1 and Replica 3.
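
The rebalancing arithmetic can be sketched as below. This is not NCache's algorithm, just the idea described above: with 1000 buckets and a third server joining, each partition should end up with roughly 333 buckets, and only the minimum number of buckets should move.

```java
// Bucket rebalancing sketch (conceptual, not NCache internals).
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Rebalancer {
    // map: bucket number -> owning server; servers: full server list after the change.
    static Map<Integer, String> rebalance(Map<Integer, String> map, List<String> servers) {
        int perServer = map.size() / servers.size();   // fair share, ~333 of 1000
        Map<String, Integer> counts = new HashMap<>();
        Map<Integer, String> result = new TreeMap<>(map);

        // Mark each bucket beyond a server's fair share as movable "excess".
        Deque<Integer> excess = new ArrayDeque<>();
        for (Map.Entry<Integer, String> e : result.entrySet()) {
            if (counts.merge(e.getValue(), 1, Integer::sum) > perServer) {
                excess.add(e.getKey());
            }
        }

        // Hand excess buckets to underloaded servers (e.g., the new Server 3).
        for (String s : servers) {
            while (counts.getOrDefault(s, 0) < perServer && !excess.isEmpty()) {
                int bucket = excess.poll();
                counts.merge(result.get(bucket), -1, Integer::sum); // old owner shrinks
                result.put(bucket, s);
                counts.merge(s, 1, Integer::sum);
            }
        }
        return result;
    }
}
```

Starting from the two-partition map in the earlier sketch (buckets 0-499 on Server1, 500-999 on Server2) and calling `rebalance(map, List.of("Server1", "Server2", "Server3"))` moves about 167 buckets from each existing partition to Server3, leaving every partition near the 333-bucket average.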


Client Cache

Let's now talk about another feature called Client Cache, sometimes also called a near cache. A client cache is essentially a local cache that sits on the application server, i.e. the cache client machine. It can either be in-proc, meaning it runs inside the application process, or out-proc, in which case it is a separate local process. In-proc is super fast. Although it is a local cache, it is not a standalone cache: it stays synchronized with the cache cluster.

The synchronization is based on event notifications. If data that is present in a client cache changes in the clustered cache (say it is updated by another client), the clustered cache knows which client caches hold which data, so it notifies only those specific client caches to update themselves with that data. That is how the client cache stays synchronized. The other aspect of the client cache is that no coding is needed on your part to use it.

You keep making the same NCache API calls in your program. Then, through configuration, you plug in the client cache, which intercepts your calls to the clustered cache. It checks the client cache first; if the data is not there, it fetches it from the clustered cache, puts it in the client cache, and hands it to you. The next time you want that data, it is found in the client cache, as sketched below.
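
Here is a conceptual sketch of that read path. It is not NCache code (NCache wires this in through configuration with no code changes); it only illustrates the near-cache idea and the event-driven update hook described above.

```java
// Near-cache read path sketch (conceptual, not NCache code).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface ClusteredCache { Object get(String key); }

class NearCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>(); // client cache
    private final ClusteredCache cluster; // stands in for the remote cache cluster

    NearCache(ClusteredCache cluster) { this.cluster = cluster; }

    Object get(String key) {
        // 1. Local hit: no network trip at all.
        Object value = local.get(key);
        if (value != null) return value;

        // 2. Local miss: fetch from the cluster and keep a local copy for next time.
        value = cluster.get(key);
        if (value != null) local.put(key, value);
        return value;
    }

    // Invoked by the cluster's event notification when an item changes elsewhere;
    // this is how the local copy stays synchronized, as described in the text.
    void onItemUpdated(String key, Object newValue) {
        local.put(key, newValue);
    }
}
```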

Figure 7: Client Cache

The client cache is a very powerful concept, and it is very good for read-intensive workloads. It is essentially a cache on top of a cache, just as a clustered cache is a cache on top of your database. ASP.NET session state, for example, is not a read-intensive workload because it has an almost equal number of read and write operations, so do not use the client cache for that. Do use the client cache for application data caching, where you read far more than you write; that is a very good use case for it.

To find out more about Client Cache, please click here.

NCache Benchmarks

Let's turn to the benchmarks from our website, which were recorded in our QA labs. These are not very high-end labs, so performance in your environment is most likely to be much faster; that is what we see with our customers too.

We use a 1 KB data size and a string key. These are server-side benchmarks, meaning they measure the capacity of the cache cluster itself. With a two-node mirrored cache, for example, you can do about 26,000 reads per second and 20,000 writes per second.

Mirrored Cache Benchmarks

With a two-node replicated cache you can achieve 53,000 reads per second, twice as much as mirrored, but writes are only about 6,000 per second because of their synchronous nature. With three nodes, writes run at almost half the speed of two nodes.

Replicated Cache Benchmarks

The Partitioned Cache has the fastest reads, with writes at almost the same performance: 54,000 per second for two nodes, 81,000 per second for three nodes, and 108,000 writes per second for four nodes.

Partitioned Cache Benchmarks

For Partition-Replica, reads are as fast as the Partitioned Cache, but writes are slightly slower. They are still very fast: 32,000 writes per second for two nodes, 48,000 per second for three nodes, and 64,000 per second for four nodes. These are linear increments, so as you add more servers you increase capacity.

Partition-Replica Cache Benchmarks

For Partition-Replica with synchronous replication, writes are not as fast as with async replication, but they are still quite fast: 14,000 per second for two nodes, 20,000 for three nodes, and 26,000 for four nodes.

Partition-Replica Cache (Sync) Benchmarks

This is the end of my presentation. I recommend that you go to our website and download NCache. I suggest the Enterprise edition, which comes with a 60-day fully working trial; if you want the open source edition, you can download it from there as well. You can also get the source code from GitHub, or contact us for an online demo.

What to Do Next?