In-Memory Data Grid Topologies - TayzGrid

In-Memory Data Grid topology plays a vital role in the data storage strategy of a cluster. Based on your environment, TayzGrid lets you choose from a rich set of In-Memory Data Grid topologies. The goal is to support everything from very small (two-server) data grids to very large clusters consisting of hundreds of servers. TayzGrid uses the TCP protocol and forms its own data grid cluster. It provides the following data grid topologies:

  1. Mirrored Cache (2-server active/passive)
  2. Replicated Cache
  3. Partitioned Cache
  4. Partitioned-Replica Cache
  5. Client Cache (a Local Cache connected to a Clustered Cache)

More details about each of these data grid topologies are given below.

Reference Data versus Transactional Data

Reference data is stored in the data grid once and read over and over again, resulting in many more reads than writes. In contrast, transactional data is updated as frequently as it is read (or very close to it). Originally, data grids were considered good only for storing reference data, but In-Memory Data Grids are now proving to be faster and more scalable than databases even for transactional data.

An example of reference data is a product catalog stored in the data grid, where product prices might change only once a day. In contrast, Web Session persistence in an In-Memory Data Grid is an example of transactional data use.

Reference data can be handled well with any data grid topology but only some topologies are optimal for handling transactional data. To determine the best topology for your environment, you need to determine the number of updates you'll be performing on your data. Here are a few recommendations:

  1. Mirrored Cache:
    • Reference data: Good. Max 30,000 reads/sec (1 KB object size).
    • Transactional data: Good. Max 25,000 writes/sec (1 KB object size).
  2. Replicated Cache:
    • Reference data: Good. 30,000 reads/sec per server. Grows linearly as you add servers.
    • Transactional data: Not so good. Max 2,500 writes/sec. Throughput drops if you add a third server.
  3. Partitioned Cache (no replication):
    • Reference data: Good. 30,000 reads/sec per server. Grows linearly as you add servers.
    • Transactional data: Good. 25,000 writes/sec per server. Grows linearly as you add servers.
  4. Partitioned-Replica Cache (with replication):
    • Reference data: Good. 30,000 reads/sec per server. Grows linearly as you add servers.
    • Transactional data: Good. 20,000 writes/sec per server. Grows linearly as you add servers.

As you can see, for a 2-server cluster the reference-data performance of the Mirrored Cache and Partitioned-Replica topologies is the same. If you have one dedicated data grid server and the replication server is shared with other applications, then the Mirrored Cache topology is suitable. But if you need three or more data grid servers, then Partitioned-Replica Cache is the best choice for transactional use.

Mirrored Cache

A Mirrored Cache is an active/passive cluster of two data grid servers. All clients connect only to the active data grid server and perform their read and write operations on it. Every update performed on the data grid (add, insert, remove) is also applied to the passive server. These changes are applied in the background and in bulk, so clients do not have to wait for the passive server to be updated. After updating the active server, the data grid returns control to the client, and a background thread updates the passive server.

Mirrored Cache

Because of this, a Mirrored Cache gets a significant performance boost over a Replicated Cache of the same cluster size. A Mirrored Cache is about as fast as a Local Cache, which has no clustering cost, yet it is more reliable because the data is replicated in case the active data grid server goes down.

If the active server goes down, the passive server automatically becomes active, and all clients of the old active server automatically connect to the new active server. This happens without any interruption to your application. When the previously active server comes back up, it becomes the passive server, since an active server already exists.

Replicated Cache

Two or more In-Memory Data Grid servers form a Replicated Cache cluster. Each one of the data grid servers contains all of the data in the data grid. Any update performed on any server is propagated in a synchronous manner to all the other servers within the cluster. In the Replicated Caching topology, all updates to the data grid are made as atomic operations. This means that either all data grid servers are updated or none are updated.

Replicated Cache provides extremely fast GET performance. This is because every data grid server that the client connects to always has all the data. Therefore, data is locally available for all GET operations on that data grid server. This boosts the GET speed. However, the cost of an update operation is not scalable if you add more servers to a Replicated Cache data grid cluster.

Replicated Cache

A Replicated Cache is ideal for reference data because read capacity can be increased linearly by adding more servers to the cluster. For transactional data, however, it is not as fast as the other data grid topologies; it can still be used in small configurations (e.g. a 2-server cluster) when the overall update load is modest. The performance of UPDATE operations actually drops in larger configurations.

A sequence-based synchronization algorithm is used to perform all UPDATES in a Replicated Cache. When a client issues an update request to a data grid server, that data grid server contacts the cluster coordinator first in order to obtain a unique sequence number. Then, it submits the UPDATE operation to all other servers in the cluster with the provided sequence number.

All update operations are based on the sequence number to eliminate possible data integrity issues. Therefore, if an update operation reaches a data grid server for execution but an earlier operation has not yet been performed, this update waits until all operations with smaller sequence numbers are performed. This way consistency of updates across multiple machines is ensured.
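The ordering rule described above can be sketched as follows. This is an illustrative simulation (the class and method names are hypothetical, not TayzGrid's actual code): each replica buffers incoming updates and applies them strictly in coordinator-issued sequence order, holding back any update whose predecessors have not yet arrived.

```java
import java.util.*;

// Hypothetical sketch of sequence-ordered update application on one
// Replicated Cache server. Names are illustrative, not TayzGrid classes.
public class SequencedReplica {
    private final Map<String, String> store = new HashMap<>();
    private final SortedMap<Long, String[]> pending = new TreeMap<>();
    private long nextSeq = 1; // next sequence number we may apply

    // Receive an update stamped with a coordinator-issued sequence number.
    // Out-of-order updates wait until all earlier ones have been applied.
    public synchronized void receive(long seq, String key, String value) {
        pending.put(seq, new String[] { key, value });
        while (!pending.isEmpty() && pending.firstKey() == nextSeq) {
            String[] op = pending.remove(pending.firstKey());
            store.put(op[0], op[1]);
            nextSeq++;
        }
    }

    public synchronized String get(String key) { return store.get(key); }

    public static void main(String[] args) {
        SequencedReplica r = new SequencedReplica();
        r.receive(2, "price", "11");        // arrives early: buffered only
        System.out.println(r.get("price")); // still null, waiting for seq 1
        r.receive(1, "price", "10");        // unblocks seq 2 as well
        System.out.println(r.get("price")); // both applied in order -> 11
    }
}
```

Because every server applies the same operations in the same sequence order, all copies of the data converge to the same state even when network delivery is out of order.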

Partitioned Cache

A Partitioned Cache is a very scalable data grid topology and is therefore intended for larger In-Memory Data Grid clusters. The cost of a GET or UPDATE operation stays fairly constant regardless of the size of the data grid cluster, for two reasons. First, a Hash Map algorithm (similar to a Hashtable) partitions the data. A distribution map is created and sent to all the clients; it tells each client which partition has, or should have, the data. This lets the client fetch the data it is looking for directly from the data grid server that has it.
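The client-side distribution map can be sketched as follows. This is a simplified illustration under assumed names (TayzGrid's actual distribution map is more sophisticated): the key's hash selects a bucket, and the map tells the client which server owns that bucket, so the client reaches the right server in one hop.

```java
import java.util.*;

// Illustrative sketch (not TayzGrid's actual code) of a client-side
// distribution map: the key's hash selects a bucket, and the map tells
// the client which server owns that bucket.
public class DistributionMap {
    private final String[] bucketOwner; // bucket index -> server name

    public DistributionMap(List<String> servers, int buckets) {
        bucketOwner = new String[buckets];
        for (int b = 0; b < buckets; b++)
            bucketOwner[b] = servers.get(b % servers.size());
    }

    // The client resolves the owning server locally - one network hop.
    public String serverFor(String key) {
        int bucket = Math.abs(key.hashCode() % bucketOwner.length);
        return bucketOwner[bucket];
    }

    public static void main(String[] args) {
        DistributionMap map =
            new DistributionMap(Arrays.asList("srv1", "srv2", "srv3"), 1024);
        // The same key always resolves to the same server.
        System.out.println(map.serverFor("customer:42")
                .equals(map.serverFor("customer:42"))); // true
    }
}
```

When servers join or leave, the cluster only needs to rebalance buckets and push a fresh distribution map to the clients, which is what makes this topology grow linearly.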

Partitioned Cache

Second, all UPDATE operations are performed on only one data grid server, so no sequencing logic is required. Obtaining a sequence number adds an extra network round-trip in most cases.

As a result, UPDATE operations in this topology are much faster than in a Replicated Cache, and they remain fast regardless of the size of the data grid cluster. This makes Partitioned Cache a highly scalable topology.

However, please remember that no replication takes place in a Partitioned Cache. Therefore, if any data grid server goes down, you lose the data on that node. This might be acceptable in many object caching situations, but not when the In-Memory Data Grid is your main data repository and the data does not exist in any master data source. An excellent example of this is Web Session persistence in the data grid.

Partitioned-Replica Cache

Partitioned-Replica Cache is a hybrid of Partitioned Cache and Replicated Cache that combines the best features of both: reliability through data replication and scalability through data partitioning. Even with more than two data grid servers in the cluster, the data is replicated only once rather than on every other server. This means that only two copies of the data exist regardless of the size of the In-Memory Data Grid cluster, which allows scaling out through partitioning.

Data distribution in Partitioned-Replica Cache uses the same Hash Map algorithm as Partitioned Cache. Each partition is replicated to another server in the cluster as a passive Replica. Every server therefore contains one active partition that is accessible to clients, and one passive replica of another server's active partition that is not directly accessible to any client while it remains in passive mode. A Replica can only be accessed by its active Partition. If the active Partition goes down (for example, if its server goes down), its replica becomes an active partition and starts handling client requests.
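One common way to lay out such a topology is a ring, sketched below. This is a hypothetical placement scheme for illustration (TayzGrid's actual replica placement may differ): each server hosts its own active partition plus a passive replica of the next server's partition, so exactly two copies of every partition exist.

```java
import java.util.*;

// Hypothetical sketch of a Partitioned-Replica ring layout: server i
// hosts active partition i plus a passive replica of partition i+1,
// so exactly two copies of every partition exist in the cluster.
public class PartitionReplicaLayout {
    // Returns server -> { active partition, partition it holds a replica of }.
    public static Map<String, int[]> layout(int servers) {
        Map<String, int[]> m = new LinkedHashMap<>();
        for (int i = 0; i < servers; i++)
            m.put("srv" + i, new int[] { i, (i + 1) % servers });
        return m;
    }

    // On failure of server `failed`, the server holding its replica
    // promotes that partition to active.
    public static int promotedBy(int failed, int servers) {
        return (failed - 1 + servers) % servers; // previous server in the ring
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> e : layout(3).entrySet())
            System.out.println(e.getKey() + " active=" + e.getValue()[0]
                    + " replicaOf=" + e.getValue()[1]);
        // If srv1 fails, srv0 holds the replica of partition 1 and promotes it.
        System.out.println("promoted by srv" + promotedBy(1, 3));
    }
}
```

Note that adding a server only shifts one replica assignment in the ring; it never increases the number of copies, which is why write capacity keeps scaling.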

Partitioned-Replica Cache

Partitioned-Replica Cache replicates data in two ways: asynchronously and synchronously. By default, data is replicated asynchronously, which is very fast because the client does not have to wait for the update on the replica to complete before regaining control.

In this mode, all updates destined for the replica are queued on the active partition's server, and a background thread applies them to the replica in bulk. Although this is extremely fast, there is a possibility of very small data loss: if the active partition's server crashes unexpectedly, the small amount of data that has not yet been applied to the replica is lost. This is acceptable in most cases, but in some applications (for example, financial data applications) the data is too sensitive to lose.
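The queue-and-flush mechanism can be sketched as below. This is an illustrative simulation with assumed names (not TayzGrid code): `put` enqueues the update and returns immediately, while a background thread drains the queue to an in-process stand-in for the replica in bulk.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of asynchronous replication: the active partition
// queues updates and returns to the caller immediately, while a
// background thread drains the queue to the replica in bulk.
public class AsyncReplicator {
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
    private final Map<String, String> replica = new ConcurrentHashMap<>();

    public AsyncReplicator() {
        Thread flusher = new Thread(() -> {
            List<String[]> batch = new ArrayList<>();
            try {
                while (true) {
                    batch.add(queue.take());   // wait for at least one update
                    queue.drainTo(batch);      // then grab the rest in bulk
                    for (String[] op : batch) replica.put(op[0], op[1]);
                    batch.clear();
                }
            } catch (InterruptedException ignored) { }
        });
        flusher.setDaemon(true);
        flusher.start();
    }

    // Client write: enqueue for the replica and return without waiting.
    public void put(String key, String value) {
        queue.add(new String[] { key, value });
    }

    // Poll the replica until the value shows up or the timeout expires.
    public String awaitReplicaGet(String key, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        String v;
        while ((v = replica.get(key)) == null
                && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
        return v;
    }

    public static void main(String[] args) {
        AsyncReplicator r = new AsyncReplicator();
        r.put("price", "10"); // returns immediately; replica catches up shortly
        System.out.println(r.awaitReplicaGet("price", 2000));
    }
}
```

The window of possible data loss described above is exactly the contents of `queue` at the moment the active server crashes.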

For such cases, synchronous replication is also provided for Partitioned-Replica Cache. In this mode, both the active partition and its replica are updated as part of each atomic operation. This affects performance a little, but it is still much faster and more scalable than Replicated Cache.

Partitioned-Replica Cache is much faster than Replicated Cache because sequence-based data synchronization is not required for replication: the replica is passive, and only its active partition updates it.

In Replicated Cache, UPDATE performance and capacity drop as the number of servers in the cluster grows. The same does not hold true for Partitioned-Replica Cache, because only two copies of the data are kept in the cluster regardless of its size. Therefore, adding more servers increases UPDATE capacity linearly.

Client Cache

A Client Cache is the same as a Near Cache: it is local to your web server or application server and stores frequently used data even closer to your application than the In-Memory Data Grid cluster. A Client Cache therefore boosts the performance and scalability of your application even further. It works independently of the clustering topology used in the In-Memory Data Grid (Mirrored, Replicated, Partitioned, or Partitioned-Replica Cache).

A Client Cache is best used for reference data only (it should not be used for Web Session persistence). Because every write updates both the Client Cache and the In-Memory Data Grid cluster, writes are slightly slower than they would be without a Client Cache. However, reads from the Client Cache are significantly faster than reads from the In-Memory Data Grid cluster (particularly if the InProc setting for the Client Cache is enabled).

Client Cache

The Client Cache is a stand-alone local cache on the web server or application server, with one difference: it is also connected to the In-Memory Data Grid cluster for synchronization. This synchronization ensures that the Client Cache is notified of any changes to data residing in the data grid. For example, if some data is updated in the data grid cluster by another application or application instance, the In-Memory Data Grid cluster notifies the Client Cache to update its copy of that data.

A Client Cache is not part of the data grid cluster. Therefore, the Client Cache is notified of updates that have already happened in the cluster only after a very small delay (a few milliseconds). This can create a situation where data in the Client Cache is not consistent with the In-Memory Data Grid cluster. To handle this, two types of synchronization between the Client Cache and the clustered cache are provided:

  1. Optimistic Synchronization: In this approach, the application is flexible about the data integrity of the Client Cache and simply fetches whatever data is present in it. As a result, the data in the Client Cache may be slightly stale if the In-Memory Data Grid's update event notification has not yet reached the Client Cache.
  2. Pessimistic Synchronization: In this approach, the application applies strict data integrity rules and always issues a "GetIfNewerVersion()" call to the In-Memory Data Grid cluster before reading data from the Client Cache. If no newer version of the cached item is available, then "null" is returned and the local copy is used; if the data has changed in the data grid cluster, the updated data is returned. This is slightly slower than optimistic synchronization but still faster than having no Client Cache.
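The pessimistic read path can be sketched as follows. This is a hypothetical simulation: only the "GetIfNewerVersion()" name comes from the text above, while the version-number mechanics and all other names are illustrative assumptions, with an in-process map standing in for the clustered cache.

```java
import java.util.*;

// Hypothetical sketch of pessimistic Client Cache reads. Only the
// GetIfNewerVersion concept comes from the docs; version numbers and
// all other names here are illustrative assumptions.
public class PessimisticClientCache {
    static class Entry {
        final long version; final String value;
        Entry(long v, String s) { version = v; value = s; }
    }

    private final Map<String, Entry> clientCache = new HashMap<>();
    private final Map<String, Entry> grid = new HashMap<>(); // stands in for the cluster

    // Simulates another application updating the data grid cluster.
    public void gridPut(String key, String value) {
        Entry old = grid.get(key);
        grid.put(key, new Entry(old == null ? 1 : old.version + 1, value));
    }

    // Grid-side check: returns null when the client already holds the
    // latest version, otherwise returns the newer entry.
    private Entry getIfNewerVersion(String key, long clientVersion) {
        Entry e = grid.get(key);
        return (e != null && e.version > clientVersion) ? e : null;
    }

    // Pessimistic read: always ask the grid before trusting the local copy.
    public String get(String key) {
        Entry local = clientCache.get(key);
        Entry newer = getIfNewerVersion(key, local == null ? 0 : local.version);
        if (newer != null) {                 // grid has fresher data: refresh
            clientCache.put(key, newer);
            return newer.value;
        }
        return local == null ? null : local.value; // local copy is current
    }

    public static void main(String[] args) {
        PessimisticClientCache c = new PessimisticClientCache();
        c.gridPut("item", "v1");
        System.out.println(c.get("item")); // fetched from the grid and cached
        c.gridPut("item", "v2");           // another app updates the grid
        System.out.println(c.get("item")); // version check detects the change
    }
}
```

The cost of this approach is one lightweight version-check round-trip per read; the payload itself travels only when the data has actually changed.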

What to Do Next?