Self-Healing Dynamic Clustering - TayzGrid

TayzGrid has a peer-to-peer cluster architecture, which gives it a self-healing dynamic clustering capability. Instead of relying on OS-based clustering, TayzGrid creates its own TCP-based dynamic cluster of cache servers. This dynamic cluster provides 100% uptime and allows servers to be added to or removed from the data grid cluster at runtime.

The following capabilities are added through dynamic clustering:

  • 100% cluster uptime
  • Addition or removal of TayzGrid servers at runtime without stopping the cluster
  • Load balancing of client connections to TayzGrid servers (at the time of connection)
  • Automatic reconnection of clients to a different data grid server if a server goes down

Peer to Peer Dynamic Cluster

TayzGrid provides 100% uptime and dynamic clustering for the In-Memory Data Grid cluster. This is possible because the peer-to-peer architecture ensures that the cluster has no single point of failure.

A collection of one or more data grid servers is called an In-Memory Data Grid cluster. In a cluster, every server is connected to every other server. The cluster always contains a cluster coordinator, which is the oldest server in the cluster (initially, the first server in the cluster). The coordinator's job is to manage all memberships to the cluster. If the coordinator goes down, the role passes to the oldest of the remaining servers. This way, cluster membership management has no single point of failure.
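The coordinator rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Membership` and `Member` classes and the `join_order` counter are hypothetical names, not TayzGrid's API, and the real cluster exchanges this information over TCP rather than in a single process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    join_order: int  # hypothetical: lower value means the server joined earlier


class Membership:
    """Toy cluster view in which the oldest surviving member is the coordinator."""

    def __init__(self):
        self._members = []
        self._next_order = 1

    def join(self, name):
        member = Member(name, self._next_order)
        self._next_order += 1
        self._members.append(member)
        return member

    def leave(self, name):
        # Covers both a graceful removal and a crashed server.
        self._members = [m for m in self._members if m.name != name]

    def coordinator(self):
        # The oldest remaining server holds the role; None if the cluster is empty.
        return min(self._members, key=lambda m: m.join_order, default=None)


view = Membership()
for server in ("server-a", "server-b", "server-c"):
    view.join(server)

first = view.coordinator().name   # the first server to join
view.leave("server-a")            # the coordinator goes down
second = view.coordinator().name  # the role passes to the next oldest server
```

Because the role is derived from join order alone, every server that holds the same membership list arrives at the same coordinator without a separate election round.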

[Figure: TayzGrid Peer to Peer Dynamic Cluster]

Runtime Discovery within Cluster

A new data grid server can be added to the data grid cluster at runtime using a runtime discovery algorithm. This algorithm requires the new server to know of at least one other server in the cluster when it starts. Usually, multiple data grid servers are listed in its configuration file. The new server creates a connection with any one of the listed servers and asks that server for the identity of the cluster coordinator. After finding the cluster coordinator, it asks the coordinator to add it to the membership list of the cluster.

The coordinator adds the new server to the cluster membership list at runtime. Then it informs all the other servers in the cluster about the new server joining the cluster, and it provides information about the existing cluster members to the new server. After that, the new server connects with all the other servers in the cluster using TCP connections. Therefore, an In-Memory Data Grid server can join the cluster without prior knowledge of all the servers in the cluster.

If the new server cannot find any other server when it starts, then it becomes the cluster coordinator and forms a new cluster.
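The join sequence can be simulated in a single process to make the order of steps concrete. This is a minimal sketch, not TayzGrid's implementation: the `Server` class, its `start` and `admit` methods, and the `network` dictionary (a stand-in for the TCP layer that maps the names of running servers to objects) are all assumptions made for illustration.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.members = []         # cluster view, oldest server first
        self.connections = set()  # names of peers this server is connected to

    @property
    def coordinator(self):
        return self.members[0] if self.members else None

    def start(self, seeds, network):
        """Join through any reachable configured server, or form a new cluster."""
        for seed_name in seeds:
            seed = network.get(seed_name)
            if seed is None:
                continue  # this configured server is not running
            # Ask the seed who the coordinator is, then send the join request there.
            network[seed.coordinator].admit(self, network)
            return
        # No other server was found: become coordinator of a new cluster.
        self.members = [self.name]
        network[self.name] = self

    def admit(self, newcomer, network):
        """Coordinator side: update membership, inform peers, hand over the view."""
        existing = list(self.members)
        self.members.append(newcomer.name)
        for name in existing:
            if name != self.name:
                network[name].members = list(self.members)  # inform other servers
        newcomer.members = list(self.members)               # give newcomer the view
        network[newcomer.name] = newcomer
        for name in existing:
            # The newcomer connects to every server that was already a member.
            newcomer.connections.add(name)
            network[name].connections.add(newcomer.name)


network = {}
a = Server("a"); a.start(["b", "c"], network)  # nothing is running yet
b = Server("b"); b.start(["a"], network)
c = Server("c"); c.start(["b"], network)       # c is configured with b only
```

In this run, `c` never had `a` in its configuration, yet it ends up with the full membership list and a connection to both existing servers, which is the point of the discovery step.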

Runtime Discovery by Clients

To connect to the data grid cluster, the client (local or remote) needs to know about only one of the data grid servers.

After establishing the connection, the client receives information about the cluster at runtime from the server it is connected to. This information helps the client determine which data grid servers it needs to connect to and how to access the data. At runtime, the client can get the following information from data grid servers:

  • Cluster membership information: This information is sent to the client in two cases. First, when it connects to any one of the data grid servers. Second, whenever cluster membership changes. This means that the client is informed after any addition or removal of data grid servers from the cluster.
  • Caching topology information: This information is sent to the client after it connects to the server. The client uses it to determine which cache server(s) it needs to establish connections with. More details can be found on the data grid topologies page of this website.
  • Data distribution map: The data distribution map is provided only if the caching topology is Partitioned Cache or Partition-Replica Cache. The client uses it to determine where a given piece of data is located in the cluster, so it can access the data directly from the cache server that holds it. The data distribution map is provided in two cases. First, at the time of the client’s connection to the server. Second, whenever the partitioning map changes because a server has been added to or removed from the cluster.
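To show how a client might use a data distribution map, here is a small sketch. The source does not describe the actual map format, so everything here is an assumption: a fixed number of hash buckets (`BUCKET_COUNT`), a CRC32-based `bucket_for` function, a round-robin `build_map` helper, and a `PartitionAwareClient` class are all hypothetical.

```python
import zlib

BUCKET_COUNT = 8  # hypothetical; the real number of partitions is not stated


def bucket_for(key):
    # Deterministic hash so every client maps a key to the same bucket.
    return zlib.crc32(key.encode("utf-8")) % BUCKET_COUNT


def build_map(servers):
    """Server-side stand-in: assign buckets to servers round-robin."""
    return {bucket: servers[bucket % len(servers)] for bucket in range(BUCKET_COUNT)}


class PartitionAwareClient:
    def __init__(self, distribution_map):
        # Received when the client first connects.
        self.distribution_map = dict(distribution_map)

    def on_map_update(self, new_map):
        # Received again whenever a server is added to or removed from the cluster.
        self.distribution_map = dict(new_map)

    def server_for(self, key):
        # Go straight to the server that owns the key's bucket.
        return self.distribution_map[bucket_for(key)]


client = PartitionAwareClient(build_map(["server-a", "server-b"]))
owner_before = client.server_for("customer:1001")

client.on_map_update(build_map(["server-a", "server-b", "server-c"]))
owner_after = client.server_for("customer:1001")
```

The client never asks another server where a key lives. It consults its local copy of the map, which stays correct because the cluster pushes a new map after each membership change.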


In short, you need to specify only one data grid server name in the data grid client configuration file, although it is recommended that you specify as many servers as you can for redundancy. As soon as the data grid client connects to any one server in the cluster, it receives information about all the other servers in the cluster. It can then decide whether to make more connections, depending on the data grid topology.

This dynamic configuration propagation simplifies the data grid client configuration, because most of the information is kept either in the data grid server configuration or within the data grid cluster at runtime.
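The value of listing more than one server in the client configuration can be seen in a short sketch. The `bootstrap` function and the `cluster_views` dictionary are hypothetical: `cluster_views` stands in for the network and maps each reachable server name to the membership list it would send to a newly connected client.

```python
def bootstrap(configured_servers, cluster_views):
    """Connect to the first reachable configured server and return its cluster view."""
    for name in configured_servers:
        view = cluster_views.get(name)  # None models a server that is down
        if view is not None:
            return name, list(view)
    raise ConnectionError("none of the configured servers is reachable")


# server-a is down; server-b and server-c form the running cluster.
cluster_views = {
    "server-b": ["server-b", "server-c"],
    "server-c": ["server-b", "server-c"],
}

# The client configuration lists two servers, so the client still gets in.
connected_to, members = bootstrap(["server-a", "server-b"], cluster_views)
```

A client configured with only `server-a` would fail to connect in this scenario, even though the cluster itself is healthy. Once connected, the client learns about `server-c` without it ever appearing in the configuration file.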

Failover Support: Add/Remove Servers at Runtime

TayzGrid’s self-healing dynamic clustering provides complete failover support. TayzGrid supports two types of failover:

  • Failover support within the cluster: The In-Memory Data Grid cluster is self-healing and automatically adjusts itself if a cluster change occurs. This means that cluster membership information is updated and propagated to all the servers within the cluster, so each data grid server can update its connections to all other data grid servers.
  • Failover support for cache clients: All the clients connected to a data grid automatically adjust themselves if a data grid server is added or removed. If a server is removed from the cluster, all of its clients connect to other data grid servers. Similarly, all clients get information about a new data grid server and can choose whether to connect to it.
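Client-side failover can be pictured as a reaction to membership updates. The sketch below is an assumption-laden illustration: the `FailoverClient` class and its `on_membership_change` callback are hypothetical names, and picking the first remaining server is a placeholder policy, since the source does not say how a client chooses its new server.

```python
class FailoverClient:
    def __init__(self, members, connected_to):
        self.members = list(members)
        self.connected_to = connected_to

    def on_membership_change(self, members):
        """Called whenever the cluster reports a server added or removed."""
        self.members = list(members)
        if self.connected_to in self.members:
            return  # current server is still present; new servers are optional
        if not self.members:
            raise ConnectionError("no data grid servers left in the cluster")
        # Placeholder policy: reconnect to the first server still in the cluster.
        self.connected_to = self.members[0]


client = FailoverClient(["server-a", "server-b", "server-c"], connected_to="server-b")

client.on_membership_change(["server-a", "server-c"])              # server-b removed
client.on_membership_change(["server-a", "server-c", "server-d"])  # server-d added
```

After the first update the client has moved off the removed server. After the second it knows about `server-d` but keeps its existing connection, matching the description that clients can choose whether to connect to a new server.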

[Figure: dynamic cluster]

Failover support allows data grid servers to be added to or removed from the cluster at runtime without stopping your application. TayzGrid ensures uninterrupted and lossless execution of your application. Data loss is prevented through the replication schemes that TayzGrid provides, which are discussed on the data grid topologies page.

Local and Remote Clients

A TayzGrid cluster can be accessed through different types of clients, including local clients, remote clients, or a combination of the two. Remote clients access the data grid from across the network, outside of the data grid cluster. Local clients access the data grid from within the same cluster but in a separate process.

[Figure: local clients]

Local client libraries are automatically installed on the data grid server machines. On the other hand, the remote client libraries need to be installed separately on the web application server.

Hot Apply Configuration Changes

TayzGrid allows data grid servers to be added or removed at runtime. It also allows some of the data grid configuration settings to be changed at runtime. These changes can be Hot Applied, which means that they can be made without stopping the data grid or application servers.
