Why shouldn't I immediately retry a timed-out NCache operation?

Immediate retries can lead to further timeouts if the server is still processing its queue of earlier operations. Instead, extend the client-request-timeout value before retrying.

How does State Transfer affect NCache operations?

During state transfer, nodes rebalance data bucket by bucket. Operations performed on a bucket that is locked for transfer are logged and then passed to the requesting node once the bucket has been transferred, so the data is consistent when the transfer completes.

Error Handling in Cache

NCache client operations can fail for several reasons, described below.

Specified Key Exists in Cache

Type of Exception: OperationFailedException
Message: "Specified key already exists in cache."
Reason: This exception is specific to the Add API. Add stores an item only if its key does not already exist in the cache; if an item with the same key is already present, the operation fails with this exception.

You can handle this in the following ways according to your business requirements:

Specify a new key for the object you are trying to add.
Use the Insert API, which adds the item if the key does not exist and overwrites the existing item if it does.
Use the Remove API to remove the existing item first, and then add the new object to the cache with the same key.
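The three options above can be illustrated with a small in-memory stand-in. `FakeCache`, its lowercase `add`/`insert`/`remove` methods, and the `store` helper are hypothetical names used only for this sketch, not the NCache client API; the sketch merely mimics the Add and Insert behavior described above.

```python
# Minimal sketch: a hypothetical in-memory stand-in, NOT the NCache client API.
# It only mimics the Add / Insert / Remove semantics described in the text.


class OperationFailedException(Exception):
    """Local stand-in for the exception type named in the text."""


class FakeCache:
    """Hypothetical cache used only to illustrate the semantics."""

    def __init__(self):
        self._items = {}

    def add(self, key, value):
        # Add succeeds only when the key is not already in the cache.
        if key in self._items:
            raise OperationFailedException("Specified key already exists in cache.")
        self._items[key] = value

    def insert(self, key, value):
        # Insert adds the item, or overwrites it if the key already exists.
        self._items[key] = value

    def remove(self, key):
        # Returns the removed value, or None if the key was absent.
        return self._items.pop(key, None)

    def get(self, key):
        return self._items.get(key)


def store(cache, key, value, strategy="insert"):
    """Hypothetical helper: try Add, then resolve a duplicate key per strategy."""
    try:
        cache.add(key, value)
    except OperationFailedException:
        if strategy == "insert":
            cache.insert(key, value)   # overwrite in a single call
        elif strategy == "remove_then_add":
            cache.remove(key)          # drop the existing item first...
            cache.add(key, value)      # ...then add the new one under the same key
        else:
            raise                      # caller should choose a new key instead


cache = FakeCache()
cache.add("product:1", "v1")
store(cache, "product:1", "v2")  # duplicate key, so it falls back to insert
print(cache.get("product:1"))    # prints v2
```

The remove-then-add path makes two separate calls, so another client could add the same key in between and the second Add would fail again; Insert achieves the same result in one call.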

Connection Failure

Type of Exception: OperationFailedException
Message: "The operation has been interrupted due to a loss of connectivity between the server and the client."
Reason: Operations can fail if the connection between the client and the server is lost, for example because of a network glitch or a physical interruption between them.

You may encounter this exception if an already established connection to the server breaks. In that case, resolve the underlying network issue first, and retry the operation only if it is actually needed.
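One way to encode "fix the network first, then retry only if needed" in client code is to wait for a connectivity check to pass and then retry exactly once. This is a minimal sketch under assumptions: `ConnectionLostError` stands in for the connection-loss OperationFailedException, and `operation` and `is_connected` are hypothetical callables you would supply; none of them are part of the NCache API.

```python
import time


class ConnectionLostError(Exception):
    """Hypothetical stand-in for the connection-loss OperationFailedException."""


def retry_after_reconnect(operation, is_connected, max_wait_s=30.0, poll_s=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Run operation(); if the connection drops, wait until is_connected()
    reports the network is back, then retry exactly once.

    operation and is_connected are hypothetical zero-argument callables
    supplied by the caller; neither is part of the NCache API.
    """
    try:
        return operation()
    except ConnectionLostError:
        deadline = clock() + max_wait_s
        while not is_connected():
            if clock() >= deadline:
                raise  # network not restored in time: surface the original error
            sleep(poll_s)
        return operation()  # single retry once connectivity is restored
```

`sleep` and `clock` are injectable so the waiting logic can be tested without real delays. A single retry, rather than a retry loop, reflects the advice to retry only when it is needed.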

No Server Available to Process the Request

Type of Exception: OperationFailedException
Message: "No server is available to process the request."
Reason: You may encounter this exception if the cache is unavailable, for example because it has been stopped or is otherwise inaccessible to the client.

Operation Timeout

Type of Exception: OperationFailedException
Message: "Operation timed out."
Reason: Under heavy load, the server may queue incoming client operations while it processes the current ones. Each client operation has a configurable timeout. If the cache server does not respond within that time, the client considers the operation failed, even though it has not necessarily failed (or succeeded) on the server.

Do not retry a timed-out operation immediately: if the server is still processing earlier operations, the retry is likely to time out as well. If your business requirements allow, extend the timeout value before retrying the operation.

In NCache, the default timeout for a client operation is 90 seconds. To let operations wait longer, increase the client-request-timeout value in the client.ncconf file.
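The "extend the timeout, then retry" advice can be sketched as a loop that grows the timeout on each attempt up to a business-defined ceiling. The timeout is passed per call only to keep the example self-contained; in NCache the documented setting is client-request-timeout in client.ncconf. `OperationTimeoutError`, `operation`, `factor`, and `max_timeout_s` are hypothetical names for this sketch.

```python
class OperationTimeoutError(Exception):
    """Hypothetical stand-in for the "Operation timed out." failure."""


def call_with_extended_timeout(operation, timeout_s=90.0, factor=2.0,
                               max_timeout_s=300.0):
    """Call operation(timeout_s); on timeout, retry with a longer timeout.

    The timeout is multiplied by `factor` after each timeout, and the helper
    gives up once the next timeout would exceed max_timeout_s.
    `operation` is a hypothetical callable that takes a timeout in seconds.
    """
    while True:
        try:
            return operation(timeout_s)
        except OperationTimeoutError:
            next_timeout = timeout_s * factor
            if next_timeout > max_timeout_s:
                raise  # business limit reached: report the timeout
            timeout_s = next_timeout
```

With the defaults, the helper tries 90 seconds, then 180 seconds, and gives up rather than try 360. Because a timed-out operation may still complete on the server, prefer retrying idempotent operations: a retried Add, for example, could fail with "Specified key already exists in cache." if the first attempt did go through.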

Operation Performed During State Transfer

Type of Exception: OperationFailedException
Message: "Operation could not be completed due to state transfer."
Reason: When nodes join or leave the cluster, the distribution of data across the cluster changes, and the cached data must be made consistent with the new distribution right away, especially in the Partitioned and Partition-Replica topologies. When a node joins, it must receive its share of the cache data according to the new distribution. Similarly, when a node leaves, its data is redistributed among the remaining nodes.

To handle this, NCache internally initiates a state transfer: a node-level data transfer mechanism that is triggered whenever the data distribution of a clustered cache changes.
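To make the join and leave cases concrete, here is a simplified model of bucket-based distribution. It is not NCache's actual distribution algorithm: the bucket count, the crc32 key hashing, and the minimal-movement rebalance are all assumptions chosen for illustration. The point it shows is that only buckets whose owner changes need to be moved by a state transfer.

```python
import zlib

NUM_BUCKETS = 12  # hypothetical; chosen small so the example is easy to follow


def bucket_of(key):
    # Map a key to a bucket with a stable hash (crc32 is an arbitrary choice here).
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def rebalance(owner, nodes):
    """Return a new bucket -> node map over `nodes`, moving as few buckets as possible.

    owner: current bucket -> node map. Buckets of departed nodes, and surplus
    buckets of over-full nodes, are handed to nodes below their fair share.
    """
    ordered = sorted(nodes)
    base, extra = divmod(len(owner), len(ordered))
    # Fair share per node: every node gets `base` buckets, the first `extra` get one more.
    quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(ordered)}
    load = {n: 0 for n in ordered}
    new_owner, homeless = {}, []
    for b in sorted(owner):
        n = owner[b]
        if n in load and load[n] < quota[n]:
            new_owner[b] = n          # bucket stays put: no transfer needed
            load[n] += 1
        else:
            homeless.append(b)        # owner left or is over its share
    for b in homeless:
        n = next(m for m in ordered if load[m] < quota[m])
        new_owner[b] = n              # this bucket must be state-transferred
        load[n] += 1
    return new_owner


def moved_buckets(old, new):
    # Buckets whose owner changed, i.e. the ones a state transfer must move.
    return sorted(b for b in old if old[b] != new[b])


old = {b: ("A" if b < 6 else "B") for b in range(NUM_BUCKETS)}
joined = rebalance(old, ["A", "B", "C"])   # node C joins
print(moved_buckets(old, joined))          # [4, 5, 10, 11]
```

When C joins, A and B each hand two buckets to C and keep the rest; running `rebalance(joined, ["A", "C"])` afterwards models B leaving, with B's buckets spread over the remaining nodes.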

State transfer takes place between two nodes of the same cluster, and data is transferred bucket by bucket. The requesting node starts the transfer of a bucket by locking it, and any operations performed on that bucket during the transfer are logged. During this window, operations can return duplicate results or fewer results than expected, because the same bucket may exist on both nodes during the transfer, or on neither. Once a bucket has been transferred completely, the logged operations are also transferred to the requesting node.
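The lock, log, and replay sequence for a single bucket can be sketched with plain dictionaries. This is a toy model under stated assumptions: `BucketTransfer` and its methods are hypothetical, a dict stands in for a bucket, and the bulk copy happens in one step rather than over the network. It shows how logging writes during the copy keeps the requesting node consistent; it does not model the duplicate or missing results that can be visible mid-transfer.

```python
class BucketTransfer:
    """Hypothetical toy model of transferring one bucket while writes continue."""

    def __init__(self, source):
        self.source = source      # bucket data on the node giving it up
        self.target = {}          # bucket data on the requesting node
        self.log = []             # operations recorded while the bucket is locked
        self.locked = False
        self._snapshot = {}

    def begin(self):
        # The requesting node locks the bucket; freeze the data to be copied.
        self.locked = True
        self._snapshot = dict(self.source)

    def write(self, key, value):
        # Writes still reach the source and, while locked, are also logged.
        self.source[key] = value
        if self.locked:
            self.log.append(("put", key, value))

    def remove(self, key):
        self.source.pop(key, None)
        if self.locked:
            self.log.append(("remove", key, None))

    def finish(self):
        # 1) Copy the frozen bucket, 2) replay the log in order, 3) unlock.
        self.target.update(self._snapshot)
        for op, key, value in self.log:
            if op == "put":
                self.target[key] = value
            else:
                self.target.pop(key, None)
        self.log.clear()
        self.locked = False


t = BucketTransfer({"a": 1, "b": 2})
t.begin()
t.write("a", 10)   # logged during transfer
t.remove("b")      # logged during transfer
t.finish()
print(t.target)    # {'a': 10}
```

Replaying the log in order is what makes the final copy match the source, even though the snapshot was taken before the writes arrived.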

After every change in the cluster, monitor the State Transfer PerfMon counters for your cache. Once the counters show that state transfer is complete, you can resume performing operations on the cache.
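If you automate the "wait until state transfer is complete" step, a polling helper with a deadline is one option. This is a minimal sketch, assuming a hypothetical `transfer_in_progress` callable backed by however you read the State Transfer counters in your environment; reading PerfMon counters themselves is not shown, and the timeout and poll interval are arbitrary.

```python
import time


def wait_for_state_transfer(transfer_in_progress, timeout_s=600.0, poll_s=5.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Block until transfer_in_progress() returns False, or raise TimeoutError.

    transfer_in_progress is a hypothetical zero-argument callable, e.g. backed
    by whatever you use to read the State Transfer counters.
    """
    deadline = clock() + timeout_s
    while transfer_in_progress():
        if clock() >= deadline:
            raise TimeoutError("state transfer still in progress")
        sleep(poll_s)  # check again after a short pause
```

Raising on timeout, rather than waiting forever, lets the caller decide whether to keep waiting or alert an operator.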