Handling Failures in NCache
Client operations on NCache can fail for a variety of reasons. The sections below describe the common failures, their causes, and how to handle each of them.
Specified Key Existing in Cache
- Type of Exception:
Message: "Specified key already exists in cache."
Reason: This exception is specific to the Add operation, whose distinctive behavior is that it only adds an item to the cache if the key does not already exist. Add is meant for adding data to the cache for the first time, when the key is not yet present in the cache.
You can handle this in the following ways according to your business requirement:
- Specify a new key for the object you are trying to add.
- Use Insert API if the same key is to be used with the object.
- Use Remove API if the same key is to be used with the object. Remove the existing object first and then Add the new one to the cache with the same key.
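The second and third options above can be sketched as follows. This is a minimal illustration only: `DemoCache` is an in-memory stand-in that mimics the Add/Insert/Remove behavior described here, and `KeyAlreadyExistsError` is a hypothetical exception name. Neither is part of the NCache client API.

```python
class KeyAlreadyExistsError(Exception):
    """Hypothetical stand-in for the 'Specified key already exists in cache.' failure."""


class DemoCache:
    """In-memory stand-in with Add/Insert/Remove semantics (not the NCache client)."""

    def __init__(self):
        self._items = {}

    def add(self, key, value):
        # Add only succeeds when the key is not present yet.
        if key in self._items:
            raise KeyAlreadyExistsError("Specified key already exists in cache.")
        self._items[key] = value

    def insert(self, key, value):
        # Insert adds the item or overwrites an existing one.
        self._items[key] = value

    def remove(self, key):
        # Returns the removed value, or None if the key was absent.
        return self._items.pop(key, None)

    def get(self, key):
        return self._items.get(key)


def add_or_overwrite(cache, key, value):
    """Try Add first; fall back to Insert if the key is already taken."""
    try:
        cache.add(key, value)
        return "added"
    except KeyAlreadyExistsError:
        cache.insert(key, value)
        return "overwritten"


def remove_then_add(cache, key, value):
    """Remove any existing item, then Add the new one under the same key."""
    cache.remove(key)
    cache.add(key, value)
```

Note that removing and then adding is two separate operations, so another client could add the same key in between; the Insert fallback avoids that window.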
Connectivity Lost between Server and Client
- Type of Exception:
Message: "Operation has been interrupted due to loss of connectivity between server and client."
Reason: Operations can fail if the connection between the server and the client is lost. This could be due to a network glitch or a physical interruption between the client and server.
You may encounter this exception if the client is unable to establish a connection with the server, or if an existing connection is interrupted while operations are being performed on the cache.
In such a case, resolve the underlying network issue first, and then retry the operation only if it is still required.
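A retry of this kind could look like the sketch below. It assumes the network issue is transient and that the operation is safe to repeat; `ConnectionLostError` is a hypothetical exception name used for illustration, not the type the NCache client raises.

```python
import time


class ConnectionLostError(Exception):
    """Hypothetical stand-in for the loss-of-connectivity failure."""


def retry_on_connection_loss(operation, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Run operation(); on connection loss, wait and retry, up to `attempts` tries in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionLostError:
            if attempt == attempts:
                raise  # give up and surface the failure to the caller
            # Wait a little longer after each failure so the network can recover.
            sleep(delay_seconds * attempt)
```

The `sleep` parameter is injectable only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.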
Operation Timeout
- Type of Exception:
Message: "Operation timed out."
Reason: When the server is handling very heavy transaction loads, incoming client operations can be queued while it processes the current ones. Each cache operation has a configurable timeout value associated with it. If the cache server does not respond to the operation within that time, the operation is treated as failed, even though its actual outcome is unknown: at that point it has neither failed nor succeeded.
Retrying a timed-out operation immediately is not advised, as it may time out again if the server is still processing the previous operations. If your business requirements allow, extend the timeout value before retrying the operation.
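Because the outcome of a timed-out operation is unknown, one cautious approach is to wait and then check the cache before retrying. The sketch below illustrates this for an add. The `add` and `contains` callables and the `OperationTimedOutError` name are hypothetical placeholders, not NCache client APIs, and the cool-down value is arbitrary.

```python
import time


class OperationTimedOutError(Exception):
    """Hypothetical stand-in for the 'Operation timed out.' failure."""


def add_with_timeout_check(add, contains, key, value, cool_down_seconds=5.0, sleep=time.sleep):
    """Add an item; after a timeout, wait and check the cache before retrying once."""
    try:
        add(key, value)
        return "added"
    except OperationTimedOutError:
        # The outcome is unknown: the server may still apply the queued operation.
        sleep(cool_down_seconds)  # give the server time to work through its queue
        if contains(key):
            return "applied-after-timeout"
        add(key, value)  # single retry; a second timeout propagates to the caller
        return "added-on-retry"
```

The check and the retry are not atomic, so the retry can still fail with a "key already exists" error if the server applies the original operation in between.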
In NCache, the default timeout value is 60 seconds. If you want operations to wait longer, you can change the operation timeout value according to your business needs in either of the following ways:
NCache Web Manager
Launch NCache Web Manager by browsing to http://localhost:8251 (Windows) or <server-ip>:8251 (Windows + Linux).
In the left navigation bar, click on Clustered Caches or Local Caches, based on the cache to configure.
Against the cache name, click on View Details.
This opens up the detailed configuration page for the cache. Go to the Advanced Settings tab and click on Cluster Settings in the left bar.
Change the Operation timeout.
Click on Save Changes to apply this configuration to the cache.
Manually editing configuration
Open the config.ncconf file located at %NCHOME%\config, where %NCHOME% is the NCache install directory.
In the config.ncconf of each server in your cluster, specify the operation-timeout key in the <cluster-settings> tag:
<cache-config>
  <cache-settings ...>
    <cluster-settings operation-timeout="60sec"> ... </cluster-settings>
  </cache-settings>
</cache-config>
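If the file has to be changed on several servers, the edit could be scripted. The sketch below uses only the Python standard library and assumes the configuration follows the fragment shown above; the function name and the "90sec" value are illustrative. It updates every <cluster-settings> element it finds, so a file that holds more than one cache would need an extra filter.

```python
import xml.etree.ElementTree as ET


def set_operation_timeout(config_xml, timeout):
    """Return config_xml with operation-timeout set on every <cluster-settings> element."""
    root = ET.fromstring(config_xml)
    updated = 0
    for settings in root.iter("cluster-settings"):
        settings.set("operation-timeout", timeout)
        updated += 1
    if updated == 0:
        raise ValueError("no <cluster-settings> element found")
    return ET.tostring(root, encoding="unicode")


sample = (
    "<cache-config><cache-settings>"
    '<cluster-settings operation-timeout="60sec"></cluster-settings>'
    "</cache-settings></cache-config>"
)
print(set_operation_timeout(sample, "90sec"))
```

ElementTree drops XML comments and may change whitespace when it rewrites a document, so keep a backup of the original file before applying anything like this.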
Operation Performed during State Transfer
- Type of Exception:
Message: "Operation could not be completed due to state transfer."
Reason: Once the distribution state of the cluster changes, data must immediately be made consistent across the cluster, especially for Partitioned and Partitioned-Replica topologies. This change of state is caused by nodes joining or leaving the cluster. When a node joins the cluster, it needs to be provided with its share of the cache data according to the distribution. Similarly, when a node leaves the cluster, its data is redistributed among the remaining nodes.
To cater to this issue, NCache internally initiates state transfer - a node-level data transfer mechanism which is triggered after distribution of data is changed across a clustered cache.
State transfer occurs between two nodes of a single cluster, with data transferred bucket by bucket. The requesting node initiates state transfer by locking the specific bucket. Any operations performed on this bucket during state transfer are logged, because executing them at that moment could return duplicate results or fewer results than expected: during transfer, the same bucket might exist on both nodes or on neither. Once the bucket is fully transferred, the logged operations are transferred to the requesting node.
It is advised to monitor the State Transfer PerfMon counters for your cache once a change in the cluster is made. Once the counters show that state transfer is complete, you can proceed to perform operations on the cache.
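Waiting for state transfer to finish can be automated with a simple polling loop. In the sketch below, `is_transfer_running` is a hypothetical callable that you would implement yourself, for example by reading the State Transfer counter for your cache. How that counter is read is outside the scope of this sketch, and the timeout and polling interval are arbitrary defaults.

```python
import time


def wait_for_state_transfer(is_transfer_running, timeout_seconds=300.0, poll_seconds=5.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll until is_transfer_running() returns False; return True if it finished in time."""
    deadline = clock() + timeout_seconds
    while is_transfer_running():
        if clock() >= deadline:
            return False  # still transferring when the time budget ran out
        sleep(poll_seconds)
    return True
```

A caller would proceed with cache operations only when the function returns True, and otherwise keep waiting or alert an operator.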