Managing Data Relationships in Distributed Cache

Author: Iqbal Khan

Introduction

A distributed cache can greatly improve application performance and scalability. Performance improves because an in-memory cache is much faster to access than a database. Scalability comes from growing the cache across multiple servers, which adds not only storage capacity but also transactions-per-second throughput.

Despite these benefits, many in-memory caches share one problem: most data is relational, whereas a cache is usually a simple hash table of key-value pairs. Each item is stored in the cache independently, without any knowledge of related items. This makes it difficult for applications to keep track of relationships among cached items, both when fetching them and when preserving data integrity. If one item is updated or removed and its related items are also updated or removed in the database, the cache does not know about it and cannot handle it.

A typical real-life application deals with relational data that has one-to-one, many-to-one, one-to-many, and many-to-many relationships with other data elements in the database. This requires referential integrity to be maintained across different related data elements. Therefore, in order to preserve data integrity in the cache, the cache must understand these relationships and maintain the same referential integrity.

To handle these situations, Cache Dependency was introduced by Microsoft in ASP.NET Cache. Cache Dependency allows you to relate various cached elements and then whenever you update or remove any cached item, the cache automatically removes all its related cached items to ensure data integrity. Then, when your application does not find these related items in the cache the next time it needs them, the application goes to the database and fetches the latest copy of these items, and then caches them again with correct referential integrity maintained.
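The reload step described above is essentially the cache-aside pattern. The following is a minimal, self-contained sketch of that pattern and is not an NCache or ASP.NET Cache API: CacheAside, Get, Invalidate, and the loadFromDb delegate are hypothetical names, and a plain dictionary stands in for the cache.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a cache plus its reload logic (hypothetical, not a library API).
public sealed class CacheAside<TValue>
{
    private readonly Dictionary<string, TValue> _cache = new Dictionary<string, TValue>();
    private readonly Func<string, TValue> _loadFromDb; // assumed database loader

    public CacheAside(Func<string, TValue> loadFromDb)
    {
        _loadFromDb = loadFromDb ?? throw new ArgumentNullException(nameof(loadFromDb));
    }

    // Returns the cached value, or reloads it from the database on a miss
    // and caches the fresh copy for the next caller.
    public TValue Get(string key)
    {
        if (_cache.TryGetValue(key, out var cached))
            return cached;

        var fresh = _loadFromDb(key);
        _cache[key] = fresh;
        return fresh;
    }

    // Simulates the cache dropping an item because something it depends on changed.
    public void Invalidate(string key) => _cache.Remove(key);
}
```

With this sketch, the loader runs once per key until Invalidate is called for that key; the next Get then goes back to the database and caches the latest copy.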

This is a great feature, but ASP.NET Cache is by design a stand-alone cache that is suitable only for single-server, in-process environments. For scalability, you must use a distributed cache that lives outside your application process and can scale to multiple cache servers. NCache is such a cache and provides the same Cache Dependency feature in a distributed environment. Cached items on one physical cache server can depend on cached items on another physical cache server, as long as both servers are part of the same logical clustered cache. NCache then takes care of the data integrity issues mentioned above.

This article explains how you can use Cache Dependency to handle one-to-one, one-to-many, and many-to-many relationships in the cache. It uses NCache as an example but the same concepts apply to ASP.NET Cache.

Although NCache provides various types of dependencies, including Data Dependency, File Dependency, SQL Dependency, and Custom Dependency, this article discusses only Data Dependency for handling relationships among cached items.

What is Data Dependency in cache?

Data Dependency is a feature that lets you specify that one cached item depends on another cached item. Then if the second cached item is ever updated or removed, the first item that was depending on it is also removed from the cache. Data Dependency lets you specify multi-level dependencies where A depends on B, which then depends on C. Then, if C is updated or removed, both A and B are removed from the cache.

Below is a brief example of how to use Data Dependency to specify multi-level dependency.

public static void CreateDependencies(ICache _cache)
{
    string keyC = "objectC-1000";
    Object objC = new Object();
    string keyB = "objectB-1000";
    Object objB = new Object();
    string keyA = "objectA-1000";
    Object objA = new Object();
    // Initializing cache items
    var itemOne = new CacheItem(objA);
    var itemTwo = new CacheItem(objB);
    var itemThree = new CacheItem(objC);
    // objA depends on objB, and objB depends on objC
    itemOne.Dependency = new KeyDependency(keyB);
    itemTwo.Dependency = new KeyDependency(keyC);
    // Adding items to the cache, each one before the items that depend on it
    _cache.Add(keyC, itemThree);
    _cache.Add(keyB, itemTwo);
    _cache.Add(keyA, itemOne);

    // Removing "objC" automatically removes "objB" as well as "objA"
    _cache.Remove(keyC);
    _cache.Dispose();
}

Multi-Level Data Dependency
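To make the cascading behavior concrete, here is a small, self-contained toy model of key-based dependencies. It is only a sketch of the observable behavior described above and makes no claim about how NCache is implemented; ToyDependencyCache and its members are hypothetical names. Unlike a real cache, it does not check that a dependency key exists when an item is inserted.

```csharp
using System;
using System.Collections.Generic;

// Toy model of key-based Data Dependency (hypothetical; NOT NCache's implementation).
public sealed class ToyDependencyCache
{
    private sealed class Entry
    {
        public object Value;
        public string[] DependsOn;
    }

    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();

    // For each key, the keys of the items that depend on it.
    private readonly Dictionary<string, HashSet<string>> _dependents =
        new Dictionary<string, HashSet<string>>();

    // Adds or updates an item. An update counts as a change, so items that
    // depended on the previous value are removed first.
    public void Insert(string key, object value, params string[] dependsOn)
    {
        dependsOn = dependsOn ?? Array.Empty<string>();
        Remove(key);
        _items[key] = new Entry { Value = value, DependsOn = dependsOn };
        foreach (var parent in dependsOn)
        {
            if (!_dependents.TryGetValue(parent, out var dependents))
                _dependents[parent] = dependents = new HashSet<string>();
            dependents.Add(key);
        }
    }

    public bool Contains(string key) => _items.ContainsKey(key);

    // Removes an item and, recursively, every item that depends on it.
    public void Remove(string key)
    {
        if (_items.TryGetValue(key, out var entry))
        {
            _items.Remove(key);
            // Unlink this item from the keys it depended on.
            foreach (var parent in entry.DependsOn)
                if (_dependents.TryGetValue(parent, out var siblings))
                    siblings.Remove(key);
        }

        if (!_dependents.TryGetValue(key, out var children))
            return;
        _dependents.Remove(key); // detach first so a dependency cycle cannot recurse forever
        foreach (var child in children)
            Remove(child);
    }
}
```

Inserting C, then B depending on C, then A depending on B, and finally removing C leaves none of the three in this toy cache, which mirrors the A, B, C example above.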


Data Relationships

The following example is used in this article to demonstrate how various types of relationships are handled in the cache.

Figure 2: Relationships in the Database

In the above diagram, the following relationships are shown:

  • One to Many: There are two such relationships and they are:
    1. Customer to Order
    2. Product to Order
  • Many to One: There are two such relationships and they are:
    1. Order to Customer
    2. Order to Product
  • Many to Many: There is one such relationship and that is:
    1. Customer to Product (via Order)

For the above relationships, the following domain objects are designed.

class Customer
{
    public string CustomerID;
    public string CompanyName;
    public string ContactName;
    public string ContactTitle;
    public string Phone;
    public string Country;
    public IList<Order> _OrderList;
}

class Product
{
    public int ProductID;
    public string ProductName;
    public Decimal UnitPrice;
    public int UnitsInStock;
    public int UnitsOnOrder;
    public int ReorderLevel;
    public IList<Order> _OrderList;
}

class Order
{
    public int OrderId;
    public string CustomerID;
    public DateTime OrderDate;
    public DateTime RequiredDate;
    public DateTime ShippedDate;
    public int ProductID;
    public Decimal UnitPrice;
    public int Quantity;
    public Single Discount;
    public Customer _Customer;
    public Product _Product;
}

As you can see, the Customer and Product classes each contain an _OrderList that holds all Order objects related to that customer or product. Similarly, the Order class contains _Customer and _Product data members that point to the related Customer and Product objects. It is the job of the persistence code that loads these objects from the database to ensure that whenever a Customer is loaded, all its Order objects are also loaded.

Below, I'll show how each of these relationships is handled in the cache.

Handling One-to-One/Many-to-One Relationships

Whenever you load an object from the database that has a one-to-one or many-to-one relationship with another object, your persistence code might also load the related object. This is not always necessary, because the application may not need the related object at that time. If your persistence code has loaded the related object, you need to decide how to handle it in the cache.

There are two ways to handle this. I will call one optimistic and the other pessimistic, and explain each below:

  1. Optimistic handling of relationships: Here, you assume that even though there are relationships, nobody else is going to modify the related object separately. Whoever wants to modify the related object will fetch it through the primary object in the cache and will therefore be in a position to modify both the primary and related objects. In this case, you do not have to store the two objects separately in the cache. The primary object contains the related object, and both are stored as one cached item. No Data Dependency is created between them.
  2. Pessimistic handling of relationships: In this case, you assume that the related object can be independently fetched and updated by another user and therefore the related object must be stored as a separate cached item. Then, if anybody updates or removes the related object, you want your primary object to also be removed from the cache. In this case, you'll create a Data Dependency between the two objects.

Below is the source code for the optimistic approach. Please note that the primary object and both of its related objects are cached as one item, because serializing the primary object also serializes the related objects.

static void Main(string[] args)
{
    string cacheName = "myReplicatedCache";
    ICache _cache = CacheManager.GetCache(cacheName);
    OrderFactory oFactory = new OrderFactory();
    Order order = new Order();
    order.OrderId = 1000;
    oFactory.LoadFromDb(order);
    Customer cust = order._Customer;
    Product prod = order._Product;
    var itemOne = new CacheItem(order);
    // please note that Order object serialization will
    // also include Customer and Product objects
    _cache.Add(order.OrderId.ToString(), itemOne);
    _cache.Dispose();
}

Optimistic handling of many-to-one relationship

Below is the source code for the pessimistic approach. Unlike the optimistic approach, it uses Data Dependency.

static void Main(string[] args)
{
    string cacheName = "myReplicatedCache";
    ICache _cache = CacheManager.GetCache(cacheName);
    OrderFactory oFactory = new OrderFactory();
    Order order = new Order();
    order.OrderId = 1000;
    oFactory.LoadFromDb(order);
    Customer cust = order._Customer;
    Product prod = order._Product;
    string custKey = "Customer:CustomerID:" + cust.CustomerID;
    _cache.Insert(custKey, cust);
    string prodKey = "Product:ProductID:" + prod.ProductID;
    _cache.Insert(prodKey, prod);
    string[] depKeys = { prodKey, custKey };
    string orderKey = "Order:OrderID:" + order.OrderId;
    // We are setting _Customer and _Product to null so they
    // don't get serialized with Order object
    order._Customer = null;
    order._Product = null;
    var item = new CacheItem(order);
    item.Dependency = new CacheDependency(null, depKeys);
    _cache.Add(orderKey, item);
    _cache.Dispose();
}

Pessimistic handling of many-to-one relationships

The above code loads an Order object from the database and both Customer and Product objects are automatically loaded with it because the Order object has a many-to-one relationship with them. The application then adds Customer and Product objects to the cache and then adds the Order object to the cache but with a dependency on both Customer and Product objects. This way, if any of these Customer or Product objects are updated or removed in the cache, the Order object is automatically removed from the cache to preserve data integrity. The application does not have to keep track of this relationship.
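A key-based dependency only works if every part of the application builds exactly the same key string for the same object. One way to reduce the risk of mismatched keys is to centralize key construction. The helper below is a hypothetical sketch, not part of NCache; the class name CacheKeys and its methods are assumptions, and the key layout follows the "Type:Property:Value" convention used in the listing above.

```csharp
using System;

// Hypothetical helper (not an NCache API): builds cache keys in the
// "Type:Property:Value" convention used by this article's examples.
public static class CacheKeys
{
    public static string For(string typeName, string propertyName, object value)
    {
        if (string.IsNullOrEmpty(typeName))
            throw new ArgumentException("Type name is required.", nameof(typeName));
        if (string.IsNullOrEmpty(propertyName))
            throw new ArgumentException("Property name is required.", nameof(propertyName));
        if (value == null)
            throw new ArgumentNullException(nameof(value));

        return typeName + ":" + propertyName + ":" + value;
    }

    // Convenience wrappers for the three domain objects in this article.
    public static string Customer(string customerId) => For("Customer", "CustomerID", customerId);
    public static string Product(int productId) => For("Product", "ProductID", productId);
    public static string Order(int orderId) => For("Order", "OrderID", orderId);
}
```

The code that caches a Customer and the code that declares a dependency on it can then both call CacheKeys.Customer(cust.CustomerID), so the two strings cannot drift apart.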

Handling One-to-Many Relationships

Whenever you load an object from the database that has a one-to-many relationship with another object, your persistence code may load both the primary object and a collection of all its related objects. This is not always necessary, because the application may not need the related objects at that time. If your persistence code has loaded the related objects, you need to handle them in the cache. Please note that the related objects are all kept in one collection, which introduces issues of its own that are discussed below.

There are three ways to handle this. I will call them optimistic, mildly pessimistic, and really pessimistic, and explain each below:

  1. Optimistic handling of relationships: Here, you assume that even though there are relationships, nobody else is going to modify the related objects separately. Whoever wants to modify the related objects will fetch them through the primary object in the cache and will therefore be in a position to modify both the primary and related objects. In this case, you do not have to store the related objects separately in the cache. The primary object contains the collection of related objects, and all of them are stored as one cached item. No Data Dependency is created between them.
  2. Mildly pessimistic handling of relationships: In this case, you assume that the related objects can be fetched independently but only as the entire collection and never as individual objects. Therefore, you store the collection as one cached item and create a dependency from the collection to the primary object. Then, if anybody updates or removes the primary object, you want your collection to also be removed from the cache.
  3. Really pessimistic handling of relationships: In this case, you assume that all objects in the related collection can also be individually fetched and modified by the application. Therefore, you must store not only the collection but also each of its objects as separate cached items. Please note, however, that this is likely to hurt performance because you make multiple trips to the cache, which may reside across the network on a cache server. This is discussed in the next section, "Handling Collections in the Cache".
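The really pessimistic option has no listing of its own in this section, so the following sketch shows one possible layout of cached items and their dependencies for a customer and its orders. It only computes keys and dependency lists and does not call any cache. PlannedEntry and ReallyPessimisticPlan are hypothetical names, the key formats are illustrative, and making each Order depend on its Customer is an assumption carried over from the pessimistic many-to-one case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One planned cache entry: its key and the keys it should depend on (hypothetical type).
public sealed class PlannedEntry
{
    public PlannedEntry(string key, params string[] dependsOn)
    {
        Key = key;
        DependsOn = dependsOn;
    }

    public string Key { get; }
    public IReadOnlyList<string> DependsOn { get; }
}

public static class ReallyPessimisticPlan
{
    // Lists the entries to cache for one customer and its orders, ordered so
    // that every entry appears after the entries it depends on.
    public static List<PlannedEntry> ForCustomer(string customerId, IEnumerable<int> orderIds)
    {
        string custKey = "Customer:CustomerID:" + customerId;
        string[] orderKeys = orderIds.Select(id => "Order:OrderID:" + id).ToArray();

        // The customer is cached on its own, with no dependency.
        var plan = new List<PlannedEntry> { new PlannedEntry(custKey) };

        // Assumption: each order is cached separately and depends on its customer.
        plan.AddRange(orderKeys.Select(orderKey => new PlannedEntry(orderKey, custKey)));

        // The collection depends on the customer and on every individual order,
        // so changing any of them invalidates the cached collection.
        string listKey = "Customer:OrderList:CustomerID:" + customerId;
        string[] listDeps = new[] { custKey }.Concat(orderKeys).ToArray();
        plan.Add(new PlannedEntry(listKey, listDeps));

        return plan;
    }
}
```

For a customer with N orders this plan contains N + 2 entries, which is where the extra cache trips mentioned above come from if the entries are written one at a time.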

Below is an example of how you can handle one-to-many relationships optimistically. Please note that the collection containing the related objects is serialized as part of the primary object when being put in the cache.

static void Main(string[] args)
{
    string cacheName = "ltq";
    ICache _cache = CacheManager.GetCache(cacheName);
    CustomerFactory cFactory = new CustomerFactory();
    Customer cust = new Customer();
    cust.CustomerID = "ALFKI";
    cFactory.LoadFromDb(cust);
    // please note that _OrderList will automatically get
    // serialized along with the Customer object
    string custKey = "Customer:CustomerID:" + cust.CustomerID;
    _cache.Add(custKey, cust);
    _cache.Dispose();
}

Handling one-to-many relationship optimistically


Below is an example of how to handle a one-to-many relationship mildly pessimistically.

static void Main(string[] args)
{
    string cacheName = "myReplicatedCache";
    ICache _cache = CacheManager.GetCache(cacheName);
    CustomerFactory cFactory = new CustomerFactory();
    Customer cust = new Customer();
    cust.CustomerID = "ALFKI";
    cFactory.LoadFromDb(cust);
    IList<Order> orderList = cust._OrderList;
    // please note that _OrderList will not get
    // serialized along with the Customer object
    cust._OrderList = null;
    string custKey = "Customer:CustomerID:" + cust.CustomerID;
    var custItem = new CacheItem(cust);
    _cache.Add(custKey, custItem);
    // let's reset the _OrderList back
    cust._OrderList = orderList;
    string[] depKeys = { custKey };
    string orderListKey = "Customer:OrderList:CustomerID:" + cust.CustomerID;
    // cache the entire collection as one item that depends on the Customer
    var orderListItem = new CacheItem(orderList);
    orderListItem.Dependency = new CacheDependency(null, depKeys);
    _cache.Add(orderListKey, orderListItem);
    _cache.Dispose();
}

Handling one-to-many relationship mildly pessimistically

In the above example, the list of Order objects that are related to this Customer is cached separately. The entire collection is cached as one item because we are assuming that nobody will directly modify individual Order objects separately. The application will always fetch it through this Customer and modify and re-cache the entire collection again.

The remaining case is the really pessimistic handling of one-to-many relationships, which is similar to how collections are handled in the cache. That topic is discussed in the next section.

Handling Collections in the Cache

There are many situations where you fetch a collection of objects from the database. This could be due to a query you ran or it could be a one-to-many relationship returning a collection of related objects on the "many" side. Either way, what you get is a collection of objects that must be handled in the cache appropriately.

There are two ways to handle collections as explained below:

  1. Optimistic handling of collections: In this, we assume that the entire collection should be cached as one item because nobody will individually fetch and modify the objects kept inside the collection. The collection might be cached for a brief period of time and this assumption may be very much valid.
  2. Pessimistic handling of collections: In this case, we assume that individual objects inside the collection can be fetched separately and modified. Therefore, we cache the entire collection but then also cache each individual object and create a dependency from the collection to the individual objects.

Below is an example of how to handle collections optimistically.

static void Main(string[] args)
{
    string cacheName = "myReplicatedCache";
    ICache _cache = CacheManager.GetCache(cacheName);
    CustomerFactory cFactory = new CustomerFactory();
    string custListKey = "CustomerList:LoadByCountry:Country:United States";
    IList<Customer> custList = cFactory.LoadByCountry("United States");
    IDistributedList<Customer> list = _cache.DataTypeManager.CreateList<Customer>(custListKey);

    // please note that all Customer objects kept in custList
    // are stored as part of the one cached list
    foreach (var customer in custList)
    {
        // Add each customer to the cached list
        list.Add(customer);
    }
    _cache.Dispose();
}

Handling collections optimistically

In the above example, the entire collection is cached as one item, and all the Customer objects kept inside the collection are automatically serialized and cached along with it. Therefore, there is no need to create any Data Dependency here.

Below is an example of how to handle collections pessimistically.

static void Main(string[] args)
{
    string cacheName = "myReplicatedCache";
    ICache _cache = CacheManager.GetCache(cacheName);
    CustomerFactory cFactory = new CustomerFactory();
    IList<Customer> custList = cFactory.LoadByCountry("United States");
    List<string> custKeys = new List<string>();
    // Let's cache individual Customer objects and also build
    // a list of keys to be used later in CacheDependency
    foreach (Customer c in custList)
    {
        string custKey = "Customer:CustomerID:" + c.CustomerID;
        custKeys.Add(custKey);
        _cache.Insert(custKey, c);
    }
    string custListKey = "CustomerList:LoadByCountry:Country:United States";
    // please note that this collection has a dependency on all
    // objects in it separately. So, if any of them are updated or
    // removed, this collection will also be removed from cache
    var custListItem = new CacheItem(custList);
    custListItem.Dependency = new CacheDependency(null, custKeys.ToArray());
    _cache.Add(custListKey, custListItem);

    _cache.Dispose();
}

Handling collections pessimistically

In the above example, each object in the collection is cached as a separate item, and then the entire collection is also cached as one item. The collection has a Data Dependency on all its objects that are cached separately. This way, if any of these objects is updated or removed, the collection is also removed from the cache.


Author: Iqbal Khan works for Alachisoft, a leading software company providing .NET and Java distributed caching, O/R Mapping and SharePoint Storage Optimization solutions. You can reach him at iqbal@alachisoft.com.

