Object Query Language for Distributed Cache

Author: Iqbal Khan

NCache lets you create a scalable distributed cache in the middle tier so you can reduce expensive trips to the database and greatly improve application performance. It also improves application scalability because frequently used data is served from this highly scalable cache instead of from a single database server that cannot scale very well.

Your application typically uses a cache as a Hashtable: everything is stored under a key, and you must have this key to fetch an item. This is like having a relational database where you can only use the primary key to find data. That works fine in many situations, but a complex real-life application often needs to find data based on attributes other than the primary key. And, since you're keeping a lot of your data in the cache, it would be very useful if you could search the cache in this manner as well. NCache provides exactly such a facility.

In this article, I will discuss how NCache object querying works.

Cache Search Returning Keys

NCache provides an Object Query Language (OQL) to let you search the cache. You have to make API calls and specify a search based on this OQL in order to fetch a collection of objects from the cache. Here is what you need to use in your .NET application in order to query the cache.

using System;
using System.Collections;
using Alachisoft.NCache.Web.Caching; // NCache client namespace (may differ by NCache version)
using NCacheQuerySample.Business;

public class Program
{
    public static void Main(string[] args)
    {
        NCache.InitializeCache("myReplicatedCache");
        String query = "SELECT NCacheQuerySample.Business.Product WHERE this.ProductID > 100";
        // Fetch only the keys matching this search criteria
        ICollection keys = NCache.Cache.Search(query);
        foreach (String key in keys)
        {
            // One extra cache call per key to fetch the item itself
            Product prod = (Product)NCache.Cache.Get(key);
            if (prod == null)
            {
                continue; // item was removed or expired after the search ran
            }

            HandleProduct(prod); // application-specific processing
            Console.WriteLine("ProductID: {0}", prod.ProductID);
        }
        NCache.Cache.Dispose();
    }
}

The above code searches the cache and returns a collection of keys. Then, it iterates over all the returned keys and individually fetches the corresponding cached items from the cache. The benefit of this approach is that the query does not automatically return a lot of data and instead only returns keys. Then, the client application can determine which keys it wants to fetch. The drawback of this approach is that if you're going to fetch most of the items from the cache anyway, then you'll end up making a lot of trips to the cache. And, if the cache is distributed, this may eventually become costly. If that becomes the case, then you can conduct a search that returns the keys and the items together.

Cache Search Returning Items

As you have seen already, a simple Cache.Search(...) returns a collection of keys. However, if you intend to fetch all or most of the cached items associated with these keys, then Cache.Search(...) is not a very efficient way to search the cache. The reason is that you'll first make a call to do the search. Then, you'll make a number of calls to fetch the items associated with each key. This can become a very costly operation. In these situations, it is better to fetch all the keys and items in one call. Below is the example doing just that.

using System;
using System.Collections;
using Alachisoft.NCache.Web.Caching; // NCache client namespace (may differ by NCache version)
using NCacheQuerySample.Business;

public class Program
{
    public static void Main(string[] args)
    {
        NCache.InitializeCache("myReplicatedCache");
        String query = "SELECT NCacheQuerySample.Business.Product WHERE this.ProductID > 100";
        // Fetch the matching keys and their items in a single call
        IDictionary dict = NCache.Cache.SearchEntries(query);
        foreach (DictionaryEntry entry in dict)
        {
            String key = (String)entry.Key;
            Product prod = (Product)entry.Value;
            HandleProduct(prod); // application-specific processing
            Console.WriteLine("Key = {0}, ProductID: {1}", key, prod.ProductID);
        }
        NCache.Cache.Dispose();
    }
}

The above code searches the cache and returns a dictionary containing both keys and values. This way, all the data matching the search criteria is fetched in one call to NCache. When you need most of the matching items, this is a much more efficient way of fetching them than calling Cache.Search() and then fetching each item individually.
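To make the difference in cache trips concrete, here is a minimal sketch that counts calls for both approaches. CountingCache and RoundTripDemo are hypothetical names invented for this illustration; CountingCache is an in-memory stand-in that only counts calls and is not the NCache client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a remote cache (key -> ProductID).
// It is NOT the NCache client; it only counts the calls a client would make.
public class CountingCache
{
    private readonly Dictionary<string, int> items = new Dictionary<string, int>();

    public int RoundTrips { get; private set; }

    // Setup helper; not counted as a round trip.
    public void Add(string key, int productId)
    {
        items[key] = productId;
    }

    // Keys-only search: one call, returns the matching keys.
    public List<string> SearchKeys(Func<int, bool> predicate)
    {
        RoundTrips++;
        return items.Where(kv => predicate(kv.Value)).Select(kv => kv.Key).ToList();
    }

    // Single-item fetch: one call per key.
    public int Get(string key)
    {
        RoundTrips++;
        return items[key];
    }

    // Keys-and-items search: one call, returns both.
    public Dictionary<string, int> SearchEntries(Func<int, bool> predicate)
    {
        RoundTrips++;
        return items.Where(kv => predicate(kv.Value))
                    .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}

public static class RoundTripDemo
{
    // Calls made by "search for keys, then Get each matching item".
    public static int KeysThenGet(CountingCache cache, int minId)
    {
        int before = cache.RoundTrips;
        foreach (string key in cache.SearchKeys(id => id > minId))
        {
            cache.Get(key);
        }
        return cache.RoundTrips - before;
    }

    // Calls made by a single search that returns keys and items together.
    public static int EntriesInOneCall(CountingCache cache, int minId)
    {
        int before = cache.RoundTrips;
        cache.SearchEntries(id => id > minId);
        return cache.RoundTrips - before;
    }
}
```

With five matching items, KeysThenGet makes six calls (one search plus five fetches), while EntriesInOneCall makes one. The keys-only approach still wins when the application ends up fetching only a few of the matching items.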

Indexing Searchable Attributes

Please note that NCache requires all searchable attributes to be indexed. Without indexing, NCache would have to traverse the entire cache to find the items the user is looking for. That is a very costly operation that could slow down the entire cache and undo the main reason people use NCache, namely to boost application performance and scalability.

NCache provides its own indexing mechanism. You can identify the objects in your .NET assemblies that you want to index. Then, when you add data to NCache, it checks to see whether you're adding these objects. And, if you are, then it uses .NET Reflection to extract data from the indexed attributes and builds its internal index. Then, when you query the cache with Cache.Search() or Cache.SearchEntries(), NCache uses these indexes to quickly find the desired objects and returns them to you.
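The idea can be sketched in a few lines. The code below is an illustration only and not NCache's internal implementation: AttributeIndex, its method names, and the sample Product class are all hypothetical, and the sketch assumes the indexed attribute is a non-null, comparable public property.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sample type standing in for NCacheQuerySample.Business.Product.
public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
}

// Simplified attribute index: attribute value -> set of cache keys.
// Illustrates the idea only; this is not NCache's internal data structure.
public class AttributeIndex
{
    private readonly PropertyInfo property;
    private readonly SortedDictionary<IComparable, HashSet<string>> index =
        new SortedDictionary<IComparable, HashSet<string>>();

    public AttributeIndex(Type type, string attributeName)
    {
        property = type.GetProperty(attributeName)
            ?? throw new ArgumentException("No such public property: " + attributeName);
    }

    // On Add/Insert: read the attribute value via reflection and index the key.
    public void OnAdd(string key, object item)
    {
        var value = (IComparable)property.GetValue(item);
        if (!index.TryGetValue(value, out var keys))
        {
            keys = new HashSet<string>();
            index[value] = keys;
        }
        keys.Add(key);
    }

    // On Remove: the index entry has to be cleaned up as well.
    public void OnRemove(string key, object item)
    {
        var value = (IComparable)property.GetValue(item);
        if (index.TryGetValue(value, out var keys))
        {
            keys.Remove(key);
            if (keys.Count == 0)
            {
                index.Remove(value);
            }
        }
    }

    // Answers "attribute > threshold" by walking the index entries only;
    // the cached objects themselves are never touched.
    public List<string> GreaterThan(IComparable threshold)
    {
        var result = new List<string>();
        foreach (var entry in index)
        {
            if (entry.Key.CompareTo(threshold) > 0)
            {
                result.AddRange(entry.Value);
            }
        }
        return result;
    }
}
```

For example, after indexing three products with ProductID 50, 150, and 200 under keys "k1", "k2", and "k3", GreaterThan(100) returns "k2" and "k3". The sketch also shows where the extra work comes from: every add and remove has to update the index, while a fetch by key never consults it.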

Figure 1: Indexing Object Attributes before Starting Cache

Please note that any time you specify indexes on object attributes, it adds a little to the processing time of Add, Insert, and Remove operations, because the index has to be updated as well. However, the Get operation is unaffected.

Querying Different Caching Topologies

Although the search behavior from the client application's perspective is the same regardless of which caching topology you're using, the internals of the search vary from topology to topology. For example, if you're doing a search on a replicated cache, your search is conducted entirely on the cache server you initiated the search from. This is because the entire cache is available there. Here is how a query is run in a replicated cache.

Figure 2: Query Runs Locally on One Server

However, if you have a partitioned or partitioned-replica caching topology, then not all the data resides on a single cache node in the cluster. In this situation, the cache server where the query is initiated sends the query to all other servers in the cluster and also runs it locally. The query then runs on all servers in parallel, and each node returns its results to the originating server node. This server node then combines all the results (does a "union") and returns them to the client. Below is a diagram showing all of this.
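The fan-out and union steps can be sketched as follows. This is a simplified in-process model under stated assumptions, not NCache's cluster protocol: PartitionNode and ScatterGather are hypothetical names, each node is just a dictionary of key to ProductID, and Task.Run stands in for sending the query over the network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model of one partition: key -> ProductID. Not an NCache type.
public class PartitionNode
{
    private readonly Dictionary<string, int> data;

    public PartitionNode(Dictionary<string, int> data)
    {
        this.data = data;
    }

    // Runs the query against this node's share of the data only.
    public List<string> SearchLocal(Func<int, bool> predicate)
    {
        return data.Where(kv => predicate(kv.Value)).Select(kv => kv.Key).ToList();
    }
}

public static class ScatterGather
{
    // The originating node fans the query out to every node (itself included),
    // waits for all partial results, and merges them into one result set.
    public static async Task<HashSet<string>> SearchAsync(
        IEnumerable<PartitionNode> nodes, Func<int, bool> predicate)
    {
        Task<List<string>>[] partials = nodes
            .Select(node => Task.Run(() => node.SearchLocal(predicate)))
            .ToArray();

        List<string>[] results = await Task.WhenAll(partials);

        var union = new HashSet<string>();
        foreach (List<string> partial in results)
        {
            union.UnionWith(partial);
        }
        return union;
    }
}
```

Given one node holding a=50 and b=150 and another holding c=250 and d=75, a search for values greater than 100 returns the keys b and c, regardless of which node answered first.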

Figure 3: Query Runs in Parallel on All Server Nodes

Your .NET Assemblies

In NCache, indexing is done on the server nodes. However, NCache uses .NET Reflection on the client-end to extract the values of object attributes and sends them to the server. Therefore, your .NET assemblies that contain the object definition are only required on the client-end where your application resides. Whether you're running in InProc or OutProc mode, your assemblies need to be in a directory where your client application can access them.
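As a rough sketch of what client-side extraction could look like, the code below reads a list of indexed attributes from an object with .NET Reflection and collects them into a dictionary that could travel to the server alongside the item. AttributeExtractor and the sample Employee class are hypothetical names, and the actual format NCache uses to send these values is not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sample type standing in for NCacheQuerySample.Business.Employee.
public class Employee
{
    public string Name { get; set; }
    public DateTime HiringDate { get; set; }
    public bool Hired { get; set; }
}

public static class AttributeExtractor
{
    // Reads the named public properties from an object via reflection.
    // Only these plain values are needed to build an index, so the side that
    // receives them does not need the assembly that defines the object's type.
    public static Dictionary<string, object> Extract(
        object item, IEnumerable<string> indexedAttributes)
    {
        var values = new Dictionary<string, object>();
        Type type = item.GetType();
        foreach (string name in indexedAttributes)
        {
            PropertyInfo property = type.GetProperty(name);
            if (property == null)
            {
                throw new ArgumentException(type.FullName + " has no public property " + name);
            }
            values[name] = property.GetValue(item);
        }
        return values;
    }
}
```

Calling Extract on an Employee with the attribute list "HiringDate" and "Hired" yields a two-entry dictionary and leaves out Name, because Name was not declared as indexed.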

NCache also supports the object query language for Java clients.

Query Language Syntax

NCache supports a SQL-like language called Object Query Language (OQL). Here are some examples of queries in this language.

SELECT NCacheQuerySample.Business.Product WHERE this.ProductID > 100;

SELECT NCacheQuerySample.Business.Product WHERE (this.ProductID != 100 AND this.ProductID <= 200);

SELECT NCacheQuerySample.Business.Product WHERE (this.ProductID == 150 OR this.ProductID == 160);

SELECT NCacheQuerySample.Business.Product WHERE this.ProductID IN (1, 4, 7, 10);

SELECT NCacheQuerySample.Business.Product WHERE this.ProductID NOT IN (1, 4, 7, 10);

SELECT NCacheQuerySample.Business.Employee WHERE this.HiringDate > DateTime.now;

SELECT NCacheQuerySample.Business.Employee WHERE this.HiringDate > DateTime ('01/01/2007');

SELECT NCacheQuerySample.Business.Employee WHERE this.Hired = true;

You can combine multiple expressions with AND and OR and group them with nested parentheses. The full grammar of the query language is specified below.

<Query>                 ::= SELECT <ObjectType> 
                          | SELECT <ObjectType> WHERE <Expression>

<Expression>            ::= <OrExpr>

<OrExpr>                ::= <OrExpr> 'OR' <AndExpr>
                          | <AndExpr>

<AndExpr>               ::= <AndExpr> 'AND' <UnaryExpr>
                          | <UnaryExpr>

<UnaryExpr>             ::= 'NOT'  <CompareExpr>
                          | <CompareExpr>

<CompareExpr>           ::= <Attrib> '='  <Value>
                          | <Attrib> '!=' <Value>
                          | <Attrib> '==' <Value>
                          | <Attrib> '<>' <Value>
                          | <Attrib> '<'  <Value>
                          | <Attrib> '>'  <Value>
                          | <Attrib> '<=' <Value>
                          | <Attrib> '>=' <Value>
                          | <Attrib> 'IN' <InList>
                          | <Attrib> 'NOT' 'IN' <InList>
                          | '(' <Expression> ')'

<Attrib>                ::= <ObjectValue>

<Value>                 ::= '-' <NumLiteral> 
                          | <NumLiteral> 
                          | <StrLiteral>
                          | 'true'
                          | 'false'
                          | <Date>

<Date>                  ::= 'DateTime' '.' 'now'
                          | 'DateTime' '(' StringLiteral ')'

<StrLiteral>            ::= StringLiteral
                          | 'null'

<NumLiteral>            ::= IntegerLiteral 
                          | RealLiteral

<ObjectType>            ::= '*' 
                          | <Property>

<Property>              ::= <Property> '.' Identifier
                          | Identifier

<ObjectValue>           ::= Keyword '.' Identifier              

<InList>                ::= '(' <ListType> ')'

<ListType>              ::= <NumLiteralList>
                          | <StrLiteralList>
                          | <DateList>

<NumLiteralList>        ::=  <NumLiteral> ',' <NumLiteralList>
                          | <NumLiteral>

<StrLiteralList>        ::= <StrLiteral> ',' <StrLiteralList>
                          | <StrLiteral>

<DateList>              ::= <Date> ',' <DateList>
                          | <Date>
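
To show how the precedence rules in this grammar play out (OR binds loosest, then AND, then NOT, then comparisons and parentheses), here is a minimal recursive-descent sketch that evaluates a WHERE expression against one object's attribute values. It is an illustration, not NCache's parser: WhereEvaluator is a hypothetical name, and it covers only a subset of the grammar (integer comparisons on this.Attribute, AND, OR, NOT, and parentheses). IN lists, strings, dates, booleans, and the SELECT clause are left out, and characters the tokenizer does not recognize are silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch of a recursive-descent evaluator for a SUBSET of the grammar above.
public class WhereEvaluator
{
    private readonly List<string> tokens = new List<string>();
    private readonly IDictionary<string, int> attributes;
    private int pos;

    private WhereEvaluator(string expression, IDictionary<string, int> attributes)
    {
        this.attributes = attributes;
        // Longer operators are listed first so "<=" is not split into "<" and "=".
        foreach (Match m in Regex.Matches(expression,
            @"\(|\)|<=|>=|<>|!=|==|=|<|>|-?\d+|[A-Za-z_][A-Za-z0-9_.]*"))
        {
            tokens.Add(m.Value);
        }
    }

    public static bool Evaluate(string expression, IDictionary<string, int> attributes)
    {
        var evaluator = new WhereEvaluator(expression, attributes);
        bool result = evaluator.OrExpr();
        if (evaluator.pos != evaluator.tokens.Count)
        {
            throw new FormatException("Unexpected token: " + evaluator.Peek());
        }
        return result;
    }

    private string Peek()
    {
        return pos < tokens.Count ? tokens[pos] : null;
    }

    private string Next()
    {
        if (pos >= tokens.Count) throw new FormatException("Unexpected end of expression");
        return tokens[pos++];
    }

    // OrExpr: one or more AndExpr joined by OR.
    private bool OrExpr()
    {
        bool value = AndExpr();
        while (Peek() == "OR") { Next(); bool right = AndExpr(); value = value || right; }
        return value;
    }

    // AndExpr: one or more UnaryExpr joined by AND.
    private bool AndExpr()
    {
        bool value = UnaryExpr();
        while (Peek() == "AND") { Next(); bool right = UnaryExpr(); value = value && right; }
        return value;
    }

    // UnaryExpr: an optional NOT in front of a CompareExpr.
    private bool UnaryExpr()
    {
        if (Peek() == "NOT") { Next(); return !CompareExpr(); }
        return CompareExpr();
    }

    // CompareExpr: "this.Attribute op integer" or a parenthesized expression.
    private bool CompareExpr()
    {
        string token = Next();
        if (token == "(")
        {
            bool inner = OrExpr();
            if (Next() != ")") throw new FormatException("Expected ')'");
            return inner;
        }
        if (!token.StartsWith("this.", StringComparison.Ordinal))
        {
            throw new FormatException("Expected this.<attribute>, got: " + token);
        }
        int left = attributes[token.Substring(5)];
        string op = Next();
        int right = int.Parse(Next());
        switch (op)
        {
            case "=": case "==": return left == right;
            case "!=": case "<>": return left != right;
            case "<": return left < right;
            case ">": return left > right;
            case "<=": return left <= right;
            case ">=": return left >= right;
            default: throw new FormatException("Unknown operator: " + op);
        }
    }
}
```

For an object whose ProductID is 150, the expression "(this.ProductID != 100 AND this.ProductID <= 200)" evaluates to true, and "this.ProductID < 100 OR this.ProductID >= 150 AND this.ProductID <> 150" evaluates to false because AND is applied before OR.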

Conclusion

As you have seen, NCache makes it very simple to query the distributed cache. This is a powerful feature that allows you to use your cache in a more meaningful way and find things in it more easily. Check it out.

Author: Iqbal Khan works for Alachisoft, a leading software company providing .NET and Java distributed caching, O/R Mapping and SharePoint Storage Optimization solutions. You can reach him at iqbal@alachisoft.com.