Big Data Analytics and Processing with IMDG - TayzGrid

With the popularity of Hadoop, the MapReduce programming framework has become a common choice for Big Data Analytics and Big Data Processing. Hadoop, however, typically stores the data it processes in HDFS, so its distributed processing is not completely in-memory.

TayzGrid, on the other hand, provides a totally in-memory Big Data Analytics and Big Data Processing capability similar to Hadoop, but one that is much faster precisely because it is in-memory.

TayzGrid is an extremely fast and linearly scalable In-Memory Data Grid. It provides a MapReduce programming framework that lets you write programs to process very large amounts of data (Big Data) in parallel, distributed across a cluster of TayzGrid servers.

TayzGrid provides a totally in-memory MapReduce, Aggregator, and Entry Processor framework that is much faster than traditional HDFS-based Hadoop processing for Big Data. Each is explained in more detail below.

TayzGrid allows you to configure the In-Memory Data Grid cluster to keep all data in object form. Additionally, all MapReduce, Aggregator, and Entry Processor code runs within the TayzGrid server process (InProc). This means the code can access all of the data in object form on its own heap, without any serialization/deserialization cost, which makes TayzGrid extremely fast and lets you process Big Data in much less time.

MapReduce for Big Data Processing

The term MapReduce refers to two distinct phases. The first phase, “Map”, takes a large set of data and converts (“maps”) it into a smaller set of data in which individual items are broken down into key-value pairs.

Below is an example of a Mapper:


public class ProductAnalysisMapper
        implements com.alachisoft.tayzgrid.runtime.mapreduce.Mapper {

    @Override
    public void map(Object ikey, Object ivalue, OutputMap omap) {
        // Emit the value with a count of 1 into the output map.
        omap.emit(ivalue, 1);
    }
}

The second phase, “Reduce”, takes the output of the “Map” phase as its input and reduces it further into a much smaller data set. This output is also broken down into key-value pairs.

Below is an example of a Reducer:


public class ProductAnalysisReducer
        implements com.alachisoft.tayzgrid.runtime.mapreduce.Reducer {

    public ProductAnalysisReducer(Object k) { /* ... */ }

    @Override
    public void reduce(Object iv) { /* ... */ }

    @Override
    public void finishReduce(KeyValuePair kvp) { /* ... */ }
}
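
To make the Reduce phase concrete, here is a minimal sketch of what a filled-in Reducer could look like for the Mapper shown above, which emits each value with a count of 1: it simply sums those counts for its key. The accumulation logic and the kvp.setKey()/kvp.setValue() calls are illustrative assumptions, not code taken from TayzGrid's documentation.

public class ProductCountReducer
        implements com.alachisoft.tayzgrid.runtime.mapreduce.Reducer {

    private final Object key;   // the key this Reducer instance is responsible for
    private int count = 0;      // running total of the counts emitted by the Mappers

    public ProductCountReducer(Object key) {
        this.key = key;
    }

    @Override
    public void reduce(Object value) {
        // Each Mapper emitted (value, 1); add that count to the running total.
        count += (Integer) value;
    }

    @Override
    public void finishReduce(KeyValuePair kvp) {
        // Write the final (key, total) pair; setKey/setValue are assumed setters.
        kvp.setKey(key);
        kvp.setValue(count);
    }
}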

Aggregator for Big Data Analytics

The Aggregator works with TayzGrid's MapReduce framework and provides statistical data to the user. It applies cumulative functions to the final result produced by MapReduce, condensing it into compact, meaningful figures such as counts, sums, or averages.


class IntAggregator implements Aggregator<Collection, Integer> {

    public IntAggregator(String function) { /* ... */ }

    @Override
    public Integer aggragate(Collection value) { /* ... */ }

    @Override
    public Integer aggragateAll(Collection value) { /* ... */ }

    private Integer calculate(Collection value) { /* ... */ }
}
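
As a rough illustration of how the stubs above might be filled in, the sketch below sums or counts the integers it is handed, depending on the function name passed to the constructor. The “SUM”/“COUNT” strings, the summing logic, and the reading of aggragate() as a per-node step and aggragateAll() as the final merge are assumptions for this sketch, inferred from the method names rather than taken from TayzGrid's samples.

class IntSumAggregator implements Aggregator<Collection, Integer> {

    private final String function;   // e.g. "SUM" or "COUNT" (illustrative values)

    public IntSumAggregator(String function) {
        this.function = function;
    }

    @Override
    public Integer aggragate(Collection value) {
        // Assumed: aggregate one node's partial results.
        return calculate(value);
    }

    @Override
    public Integer aggragateAll(Collection value) {
        // Assumed: combine the partial results from all nodes into the final figure.
        return calculate(value);
    }

    private Integer calculate(Collection value) {
        int result = 0;
        for (Object item : value) {
            result += "COUNT".equals(function) ? 1 : ((Integer) item).intValue();
        }
        return result;
    }
}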

Entry Processor

The Entry Processor is part of the JCache (JSR 107) API specification. And, since TayzGrid is 100% JCache compliant, it fully provides the Entry Processor capability.

With an Entry Processor, you provide custom code (the Entry Processor) that TayzGrid then runs on all the nodes of its In-Memory Data Grid cluster. This Entry Processor runs within the TayzGrid node process (meaning it is InProc).

And, since you can configure TayzGrid to store all data in object form, the Entry Processor can execute extremely fast because there is no serialization/deserialization cost associated with this processing.

Below is an example of an Entry Processor:


public class CustomEntryProcessor<K, V, T>
        implements EntryProcessor<K, V, T>, Serializable {

    @Override
    public T process(MutableEntry<K, V> me, Object... args)
            throws EntryProcessorException { /* ... */ }
}
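
Because the Entry Processor API comes from the JCache standard, a concrete (if simplified) example can be sketched against javax.cache alone. The price-discount scenario, the key "Product:1001", and the 10% discount below are hypothetical.

import java.io.Serializable;
import javax.cache.processor.EntryProcessor;
import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;

// Applies a discount to a cached price directly on the grid node that owns the entry.
public class DiscountProcessor
        implements EntryProcessor<String, Double, Double>, Serializable {

    @Override
    public Double process(MutableEntry<String, Double> entry, Object... args)
            throws EntryProcessorException {
        if (!entry.exists()) {
            return null;                         // nothing to update for this key
        }
        double discount = (Double) args[0];      // e.g. 0.10 for a 10% discount
        double newPrice = entry.getValue() * (1.0 - discount);
        entry.setValue(newPrice);                // updated in place, no object round trip
        return newPrice;
    }
}

Invoking it on a JCache-compliant cache such as TayzGrid is then a single call, for example: Double updated = cache.invoke("Product:1001", new DiscountProcessor(), 0.10);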


What to Do Next?