NCache 4.6 - Online Documentation


MapReduce in NCache allows developers to write programs that process huge volumes of unstructured data in parallel across an NCache cluster. To distribute the input data and analyze it concurrently, MapReduce runs on all nodes of the cluster, regardless of its size. The term “MapReduce” refers to two distinct phases. The first is the ‘Map’ phase, which takes a set of data and converts it into another set in which individual items are broken down into key-value pairs. The second is the ‘Reduce’ phase, which takes the output of ‘Map’ as its input and reduces that data set into a smaller, more meaningful one.
MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on an NCache cluster. A user-defined Map function processes a key-value pair to generate a set of intermediate key-value pairs. The Reduce function then processes all intermediate key-value pairs that share the same intermediate key, aggregating them, performing calculations, or applying any other operation on the pairs. An optional component, the Combiner, merges the intermediate key-value pairs generated by the Mapper before they are sent on to the Reducer.
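The sketch below is a minimal, framework-agnostic word-count example of this Map, Combine, and Reduce flow, written in plain Java. It uses ordinary Java collections rather than the NCache MapReduce API, and the class and method names (WordCountSketch, map, combine, reduce) are illustrative assumptions, not NCache types.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Framework-agnostic word-count sketch of the Map -> Combine -> Reduce flow described above.
// It uses plain Java collections only; it does not call the NCache MapReduce API.
public class WordCountSketch {

    // Map phase: break one input record (a line of text) into intermediate key-value pairs.
    static Map<String, Integer> map(String line) {
        Map<String, Integer> pairs = new HashMap<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.merge(word, 1, Integer::sum);
            }
        }
        return pairs;
    }

    // Combine phase (optional): merge the pairs produced by local mappers
    // before they are sent over to the reducer, shrinking the intermediate data set.
    static Map<String, Integer> combine(List<Map<String, Integer>> mapperOutputs) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            output.forEach((word, count) -> combined.merge(word, count, Integer::sum));
        }
        return combined;
    }

    // Reduce phase: aggregate all values sharing the same intermediate key into the final result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> combinedOutputs) {
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> output : combinedOutputs) {
            output.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> inputRecords = List.of("NCache runs MapReduce", "MapReduce runs in parallel");

        // Each record is mapped independently; on a cluster this would happen on every node.
        List<Map<String, Integer>> mapped = new ArrayList<>();
        for (String record : inputRecords) {
            mapped.add(map(record));
        }

        Map<String, Integer> combined = combine(mapped);
        Map<String, Integer> reduced = reduce(List.of(combined));
        System.out.println(reduced); // e.g. {parallel=1, in=1, runs=2, mapreduce=2, ncache=1}
    }
}

In a real NCache deployment, the mapper would run against cache data on every node and the reducer would receive the combined pairs over the network; the sketch keeps everything in one process purely to show how the three phases hand data to one another.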
Figure 1. MapReduce Workflow
See Also