NCache 4.6 - Online Documentation

How does MapReduce Work?

Generally, MapReduce  in NCache consists of 2 (sometimes 3) phases: i.e. Mapping, Combining (optional) and Reducing.
1.     Mapping phase: filters and prepares the input for the next phase that may be Combining or Reducing.
2.     Reduction phase: takes care of the aggregation and compilation of the final result.
3.     Combining phase: responsible for reduction local to the node, before sending the input to the Reducers. Combine phase optimizes performance as it minimizes the network traffic between Mapper and Reducers by sending the output to the Reducer in chunks.
Similarly, NCache MapReduce has three phases: Map, Combine, Reduce. Only the Mapper is necessary to implement – Reducer implementation is optional.
NCache MapReduce will execute its default reduction during the task if the Reducer is not implemented by the user.
The Mapper, Combiner and Reducer are executed simultaneously during a NCache MapReduce task on the NCache cluster. Mapper output is individually sent to the Combiner. When Combiner’s output reaches the specified chunk size, it is then sent to the Reducer, which finalizes and persists the output.
In order to monitor the submitted task, a trackable object is provided to the user.
A typical MapReduce task has the following components:
Mapper: Processes input into key-value pairs to generate a set of intermediate key-value pairs to be sent for further refining and extraction of the data.
Combiner Factory: Assigns a unique Combiner to each key provided to it. User can implement it to provide different combiners for different keys.
Combiner: Works as local reducer to the node where Mapper’s output has been combined to minimize traffic between Mapper and Reducer. The tasks are processed and stored in bulk before being sent to the Reducer, meaning the data from Mapper is processed in chunks, the size of which is configurable. Once the combined results reach the specified chunk size, the elements are forwarded to the Reducer.
Reducer Factory: Assigns a unique Reducer to each key provided to it. User can implement it to provide different reducers for different keys.
Reducer: Processes all those intermediate key-value pairs generated by Mapper or combined by Combiner to aggregate, perform calculations or apply different operations to produce the reduced output.
Key Filter: Key Filter, as the name indicates, allows the user to filter cache data based on its keys before being provided to the Mapper. The KeyFilter is called during Mapper’s execution. If it returns true, the Map will be executed on the key. If it returns false, Mapper will skip the key and move to next one from the Cache.
TrackerTask: This component lets you keep track of the progress of the task and its status as the task is executed. And lets you fetch the output of the task and enumerate it.
Output: The output is stored in-memory, on the server side. It can be enumerated using the TrackableTask instance on the client application.
See Also