This feature is only available in the NCache Enterprise.
You can configure MapReduce for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Using the NCache Management Center
Launch the NCache Management Center by browsing to http://localhost:8251 or
<server-ip>:8251on Windows and Linux.
In the left navigation bar, click on Clustered Caches or Local Caches, based on the cache to configure.
Against the cache name, click on View Details.
This opens the detailed configuration page for the cache. Go to the Advanced Settings tab and click on MapReduce in the left bar.
Configure the Max Tasks that can be executed simultaneously.
In case you expect exceptions to be thrown during task execution, you can specify Max avoidable exceptions, after which the task is failed and logged in the cache error log.
Specify Chunk Size - the number of emitted elements a chunk must have before being transmitted to the Combiner or Reducer.
Specify the Queue Size which is the maximum number of tasks that can wait in queue before they are processed.
Click Deploy Task Libraries.
A dialog box will open. Browse for the libraries that have the MapReduce interfaces implemented and click Open.
- Click on Save Changes to apply this configuration to the cache.
Using Command Line Tools
The Add MapReduce tool configures MapReduce tasks for processing and generating large data sets with a parallel, distributed algorithm on a clustered cache.
The following command configures MapReduce execution on demoClusteredCache with its default options.
The following command configures MapReduce on demoClusteredCache with 20 tasks to be executed in parallel with chunks of 100 elements each.