You can configure MapReduce for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Using NCache Web Manager
Launch NCache Web Manager by browsing to http://localhost:8251 (Windows) or <server-ip>:8251 (Windows + Linux).
In the left navigation bar, click on Clustered Caches or Local Caches, depending on the cache you want to configure.
Against the cache name, click on View Details.
This opens up the detailed configuration page for the cache. Go to the Advanced Settings tab and click on MapReduce in the left bar.
Configure Max Tasks, the maximum number of tasks that can be executed simultaneously.
If you expect exceptions to be thrown during task execution, specify Max avoidable exceptions, the number of exceptions tolerated before the task is marked as failed and logged in the cache error log.
Specify Chunk Size - the number of emitted elements a chunk must have before being transmitted to Combiner or Reducer.
Specify the Queue Size, which is the maximum number of tasks that can wait in the queue before they are processed.
Click Deploy Task Libraries.
In the dialog box that opens, browse to the libraries that implement the MapReduce interfaces and click Open.
Click on Save Changes to apply this configuration to the cache.
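To illustrate what the Chunk Size setting controls, the following is a conceptual sketch only. It is not NCache code and does not call any NCache API. The variable names and sample values are hypothetical, and flushing a final partial chunk is an assumption made for the illustration rather than documented NCache behavior.

```powershell
# Conceptual illustration only; this is not NCache code.
$chunkSize = 3        # plays the role of the Chunk Size setting (hypothetical value)
$emitted   = 1..8     # stand-in for elements emitted during the map phase
$chunk     = [System.Collections.Generic.List[int]]::new()

foreach ($element in $emitted) {
    $chunk.Add($element)

    # A chunk is handed on only once it holds $chunkSize elements.
    if ($chunk.Count -eq $chunkSize) {
        Write-Output "Transmit chunk: $($chunk -join ', ')"
        $chunk.Clear()
    }
}

# Assumption: leftover elements are sent as a final, smaller chunk.
if ($chunk.Count -gt 0) {
    Write-Output "Transmit final partial chunk: $($chunk -join ', ')"
}
```

With these sample values the loop hands on two full chunks and one partial chunk. A larger chunk size means fewer, larger transmissions. A smaller one means elements are passed on sooner, in more transmissions.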
The Add-MapReduce cmdlet configures MapReduce tasks for processing and generating large data sets with a parallel, distributed algorithm on a clustered cache.
The following command configures MapReduce execution on demoClusteredCache with default options.
Add-MapReduce -CacheName demoClusteredCache
The following command configures MapReduce on demoClusteredCache with 20 tasks to be executed in parallel with chunks of 100 elements each.
Add-MapReduce -CacheName demoClusteredCache -MaxTasks 20 -ChunkSize 100
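If the same settings are reused in scripts, they can be kept in one place and passed with PowerShell splatting. This is a minimal sketch that uses only the -CacheName, -MaxTasks, and -ChunkSize parameters shown above. The variable name $mapReduceSettings is hypothetical, and the sketch assumes the NCache PowerShell module is already loaded in the session.

```powershell
# Collect the MapReduce settings in a hashtable; each key is a parameter name.
$mapReduceSettings = @{
    CacheName = 'demoClusteredCache'   # cache to configure
    MaxTasks  = 20                     # tasks executed in parallel
    ChunkSize = 100                    # emitted elements per chunk
}

# Splatting (@ instead of $) expands the hashtable into named parameters.
Add-MapReduce @mapReduceSettings
```

This is equivalent to the command above with the parameters written inline. Keeping the values in a hashtable makes them easier to review and change.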