MapReduce

MapReduce in NCache lets developers write programs that process large amounts of unstructured data in parallel across an NCache cluster. To distribute and analyze the input data in parallel, a MapReduce task runs on every node of the cluster, whatever the cluster size. The term “MapReduce” refers to two distinct phases. The first is the Map phase, which takes a set of data and converts it into another set of data in which individual items are broken down into key-value pairs. The second is the Reduce phase, which takes the output of Map as its input and reduces that data set into a smaller, more meaningful one.

MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on an NCache cluster. A user-defined Map function processes a key-value pair to generate a set of intermediate key-value pairs. The Reduce function processes all intermediate key-value pairs that share the same intermediate key in order to aggregate them, perform calculations or apply any other operation. An optional third component, the Combiner, merges the intermediate key-value pairs generated by the Mapper before they are sent to the Reducer.
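The model described above can be illustrated with a word count. This is a minimal, single-process sketch in plain Python, not the NCache API: the function names (`map_words`, `reduce_counts`, `run_map_reduce`) and the sample cache contents are assumptions made for illustration only.

```python
from collections import defaultdict

def map_words(key, value):
    """Map: turn one cache entry (key, text) into intermediate (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: aggregate every value that shares one intermediate key."""
    return word, sum(counts)

def run_map_reduce(entries):
    # Group intermediate pairs by their intermediate key before reducing.
    grouped = defaultdict(list)
    for key, value in entries.items():
        for out_key, out_value in map_words(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())

# Hypothetical cache contents: key -> document text.
cache = {"doc1": "fast cache fast", "doc2": "cache cluster"}
result = run_map_reduce(cache)
print(result)
```

On a real cluster the grouping step happens across nodes; here it is collapsed into one dictionary to keep the Map and Reduce roles visible.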

How does MapReduce Work?

Generally, MapReduce consists of two (sometimes three) phases: Mapping, Combining (optional) and Reducing.

1. Mapping phase: filters and prepares the input for the next phase, which may be Combining or Reducing.

2. Combining phase (optional): performs a reduction local to the node before the data is sent to the Reducers. It improves performance by minimizing network traffic between the Mappers and the Reducers, because output is sent to the Reducer in chunks.

3. Reduction phase: takes care of the aggregation and compilation of the final result.
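To make the traffic saving of the Combining phase concrete, the sketch below counts how many key-value pairs would leave each node with and without a node-local combine step. It is plain Python rather than NCache code, and the two "nodes" and their data are hypothetical.

```python
from collections import Counter

def map_phase(text):
    """Map: emit one (word, 1) pair per word."""
    return [(word, 1) for word in text.split()]

def combine_phase(pairs):
    """Node-local reduction: one (word, partial_count) pair per distinct word."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())

def reduce_phase(pairs):
    """Reduce: merge the pairs received from all nodes into final totals."""
    total = Counter()
    for word, count in pairs:
        total[word] += count
    return dict(total)

# Hypothetical data held by two cluster nodes.
nodes = {"node1": ["a b a a", "b a"], "node2": ["a c", "c c a"]}

sent_without_combiner = []
sent_with_combiner = []
for texts in nodes.values():
    mapped = [pair for text in texts for pair in map_phase(text)]
    sent_without_combiner.extend(mapped)                 # every pair crosses the network
    sent_with_combiner.extend(combine_phase(mapped))     # only partial sums cross

totals = reduce_phase(sent_with_combiner)
print(len(sent_without_combiner), len(sent_with_combiner), totals)
```

Both paths produce the same totals; the combined path simply hands the Reducer fewer pairs.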

Similarly, NCache MapReduce has three phases: Map, Combine and Reduce. Only the Mapper must be implemented; implementing the Reducer is optional.

If the user does not implement a Reducer, NCache MapReduce applies its default reduction during the task.

The Mapper, Combiner and Reducer run simultaneously during an NCache MapReduce task on the NCache cluster. Each Mapper output is sent individually to the Combiner. When the Combiner’s output reaches the specified chunk size, it is sent to the Reducer, which finalizes and persists the output.
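The chunked hand-off from Combiner to Reducer can be sketched as a buffer that flushes once it reaches a threshold. The following plain-Python example is an illustration only: the `ChunkedCombiner` class is hypothetical, and it assumes the chunk size counts distinct buffered keys, which the text above does not specify for NCache.

```python
class ChunkedCombiner:
    """Buffers partial sums per key and forwards them once chunk_size keys accumulate."""

    def __init__(self, chunk_size, send):
        self.chunk_size = chunk_size
        self.send = send          # callback standing in for "send to the Reducer"
        self.buffer = {}

    def combine(self, key, value):
        # Merge each Mapper output into the local buffer as it arrives.
        self.buffer[key] = self.buffer.get(key, 0) + value
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        # Forward the current chunk (if any) and start a new one.
        if self.buffer:
            self.send(dict(self.buffer))
            self.buffer.clear()

reducer_totals = {}
chunks_received = []

def reducer_receive(chunk):
    """Reducer side: fold each incoming chunk into the final totals."""
    chunks_received.append(chunk)
    for key, value in chunk.items():
        reducer_totals[key] = reducer_totals.get(key, 0) + value

combiner = ChunkedCombiner(chunk_size=2, send=reducer_receive)
for word in "a a b c a c d".split():
    combiner.combine(word, 1)     # Mapper output arrives one pair at a time
combiner.flush()                  # forward anything left when mapping ends

print(chunks_received, reducer_totals)
```

Because the Reducer folds chunks in as they arrive, a key such as `a` may appear in more than one chunk and still end up with the correct total.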

To monitor a submitted task, the user is given a trackable object.

A typical MapReduce task has the following components:

Mapper: Processes input key-value pairs to generate a set of intermediate key-value pairs, which are sent on for further refining and extraction of the data.

Combiner Factory: Assigns a unique Combiner to each key provided to it. The user can implement it to provide different Combiners for different keys.

Combiner: Works as a reducer local to the node, combining the Mapper’s output there to minimize traffic between the Mapper and the Reducer. Data from the Mapper is processed and stored in chunks of configurable size before being sent to the Reducer. Once the combined results reach the specified chunk size, the elements are forwarded to the Reducer.

Reducer Factory: Assigns a unique Reducer to each key provided to it. The user can implement it to provide different Reducers for different keys.

Reducer: Processes the intermediate key-value pairs generated by the Mapper or combined by the Combiner to aggregate them, perform calculations or apply other operations, producing the reduced output.

Key Filter: Lets the user filter cache data by key before it is provided to the Mapper. The KeyFilter is called during the Mapper’s execution. If it returns true, Map is executed on the key. If it returns false, the Mapper skips the key and moves to the next one in the cache.

TrackableTask: Lets you track the progress and status of the task as it executes, and lets you fetch and enumerate the task’s output.

Output: The output is stored in memory on the server side. It can be enumerated using the TrackableTask instance in the client application.
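How several of these components fit together (Key Filter, Mapper, Reducer Factory and Reducer) can be sketched in a few lines. This is plain Python with hypothetical names (`key_filter`, `SumReducer`, `ReducerFactory`) and made-up cache entries. It does not reproduce the NCache interfaces, which are covered in the sample implementation topics.

```python
def key_filter(key):
    """Return True to run Map on this key, False to skip it (hypothetical rule)."""
    return key.startswith("order:")

def mapper(key, value):
    # Emit an intermediate (customer, amount) pair for each order entry.
    yield value["customer"], value["amount"]

class SumReducer:
    """Reducer dedicated to a single intermediate key."""

    def __init__(self, key):
        self.key = key
        self.total = 0

    def reduce(self, value):
        self.total += value

    def finish(self):
        return self.key, self.total

class ReducerFactory:
    """Hands out one Reducer instance per intermediate key."""

    def __init__(self):
        self.reducers = {}

    def get(self, key):
        if key not in self.reducers:
            self.reducers[key] = SumReducer(key)
        return self.reducers[key]

# Hypothetical cache contents; only the "order:" keys pass the filter.
cache = {
    "order:1": {"customer": "ann", "amount": 10},
    "order:2": {"customer": "bob", "amount": 5},
    "order:3": {"customer": "ann", "amount": 7},
    "user:ann": {"name": "Ann"},
}

factory = ReducerFactory()
for key, value in cache.items():
    if not key_filter(key):
        continue                      # the Mapper never sees filtered-out keys
    for out_key, out_value in mapper(key, value):
        factory.get(out_key).reduce(out_value)

# Enumerate the reduced output, one entry per intermediate key.
output = dict(r.finish() for r in factory.reducers.values())
print(output)
```

Filtering before mapping means the `user:ann` entry, which has no `amount` field, never reaches the Mapper.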

In This Section

Sample Implementation of Interfaces
Explains, with samples, how to implement the required interfaces for MapReduce.

Sample Usage of MapReduce
Explains the steps to execute the MapReduce task.

Copyright © 2017 Alachisoft