
MapReduce Working and Components [Deprecated]

MapReduce in NCache lets developers process large amounts of unstructured data in parallel across an NCache cluster. The input data is distributed and analyzed on all nodes of the cluster simultaneously, whatever the cluster size.

Note

This feature is only available in NCache Enterprise.

MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The term “MapReduce” refers to two distinct phases. The first phase is the ‘Map’ phase, which takes a set of data and converts it into another set of data, where individual items are broken down into key-value pairs. The second phase is the ‘Reduce’ phase, which takes the output from the ‘Map’ as an input and reduces that data set into a smaller and more meaningful data set.

A user-defined Mapper processes a key-value pair to generate a set of intermediate key-value pairs. The Reducer processes all intermediate key-value pairs that share the same intermediate key to aggregate them, perform calculations, or apply any other operation. An optional third component, the Combiner, merges the intermediate key-value pairs generated by the Mapper before they are sent to the Reducer.

The following example illustrates a MapReduce task, with and without a Combiner, executed over a cluster of three nodes. The task takes orders as input to the Mapper and counts the products in them. In Figure 1, the Mapper’s output is sent directly to the Reducer and aggregated on the Reducer’s node. In Figure 2, the count is first aggregated on each node, and this aggregated count is then sent to the Reducer node for the final aggregation.

MapReduce without Combiner:

[Figure 1: MapReduce in NCache without Combiner]

MapReduce with Combiner:

[Figure 2: MapReduce in NCache with Combiner]
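The product-count example can be sketched in plain Python to show what the Combiner changes. This is a conceptual simulation only, not the NCache API: the order data, the `NODES` layout, and every function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical order data, partitioned across three cache nodes.
NODES = {
    "node1": [["apple", "pear"], ["apple"]],
    "node2": [["pear", "apple"], ["pear"]],
    "node3": [["apple", "kiwi"]],
}

def map_order(order):
    """Map: emit a (product, 1) pair for every product in an order."""
    return [(product, 1) for product in order]

def combine(pairs):
    """Combine: sum the counts per product locally, on one node."""
    local = Counter()
    for product, count in pairs:
        local[product] += count
    return list(local.items())

def reduce_pairs(pairs):
    """Reduce: sum all counts that share the same product key."""
    totals = defaultdict(int)
    for product, count in pairs:
        totals[product] += count
    return dict(totals)

def run(use_combiner):
    sent = []  # pairs that would cross the network to the reducer
    for orders in NODES.values():
        mapped = [pair for order in orders for pair in map_order(order)]
        sent.extend(combine(mapped) if use_combiner else mapped)
    return reduce_pairs(sent), len(sent)

totals_plain, sent_plain = run(use_combiner=False)
totals_combined, sent_combined = run(use_combiner=True)
```

Both runs produce the same totals. With the Combiner, fewer pairs travel to the reducer, and the saving grows as more orders on a node share the same products.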

How does MapReduce Work?

Generally, MapReduce consists of two, and sometimes three, phases: Mapping, Combining (optional), and Reducing.

  1. Mapping phase: Filters and prepares the input for the next phase, which may be Combining or Reducing.
  2. Combining phase (optional): Performs a reduction local to the node before the output is sent to the Reducers. This phase improves performance because it minimizes the network traffic between the Mappers and the Reducers by sending the output to the Reducer in chunks.
  3. Reduction phase: Takes care of the aggregation and compilation of the final result.

Similarly, NCache MapReduce has three phases: Map, Combine, and Reduce. Only the Mapper must be implemented; the Reducer and Combiner implementations are optional. If the user does not implement a Reducer, NCache MapReduce executes its default reducer, which merges the output emitted by the Mapper into an array.
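As a rough illustration of a reducer that merges rather than aggregates, the sketch below collects every value the Mapper emitted without computing anything from it. The text does not say whether the array is built per key or across all keys; grouping per key is an assumption here, and `default_reduce` is a hypothetical name, not an NCache call.

```python
from collections import defaultdict

def default_reduce(mapper_output):
    """Merge every value emitted for a key into one list, without aggregating.

    Assumption: values are grouped per key; the source only says 'an array'.
    """
    merged = defaultdict(list)
    for key, value in mapper_output:
        merged[key].append(value)
    return dict(merged)

emitted = [("apple", 1), ("pear", 1), ("apple", 1)]
result = default_reduce(emitted)
```

A custom Reducer would replace the `append` step with an aggregation, such as a running sum.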

The Mapper, Combiner, and Reducer are executed simultaneously during an NCache MapReduce task on the NCache cluster. Mapper output is individually sent to the Combiner. When the Combiner’s output reaches the specified chunk size, it is then sent to the Reducer, which finalizes and persists the output.

To monitor the submitted task, a trackable object is provided to the user.

The number of tasks that can execute simultaneously and the Mapper’s output chunk size are configurable. The Mapper’s output is sent to the Combiner or the Reducer once it reaches the configured chunk size. See the NCache Administrator’s Guide.
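The chunking behavior can be pictured as a buffer that forwards its contents once it holds the configured number of pairs. The following is a minimal sketch under that reading; `ChunkedEmitter`, `chunk_size`, and `downstream` are hypothetical names and do not correspond to NCache configuration settings.

```python
class ChunkedEmitter:
    """Buffers emitted pairs and forwards them downstream one chunk at a time."""

    def __init__(self, chunk_size, downstream):
        self.chunk_size = chunk_size
        self.downstream = downstream  # callable that receives a list of pairs
        self.buffer = []

    def emit(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        """Forward whatever is buffered; call once at the end for a partial chunk."""
        if self.buffer:
            self.downstream(self.buffer)
            self.buffer = []  # start a fresh list so the sent chunk is untouched

received = []  # each entry is one chunk delivered to the next stage
emitter = ChunkedEmitter(chunk_size=3, downstream=received.append)
for product in ["apple", "pear", "apple", "kiwi", "pear"]:
    emitter.emit(product, 1)
emitter.flush()
```

Five emitted pairs with a chunk size of three arrive as two chunks: one full chunk and one final partial chunk.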

A typical MapReduce task has the following components:

Mapper: Processes the initial input and enables the user to emit the output into a dictionary to be used as an input for the combiner or reducer.

Combiner Factory: Creates and manages combiners for each key emitted into the output by the Mapper.

Combiner: Works as a reducer local to the node. The Mapper’s output is combined there to minimize the traffic between the Mapper and the Reducer.

Reducer Factory: Creates and manages the Reducers for each key emitted into the output by the Mapper or the Combiner.

Reducer: Processes all those intermediate key-value pairs generated by the Mapper or combined by the Combiner to aggregate, perform calculations, or apply different operations to produce the reduced output.

Key Filter: Allows the user to filter cache data based on its keys before it is sent to the Mapper. The Key Filter is called during the Map phase. If it returns true, the Mapper is executed on the key. If it returns false, the Mapper skips the key and moves to the next one in the cache.

TrackableTask: This component lets you keep track of the progress of the task and its status as the task is executed. It also lets you fetch the output of the task and enumerate it.

Output: The output is stored in memory, on the server side. It can be enumerated using the TrackableTask instance on the client application.
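To show how these components relate, here is a small single-process sketch that wires a key filter, a mapper, and a per-key reducer factory together. It is a conceptual model only: the class and method names (`SumReducer`, `ReducerFactory`, `get`, `reduce`, `finish`), the key prefixes, and the sample cache contents are all assumptions and are not taken from the NCache API.

```python
def key_filter(key):
    """Key Filter: return True to map this cache key, False to skip it."""
    return key.startswith("order:")

def mapper(key, order):
    """Mapper: emit one (product, 1) pair per product in the order."""
    return [(product, 1) for product in order]

class SumReducer:
    """Reducer: accumulates the values emitted for a single key."""

    def __init__(self, key):
        self.key = key
        self.total = 0

    def reduce(self, value):
        self.total += value

    def finish(self):
        return self.key, self.total

class ReducerFactory:
    """Reducer Factory: creates one reducer per distinct key, then reuses it."""

    def __init__(self):
        self.reducers = {}

    def get(self, key):
        if key not in self.reducers:
            self.reducers[key] = SumReducer(key)
        return self.reducers[key]

def run_task(cache):
    factory = ReducerFactory()
    skipped = []
    for key, value in cache.items():
        if not key_filter(key):
            skipped.append(key)  # the mapper never sees this entry
            continue
        for out_key, out_value in mapper(key, value):
            factory.get(out_key).reduce(out_value)
    # Output: the finished (key, value) pairs, ready to be enumerated.
    output = dict(r.finish() for r in factory.reducers.values())
    return output, skipped

# Hypothetical cache contents: two orders and one unrelated entry.
cache = {
    "order:1": ["apple", "pear"],
    "customer:7": ["not", "an", "order"],
    "order:2": ["apple"],
}
output, skipped = run_task(cache)
```

The factory keeps one reducer per product, so each key is aggregated independently. A Combiner Factory would follow the same pattern on each node before the results reach the reducers.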

Warning

This feature is only supported up to NCache 5.3 SP4.

See Also

Sample Implementation of MapReduce
Using MapReduce in Cache
Aggregator
Entry Processor
Configure MapReduce

© Copyright Alachisoft 2002 - 2025. All rights reserved. NCache is a registered trademark of Diyatech Corp.