MapReduce is a computation model for processing large volumes of data across multiple machines, i.e., in a distributed environment. It involves 'mapping' the data into key-value pairs and then 'reducing' them, i.e., grouping the pairs by key and performing an operation on the values.
Since Hazelcast is designed with a distributed environment in mind, a MapReduce framework fits it naturally.
Let’s see how to do it with an example.
Suppose we have data about cars (brand and car number) and the owner of each car.
Honda-9235, John
Hyundai-235, Alice
Honda-935, Bob
Mercedes-235, Janice
Honda-925, Catnis
Hyundai-1925, Jane
Now, we have to figure out the number of cars of each brand, i.e., Honda, Hyundai, Mercedes, and so on.
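Before distributing anything, it helps to see the same map, group, and reduce idea on a single JVM. The sketch below uses plain Java streams on an in-memory list (no Hazelcast involved; the class name LocalBrandCount is made up for this illustration) purely to show the model −
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
public class LocalBrandCount {
   public static void main(String[] args) {
      // the same car data, kept in a plain in-memory list
      List<String> cars = Arrays.asList("Honda-9235", "Hyundai-235", "Honda-935",
         "Mercedes-235", "Honda-925", "Hyundai-1925");
      // "map": extract the brand from each record; "reduce": count occurrences per brand
      Map<String, Long> countByBrand = cars.stream()
         .map(car -> car.split("-")[0])
         .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
      System.out.println(countByBrand);   // e.g. {Honda=3, Hyundai=2, Mercedes=1} (order may vary)
   }
}
Hazelcast MapReduce performs the same two steps, except that the data lives in a distributed map and the map and reduce phases run on whichever members own the data.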
Example
Let’s try to find that out using MapReduce −
package com.example.demo;

import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ICompletableFuture;
import com.hazelcast.core.IMap;
import com.hazelcast.mapreduce.Context;
import com.hazelcast.mapreduce.Job;
import com.hazelcast.mapreduce.JobTracker;
import com.hazelcast.mapreduce.KeyValueSource;
import com.hazelcast.mapreduce.Mapper;
import com.hazelcast.mapreduce.Reducer;
import com.hazelcast.mapreduce.ReducerFactory;

public class MapReduce {
   public static void main(String[] args) throws ExecutionException, InterruptedException {
      try {
         // create two Hazelcast instances so the job runs on a small cluster
         HazelcastInstance hzMember = Hazelcast.newHazelcastInstance();
         Hazelcast.newHazelcastInstance();

         // fill a distributed map with the sample data
         IMap<String, String> vehicleOwnerMap = hzMember.getMap("vehicleOwnerMap");
         vehicleOwnerMap.put("Honda-9235", "John");
         vehicleOwnerMap.put("Hyundai-235", "Alice");
         vehicleOwnerMap.put("Honda-935", "Bob");
         vehicleOwnerMap.put("Mercedes-235", "Janice");
         vehicleOwnerMap.put("Honda-925", "Catnis");
         vehicleOwnerMap.put("Hyundai-1925", "Jane");

         // use the distributed map as the input source of the job
         KeyValueSource<String, String> kvs = KeyValueSource.fromMap(vehicleOwnerMap);
         JobTracker tracker = hzMember.getJobTracker("vehicleBrandJob");
         Job<String, String> job = tracker.newJob(kvs);

         // wire the mapper and reducer together and submit the job to the cluster
         ICompletableFuture<Map<String, Integer>> myMapReduceFuture =
            job.mapper(new BrandMapper())
               .reducer(new BrandReducerFactory())
               .submit();

         // block until the job finishes and print the result
         Map<String, Integer> result = myMapReduceFuture.get();
         System.out.println("Final output: " + result);
      } finally {
         Hazelcast.shutdownAll();
      }
   }

   // emits (brand, 1) for every (carNumber, owner) entry
   private static class BrandMapper implements Mapper<String, String, String, Integer> {
      @Override
      public void map(String key, String value, Context<String, Integer> context) {
         context.emit(key.split("-")[0], 1);
      }
   }

   // creates one reducer per distinct brand
   private static class BrandReducerFactory implements ReducerFactory<String, Integer, Integer> {
      @Override
      public Reducer<Integer, Integer> newReducer(String key) {
         return new BrandReducer();
      }
   }

   // sums up the 1s emitted for a single brand
   private static class BrandReducer extends Reducer<Integer, Integer> {
      private final AtomicInteger count = new AtomicInteger(0);

      @Override
      public void reduce(Integer value) {
         count.addAndGet(value);
      }

      @Override
      public Integer finalizeReduce() {
         return count.get();
      }
   }
}
Let’s try to understand this code −
- We create Hazelcast members. In this example, we start two members, but there can well be more.
- We populate a map with dummy data and create a Key-Value source out of it.
- We create a MapReduce job and ask it to use the Key-Value source as its input.
- We then submit the job to the cluster and wait for it to complete.
- The mapper extracts the brand from the original key, sets the value to 1, and emits that key-value pair to the reducer.
- The reducer simply sums the values, grouping the data by key, i.e., by brand name.
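The mapper's output has to travel over the network to the reducers, so Hazelcast also lets you plug in an optional combiner that pre-aggregates the 1s on each member before they are sent. The following is only a sketch, assuming the Combiner and CombinerFactory types of the Hazelcast 3.x com.hazelcast.mapreduce package; the BrandCombiner names are made up for this example −
import com.hazelcast.mapreduce.Combiner;
import com.hazelcast.mapreduce.CombinerFactory;

// pre-sums the 1s emitted for a brand on the local member, reducing network traffic
class BrandCombinerFactory implements CombinerFactory<String, Integer, Integer> {
   @Override
   public Combiner<Integer, Integer> newCombiner(String key) {
      return new BrandCombiner();
   }

   private static class BrandCombiner extends Combiner<Integer, Integer> {
      private int sum;

      @Override
      public void combine(Integer value) {
         sum += value;      // accumulate locally
      }

      @Override
      public Integer finalizeChunk() {
         return sum;        // send the partial sum for this chunk
      }

      @Override
      public void reset() {
         sum = 0;           // clear state between chunks
      }
   }
}
It would slot into the job between the mapper and the reducer, e.g. job.mapper(new BrandMapper()).combiner(new BrandCombinerFactory()).reducer(new BrandReducerFactory()).submit(); the reducer stays unchanged because it already sums whatever integers it receives.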
Output
The output of the code −
Final output: {Mercedes=1, Hyundai=2, Honda=3}