Tuesday, December 31, 2019

BigData MapReduce Concept

MapReduce

MapReduce is a programming model for processing large datasets in parallel.
It divides a job into subtasks and runs them in parallel across the cluster.
The input and output are always in key-value format.

  • Map
  • Reducer

Map - You write a program that produces local (key, value) pairs.
Eg:- Suppose your data (values x, y, z) is spread across 3 data nodes.
After the Map phase, each node emits key-value pairs, where the data item is the key and its local count is the value.
DataNode1 o/p => (x,3),(y,3),(z,4)
DataNode2 o/p => (x,3),(y,3),(z,4)
DataNode3 o/p => (x,4),(y,4),(z,3)
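The Map step above can be sketched in Python. This is only an illustration of the idea, not Hadoop's actual API; the function name map_phase is made up for this example.

```python
# Minimal sketch of the Map phase: each data node counts its own
# records and emits local (key, count) pairs.
from collections import Counter

def map_phase(records):
    """Emit (key, count) pairs for the records stored on one data node."""
    return sorted(Counter(records).items())

# DataNode1 holds three x's, three y's and four z's:
print(map_phase(["x"] * 3 + ["y"] * 3 + ["z"] * 4))
# → [('x', 3), ('y', 3), ('z', 4)]
```

Each node runs this independently on its own block of data, which is what makes the phase parallel.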

Shuffle - All the values for a single key are brought together. This happens automatically.
Eg:- After the Shuffle phase, the values for each key from all the data nodes are accumulated.
(x,(3,3,4))
(y,(3,3,4))
(z,(4,4,3))
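The Shuffle step can be sketched as a grouping of every mapper's output by key. Again, this is an illustration of the concept, not the framework's real implementation; shuffle_phase is a made-up name.

```python
# Minimal sketch of the Shuffle phase: collect all values for each
# key across every mapper's output.
from collections import defaultdict

def shuffle_phase(mapper_outputs):
    """Group the values emitted for each key by all the mappers."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return dict(grouped)

node1 = [("x", 3), ("y", 3), ("z", 4)]
node2 = [("x", 3), ("y", 3), ("z", 4)]
node3 = [("x", 4), ("y", 4), ("z", 3)]
print(shuffle_phase([node1, node2, node3]))
# → {'x': [3, 3, 4], 'y': [3, 3, 4], 'z': [4, 4, 3]}
```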

Reducer - You write a program that reads each key from the Shuffle phase and sums its values. The output is again a key-value pair.
Eg:- After the Reducer phase, you will get the following output.
(x,10)
(y,10)
(z,11)
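The Reduce step above can be sketched the same way, assuming the shuffled (key, value-list) pairs from the previous phase; reduce_phase is an illustrative name, not a real API.

```python
# Minimal sketch of the Reduce phase: sum the values collected for
# each key during the shuffle.
def reduce_phase(grouped):
    """Sum the shuffled values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

print(reduce_phase({"x": [3, 3, 4], "y": [3, 3, 4], "z": [4, 4, 3]}))
# → {'x': 10, 'y': 10, 'z': 11}
```

Note that z sums to 11 (4+4+3), matching the map outputs above.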
