MapReduce
MapReduce is a programming model for processing large datasets in parallel.
It divides a job into subtasks and handles them in parallel across the cluster.
Input and output are always in key-value format. The model has two phases that you write yourself:
- Map
- Reducer
Map - You write a program that produces local (key, value) pairs.
Eg:- Suppose your data (the tokens x, y, z) is spread across 3 data nodes.
After the Map phase you get key-value pairs: the token is the key and its count is the value.
DataNode1 o/p => (x,3),(y,3),(z,4)
DataNode2 o/p => (x,3),(y,3),(z,4)
DataNode3 o/p => (x,4),(y,4),(z,3)
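As a rough sketch of the Map side (assuming Hadoop's Java MapReduce API; the class and variable names below are illustrative, not from this post), a mapper emits (token, 1) for every token it reads. The per-node totals above, such as (x,3), are what those pairs become once a combiner sums them locally on each data node.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: emits a local (token, 1) pair for each token in the line.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace; each token becomes a key.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // local (key, value) pair
            }
        }
    }
}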
Shuffle - All the values for a single key are brought together. This happens automatically; you do not write code for it.
Eg:- After the Shuffle phase, the values for each key are accumulated from all the data nodes.
(x,(3,3,4))
(y,(3,3,4))
(z,(4,4,3))
Reducer - You write a program that reads each key from the Shuffle output and sums that key's values. The output is again a key-value pair.
Eg:- After the Reducer phase, you get the following output.
(x,10)
(y,10)
(z,11)
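A matching reducer sketch (same assumptions as above) simply sums the grouped values the shuffle hands it, e.g. (x,(3,3,4)) becomes (x,10):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: sums all values that the shuffle grouped under one key.
public class TokenCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();   // e.g. 3 + 3 + 4 for key x
        }
        total.set(sum);
        context.write(key, total);   // (key, total count)
    }
}

In the job driver you would wire these up with job.setMapperClass(TokenCountMapper.class) and job.setReducerClass(TokenCountReducer.class); setting the same class as the combiner via job.setCombinerClass is what yields the pre-summed per-node pairs shown in the Map example.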