Table 4: MapReduce tasks.

Steps | Tasks

(1) Input: (i) Data are loaded into HDFS in blocks and distributed to the DataNodes
(ii) Blocks are replicated to guard against failures
(iii) The NameNode keeps track of the blocks and the DataNodes

(2) Job submission: The client submits the job and its details to the JobTracker

(3) Job initialization: (i) The JobTracker interacts with the TaskTracker on each DataNode
(ii) All tasks are scheduled

(4) Mapping: (i) The Mapper processes the data blocks
(ii) Key-value pairs are listed

(5) Sorting: The Mapper sorts the list of key-value pairs

(6) Shuffling: (i) The mapped output is transferred to the Reducers
(ii) Values are rearranged in sorted order

(7) Reduction: Reducers merge the lists of key-value pairs to generate the final result

(8) Result: (i) Values are stored in HDFS
(ii) Results are replicated according to the configuration
(iii) Clients read the results from HDFS
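To make the steps of Table 4 concrete, the sketch below shows the standard word-count example written against the Hadoop MapReduce Java API; it is an illustrative sketch rather than code from any of the surveyed systems. The Mapper and Reducer classes correspond to steps (4) and (7), sorting and shuffling (steps (5) and (6)) are performed by the framework between them, and the driver in main() carries out the job submission of steps (2) and (3). The HDFS input and output paths are assumed to be supplied as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapping, step (4): each Mapper processes one block of input lines and
  // emits a (word, 1) key-value pair for every word it encounters.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // emit (word, 1)
      }
    }
  }

  // Reduction, step (7): after the framework sorts and shuffles the mapped
  // output, each Reducer receives a word with all of its counts and merges
  // them into a single total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);  // emit (word, total count)
    }
  }

  // Job submission and initialization, steps (2) and (3): the client describes
  // the job and hands it to the cluster; input is read from HDFS and the
  // results are written back to HDFS, as in steps (1) and (8).
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a JAR and launched with the usual hadoop jar command (for example, hadoop jar WordCount.jar WordCount <input> <output>), such a job would run through the full sequence of Table 4, with the final counts stored and replicated in HDFS as in step (8).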