MapReduce
MapReduce is a programming model and framework for processing large data sets on a distributed cluster. A MapReduce program is expressed as two user-written functions: Map and Reduce. The Map function processes input records and emits intermediate key-value pairs. The framework then groups these pairs by key and passes the list of values for each key to the Reduce function, which produces the final output.
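As a minimal sketch, the two functions for a word-count job might look like the following Python; the names map_fn and reduce_fn and their (key, value) signatures are illustrative and not tied to any particular framework.

```python
def map_fn(_key, line):
    # Map: emit an intermediate (word, 1) pair for every word in the line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce: sum all counts collected for one word.
    yield word, sum(counts)
```

Applying map_fn to every input record, grouping the emitted pairs by word, and then applying reduce_fn to each group yields the final word counts.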
The model was introduced by Google in 2004 in a paper by Jeffrey Dean and Sanjay Ghemawat.
In a typical deployment, input data is split into blocks and processed in parallel by Map tasks across the worker nodes of the cluster; the framework then shuffles the intermediate pairs by key to Reduce tasks, which also run in parallel and write the final output.
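The execution flow can be illustrated with a toy, single-process driver that mimics the phases in order: split, parallel Map over blocks, shuffle (group by key), and Reduce. This is only a sketch of the control flow using the word-count job from above; a real cluster distributes the blocks across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_block(block):
    """Run the word-count Map function over one block of input lines."""
    pairs = []
    for line in block:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs

def reduce_group(word, counts):
    """Word-count Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def run_job(lines, num_blocks=4):
    # Split: partition the input into blocks.
    blocks = [lines[i::num_blocks] for i in range(num_blocks)]
    # Map: process the blocks in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(map_block, blocks)
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce: one call per distinct key.
    return dict(reduce_group(k, v) for k, v in groups.items())

if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "the fox"]))
```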
MapReduce is well suited to batch processing tasks like log analysis, indexing, and large-scale data transformations.