Map Reduce

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.[1][2][3] A MapReduce program is composed of a map procedure, which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance. The name MapReduce originally referred to the proprietary Google technology, but has since been genericized. https://en.wikipedia.org/wiki/MapReduce
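The student example in the quoted passage can be sketched in a few lines of Python. This is a single-process illustration only, not how any real MapReduce framework is implemented. The names `map_student`, `reduce_queue`, and `name_frequencies` are made up for this note, as is the sample data.

```python
from collections import defaultdict


def map_student(student):
    """Map step: emit (first_name, student) so each student lands in a queue by first name."""
    first_name = student.split()[0]
    return first_name, student


def reduce_queue(first_name, queue):
    """Reduce step: summarize one queue by counting the students in it."""
    return first_name, len(queue)


def name_frequencies(students):
    # Group mapped values by key: one queue per first name.
    queues = defaultdict(list)
    for student in students:
        key, value = map_student(student)
        queues[key].append(value)
    # Reduce each queue independently to a (name, count) pair.
    return dict(reduce_queue(name, queue) for name, queue in queues.items())


# Hypothetical sample data, for illustration only.
students = ["Ada Lovelace", "Alan Turing", "Ada Yonath", "Grace Hopper", "Alan Kay"]
frequencies = name_frequencies(students)
```

Because each queue is reduced independently of the others, the reduce calls could in principle run on different machines, which is the property the framework exploits.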

Google technology for ConCurrency

Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the original Google paper (Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004). Programs written in this functional style (Functional Programming) are automatically parallelized and executed on a large cluster of commodity machines.
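A minimal sketch of that contract, using word counting as the task: the user supplies only `map_fn` and `reduce_fn`, and a small driver handles the map phase, the grouping by intermediate key, and the reduce phase. The driver `map_reduce` and its `workers` parameter are assumptions made for this note, not Google's API. A thread pool stands in for the cluster, so it shows the structure rather than real distributed execution or fault tolerance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(key, value):
    """User map function: (document name, text) -> list of intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """User reduce function: merge all intermediate values that share one key."""
    return sum(values)


def map_reduce(inputs, map_fn, reduce_fn, workers=4):
    """Hypothetical driver: run map calls concurrently, group by key, then reduce."""
    # Map phase: each input pair is processed independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda pair: map_fn(*pair), inputs))

    # Shuffle phase: collect every intermediate value under its intermediate key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one reduce call per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical input documents, for illustration only.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog"),
    ("doc3", "The fox"),
]
word_counts = map_reduce(documents, map_fn, reduce_fn)
```

Neither user function touches shared state, which is what lets a framework parallelize them automatically. In CPython the threads here will not speed up CPU-bound work, so treat the pool as a stand-in for separate machines.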

Ruby implementation? http://romeda.org/blog/2007/04/mapreduce-in-36-lines-of-ruby.html

Python

