
mapreduce

KimJeongChul edited this page Sep 21, 2019 · 1 revision

MapReduce

Library: multiprocessing

A MapReduce WordCount workload that counts the number of occurrences of each word in a partitioned input dataset from Wikipedia (http://en.wikipedia.org/wiki/Wikipedia_database).

Lambda

Driver (code): The driver invokes multiple map functions (mappers) for parallel processing. It then creates successive stages of reducers until a single reduced output is produced.
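The staged reduction described above can be sketched roughly as follows. This is not the linked driver code: `run_job`, `fan_in`, and `workers` are hypothetical names, and a local `ThreadPool` from `multiprocessing.pool` stands in for the parallel Lambda invocations.

```python
from multiprocessing.pool import ThreadPool


def run_job(chunks, mapper, reducer, fan_in=2, workers=4):
    """Map every chunk in parallel, then reduce in stages until one output remains.

    Assumption: `mapper` takes one chunk, and `reducer` takes a list of
    partial results and returns one combined result.
    """
    if not chunks:
        raise ValueError("at least one input chunk is required")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each stage shrinks")

    with ThreadPool(workers) as pool:
        # Map stage: one mapper call per input partition.
        outputs = pool.map(mapper, chunks)

        # Reduce stages: group the partial results and reduce each group,
        # repeating until a single output is left.
        while len(outputs) > 1:
            groups = [outputs[i:i + fan_in] for i in range(0, len(outputs), fan_in)]
            outputs = pool.map(reducer, groups)

    return outputs[0]
```

For example, `run_job(["a b", "c", "d e f"], lambda s: len(s.split()), sum)` maps each chunk to its word count and then sums the partial counts over two reduce stages.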

Mapper (code): In this example, the mapper counts occurrences of computer language names ("JavaScript", "Java", "PHP", "Python", "C#", "C++", "Ruby", "CSS", "Objective-C", "Perl", "Scala", "Haskell", "MATLAB", "Clojure", "Groovy").
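A minimal sketch of such a mapper is shown below. The matching rule is an assumption, not taken from the linked code: each chunk is split on whitespace, surrounding punctuation is stripped, and the result must match a language name exactly (case-sensitive).

```python
from collections import Counter

# Language names listed on this page.
LANGUAGES = {
    "JavaScript", "Java", "PHP", "Python", "C#", "C++", "Ruby", "CSS",
    "Objective-C", "Perl", "Scala", "Haskell", "MATLAB", "Clojure", "Groovy",
}

# Punctuation stripped from both ends of a token (assumed rule);
# '+', '#', and '-' are kept so "C++", "C#", and "Objective-C" survive.
_STRIP = ".,;:!?()[]{}\"'"


def mapper(chunk):
    """Return a dict of language name -> occurrence count for one text chunk."""
    counts = Counter()
    for token in chunk.split():
        word = token.strip(_STRIP)
        if word in LANGUAGES:
            counts[word] += 1
    return dict(counts)
```

Because tokens are compared whole, "JavaScript" is not also counted as "Java".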

Reducer (code): The reducer keeps a dictionary of the aggregate occurrence counts for each computer language.
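The aggregation step can be sketched like this, assuming each mapper (or earlier reducer) returns a plain dictionary of language name to count. The function name and input shape are illustrative, not taken from the linked code.

```python
def reducer(partials):
    """Merge a list of partial count dictionaries into one dictionary of sums."""
    totals = {}
    for partial in partials:
        for language, count in partial.items():
            # Add this partial count to the running total for the language.
            totals[language] = totals.get(language, 0) + count
    return totals
```

Since the output has the same shape as each input, the outputs of one reducer stage can be fed directly into the next stage.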