This will take input files from files/input-*
and output the combined word count from input files.
Summary
- Master starts and creates tasks that will be performed by multiple workers.
- Multiple Worker starts and asks master for work.
- When a worker is done with assigned task it contacts master asking for another task, if all tasks are finished master asks worker to shutdown itself.
- If a worker dies/become slow due to some reason, master reassigns that task to some other worker.
- If a dead worker returns again, a new task is assigned to it and its old work is discarded as the worker was considered dead.
- Workers talks to Master via go RPCs and assumes they are running on different machines.
Usage:
go run master.go rpc.go files/input-*
for running Master.- Run
go run worker.go rpc.go
in one or more tabs for parallelism - After master is finished run
cat files/output-* > output.txt'
to get the final word count inoutput.txt
This relies on the master and workers sharing a common file system.
References:
- MIT 6.824
- Implementation is based on this paper MapReduce: Simplified Data Processing on Large Clusters