Emilio Coppa edited this page Apr 1, 2014 · 88 revisions

This project contains several diagrams describing Apache Hadoop internals (2.3.0 or later). Although these diagrams are NOT specified in any formal or unambiguous language (e.g., UML), they should be reasonably understandable (see the diagram notation conventions) and useful for anyone who wants to grasp the main ideas behind Hadoop. Unfortunately, not all the internal details are covered by these diagrams. You are free to help :)


Actors
  • Job Submitter
  • Node Manager
  • Resource Manager
  • Application Master

Tasks
  • Map Task
  • Reduce Task
  • Merger
  • Input

Model of computation
  • Job
  • Task
  • Task Attempt
  • Application
  • Container

Extra
  • Async Dispatcher
  • Localized Resource
  • Container Allocator [AM]
  • Container Launcher [AM]
  • Containers Launcher [NM]

| Parameter | File | Default | Diagram(s) |
|---|---|---|---|
| mapreduce.task.io.sort.mb | mapred-site.xml | 100 | MapTask > Shuffle; MapTask > Execution |
| mapreduce.map.sort.spill.percent | mapred-site.xml | 0.80 | MapTask > Shuffle; MapTask > Execution |
| mapreduce.task.io.sort.factor | mapred-site.xml | 100 | MapTask > Shuffle; Merge; ReduceTask > Shuffle |
| mapreduce.map.combine.minspills | mapred-site.xml | 3 | MapTask > Shuffle |
| mapreduce.job.reduces | mapred-site.xml | 1 | MapTask > Shuffle |
| mapreduce.job.reduces | mapred-site.xml | 0 | Job > NEW => INITED |
| mapreduce.cluster.local.dir | mapred-site.xml | ${hadoop.tmp.dir}/mapred/local | MapTask > Shuffle |
| mapreduce.reduce.merge.memtomem.enabled | mapred-site.xml | False | Reduce Task > Shuffle |
| mapreduce.framework.name | mapred-site.xml | yarn/local | Reduce Task > Shuffle |
| mapreduce.reduce.shuffle.parallelcopies | mapred-site.xml | 5 | Reduce Task > Shuffle |
| mapreduce.reduce.memory.totalbytes | mapred-site.xml | Runtime.maxMemory() | Reduce Task > Fetcher |
| mapreduce.reduce.shuffle.memory.limit.percent | mapred-site.xml | 0.25 | Reduce Task > Fetcher |
| mapreduce.job.ubertask.enable | mapred-site.xml | False | Job > NEW => INITED |
| mapreduce.job.ubertask.maxmaps | mapred-site.xml | 9 | Job > NEW => INITED |
| mapreduce.job.ubertask.maxreduces | mapred-site.xml | 1 | Job > NEW => INITED |
| mapreduce.job.ubertask.maxbytes | mapred-site.xml | dfs.block.size | Job > NEW => INITED |
| mapreduce.map.failures.maxpercent | mapred-site.xml | 0 | Job > RUNNING => {RUNNING, COMMITTING, FAIL ABORT} |
| mapreduce.reduce.failures.maxpercent | mapred-site.xml | 0 | Job > RUNNING => {RUNNING, COMMITTING, FAIL ABORT} |
| mapreduce.map.memory.mb | mapred-site.xml | 1024 | Task Attempt > NEW => UNASSIGNED |
| mapreduce.reduce.memory.mb | mapred-site.xml | 1024 | Task Attempt > NEW => UNASSIGNED |
| yarn.scheduler.maximum-allocation-mb | yarn-site.xml | 8192 | Container Allocator |
| mapreduce.reduce.shuffle.merge.percent | mapred-site.xml | 0.90 | Reduce Task > Shuffle |
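
To give a feel for how some of the parameters above interact, here is a minimal Python sketch. It is not Hadoop code: the function names `spill_threshold_bytes` and `is_uber_candidate` are hypothetical helpers invented for illustration, and the defaults are simply copied from the table. The uber-task check covers only the four `mapreduce.job.ubertask.*` parameters listed here; the actual decision in Hadoop may involve further conditions, so treat it as an approximation.

```python
# Illustrative sketch only. These helpers are hypothetical and are not part
# of Hadoop; default values are taken from the parameter table.

MB = 1024 * 1024


def spill_threshold_bytes(io_sort_mb=100, spill_percent=0.80):
    """Approximate fill level (in bytes) of the map-side sort buffer at which
    a spill to disk is triggered, assuming the threshold is simply
    mapreduce.task.io.sort.mb * mapreduce.map.sort.spill.percent."""
    return int(io_sort_mb * MB * spill_percent)


def is_uber_candidate(enabled, num_maps, num_reduces, input_bytes,
                      max_bytes, max_maps=9, max_reduces=1):
    """Simplified uber-task eligibility check.

    max_bytes stands in for mapreduce.job.ubertask.maxbytes, whose default
    is the value of dfs.block.size; it has no default here because that
    value depends on the cluster configuration.
    """
    return (enabled
            and num_maps <= max_maps
            and num_reduces <= max_reduces
            and input_bytes <= max_bytes)


if __name__ == "__main__":
    print(spill_threshold_bytes())
    print(is_uber_candidate(True, num_maps=4, num_reduces=1,
                            input_bytes=10 * MB, max_bytes=64 * MB))
```

With the table defaults, the spill threshold works out to roughly 80 MB of the 100 MB sort buffer. The 64 MB passed as `max_bytes` in the usage lines is an arbitrary example value, not a claim about any particular cluster.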