YARN
Job Submission
Process of Submitting Job Through YARN
the client contacts the resource manager and asks it to run an application master process (step 1)
the resource manager finds a node manager that can launch the application master in a container (steps 2a and 2b)
the application master may request more containers from the resource manager so that those containers can be used to run distributed computations (such as Map and Reduce tasks in the case of a MapReduce job) (step 3)
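To make these steps concrete, here is a minimal sketch (not the author's code) of a client submitting an application to YARN with the YarnClient API; the application name, AM launch command, and container size are placeholder assumptions.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitAppSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // step 1: ask the resource manager for a new application
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("example-app"); // placeholder name

        // describe how the application master process should be launched
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/path/to/launch-am.sh")); // hypothetical AM command
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM container

        // steps 2a and 2b happen inside the RM: the scheduler allocates a container on some
        // node manager, and that node manager launches the application master
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId);
    }
}
```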
Scheduling: 3 schedulers available in YARN
FIFO Scheduler: simple first-in, first-out scheduling; use this one here since we won't be simulating a shared cluster
Capacity Scheduler: fixed-size queues, such as one for small jobs and one for large jobs; comes at the cost of overall cluster utilization
Fair Scheduler: essentially gives each running job its "fair share" of cluster resources
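The scheduler is selected with the yarn.resourcemanager.scheduler.class property, which normally lives in yarn-site.xml on the resource manager; the minimal sketch below sets it programmatically purely to list the three scheduler class names.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfigSketch {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // Pick exactly one of the three schedulers for the resource manager:
        conf.set("yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler");
        // "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
        // "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"

        System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
    }
}
```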
MapReduce
Job Submission & Initialization
Components
client: the client node that submits the MapReduce job
YARN resource manager: coordinates the allocation of compute resources on the cluster
YARN node managers: launch and monitor compute containers on machines in the cluster
MapReduce application master: coordinates the tasks running the MR job; the application master and the MR tasks run in containers that are scheduled by the resource manager and managed by the node managers
Process of Submitting MapReduce Job Through YARN
The client submits the MapReduce job by calling job.waitForCompletion() on line 87. This submits the job if it hasn't already been submitted: a JobSubmitter instance is created and submitJobInternal() is called (a minimal driver that triggers this path is sketched after the list below).
JobSubmitter does the following:
asks the RM for an application ID
makes sure the output directory is valid
computes the input splits
copies the resources needed to run the job to HDFS (the job JAR file, the configuration file, and the computed input splits)
finally submits the job by calling submitApplication() on the RM
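As a reference point for the submission path above, here is a minimal driver sketch in the style of the classic WordCount example; the job name, mapper/reducer classes, and paths are placeholder assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // placeholder job name
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);     // hypothetical mapper, sketched later
        job.setReducerClass(WordCountReducer.class);   // hypothetical reducer, sketched later
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion() submits the job (JobSubmitter.submitJobInternal() under the hood)
        // and then polls for progress until the job finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```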
the RM receives the call to submitApplication() and hands the request off to the YARN scheduler (part of the RM), which allocates a container for the MR application master
the RM then launches the MR application master's process in that container, under the node manager's management (steps 5a and 5b)
the MR application master retrieves the input splits (computed by the client) from HDFS, so it knows how many map tasks are needed (one per split)
the MR application master requests containers for all map and reduce tasks from the RM (requests for reduce tasks are not made until 5% of map tasks have completed)
if mapreduce.job.ubertask.enable is set to true and the job is small enough according to the mapreduce.job.ubertask.* thresholds, the whole job may simply be run in the same JVM as the MR application master (as an "uber" task)
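A minimal sketch of the uber-task (and reduce slow-start) settings touched on above, with their usual defaults; they are set programmatically here only for illustration and would normally live in mapred-site.xml or the job configuration.

```java
import org.apache.hadoop.conf.Configuration;

public class SmallJobConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Run the whole job in the application master's JVM if it is small enough
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);     // default threshold: at most 9 map tasks
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);  // default threshold: at most 1 reduce task

        // Reduce container requests are delayed until this fraction of map tasks has completed
        // (the "5% of map tasks" note above corresponds to the default of 0.05)
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.05f);
    }
}
```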
once the scheduler has allocated a container for a task, the MR application master contacts the node manager where that allocation was made
that node manager starts an instance of YarnChild in a separate JVM
the YarnChild instance localizes the resources it needs to run the task (JAR files and any files from the distributed cache)
the YarnChild can then run the map or reduce task (a minimal mapper and reducer pair is sketched below)
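The task that YarnChild ultimately runs is the user's map or reduce code; a minimal pair, matching the hypothetical WordCountDriver sketched earlier, could look like this.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map task: one instance runs per input split, emitting (word, 1) pairs
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce task: receives all counts for a word (one partition of the map output) and sums them
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```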
Map and Reduce Tasks
Map Tasks
output from a map task is written to a circular memory buffer, which is 100 MB by default (set with mapreduce.task.io.sort.mb; see the tuning sketch after this list)
when the buffer reaches 80% full (or whatever is set by mapreduce.map.sort.spill.percent), a background thread spills its contents to disk
if the buffer fills up, the map blocks until the spill is complete
each time the spill threshold is reached, a new spill file is created
when the map is complete, all spill files (for this map task) are merged into a single partitioned and sorted output file (mapreduce.task.io.sort.factor controls the maximum number of streams to merge at once; the default is 10)
the map task then notifies the MR application master via its heartbeat that the output has been produced
the sorted, partitioned output file can then be served over HTTP by the node manager that ran the map task
the number of threads that can serve the partitions is, by default, twice the number of processors on the machine (the mapreduce.shuffle.max.threads property)
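A minimal sketch collecting the map-side spill and shuffle-serving properties above, set programmatically with their usual defaults purely for illustration.

```java
import org.apache.hadoop.conf.Configuration;

public class MapSideTuningSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Size of the circular in-memory sort buffer, in MB (default 100)
        conf.setInt("mapreduce.task.io.sort.mb", 100);

        // Fraction of the buffer at which a background thread starts spilling to disk (default 0.80)
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);

        // Maximum number of spill streams merged at once into the final map output (default 10)
        conf.setInt("mapreduce.task.io.sort.factor", 10);

        // Threads the node manager uses to serve map output partitions over HTTP;
        // the default of 0 means twice the number of available processors
        conf.setInt("mapreduce.shuffle.max.threads", 0);
    }
}
```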
Reduce Tasks
once a reduce task has started, and a map task holding a partition that the reduce task needs has finished, the reduce task uses a small number of copier threads (5 by default, set with mapreduce.reduce.shuffle.parallelcopies) to copy the partitions
these threads regularly ask the MR application master, via the heartbeat mechanism, for hosts with map outputs to copy (this is how the reduce task knows which outputs to fetch from which hosts)
map outputs retrieved by the copier threads are copied into the reduce task's JVM memory if they are small enough (mapreduce.reduce.shuffle.input.buffer.percent specifies the proportion of the heap to use for this buffer)
when the in-memory buffer reaches a threshold size (mapreduce.reduce.shuffle.merge.percent) or a threshold number of map outputs (mapreduce.reduce.merge.inmem.threshold), it is merged and spilled to disk
copies that accumulate on disk are merged into larger, sorted files by a background thread
the final round of merges feeds directly into the reduce function rather than going to disk first
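And a matching sketch of the reduce-side shuffle properties above, again set programmatically with their usual defaults only for illustration.

```java
import org.apache.hadoop.conf.Configuration;

public class ReduceSideTuningSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Number of copier threads fetching map output partitions in parallel (default 5)
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 5);

        // Proportion of the reduce task's heap used to buffer copied map outputs (default 0.70)
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);

        // Merge and spill the in-memory buffer to disk when it is this full (default 0.66)...
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);

        // ...or when this many map outputs have accumulated in memory (default 1000)
        conf.setInt("mapreduce.reduce.merge.inmem.threshold", 1000);
    }
}
```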