TEZ-4580: Slow preemption of new containers when re-use is enabled #374
+118
−3
When container reuse is enabled, preemption of lower-priority containers that are not yet assigned to a task takes a long time, because they are released one at a time rather than in the number implied by a high tez.am.preemption.percentage (added in https://issues.apache.org/jira/browse/TEZ-1742).
Further investigation led to the following conclusions:
1. A warning log / assertion error is thrown because in preemptIfNeeded(), when releasing new containers, the loop counter is decremented with each `releaseUnassignedContainers` call, so the loop runs only half the intended number of times. With a separate counter, the assertion passes because the method returns at the check `if (numPendingRequestsToService < 1) {`.
2. In releaseContainer(), the container is removed only from the `heldContainers` map and not from the `delayedContainers` queue. As a result, the same container is picked for release in every iteration until the next cycle of `DelayedContainerManager` finds that the container is not in `heldContainers` and skips it with the log `Skipping delayed container as container is no longer running, containerId=...`
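The loop-counter problem in the first point can be illustrated with a minimal, self-contained sketch. This is not the Tez code: the class `PreemptionLoopSketch`, both method names, and the `available` parameter are hypothetical, chosen only to show how decrementing the loop bound inside the loop halves the number of iterations.

```java
final class PreemptionLoopSketch {

    // Hypothetical buggy variant: the loop bound doubles as the
    // "still to release" counter, so it shrinks while i grows.
    static int releaseWithSharedCounter(int toRelease, int available) {
        int released = 0;
        int remaining = toRelease;
        for (int i = 0; i < remaining && released < available; i++) {
            released++;   // stands in for releasing one unassigned container
            remaining--;  // shrinking the bound cuts the iteration count in half
        }
        return released;
    }

    // Hypothetical fixed variant: the loop bound stays constant and a
    // separate counter tracks how many releases are still outstanding.
    static int releaseWithSeparateCounter(int toRelease, int available) {
        int released = 0;
        int remaining = toRelease;
        for (int i = 0; i < toRelease && released < available; i++) {
            released++;
            remaining--;  // bookkeeping only; does not affect the loop bound
        }
        return released;
    }

    public static void main(String[] args) {
        System.out.println(releaseWithSharedCounter(10, 100));   // prints 5
        System.out.println(releaseWithSeparateCounter(10, 100)); // prints 10
    }
}
```

With ten containers requested and plenty available, the shared-counter variant stops after five releases, which matches the "half the number of times" behavior described above.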
This change adds a method to `DelayedContainerManager` that allows removal of a delayed container, and invokes it in the releaseContainer method, which so far only removed the container from the `heldContainers` map.
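The effect of the fix can be sketched with two plain collections. This is an illustration under assumptions, not the actual scheduler code: `ReleaseSketch`, `releaseHeldOnly`, `releaseEverywhere`, and `nextCandidate` are hypothetical names, and a simple `ArrayDeque` stands in for the real delayed-container queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class ReleaseSketch {
    // Hypothetical stand-ins for the heldContainers map and delayedContainers queue.
    final Set<String> held = new HashSet<>();
    final Deque<String> delayed = new ArrayDeque<>();

    void add(String containerId) {
        held.add(containerId);
        delayed.add(containerId);
    }

    // Behavior before the change: only the held set is updated,
    // so the delayed queue still offers the released container.
    void releaseHeldOnly(String containerId) {
        held.remove(containerId);
    }

    // Behavior after the change: the container leaves both structures.
    void releaseEverywhere(String containerId) {
        held.remove(containerId);
        delayed.remove(containerId);
    }

    // The next container a preemption pass would pick for release.
    String nextCandidate() {
        return delayed.peek();
    }

    public static void main(String[] args) {
        ReleaseSketch before = new ReleaseSketch();
        before.add("c1");
        before.add("c2");
        before.releaseHeldOnly(before.nextCandidate());
        System.out.println(before.nextCandidate()); // prints c1 (stale entry picked again)

        ReleaseSketch after = new ReleaseSketch();
        after.add("c1");
        after.add("c2");
        after.releaseEverywhere(after.nextCandidate());
        System.out.println(after.nextCandidate()); // prints c2
    }
}
```

In the first case every preemption pass keeps selecting the already released container until something else cleans up the queue; in the second, each pass moves on to a different container.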