-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEZ-4580: Slow preemption of new containers when re-use is enabled #374
base: master
Are you sure you want to change the base?
Conversation
🎊 +1 overall
This message was automatically generated. |
if (delayedContainers.remove(container)) { | ||
LOG.info("Removed {} from delayed containers", container.getContainer().getId()); | ||
} else { | ||
LOG.warn("Unknown container {} sent for removal. Ignoring.", container.getContainer().getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A valid case here is when a new container is allocated - added to delayedContainers, it is polled (removed) from queue but if there is no pending task request releaseContainer()
is invoked. Please suggest if the log level should be changed in info
or for both the newly added logs should be at debug
.
…nmerged) apache#374 (cherry picked from commit aea86ca)
…enabled (unmerged) apache#374 (#24) (cherry picked from commit aea86ca)
…enabled (unmerged) apache#374 (#24) (cherry picked from commit aea86ca)
When container reuse is enabled, preemption of lower priority containers that are not yet assigned to task, takes long time as they are released one at a time, and not the number of containers based when tez.am.preemption.percentage is high added in https://issues.apache.org/jira/browse/TEZ-1742.
Further investigation lead to following conclusion:
Warn log / Assertion error thrown because in preemptIfNeeded(), when releasing new containers, the loop counter is being decremented with each
releaseUnassignedContainers
, leading to looping only half number of times. By using another counter, assertion passes because of condition method returns with checkif (numPendingRequestsToService < 1) {
.In releaseContainer(), the container is not getting removed from
delayedContainers
queue and only fromheldContainers
map, hence same container is being picked up for release in every iteration till next cycle ofDelayedContainerManager
finds out that the container is not inheldContainers
and skips it with logSkipping delayed container as container is no longer running, containerId=...
This change adds a method in
DelayedContainerManager
to allow removal of delayed container and invokes it in releaseContainer method, which so far only removed it fromheldContainers
map.