-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-2189] Implement ContainerCompletion callback in DynamicScalingYarnService #4092
base: master
Are you sure you want to change the base?
[GOBBLIN-2189] Implement ContainerCompletion callback in DynamicScalingYarnService #4092
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4092 +/- ##
=============================================
- Coverage 49.03% 38.75% -10.29%
+ Complexity 10008 1599 -8409
=============================================
Files 1895 388 -1507
Lines 73612 16016 -57596
Branches 8188 1588 -6600
=============================================
- Hits 36097 6207 -29890
+ Misses 34280 9311 -24969
+ Partials 3235 498 -2737 ☔ View full report in Codecov by Sentry. |
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/YarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Outdated
Show resolved
Hide resolved
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java
Show resolved
Hide resolved
* </p> | ||
*/ | ||
@Override | ||
protected void handleContainerCompletion(ContainerStatus containerStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is not synchronized
, but it removes entries from containerMap and modifies removedContainerIds, whereas, reviseWorkforcePlanAndRequestNewContainers
is synchronized and also modifies both containerMap and removedContainerIds. If request for handleContainerCompletion
interleaves with a call to reviseWorkforcePlanAndRequestNewContainers
, race conditions/inconsistent state can happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in handleContainerCompletion
the containerId which is removed will not be the one when removing the same inside reviseWorkforcePlanAndRequestNewContainers
and for removedContainerIds
at one place we are adding and at other removing, even if interleaving call happens then both are working with different containerIds not the same so chance of inconsistent state are too low given that both are thread-safe data structures as well
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/YarnService.java
Show resolved
Hide resolved
* </p> | ||
*/ | ||
@Override | ||
protected void handleContainerCompletion(ContainerStatus containerStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for these methods
TestDynamicScalingYarnServiceManager testDynamicScalingYarnServiceManager = new TestDynamicScalingYarnServiceManager( | ||
mockGobblinTemporalApplicationMaster, new DummyScalingDirectiveSource()); | ||
testDynamicScalingYarnServiceManager.startUp(); | ||
Thread.sleep(5000); // 5 seconds sleep so that GetScalingDirectivesRunnable.run() is called for 5 times | ||
Thread.sleep(7000); // 5 seconds sleep so that GetScalingDirectivesRunnable.run() is called for 7 times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to update this to 7 seconds?
typo in comments 5 seconds sleep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetScalingDirectivesRunnable.run()
currently runs every second so for each second run it gets scalingDirectives if returned from ScalingDirectiveSource Impl which in our case DummyScalingDirectiveSource
returns scaling directives for 5 times so 7 second to check if it is not returning more than 5 times even if runnable keeps running
testDynamicScalingYarnServiceManager.shutDown(); | ||
Mockito.verify(mockDynamicScalingYarnService, Mockito.times(3)).reviseWorkforcePlanAndRequestNewContainers(Mockito.anyList()); | ||
Mockito.verify(mockDynamicScalingYarnService, Mockito.times(5)).reviseWorkforcePlanAndRequestNewContainers(Mockito.anyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly are we testing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of scaling directives returned by ScalingDirectiveSource
impl reviseWorkforcePlanAndRequestNewContainers
is called that many times
here DummyScalingDirectiveSource
returns scaling directives for 5 times so 5 invocations must happens to reviseWorkforcePlanAndRequestNewContainers
...ral/src/test/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnServiceManagerTest.java
Outdated
Show resolved
Hide resolved
if (currContainerMemoryMbs < MAX_REPLACEMENT_CONTAINER_MEMORY_MBS && newContainerMemoryMbs > MAX_REPLACEMENT_CONTAINER_MEMORY_MBS) { | ||
newContainerMemoryMbs = MAX_REPLACEMENT_CONTAINER_MEMORY_MBS; | ||
} else if (newContainerMemoryMbs > MAX_REPLACEMENT_CONTAINER_MEMORY_MBS) { | ||
log.warn("Expected replacement container memory exceeds the maximum allowed memory {}. Not requesting a replacement container.", | ||
MAX_REPLACEMENT_CONTAINER_MEMORY_MBS); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be simplified to:
int newContainerMemoryMbs = Math.min(currContainerMemoryMbs * 2, MAX_REPLACEMENT_CONTAINER_MEMORY_MBS);
if (newContainerMemoryMbs > MAX_REPLACEMENT_CONTAINER_MEMORY_MBS) {
log.warn("Replacement container memory exceeds max limit {}, not requesting a replacement container.", MAX_REPLACEMENT_CONTAINER_MEMORY_MBS);
return;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work if currContainerMemoryMbs
is already at MAX_REPLACEMENT_CONTAINER_MEMORY_MBS
it will keep requesting containers of MAX_REPLACEMENT_CONTAINER_MEMORY_MBS
which will not help in anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, a typo - instead of new container, the check should be on current container. Can be updated to:
if (currContainerMemoryMbs >= MAX_REPLACEMENT_CONTAINER_MEMORY_MBS) {
log.warn("Container {} already had max allowed memory {} MBs. Not requesting a replacement container.", completedContainerId, currContainerMemoryMbs);
return;
}
int newContainerMemoryMbs = Math.min(currContainerMemoryMbs * 2, MAX_REPLACEMENT_CONTAINER_MEMORY_MBS);
|
||
public DynamicScalingYarnService(Config config, String applicationName, String applicationId, | ||
YarnConfiguration yarnConfiguration, FileSystem fs, EventBus eventBus) throws Exception { | ||
super(config, applicationName, applicationId, yarnConfiguration, fs, eventBus); | ||
|
||
this.actualWorkforceStaffing = WorkforceStaffing.initialize(0); | ||
this.workforcePlan = new WorkforcePlan(this.config, this.config.getInt(GobblinYarnConfigurationKeys.INITIAL_CONTAINERS_KEY)); | ||
this.removedContainerIds = ConcurrentHashMap.newKeySet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update to use ConcurrentLinkedQueue, as ConcurrentHashMap.newKeySet()
works better for membership checks but not for ordered or frequent removals.
private final Queue<ContainerId> removedContainerIds;
removedContainerIds = new ConcurrentLinkedQueue<>();
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Implemented handleContainerCompletion(...) which launches replacement container based on exit status
Handled Scaling down of containers based on negative delta
Refactored YarnService to remove use of helix related tags and names
Tests
Updated one existing test
Manually triggered DummyDynamicScalingYarnServiceManager to check if scale down is happening or not, the stripped logs from that are
Commits