forked from flux-framework/flux-sched
WIP Rebase Milroy's branch to Flux-sched, work on REAPI update functionality. #2
Draft: tpatki wants to merge 18 commits into milroy:rq-api-client from tpatki:patki-rq2-api-client
Problem: t1006-qmanager-multiqueue.t exploits the fact that jobs may be submitted with incorrect queue specifications when the frobnicator is disabled, but this method won't work once the job manager becomes queue-aware. Submit test jobs with qmanager unloaded, then reconfigure queues and reload qmanager to trigger the job exceptions.
testsuite: fix coverage method for queue exception
Problem: the current system defaults specify a short SYSTEM_MAX_DURATION of 7 days. Propagating this value to the resource graph will result in unexpected behavior and unschedulable jobs as the current time approaches 7 days after startup. In addition, the `uint64_t` types differ from the `int64_t` used by `std::chrono`. Update SYSTEM_MAX_DURATION to a reasonably large time in the future and change the types to comply with `std::chrono`.
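A minimal sketch of the type change, assuming a 100-year default (the commit does not state the new value) and hypothetical names in a `sketch` namespace rather than the real contents of `system_defaults.hpp`. It keeps the constant signed to match `std::chrono::seconds::rep` and shows that adding it to the current time stays within the range of `std::chrono::system_clock` on common 64-bit implementations.

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: these names and the 100-year value are
// illustrative, not the contents of flux-sched's system_defaults.hpp.
namespace sketch {

// std::chrono durations use a signed count, so the default duration is
// declared as int64_t rather than uint64_t to avoid mixed-sign arithmetic.
static_assert (std::is_signed_v<std::chrono::seconds::rep>,
               "std::chrono::seconds must use a signed representation");

// Roughly 100 years, in seconds (assumed value).
constexpr int64_t SYSTEM_MAX_DURATION = int64_t{100} * 365 * 24 * 60 * 60;

// Default graph end time: start plus the maximum duration. With a
// 64-bit nanosecond system_clock (range of about +/-292 years around the
// epoch), now() plus ~100 years is still representable.
inline std::chrono::system_clock::time_point
default_graph_end (std::chrono::system_clock::time_point start)
{
    return start + std::chrono::seconds (SYSTEM_MAX_DURATION);
}

}  // namespace sketch
```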
Problem: updated RFC 14 specifies that if the jobspec does not indicate a duration, then the acquired resource expiration should be propagated to the job time limit. Currently there is no way to track the current resource expiration in the resource graph. Add a struct based on `std::chrono::time_point`s for the graph start and end times, with default values for both. Default values are defined in `system_defaults.hpp`. Add a setter to set the times at resource acquisition.
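A minimal sketch of such a struct, assuming `std::chrono::system_clock` time points and a 100-year default window. `graph_duration_t`, `set`, and `DEFAULT_GRAPH_DURATION` are hypothetical names, not the identifiers used in flux-sched.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch: the struct, member, and constant names are
// illustrative and are not taken from the flux-sched source.
namespace sketch {

using sys_clock = std::chrono::system_clock;

// Stand-in for a default defined in system_defaults.hpp (assumed value).
constexpr std::chrono::seconds DEFAULT_GRAPH_DURATION{int64_t{100} * 365 * 24 * 3600};

// Start and end of the time window covered by the resource graph.
struct graph_duration_t {
    // Defaults: the window opens at construction time and spans the
    // default duration. Members are initialized in declaration order,
    // so graph_end can depend on graph_start.
    sys_clock::time_point graph_start = sys_clock::now ();
    sys_clock::time_point graph_end = graph_start + DEFAULT_GRAPH_DURATION;

    // Called at resource acquisition. Rejects an empty or inverted
    // window and leaves the current values unchanged in that case.
    bool set (sys_clock::time_point start, sys_clock::time_point end)
    {
        if (end <= start)
            return false;
        graph_start = start;
        graph_end = end;
        return true;
    }
};

}  // namespace sketch
```

Returning `bool` from the setter is a design choice for the sketch; the real code may report errors differently.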
Problem: updated RFC 14 requires that, upon resource acquisition, schedulers propagate the expiration set in `R` into the `Rv1` fragments allocated to jobs when the jobspec duration is not set. Add unpacking of the expiration during resource acquisition. Check for invalid and inexpressible start and end times, then convert valid times to `std::chrono::time_point`s. Set the resource graph duration after successful resource acquisition.
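The validation and conversion step might look roughly like the following. It assumes the start and expiration times arrive as floating-point seconds since the epoch, as in the `execution` section of `R`; `to_time_point`, `window_from_r`, and `graph_window` are hypothetical. The commit message does not say how an unset (zero) expiration is handled, so this sketch simply rejects any window whose end is not after its start.

```cpp
#include <chrono>
#include <cmath>
#include <optional>

// Hypothetical sketch: function and type names are illustrative. Inputs
// model start/expiration times as floating-point seconds since the epoch.
namespace sketch {

using sys_clock = std::chrono::system_clock;

// Convert seconds since the epoch to a time_point, rejecting values that
// are invalid (NaN, infinite, negative) or too large for the clock's
// duration type to express.
inline std::optional<sys_clock::time_point> to_time_point (double secs)
{
    using namespace std::chrono;
    if (!std::isfinite (secs) || secs < 0.0)
        return std::nullopt;
    // Whole seconds representable by sys_clock::duration. Comparing
    // against this truncated bound leaves enough headroom that the
    // floating-point conversion below cannot overflow.
    const auto max_secs = duration_cast<seconds> (sys_clock::duration::max ()).count ();
    if (secs >= static_cast<double> (max_secs))
        return std::nullopt;
    return sys_clock::time_point (
        duration_cast<sys_clock::duration> (duration<double> (secs)));
}

struct graph_window {
    sys_clock::time_point start;
    sys_clock::time_point end;
};

// Validate both times and require end > start.
inline std::optional<graph_window> window_from_r (double starttime,
                                                  double expiration)
{
    const auto start = to_time_point (starttime);
    const auto end = to_time_point (expiration);
    if (!start || !end || *end <= *start)
        return std::nullopt;
    return graph_window{*start, *end};
}

}  // namespace sketch
```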
Problem: updated RFC 14 specifies that if a jobspec duration is not set, the resource graph expiration should be propagated to the job's time limit. Add validity checks to ensure that if the jobspec duration is longer than the graph duration, the job is not scheduled. Add a check to ensure that the duration (`uint64_t`) is less than the maximum expressible `int64_t` value. Check whether the scheduled `at` time is negative (invalid) or greater than or equal to the resource graph end time (invalid). However, if the job start time plus the duration is greater than the graph end time, shorten the duration to fit within the remaining time. This is a valid schedule, since the scheduling traversal has already returned successfully for the full duration.
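The checks above can be sketched as a single function over whole seconds. This is an illustration under stated assumptions, not the actual flux-sched code: `effective_duration` is hypothetical, times are `int64_t` seconds since the epoch, a requested duration of 0 means the jobspec left it unset, and an unset duration is assumed to start from the full graph duration before being shortened.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical sketch of the duration checks; not flux-sched's code.
namespace sketch {

// Returns the duration (seconds) to record for a job scheduled at `at`,
// or std::nullopt if the request is invalid. requested == 0 means the
// jobspec did not set a duration.
inline std::optional<int64_t> effective_duration (uint64_t requested,
                                                  int64_t at,
                                                  int64_t graph_start,
                                                  int64_t graph_end)
{
    if (graph_end <= graph_start)
        return std::nullopt;  // empty or inverted graph window
    const int64_t graph_duration = graph_end - graph_start;

    // The duration must be less than the maximum int64_t value.
    if (requested >= static_cast<uint64_t> (std::numeric_limits<int64_t>::max ()))
        return std::nullopt;
    int64_t duration = static_cast<int64_t> (requested);

    // Unset duration inherits the graph duration (assumption, see lead-in).
    if (duration == 0)
        duration = graph_duration;

    // A job longer than the graph's lifetime is never schedulable.
    if (duration > graph_duration)
        return std::nullopt;

    // A negative start, or one at or after the graph end, is invalid.
    if (at < 0 || at >= graph_end)
        return std::nullopt;

    // Shorten the duration to fit the time remaining in the graph.
    if (duration > graph_end - at)
        duration = graph_end - at;
    return duration;
}

}  // namespace sketch
```

Comparing `duration > graph_end - at` rather than `at + duration > graph_end` avoids signed overflow: since `0 <= at < graph_end`, the subtraction always fits in `int64_t`.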
Problem: there are no tests checking whether jobs with duration=0 have their duration set according to RFC 14. Add checks for duration inheritance when duration=0 to the test suite.
Problem: there are no checks to ensure that jobspecs with invalid durations are rejected. Add checks for negative durations and overly long durations.
Problem: there are no checks for jobs that request a longer duration than the lifetime of the resource graph. Add checks.
Resource graph duration and job expiration to conform to RFC 14
Problem: fluxion queues are currently configured in the qmanager in the sched-fluxion-qmanager table, but a framework-wide TOML configuration was proposed in RFC 33. Implement RFC 33 queues in the qmanager by transforming the RFC 33 configuration into the old sched-fluxion-qmanager syntax. Raise an error if queues are configured in the sched-fluxion table instead of in the RFC 33-compliant way. Fixes flux-framework#950.
…-config qmanager: support RFC33 TOML queue config
Problem: there are no release notes for flux-sched v0.25.0. Add notes for this release.
NEWS: add release notes for 0.25.0