WIP Rebase Milroy's branch to Flux-sched, work on REAPI update functionality. #2

Draft
wants to merge 18 commits into
base: rq-api-client
from

Conversation


@tpatki commented Oct 6, 2022

Testing

garlick and others added 18 commits September 30, 2022 09:35
Problem: t1006-qmanager-multiqueue.t exploits the fact that jobs
may be submitted with incorrect queue specifications when the
frobnicator is disabled, but this method won't work once the job
manager becomes queue-aware.

Submit test jobs with qmanager unloaded, then reconfigure queues
and reload qmanager to trigger the job exceptions.
testsuite: fix coverage method for queue exception
Problem: the current system defaults specify a
short SYSTEM_MAX_DURATION of 7 days. Propagating
this value to the resource graph will cause
unexpected behavior and unschedulable jobs as
the current time approaches 7 days after
startup. In addition, the `uint64_t` types used
for these defaults differ from the `int64_t`
representation used by `std::chrono`.

Update SYSTEM_MAX_DURATION to a reasonably
large value and change the types to match
`std::chrono`.
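The type mismatch can be illustrated with a minimal C++ sketch. It assumes, for illustration only, a SYSTEM_MAX_DURATION of roughly 100 years in seconds (the real value lives in `system_defaults.hpp` and is not shown here); `fits_in_chrono_seconds` is a hypothetical helper, not a flux-sched function.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical value for illustration only: about 100 years in seconds.
// The real SYSTEM_MAX_DURATION is defined in system_defaults.hpp.
constexpr int64_t SYSTEM_MAX_DURATION = int64_t{100} * 365 * 24 * 60 * 60;

// std::chrono::seconds stores its count in a signed integer type
// (int64_t on common platforms), so a uint64_t duration above that
// type's maximum cannot be converted without overflowing.
constexpr bool fits_in_chrono_seconds (uint64_t d)
{
    using rep = std::chrono::seconds::rep;
    // Compare in the unsigned domain so an oversized value is never
    // first converted into a negative signed count.
    return d <= static_cast<uint64_t> (std::numeric_limits<rep>::max ());
}

// A default kept as a signed value converts directly to std::chrono.
constexpr std::chrono::seconds system_max_duration{SYSTEM_MAX_DURATION};
```

Keeping the default as a signed 64-bit value means it can be handed to `std::chrono` as-is, while any `uint64_t` input still needs an explicit range check like the one above.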
Problem: updated RFC 14 specifies that if the jobspec does
not indicate a duration then the acquired resource expiration
should be propagated to the job time limit. Currently there
is no way to track the current resource expiration in the
resource graph.

Add a struct based on `std::chrono::time_point`s for the
graph start and end times with default values for both.
Default values are defined in `system_defaults.hpp`.
Add a setter to set the times at resource acquisition.
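A minimal sketch of such a struct, assuming `std::chrono::system_clock` time points. The names `graph_duration_t`, `set`, `lifetime`, and `DEFAULT_GRAPH_LIFETIME_SEC` are hypothetical, and the default value is a placeholder rather than flux-sched's actual default.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical default lifetime for illustration: about 100 years.
// The real defaults are defined in system_defaults.hpp.
constexpr int64_t DEFAULT_GRAPH_LIFETIME_SEC = int64_t{100} * 365 * 24 * 3600;

// Sketch of a struct tracking the resource graph's start and end times.
struct graph_duration_t {
    using time_point = std::chrono::system_clock::time_point;

    // Defaults: start at the clock's epoch, end DEFAULT_GRAPH_LIFETIME_SEC later.
    time_point graph_start{};
    time_point graph_end = time_point{}
        + std::chrono::seconds{DEFAULT_GRAPH_LIFETIME_SEC};

    // Setter used at resource acquisition. Rejects an empty or inverted
    // window and leaves the current values unchanged in that case.
    bool set (time_point start, time_point end)
    {
        if (end <= start)
            return false;
        graph_start = start;
        graph_end = end;
        return true;
    }

    // Total lifetime of the graph in whole seconds.
    std::chrono::seconds lifetime () const
    {
        return std::chrono::duration_cast<std::chrono::seconds> (
            graph_end - graph_start);
    }
};
```

Returning `bool` from the setter is a design choice of this sketch; the caller can then fail resource acquisition instead of storing an invalid window.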
Problem: updated RFC 14 requires that upon resource acquisition
schedulers propagate the expiration set in `R` into `Rv1`
fragments allocated to jobs when jobspec duration is not set.

Add unpacking of expiration during resource acquisition.
Check for invalid and inexpressible values of the start and
end times, then convert valid times to `std::chrono::time_point`s.
Set the resource graph duration after successful resource
acquisition.
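The conversion step can be sketched as below. It assumes that `R` stores times as floating-point seconds since the epoch (as in RFC 20) and that `system_clock` counts from the Unix epoch. `to_time_point` is a hypothetical helper, and truncating the fractional second is a simplification of this sketch.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <optional>

// Convert a floating-point count of seconds since the epoch into a
// system_clock time_point. Returns std::nullopt for NaN/inf, negative
// values, or values too large for system_clock's duration type.
std::optional<std::chrono::system_clock::time_point>
to_time_point (double epoch_sec)
{
    using std::chrono::system_clock;
    if (!std::isfinite (epoch_sec) || epoch_sec < 0.0)
        return std::nullopt;
    // Largest whole-second count that system_clock can represent.
    const auto max_sec = std::chrono::duration_cast<std::chrono::seconds> (
        system_clock::duration::max ()).count ();
    if (epoch_sec >= static_cast<double> (max_sec))
        return std::nullopt;
    // Fractional seconds are dropped in this sketch.
    const std::chrono::seconds sec{static_cast<int64_t> (epoch_sec)};
    return system_clock::time_point{
        std::chrono::duration_cast<system_clock::duration> (sec)};
}
```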
Problem: updated RFC 14 specifies that if a jobspec duration is not set,
the resource graph expiration should be propagated to the job's
time limit.

Add validity checks so that a job whose jobspec duration is longer
than the graph duration is not scheduled. Add a check to
ensure that the duration (uint64_t) is less than the maximum
expressible int64_t value. Reject the schedule if the `at` time is
negative or greater than or equal to the resource graph end time.
However, if the job start time plus the duration is greater than
the graph end time, shorten the duration to fit within the
remaining time. This is still a valid schedule, since the
scheduling traversal has already succeeded for the full duration.
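The checks above can be sketched as a single function. This is an illustration under stated assumptions, not the actual flux-sched code: times are whole seconds since the epoch with 0 <= graph_start < graph_end, a jobspec duration of 0 means "not set", and `job_duration` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch. Assumes 0 <= graph_start < graph_end, all in
// whole seconds since the epoch. Returns the duration to record for a
// job scheduled at `at`, or std::nullopt if the schedule is rejected.
std::optional<int64_t> job_duration (int64_t at, int64_t duration,
                                     int64_t graph_start, int64_t graph_end)
{
    // A job longer than the whole graph lifetime can never fit.
    if (duration < 0 || duration > graph_end - graph_start)
        return std::nullopt;
    // The start time must be non-negative and strictly before the end.
    if (at < 0 || at >= graph_end)
        return std::nullopt;
    // at < graph_end, so this is positive and cannot overflow.
    const int64_t remaining = graph_end - at;
    // Duration not set: the job inherits the graph expiration.
    if (duration == 0)
        return remaining;
    // Shorten the duration so the job ends no later than the graph end.
    return duration > remaining ? remaining : duration;
}
```

Comparing `duration > remaining` instead of computing `at + duration` avoids signed overflow when the requested duration is very large.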
Problem: there are no tests checking whether jobs with
duration=0 have their duration set according to RFC 14.

Add checks for duration inheritance when duration=0
to the test suite.
Problem: there are no checks to ensure that jobspecs
with invalid durations are rejected.

Add checks for negative durations and overly long
durations.
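A sketch of the kind of validation these tests exercise, assuming the jobspec duration arrives as a floating-point number of seconds. `validate_duration` is a hypothetical helper, and the int64_t upper bound mirrors the expressibility check described in the commit above.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: validate a jobspec duration given in seconds as
// a floating-point number. Returns std::nullopt for NaN/inf, negative
// durations, and durations too long to express as an int64_t.
std::optional<uint64_t> validate_duration (double seconds)
{
    if (!std::isfinite (seconds) || seconds < 0.0)
        return std::nullopt;
    // static_cast<double> (INT64_MAX) rounds up to exactly 2^63, so any
    // value at or above it is not representable as int64_t.
    if (seconds >= static_cast<double> (std::numeric_limits<int64_t>::max ()))
        return std::nullopt;
    return static_cast<uint64_t> (seconds);  // fractional part truncated
}
```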
Problem: there are no checks for jobs that request a
longer duration than the lifetime of the resource graph.

Add tests for this case.
Resource graph duration and job expiration to conform to RFC 14
Problem: fluxion queues are currently configured in the qmanager
in the sched-fluxion-qmanager table, but a framework-wide
TOML configuration was proposed in RFC 33.

Implement RFC 33 queues in the qmanager by transforming the
RFC 33 configuration into the old sched-fluxion-qmanager syntax.

Raise an error if queues are configured in the sched-fluxion
table instead of the RFC 33-compliant way.

Fixes flux-framework#950.
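One piece of this transformation, flattening RFC 33 queue names into a single value, can be sketched as follows. Both the input (queue names assumed to be already extracted from the RFC 33 `[queues.NAME]` tables) and the output (a space-delimited queue list, assumed to be the legacy sched-fluxion-qmanager format) are assumptions for illustration; `to_legacy_queue_list` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: `names` holds queue names taken from the RFC 33
// queue tables. Join them, in order, into one space-delimited string,
// assumed here to match the legacy sched-fluxion-qmanager syntax.
std::string to_legacy_queue_list (const std::vector<std::string> &names)
{
    std::string out;
    for (const auto &name : names) {
        if (!out.empty ())
            out += ' ';
        out += name;
    }
    return out;
}
```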
…-config

qmanager: support RFC33 TOML queue config
Problem: there are no release notes for flux-sched v0.25.0.

Add notes for this release.
NEWS: add release notes for 0.25.0
@tpatki tpatki marked this pull request as draft October 6, 2022 18:58
@tpatki tpatki changed the title Rebase Milroy's branch to Flux-sched's tip of dev. WIP Rebase Milroy's branch to Flux-sched, work on REAPI update functionality. Oct 6, 2022
5 participants