Support and handle optional "final" flag in .free RPC #1266

Merged: 5 commits, Aug 13, 2024

Commits on Aug 13, 2024

  1. qmanager callbacks: add support for final free flag

    Problem: recently, a large job on a large system was considered
    allocated by Fluxion, but was complete and released in flux-core
    (flux-framework/flux-core#6179). The proposed
    solution was to amend RFC 27 to include an optional "final" boolean
    flag in the `.free` RPC. That flag can be used by Fluxion to determine
    if there is an allocation state discrepancy between flux-core and
    sched.
    
    Add support to unpack the "final" boolean and send it to the qmanager
    policy for handling.
    milroy committed Aug 13, 2024
    5c7817b
  2. qmanager policies: handle final flag from .free RPC

    Problem: if the "final" flag from the `.free` RPC disagrees with the
    `full_removal` flag returned by partial cancel, there is a discrepancy
    between the flux-core and Fluxion allocation state.
    
    If such a discrepancy is detected, log errors and run a full cancel to
    bring the Fluxion allocation state back in line with flux-core.
    milroy committed Aug 13, 2024
    e7e3e38
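The discrepancy handling described in commit 2 might look something like the sketch below. The names `reconcile_free` and `free_outcome` are hypothetical, and the sketch makes two assumptions that the commit message does not spell out: a full cancel is run only when "final" is set but the partial cancel left resources allocated, and the full cancel is assumed to succeed. The opposite disagreement is only logged here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of how one .free RPC was handled.
struct free_outcome {
    bool run_full_cancel;             // a full cancel was requested as cleanup
    bool job_released;                // no resources remain allocated to the job
    std::vector<std::string> errors;  // messages that would be logged
};

// final_flag:   the "final" value from the .free RPC (false if absent).
// full_removal: whether the partial cancel removed the whole allocation.
free_outcome reconcile_free (bool final_flag, bool full_removal)
{
    free_outcome out{false, full_removal, {}};

    if (final_flag && !full_removal) {
        // The sender says the job is done, but resources are still allocated.
        out.errors.push_back (
            "final .free left resources allocated; running full cancel");
        out.run_full_cancel = true;
        out.job_released = true;  // assumption: the full cancel succeeds
    } else if (!final_flag && full_removal) {
        // Assumption: this direction is logged but needs no cleanup.
        out.errors.push_back ("allocation fully removed before final .free");
    }

    // Errors are recorded exactly when the two flags disagree.
    assert (out.errors.empty () == (final_flag == full_removal));
    return out;
}
```

In this model the common cases (`final` with a full removal, or a non-final partial release) produce no errors and no extra cancel.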
  3. resource module: add logging for cancellation error

    Problem: while running tests for this PR, a full cancellation failed but
    did not output the traverser error.
    
    Add logging to output traverser errors during cancellation.
    milroy committed Aug 13, 2024
    60ac721
  4. traverser: don't return error for missing x_span in full cancellation

    Problem: if a partial cancellation does not fully remove an allocation
    but the "final" flag is set by the `.free` RPC, a full cancellation is
    now run. However, the current traverser check treats a missing exclusive
    span (x_span) as invalid during a full cancellation and returns an
    error.
    
    Update the check to only return an error for this condition if the
    cancellation is of type VTX_CANCEL.
    milroy committed Aug 13, 2024
    3f29235
  5. testsuite: add flux-core issue test for housekeeping

    Problem: there is no test for flux-core issue
    flux-framework/flux-core#6179.
    
    Add tests in the Fluxion issues directory.
    milroy committed Aug 13, 2024
    507d368