traverser: prevent allocation of currently allocated resources
Problem: issue #1043 identified a scenario where
Fluxion grants a new allocation to a job while the
resources are still occupied by a previous allocation.
The double booking occurs because Fluxion assumes
that a job will not run beyond its walltime. However,
as the issue describes, an epilog script may cause a
job to run beyond its walltime. Since Fluxion doesn't
receive a `free` message until the epilog completes,
the allocation remains in the resource graph, but its
scheduled completion time has passed, which allows the
resources to be allocated to another job. Other common
scenarios, such as a job getting stuck in CLEANUP, can
also lead to multiple concurrent allocations.

During graph traversal pruning, when the traversal is an
allocation, check each exclusive resource vertex for an
existing allocation. This prevents another job from
receiving the resources while still allowing reservations
and satisfiability checks to complete.
milroy committed Jul 10, 2023
1 parent 506c0c4 commit 0efd133
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions resource/traversers/dfu_impl.cpp
@@ -124,6 +124,9 @@ int dfu_impl_t::by_excl (const jobmeta_t &meta, const std::string &s, vtx_t u,
     // requested, we check the validity of the visiting vertex using
     // its x_checker planner.
     if (exclusive_in || resource.exclusive == Jobspec::tristate_t::TRUE) {
+        if (meta.alloc_type == jobmeta_t::alloc_type_t::AT_ALLOC &&
+            !(*m_graph)[u].schedule.allocations.empty ())
+            goto done;
         errno = 0;
         p = (*m_graph)[u].idata.x_checker;
         njobs = planner_avail_resources_during (p, at, duration);
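
The effect of the added check can be illustrated with a small, self-contained sketch. It is a toy model, not Fluxion code: `toy_vertex_t`, `planner_says_free`, `exclusive_ok`, `planned_free_at`, and the `AT_OTHER` enumerator are hypothetical names invented for illustration. Only `AT_ALLOC` and the idea of a per-vertex `allocations` container come from the diff. The sketch assumes the planner treats a vertex as free once the planned end of its current allocation has passed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the traversal type; AT_OTHER represents
// reservation and satisfiability traversals in this toy model.
enum class alloc_type_t { AT_ALLOC, AT_OTHER };

// Hypothetical, simplified resource vertex.
struct toy_vertex_t {
    // Job ID -> allocated count; non-empty while a job still holds the
    // vertex, that is, until a `free` message would have removed the entry.
    std::map<int64_t, int64_t> allocations;
    // Time at which the toy planner believes the vertex becomes free.
    int64_t planned_free_at = 0;
};

// Toy planner query: the vertex looks free if the request starts at or
// after the planned end of the current allocation.
bool planner_says_free (const toy_vertex_t &v, int64_t at, int64_t duration)
{
    assert (duration > 0);
    return at >= v.planned_free_at;
}

// Returns true if the vertex may be used exclusively by the request.
bool exclusive_ok (const toy_vertex_t &v,
                   alloc_type_t type,
                   int64_t at,
                   int64_t duration)
{
    // Analogue of the added check: an allocation traversal must not reuse
    // a vertex that still carries an allocation, whatever the planner says.
    if (type == alloc_type_t::AT_ALLOC && !v.allocations.empty ())
        return false;
    return planner_says_free (v, at, duration);
}
```

With a vertex whose job has overrun its walltime (the `allocations` map is non-empty but `planned_free_at` is in the past), `planner_says_free` returns true while `exclusive_ok` returns false for `AT_ALLOC`. For `AT_OTHER`, the planner result is used unchanged, which mirrors the commit's intent that reservations and satisfiability checks still complete.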