qmanager: fix lifetime issue in partial release
Problem: the shared_ptr to the job_t is released when the job is erased
from m_jobs, before the cancel call, causing a potential use-after-free
if the job isn't held by another queue at the time.

Solution: copy the shared_ptr, taking an extra owning reference, before
erasing the entry from the m_jobs map, and use that copy in the cancel
call.
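
The pattern described above can be sketched in isolation. This is a minimal illustration, not the qmanager code: the `job` struct, the `job_map` alias, and `erase_then_read_id` are hypothetical names, and the real `job_t` and `cancel` call are not reproduced here.

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for job_t; only the field used in this sketch.
struct job {
    std::int64_t id;
};

// Hypothetical stand-in for the m_jobs map.
using job_map = std::map<std::int64_t, std::shared_ptr<job>>;

// Erase a job from the map, then read its id through a copy of the
// shared_ptr taken before the erase. Returns -1 if the id is not present.
std::int64_t erase_then_read_id (job_map &jobs, std::int64_t id)
{
    auto it = jobs.find (id);
    if (it == jobs.end ())
        return -1;
    // Copying the shared_ptr adds an owning reference, so the job stays
    // alive even if the map held the last one.
    auto job_sp = it->second;
    // After erase, `it` is invalid and must not be dereferenced.
    jobs.erase (it);
    // Safe: job_sp still owns the object.
    return job_sp->id;
}
```

Reading `it->second->id` after the erase instead would dereference an invalidated iterator, which is the situation the commit removes.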
trws committed Aug 20, 2024
1 parent 98837c6 commit 3ac8fbb
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion qmanager/policies/base/queue_policy_base.hpp
@@ -662,6 +662,9 @@ class queue_policy_base_t : public resource_model::queue_adapter_base_t {
 m_running.erase (job_it->second->t_stamps.running_ts);
 job_it->second->t_stamps.complete_ts = m_cq_cnt++;
 job_it->second->state = job_state_kind_t::COMPLETE;
+// hold a reference to the shared_ptr to keep it alive
+// during cancel
+auto job_sp = job_it->second;
 m_jobs.erase (job_it);
 if (final && !full_removal) {
 // This error condition indicates a discrepancy between core and sched.
@@ -672,7 +675,7 @@ class queue_policy_base_t : public resource_model::queue_adapter_base_t {
 __FUNCTION__,
 static_cast<intmax_t> (id));
 // Run a full cancel to clean up all remaining allocated resources
-if (cancel (h, job_it->second->id, true) != 0) {
+if (cancel (h, job_sp->id, true) != 0) {
 flux_log_error (flux_h,
 "%s: .free RPC full cancel failed for jobid "
 "%jd",
