Force bq.enter()/bq.leave() to be called if the scheduler is idle
Once the final workers shut down, bq.enter()/bq.leave() are no longer
called. This means that those workers only get removed from the
scheduler the next time any RPC is called (e.g., when workers start to
come online again). In the meantime, Prometheus metrics are incorrect.
EdSchouten committed Nov 29, 2024
1 parent 8a43a77 commit 1be1483
Showing 2 changed files with 32 additions and 1 deletion.
24 changes: 23 additions & 1 deletion cmd/bb_scheduler/main.go
@@ -141,7 +141,29 @@ func main() {
 			actionRouter,
 			executeAuthorizer,
 			modifyDrainsAuthorizer,
-			killOperationsAuthorizer)
+			killOperationsAuthorizer,
+		)
+
+		// Force periodic cleanups of stale workers. This also
+		// happens automatically when RPCs occur, but that's not
+		// sufficient to ensure Prometheus metrics are updated
+		// if the final workers disappear.
+		//
+		// TODO: Maybe it's better to let InMemoryBuildQueue
+		// implement prometheus.Collector? Then cleanups can run
+		// whenever the scheduler is scraped.
+		dependenciesGroup.Go(func(ctx context.Context, siblingsGroup, dependenciesGroup program.Group) error {
+			t := time.NewTicker(time.Minute)
+			for {
+				select {
+				case <-t.C:
+					buildQueue.ForceCleanup()
+				case <-ctx.Done():
+					t.Stop()
+					return nil
+				}
+			}
+		})

 		// Create predeclared platform queues.
 		for _, platformQueue := range configuration.PredeclaredPlatformQueues {
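
The periodic cleanup loop added to main.go above can be tried in isolation. The following is a minimal sketch, not the Buildbarn code: the `cleaner` interface, `countingCleaner`, `runPeriodicCleanup`, `countCleanups`, and `demo` are hypothetical names introduced here, and the interval is made a parameter (the commit hard-codes `time.Minute`) so that the loop can be exercised in milliseconds. Only the `ForceCleanup()` method name and the shape of the select loop come from the diff.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleaner is a hypothetical stand-in for the one method of
// InMemoryBuildQueue that the loop depends on.
type cleaner interface {
	ForceCleanup()
}

// runPeriodicCleanup calls c.ForceCleanup() once per interval until ctx
// is cancelled. It follows the select loop in the diff, except that the
// interval is a parameter and the ticker is stopped via defer.
func runPeriodicCleanup(ctx context.Context, c cleaner, interval time.Duration) error {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			c.ForceCleanup()
		case <-ctx.Done():
			return nil
		}
	}
}

// countingCleaner records how many times a cleanup was requested.
// It is only touched from the goroutine running the loop.
type countingCleaner struct {
	calls int
}

func (c *countingCleaner) ForceCleanup() { c.calls++ }

// countCleanups runs the loop for roughly `total` and reports how many
// cleanups were triggered in that window.
func countCleanups(total, interval time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()
	c := &countingCleaner{}
	_ = runPeriodicCleanup(ctx, c, interval)
	return c.calls
}

// demo uses a short interval so the example finishes quickly.
func demo() int {
	return countCleanups(200*time.Millisecond, 5*time.Millisecond)
}

func main() {
	// The exact count depends on timing, so it is not fixed here.
	fmt.Println("cleanups triggered:", demo())
}
```

Taking the interval and the cleanup target as parameters is a choice made for this sketch; it lets the loop be checked without waiting a full minute or constructing a real build queue.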
9 changes: 9 additions & 0 deletions pkg/scheduler/in_memory_build_queue.go
@@ -1258,6 +1258,15 @@ func (bq *InMemoryBuildQueue) leave() {
 	bq.lock.Unlock()
 }

+// ForceCleanup forcefully runs any pending cleanup tasks. This method
+// can be invoked periodically to ensure that workers are removed, even
+// if no other RPC traffic occurs. This ensures that Prometheus metrics
+// report the correct values.
+func (bq *InMemoryBuildQueue) ForceCleanup() {
+	bq.enter(bq.clock.Now())
+	bq.leave()
+}
+
 // getIdleSynchronizeResponse returns a synchronization response that
 // explicitly instructs a worker to return to the idle state.
 func (bq *InMemoryBuildQueue) getIdleSynchronizeResponse() *remoteworker.SynchronizeResponse {
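
To illustrate why an otherwise empty enter()/leave() pair is enough, here is a toy model of the behaviour described in the commit message. Everything in it is an assumption made for illustration: `toyQueue`, its fields, `Synchronize`, `WorkerCount`, and the injectable `now` clock are hypothetical, and the sketch assumes that expired workers are dropped inside `enter()`. The real InMemoryBuildQueue is considerably more involved.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyQueue is a hypothetical, heavily simplified stand-in for
// InMemoryBuildQueue. Expired workers are only dropped inside enter().
type toyQueue struct {
	lock    sync.Mutex
	now     func() time.Time     // injectable clock (assumption, for testing)
	timeout time.Duration        // how long a silent worker is kept
	workers map[string]time.Time // worker name -> last time it was seen
}

// enter acquires the lock and removes workers that have been silent
// for longer than the timeout.
func (q *toyQueue) enter(t time.Time) {
	q.lock.Lock()
	for name, seen := range q.workers {
		if t.Sub(seen) > q.timeout {
			delete(q.workers, name)
		}
	}
}

// leave releases the lock taken by enter.
func (q *toyQueue) leave() { q.lock.Unlock() }

// Synchronize models a worker RPC: it runs cleanup as a side effect
// and records that the worker was seen.
func (q *toyQueue) Synchronize(name string) {
	t := q.now()
	q.enter(t)
	defer q.leave()
	q.workers[name] = t
}

// ForceCleanup has the same shape as the method added in the commit.
func (q *toyQueue) ForceCleanup() {
	q.enter(q.now())
	q.leave()
}

// WorkerCount models reading a metric; it performs no cleanup.
func (q *toyQueue) WorkerCount() int {
	q.lock.Lock()
	defer q.lock.Unlock()
	return len(q.workers)
}

// demo returns the worker count before and after a forced cleanup,
// once the only worker has been silent for longer than the timeout.
func demo() (before, after int) {
	current := time.Unix(0, 0)
	q := &toyQueue{
		now:     func() time.Time { return current },
		timeout: time.Minute,
		workers: map[string]time.Time{},
	}
	q.Synchronize("worker-1")
	current = current.Add(2 * time.Minute) // worker vanished; no RPCs arrive
	before = q.WorkerCount()
	q.ForceCleanup()
	after = q.WorkerCount()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Printf("before=%d after=%d\n", before, after) // prints "before=1 after=0"
}
```

In this model the stale worker stays counted until something calls enter(); ForceCleanup supplies that call when no RPC does.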