On demand resources, started via on-demand tags
On-demand tickets need to be prioritized over the normal tickets
(priority queue), otherwise we risk that normal tickets take "on demand"
resources.

If we want to prefer one "on demand" pool over another "on demand" pool
(e.g. AWS spot instances over normal instances), we need to sort them
through a priority queue based on tag priority.  The fallback for spawn
failures (e.g. if spot instances don't start) isn't resolved yet, though.
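
The two orderings described above could be sketched roughly as follows. This is only an illustration, not the code from this commit: the `POOLS` mapping, the pool names, the `(ticket_id, tags)` tuples, and the rule that a larger priority number is preferred are all assumptions made for the example. The unresolved spawn-failure fallback is deliberately not modeled.

```python
import heapq

# Hypothetical pools: pool name -> {on-demand tag: priority}.
# Assumption: a larger priority number means "preferred".
POOLS = {
    "aws_spot": {"beefy": 10},
    "aws_normal": {"beefy": 0},
}


def is_on_demand(tags):
    """True if any of the ticket's tags is an on-demand tag of some pool."""
    return any(tag in prios for prios in POOLS.values() for tag in tags)


def ticket_order(tickets):
    """Order (ticket_id, tags) pairs: on-demand tickets first, then by ID."""
    heap = [(0 if is_on_demand(tags) else 1, tid) for tid, tags in tickets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def pool_order(tag):
    """List the pools providing `tag`, most preferred first."""
    candidates = [(-prios[tag], name)
                  for name, prios in POOLS.items() if tag in prios]
    return [name for _, name in sorted(candidates)]
```

With this sketch, tickets carrying an on-demand tag always leave the queue before normal tickets, and `pool_order("beefy")` lists the spot pool ahead of the normal one.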

Fixes: #24
praiskup committed Aug 3, 2023
1 parent 5ae9781 commit 952a31d
Showing 9 changed files with 769 additions and 51 deletions.
1 change: 1 addition & 0 deletions Makefile
@@ -6,6 +6,7 @@ SHELLTEST_OPTIONS :=
 SHELL_TESTS := \
 basic.sh \
 check.sh \
+ondemand.sh \
 reuse.sh
 
 TEST_PYTHONS := python3
10 changes: 10 additions & 0 deletions NEWS
@@ -1,3 +1,13 @@
+New in v5.0:
+
+* New features
+
+- New concept of "on demand" ticket tags added; these on demand tags
+trigger a resource allocation. Until such a ticket is taken, the
+corresponding resource pool has no resource allocated.
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 New in v4.10:
 
 * New features
3 changes: 3 additions & 0 deletions README.md
@@ -71,6 +71,9 @@ own when starting from scratch.
 new VM in the cloud and running Ansible playbooks to provision it can take few
 minutes), and users don't want to wait. It is a good idea to preallocate a
 small number of resources that are ready to be used immediately.
+- On demand allocation - In special "on demand" pool mode, resources are not
+preallocated in advance but started on demand, only upon a ticket requesting
+the resources.
 - Livechecks - Clouds are unreliable. VMs can break while starting or become
 unresponsive for various reasons. Resalloc periodically checks the liveness
 of all resources and makes sure money doesn't leak out of our pockets.
21 changes: 21 additions & 0 deletions config/pools.yaml
@@ -92,6 +92,27 @@
 ## - ci_test_machine_x86_64
 ## - ci_test_machine
 ##
+## # This is similar to the "tags" configuration in terms of "matching
+## # resources to tickets". But on demand tags trigger a completely
+## # different pool behavior. Instead of preallocating a set of "free"
+## # resources dynamically in advance, pools with "tags_on_demand"
+## # configured have by default zero resources allocated until some existing
+## # ticket is taken with at least one of the predefined "tags_on_demand". The
+## # more such tickets are taken, the more resources are allocated on demand.
+## # For example, if the `beefy` tag is configured in a pool, no resource is
+## # started until a `resalloc ticket --tag beefy` ticket is taken. Note that
+## # contrary to normal pools, the resources are allocated on demand, so
+## # resolving such tickets always takes some time (unless the resource is
+## # reused within reuse_opportunity_time). Multiple pools may provide the same
+## # "tags_on_demand", but those tags may not be mixed between the "tags"
+## # and "tags_on_demand" in multiple pools (a configuration runtime error is
+## # generated in such a case). The "max_prealloc" config, if also
+## # specified, is ignored (no preallocation is possible).
+## tags_on_demand:
+## - beefy_machine_x86_64
+## - name: beefy_machine
+##   priority: -10
+##
 ## # The "reuse" feature options. These options configure the mechanism of
 ## # re-assigning of previously used resources to multiple subsequent tickets
 ## # (when the assigned tickets belong to the same --sandbox). Still, when the
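
The rule from the `tags_on_demand` comment in this hunk (a tag may not be a normal tag in one pool and an on-demand tag in another) could be checked along these lines. This is a minimal sketch, not the validation shipped in this commit: the function names, the dict-based pool layout, the default priority of 0 for plain string entries, and the use of `ValueError` are assumptions for illustration.

```python
def normalize_tags(entries):
    """Turn a tag list into {name: priority}.

    Entries may be plain strings or {"name": ..., "priority": ...} mappings;
    plain strings get priority 0 (an assumption of this sketch).
    """
    result = {}
    for entry in entries or []:
        if isinstance(entry, str):
            result[entry] = 0
        else:
            result[entry["name"]] = entry.get("priority", 0)
    return result


def check_pools(pools):
    """Return the set of on-demand tags, or raise ValueError on mixed use."""
    normal, on_demand = set(), set()
    for config in pools.values():
        normal.update(normalize_tags(config.get("tags")))
        on_demand.update(normalize_tags(config.get("tags_on_demand")))
    mixed = sorted(normal & on_demand)
    if mixed:
        raise ValueError(
            "tags used both as 'tags' and 'tags_on_demand': " + ", ".join(mixed)
        )
    return on_demand
```

Several pools may list the same on-demand tag without triggering the error; only a tag that appears under both keys is rejected.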
19 changes: 13 additions & 6 deletions resallocserver/logic.py
@@ -20,7 +20,7 @@
 
 from resalloc.helpers import RState, TState
 from resallocserver import models
-from sqlalchemy.orm import Query
+from sqlalchemy.orm import Query, joinedload
Check warning (Code scanning / vcs-diff-lint): third party import "from sqlalchemy.orm import Query, joinedload" should be placed before "from resalloc.helpers import RState, TState"
 from sqlalchemy import or_
 
 
@@ -69,10 +69,11 @@ def ready(self):
         """
         Get ready resources, those which were never assigned or are released.
         The sandbox-assigned resources are sorted above others - so they can be
-        re-used first.
+        re-used first. The query is already ordered by ID ASC.
         """
         return (self.up().filter(models.Resource.ticket_id.is_(None))
-                         .filter(models.Resource.check_failed_count==0))
+                         .filter(models.Resource.check_failed_count==0)
+                         .order_by(models.Resource.id.asc()))
 
     def taken(self):
         """
@@ -156,9 +157,15 @@ def kill(self, res_id):
 class QTickets(QObject):
     query = Query(models.Ticket)
 
-    def waiting(self):
-        return self.query.filter_by(resource_id=None)\
-                         .filter_by(state=TState.OPEN)
+    def waiting(self, preload_tags=False):
+        query = (
+            self.query.filter_by(resource_id=None)
+            .filter_by(state=TState.OPEN)
+        )
+        if preload_tags:
+            query = query.options(joinedload("tags"))
+        return query
+
 
     def not_closed(self):
         return self.query.filter(models.Ticket.state != TState.CLOSED)
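
What a `preload_tags`-style switch buys can be seen in a small standalone experiment. This is a sketch with simplified, hypothetical models (not `resallocserver.models`) on an in-memory SQLite database, and it assumes SQLAlchemy 1.4 or newer. It passes the mapped attribute `Ticket.tags` to `joinedload`; as far as I know, the string form used in the hunk above is accepted only by SQLAlchemy releases before 2.0.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker

Base = declarative_base()


class Ticket(Base):
    """Hypothetical, simplified ticket model."""
    __tablename__ = "tickets"
    id = Column(Integer, primary_key=True)
    tags = relationship("TicketTag")


class TicketTag(Base):
    """Hypothetical tag row attached to a ticket."""
    __tablename__ = "ticket_tags"
    id = Column(Integer, primary_key=True)
    ticket_id = Column(Integer, ForeignKey("tickets.id"))
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

selects = []


@event.listens_for(engine, "before_cursor_execute")
def record_select(conn, cursor, statement, parameters, context, executemany):
    """Remember every SELECT statement sent to the database."""
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


def count_selects(preload_tags):
    """Read every ticket's tags and return how many SELECTs that took."""
    session = Session()
    try:
        selects.clear()
        query = session.query(Ticket)
        if preload_tags:
            query = query.options(joinedload(Ticket.tags))
        for ticket in query:
            [tag.name for tag in ticket.tags]  # touch the relationship
        return len(selects)
    finally:
        session.close()


# Store three tickets with one tag each.
setup = Session()
setup.add_all([Ticket(tags=[TicketTag(name="beefy")]) for _ in range(3)])
setup.commit()
setup.close()
```

Calling `count_selects(False)` issues one query for the tickets plus one lazy load per ticket, while `count_selects(True)` fetches tickets and tags in a single joined query. Keeping the eager load optional, as the hunk does, avoids paying for the join in callers that never look at the tags.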