Soufianej hardware 1.0 (#60)
* Python-chi 1.0 context module

Added the following new methods:

- list_sites(show: [None, "widget", "text"] = None) -> [str]
- use_site(site_name: str = DEFAULT_SITE) -> None
- use_project(project_id: str = None) -> None
- choose_site() -> None, displays a dropdown widget to choose the site
- choose_project() -> None, displays a dropdown widget to choose the project
- check_credentials() -> None, prints authentication metadata
- set_log_level(debug: bool), sets openstack debug logging to true,
  including HTTP request logs
- _is_ipynb() -> bool, checks whether the code is running within an
  IPython notebook. Used to determine whether to display widgets
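
The notebook check in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function name `is_ipynb` and the comparison against the `ZMQInteractiveShell` class name are assumptions; only the `from IPython import get_ipython` import appears in the diff.

```python
def is_ipynb() -> bool:
    """Return True only when running inside a Jupyter (ZMQ) kernel."""
    try:
        # IPython is optional; without it we cannot be in a notebook.
        from IPython import get_ipython
    except ImportError:
        return False

    shell = get_ipython()  # None when not running under IPython at all
    if shell is None:
        return False
    # Assumption: Jupyter kernels use a shell class named ZMQInteractiveShell.
    return type(shell).__name__ == "ZMQInteractiveShell"


if __name__ == "__main__":
    print("notebook" if is_ipynb() else "plain interpreter")
```

A caller could use such a check to fall back to plain-text output when widgets cannot be rendered.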

* Added custom exceptions

Raised when an argument is not valid. These errors can often be fixed by checking the hardware catalog or the documentation. Examples:
- Site name is not valid
- Node type is not valid
- Resource does not exist

Raised when a request has valid arguments, but the resources are being used incorrectly, or can’t be used as requested. This type of error might depend on the time the notebook is run, due to the shared nature of the testbed.
Examples:
- Nodes matching filters (e.g. node_type) are unavailable
- Cannot allocate FIP
- Allocation expires soon
- Allocation has insufficient SUs for request

Raised when an error occurs with some Chameleon resource.
For example, if your node is having hardware issues, and so fails to provision, this will be raised.

Replaced thrown exceptions with their appropriate custom exception
across all modules.
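
The three categories described above could be modelled as a small exception hierarchy. The sketch below is illustrative only: the class names, the shared base class, and the `validate_site` and `describe` helpers are all hypothetical, since the commit message does not name the exceptions, and `KNOWN_SITES` is sample data.

```python
class ChiError(Exception):
    """Hypothetical common base class for the three categories."""


class InvalidArgumentError(ChiError):
    """An argument is not valid, e.g. an unknown site name or node type."""


class ResourceUnavailableError(ChiError):
    """Arguments are valid, but the resource cannot be used as requested."""


class ServiceFailureError(ChiError):
    """A testbed resource itself failed, e.g. a node with hardware issues."""


KNOWN_SITES = {"CHI@TACC", "CHI@UC"}  # sample data for illustration


def validate_site(site_name: str) -> str:
    """Raise the 'invalid argument' category for an unknown site name."""
    if site_name not in KNOWN_SITES:
        raise InvalidArgumentError(f"Site name is not valid: {site_name!r}")
    return site_name


def describe(err: ChiError) -> str:
    """Map each category to the kind of remedy the commit message suggests."""
    if isinstance(err, InvalidArgumentError):
        return "check the hardware catalog or documentation"
    if isinstance(err, ResourceUnavailableError):
        return "retry later or adjust the request"
    return "report the failing resource"
```

Separating the categories lets notebook code react differently to a typo in a site name than to a node that is simply busy at the moment.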

* Initial implementation of hardware module

Incomplete

* Python-chi 1.0 context module

Added the following new methods:

- list_sites(show: [None, "widget", "text"] = None) -> [str]
- list_projects(show: [None, "widget", "text"] = None) -> [str]
- use_site(site_name: str = DEFAULT_SITE) -> None
- use_project(project_id: str = None) -> None
- choose_site() -> None, displays a dropdown widget to choose the site
- choose_project() -> None, displays a dropdown widget to choose the project
- check_credentials() -> None, prints authentication metadata
- set_log_level(debug: str), changes logging level to either ERROR or
  DEBUG, including HTTP request logs if the latter is chosen

* Changed one more exception

* typo

* Hardware module 1.0

Used to query hardware on Chameleon. The get_nodes() method fetches the
nodes in the currently selected site and returns them as a list of Node
dataclass instances.
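
As a rough illustration of that mapping, the sketch below turns an `items` payload into dataclass instances. `NodeSummary` and `nodes_from_payload` are hypothetical, trimmed-down stand-ins (the real Node carries many more fields), and the sample payload is made up; only the `node_name`, `node_type`, and `uid` keys are taken from the diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeSummary:
    """Hypothetical, trimmed-down stand-in for the Node dataclass."""
    site: str
    name: Optional[str]
    type: Optional[str]
    uid: Optional[str]


def nodes_from_payload(site: str, payload: dict) -> List[NodeSummary]:
    """Build one NodeSummary per entry in payload['items']."""
    return [
        NodeSummary(
            site=site,
            name=item.get("node_name"),  # .get() yields None for missing keys
            type=item.get("node_type"),
            uid=item.get("uid"),
        )
        for item in payload.get("items", [])
    ]


# Made-up sample payload, shaped like the 'items' list read in the diff.
sample = {
    "items": [
        {"node_name": "nc01", "node_type": "compute_skylake", "uid": "abc-123"},
        {"uid": "def-456"},
    ]
}
nodes = nodes_from_payload("CHI@TACC", sample)
```

Using `.get()` means an incomplete record produces a node with `None` fields rather than a `KeyError`.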

* Using allocation API to determine next free timeslot
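
The gap search behind this change can be sketched independently of Blazar. The version below is a variation on the code in the diff, not a copy of it: it sweeps a cursor forward from the current time, so gaps that lie entirely in the past are not reported and overlapping allocations are tolerated. The function name `next_free_slot` is hypothetical, and allocations are assumed to be dicts with ISO 8601 `start_date` and `end_date` strings that include a UTC offset.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def next_free_slot(
    allocations: List[dict], now: datetime
) -> Tuple[datetime, Optional[datetime]]:
    """Return (start, end) of the first free gap at or after `now`.

    `end` is None when the gap is open-ended (nothing is booked after it).
    """
    ordered = sorted(
        allocations, key=lambda a: datetime.fromisoformat(a["start_date"])
    )
    cursor = now  # earliest moment that might still be free
    for alloc in ordered:
        start = datetime.fromisoformat(alloc["start_date"])
        end = datetime.fromisoformat(alloc["end_date"])
        if start > cursor:
            # Free window between the cursor and the next booking.
            return (cursor, start)
        # Booking covers the cursor (or is already over): move past it.
        cursor = max(cursor, end)
    return (cursor, None)


# Made-up sample data: two bookings on the same day.
now = datetime(2024, 9, 26, 12, 0, tzinfo=timezone.utc)
bookings = [
    {"start_date": "2024-09-26T15:00:00+00:00", "end_date": "2024-09-26T16:00:00+00:00"},
    {"start_date": "2024-09-26T10:00:00+00:00", "end_date": "2024-09-26T13:00:00+00:00"},
]
slot = next_free_slot(bookings, now)
```

With the sample data, the node is busy until 13:00 and free until the 15:00 booking, so `slot` spans 13:00 to 15:00 UTC.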

---------

Co-authored-by: Soufiane Jounaid <[email protected]>
Co-authored-by: Soufiane Jounaid <[email protected]>
Co-authored-by: Mark Powers <[email protected]>
4 people committed Sep 26, 2024
1 parent 00a5005 commit fd0e301
Showing 2 changed files with 147 additions and 0 deletions.
1 change: 1 addition & 0 deletions chi/context.py
@@ -181,6 +181,7 @@ def _check_deprecated(key):
)
return deprecated_extra_opts[key]


def _is_ipynb() -> bool:
    try:
        from IPython import get_ipython
146 changes: 146 additions & 0 deletions chi/hardware.py
@@ -0,0 +1,146 @@
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

from .clients import blazar
from .context import get, RESOURCE_API_URL

import requests
import logging

LOG = logging.getLogger(__name__)

@dataclass
class Node:
    """
    Represents the Chameleon hardware that goes into a single node.
    A dataclass for node information directly from the hardware browser.
    """
    site: str
    name: str
    type: str
    architecture: dict
    bios: dict
    cpu: dict
    gpu: dict
    main_memory: dict
    network_adapters: List[dict]
    placement: dict
    storage_devices: List[dict]
    uid: str
    version: str

    def next_free_timeslot(self) -> Tuple[datetime, Optional[datetime]]:
        """
        Finds the next available timeslot for the hardware using the Blazar client.
        Returns:
            A tuple containing the start and end datetime of the next available timeslot.
            If no timeslot is available, returns (end_datetime_of_last_allocation, None).
        """
        raise NotImplementedError
        blazarclient = blazar()

        # Get allocations for this specific host
        allocations = blazarclient.allocation.get(resource_id=self.uid)

        # Sort allocations by start time
        allocations.sort(key=lambda x: x['start_date'])

        now = datetime.now(timezone.utc)

        if not allocations:
            return (now, None)

        # Check if there's a free slot now
        if datetime.fromisoformat(allocations[0]['start_date']) > now:
            return (now, datetime.fromisoformat(allocations[0]['start_date']))

        # Find the next free slot
        for i in range(len(allocations) - 1):
            current_end = datetime.fromisoformat(allocations[i]['end_date'])
            next_start = datetime.fromisoformat(allocations[i+1]['start_date'])

            if current_end < next_start:
                return (current_end, next_start)

        # If no free slot found, return the end of the last allocation
        last_end = datetime.fromisoformat(allocations[-1]['end_date'])
        return (last_end, None)

def _call_api(endpoint):
    url = "{0}/{1}.{2}".format(RESOURCE_API_URL, endpoint, "json")
    LOG.info("Requesting %s from reference API ...", url)
    resp = requests.get(url)
    LOG.info("Response received. Parsing to json ...")
    data = resp.json()
    return data

def get_nodes(
    all_sites: bool = False,
    filter_reserved: bool = False,
    gpu: Optional[bool] = None,
    min_number_cpu: Optional[int] = None,
) -> List[Node]:
    """
    Retrieve a list of nodes based on the specified criteria.
    Args:
        all_sites (bool, optional): Flag to indicate whether to retrieve nodes from all sites.
            Defaults to False.
        filter_reserved (bool, optional): Flag to indicate whether to filter out reserved nodes.
            Defaults to False. (Not Currently implemented)
        gpu (bool, optional): Flag to indicate whether to filter nodes based on GPU availability.
            Defaults to None.
        min_number_cpu (int, optional): Minimum number of CPU logical cores per node.
            Defaults to None.
    Returns:
        List[Node]: A list of Node objects that match the specified criteria.
    """

    sites = []
    if all_sites:
        sites = [site.get("name") for site in _call_api("sites")['items']]
    else:
        sites.append(get("region_name"))

    nodes = []

    for site in sites:
        # Soufiane: Skipping CHI@EDGE since it is not enrolled in the hardware API,
        if site == "CHI@Edge":
            print("Please visit the Hardware discovery page for information about CHI@Edge devices")
            continue

        endpoint = f"sites/{site.split('@')[1].lower()}/clusters/chameleon/nodes"
        data = _call_api(endpoint)

        for node_data in data['items']:
            node = Node(
                site=site,
                name=node_data.get("node_name"),
                type=node_data.get("node_type"),
                architecture=node_data.get("architecture"),
                bios=node_data.get("bios"),
                cpu=node_data.get("processor"),
                gpu=node_data.get("gpu"),
                main_memory=node_data.get("main_memory"),
                network_adapters=node_data.get("network_adapters"),
                placement=node_data.get("placement"),
                storage_devices=node_data.get("storage_devices"),
                uid=node_data.get("uid"),
                version=node_data.get("version"),
            )

            if isinstance(node.gpu, list):
                gpu_filter = gpu is None or (node.gpu and gpu == bool(node.gpu[0]['gpu']))
            else:
                gpu_filter = gpu is None or (node.gpu and gpu == bool(node.gpu['gpu']))

            cpu_filter = min_number_cpu is None or node.architecture['smt_size'] >= min_number_cpu

            if gpu_filter and cpu_filter:
                nodes.append(node)

    return nodes
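
The GPU and CPU filters at the end of get_nodes() can also be expressed as a standalone predicate, which makes the edge cases easier to test. The sketch below is one possible reading, not the committed behaviour: `has_gpu` and `matches` are hypothetical helpers operating on plain dicts, and, unlike the expression in the diff, a node with no `gpu` data is treated as having no GPU, so it matches `gpu=False`.

```python
from typing import Optional, Union


def has_gpu(gpu_field: Union[dict, list, None]) -> bool:
    """Normalise the 'gpu' field, which may be a dict, a list of dicts, or missing."""
    if not gpu_field:
        return False
    first = gpu_field[0] if isinstance(gpu_field, list) else gpu_field
    return bool(first.get("gpu"))


def matches(
    node: dict,
    gpu: Optional[bool] = None,
    min_number_cpu: Optional[int] = None,
) -> bool:
    """Return True when the node passes every filter that was supplied."""
    # A filter left as None places no constraint on the node.
    gpu_ok = gpu is None or has_gpu(node.get("gpu")) == gpu
    threads = (node.get("architecture") or {}).get("smt_size", 0)
    cpu_ok = min_number_cpu is None or threads >= min_number_cpu
    return gpu_ok and cpu_ok


# Made-up sample records using the 'gpu' and 'architecture' keys from the diff.
with_gpu = {"gpu": {"gpu": True}, "architecture": {"smt_size": 48}}
no_gpu_data = {"gpu": None, "architecture": {"smt_size": 8}}
gpu_list_false = {"gpu": [{"gpu": False}], "architecture": {"smt_size": 96}}
```

Whether nodes lacking GPU metadata should count as GPU-less is a design choice; the sketch picks that interpretation so that `gpu=False` returns CPU-only nodes.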
