feat: armonik domain definitions aep #47

Draft
wants to merge 7 commits into
base: main
70 changes: 70 additions & 0 deletions AEP/aep-0000y.md
@@ -0,0 +1,70 @@
# AEP : ArmoniK's Domain Definition

| |ArmoniK Enhancement Proposal|
---: |:---
**AEP** |
**Title** | ArmoniK's Domain Definition
**Author** | Italo Aguiar <<[email protected]>>, Quentin Delamea <<[email protected]>>
**Status** | Draft
**Type** | Standard
**Creation Date** | 2024-06-04

# Abstract

This AEP proposes a comprehensive definition of ArmoniK's domain. A domain is an area of interest or control within our system. Our goal is to precisely define the core domain entities to facilitate further development.

# Motivation

The purpose of this document is to establish a clear and consistent definition of ArmoniK's concepts, enabling other documents and standards to effectively reuse them.

# Rationale

To describe more complex features of ArmoniK accurately, it is crucial to have a shared vocabulary and avoid redundancy or discrepancies. While some definitions already exist in the ArmoniK glossary, this AEP seeks to enable community discussion and approval to ensure a unified understanding and usage of these terms.

# Specification

## Worker

User-developed containerized software capable of performing one or several user-implemented tasks. A worker is built on top of a standardized runtime environment. A worker processes input data and outputs data from its computation; it can also produce new tasks, which may be handled by itself or by other workers. A worker embeds an implementation of the communication protocols with the scheduling agent.

## Scheduling agent

Containerized software cohabiting with a worker. It runs a specific algorithm to determine which tasks its associated workers should perform, schedules those tasks on the workers, and monitors their execution. It also manages all interactions between the worker and the databases (retrieving/saving data, creating new tasks, etc.), handles worker errors, and retries or resubmits failed tasks when necessary. A scheduling agent, like a worker, exists within a single partition.
Contributor

Suggested change
Containerized software cohabiting with a worker inside a pod, running the main algorithm of the orchestrator, which determines the tasks that the linked worker will perform. It schedules the tasks on the workers and monitors their execution. It also manages all interactions between the worker and the databases (retrieving/saving data, creating new tasks, etc.), as well as managing worker errors and retrying/resubmitting failed tasks when necessary. A scheduling agent, like a worker, exists within a single partition.

Author

Maybe we shouldn't use "pod", since we aim to go bare metal soon and the term is mostly associated with k3s.
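
The retry responsibility attributed to the scheduling agent above can be illustrated with a minimal sketch. It assumes that "Max Retries" counts resubmissions after the first attempt; `TaskFailed`, `run_with_retries`, and `flaky_worker` are hypothetical names chosen for illustration and are not part of ArmoniK's API.

```python
from typing import Callable, Tuple


class TaskFailed(Exception):
    """Hypothetical error raised when a worker fails to process a task."""


def run_with_retries(execute: Callable[[], str], max_retries: int) -> Tuple[str, int]:
    """Run `execute`, resubmitting it up to `max_retries` times after a failure.

    Returns the task output and the number of attempts that were made.
    Assumption: one initial attempt plus at most `max_retries` resubmissions.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return execute(), attempts
        except TaskFailed:
            # Give up once the initial attempt and every retry have failed.
            if attempts > max_retries:
                raise


# Usage: a fake worker call that fails twice before succeeding.
calls = {"count": 0}


def flaky_worker() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TaskFailed("transient worker error")
    return "done"


output, attempts = run_with_retries(flaky_worker, max_retries=3)
```

Whether a retry runs on the same worker or is resubmitted elsewhere is left open here, as the definition above does not specify it.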


## Partition

Logical segmentation of the cluster's pool of machines to distribute workloads according to usage. This feature is provided and handled by ArmoniK.
Contributor

Perhaps we should specify that partition creation is done by ArmoniK only at the deployment stage (for now).

As it may change soon, I think it's better not to specify this.


## Task

An atomic computation that takes one or several inputs and outputs one or several results. A task is launched by a client or by another task and processed by a worker. In ArmoniK, a task cannot communicate directly with another task. Tasks can, however, depend on each other through their input/output data; this is known as a data dependency.

A task is defined by:

- **TaskOptions**: Set of parameters specifying the execution conditions, for instance:
  - *PartitionId*: The ID of the partition where the task must be executed
  - *Maximum Duration*: The maximum duration of a task
  - *Max Retries*: The maximum number of retries
  - *Priority*: The priority level of the task execution (priorities are honored on a best-effort basis, so a task's priority is not guaranteed to be respected)

- **Data Dependencies**: Input data of a task, where each input depends on exactly one other task. Data dependencies formalize dependencies between tasks.

- **Expected Outputs**: Data that must be generated as output from a task. If a task submits new tasks, it can transfer responsibility for generating all or part of its outputs to those tasks.
Contributor

Above you are speaking about results, why not use this term here?

Author

I'm avoiding using the word "results" since we aim to use Blobs from now on. Maybe I could take out the other usage of this word to avoid confusion.

Following the idea of avoiding the term "result", maybe we should replace "data dependencies" with "expected input ids"?
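
To make the task definition under review more concrete, here is a minimal sketch of how its three parts (TaskOptions, data dependencies, expected outputs) could be modeled. The field names follow the terms currently used in the proposal, not ArmoniK's actual API; identifying blobs by string ID and the `is_ready` rule (a task can run once every data dependency exists) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Set


@dataclass(frozen=True)
class TaskOptions:
    """Execution conditions of a task (hypothetical field names)."""
    partition_id: str
    max_duration: timedelta
    max_retries: int
    priority: int


@dataclass
class Task:
    """A task, with its inputs and outputs referenced by blob ID (assumption)."""
    task_id: str
    options: TaskOptions
    data_dependencies: Set[str] = field(default_factory=set)
    expected_outputs: Set[str] = field(default_factory=set)

    def is_ready(self, available_blobs: Set[str]) -> bool:
        # Assumed rule: a task can run once all of its data dependencies exist.
        return self.data_dependencies <= available_blobs


# Usage: `consumer` depends on the output of `producer`.
options = TaskOptions("default", timedelta(minutes=5), max_retries=3, priority=1)
producer = Task("t1", options, expected_outputs={"blob-a"})
consumer = Task("t2", options, data_dependencies={"blob-a"}, expected_outputs={"blob-b"})

ready_before = consumer.is_ready(set())
ready_after = consumer.is_ready(producer.expected_outputs)
```

The two tasks never communicate directly; the only link between them is the shared identifier `"blob-a"`, which matches the notion of data dependency described in the Task section.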


## Blob

Abstraction for ArmoniK's task-related data, including data dependencies and expected outputs.

- **Blob Metadata**: Abstraction that refers to a blob, whether it exists or not. Blob metadata might be used when referring to data that is expected to exist in its full form in the future, such as expected outputs or data dependencies before the blob content is created.
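
The distinction between a blob and its metadata can be sketched as follows. This is an illustrative model only: `BlobMetadata`, `BlobStore`, and their methods are hypothetical names, and storing content in an in-memory dictionary is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlobMetadata:
    """Refers to a blob, whether or not its content exists yet."""
    blob_id: str
    name: str


class BlobStore:
    """Hypothetical in-memory store separating metadata from content."""

    def __init__(self) -> None:
        self._metadata: Dict[str, BlobMetadata] = {}
        self._content: Dict[str, bytes] = {}

    def declare(self, blob_id: str, name: str) -> BlobMetadata:
        # Register metadata only; the content may be produced later by a task.
        metadata = BlobMetadata(blob_id, name)
        self._metadata[blob_id] = metadata
        return metadata

    def write(self, blob_id: str, data: bytes) -> None:
        if blob_id not in self._metadata:
            raise KeyError(f"blob {blob_id!r} was never declared")
        self._content[blob_id] = data

    def exists(self, blob_id: str) -> bool:
        return blob_id in self._content

    def read(self, blob_id: str) -> Optional[bytes]:
        return self._content.get(blob_id)


# Usage: an expected output is declared before any task has produced it.
store = BlobStore()
expected = store.declare("blob-a", "partial-sum")
exists_before = store.exists(expected.blob_id)
store.write(expected.blob_id, b"42")
exists_after = store.exists(expected.blob_id)
```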

## Session

A session is a logical container for tasks and associated data (task status, blobs, errors, etc.). Every task is submitted within a session. An existing session can be paused and resumed to retrieve data or submit new tasks. When a session is canceled, all associated executions still in progress are interrupted.
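
The session life cycle described above can be sketched as a small state machine. The status names, the `Session` class, and the rule that only a canceled session rejects new submissions are assumptions for illustration; the definition itself only states that a session can be paused, resumed, and canceled, and that cancellation interrupts executions still in progress.

```python
from enum import Enum
from typing import Dict


class SessionStatus(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELED = "canceled"


class TaskStatus(Enum):
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    INTERRUPTED = "interrupted"


class Session:
    """Hypothetical logical container for tasks and their statuses."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.status = SessionStatus.RUNNING
        self.tasks: Dict[str, TaskStatus] = {}

    def submit(self, task_id: str) -> None:
        # Assumption: only a canceled session rejects new tasks.
        if self.status is SessionStatus.CANCELED:
            raise RuntimeError("cannot submit to a canceled session")
        self.tasks[task_id] = TaskStatus.IN_PROGRESS

    def complete(self, task_id: str) -> None:
        self.tasks[task_id] = TaskStatus.COMPLETED

    def pause(self) -> None:
        if self.status is not SessionStatus.RUNNING:
            raise RuntimeError("only a running session can be paused")
        self.status = SessionStatus.PAUSED

    def resume(self) -> None:
        if self.status is not SessionStatus.PAUSED:
            raise RuntimeError("only a paused session can be resumed")
        self.status = SessionStatus.RUNNING

    def cancel(self) -> None:
        # Canceling interrupts every execution that is still in progress.
        self.status = SessionStatus.CANCELED
        for task_id, status in self.tasks.items():
            if status is TaskStatus.IN_PROGRESS:
                self.tasks[task_id] = TaskStatus.INTERRUPTED


# Usage: one task finishes, the other is interrupted by the cancellation.
session = Session("s1")
session.submit("t1")
session.submit("t2")
session.complete("t1")
session.pause()
session.resume()
session.cancel()
```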

## Events

Events are abstractions that notify users whenever a change happens to the entities they have chosen to watch, such as sessions, tasks, and blobs.
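
A minimal publish/subscribe sketch of this idea follows. `EventBus`, `ENTITY_KINDS`, and the `(entity_id, change)` handler signature are hypothetical and only illustrate the principle that users are notified solely about the entity kinds they specified.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# Entity kinds named in the definition above.
ENTITY_KINDS = {"session", "task", "blob"}

# A handler receives the entity ID and a short description of the change.
Handler = Callable[[str, str], None]


class EventBus:
    """Hypothetical dispatcher delivering changes to interested users."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, entity_kind: str, handler: Handler) -> None:
        if entity_kind not in ENTITY_KINDS:
            raise ValueError(f"unknown entity kind: {entity_kind!r}")
        self._handlers[entity_kind].append(handler)

    def publish(self, entity_kind: str, entity_id: str, change: str) -> int:
        # Notify only the handlers registered for this kind of entity.
        handlers = self._handlers.get(entity_kind, [])
        for handler in handlers:
            handler(entity_id, change)
        return len(handlers)


# Usage: the user asks for task events only, so the blob change is not delivered.
received: List[Tuple[str, str]] = []
bus = EventBus()
bus.subscribe("task", lambda entity_id, change: received.append((entity_id, change)))
notified_task = bus.publish("task", "t1", "completed")
notified_blob = bus.publish("blob", "blob-a", "created")
```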

## Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.