Add proposal for status field in akri resources #77

78 changes: 78 additions & 0 deletions proposals/akri-resources-status.md
@@ -0,0 +1,78 @@
# Status of akri resources

Currently, both akri-managed resources (`Instances` and `Configurations`) only have `spec` and `metadata` fields, which doesn't allow the akri
components to give feedback to the user. To provide that feedback, we can add a `status` field to the resources. This document describes how this field
should be populated.

## Instances

For an instance, the status should give insight into the broker resources (if any). To do that, we can use the following conditions:

- Healthy: The instance is currently detected by an agent. "True" (instance detected) and "Unknown" (in grace period) are the only valid statuses
- BrokerScheduled: The broker resources are all created (absent if no broker)
- BrokerReady: All broker resources are "Ready" or "Succeeded"
- Ready: All above conditions are True
Collaborator

What if no brokers are deployed for an instance? What does BrokerReady display as?

Contributor Author

If there are no brokers defined for an Instance, the BrokerReady condition would simply not be present and Ready would simply reflect the Healthy condition, similar to how BrokerScheduled behaves.
I'll update the proposal to add the same comment that exists for BrokerScheduled.


When the BrokerScheduled or Ready condition is False, the message shall include what resources are the cause of this.
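
As an illustration of the rule above, here is a minimal Python sketch of how Ready could be derived from the conditions that are present. It is not the akri implementation: the function name `aggregate_ready` and the `AllConditionsTrue` reason are hypothetical, and treating any status other than "True" (including "Unknown") as making Ready "False" is an assumption this proposal does not settle.

```python
from typing import Dict, List


def aggregate_ready(conditions: List[Dict[str, str]]) -> Dict[str, str]:
    """Derive Ready from the other conditions that are present.

    Absent conditions (for example BrokerScheduled when there is no broker)
    are simply not in the list, so they do not influence the result.
    """
    failing = [
        c for c in conditions
        if c["type"] != "Ready" and c["status"] != "True"
    ]
    if not failing:
        # Hypothetical reason name, not defined by the proposal.
        return {"type": "Ready", "status": "True",
                "reason": "AllConditionsTrue", "message": ""}
    # Assumption: reuse the first failing condition's reason and join the
    # messages so the user can see which resources are the cause.
    return {
        "type": "Ready",
        "status": "False",
        "reason": failing[0]["reason"],
        "message": "; ".join(c["message"] for c in failing if c["message"]),
    }


conditions = [
    {"type": "Healthy", "status": "True", "reason": "Discovered", "message": ""},
    {"type": "BrokerScheduled", "status": "True",
     "reason": "AllResourcesScheduled", "message": ""},
    {"type": "BrokerReady", "status": "False",
     "reason": "BrokerNotReady", "message": "broker-pod is not Ready"},
]
ready = aggregate_ready(conditions)
```

With only a Healthy condition set to "True" (no broker defined), the same function returns a Ready condition with status "True".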

To correctly set the "Healthy" condition, a list of healthy nodes (which must be a subset of the `spec.nodes` field) is needed.
Collaborator

Why does healthyNodes need to be listed in the status? Could it be enough to determine that at least one node is healthy for "Healthy" to be true, without that information having to be listed in the instance status?

Contributor Author

Well, I don't see any way for an agent to know it is the last with a "healthy" device (maybe healthy is not the best term, as it means not in grace period here) and thus switch the Healthy condition status if it goes down.

Collaborator

This is making me realize something: when an agent on a node discovers an instance, it adds itself to the instance's nodes list; however, when it stops being able to see the instance it deletes the instance rather than removing itself from the nodes and only deleting if it is the last one left. This seems problematic because the device could be updated to explicitly no longer be able to connect with that node but is still able to connect to the others.

Is the idea here that healthyNodes accurately reflects which nodes can see the device, and that we update the flow to no longer delete an instance when it goes offline for a single agent, unless that agent is the last one online? Right now, an unhealthy instance would never exist; it would already have been deleted.

Contributor Author

Yes, we need to change that behavior (in fact, I think I'll create a specific issue to fix that, as this looks like a bug to me).

Collaborator

Thank you for creating the issue!


Here is an example of the status of an Instance:

```yaml
status:
  conditions:
    - type: Healthy
      status: "True"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: ""
      reason: Discovered
    - type: BrokerScheduled
      status: "True"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: ""
      reason: AllResourcesScheduled
    - type: BrokerReady
      status: "False"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: "broker-pod is not Ready"
      reason: BrokerNotReady
    - type: Ready
      status: "False"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: "broker-pod is not Ready"
      reason: BrokerNotReady
  healthyNodes:
    - node-a
    - node-b
```
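
To make the relationship between `healthyNodes` and the Healthy condition concrete, here is a small Python sketch under stated assumptions. The function name `healthy_condition`, the `GracePeriod` reason and its message are hypothetical; only the `Discovered` reason and the "True"/"Unknown" statuses come from this proposal.

```python
from typing import Dict, List


def healthy_condition(spec_nodes: List[str],
                      healthy_nodes: List[str]) -> Dict[str, str]:
    """Compute the Healthy condition from the list of healthy nodes."""
    # healthyNodes must be a subset of spec.nodes.
    extra = set(healthy_nodes) - set(spec_nodes)
    if extra:
        raise ValueError(f"healthyNodes not in spec.nodes: {sorted(extra)}")
    if healthy_nodes:
        # At least one agent currently detects the instance.
        return {"type": "Healthy", "status": "True",
                "reason": "Discovered", "message": ""}
    # No agent currently detects the instance: it is in its grace period.
    # Reason and message are hypothetical names for illustration.
    return {"type": "Healthy", "status": "Unknown",
            "reason": "GracePeriod",
            "message": "no node currently detects the instance"}


detected = healthy_condition(["node-a", "node-b"], ["node-a", "node-b"])
in_grace_period = healthy_condition(["node-a", "node-b"], [])
```

Keeping the list in the status is what lets an agent that loses the device tell whether it was the last healthy node, and therefore whether the condition should switch to "Unknown".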

## Configurations

For a configuration, the status should give insight into the Configuration itself and its Instances:

- Started: At least one agent has the matching discovery handler, and the discovery handler is discovering this configuration
- InstancesReady: All instances are Ready (absent if no instances)
- Ready: All above conditions are True

Here is an example of the status of a Configuration:

```yaml
status:
  conditions:
    - type: Started
      status: "True"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: ""
      reason: DiscoveryHandlerStarted
    - type: InstancesReady
      status: "False"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: "Instance foo-bar-00095f is not Ready"
      reason: InstanceNotReady
    - type: Ready
      status: "False"
      lastTransitionTime: "2023-07-24T10:40:00Z"
      message: "Instance foo-bar-00095f is not Ready"
      reason: InstanceNotReady
```
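
The InstancesReady rule could be sketched in Python as follows. This is only an illustration: the function name `instances_ready_condition`, the input shape (instance name mapped to its list of conditions) and the `AllInstancesReady` reason are assumptions, while the `InstanceNotReady` reason and the message wording follow the example above.

```python
from typing import Dict, List, Optional

Condition = Dict[str, str]


def instances_ready_condition(
    instances: Dict[str, List[Condition]],
) -> Optional[Condition]:
    """Compute InstancesReady; return None when the condition is absent."""
    if not instances:
        # No instances: the condition is absent from the status.
        return None
    not_ready = sorted(
        name for name, conds in instances.items()
        if not any(c["type"] == "Ready" and c["status"] == "True"
                   for c in conds)
    )
    if not not_ready:
        # Hypothetical reason name, not defined by the proposal.
        return {"type": "InstancesReady", "status": "True",
                "reason": "AllInstancesReady", "message": ""}
    return {
        "type": "InstancesReady",
        "status": "False",
        "reason": "InstanceNotReady",
        "message": ", ".join(f"Instance {n} is not Ready" for n in not_ready),
    }


condition = instances_ready_condition({
    "foo-bar-00095f": [{"type": "Ready", "status": "False"}],
    "foo-bar-1a2b3c": [{"type": "Ready", "status": "True"}],
})
```

An instance with no Ready condition at all is counted as not Ready here, which is another assumption of the sketch.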