-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the split configuration proposal #106
base: main
Are you sure you want to change the base?
Add the split configuration proposal #106
Conversation
This proposal aims to split the configuration object in two objects Signed-off-by: Nicolas Belouin <[email protected]>
- The name of the discovery handler: `discoveryHandlerName` | ||
- A string that a Discovery Handler knows how to parse to obtain necessary discovery details: `discoveryDetails` | ||
- A set of extra properties the Discovery Handler may need, that can be pulled from `ConfigMaps` or `Secrets` at runtime: `discoveryProperties` | ||
- The number of slots for a device instance: `capacity` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if capacity isn't a workload property? in my mind, it impacts how many workloads can exist at once more than being a reflection on anything discovered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workloadConfiguration describes the workload akri is creating by itself, the capacity describes how many simultaneous users a given device can handle, be it one akri scheduled or one the user scheduled with some other way.
So I believe the capacity is a property of a device in that way, I even wonder if we shouldn't have the capacity in the Instance somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also makes sense for responsibility distribution, as the "agent" is currently responsible for managing the slots and it should be the one to manage the Discovery Configuration.
The "controller" on the other hand will manage the Workload Configuration.
The Instances will get created by the "agent" and read by the "controller".
I'll add a paragraph in the proposal about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I even wonder if we shouldn't have the capacity in the Instance somehow
historically, the Configuration was the place for a user to configure an Instance. if we are splitting Configuration into Discovery and Workload, i think capacity belongs in Workload. If we are introducing a third split (Discovery, Workload, and Instance), i can see it fitting in either Workload or Instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the DiscoveryConfiguration is all about getting Instances created and configured and the WorkloadConfiguration is all about creating the "workload" (or as it is currently known "broker").
This split has multiple ideas behind it:
- you can use akri with "manual" workload scheduling (as per Requesting Resources) without a WorkloadConfiguration.
- we get rid of the namespaces for Instances (as it doesn't make sense since the resource is exposed and usable from any namespace as it is exposed by the node resource)
If we have the capacity in the WorkloadConfiguration, then the agent will need to read a WorkloadConfiguration to be able to create the DevicePlugin, so it means you must have a 1:1 relationship between a WorkloadConfiguration and a DiscoveryConfiguration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am more aligned with adding capacity to the DiscoveryConfiguration. To me, using the WorkloadConfiguration
is an add on to akri and not everyone will use it -- they may apply their own deployments. On the other hand, discovering and specifying a capacity for a device is essential to Akri and should belong in discovery. It is also something that is managed through the device plugin so it should fit in the Agent's control loop which is the discovery one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! I like the idea you give of adding a paragraph on the following in an "implementation" section:
The "controller" on the other hand will manage the Workload Configuration.
The Instances will get created by the "agent" and read by the "controller".
I'll add a paragraph in the proposal about this.
- A string that a Discovery Handler knows how to parse to obtain necessary discovery details: `discoveryDetails` | ||
- A set of extra properties the Discovery Handler may need, that can be pulled from `ConfigMaps` or `Secrets` at runtime: `discoveryProperties` | ||
- The number of slots for a device instance: `capacity` | ||
- A set of extra properties that will get added to the `Instance` properties and forwarded to workloads using the device: `extraInstancesProperties` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should exist in the workload configuration, since these are environment variables that are set in every workload that uses the device. For example, setting frame rate for a udev camera here doesn't quite make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These properties are stored at the device plugin level and managed by the agent, I think it can make sense to want to add information about a udev device that is unknown to the discovery handler, but of use to any consumer of the device: e.g. one craft a udev query to select modbus adapters and knows the device behind these uses address 0x42 and this is not discovered by the udev DH, so the DiscoveryConfiguration writer wants this information to be carried to the consumers of the discovered devices.
In the workload configuration, one can just add any additional property through pod/job/whatever env list.
|
||
The `WorkloadConfiguration` object is a namespaced object that will contain the following properties: | ||
|
||
- The name of the `DiscoveryConfiguration` whose `Instances` shall trigger the scheduling of the resources described in this `WorkloadConfiguration`: `discoverySelector` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using the term discovery selector instead of "discovery handler"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well here we do not point to a discovery handler, but to a discovery configuration, so the full name should be discoveryConfigurationSelector
, I shortened it to discoverySelector
to make it simpler.
As discussed during the Community Meeting, here is the split configuration proposal on its own.
This proposal aims to split the configuration object in two objects
This change is needed for or can facilitate the implementation of several existing proposals:
Moreover, while this proposal tries to be as small as possible and stick as close as possible to the current behavior, several future enhancements or changes can already be envisioned while working on this proposal (these are here as future possibilities, and will be their own proposals if we feel we want to pursue those):
WorkloadConfiguration
to trigger onInstances
properties rather than theDiscoveryConfiguration
it is linked to, this could allow workload to be scheduled to only a subset of the discovered devices, or spanning over multipleDiscoveryConfigurations
;Instances
betweenDiscoveryConfigurations
, if multipleDiscoveryConfigurations
discover the exact same device, we may want to merge those as oneInstance
owned by severalDiscoveryConfigurations
;