Implement dynamic selection of parent prefix from a set of custom fields #90

Open
wants to merge 20 commits into
base: main

henrybear327
Collaborator

@henrybear327 henrybear327 commented Oct 8, 2024

Design

We introduce a new parentPrefixSelector field in the CR. It is a map of custom field key-value pairs, where both the key and the value are strings. For example:

parentPrefixSelector:
    environment: "Production"
    ipVersion: "4"

We now have 3 cases:

  • If only parentPrefix is set, we continue on as-is
  • If only parentPrefixSelector is set, we take all Prefixes whose custom fields exactly match every key-value pair specified in parentPrefixSelector. We then pick the first of these prefixes that can satisfy the prefixLength requirement as the parent prefix.
  • If both parentPrefix and parentPrefixSelector are present in the CR, the CRD validation rejects the YAML file.
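
As a rough illustration of the three cases, here is a minimal Go sketch. In this PR the mutual-exclusion check is done by CRD validation, not by Go code; PrefixClaimSpec below is a trimmed-down, hypothetical stand-in for the real type, and the behaviour when neither field is set is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// PrefixClaimSpec is a hypothetical, reduced version of the CR spec.
type PrefixClaimSpec struct {
	ParentPrefix         string
	ParentPrefixSelector map[string]string
	PrefixLength         string
}

var errBothSet = errors.New("parentPrefix and parentPrefixSelector are mutually exclusive")

// selectionMode reports which of the three cases applies to a spec.
func selectionMode(spec PrefixClaimSpec) (string, error) {
	hasPrefix := spec.ParentPrefix != ""
	hasSelector := len(spec.ParentPrefixSelector) > 0
	switch {
	case hasPrefix && hasSelector:
		return "", errBothSet
	case hasPrefix:
		return "static", nil
	case hasSelector:
		return "selector", nil
	default:
		// Assumption: the description does not cover the case where neither is set.
		return "", errors.New("one of parentPrefix or parentPrefixSelector is required")
	}
}

func main() {
	mode, err := selectionMode(PrefixClaimSpec{
		ParentPrefixSelector: map[string]string{"environment": "Production", "ipVersion": "4"},
		PrefixLength:         "/31",
	})
	fmt.Println(mode, err)
}
```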

The resulting parent prefix is written to the status field ParentPrefix, which is then used as the source of truth for the rest of the reconcile function.
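
The selection step could look roughly like the sketch below. It is not the code in this PR: the candidate type and newCandidate helper are hypothetical, and "able to satisfy the prefixLength requirement" is reduced here to a pure length comparison (the real check may also consider free space in NetBox).

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// candidate is a hypothetical, simplified view of a NetBox prefix.
type candidate struct {
	Prefix       netip.Prefix
	CustomFields map[string]string
}

// newCandidate parses a CIDR string; it panics on invalid input (sketch only).
func newCandidate(cidr string, fields map[string]string) candidate {
	return candidate{Prefix: netip.MustParsePrefix(cidr), CustomFields: fields}
}

// matchesAll is true when every selector entry matches exactly (case-sensitive).
func matchesAll(fields, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := fields[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// pickParent returns the first candidate that matches the selector and is
// large enough to contain a child prefix of the requested length.
func pickParent(cands []candidate, selector map[string]string, childBits int) (netip.Prefix, error) {
	for _, c := range cands {
		if matchesAll(c.CustomFields, selector) && c.Prefix.Bits() <= childBits {
			return c.Prefix, nil
		}
	}
	return netip.Prefix{}, errors.New("no parent prefix matches the selector and prefix length")
}

func main() {
	cands := []candidate{
		newCandidate("10.0.0.0/32", map[string]string{"environment": "Production"}), // too small for a /31
		newCandidate("10.1.0.0/24", map[string]string{"environment": "production"}), // case mismatch
		newCandidate("10.2.0.0/24", map[string]string{"environment": "Production"}),
	}
	parent, err := pickParent(cands, map[string]string{"environment": "Production"}, 31)
	fmt.Println(parent, err)
}
```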

Known issue

TODO

  • Discuss if using status field to store the computed ParentPrefix value is desired
  • The restoration hash mechanism for parentPrefixSelector needs some documentation: the picked parent prefix may differ for the same parentPrefixSelector (e.g. due to prefix space exhaustion), so the restoration might not work as the user anticipates.

Notes

@henrybear327 henrybear327 self-assigned this Oct 8, 2024
@henrybear327 henrybear327 marked this pull request as draft October 8, 2024 11:08
@henrybear327 henrybear327 force-pushed the feat/issue_79 branch 7 times, most recently from 335e04b to 97c512d Compare October 8, 2024 11:25
@henrybear327
Collaborator Author

Discuss if using the status field to store the computed ParentPrefix value is desired

We agree that the spec field is read-only, so we wouldn't put any computed ParentPrefix there.

The status field is what we have write access to and where we reflect the relevant internal states to the external user. Thus, we have decided to use it to store the computed ParentPrefix and use it as the source of truth.

@henrybear327
Collaborator Author

Improved CRD validation: According to [2], the oneOf validation will only come in the next version.

cc: @alexandernorth @bruelea

Reference:
[1] https://kubernetes.io/blog/2022/09/29/enforce-immutability-using-cel/
[2] https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-ratcheting

@henrybear327 henrybear327 force-pushed the feat/issue_79 branch 2 times, most recently from 787fecd to 1ad0f81 Compare October 10, 2024 21:25
@henrybear327 henrybear327 marked this pull request as ready for review October 10, 2024 21:27
@henrybear327
Collaborator Author

henrybear327 commented Oct 10, 2024

For some reason, when we are querying for custom fields, for example using this URL http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=production, the value of the custom field cf_environment will be capitalized automatically. This has an impact on us as we can only enter the value in custom fields in capitalized form.

Reproduction step:

  • type in http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=production in your browser
  • you can see that the search that went through NetBox is actually http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=Production

The demo SQL files have adhered to this "observation", but I am not able to wrap my head around this, since for the restoration hash, we have no issue with it

cc: @alexandernorth @bruelea

@henrybear327
Collaborator Author

henrybear327 commented Oct 10, 2024

To maintain backward compatibility of the restoration hash and extend it to support parentPrefixSelector, we have made the following changes to the hash:

  • We append the parentPrefixSelector key-value pairs to the end of the hash input during hash calculation. The keys are sorted before the pairs are appended, so if only the key order of the parentPrefixSelector changes, the hash value is not affected.
  • We take the value of the parent prefix from the spec for the hash computation; thus, in the case of parentPrefixSelector, it is an empty string.

With these changes, the hash can by design distinguish between a prefix claimed via parentPrefix and one claimed via parentPrefixSelector, which makes the hash-based restoration process achievable.
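
To make the idea concrete, here is a minimal sketch of such a hash. The claim type, the choice of SHA-1, and the exact set and order of hashed fields are assumptions for illustration only; the real generatePrefixRestorationHash may differ.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// claim is a hypothetical subset of the fields that feed the hash.
type claim struct {
	Namespace, Name, ParentPrefix, PrefixLength, Tenant string
	ParentPrefixSelector                                map[string]string
}

// restorationHash hashes the legacy fields first, then the selector pairs in
// sorted key order, so the result does not depend on map iteration order.
func restorationHash(c claim) string {
	var b strings.Builder
	// Legacy part: when a selector is used, ParentPrefix from the spec is "".
	b.WriteString(c.Namespace + c.Name + c.ParentPrefix + c.PrefixLength + c.Tenant)

	keys := make([]string, 0, len(c.ParentPrefixSelector))
	for k := range c.ParentPrefixSelector {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		b.WriteString(k + c.ParentPrefixSelector[k])
	}

	sum := sha1.Sum([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	c := claim{
		Namespace: "default", Name: "prefixclaim-customfields-sample",
		PrefixLength: "/31", Tenant: "MY_TENANT",
		ParentPrefixSelector: map[string]string{"poolName": "Pool 1", "environment": "Production"},
	}
	fmt.Println(restorationHash(c))
}
```

With an empty selector nothing is appended, so the input string is identical to the legacy one; that is what keeps existing hashes valid in this sketch.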

internal/controller/prefix_controller.go Outdated Show resolved Hide resolved
internal/controller/prefixclaim_controller.go Show resolved Hide resolved
pkg/netbox/api/ip_address_claim.go Outdated Show resolved Hide resolved
@henrybear327
Collaborator Author

Comment from @bruelea: add the newly created sample to the kustomization: config/samples/kustomization.yaml

Done

@jitendrs
Collaborator

Please add me as reviewer.

@henrybear327 henrybear327 force-pushed the feat/issue_79 branch 3 times, most recently from 5c97a94 to 592375a Compare October 15, 2024 08:46
@henrybear327
Collaborator Author

henrybear327 commented Oct 15, 2024

For some reason, when we are querying for custom fields, for example using this URL http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=production, the value of the custom field cf_environment will be capitalized automatically. This has an impact on us as we can only enter the value in custom fields in capitalized form.

Reproduction step:

  • type in http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=production in your browser
  • you can see that the search that went through NetBox is actually http://localhost:8080/ipam/prefixes/?q=&cf_poolName=Pool+2&cf_environment=Production

The demo SQL files have adhered to this "observation", but I am not able to wrap my head around this, since for the restoration hash, we have no issue with it

cc: @alexandernorth @bruelea

So as @alexandernorth explained, browsers might modify the URLs. For the API endpoints, however, what we send is processed as-is.

So in this case, this is not an issue for us when querying through the API endpoints! But when we debug using URLs in the browser, we might run into issues.
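
A quick way to convince yourself that the Go side does not touch the value: the sketch below builds a request with the standard library and prints the raw query that would be sent. The buildPrefixQuery helper and the base URL are placeholders, not code from the operator.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildPrefixQuery returns the raw query string of the request that would be
// sent for the given filters. The base URL is a placeholder.
func buildPrefixQuery(filters map[string]string) (string, error) {
	q := url.Values{}
	for k, v := range filters {
		q.Set(k, v)
	}
	req, err := http.NewRequest(http.MethodGet, "http://localhost:8080/api/ipam/prefixes/?"+q.Encode(), nil)
	if err != nil {
		return "", err
	}
	return req.URL.RawQuery, nil
}

func main() {
	raw, err := buildPrefixQuery(map[string]string{
		"cf_poolName":    "Pool 2",
		"cf_environment": "production",
	})
	fmt.Println(raw, err) // the value stays lowercase; keys are sorted by Encode
}
```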

Contributor

@jstudler jstudler left a comment

Hey @henrybear327
Thanks for the work. I did some black box testing without digging too much into the code so far. I've left some comments regarding wording and documentation. Please remove "custom field" where it's not explicitly custom fields to avoid confusion for users.

Manual testing on a local kind cluster didn't work for me:

make create-kind && make deploy-kind
kubectl apply -f config/samples/netbox_v1_prefixclaim_customfields.yaml
kubectl describe pxc prefixclaim-customfields-sample
Name:         prefixclaim-customfields-sample
Namespace:    default
Labels:       app.kubernetes.io/managed-by=kustomize
              app.kubernetes.io/name=netbox-operator
Annotations:  <none>
API Version:  netbox.dev/v1
Kind:         PrefixClaim
Metadata:
  Creation Timestamp:  2024-11-12T08:52:02Z
  Generation:          1
  Resource Version:    1495
  UID:                 ab6c76d1-08b4-4a2d-894c-da04f3961b4e
Spec:
  Comments:     your comments
  Description:  some description
  Parent Prefix Selector:
    Environment:       Production
    Pool Name:         Pool 1
  Prefix Length:       /31
  Preserve In Netbox:  true
  Site:                DM-Akron
  Tenant:              MY_TENANT
Events:                <none>

And the operator logs are rather short:

2024-11-12T09:02:57Z    INFO    prefixClaim reconcile loop started      {"controller": "prefixclaim", "controllerGroup": "netbox.dev", "controllerKind": "PrefixClaim", "PrefixClaim": {"name":"prefixclaim-customfields-sample","namespace":"default"}, "namespace": "default", "name": "prefixclaim-customfields-sample", "reconcileID": "2b0e902f-aae0-42a4-860e-f96f0aba8ea3"}
2024-11-12T09:02:57Z    ERROR   Reconciler error        {"controller": "prefixclaim", "controllerGroup": "netbox.dev", "controllerKind": "PrefixClaim", "PrefixClaim": {"name":"prefixclaim-customfields-sample","namespace":"default"}, "namespace": "default", "name": "prefixclaim-customfields-sample", "reconcileID": "2b0e902f-aae0-42a4-860e-f96f0aba8ea3", "error": "resource name may not be empty"}

Will try again later with running it locally against demo.netbox.dev.

config/samples/netbox_v1_prefixclaim_customfields.yaml Outdated Show resolved Hide resolved
config/samples/netbox_v1_prefixclaim_customfields.yaml Outdated Show resolved Hide resolved
ParentPrefixSelectorGuide.md Show resolved Hide resolved
api/v1/prefixclaim_types.go Show resolved Hide resolved
prefixLength: "/31"
parentPrefixSelector: # The keys and values are case-sensitive
  environment: "Production"
  poolName: "Pool 1"
Contributor

A bit confused here. The custom field name is poolName. Shouldn't the filter in here then be cf_poolName? Or how do you distinguish custom fields from built in fields? I think it would make sense to be consistent with the netbox filtering https://demo.netbox.dev/static/docs/rest-api/filtering/#filtering-by-custom-field

Collaborator Author

I manually filter out tenant and site, so we don't force the user to build the netbox API query string in the CRs.

Should we have the user enter the cf_ prefix when adding the entries, to save us some implementation hassle, or should we handle this for the user, which looks nicer from a UX standpoint? What do you think?

Contributor

I'm fine both ways. If we handle cf_ for the user, it's nice on one side but also confusing on the other. It depends a bit on whether we want to stick to the UI (handle it for the user) or the API (the user has to specify cf_ themselves). wdyt @bruelea ?

Collaborator

For spec.customFields the cf_ is handled for the user, so we could do the same for parentPrefixSelector.
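
If we handle cf_ for the user, the mapping could be as small as the sketch below. selectorToQuery and builtinFilters are hypothetical names, and treating exactly tenant and site as built-in filters is an assumption taken from this thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// builtinFilters lists selector keys that are passed to NetBox unchanged.
// Assumption: only tenant and site, as mentioned in this thread.
var builtinFilters = map[string]bool{"tenant": true, "site": true}

// selectorToQuery prefixes every non-built-in key with "cf_" so that users
// do not have to write NetBox query parameter names in the CR.
func selectorToQuery(selector map[string]string) url.Values {
	q := url.Values{}
	for k, v := range selector {
		if !builtinFilters[k] {
			k = "cf_" + k
		}
		q.Set(k, v)
	}
	return q
}

func main() {
	q := selectorToQuery(map[string]string{
		"environment": "Production",
		"poolName":    "Pool 1",
		"site":        "DM-Akron",
	})
	fmt.Println(q.Encode())
}
```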

pkg/netbox/api/helper.go Show resolved Hide resolved
pkg/netbox/api/helper.go Outdated Show resolved Hide resolved
@jstudler
Contributor

If you use a PrefixClaim using parentPrefixSelector with an invalid Tenant (a Tenant that doesn't exist in NetBox), the Event will show:

The parent prefix was not able to be computed. no parent prefix can be obtained with the query conditions set in ParentPrefixSelector, err = <nil>, number of candidates = 0

Spec example:

apiVersion: netbox.dev/v1
kind: PrefixClaim
metadata:
  name: prefixclaim-customfields-sample
spec:
  tenant: "MY_TENANT that doesn't exist"
  site: "DM-Akron"
  preserveInNetbox: true
  prefixLength: "/31"
  parentPrefixSelector: # The keys and values are case-sensitive
    environment: "PostProduction"
    poolName: "Pool 1"

This is quite misleading. Do you think there is an easy way to enhance this?

@jstudler
Contributor

jstudler commented Nov 12, 2024

Under some circumstances the parent prefix will be computed but it's already exhausted. Even if you add more prefixes, the system will not recover and compute/choose a new parent prefix. Example describe output:

Name:         prefixclaim-customfields-sample-09
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  netbox.dev/v1
Kind:         PrefixClaim
Metadata:
  Creation Timestamp:  2024-11-12T14:26:55Z
  Generation:          1
  Resource Version:    4212
  UID:                 a3b68d89-6c66-4d63-8b39-c95dd9c326b5
Spec:
  Parent Prefix Selector:
    Environment:       PostProduction
    Pool Name:         Pool 254
  Prefix Length:       /30
  Preserve In Netbox:  false
  Tenant:              Dunder-Mifflin, Inc.
Status:
  Conditions:
    Last Transition Time:  2024-11-12T14:27:11Z
    Message:               The parent prefix was computed successfully. parentPrefix is computed: 1.254.0.0/27
    Reason:                ParentPrefixComputed
    Status:                True
    Type:                  ParentPrefixComputed
    Last Transition Time:  2024-11-12T14:28:13Z
    Message:               Failed to fetch new Prefix from NetBox. parent prefix exhausted
    Reason:                PrefixCRNotCreated
    Status:                False
    Type:                  PrefixAssigned
  Parent Prefix:           1.254.0.0/27
Events:
  Type     Reason                    Age    From                     Message
  ----     ------                    ----   ----                     -------
  Normal   ParentPrefixComputed      4m41s  prefix-claim-controller  The parent prefix was computed successfully. parentPrefix is computed: 1.254.0.0/27
  Warning  PrefixCRNotCreated        3m39s  prefix-claim-controller  Failed to fetch new Prefix from NetBox. parent prefix exhausted
  Warning  FailedToLockParentPrefix  2m     prefix-claim-controller  failed to lock parent prefix 1.254.0.0/27

Could we fix this by removing .status.parentPrefix if we come to the condition that has the message "parent prefix exhausted"?

@henrybear327
Collaborator Author

@jstudler please take another pass when you have time! :) Thanks

@henrybear327
Collaborator Author

If you use a PrefixClaim using parentPrefixSelector with an invalid Tenant (a Tenant that doesn't exist in NetBox), the Event will show:

The parent prefix was not able to be computed. no parent prefix can be obtained with the query conditions set in ParentPrefixSelector, err = <nil>, number of candidates = 0

Spec example:

apiVersion: netbox.dev/v1
kind: PrefixClaim
metadata:
  name: prefixclaim-customfields-sample
spec:
  tenant: "MY_TENANT that doesn't exist"
  site: "DM-Akron"
  preserveInNetbox: true
  prefixLength: "/31"
  parentPrefixSelector: # The keys and values are case-sensitive
    environment: "PostProduction"
    poolName: "Pool 1"

This is quite misleading. Do you think there is an easy way to enhance this?

We actually throw an error indicating this situation, please see the code snippet below:

func (r *NetboxClient) GetTenantDetails(name string) (*models.Tenant, error) {
	request := tenancy.NewTenancyTenantsListParams().WithName(&name)
	response, err := r.Tenancy.TenancyTenantsList(request, nil)
	if err != nil {
		return nil, utils.NetboxError("failed to fetch Tenant details", err)
	}
	if len(response.Payload.Results) == 0 {
		return nil, utils.NetboxNotFoundError("tenant '" + name + "'")
	}

	return &models.Tenant{
		Id:   response.Payload.Results[0].ID,
		Slug: *response.Payload.Results[0].Slug,
		Name: *response.Payload.Results[0].Name,
	}, nil
}

I think this is related to the discussion this morning - we need to surface the underlying errors properly to the user. As in this case, we did throw an error clearly specifying the problem, but it's not visible to the user.

@henrybear327
Collaborator Author

Under some circumstances the parent prefix will be computed but it's already exhausted. Even if you add more prefixes, the system will not recover and compute/choose a new parent prefix.

Could we fix this by removing .status.parentPrefix if we come to the condition that has the message "parent prefix exhausted"?

Quick question, what's the desired behavior?

IMO, prefix exhaustion might be a temporary issue, so we should requeue and keep trying.

WDYT?

@jstudler
Contributor

Under some circumstances the parent prefix will be computed but it's already exhausted. Even if you add more prefixes, the system will not recover and compute/choose a new parent prefix.

Could we fix this by removing .status.parentPrefix if we come to the condition that has the message "parent prefix exhausted"?

Quick question, what's the desired behavior?

IMO, prefix exhaustion might be a temporary issue, so we should requeue and keep trying.

WDYT?

The desired behavior is that the Operator can recover from this case. Yes, we should assume that prefix exhaustion is only a temporary issue. The behaviour I've seen above was a deadlock: the parent prefix was chosen but had no space left, while there were other parent prefix candidates that weren't chosen. So I think removing .status.parentPrefix where the controller writes the condition/event that contains the message "parent prefix exhausted" should fix this.
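
A minimal sketch of that recovery idea, with everything reduced to in-memory stand-ins: pool, claimStatus and reconcile are hypothetical and do not mirror the controller code, and the second prefix 1.253.0.0/27 is made up for the example. The sketch also assumes that re-selection skips parents without free space.

```go
package main

import (
	"errors"
	"fmt"
)

var errExhausted = errors.New("parent prefix exhausted")

// claimStatus is a hypothetical stand-in for the PrefixClaim status.
type claimStatus struct{ ParentPrefix string }

// pool is a hypothetical stand-in for NetBox: free child slots per parent.
type pool struct {
	free  map[string]int
	order []string // candidate order as returned by the selector query
}

func (p *pool) selectParent() (string, error) {
	for _, c := range p.order {
		if p.free[c] > 0 {
			return c, nil
		}
	}
	return "", errors.New("no parent prefix candidate with free space")
}

func (p *pool) allocate(parent string) error {
	if p.free[parent] == 0 {
		return errExhausted
	}
	p.free[parent]--
	return nil
}

// reconcile returns true when the claim should be requeued.
func reconcile(st *claimStatus, p *pool) (bool, error) {
	if st.ParentPrefix == "" {
		parent, err := p.selectParent()
		if err != nil {
			return true, err
		}
		st.ParentPrefix = parent
	}
	if err := p.allocate(st.ParentPrefix); err != nil {
		if errors.Is(err, errExhausted) {
			// Forget the pinned parent so the next pass can pick another one.
			st.ParentPrefix = ""
			return true, nil
		}
		return true, err
	}
	return false, nil
}

func main() {
	p := &pool{
		free:  map[string]int{"1.254.0.0/27": 0, "1.253.0.0/27": 1},
		order: []string{"1.254.0.0/27", "1.253.0.0/27"},
	}
	st := &claimStatus{ParentPrefix: "1.254.0.0/27"} // pinned to an exhausted parent
	for i := 0; i < 2; i++ {
		requeue, err := reconcile(st, p)
		fmt.Println(requeue, err, st.ParentPrefix)
	}
}
```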

@jstudler
Contributor

If you use a PrefixClaim using parentPrefixSelector with an invalid Tenant (a Tenant that doesn't exist in NetBox), the Event will show:

The parent prefix was not able to be computed. no parent prefix can be obtained with the query conditions set in ParentPrefixSelector, err = <nil>, number of candidates = 0

Spec example:

apiVersion: netbox.dev/v1
kind: PrefixClaim
metadata:
  name: prefixclaim-customfields-sample
spec:
  tenant: "MY_TENANT that doesn't exist"
  site: "DM-Akron"
  preserveInNetbox: true
  prefixLength: "/31"
  parentPrefixSelector: # The keys and values are case-sensitive
    environment: "PostProduction"
    poolName: "Pool 1"

This is quite misleading. Do you think there is an easy way to enhance this?

We actually throw an error indicating this situation, please see the code snippet below:

func (r *NetboxClient) GetTenantDetails(name string) (*models.Tenant, error) {
	request := tenancy.NewTenancyTenantsListParams().WithName(&name)
	response, err := r.Tenancy.TenancyTenantsList(request, nil)
	if err != nil {
		return nil, utils.NetboxError("failed to fetch Tenant details", err)
	}
	if len(response.Payload.Results) == 0 {
		return nil, utils.NetboxNotFoundError("tenant '" + name + "'")
	}

	return &models.Tenant{
		Id:   response.Payload.Results[0].ID,
		Slug: *response.Payload.Results[0].Slug,
		Name: *response.Payload.Results[0].Name,
	}, nil
}

I think this is related to the discussion this morning - we need to surface the underlying errors properly to the user. As in this case, we did throw an error clearly specifying the problem, but it's not visible to the user.

But I think the event is wrong. The event indicates "The parent prefix was not able to be computed" when in fact the problem is that the user set an invalid tenant. Could we create an event in GetTenantDetails for the case that the tenant cannot be found?
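
One possible shape for this, sketched with plain Go error wrapping: the lookup wraps a sentinel error, and the code that writes the event checks for it with errors.Is. The function names and the event reasons other than ParentPrefixComputed are hypothetical; this is not how the operator is wired today.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for "resource missing in NetBox".
var errNotFound = errors.New("not found")

// getTenant mimics a lookup that fails for unknown tenants.
func getTenant(name string, known map[string]bool) error {
	if !known[name] {
		return fmt.Errorf("tenant %q: %w", name, errNotFound)
	}
	return nil
}

// computeParentPrefix wraps the lookup error instead of swallowing it.
func computeParentPrefix(tenant string, known map[string]bool) error {
	if err := getTenant(tenant, known); err != nil {
		return fmt.Errorf("computing parent prefix: %w", err)
	}
	return nil
}

// eventFor picks an event reason and message based on the underlying cause.
func eventFor(err error) (reason, message string) {
	switch {
	case err == nil:
		return "ParentPrefixComputed", "The parent prefix was computed successfully."
	case errors.Is(err, errNotFound):
		return "ResourceNotFound", err.Error()
	default:
		return "ParentPrefixNotComputed", err.Error()
	}
}

func main() {
	known := map[string]bool{"MY_TENANT": true}
	reason, msg := eventFor(computeParentPrefix("MY_TENANT that doesn't exist", known))
	fmt.Println(reason+":", msg)
}
```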

Co-authored-by: Sergio <[email protected]>
h := generatePrefixRestorationHash(prefixClaim)
canBeRestored, err := r.NetboxClient.RestoreExistingPrefixByHash(h)
if err != nil {
	return ctrl.Result{Requeue: true}, nil
Collaborator

Here the log message/condition update is missing in case of an error.

Collaborator Author

Added as requested! Can you see if this is what you are thinking about :)
Thanks

@henrybear327
Collaborator Author

But I think the event is wrong. The event indicates "The parent prefix was not able to be computed" when in fact the problem is that the user set an invalid tenant. Could we create an event in GetTenantDetails for the case that the tenant cannot be found?

Tracking under bug #132

@henrybear327
Collaborator Author

The desired behavior is that the Operator can recover from this case. Yes, we should assume that prefix exhaustion is only a temporary issue. The behaviour I've seen above was a deadlock: the parent prefix was chosen but had no space left, while there were other parent prefix candidates that weren't chosen. So I think removing .status.parentPrefix where the controller writes the condition/event that contains the message "parent prefix exhausted" should fix this.

Tracking under bug #131

Successfully merging this pull request may close these issues.

Implement dynamic selection of parent prefix from a set of custom fields
5 participants