Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent duplicated rhel_host resources #54

Closed

Conversation

josejulio
Copy link
Contributor

PR Template:

Describe your changes

  • Assuming we don't want to have duplicated resources that have the same local_resource_id. Duplicated resources across different resource types is allowed.

Copied the LocalResourceId to Metadata but wondering if it should be removed from the ReporterData

Ticket reference (if applicable)

Fixes #

Checklist

  • Are the agreed upon acceptance criteria fulfilled?

  • Was the 4-eye-principle applied? (async PR review, pairing, ensembling)

  • Do your changes have passing automated tests and sufficient observability?

  • Are the work steps you introduced repeatable by others, either through automation or documentation?

    • If automation is possible but not done due to other constraints, a ticket to the tech debt sprint is added
    • An SOP (Standard Operating Procedure) was created
  • The Changes were automatically built, tested, and - if needed, behind a feature flag - deployed to our production environment. (Please check this when the new deployment is done and you could verify it.)

  • Are the agreed upon coding/architectural practices applied?

  • Are security needs fullfilled? (e.g. no internal URL)

  • Is the corresponding Ticket in the right state? (should be on "review" now, put to done when this change made it to production)

  • For changes to the public API / code dependencies: Was the whole team (or a sufficient amount of ppl) able to review?

@josejulio josejulio requested a review from csams August 21, 2024 21:29
@randymgeorge
Copy link
Contributor

We do not want local_resource_id at the metadata level. What makes the resource id unique is: reporter_type:reporter_instance_id:local_resource_id. With this "URN like identifier", you do not want to have a 2nd reporting of this id, i.e. the same reporter telling inventory about the same resource the 2nd, or Nth time.

For example, an ACM Hub cluster can have a managed cluster called Foo. It tells inventory about this. Another ACM Hub cluster can also have a managed cluster called Foo. ACM only requires the name to be unique with the scope of the hub cluster. Thus, by adding the instance_id, you have uniqueness.

Also, multiple reporters can tell inventory about the same resource, e.g. ACM tells about a cluster with a local id of Foo. ACS tells about the same cluster but has a local id of Bar. These local id's are supported in lieu of the REST id (service generated) so that reporters do not need to change their schemas, etc in order to persist/correlate the REST id to their existing IDs.

@josejulio
Copy link
Contributor Author

If I understand correctly, each resource_type will have an Id (URI) that will uniquely identify the resource in the inventory.
This URI can be generated from all the input data and in the case of hosts it is <reporter_data.reporter_type>:<reporter_data.resourceId_alias> but others will vary (adding more data into the mix).

Is that assumption correct?

@randymgeorge
Copy link
Contributor

Let me see if i can better clarify. Inventory will generate a unique ID for each new resource instance. That is akin to a REST ID. Since there is only 1 per resource, that is part of the metadata.

In order to avoid adopters having to persist this ID, esp in the initial bulk import, they can access the inventory resource by using an "alias" which is comprised of the reporter_type:reporter_instance_id:local_resource_id. This is all information that is persisted today by a reporter, with the exception of the reporter_instance_id which is obtained via the token. Thus, a reporter can pass this information in to get the instance.

There can be multiple reporters for a single resource instance; each having their own unique reporter_type:reporter_instance_id:local_resource_id. Thus, for every inventory instance ID, there can be an array of reporters; thus, an array of reporter_type:reporter_instance_id:local_resource_id.

This is not host specific. This will be the pattern for every resource type reported. In fact, for hosts there most likely will be only 1 reporter, i.e. HBI.

@josejulio
Copy link
Contributor Author

Let me see if i can better clarify. Inventory will generate a unique ID for each new resource instance. That is akin to a REST ID. Since there is only 1 per resource, that is part of the metadata.

This part is clear to me, thanks !

In order to avoid adopters having to persist this ID, esp in the initial bulk import, they can access the inventory resource by using an "alias" which is comprised of the reporter_type:reporter_instance_id:local_resource_id. This is all information that is persisted today by a reporter, with the exception of the reporter_instance_id which is obtained via the token. Thus, a reporter can pass this information in to get the instance.

Ok so far, reporter_type:reporter_instance_id:local_resource_id is a reporter-type/which one/which resource

There can be multiple reporters for a single resource instance; each having their own unique reporter_type:reporter_instance_id:local_resource_id. Thus, for every inventory instance ID, there can be an array of reporters; thus, an array of reporter_type:reporter_instance_id:local_resource_id.

This is where I'm lost right now.

Consider what you said before:

multiple reporters can tell inventory about the same resource, e.g. ACM tells about a cluster with a local id of Foo. ACS tells about the same cluster but has a local id of Bar

How do we exactly make that connection?

As far as inventory is concerned, the following resource from reporter with instance_id ABC

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "foo",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

is different from the following resource from reporter with instance_id XYZ

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "bar",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

Right now they would create 2 separate resources in our system, each one having it's own entry in the database (2 rhel_hosts, 2 metadata, 2 reporter_data).

If they are indeed the same resource reported from 2 different reporters how do we group them so that we have 1 rhel host, 1 metadata and 2 reporter_data? Is this even important right now?

image

@josejulio
Copy link
Contributor Author

josejulio commented Aug 22, 2024

@randymgeorge Updated to reflect comments above. It doesn't implement any de-duplication whatsoever, but prevents the same reporter from registering the same RhelHost.

edit: above should be general to cover all other resources.

@jmelis jmelis marked this pull request as draft September 18, 2024 13:36
Comment on lines +45 to +50
_, err := uc.repo.FindByID(ctx, resourceId)
if err == nil {
return nil, fmt.Errorf("rhel_host with local_resource_id: `%v` already exists for current reporter", resourceId.LocalResourceId)
} else if !errors.Is(err, gorm.ErrRecordNotFound) {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there also a unique index defined for resource id, reporter type, reporter id?

Otherwise this will be prone to race conditions and you could still end up with duplicate resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is one at the Reporter level

https://github.com/project-kessel/inventory-api/blob/main/internal/biz/common/common.go#L34-L63

Although we are missing one for the metadata_id to allow the reporter to share the same resource_id among different types of resources.

#171

@josejulio
Copy link
Contributor Author

Code is already so different, but something similar will be implemented. Closing this one

@josejulio josejulio closed this Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants