Prevent duplicated rhel_host resources #54

josejulio · 2024-08-21T20:35:43Z

PR Template:

Describe your changes

This assumes we don't want duplicated resources that share the same local_resource_id. Duplicated resources across different resource types are allowed.

Copied the LocalResourceId to Metadata, but I'm wondering whether it should be removed from the ReporterData.

Ticket reference (if applicable)

Fixes #

Checklist

- fix tests

randymgeorge · 2024-08-21T21:40:34Z

We do not want local_resource_id at the metadata level. What makes the resource id unique is: reporter_type:reporter_instance_id:local_resource_id. With this "URN like identifier", you do not want to have a 2nd reporting of this id, i.e. the same reporter telling inventory about the same resource the 2nd, or Nth time.

For example, an ACM Hub cluster can have a managed cluster called Foo. It tells inventory about this. Another ACM Hub cluster can also have a managed cluster called Foo. ACM only requires the name to be unique within the scope of the hub cluster. Thus, by adding the instance_id, you have uniqueness.

Also, multiple reporters can tell inventory about the same resource, e.g. ACM tells about a cluster with a local ID of Foo. ACS tells about the same cluster but has a local ID of Bar. These local IDs are supported in lieu of the REST ID (service generated) so that reporters do not need to change their schemas, etc., in order to persist/correlate the REST ID to their existing IDs.
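To make the uniqueness rule described above concrete, here is a minimal Go sketch. The names `ReporterKey`, `Registry`, and `Report` are hypothetical and do not come from the inventory-api code base; the store is a plain in-memory map, used only to show that the full triplet, not the local ID alone, decides whether a report is a duplicate.

```go
package main

import "fmt"

// ReporterKey is a hypothetical type modelling the "URN-like identifier"
// from this thread: reporter_type:reporter_instance_id:local_resource_id.
type ReporterKey struct {
	ReporterType       string
	ReporterInstanceID string
	LocalResourceID    string
}

// String renders the key in the colon-separated form used in the discussion.
func (k ReporterKey) String() string {
	return fmt.Sprintf("%s:%s:%s", k.ReporterType, k.ReporterInstanceID, k.LocalResourceID)
}

// Registry is an in-memory stand-in for the inventory store (an assumption,
// not the real persistence layer).
type Registry struct {
	seen map[ReporterKey]bool
}

// NewRegistry returns an empty Registry.
func NewRegistry() *Registry {
	return &Registry{seen: make(map[ReporterKey]bool)}
}

// Report records a key and returns an error if the same reporter instance
// has already reported the same local resource ID.
func (r *Registry) Report(k ReporterKey) error {
	if r.seen[k] {
		return fmt.Errorf("resource %s already reported", k)
	}
	r.seen[k] = true
	return nil
}
```

With this sketch, two ACM hubs (say, instance IDs `hub-1` and `hub-2`) can each report a cluster with local ID `Foo` without conflict, and ACS can report `Bar`; only a second report of the exact same triplet is rejected.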

josejulio · 2024-08-22T14:54:58Z

If I understand correctly, each resource_type will have an Id (URI) that will uniquely identify the resource in the inventory.
This URI can be generated from all the input data and in the case of hosts it is <reporter_data.reporter_type>:<reporter_data.resourceId_alias> but others will vary (adding more data into the mix).

Is that assumption correct?

randymgeorge · 2024-08-22T16:10:18Z

Let me see if I can better clarify. Inventory will generate a unique ID for each new resource instance. That is akin to a REST ID. Since there is only 1 per resource, that is part of the metadata.

In order to avoid adopters having to persist this ID, especially in the initial bulk import, they can access the inventory resource by using an "alias" which is composed of the reporter_type:reporter_instance_id:local_resource_id. This is all information that is persisted today by a reporter, with the exception of the reporter_instance_id, which is obtained via the token. Thus, a reporter can pass this information in to get the instance.

There can be multiple reporters for a single resource instance; each having their own unique reporter_type:reporter_instance_id:local_resource_id. Thus, for every inventory instance ID, there can be an array of reporters; thus, an array of reporter_type:reporter_instance_id:local_resource_id.

This is not host specific. This will be the pattern for every resource type reported. In fact, for hosts there most likely will be only 1 reporter, i.e. HBI.
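The relationship described above (one inventory-generated ID per resource, with an array of reporter aliases hanging off it) could be modelled roughly as follows. This is only a sketch: `Alias`, `Resource`, `Inventory`, and the `res-N` ID format are invented for illustration. In particular, `Attach` takes the inventory ID explicitly, because this thread does not say how a second reporter's resource is correlated with an existing one.

```go
package main

import "fmt"

// Alias is the reporter_type:reporter_instance_id:local_resource_id triplet.
type Alias struct {
	ReporterType       string
	ReporterInstanceID string
	LocalResourceID    string
}

// Resource has exactly one inventory-generated ID and any number of aliases.
type Resource struct {
	ID        string
	Reporters []Alias
}

// Inventory is a hypothetical in-memory store indexed by ID and by alias.
type Inventory struct {
	nextID  int
	byID    map[string]*Resource
	byAlias map[Alias]*Resource
}

// NewInventory returns an empty Inventory.
func NewInventory() *Inventory {
	return &Inventory{
		byID:    make(map[string]*Resource),
		byAlias: make(map[Alias]*Resource),
	}
}

// Create registers a new resource instance under its first alias.
func (inv *Inventory) Create(a Alias) (*Resource, error) {
	if _, exists := inv.byAlias[a]; exists {
		return nil, fmt.Errorf("alias %v already registered", a)
	}
	inv.nextID++
	res := &Resource{ID: fmt.Sprintf("res-%d", inv.nextID), Reporters: []Alias{a}}
	inv.byID[res.ID] = res
	inv.byAlias[a] = res
	return res, nil
}

// Attach adds another reporter's alias to an existing resource.
// How the caller learns the ID is left open, as it is in the discussion.
func (inv *Inventory) Attach(id string, a Alias) error {
	res, ok := inv.byID[id]
	if !ok {
		return fmt.Errorf("no resource with id %q", id)
	}
	if _, exists := inv.byAlias[a]; exists {
		return fmt.Errorf("alias %v already registered", a)
	}
	res.Reporters = append(res.Reporters, a)
	inv.byAlias[a] = res
	return nil
}

// FindByAlias lets a reporter fetch the resource without persisting its ID.
func (inv *Inventory) FindByAlias(a Alias) (*Resource, bool) {
	res, ok := inv.byAlias[a]
	return res, ok
}
```

Under these assumptions, a reporter that only knows its own triplet can still retrieve the shared resource through `FindByAlias`, which is the point of supporting the alias alongside the generated ID.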

josejulio · 2024-08-22T16:51:30Z

> Let me see if i can better clarify. Inventory will generate a unique ID for each new resource instance. That is akin to a REST ID. Since there is only 1 per resource, that is part of the metadata.

This part is clear to me, thanks!

> In order to avoid adopters having to persist this ID, esp in the initial bulk import, they can access the inventory resource by using an "alias" which is comprised of the reporter_type:reporter_instance_id:local_resource_id. This is all information that is persisted today by a reporter, with the exception of the reporter_instance_id which is obtained via the token. Thus, a reporter can pass this information in to get the instance.

OK so far: reporter_type:reporter_instance_id:local_resource_id reads as reporter type / which instance / which resource.

> There can be multiple reporters for a single resource instance; each having their own unique reporter_type:reporter_instance_id:local_resource_id. Thus, for every inventory instance ID, there can be an array of reporters; thus, an array of reporter_type:reporter_instance_id:local_resource_id.

This is where I'm lost right now.

Consider what you said before:

> multiple reporters can tell inventory about the same resource, e.g. ACM tells about a cluster with a local id of Foo. ACS tells about the same cluster but has a local id of Bar

How exactly do we make that connection?

As far as inventory is concerned, the following resource from the reporter with instance_id ABC

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "foo",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

is different from the following resource from the reporter with instance_id XYZ

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "bar",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

Right now they would create 2 separate resources in our system, each one having its own entry in the database (2 rhel_hosts, 2 metadata, 2 reporter_data).

If they are indeed the same resource reported from 2 different reporters, how do we group them so that we have 1 rhel_host, 1 metadata and 2 reporter_data? Is this even important right now?

josejulio · 2024-08-22T19:43:54Z

@randymgeorge Updated to reflect comments above. It doesn't implement any de-duplication whatsoever, but prevents the same reporter from registering the same RhelHost.

edit: above should be general to cover all other resources.

alechenninger · 2024-10-01T12:01:32Z

internal/biz/hosts/hosts.go

+	_, err := uc.repo.FindByID(ctx, resourceId)
+	if err == nil {
+		return nil, fmt.Errorf("rhel_host with local_resource_id: `%v` already exists for current reporter", resourceId.LocalResourceId)
+	} else if !errors.Is(err, gorm.ErrRecordNotFound) {
+		return nil, err
+	}


Is there also a unique index defined for resource id, reporter type, reporter id?

Otherwise this will be prone to race conditions and you could still end up with duplicate resources.

josejulio · 2024-10-01

Yes, there is one at the Reporter level:

https://github.com/project-kessel/inventory-api/blob/main/internal/biz/common/common.go#L34-L63

Although we are missing one for the metadata_id to allow the reporter to share the same resource_id among different types of resources.

#171
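The race condition raised in this review thread comes from doing the existence check (`FindByID`) and the insert as two separate steps: two concurrent requests can both see "not found" and both insert. A unique index closes that window because the database performs the check and the write atomically. The Go sketch below is an in-memory analogy only, with hypothetical names (`uniqueStore`, `ErrDuplicate`, `concurrentInserts`); it does not use GORM or the project's actual schema.

```go
package main

import (
	"errors"
	"sync"
)

// ErrDuplicate plays the role of a unique-constraint violation.
var ErrDuplicate = errors.New("duplicate resource")

// uniqueStore emulates a table with a unique index on the key column.
type uniqueStore struct {
	mu   sync.Mutex
	rows map[string]struct{}
}

func newUniqueStore() *uniqueStore {
	return &uniqueStore{rows: make(map[string]struct{})}
}

// Insert performs the existence check and the write under one lock,
// the way a database enforces a unique index in a single atomic step.
func (s *uniqueStore) Insert(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.rows[key]; exists {
		return ErrDuplicate
	}
	s.rows[key] = struct{}{}
	return nil
}

// concurrentInserts starts n goroutines that all try to insert the same key
// and returns how many of them succeeded.
func concurrentInserts(s *uniqueStore, key string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := s.Insert(key); err == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return succeeded
}
```

However many goroutines race on the same key, exactly one insert succeeds and the rest receive `ErrDuplicate`. An application-level check can still be kept for a friendlier error message, but in this model the constraint is what guarantees uniqueness.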

josejulio · 2024-10-11T18:54:26Z

The code is already so different, but something similar will be implemented. Closing this one.

josejulio added 6 commits August 21, 2024 15:16

Prevent from adding repeated host with the same local_resource_id

6955e86

Adding missingcode

b168034

More missing code

28cdee8

unused imports

a8904c5

self-review

068c3c0

- Revert time changes to prevent making this pr bigger ahn required

3b89a21

- fix tests

josejulio force-pushed the prevent-duplicated-resources branch from 04871f8 to 3b89a21 August 21, 2024 21:17

josejulio requested a review from csams August 21, 2024 21:29

Update to use the triplet local-resource-id,reporter-type,reporter-id

1badc4f

jmelis marked this pull request as draft September 18, 2024 13:36

alechenninger reviewed Oct 1, 2024

josejulio closed this Oct 11, 2024
