Prevent duplicated rhel_host resources #54
Conversation
We do not want local_resource_id at the metadata level. What makes the resource ID unique is reporter_type:reporter_instance_id:local_resource_id. With this URN-like identifier, you do not want a second reporting of the same ID, i.e. the same reporter telling inventory about the same resource a second or Nth time. For example, an ACM Hub cluster can have a managed cluster called Foo, and it tells inventory about this. Another ACM Hub cluster can also have a managed cluster called Foo; ACM only requires the name to be unique within the scope of the hub cluster. Thus, by adding the instance_id, you have uniqueness. Also, multiple reporters can tell inventory about the same resource, e.g. ACM reports a cluster with a local ID of Foo, while ACS reports the same cluster but has a local ID of Bar. These local IDs are supported in lieu of the REST ID (service generated) so that reporters do not need to change their schemas, etc. in order to persist/correlate the REST ID to their existing IDs.
If I understand correctly, each resource_type will have an ID (URI) that will uniquely identify the resource in the inventory. Is that assumption correct?
Let me see if I can clarify further. Inventory will generate a unique ID for each new resource instance. That is akin to a REST ID. Since there is only one per resource, it is part of the metadata. In order to avoid adopters having to persist this ID, especially in the initial bulk import, they can access the inventory resource using an "alias" which is comprised of reporter_type:reporter_instance_id:local_resource_id. This is all information that is persisted today by a reporter, with the exception of the reporter_instance_id, which is obtained via the token. Thus, a reporter can pass this information in to get the instance.

There can be multiple reporters for a single resource instance, each having their own unique reporter_type:reporter_instance_id:local_resource_id. Thus, for every inventory instance ID there can be an array of reporters, and therefore an array of reporter_type:reporter_instance_id:local_resource_id. This is not host specific; it will be the pattern for every resource type reported. In fact, for hosts there will most likely be only one reporter, i.e. HBI.
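A minimal sketch of that alias as a Go type (the names here are illustrative assumptions, not the actual inventory-api code):

// Hypothetical composite key for the "URN-like" alias described above.
type ReporterKey struct {
    ReporterType       string // e.g. "REPORTER_TYPE_OCM"
    ReporterInstanceID string // obtained from the reporter's token
    LocalResourceID    string // the reporter's own ID for the resource
}

// String renders the alias as reporter_type:reporter_instance_id:local_resource_id.
func (k ReporterKey) String() string {
    return k.ReporterType + ":" + k.ReporterInstanceID + ":" + k.LocalResourceID
}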
This part is clear to me, thanks!
OK so far: reporter_type:reporter_instance_id:local_resource_id identifies which type of reporter, which reporter instance, and which resource.
This is where I'm lost right now. Consider what you said before:
How exactly do we make that connection? As far as inventory is concerned, the following resource from the reporter with instance_id ABC:

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "foo",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

is different from the following resource from the reporter with instance_id XYZ:

{
  "rhel_host": {
    "metadata": {
      "workspace": "abc"
    },
    "reporter_data": {
      "reporter_type": "REPORTER_TYPE_OCM",
      "reporter_version": "0.1",
      "local_resource_id": "bar",
      "api_href": "www.example.com",
      "console_href": "www.example.com"
    }
  }
}

Right now they would create 2 separate resources in our system, each one having its own entry in the database (2 rhel_hosts, 2 metadata, 2 reporter_data). If they are indeed the same resource reported by 2 different reporters, how do we group them so that we have 1 rhel_host, 1 metadata and 2 reporter_data? Is this even important right now?
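One possible shape for that grouping, sketched as hypothetical Go structs (field and type names are assumptions, not the actual inventory-api model): a single metadata record with one reporter_data entry per reporter.

// Hypothetical de-duplicated model: one inventory record, one Metadata,
// and an array of ReporterData entries, one per
// reporter_type:reporter_instance_id:local_resource_id.
type Metadata struct {
    Workspace string
}

type ReporterData struct {
    ReporterType       string
    ReporterInstanceID string
    LocalResourceID    string
    APIHref            string
    ConsoleHref        string
}

type RhelHost struct {
    ID           uint           // inventory-generated ID (the "REST ID")
    Metadata     Metadata       // shared by all reporters of this host
    ReporterData []ReporterData // e.g. the ABC/foo and XYZ/bar reports above
}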
@randymgeorge Updated to reflect the comments above. It doesn't implement any de-duplication whatsoever, but it prevents the same reporter from registering the same RhelHost. Edit: the above should be generalized to cover all other resources.
// Reject the request if this reporter has already registered a resource
// with the same local_resource_id; any error other than "record not found"
// is propagated as-is.
_, err := uc.repo.FindByID(ctx, resourceId)
if err == nil {
    return nil, fmt.Errorf("rhel_host with local_resource_id: `%v` already exists for current reporter", resourceId.LocalResourceId)
} else if !errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, err
}
Is there also a unique index defined for resource id, reporter type, reporter id?
Otherwise this will be prone to race conditions and you could still end up with duplicate resources.
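A database-level guard along those lines could look roughly like the following sketch, assuming GORM struct tags; the model and column names are illustrative, not the project's actual common.go:

// Illustrative only: a composite unique index over the three fields that the
// discussion above says make a reporter's view of a resource unique, so that
// concurrent check-then-insert requests cannot create duplicate rows.
type Reporter struct {
    ReporterType       string `gorm:"uniqueIndex:idx_reporter_resource"`
    ReporterInstanceID string `gorm:"uniqueIndex:idx_reporter_resource"`
    LocalResourceID    string `gorm:"uniqueIndex:idx_reporter_resource"`
}

With such a constraint in place, the FindByID check above still provides a friendly error message, while the database is what actually guarantees uniqueness under concurrent writers.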
Yes, there is one at the Reporter level:
https://github.com/project-kessel/inventory-api/blob/main/internal/biz/common/common.go#L34-L63
Although we are missing one for the metadata_id, to allow the reporter to share the same resource_id among different types of resources.
The code has since diverged quite a bit, but something similar will be implemented. Closing this one.
PR Template:
Describe your changes
Prevents the same reporter from registering duplicate resources with the same local_resource_id. Duplicated resources across different resource types are allowed. Copied the LocalResourceId to Metadata, but wondering if it should be removed from the ReporterData.
Ticket reference (if applicable)
Fixes #
Checklist
Are the agreed upon acceptance criteria fulfilled?
Was the 4-eye-principle applied? (async PR review, pairing, ensembling)
Do your changes have passing automated tests and sufficient observability?
Are the work steps you introduced repeatable by others, either through automation or documentation?
The changes were automatically built, tested, and, if needed behind a feature flag, deployed to our production environment. (Please check this once the new deployment is done and you have verified it.)
Are the agreed upon coding/architectural practices applied?
Are security needs fulfilled? (e.g. no internal URL)
Is the corresponding Ticket in the right state? (It should be in "review" now; move it to "done" when this change has made it to production.)
For changes to the public API / code dependencies: was the whole team (or a sufficient number of people) able to review?