
MGMT-17997: Resource server doesn't start on deployment #101

Conversation

@irinamihai (Collaborator) commented May 31, 2024

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator resourceServer to not assume the open-cluster-management namespace for ACM and dynamically obtain the searchAPI
  • restructure the O2IMS CRD to have a separate structure that holds the configuration for each server
    • each server structure contains a new ServerConfig struct meant to hold parameters common to all the servers
  • add the parameters used to start the servers to the CR status

The operator was tested locally using a personal quay image that included the resource server changes.
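As a rough illustration of the last bullet (recording the start parameters in the CR status), here is a minimal sketch with illustrative type and field names — an assumption based on this thread, not the exact PR diff:

// UsedServerConfig captures the arguments each server was started with.
// The names here are hypothetical; see the PR diff for the actual layout.
type UsedServerConfig struct {
        //+optional
        MetadataServerUsedConfig []string `json:"metadataServerUsedConfig,omitempty"`
        //+optional
        ResourceServerUsedConfig []string `json:"resourceServerUsedConfig,omitempty"`
}

// The CR status would then embed it:
type ORANO2IMSStatus struct {
        //+optional
        UsedServerConfig UsedServerConfig `json:"usedServerConfig,omitempty"`
}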

@irinamihai requested review from jhernand and danielerez, May 31, 2024 02:16
@openshift-ci-robot commented May 31, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator to dynamically:
    • determine the search API from the IngressHost
    • use the backend-token if specified in the ORANO2IMS CR and, if not, use the backend-token-file
  • ToDo:
    • add conditions for the resource server
    • better operator error management

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@irinamihai (Collaborator Author)

/hold

@openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on May 31, 2024
@openshift-ci-robot commented May 31, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator to dynamically:
    • determine the search API from the IngressHost
    • use the backend-token if specified in the ORANO2IMS CR and, if not, use the backend-token-file
  • ToDo:
    • add conditions for the resource server
    • better operator error management
    • update unit tests


@openshift-ci-robot commented May 31, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator to dynamically:
    • determine the search API from the IngressHost
    • use the backend-token if specified in the ORANO2IMS CR and, if not, use the backend-token-file
  • ToDo:
    • add conditions for the resource server
    • better operator error management
    • update unit tests & fix lint


@irinamihai force-pushed the MGMT-17997-resource-server branch from ab40072 to c1bfbcc on May 31, 2024 22:38
)
nextReconcile = ctrl.Result{RequeueAfter: 30 * time.Second}
}

Collaborator Author

Extra line

@openshift-ci-robot commented May 31, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator to dynamically:
    • determine the search API from the IngressHost
    • use the backend-token if specified in the ORANO2IMS CR and, if not, use the backend-token-file
  • ToDo:
    • update unit tests & fix lint


@irinamihai force-pushed the MGMT-17997-resource-server branch 2 times, most recently from 5c8e510 to 953a6b2 on June 3, 2024 22:36
@openshift-ci-robot commented Jun 3, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator resourceServer to not assume the
    open-cluster-management namespace for ACM and dynamically
    obtain the searchAPI
  • ToDo:
    • add separate CRD structure for the ResourceServer & update the
      code accordingly
    • update CRD conditions to account for different error scenarios
    • update tests
    • fix lint


//+optional
BackendToken string `json:"backendToken,omitempty"`
}

Collaborator

Can we use a structure like this for all the other servers as well, instead of the boolean that we use today? I'd suggest that we create a base structure containing only the Enabled boolean, and then create structs specific to each server that embed it. For example:

type ServerConfig struct {
        // Enabled indicates if the server should be started.
        //
        // +kubebuilder:default=true
        Enabled bool `json:"enabled"`
}

// ResourceServerConfig contains the configuration for the resource server.
type ResourceServerConfig struct {
        ServerConfig `json:",inline"`

        //+optional
        BackendURL string `json:"backendURL,omitempty"`
        //+optional
        BackendToken string `json:"backendToken,omitempty"`
}

// MetadataServerConfig contains the configuration for the metadata server.
type MetadataServerConfig struct {
        ServerConfig `json:",inline"`
}

// Same for all the other servers.

This will give us room to add other common parameters in the future. For example, at some point I would like to add logging settings:

type ServerConfig struct {
        // Enabled indicates if the server should be started.
        //
        // +kubebuilder:default=true
        Enabled bool `json:"enabled"`

        // LogLevel indicates the log level.
        //
        // +kubebuilder:default=info
        LogLevel string `json:"logLevel"`
}

Collaborator Author

Yes, that was my plan as well. Initially I was thinking of leaving the common settings directly in the spec, but we can have another structure if you prefer.

Collaborator Author

Done.

"strings"
)

func processBackendToken(
Collaborator

If we are going to extract this logic to a common place I'd suggest we do it for both adding the flags and getting them, something similar to what we do for the network listening flags. For example, we could have a function to add the flags to a flag set:

// AddTokenFlags adds to the given flagset the flags needed to configure a token ...
func AddTokenFlags(set *pflag.FlagSet, name string) {
         ...
}

And then a function to get the value of the token from a flag set:

// GetTokenFlag gets the value of a token flag ...
func GetTokenFlag(set *pflag.FlagSet, name string) (result string, err error) {
        ...
}

That way if we ever need to change how we add or get these flags it will all be in the same place.

Collaborator Author

Updated. @jhernand , kindly confirm this is what you had in mind.

Collaborator

Looks good. I think it can be even better if we pass the pflag.FlagSet object to the GetTokenFlag function because then that function can also extract the values of the flags from that flag set.
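Putting the two suggestions together, here is a minimal sketch of the helper pair, assuming pflag and the backend-token / backend-token-file flag names used elsewhere in this PR; the bodies are illustrative, not the merged code:

import (
        "fmt"
        "os"

        "github.com/spf13/pflag"
)

// AddTokenFlags adds to the given flag set the flags needed to configure a
// backend token, either directly or through a file.
func AddTokenFlags(set *pflag.FlagSet) {
        set.String("backend-token", "", "Token used to authenticate to the backend.")
        set.String("backend-token-file", "", "File containing the token used to authenticate to the backend.")
}

// GetTokenFlag extracts the token from the given flag set, falling back to
// reading the token file when a literal token isn't provided.
func GetTokenFlag(set *pflag.FlagSet) (string, error) {
        token, err := set.GetString("backend-token")
        if err != nil {
                return "", err
        }
        tokenFile, err := set.GetString("backend-token-file")
        if err != nil {
                return "", err
        }
        switch {
        case token != "" && tokenFile != "":
                return "", fmt.Errorf(
                        "flags 'backend-token' and 'backend-token-file' are incompatible, " +
                                "only one of them may be provided")
        case token != "":
                return token, nil
        case tokenFile != "":
                data, err := os.ReadFile(tokenFile)
                if err != nil {
                        return "", fmt.Errorf("failed to read backend token file '%s': %w", tokenFile, err)
                }
                return string(data), nil
        default:
                return "", fmt.Errorf(
                        "a token must be provided via either the 'backend-token' or " +
                                "'backend-token-file' flag")
        }
}

With this shape, callers only hand over the flag set, and any change to how the token is resolved stays in one place.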

slog.String("!token", backendToken),
slog.String("token_file", backendTokenFile),
)
return "", errors.New("backendToken and backendTokenFile both provided")
Collaborator

If we are going to report flag-related issues by returning errors, then I think those errors should be more explicit about what caused them, closer to what we write to the log. For example:

fmt.Errorf(
        "backend token flag '%s' and token file flag '%s' have both been provided, "+
                "but they are incompatible",
        backendTokenFlagName, backendTokenFileFlagName,
)

Collaborator Author

Done

slog.String("file", backendTokenFile),
slog.String("error", err.Error()),
)
return "", errors.New("failed to read backend token file")
Collaborator

Here I think we should include in the error message the name of the file and the original error:

fmt.Errorf(
        "failed to read backend token file '%s': %w",
        backendTokenFile, err,
)

// Check that we have a token:
if backendToken == "" {
logger.ErrorContext(ctx, "Backend token or token file must be provided")
return "", errors.New("no token provided")
Collaborator

Try to include in the error messages the names of the flags that should be used.
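For instance, following the pattern of the earlier suggestion (the flag-name constants are assumed from the snippet above):

fmt.Errorf(
        "a backend token must be provided via either the '%s' flag or the '%s' flag",
        backendTokenFlagName, backendTokenFileFlagName,
)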

@@ -115,6 +115,17 @@ func (t *reconcilerTask) run(ctx context.Context) (nextReconcile ctrl.Result, er
// Set the default reconcile time to 5 minutes.
nextReconcile = ctrl.Result{RequeueAfter: 5 * time.Minute}

// Validate the CR.
Collaborator

Would it be possible to do this validation with a webhook so that we reject the object creation instead of failing to reconcile it? Not in this pull request, but consider it for the future.

Collaborator Author

I will look into it.
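For reference, a heavily hedged sketch of the webhook idea using controller-runtime's webhook.Validator interface — the exact interface shape varies across controller-runtime versions, and the checked fields here are hypothetical examples, not fields from this PR:

// ValidateCreate rejects invalid objects at admission time instead of at
// reconcile time.
func (r *ORANO2IMS) ValidateCreate() error {
        // Hypothetical rule: a literal token and a token file are incompatible.
        if r.Spec.ResourceServerConfig.BackendToken != "" &&
                r.Spec.ResourceServerConfig.BackendTokenFile != "" {
                return fmt.Errorf("backendToken and backendTokenFile are incompatible")
        }
        return nil
}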

return "", fmt.Errorf("multiclusterengine labels does not contain the installer.namespace key")
}
return acmNamespace.(string), nil
}
Collaborator

It may not be worth it here, but when you need to access information nested inside an unstructured object like this, in other places we use the jq library. It could be something like this:

// Create the jq tool:
jqTool := ...

// Extract the namespace:
var namespace string
err = jqTool.Evaluate(
        `.metadata.labels["installer.namespace"]`,
        &namespace,
)
if err != nil {
        ....
}

As I said, this is probably not worth it for this case, but consider it if the location of the nested information is complicated or may change in the future.

return searchAPI, nil
}

func BuildServerContainerArgs(ctx context.Context, c client.Client, orano2ims *oranv1alpha1.ORANO2IMS,
Collaborator

The number of parameters of this function is growing, as usually happens. At some point we should consider making it a method of some type, so that some of the parameters become fields of that type and we avoid adding more parameters. I just want to avoid things like this, which happen over time (from real code):

clusterApi = NewManager(getDefaultConfig(), common.GetTestLog(), db, commontesting.GetDummyNotificationStream(ctrl), nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, false)

Collaborator Author

Agreed. Noted.
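A sketch of that refactor direction, with illustrative names rather than code from this PR:

// serverArgsBuilder carries the dependencies that BuildServerContainerArgs
// currently receives as parameters, so call sites stop accumulating arguments.
type serverArgsBuilder struct {
        client    client.Client
        orano2ims *oranv1alpha1.ORANO2IMS
}

func (b *serverArgsBuilder) buildResourceServerArgs(ctx context.Context) ([]string, error) {
        args := []string{"start", "resource-server"}
        // Flags derived from b.orano2ims.Spec.ResourceServerConfig would be
        // appended here instead of being threaded through parameters.
        return args, nil
}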

@@ -502,9 +512,6 @@ func (c *ResourceServerCommand) generateSearchApiUrl(backendURL string) (string,
// Split URL address
hostArr := strings.Split(u.Host, ".")

// Replace with search API prefix
hostArr[0] = searchApiUrlPrefix
Collaborator

Will it work when running the resource-server manually (i.e. without the operator)?
I mean, IIUC, after this change the search-api prefix is added only when using the operator.
We probably need to add another flag to specify the prefix or the namespace, e.g. --search-api-namespace?

Collaborator Author

The search-api is the backendURL of the resource-server, which would be provided in full when running manually.
This is how I've been running the resource server:
./oran-o2ims start resource-server --log-level=debug --log-file=stdout --api-listener-address=localhost:8002 --cloud-id=123 --backend-url="${BACKEND_URL}" --backend-token-file=./token
where BACKEND_URL is https://search-api-open-cluster-management.apps.lab.karmalabs.corp. There is no need to override with the search-api prefix.
@danielerez , do you agree?

When running with the operator, the operator will try to dynamically construct the full search-api URL.

Collaborator

Oh, so maybe we should just rename the BACKEND_URL in launch.json to something like SEARCH_API_URL.
I.e. the current code was added in order to avoid changing the URL when debugging a different server. It's not crucial of course, but it would make running the server locally easier.
Anyway, can you please update the readme with details about the behaviour (to note that the 'search-api' prefix is needed in the URL)?

Collaborator Author

I left everything with BACKEND_URL to keep it consistent. Maybe I can add some extra explanation in the CRD about the resource server requiring the search API.

Collaborator

Yeah, sounds good. Also update the readme file to emphasize that the prefix is required.

@irinamihai force-pushed the MGMT-17997-resource-server branch 3 times, most recently from 77df5de to 7217dc9 on June 5, 2024 02:10
func GetTokenFlag(
ctx context.Context,
backendToken string, backendTokenFile string,
logger *slog.Logger) (string, error) {
Collaborator

Can we pass the *pflag.FlagSet object here, and also have the code that extracts the values of the flags in this function?

. "github.com/onsi/gomega"
)

var _ = Describe("Flags", func() {
Collaborator

Great to have these tests, thanks!

Level: slog.LevelDebug,
}
handler := slog.NewJSONHandler(GinkgoWriter, options)
logger := slog.New(handler)
Collaborator

This should probably go inside a BeforeSuite block. It won't make a difference in this case, but in general if you put it here it will be executed when Ginkgo is reading the test suite, even if it isn't going to execute it. On the other hand the BeforeSuite block is executed only when the tests are going to be executed.
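For example, a sketch of moving that setup into a BeforeSuite block:

var logger *slog.Logger

var _ = BeforeSuite(func() {
        options := &slog.HandlerOptions{
                Level: slog.LevelDebug,
        }
        handler := slog.NewJSONHandler(GinkgoWriter, options)
        logger = slog.New(handler)
})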

@irinamihai force-pushed the MGMT-17997-resource-server branch from 7217dc9 to 099d569 on June 5, 2024 23:38
@openshift-ci-robot commented Jun 5, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator resourceServer to not assume the
    open-cluster-management namespace for ACM and dynamically
    obtain the searchAPI
  • restructure the O2IMS CRD to have a separate structure that holds
    the configuration for each server
    • each server structure contains a new ServerConfig struct
      meant to hold parameters common to all the servers

ToDo:

  • add more unit tests


"--backend-url=${env:BACKEND_URL}",
"--backend-token=${env:BACKEND_TOKEN}"
]
"--backend-url=https://search-api-open-cluster-management.apps.lab.karmalabs.corp",
Collaborator

nit: remember to remove url/token

Collaborator Author

The resource server will still need these parameters; only the O2IMS CR will have them optional.
I will clean up this file though.

Description:
- add backend-token-file parameter to the resource-server
- update the operator resourceServer to not assume the
  open-cluster-management namespace for ACM and dynamically
  obtain the searchAPI
- restructure the O2IMS CRD to have a separate structure that holds
  the configuration for each server
  * each server structure contains a new ServerConfig struct
    meant to hold parameters common to all the servers
- add the parameters used to start the servers in the CR status
@irinamihai force-pushed the MGMT-17997-resource-server branch from 099d569 to ef838fd on June 6, 2024 22:04
@openshift-ci-robot commented Jun 6, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator resourceServer to not assume the open-cluster-management namespace for ACM and dynamically obtain the searchAPI
  • restructure the O2IMS CRD to have a separate structure that holds the configuration for each server
    • each server structure contains a new ServerConfig struct meant to hold parameters common to all the servers
  • add the parameters used to start the servers to the CR status


// Add the token arg:
result = append(
result,
GetBackendTokenArg(orano2ims.Spec.ResourceServerConfig.BackendToken))
@irinamihai (Collaborator Author) commented Jun 6, 2024

Requests to the resource server didn't work for me when the default token file was used ("/run/secrets/kubernetes.io/serviceaccount/token").
@danielerez @jhernand , do you know why that would be?
This doesn't have to block the merge as the resource server token can be included in the spec.

Collaborator

What do you mean when you say that it doesn't work? What error does it write to the log? If you get an error then we may have a problem in how we pass that token to the server, because in theory any service account has permission to use the ACM search API. You can find more details about how that is checked here:

https://github.com/stolostron/search-v2-api/blob/main/pkg/rbac/authnMiddleware.go

If there is no error, but you don't get the expected results, then it will probably be because the service account that you are using doesn't have the right permissions. The ACM search API uses the same permissions as the rest of the cluster. You can find more details about how that is implemented here:

https://github.com/stolostron/search-v2-api/blob/main/pkg/rbac/authzMiddleware.go

If I understand correctly we create a service account for each service. For the resource server it is named resource-server, I think. That service account won't have any permissions. For example, when you send a request like this:

GET /o2ims-infrastructureInventory/v1/resourcePools

Our resource server will try to use the search API to retrieve the set of ManagedClusters (this is the mapping we currently do: a resource pool per ManagedCluster, @danielerez correct me if I am wrong) and will receive nothing because the resource-server account doesn't have permissions to get ManagedCluster objects.

In the documentation here @danielerez suggests using the oauth-apiserver-sa service account from the openshift-oauth-apiserver namespace. That works because that service account is the one used by the ACM search API itself, and it has all the permissions. Instead of that, I think that our operator should add to our resource-server service account all the required permissions, creating a role and a role binding for that, as sketched below. I think that role will need read access to ManagedClusters and Hosts. @danielerez please clarify with @irinamihai what the required permissions are.

If my analysis is right, that work ^ looks complicated enough to deserve a separate pull request. Please merge this first, and then open another one for that.
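A hedged sketch of the role the operator could create for that — the exact resources the search API checks were still being clarified below, and the role name is hypothetical:

role := &rbacv1.ClusterRole{
        ObjectMeta: metav1.ObjectMeta{
                // Hypothetical name for the resource server's search access role.
                Name: "oran-o2ims-resource-server-search-access",
        },
        Rules: []rbacv1.PolicyRule{{
                // Read access to ManagedClusters, per the analysis above; other
                // kinds (see the follow-up comment) may be needed as well.
                APIGroups: []string{"cluster.open-cluster-management.io"},
                Resources: []string{"managedclusters"},
                Verbs:     []string{"get", "list", "watch"},
        }},
}

A matching ClusterRoleBinding would then tie the role to the resource-server service account.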

Collaborator Author

Thank you for all the info, @jhernand. I apologize, I should have provided more details.
It doesn't work in the sense that an empty list is always returned by the server. I did multiple runs and I noticed this is the behavior when the token is not correct.
I will look more into it and discuss with @danielerez for the future PR you've suggested.

I will merge this since we can specify the backendToken for the resourceServer separately.

Collaborator

So about the access permissions: in the resource server we query two kinds of objects, Cluster and Node.
Then I guess it could indeed be enough to have read access for these, but we will have to check whether anything else is required by the search API.
Regarding the empty list with an invalid token, it does look like a bug; I would expect a StatusUnauthorized (401) error in such a scenario. Will have to look into it.

@irinamihai requested review from jhernand and danielerez June 6, 2024 22:10
@openshift-ci-robot commented Jun 6, 2024

@irinamihai: This pull request references MGMT-17997 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

Description:

  • add backend-token-file parameter to the resource-server
  • update the operator resourceServer to not assume the open-cluster-management namespace for ACM and dynamically obtain the searchAPI
  • restructure the O2IMS CRD to have a separate structure that holds the configuration for each server
    • each server structure contains a new ServerConfig struct meant to hold parameters common to all the servers
  • add the parameters used to start the servers to the CR status

The operator was tested locally using a personal quay image that included the resource server changes.


domain := ".apps" + ingressSplit[len(ingressSplit)-1]

// The searchAPI is obtained from the "search-api" string and the ACM namespace.
searchAPI := "https://" + "search-api-" + acmNamespace + domain
Collaborator

For a different pull request, but note that as we will be running inside the same cluster as the search API, we can use the search-search-api service directly instead of creating a new route that points to it. So the URL could be:

https://search-search-api.open-cluster-management.svc.cluster.local:4010

That can simplify things and reduce the number of hops to get to the backend.
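If that direction is taken, the operator-side construction could become something like this sketch, reusing the acmNamespace value from the code above:

// Hypothetical: point the resource server at the in-cluster service
// instead of building a route-based URL from the ingress host.
searchAPI := fmt.Sprintf(
        "https://search-search-api.%s.svc.cluster.local:4010",
        acmNamespace,
)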

@jhernand commented Jun 7, 2024

/approve
/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jun 7, 2024
openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jun 7, 2024
@irinamihai (Collaborator Author)

/unhold

@openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Jun 7, 2024
@openshift-merge-bot merged commit ec8d768 into openshift-kni:main on Jun 7, 2024
8 checks passed