Is the Repo per app scenario possible with gitops-connector? #73

Open
cyberjpb1 opened this issue Jul 21, 2024 · 31 comments

Comments

@cyberjpb1

Is the Repo per app scenario possible with gitops-connector?
Currently I have a repo for each application which also contains the manifests.

If the answer to my question is yes, how to configure it?
gitRepositoryType: AZDO
ciCdOrchestratorType: AZDO
gitOpsOperatorType: FLUX
azdoGitOpsRepoName:
azdoOrgUrl:
azdoPrRepoName:
gitOpsAppURL:
orchestratorPAT:

@eedorenko
Collaborator

You need to run multiple instances of gitops-connector on your clusters, one instance per application/repo.

@markphillips100

@eedorenko would a k8s operator approach be feasible here? To clarify, I mean one installation of gitops-connector (operator + custom Connection CRD) which then handles multiple subscribe-to-notification/publish-to-specific-repo Connection resources.

@markphillips100

@eedorenko I've added support for multiple configs in one instance via a CRD, if you are interested. It still supports the original env config via helm values, albeit as a breaking change due to the values restructuring. The switch between modes of operation is determined by setting singleInstance: null or by supplying values. All is explained in the helm chart README.

Whilst it works for ArgoCD notifications as-is, I haven't tested with Flux as my environment isn't set up for it. It's not a great deal of work, and I can explain further if this goes anywhere.

My fork is here; let me know if you want a PR opened.
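
For readers following along, a rough sketch (not taken from the fork's chart, whose README is the authoritative reference) of what the two modes could look like in values.yaml. Only the singleInstance key comes from the comment above; the nesting and field names below are illustrative, borrowed from the env-style settings at the top of this issue:

    # Multiple-config mode: leave singleInstance null and define connections
    # via GitopsConfig custom resources instead.
    singleInstance: null

    # Single-instance mode (original behaviour): supply the env-style values.
    # The nesting shown here is an assumption; field names come from the issue description.
    # singleInstance:
    #   gitRepositoryType: AZDO
    #   ciCdOrchestratorType: AZDO
    #   gitOpsOperatorType: FLUX
    #   azdoOrgUrl: https://dev.azure.com/myorganization
    #   azdoGitOpsRepoName: my-manifests-repo
    #   orchestratorPAT: <from a secret>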

@cyberjpb1
Author

cyberjpb1 commented Oct 9, 2024

@markphillips100 Very interesting, I will try your approach by doing a test with Flux. I will get back to you as soon as possible.

@markphillips100

@cyberjpb1 For the flux_gitops_operator to filter supported messages and indicate its support for the required config name in phase_data, the is_supported_message function needs fleshing out here. See the argo_gitops_operator change for how I implemented it in that use case.

I imagine in the FluxV2 use case we would need to make use of the Alert's eventMetadata to convey the required config name in the phase_data, although, being unfamiliar with Flux, I don't know how this ends up structured in phase_data, so I'm not sure what to look for.
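
For illustration only (this is not the fork's actual implementation), a minimal sketch of the kind of check being described, assuming the Alert's eventMetadata ends up under a metadata key in the parsed payload and that the config name travels under a key like the gitops_connector_config_name proposed later in this thread:

    # Illustrative sketch; the real location of the metadata inside phase_data is unverified.
    def is_supported_message(phase_data: dict, config_name: str) -> bool:
        metadata = phase_data.get("metadata") or {}
        # Only handle events that explicitly name this connector configuration.
        return metadata.get("gitops_connector_config_name") == config_name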

@markphillips100

@cyberjpb1 I checked a previous PR you opened for insight into the eventMetadata and that gave me enough info to create the is_supported_operator. I've created a new flux-multi-config-support branch with this change.

So in theory, the following should suffice:

  1. add a gitops_connector_config_name: "<name of config>" to the Alert's eventMetadata,
  2. set singleInstance: null in values.yaml,
  3. apply a gitopsconfig manifest to the cluster where gitops-connector is running - ensure the name is the same name used in step 1.

NOTE: The helm chart creates a service account, role, and role binding to support the connector watching and updating the gitopsconfig resource. The operator also automatically patches (hence the updating) a finalizer into the resource to ensure that, when it is deleted, proper cleanup occurs before the manifest is removed from the cluster.
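
For readers following along, a rough sketch tying steps 1 and 3 together. The Alert uses the Flux notification API (spec.eventMetadata requires an API version that supports it, v1beta3 at the time of writing); the GitOpsConfig kind, group, and spec fields shown here are assumptions based on this thread, with the fork's CRD and chart README being authoritative:

    apiVersion: notification.toolkit.fluxcd.io/v1beta3
    kind: Alert
    metadata:
      name: gitops-connector
      namespace: flux-system
    spec:
      providerRef:
        name: gitops-connector            # hypothetical Provider pointing at the connector endpoint
      eventSources:
        - kind: Kustomization
          name: my-app                    # hypothetical
      eventMetadata:
        gitops_connector_config_name: "my-app-config"   # step 1: must match the GitopsConfig name
    ---
    # Step 3: applied to the cluster where gitops-connector runs.
    apiVersion: example.com/v1            # CRD group/version per the fork (changed to apps-crc.testing later in this thread)
    kind: GitOpsConfig
    metadata:
      name: my-app-config                 # same name as used in eventMetadata above
    spec:
      # Field names below are illustrative assumptions based on the original env settings.
      gitRepositoryType: AZDO
      ciCdOrchestratorType: AZDO
      gitOpsOperatorType: FLUX
      azdoOrgUrl: https://dev.azure.com/myorganization
      azdoGitOpsRepoName: my-manifests-repo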

@cyberjpb1
Author

@markphillips100 Hello, sorry for the delay in my response, but my schedule did not allow me to run a test before today.
I tried to follow your installation procedure, but it does not work; I must be missing something in the configuration.

I'm using Red Hat CodeReady Containers (crc) for my tests.

Here is the log when the pod starts.

[2024-11-24 00:21:37 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2024-11-24 00:21:37 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[2024-11-24 00:21:37 +0000] [7] [INFO] Using worker: sync
[2024-11-24 00:21:37 +0000] [8] [INFO] Booting worker with pid: 8
DEBUG:root:Detected no ENV configuration data. Running in multiple instance configuration mode via gitopsconfig resources.
INFO:root:Starting Kopf operator thread
INFO:root:Starting GitOps Operator
DEBUG:asyncio:Using selector: EpollSelector
/usr/local/lib/python3.9/site-packages/kopf/_core/reactor/running.py:179: FutureWarning: Absence of either namespaces or cluster-wide flag will become an error soon. For now, switching to the cluster-wide mode for backward compatibility.
warnings.warn("Absence of either namespaces or cluster-wide flag will become an error soon."
DEBUG:kopf._core.reactor.running:Starting Kopf 1.37.3.
INFO:kopf._core.engines.activities:Initial authentication has been initiated.
DEBUG:kopf.activities.authentication:Activity 'login_via_client' is invoked.
WARNING:kopf._core.reactor.running:OS signals are ignored: running not in the main thread.
DEBUG:kopf.activities.authentication:Client is configured in cluster with service account.
INFO:kopf.activities.authentication:Activity 'login_via_client' succeeded.
INFO:kopf._core.engines.activities:Initial authentication has finished.
WARNING:kopf._core.reactor.observation:Non-patchable resources will not be served: {gitopsconfigs.v1.apps-crc.testing}
DEBUG:kopf._cogs.clients.watching:Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
DEBUG:kopf._cogs.clients.watching:Stopping the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
WARNING:kopf._core.reactor.observation:Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.

@markphillips100

The last warning in your logs would point at some role/role-binding issue for the service account the operator is configured to run with. Maybe take a look there to ensure the watch permission is available on the gitopsconfigs CRD.
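
For reference, a minimal sketch of the kind of rule the operator's service account would need in order to watch the gitopsconfigs CRD; the group apps-crc.testing comes from the logs above, and the fork's helm chart defines the actual role:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: gitops-connector-config-access
    rules:
      - apiGroups: ["apps-crc.testing"]            # CRD group shown in the logs
        resources: ["gitopsconfigs"]
        verbs: ["get", "list", "watch", "patch"]   # patch is needed for the finalizer mentioned earlier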

@markphillips100

The helm chart should be setting that up for you here.

@cyberjpb1
Author

cyberjpb1 commented Nov 24, 2024

Yes, I use your Helm chart; the only thing I changed is replacing "example.com" with "apps-crc.testing".

@markphillips100

markphillips100 commented Nov 24, 2024

Yeah, I was noticing that, but I wouldn't have thought it would be a problem... prior to the edit :-)

@markphillips100

Did you by any chance install the helm chart prior to making the api group changes? Just wondering if there are now 2 CRDs for the same Kind.

The 3 event handlers here aren't currently checking for apiVersion, so it would be just chance which one is used if more than one CRD group exists. These handlers would need to change (and probably should anyway) to include the group name.

@cyberjpb1
Author

cyberjpb1 commented Nov 24, 2024

It's as if the callback was never sent; the task in the pipeline waits indefinitely.

@markphillips100

DEBUG:kopf._cogs.clients.watching:Stopping the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
WARNING:kopf._core.reactor.observation:Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.

Can you delete the operator pod (without uninstalling/reinstalling the helm chart) and confirm whether the logs still show this warning, please? It's possibly a race condition where the permissions aren't set by the time the pod comes up, so a restart of the pod should rule that out.

@cyberjpb1
Author

OK, I deleted the pod and here is the log of the new pod:

[2024-11-24 02:36:45 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2024-11-24 02:36:45 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[2024-11-24 02:36:45 +0000] [7] [INFO] Using worker: sync
[2024-11-24 02:36:45 +0000] [8] [INFO] Booting worker with pid: 8
DEBUG:root:Detected no ENV configuration data. Running in multiple instance configuration mode via gitopsconfig resources.
INFO:root:Starting Kopf operator thread
INFO:root:Starting GitOps Operator
DEBUG:asyncio:Using selector: EpollSelector
/usr/local/lib/python3.9/site-packages/kopf/_core/reactor/running.py:179: FutureWarning: Absence of either namespaces or cluster-wide flag will become an error soon. For now, switching to the cluster-wide mode for backward compatibility.
warnings.warn("Absence of either namespaces or cluster-wide flag will become an error soon."
DEBUG:kopf._core.reactor.running:Starting Kopf 1.37.3.
INFO:kopf._core.engines.activities:Initial authentication has been initiated.
DEBUG:kopf.activities.authentication:Activity 'login_via_client' is invoked.
WARNING:kopf._core.reactor.running:OS signals are ignored: running not in the main thread.
DEBUG:kopf.activities.authentication:Client is configured in cluster with service account.
INFO:kopf.activities.authentication:Activity 'login_via_client' succeeded.
INFO:kopf._core.engines.activities:Initial authentication has finished.
WARNING:kopf._core.reactor.observation:Non-patchable resources will not be served: {gitopsconfigs.v1.apps-crc.testing}
DEBUG:kopf._cogs.clients.watching:Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
DEBUG:kopf._cogs.clients.watching:Stopping the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
WARNING:kopf._core.reactor.observation:Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.

@markphillips100

So it still looks like a permission issue, preventing the operator from watching the gitopsconfig. I would expect to see some DEBUG lines pertaining to the GitopsConnectorManager setting up a GitopsConnector for the name supplied in your GitOpsConfig manifest.

The other messages in the previous log are just the raw event data coming from flux via the /gitopsphase endpoint. They basically get ignored because no GitopsConnector is configured.

Any chance you can revert to a clean install of the original example.com CRD just so we can rule that out completely?

@cyberjpb1
Author

OK, I'll try that tomorrow.
Thanks a lot for your help, it's much appreciated.

@markphillips100

No problem

@cyberjpb1
Author

cyberjpb1 commented Nov 24, 2024

You were right; it was an old version of the GitOpsConfig that was stuck. I had to run an "oc replace" command to unstick it so I could delete it.

Now I have the following coming back in a loop in the log (I have hidden the sensitive values):

DEBUG:root:_should_update_abandoned_pr. should_update: False
DEBUG:root:_should_update_abandoned_pr called. pr_data: {
"repository": {
"id": "99999999-9999-9999-9999-999999999999",
"name": "WebAppExemple2.IaC",
"url": "https://dev.azure.com/myorganization/99999999-9999-9999-9999-999999999999/_apis/git/repositories/99999999-9999-9999-9999-999999999999",
"project": {
"id": "99999999-9999-9999-9999-999999999999",
"name": "GitOps",
"state": "unchanged",
"visibility": "unchanged",
"lastUpdateTime": "0001-01-01T00:00:00"
}
},
"pullRequestId": 285,
"codeReviewId": 285,
"status": "abandoned",
"createdBy": {
"displayName": "GitOps Build Service (myorganization)",
"url": "https://spsprodcca1.vssps.visualstudio.com/99999999-9999-9999-9999-999999999999/_apis/Identities/99999999-9999-9999-9999-999999999999",
"_links": {
"avatar": {
"href": "https://dev.azure.com/myorganization/_apis/GraphProfile/MemberAvatars/svc.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
},
"id": "99999999-9999-9999-9999-999999999999",
"uniqueName": "Build\99999999-9999-9999-9999-999999999999",
"imageUrl": "https://dev.azure.com/myorganization/_api/_common/identityImage?id=99999999-9999-9999-9999-999999999999",
"descriptor": "svc.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
},
"creationDate": "2024-07-29T18:18:33.2230957Z",
"closedDate": "2024-07-29T19:03:55.5257799Z",
"title": "Deploiement Helm Chart 904.0.0 en dev sur testing",
"sourceRefName": "refs/heads/devRelease-904.0.0",
"targetRefName": "refs/heads/master",
"isDraft": false,
"mergeId": "99999999-9999-9999-9999-999999999999",
"lastMergeSourceCommit": {
"commitId": "99999999-9999-9999-9999-999999999999",
"url": "https://dev.azure.com/myorganization/99999999-9999-9999-9999-999999999999/_apis/git/repositories/99999999-9999-9999-9999-999999999999/commits/99999999-9999-9999-9999-999999999999"
},
"lastMergeTargetCommit": {
"commitId": "99999999-9999-9999-9999-999999999999",
"url": "https://dev.azure.com/myorganization/99999999-9999-9999-9999-999999999999/_apis/git/repositories/99999999-9999-9999-9999-999999999999/commits/99999999-9999-9999-9999-999999999999"
},
"reviewers": [],
"url": "https://dev.azure.com/myorganization/99999999-9999-9999-9999-999999999999/_apis/git/repositories/99999999-9999-9999-9999-999999999999/pullRequests/285",
"completionOptions": {
"squashMerge": true,
"mergeStrategy": "squash"
},
"supportsIterations": true
}
DEBUG:root:_should_update_abandoned_pr. should_update: False

@markphillips100

This is the abandoned PR status reconciliation, which is the same as in the original gitops-connector code; at least, I don't recall making changes here other than adding more logging.

_should_update_abandoned_pr. should_update: False

This statement indicates the PR was closed (abandoned) longer ago than the 72-hour constant defined in the code, so further processing of the PR's status is skipped. Unfortunately, there is no way in Azure DevOps to delete abandoned PRs.
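
For illustration only (the real gitops-connector code may differ in details), a minimal sketch of the kind of age check being described, assuming closedDate is the field compared against a 72-hour constant:

    from datetime import datetime, timedelta, timezone

    ABANDONED_PR_CUTOFF = timedelta(hours=72)  # the constant described above

    def should_update_abandoned_pr(pr_data: dict) -> bool:
        # Azure DevOps returns closedDate as ISO 8601, e.g. "2024-07-29T19:03:55.5257799Z";
        # trim the 7-digit fraction to 6 digits so datetime.fromisoformat accepts it.
        closed = pr_data["closedDate"].rstrip("Z")
        if "." in closed:
            closed = closed[:26]
        closed_at = datetime.fromisoformat(closed).replace(tzinfo=timezone.utc)
        # Keep updating the PR status only while the abandonment is within the cutoff window.
        return datetime.now(timezone.utc) - closed_at <= ABANDONED_PR_CUTOFF

With the PR above (abandoned in July), this returns False, which matches the should_update: False lines in the log.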

@cyberjpb1
Author

Hello again @markphillips100,

I keep trying to make gitops-connector work. I no longer get the following message:
WARNING:kopf._core.reactor.observation:Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed...
because I added this to the ClusterRole (I don't know if it's the right solution):

  - apiGroups: [apiextensions.k8s.io]
    resources: [customresourcedefinitions]
    verbs: [list, watch]
  - apiGroups: [""]
    resources: [namespaces]
    verbs: [list, watch]

In the log it seems to work fine, but in Azure DevOps it looks like the task in the pipeline is not receiving the callback.

Here is how the task is defined; is it OK?

  - task: InvokeRESTAPI@1
    displayName: "Attente confirmation PR/déploiement"
    inputs:
      connectionType: connectedServiceName
      serviceConnection: azdo-pr-connection
      method: 'PATCH'
      urlSuffix: 'OrganizationName/$(System.TeamProject)/_apis/git/repositories/$(Build.Repository.ID)/pullRequests/$(pr_num)/properties?api-version=7.0'
      headers: '{"Content-Type":"application/json-patch+json", "Authorization": "Bearer $(System.AccessToken)"}'
      body: '[{"op":"add","path":"/callback-task-id","from":null,"value":"{\"taskid\":\"$(System.TaskInstanceId)\", \"jobid\":\"$(System.JobId)\", \"planurl\":\"$(System.CollectionUri)\", \"planid\":\"$(System.PlanId)\", \"projectid\":\"$(System.TeamProjectId)\", \"pr_num\":\"$(pr_num)\"}"}]'
      waitForCompletion: 'true'

@markphillips100

It's possible the service account I used was supplied those extra permissions through a completely separate role binding. I'm not in a position to confirm that for a while. If you no longer get the error then I'd say you solved the permission issue at least.

As for the task, the urlSuffix path looks different. Here's mine; I don't know if the API version change from 6.0 to 7.0 will also make a difference:

steps:  
  - task: InvokeRESTAPI@1
    displayName: "Wait for deployment completion"
    inputs:
      connectionType: connectedServiceName
      serviceConnection: $(pr_connection)
      method: 'PATCH'
      urlSuffix: '/pullRequests/$(pr_num)/properties?api-version=6.0'
      headers: '{"Content-Type":"application/json-patch+json", "Authorization": "Bearer $(System.AccessToken)"}'
      body: '[{"op":"add","path":"/callback-task-id","from":null,"value":"{\"taskid\":\"$(System.TaskInstanceId)\", \"jobid\":\"$(System.JobId)\", \"planurl\":\"$(System.CollectionUri)\",  \"planid\":\"$(System.PlanId)\", \"projectid\":\"$(System.TeamProjectId)\", \"pr_num\":\"$(pr_num)\"}"}]'
      waitForCompletion: 'true'
      completionEvent: 'Callback'

@cyberjpb1
Author

@markphillips100 Just to be sure, is it the "dev" branch I should use or is it "flux-multi-config-support"?

@markphillips100

For you (using Flux and this testing), use flux-multi-config-support. It only differs from dev in the metadata check on the Flux operator side of things: markphillips100/gitops-connector@dev...markphillips100:gitops-connector:flux-multi-config-support

@cyberjpb1
Author

cyberjpb1 commented Dec 22, 2024

@markphillips100 Hello,
Finally, everything works now.

I didn't realize that gitops-connector had to be running before creating/applying the gitopsconfig.yaml manifest because that's when the kopf create event is triggered and the configuration is initialized.

This could become a problem: if for some reason I have to delete the gitops-connector pod, I will have to recreate or update all the gitopsconfig.yaml manifests in all application repos.

Could the solution be, when starting gitops-connector, to get the list of all GitOpsConfig resources and perform a parse_config on each one?

By the way, nice work.
It makes me want to learn Python.

Are you going to make an official version?

Happy Holidays!

@cyberjpb1
Author

@markphillips100 For loading configurations on startup, I tried the following code and it seems to work.

I added this just before the @kopf.on.create in the gitops_event_handler.py file (it would still be necessary to create a parameter for the group name).

    # NOTE: this assumes "from kubernetes import client, config" and "import logging",
    # which gitops_event_handler.py may already have at the top of the file.
    config.load_incluster_config()  # In-cluster Kubernetes config
    api_instance = client.CustomObjectsApi()
    # List all existing GitOpsConfig resources cluster-wide (the group name should become a parameter).
    instances = api_instance.list_cluster_custom_object(group="apps-crc.testing", version="v1", plural="gitopsconfigs")
    for instance in instances.get("items"):
        config_name = instance.get("metadata").get("name")
        config_namespace = instance.get("metadata").get("namespace")
        config_spec = instance.get("spec")
        # Register each pre-existing config with the module-level gitops_config_operator.
        gitops_config_operator.create(config_spec, config_name)
        logging.debug(f"Processing config: '{config_name}' in Namespace: '{config_namespace}'")

@markphillips100

@cyberjpb1 good find re the create event and the behaviour you were seeing. It definitely isn't desirable to have an ordering requirement between resource and operator initialisation.

I haven't been in a position to focus on it lately, but if memory serves I'm pretty sure my configs existed prior to the connector running without problem. I can't be certain though, so I'd need to test more in the new year (hopefully in Jan) to confirm the behaviour. My guess is I will most likely use your code addition. Just glad you have it working for now :-)

As for an official version, I guess that's up to the owner @eedorenko.

@cyberjpb1
Author

Hello @eedorenko, do you think you will accept the code proposal from @markphillips100?

@eedorenko
Collaborator

Yes, I will. Let's PR it

@markphillips100

I'm unavailable for a couple of weeks but happy for someone else to create the PR if anyone should so wish. Also happy to do it myself in March if that's the preferred action.

@cyberjpb1
Author

I think it would be fair for the main author of the changes to make the PR, so I'll wait for @markphillips100 to do it in March.
