
CrunchyData "repo" (pgbackrest) instance not using serviceAccount. (Permissions in the namespace's "default" serviceAccount affect deployment) #3472

Closed
nnachefski opened this issue Nov 22, 2022 · 9 comments

@nnachefski

I created a DB using CrunchyData (named "tracking"), but I also have the "anyuid" SCC bound to the project's 'default' ServiceAccount. The initContainer ("pgbackrest-log-dir") in the "tracking-repo-host" StatefulSet fails to deploy, citing:

mkdir: cannot create directory ‘/pgbackrest/repo1/log’: Permission denied

# oc get sts tracking-repo-host -n dev -o yaml |grep serviceAccount
<nothing>
# oc get sa |grep tracking
tracking-instance          1         142m
tracking-pgbackrest        1         142m     <----- shouldn't the sts be using this SA and not 'default'?

If I remove the 'anyuid' ClusterRoleBinding from the 'default' serviceAccount and try again, it works fine.
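For anyone debugging this, OpenShift records the SCC that actually admitted a pod in an annotation, so you can confirm whether the repo-host pod was admitted under anyuid via the default SA. A diagnostic sketch, reusing the pod and namespace names from this report:

```shell
# Which SCC admitted the repo-host pod? (recorded by OpenShift at admission)
oc get pod tracking-repo-host-0 -n dev \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'

# Which ServiceAccount is the pod actually running as?
oc get pod tracking-repo-host-0 -n dev \
  -o jsonpath='{.spec.serviceAccountName}{"\n"}'
```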

-Nick

@nnachefski nnachefski changed the title CrunchyData "repo" (pgbackrest) instance not using serviceAccount. permissions in the namespace's "default" serviceAccount affect deployment CrunchyData "repo" (pgbackrest) instance not using serviceAccount. (Permissions in the namespace's "default" serviceAccount affect deployment) Nov 22, 2022
@cbandy
Member

cbandy commented Nov 28, 2022

Which version of PGO and OpenShift are you using?

@dan1el-k

dan1el-k commented Dec 22, 2022

We are experiencing the same on several of our OKD clusters (OKD 4.9, 4.10, 4.11) with PGO 5.x.x.
We can narrow the scope, though: this only happens when you run multiple services in the namespace that use the "default" ServiceAccount. For whatever reason, when PGO is installed in a bare namespace, the default serviceAccount works.

The (manual) workaround for running it alongside other services in the same namespace is to set the serviceAccount in the "repo-host" StatefulSet to the one generated by PGO.
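The manual workaround described above can be applied with `oc set serviceaccount`. A sketch, assuming the cluster is named "tracking" as in the original report; substitute your own names:

```shell
# Point the repo-host StatefulSet at the ServiceAccount PGO generated,
# instead of the namespace's 'default' SA. Note the operator may revert
# manual changes the next time it reconciles the StatefulSet.
oc set serviceaccount statefulset/tracking-repo-host tracking-pgbackrest -n dev
```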

@joyartoun

joyartoun commented Feb 3, 2023

Hi!

I'm experiencing the same issue. The PostgresCluster CR has no property for setting the serviceAccount for pgBackRest, so I have to assign SCCs to the default serviceAccount. Running OKD 4.11, operator 5.3.0.

@nnachefski
Author

nnachefski commented May 12, 2023

This issue is still happening on OKD 4.12 with CrunchyData 5.3.0

The problem manifests itself in the pgbackrest-log-dir initContainer.

Here is the work-around for now:
(change the sts and serviceAccount name to whatever yours is called)

oc patch sts airflow-repo-host --type=merge -p '{"spec":{"template":{"spec":{"initContainers":[{"name":"pgbackrest-log-dir"}],"serviceAccountName":"airflow-pgbackrest"}}}}'
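After applying a patch like that, it is worth verifying it took effect, since the operator may overwrite manual StatefulSet changes when it reconciles. A sketch using the same names as above:

```shell
# Confirm the pod template now references the pgBackRest ServiceAccount
# (expected output: airflow-pgbackrest)
oc get sts airflow-repo-host \
  -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
```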

@douggutaby

Thank you @nnachefski, it helped me a lot. The pod works and backups are fine, but it cannot write logs, because the OpenShift UID doesn't have write access to the log dir:
sh-4.4$ ls -la /pgbackrest/repo1/log/
total 0
drwxr-xr-x. 2 26 26 0 Jun 5 10:06 .

I think the UID of the postgres user is 26 in the image, but we run with an OpenShift-assigned UID here.

I just wanted to highlight this in case someone like me finds this issue and workaround. I hope Crunchy will fix this soon.

@loydbanks

I have a question related to the use of this service account. The repo-host pod is using the default service account in my case, and I am getting the error

option 'repo1-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set

I believe it has to do with the pod using the default service account while the other pod is using the helix-instance service account (my Postgres cluster is called helix), which does have AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE set. Is this pod meant to be using the default service account, or the helix-instance service account like the other pod? I am using AWS EKS, trying to back up to AWS S3. Please help me.
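For context on the IRSA side: the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE variables are injected by the EKS pod identity webhook only into pods whose ServiceAccount carries the role annotation, so whichever SA the repo-host pod actually uses must be annotated. A sketch, where the namespace and role ARN are placeholders and `helix-pgbackrest` follows the naming seen earlier in this thread:

```shell
# IRSA only injects AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE into pods
# whose ServiceAccount has this annotation; annotate the SA the repo-host
# pod actually uses (the role ARN below is a placeholder).
kubectl annotate serviceaccount helix-pgbackrest -n <namespace> \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/my-backup-role
```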

@jeffgus

jeffgus commented Oct 1, 2024

I'm experiencing the same issue. I tested an awscli pod with the service account I created for use with the operator. I've used plenty of IRSA pods in other places, and I know it works.

I can't get it to work with pgbackrest. I can override the repo-host-0 pod with the serviceAccountName, but I still get the error:

option 'repo2-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set

If I apply the metadata to all the service accounts so that the database pods now use a service account with AWS_ROLE_ARN set, I get this error:

ERROR: [029]: unable to find child 'AssumeRoleWithWebIdentityResult':0

I tried to fiddle with the Trust Relationship configuration (see: #3135), but that doesn't seem to fix it.

@nickma82

Had the same issue, with securityContext.fsGroup: 26 attached, on OpenShift 4.14 using postgresoperator.v5.6.0.
Installing a newer postgresoperator, v5.7.2, fixed the issue.

Original symptom in the event stream:
.spec.securityContext.fsGroup: Invalid value: []int64{26}: 26

@andrewlecuyer
Collaborator

Just a quick update to note that as of the following change, CPK now reconciles a Service Account for the pgBackRest dedicated repo host: #4072.

This change also fixes the issue seen with pgBackRest+S3 and IRSA, as discussed in the CPK docs here.
