Unable to make it work in Kubernetes #21

Open
xeruf opened this issue Feb 21, 2024 · 17 comments


@xeruf

xeruf commented Feb 21, 2024

I thought I could spare myself the frenzy of reproducing the configuration outside by temporarily replacing the bitnami postgresql image in my cluster with this one.
But I'm getting this error: postgres: could not look up effective user ID 1001: user does not exist


@justinclift
Member

Oh, now that's a weird error message.

It's likely coming from when these lines run:

COLLATE=$(echo 'SHOW LC_COLLATE' | "${OLDPATH}/bin/postgres" --single -D "${OLD}" | grep 'lc_collate = "' | cut -d '"' -f 2)
CTYPE=$(echo 'SHOW LC_CTYPE' | "${OLDPATH}/bin/postgres" --single -D "${OLD}" | grep 'lc_ctype = "' | cut -d '"' -f 2)
ENCODING=$(echo 'SHOW SERVER_ENCODING' | "${OLDPATH}/bin/postgres" --single -D "${OLD}" | grep 'server_encoding = "' | cut -d '"' -f 2)
POSTGRES_INITDB_ARGS="--locale=${COLLATE} --lc-collate=${COLLATE} --lc-ctype=${CTYPE} --encoding=${ENCODING}"

Any idea why it would be unable to look up the effective uid in your environment?
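
For readers unfamiliar with the lines quoted above: each one feeds a SHOW statement into PostgreSQL's single-user mode, then uses grep and cut to pull the quoted value out of the reply. Below is a minimal sketch of only the text-extraction step. The sample lines are an assumption about what single-user mode prints, not output captured in this thread, and `extract_setting` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_setting NAME: read single-user-mode output on stdin and print the
# text between the first pair of double quotes on the 'NAME = "' line.
extract_setting() {
  grep "$1 = \"" | cut -d '"' -f 2
}

# Assumed sample reply to SHOW LC_COLLATE (a header line, then a value line).
sample_header=' 1: lc_collate (typeid = 25, len = -1, typmod = -1, byval = f)'
sample_value=' 1: lc_collate = "en_US.utf8" (typeid = 25, len = -1, typmod = -1, byval = f)'

# Only the value line contains 'lc_collate = "', so grep drops the header.
printf '%s\n%s\n' "$sample_header" "$sample_value" | extract_setting lc_collate
```

Because the real pipeline starts the old postgres binary directly, it is also the first place where a missing passwd entry for the running uid would surface.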

@kaplan-michael

I think that is caused by the chart setting the security context similar to this.

securityContext:
  runAsUser: 1001

but the container does not have a user with uid 1001.

when you do a Dockerfile like this

FROM pgautoupgrade/pgautoupgrade:16-alpine
RUN adduser -u 1001 -G root -s /bin/sh -D pgautoupgrade

and build it, it works fine

You will then hit postgresql.conf not being present, because bitnami generates it at startup into /opt/bitnami/postgresql/conf, which is mounted as an emptyDir (temp dir) 🤦, and they do the same with pg_hba.conf...
pg_ident.conf is surprisingly in the data dir.
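
To confirm the diagnosis above before rebuilding an image, one could check whether the uid from the security context has a passwd entry inside the container. This is a rough sketch: `uid_in_passwd` is a hypothetical helper, and it only reads a passwd-format file, which is a simplification of the lookup postgres actually performs.

```shell
#!/bin/sh
# uid_in_passwd UID [FILE]: succeed if FILE (default /etc/passwd) has an
# entry whose third colon-separated field equals UID.
uid_in_passwd() {
  awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

if uid_in_passwd 1001; then
  echo "uid 1001 has a passwd entry"
else
  echo "uid 1001 has no passwd entry, so the effective user ID lookup would fail"
fi
```

Running this inside the unmodified image and again inside an image built from the Dockerfile above should show the difference the adduser line makes.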

@justinclift
Member

Any idea if there's something (preferably simple) we can change with our images to get it working?

Hmmm, I wonder if changing the postgres user's uid to be 1001 might help?

(note that I've just woken up and haven't had coffee yet, so that could be an obviously bad idea in 10 mins... 😉)

@kaplan-michael

Not sure yet. I gave up on it (due to the missing configs) and just went with a pg_dump.
I'm not sure whether changing the postgres user's uid would break things.

I wonder whether just adding a user with uid 1001 would impact anything else. It hasn't in my testing so far (you don't have to do USER 1001, so the container will still run as root by default; the user just has to be available).

bitnami themselves don't add the user, but instead just set USER 1001, and postgres is not super happy about that either.

@spwoodcock

spwoodcock commented Sep 30, 2024

Couldn't you just add a values.yaml for the chart with:

securityContext:
  runAsUser: 0

using helm --values values.yaml, or even just specify as a --set argument.

I think the container entrypoint needs to run as root here, but I don't see much of a security concern in running a small upgrade script as root in the cluster as a one-off.

Probably best to use PGAUTO_ONESHOT mode for this use case though!


I could update this wiki guide

https://github.com/pgautoupgrade/docker-pgautoupgrade/wiki/Automating-The-Upgrade

for a sample of using a Kubernetes initContainer at some point, if useful 😃

@p4block

p4block commented Oct 4, 2024

I also tried to get an initContainer going to update bitnami postgresql charts automatically, and hit the same blocker with the missing postgresql.conf (there's an empty postgresql.auto.conf though) and pg_hba.conf.

Permission issues are no big deal: worst case, after running the migration as root, it can chown the files again. Lacking the configs, however, is very annoying to work around.

Please share your ideas if you have any!

Posting here my current non-working config for reference (see later comment)

postgresql:
 [...]

  primary:
    persistence:
      existingClaim: "freshrss-postgres-pvc"

    lifecycleHooks:
      postStart:
        exec:
          command:
            - /bin/sh
            - -c
            - | 
              echo "Waiting a bit"
              sleep 5
              echo "Copying configuration files for later use..."
              cp /opt/bitnami/postgresql/conf/postgresql.conf /bitnami/postgresql/data/
              cp /opt/bitnami/postgresql/conf/pg_hba.conf /bitnami/postgresql/data/

              # Fix the config file
              sed -i '/pgaudit/d' /bitnami/postgresql/data/postgresql.conf
              sed -i '/conf.d/d' /bitnami/postgresql/data/postgresql.conf

    initContainers:
      - image: pgautoupgrade/pgautoupgrade:17-bookworm
        name: upgrade-postgres
        securityContext:
          runAsUser: 0
        env:
          - name: PGAUTO_ONESHOT
            value: "yes"
          - name: POSTGRES_DB
            value: "freshrss"
          - name: POSTGRES_USER
            value: "freshrss"
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: freshrss-postgresql
                key: "password"
        volumeMounts:
          - name: data # name of the volume in the bitnami pod
            mountPath: /var/lib/postgresql/data/
            subPath: data
        # command:
        #   - /bin/sh
        # args:
        #   - -c
        #   -
        #     # Hang indefinitely to allow debugging
        #     echo "Init container completed. Entering debug mode..."
        #     sleep infinity

@p4block

p4block commented Oct 4, 2024

I fixed the missing files problem by copying them to the data directory in a postStart hook on the main container. It needs to run at least once before any migrations, but it's good enough. I am now encountering another issue where it cannot connect to the database after copying it to new. Credentials are set correctly in theory. I am now testing with docker run for faster iteration and have the same problem.

[...]
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/data/new/ -l logfile start

------------------------------------
New database initialisation complete
------------------------------------
---------------------------------------
Running pg_upgrade command, from /var/lib/postgresql/data
---------------------------------------
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok

connection to server on socket "/var/run/postgresql/.s.PGSQL.50432" failed: fe_sendauth: no password supplied
could not connect to source postmaster started with the command:
"/usr/local-pg16/bin/pg_ctl" -w -l "/var/lib/postgresql/data/new/pg_upgrade_output.d/20241004T163840.319/log/pg_upgrade_server.log" -D "/var/lib/postgresql/data/old" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/run/postgresql'" start
Failure, exiting

@justinclift
Member

justinclift commented Oct 4, 2024

@p4block We could look at adding Bitnami compatibility if doing so seems to be reasonably straightforward.

For your install, where's the location of the missing files that you needed to copy?

Is it just the two files from /opt/bitnami/postgresql/conf/?

Are there any soft links between files in that directory and the PostgreSQL data directory?


As a data point, if you need to change the directory that PostgreSQL looks for its data files in, then the PGDATA docker runtime variable should work (as per your use of POSTGRES_DB and other variables there).


In the output you've pasted above, you have this as the final lines:

could not connect to source postmaster started with the command:
"/usr/local-pg16/bin/pg_ctl" -w -l "/var/lib/postgresql/data/new/pg_upgrade_output.d/20241004T163840.319/log/pg_upgrade_server.log" -D "/var/lib/postgresql/data/old" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/run/postgresql'" start
Failure, exiting

That /var/lib/postgresql/data/new/pg_upgrade_output.d/20241004T163840.319/log/pg_upgrade_server.log fragment seems like a log file from the running pg_upgrade command. Does it contain anything that looks relevant?

@p4block

p4block commented Oct 5, 2024

@justinclift
The bitnami image generates those files magically, they are not present in the postgres data directory and there is no way to grab them from the init container.

I worked around the problem by adding a hook to the bitnami container that copies them to the postgres data dir, but that's a hack imo

I don't think they are truly needed, maybe an option to use some pgautoupgrade-shipped ones on the spot is enough.
The pg_hba.conf is just this and I haven't configured the postgresql.conf in any way in my values.

host     all             all             0.0.0.0/0               md5
host     all             all             ::/0                    md5
local    all             all                                     md5
host     all             all        127.0.0.1/32                 md5
host     all             all        ::1/128                      md5

As for the error log, sadly I can't see anything aside from a shutdown request.

 /mnt/db/freshrss-postgres-pv/data # cat new/pg_upgrade_output.d/20241005T000213.057/log/pg_upgrade_server.log
-----------------------------------------------------------------
  pg_upgrade run on Sat Oct  5 00:02:13 2024
-----------------------------------------------------------------

command: "/usr/local-pg16/bin/pg_ctl" -w -l "/var/lib/postgresql/data/new/pg_upgrade_output.d/20241005T000213.057/log/pg_upgrade_server.log" -D "/var/lib/postgresql/data/old" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/run/postgresql'" start >> "/var/lib/postgresql/data/new/pg_upgrade_output.d/20241005T000213.057/log/pg_upgrade_server.log" 2>&1
waiting for server to start....2024-10-05 00:02:13.186 GMT [99] LOG:  starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-10-05 00:02:13.187 GMT [99] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.50432"
2024-10-05 00:02:13.190 GMT [102] LOG:  database system was shut down at 2024-10-05 00:02:11 GMT
2024-10-05 00:02:13.193 GMT [99] LOG:  database system is ready to accept connections
 done
server started


command: "/usr/local-pg16/bin/pg_ctl" -w -D "/var/lib/postgresql/data/old" -o "" -m fast stop >> "/var/lib/postgresql/data/new/pg_upgrade_output.d/20241005T000213.057/log/pg_upgrade_server.log" 2>&1
waiting for server to shut down...2024-10-05 00:02:13.275 GMT [99] LOG:  received fast shutdown request
.2024-10-05 00:02:13.275 GMT [99] LOG:  aborting any active transactions
2024-10-05 00:02:13.276 GMT [99] LOG:  background worker "logical replication launcher" (PID 104) exited with exit code 1
2024-10-05 00:02:13.276 GMT [100] LOG:  shutting down
2024-10-05 00:02:13.277 GMT [100] LOG:  checkpoint starting: shutdown immediate
2024-10-05 00:02:13.280 GMT [100] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=13/E7DAB080, redo lsn=13/E7DAB080
2024-10-05 00:02:13.284 GMT [99] LOG:  database system is shut down
 done
server stopped


This is the part that makes no sense to me, why would the script break here this way?

Checking cluster versions                                     ok

connection to server on socket "/var/run/postgresql/.s.PGSQL.50432" failed: fe_sendauth: no password supplied

@justinclift
Member

Ahhh. I think that error:

connection to server on socket "/var/run/postgresql/.s.PGSQL.50432" failed: fe_sendauth: no password supplied

...lines up with the pg_hba.conf file you've shown they generate:

host     all             all             0.0.0.0/0               md5
host     all             all             ::/0                    md5
local    all             all                                     md5  <-- this line
host     all             all        127.0.0.1/32                 md5
host     all             all        ::1/128                      md5

It looks like they're supplying a pg_hba.conf that demands an md5 password for unix domain socket connections (/var/run/postgresql/.s.PGSQL.50432).

Instead of copying over the one they provide, I'd probably try one with trust as the authentication method for the local line. Probably this, or some variation along those lines:

host     all             all             0.0.0.0/0               md5
host     all             all             ::/0                    md5
local    all             all                                     trust
host     all             all        127.0.0.1/32                 md5
host     all             all        ::1/128                      md5

If that turns out to work, is the /opt/bitnami directory present from the pgautoupgrade container's point of view?

Am thinking we could do something like this:

if [ -d /opt/bitnami ]; then
  # Do Bitnami specific stuff here
fi

That would let us potentially automatically use the above pg_hba.conf (if it works), or whatever else is needed for people running the container on Bitnami.

@justinclift
Member

Hmmm, thinking about that a bit more, we could just check if the postgresql.conf and/or pg_hba.conf files are missing in the data dir and supply defaults ourselves for the pg_upgrade part of things.

That might only make sense for 'one shot' mode too, as we probably don't want to try running PG itself afterwards without the user supplied version of those files (which could potentially be very customised).
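
The idea above (supply defaults only when the files are missing) could look roughly like the following. This is a sketch under assumptions, not what the pgautoupgrade entrypoint does: `ensure_upgrade_configs` is a hypothetical function, and the pg_hba.conf content is a trimmed variation on the file suggested earlier in this thread.

```shell
#!/bin/sh
# ensure_upgrade_configs DATADIR: create placeholder config files in DATADIR
# only when they are absent, so files supplied by the user are never overwritten.
ensure_upgrade_configs() {
  datadir=$1
  if [ ! -f "$datadir/postgresql.conf" ]; then
    # An empty postgresql.conf leaves every setting at its built-in default.
    : > "$datadir/postgresql.conf"
  fi
  if [ ! -f "$datadir/pg_hba.conf" ]; then
    # Trust local (unix socket) connections so pg_upgrade can connect
    # without a password; keep md5 for TCP connections.
    cat > "$datadir/pg_hba.conf" <<'EOF'
local    all             all                                     trust
host     all             all        127.0.0.1/32                 md5
host     all             all        ::1/128                      md5
EOF
  fi
}

# Demonstration against a throwaway directory rather than a real data dir.
demo_dir=$(mktemp -d)
ensure_upgrade_configs "$demo_dir"
ls "$demo_dir"
rm -rf "$demo_dir"
```

Restricting this to one shot mode, as suggested, would avoid starting PostgreSQL for real use with placeholder files in place of a customised configuration.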

@p4block

p4block commented Oct 5, 2024

The init container has a different filesystem than the main one, aside from the shared postgres folder. It would need to be told to ship default files.

I found a way to detect, with high likelihood, that the postgres data folder has been used in a bitnami container before.

 /mnt/db/freshrss-postgres-pv/data # rg bitnami                     
postmaster.opts
1:/opt/bitnami/postgresql/bin/postgres "-D" "/bitnami/postgresql/data" "--config-file=/opt/bitnami/postgresql/conf/postgresql.conf" "--external_pid_file=/opt/bitnami/postgresql/tmp/postgresql.pid" "--hba_file=/opt/bitnami/postgresql/conf/pg_hba.conf"
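
The same check can be done with plain grep from a script that only sees the data directory. A minimal sketch, where `looks_like_bitnami` is a hypothetical helper and the `/opt/bitnami/` prefix is taken from the postmaster.opts line shown above:

```shell
#!/bin/sh
# looks_like_bitnami DATADIR: succeed if DATADIR/postmaster.opts exists and
# mentions the /opt/bitnami/ install prefix.
looks_like_bitnami() {
  [ -f "$1/postmaster.opts" ] && grep -q '/opt/bitnami/' "$1/postmaster.opts"
}

# Demonstration against a throwaway directory with a shortened sample line.
demo_dir=$(mktemp -d)
echo '/opt/bitnami/postgresql/bin/postgres "-D" "/bitnami/postgresql/data"' > "$demo_dir/postmaster.opts"
if looks_like_bitnami "$demo_dir"; then
  echo "data directory was last started by a bitnami image"
fi
rm -rf "$demo_dir"
```

This is only a heuristic: postmaster.opts records the command line of the last server start, so it reflects whichever image started the server most recently.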

When I changed the local auth method to trust (oops, didn't see before) I got a different error:
connection to server on socket "/var/run/postgresql/.s.PGSQL.50432" failed: FATAL: must be superuser to connect in binary upgrade mode

Coincidentally, the root user for this postgres instance is a "closed the cell and threw the keys to the river" situation. I've had the upgrade before using user accounts in docker, maybe it's something else entirely related to which user is being active at that moment in the script.

I tried with the postgres docker image's pg_hba.conf with the same result, here for reference.

local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

Now that I think of it, needing a database superuser account to perform the upgrade makes sense (although it's hella annoying for my usecase), and the postgres container makes the POSTGRES_USER account a superuser 🤔
The default superuser is defined by the POSTGRES_USER environment variable.

So yeah, this part is PEBKAC. I am supplying a regular user to pgautoupgrade. I will find a way to reset the root password. I now wonder if the pgautoupgrade process even needs credentials at all, as it could just modify pg_hba.conf to let itself in and do the upgrade.

Thanks for your help, awesome project
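
Since the failure above came down to the supplied role not being a superuser, a pre-flight check along these lines might catch it before an upgrade is attempted. This is a sketch, not part of pgautoupgrade: `role_is_superuser` and the `PSQL` override are hypothetical, it needs a running server that psql can reach through the usual PG* environment variables, and the role name is interpolated into the query without escaping.

```shell
#!/bin/sh
# role_is_superuser ROLE: succeed if ROLE has rolsuper set in pg_roles.
# -A (unaligned) and -t (tuples only) make psql print a bare "t" or "f".
role_is_superuser() {
  answer=$("${PSQL:-psql}" -At -c "SELECT rolsuper FROM pg_roles WHERE rolname = '$1'")
  [ "$answer" = "t" ]
}
```

A call such as `role_is_superuser freshrss || echo "freshrss is not a superuser"` would then flag the situation described above, where a regular application user was passed as POSTGRES_USER.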

@p4block

p4block commented Oct 5, 2024

Yup. Got it to work.
To recap: the bitnami user is not a superuser, unlike postgresql's POSTGRES_USER. A superuser is required to do the upgrade, unless pgautoupgrade did some trickery to make an ephemeral superuser account or something like that.

Using the stock postgres container I reset the password for the postgres user. This step isn't needed if you haven't long lost your postgres superuser password and it matches the one in the postgres-secret, which is what should happen tbh.

[...]
    lifecycleHooks:
      postStart:
        exec:
          command:
            - /bin/sh
            - -c
            - | 
              echo "Waiting a bit"
              sleep 5
              echo "Copying configuration files for later use..."
              cp /opt/bitnami/postgresql/conf/postgresql.conf /bitnami/postgresql/data/
              cp /opt/bitnami/postgresql/conf/pg_hba.conf /bitnami/postgresql/data/

              # Fix the config file
              sed -i '/pgaudit/d' /bitnami/postgresql/data/postgresql.conf
              sed -i '/conf.d/d' /bitnami/postgresql/data/postgresql.conf

              # Allow root user in container to log in without password
              # Required for the upgrade process
              sed -i "s/\(local\s\+all\s\+all\s\+\)md5/\1trust/" /bitnami/postgresql/data/pg_hba.conf

    initContainers:
      - image: pgautoupgrade/pgautoupgrade:17-bookworm
        name: upgrade-postgres
        securityContext:
          runAsUser: 0
        env:
          - name: PGAUTO_ONESHOT
            value: "yes"
          - name: POSTGRES_DB
            value: "freshrss"
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: freshrss-postgresql
                key: "postgres-password"
        volumeMounts:
          - name: data # name of the volume in the bitnami pod
            mountPath: /var/lib/postgresql/data/
            subPath: data
        command:
          - /bin/sh
        args:
          - -c
          - |
            /usr/local/bin/docker-entrypoint.sh postgres

            echo "Setting postgresql data dir ownership to default bitnami user"
            chown -R 1001:1001 /var/lib/postgresql/data/

            # Hang indefinitely to allow debugging
            # echo "Init container completed. Entering debug mode..."
            # sleep infinity

This should work if you also manually put the permissive pg_hba.conf and bitnami's stock postgresql.conf in the postgres folder.

Shipping some example postgresql.conf and pg_hba.conf, and maybe even ignoring existing auth patterns altogether and jamming the upgrade process (maybe optional) would be a nice feature I wouldn't complain about 😄
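
For anyone adapting the postStart hook above, the sed substitution can be tried in isolation first. The sketch below applies the same expression to a copy of the bitnami pg_hba.conf quoted earlier in the thread. `relax_local_auth` is a hypothetical wrapper, and the expression relies on GNU sed syntax (`-i`, `\s`, `\+`).

```shell
#!/bin/sh
# relax_local_auth FILE: switch the "local all all ... md5" line to trust,
# using the same sed expression as the postStart hook.
relax_local_auth() {
  sed -i "s/\(local\s\+all\s\+all\s\+\)md5/\1trust/" "$1"
}

# Work on a temporary copy, never on a live pg_hba.conf.
hba=$(mktemp)
cat > "$hba" <<'EOF'
host     all             all             0.0.0.0/0               md5
host     all             all             ::/0                    md5
local    all             all                                     md5
host     all             all        127.0.0.1/32                 md5
host     all             all        ::1/128                      md5
EOF

relax_local_auth "$hba"
# Show the rewritten local line; the host lines are left untouched.
grep '^local' "$hba"
rm -f "$hba"
```

Only the unix socket rule changes, which is the connection pg_upgrade makes; TCP connections still require a password.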

@spwoodcock

Nice work @p4block !

Looks like great info to make a wiki article from 😁

(although as you say, perhaps in the future pgautoupgrade could bypass auth / superuser requirement during the upgrade, making the process easier)

@justinclift
Member

Cool. Yeah, it sounds like with a bit of patience and testing stuff we should be able to automate this.

Hopefully you're up for a bit of back-and-forth communication and experimentation @p4block?

On that note, with the rg bitnami command, I'm not familiar with rg. Is that a kubernetes thing or a bitnami thing? If it's a bitnami thing, is it a command that we should be able to see exists from our docker scripting which does the upgrading?

@justinclift
Member

@p4block Oh, on a related topic, does your database use any PG extensions? i.e. stuff loaded using CREATE EXTENSION?

We don't have the updating of those automated yet (there's the beginnings of a PR for it), so if you do you'll still need to manually check if they need updating.

@spwoodcock Which reminds me, it's probably not a bad idea for us to have some kind of "things you should still do post-upgrade" wiki page, until the extension upgrading is automated too.

Any interest in throwing something like that together?

@p4block

p4block commented Oct 5, 2024

@justinclift rg is grep in rust, I just tried to look for signs of bitnami

I can more or less go back and forth these days, I am very interested in getting this project more developed.

I do have one postgres that uses extensions (the one for Immich) and it uses bitnami chart but with pgvecto-rs container. I haven't tackled it yet.
