Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul postgres upgrade process v14 --> v16 #86

Merged
merged 5 commits into from
Jul 25, 2024

Conversation

spwoodcock
Copy link
Member

@spwoodcock spwoodcock commented Jul 23, 2024

Follow on from #85

  • Added trap to cleanup properly on failure.
  • Simplified code.
  • Added extra logs.
  • Added timeout for db-check-upgrade.

Issue

  • The db-check-upgrade currently hangs on the Github runner: https://github.com/hotosm/Drone-TM/actions/runs/10058359643/job/27801399496
  • Perhaps the exit codes / signals are not handled properly? Looking into it.
  • The upgrade from v14 --> v16 on the server was also probably bodged, as I forgot to make the pg-upgrade image public! If there are issues & you want me to look into it, add my pub key to the server / ping me on slack 👍

Update 24/07/2024

  • I overhauled this to be simpler / cleaner.
  • It should work with the CI, not causing issues with timeouts.
  • No bash script required, just docker-compose.yml.

@spwoodcock spwoodcock self-assigned this Jul 23, 2024
@nischalstha9
Copy link
Collaborator

Tested your code on my local machine. Encountered an issue where the script exited with Database is already upgraded. Skipping. in case docker volume drone-tm-pg-16-data was missing.
image
Then with creation of volume drone-tm-pg-16-data manually, the script succeed with upgrade to postgres-16.

Also I want to mention that the new database creation failed after the upgrade process with error:
template database "template1" has a collation version, but no actual collation version could be determined.
Probably this occurred as the template1 was from Postgres-14.
image

@spwoodcock
Copy link
Member Author

spwoodcock commented Jul 24, 2024

Thanks for debugging!

I will add:

# Check if required docker volume exists
docker volume inspect drone-tm-pg-16-data
exit_code=$?
# Exit script volume does not exist
if [ "$exit_code" -eq 1 ]; then
    echo
    echo "Please create docker volume 'drone-tm-pg-16-data' first."
    echo
    echo -e "\e[0;33mdocker volume create drone-tm-pg-16-data\e[0m"
    echo
    exit 0
fi

image

I was also receiving similar collation errors, but it was fixed after I restarted the db.
Does it also fix it for you?

If not, I can fix this in the final container when I do the dbvacuum.

@spwoodcock
Copy link
Member Author

@nrjadkry the --timeout 120 added to the db upgrade check should hopefully fix the CI issue

@nischalstha9
Copy link
Collaborator

@spwoodcock the test for required volume is working in the script. But I am still facing collation errors even after restarting the db container.

@spwoodcock spwoodcock changed the title Improve pg-upgrade.sh bash script Overhaul postgres upgrade process v14 --> v16 Jul 24, 2024
@spwoodcock spwoodcock requested review from nischalstha9 and removed request for nischalstha9 July 24, 2024 20:34
@spwoodcock
Copy link
Member Author

Found the issue and documented here!
pgautoupgrade/docker-pgautoupgrade#17 (comment)

The issue was using a Debian based pgupgrade on an Alpine based db image.

I overhauled the upgrade process to really slim it down.
It now works entirely using Alpine based images and Alpine pgupgrade process 👍

Is there any data in the prod db we need to recover?
The bodged upgrade may make the db unusable, but we can probably get the data out if needed.

@spwoodcock spwoodcock force-pushed the build/pg-upgrade-bash-script branch from 54ce412 to ed17875 Compare July 24, 2024 22:38
Copy link
Collaborator

@nischalstha9 nischalstha9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues Faced in my local machine

Flag --abort-on-container-failure was not found on my docker version of Docker version 24.0.6, build ed223bc

image

Also the script breaks with role root doesn't exists for localhost. But, when I compose down both running upgrade and pg:14 db and then run compose up with, I can see the database is upgraded to version 16 with old data too. Also coalition issue is resolved too.

image image

@spwoodcock
Copy link
Member Author

spwoodcock commented Jul 25, 2024

The first issue is actually to do with docker compose version. I'm using v2.27.0.
The Github runner images should have a pretty up to date version depending on when they were last built: https://github.com/actions/runner-images/blob/abe90c161995c1485e54e0602d4833fb12889427/images/ubuntu/scripts/build/install-docker.sh

The second about role 'root' can be fixed by running the command from the repo main directory:

docker compose --file contrib/pg-upgrade/docker-compose.yml \
  --abort-on-container-failure

If you run from the contrib/pg-upgrade dir, then the .env file will not be picked up and the POSTGRES_USER var will not be included in the script.

@spwoodcock
Copy link
Member Author

I just added an update to properly use the vars:

    command:
      - |
        gosu postgres pg_ctl start -D /var/lib/postgresql/data

        # Upgrade PostGIS extension
        PGPASSWORD=${POSTGRES_PASSWORD:-dtm} \
        psql --host=localhost --username=${POSTGRES_USER:-dtm} \
        ${POSTGRES_DB:-dtm_db} -c '
          ALTER EXTENSION "postgis" UPDATE;
        '

        # Rebuild statistics use vacuum
        PGPASSWORD=${POSTGRES_PASSWORD:-dtm} \
        vacuumdb \
        --host=localhost \
        --username=${POSTGRES_USER:-dtm} \
        --all \
        --analyze-in-stages

I added defaults to the bash variables, so this should actually work when running in the contrib/pg-upgrade dir now, but you probably need to specify: --env-file ../../.env

Copy link
Collaborator

@nischalstha9 nischalstha9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution is working great! It now upgrades database perfectly. I think it's good to go live now! 👍 🎉

@spwoodcock spwoodcock merged commit 692a5d4 into main Jul 25, 2024
1 check failed
@spwoodcock spwoodcock deleted the build/pg-upgrade-bash-script branch July 25, 2024 12:12
@nrjadkry
Copy link
Collaborator

@nischalstha9
Copy link
Collaborator

Looks like exit code 1 on db-check-upgrade service is messing up the CI. In my opinion, the db upgrade action should be optionally run rather than running on every build. What's your idea on this @spwoodcock ?

@spwoodcock
Copy link
Member Author

If you can think of a nice way to do that please go ahead @nischalstha9 🙏

I will come back to work on this later today / tomorrow 👍

@spwoodcock
Copy link
Member Author

To fix the issue in the CI we can simply use || true on the docker compose command

@spwoodcock
Copy link
Member Author

Just made an update that should fix this:

          docker compose --file contrib/pg-upgrade/docker-compose.yml \
            --env-file .env up || true

If not, let me know I will look into it tomorrow 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants