stolon-keper doesn't stop recovery after reaching recovery point #798

pyhalov · 2020-09-20T09:58:40Z

What happened:
I've applied the following config:

 cat ~/stolon_restore.json 
{
         "initMode": "pitr",
         "synchronousReplication": true,
         "minSynchronousStandbys": 0,
         "maxSynchronousStandbys": 1,
         "pgParameters":{
            "log_statement":"all",
            "log_line_prefix":"%m [%r][%p]",
            "log_min_messages":"INFO",
            "wal_level":"logical",
            "max_prepared_transactions":"200",
         },
         "pitrConfig": { 
                "dataRestoreCommand": "cp -r /path/to/backup/* %d" , 
                "archiveRecoverySettings": {
                    "restoreCommand": "cp \"/path/to/backup/pg_wal/%f\" \"%p\""
                },
               "recoveryTargetSettings": { "recoveryTargetName": "_sharded_backup_1600524319"}
         } 
}

PostgreSQL recovered, but after recovery was restarted in recovery mode, tried to fetch new wal files, failed to reach recovery target and exited.


Sep 20 09:30:50 vg1 stolon-keeper[1383]: 2020-09-20 09:30:50.975 MSK [][1938]LOG:  database system is ready to accept connections
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.619+0300        INFO        cmd/keeper.go:1250        recovery completed
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.631 MSK [[local]][1959]LOG:  statement: select 1
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.637 MSK [[local]][1960]LOG:  statement: select 1 from information_schema.tables where table_schema = 'pg_catalog' and table_name = 'pg_file_settings'
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.645 MSK [[local]][1960]LOG:  statement: select name, setting from pg_file_settings where name IS NOT NULL and setting IS NOT NULL
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.657+0300        INFO        postgresql/postgresql.go:384        stopping database
Sep 20 09:30:51 vg1 stolon-proxy[1353]: 2020-09-20T09:30:51.658+0300        INFO        cmd/proxy.go:272        master address        {"address": "127.0.0.1:5443"}
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.659 MSK [][1938]LOG:  received fast shutdown request
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.659 MSK [][1938]LOG:  aborting any active transactions
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.661 MSK [][1938]LOG:  background worker "logical replication launcher" (PID 1958) exited with exit code 1
Sep 20 09:30:51 vg1 stolon-keeper[1383]: waiting for server to shut down....2020-09-20 09:30:51.672 MSK [][1947]LOG:  shutting down
Sep 20 09:30:51 vg1 stolon-proxy[1353]: 2020-09-20T09:30:51.689+0300        INFO        cmd/proxy.go:290        proxying to master address        {"address": "127.0.0.1:5443"}
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.703 MSK [][1938]LOG:  database system is shut down
Sep 20 09:30:51 vg1 stolon-keeper[1383]:  done
Sep 20 09:30:51 vg1 stolon-keeper[1383]: server stopped
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.774+0300        INFO        cmd/keeper.go:1484        our db requested role is master
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.785+0300        INFO        postgresql/postgresql.go:858        writing recovery signal file
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.785+0300        INFO        postgresql/postgresql.go:319        starting database

Look at writing recovery signal file line - keeper shouldn't have written recovery signal file after postgres restart

What you expected to happen:
I've expected postgresql to recover and start accepting queries.

Environment:

Stolon version:
Head

The text was updated successfully, but these errors were encountered:

Fixes sorintlab#798

pyhalov added the bug label Sep 20, 2020

pyhalov pushed a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recoverey options after finishing PITR recoovery

1fa024d

Fixes sorintlab#798

pyhalov linked a pull request Sep 20, 2020 that will close this issue

Remove recovery options after finishing PITR #799

Open

pyhalov pushed a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recoverey options after finishing PITR recoovery

c5619f2

Fixes sorintlab#798

pyhalov pushed a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR recoovery

e83a4c6

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR recoovery

b2bcd93

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR recoovery

0415c9c

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR recoovery

876a69a

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR

697c3af

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR

32faafc

Fixes sorintlab#798

pyhalov added a commit to pyhalov/stolon that referenced this issue Sep 20, 2020

Remove recovery options after finishing PITR

4d4cfca

Fixes sorintlab#798

pyhalov added a commit to postgrespro/stolon that referenced this issue Sep 23, 2020

Remove recovery options after finishing PITR

42e68fd

Fixes sorintlab#798

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stolon-keper doesn't stop recovery after reaching recovery point #798

stolon-keper doesn't stop recovery after reaching recovery point #798

pyhalov commented Sep 20, 2020

stolon-keper doesn't stop recovery after reaching recovery point #798

stolon-keper doesn't stop recovery after reaching recovery point #798

Comments

pyhalov commented Sep 20, 2020