Skip to content

surgeventures/db_upgrade_setup

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zero-Downtime Database Switchover Tool

Zero-downtime PostgreSQL database switchover with physical replicas and Debezium CDC connectors support. Handles read-only and read-write traffic separately via PgBouncer for minimal application disruption.

Features

  • Zero-downtime switchover using PgBouncer
  • Multiple physical replicas support
  • Debezium CDC connector management
  • Replication lag monitoring
  • Sequence synchronization
  • Separate read-only/read-write traffic handling
  • Forward and reverse switchover
  • Test environment with sample Elixir app
  • Continuous CDC monitoring
  • Read/write load simulation

Prerequisites

  • Docker and Docker Compose
  • PostgreSQL with logical replication
  • PgBouncer
  • Kafka Connect with Debezium
  • yq tool for YAML processing
  • psql client
  • Elixir (for test app)

Project Structure

db_upgrade_setup/
├── cdc_output/           # CDC and test load output logs
├── configs/
│   ├── pgbouncer.ini    # PgBouncer configuration  
│   ├── switchover-config.yaml # Main switchover configuration
│   └── userlist.txt     # PgBouncer user credentials
├── docker/              # Docker-related files
├── elixir_app/          # Sample Elixir application for testing
├── lib/                 # Core library functions
│   ├── config.sh
│   ├── debezium.sh 
│   ├── error_handler.sh
│   ├── health_checks.sh
│   ├── logging.sh
│   ├── pgbouncer.sh
│   ├── replication.sh
│   └── sequences.sh
└── scripts/
    ├── continuous-cdc.sh    # CDC monitoring script
    ├── copy_db.sh          # Database copy utility
    ├── init_replication.sh # Replication setup
    ├── seeds.sh           # Database seeding
    ├── switchover.sh      # Main switchover script
    ├── test-setup.sh      # Test environment setup
    ├── upgrade_pg.sh      # Database upgrade script
    └── run.sh            # Main wrapper script

Configuration

Configuration file switchover-config.yaml:

source:
  internal_name: postgres_source
  host: localhost
  port: 5433
  replicas:
    - name: postgres_source_ro_1
    - name: postgres_source_ro_2

target:
  internal_name: postgres_target
  host: localhost
  port: 5434
  replicas:
    - name: postgres_target_ro_1
    - name: postgres_target_ro_2

database:
  name: testdb
  user: testuser
  password: testpass

connectors:
  - name: postgres-connector
    slot_name: debezium
    publication_name: dbz_publication

pgbouncer:
  config_file: pgbouncer.ini
  admin_port: 6433
  admin_user: testuser
  admin_password: testpass
  admin_database: pgbouncer
  pools:
    read_write:
      name: testdb_rw
    read_only:
      name: testdb_ro

kafka:
  connect_clusters:
    - name: cdc
      url: http://localhost:8083
    - name: outbox
      url: http://localhost:8084

replication:
  max_lag_bytes: 20000
  catchup_timeout: 60
  sync_sequences_gap: 100000

Quick Start

Main Commands

./run.sh <command> [options]

Commands:
  switchover      # Run database switchover
  test-setup      # Set up test environment
  help           # Show help message

Test Environment Setup

# Full setup with sample app and CDC monitoring
./run.sh test-setup --full

# Clean up everything
./run.sh test-setup --destroy

# Restart only target database
./run.sh test-setup --restart-target

The --full setup:

  1. Starts Docker containers (databases, Kafka)
  2. Seeds database with test data
  3. Launches Elixir test application
  4. Starts CDC monitoring
  5. Sets up read/write load simulation
  6. Initializes logging

Running Switchover

./run.sh switchover [options]

Options:
  -d, --direction        # forward|reverse
  -m, --mode            # full|readonly (default: full)
  -dbz, --debezium-mode # catchup|no-wait (default: no-wait)
  -c, --config          # config file path
  -h, --help           # show help

Switchover Modes

Full Switchover (-m full)

  • Switches all traffic
  • Manages replication slots and CDC
  • Syncs sequences

Read-only Switchover (-m readonly)

  • Switches only read traffic
  • For testing/gradual migration
  • No write impact

Debezium Mode (-dbz, --debezium-mode)

  • no-wait (default): Fast switchover with minimal CDC delay

    • Gives 5 seconds for CDC catchup
    • Creates new publication slot quickly
    • Small risk of duplicate events
    • Recommended when:
      • Quick switchover is priority
      • Application can handle duplicate events
      • Replication lag is typically low
  • catchup: Safe switchover with CDC consistency

    • Uses replication slot monitoring
    • Waits until replication lag below threshold
    • Ensures no event loss
    • Recommended when:
      • Data consistency is critical
      • Duplicate events must be avoided
      • Longer switchover time is acceptable

Monitoring

CDC Output

All monitoring data in cdc_output/:

  • CDC events
  • Read/write load stats
  • App metrics
  • Replication status

Test Application

Sample Elixir app helps verify:

  • Connection handling
  • Data consistency
  • Performance impact

Cleanup

./run.sh test-setup --destroy

Stops:

  • Docker containers
  • CDC monitoring
  • Load simulation
  • Test application

Troubleshooting

Common Issues

High Replication Lag

  • Check cdc_output logs
  • Adjust max_lag_bytes
  • Verify load settings

CDC Issues

  • Check continuous-cdc.sh logs
  • Verify Kafka Connect
  • Check Debezium config

App Errors

  • Check Elixir app logs
  • Verify connectivity
  • Check PgBouncer status

Support

For issues provide:

  • Error messages
  • Relevant logs
  • Reproduction steps
  • Environment details

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published