Fix switchover when many connections appear #132

Merged: 5 commits, merged on Sep 18, 2024.

README.md (1 addition, 0 deletions)

````diff
@@ -116,6 +116,7 @@ replication_repair_max_attempts: 3
 external_replication_type: off
 show_only_gtid_diff: False
+force_switchover: False
 ```
 
 ### Usage
````

internal/app/app.go (15 additions, 0 deletions)

```diff
@@ -1208,6 +1208,21 @@ func (app *App) performSwitchover(clusterState map[string]*NodeState, activeNode
 		}
 		node := app.cluster.Get(host)
 		// in case node is a master
+
+		if app.config.ForceSwitchover {
+			err := node.SetOfflineForce()
+			if err != nil {
+				return fmt.Errorf("failed to set node %s force offline: %v", host, err)
+			}
+
+			defer func() {
+				err := node.SetOnline()
+				if err != nil {
+					app.logger.Errorf("failed to set node %s online after setting force offline: %v", host, err)
+				}
+			}()
+		}
+
 		err := node.SetReadOnly(true)
 		if err != nil || app.emulateError("freeze_ro") {
 			app.logger.Infof("switchover: failed to set node %s read-only, trying kill bad queries: %v", host, err)
```

internal/config/config.go (2 additions, 0 deletions)

```diff
@@ -97,6 +97,7 @@ type Config struct {
 	ReplMonErrorWaitInterval time.Duration `config:"repl_mon_error_wait_interval" yaml:"repl_mon_error_wait_interval"`
 	ReplMonSlaveWaitInterval time.Duration `config:"repl_mon_slave_wait_interval" yaml:"repl_mon_slave_wait_interval"`
 	ShowOnlyGTIDDiff bool `config:"show_only_gtid_diff" yaml:"show_only_gtid_diff"`
+	ForceSwitchover bool `config:"force_switchover" yaml:"force_switchover"` // TODO: Remove when we will be sure it's right way to do switchover
 }
 
 // DefaultConfig returns default configuration for MySync
@@ -182,6 +183,7 @@ func DefaultConfig() (Config, error) {
 		ReplMonErrorWaitInterval: 10 * time.Second,
 		ReplMonSlaveWaitInterval: 10 * time.Second,
 		ShowOnlyGTIDDiff: false,
+		ForceSwitchover: false,
 	}
 	return config, nil
 }
```
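
Since `ForceSwitchover` defaults to `false`, existing deployments keep the old switchover path unless they opt in. The diff does not show MySync's config loader, so the following is a minimal sketch of the defaults-then-override pattern, assuming a decoder that, like Go's `encoding/json`, leaves keys absent from the document untouched. JSON stands in for YAML so the example needs only the standard library; `Config`, `defaultConfig` and `load` here are trimmed-down hypothetical stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for MySync's Config.
// JSON tags replace the yaml/config tags used in the real struct.
type Config struct {
	ShowOnlyGTIDDiff bool `json:"show_only_gtid_diff"`
	ForceSwitchover  bool `json:"force_switchover"`
}

// defaultConfig mirrors the patched DefaultConfig: the new flag is off.
func defaultConfig() Config {
	return Config{ShowOnlyGTIDDiff: false, ForceSwitchover: false}
}

// load starts from the defaults and overlays the user's document;
// encoding/json only assigns fields whose keys are present.
func load(doc []byte) (Config, error) {
	cfg := defaultConfig()
	if err := json.Unmarshal(doc, &cfg); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := load([]byte(`{"show_only_gtid_diff": true}`))
	fmt.Println(cfg.ForceSwitchover, err)
}
```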

internal/mysql/node.go (8 additions, 0 deletions)

```diff
@@ -833,6 +833,14 @@ func (n *Node) SetOnline() error {
 	return n.exec(queryDisableOfflineMode, nil)
 }
 
+func (n *Node) SetOfflineForce() error {
+	err := n.SemiSyncDisable()
+	if err != nil {
+		return err
+	}
+	return n.SetOffline()
+}
+
 // ChangeMaster changes master of MySQL Node, demoting it to slave
 func (n *Node) ChangeMaster(host string) error {
 	useSsl := 0
```
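
`SetOfflineForce` chains two existing operations and stops at the first failure: semi-sync is disabled first, and only then is the node put into offline mode. The diff does not show the SQL behind `SemiSyncDisable` or `SetOffline`, so the two statements below are assumptions chosen for illustration (the variable `rpl_semi_sync_master_enabled` does appear in the feature files). `execer`, `recorder` and `setOfflineForce` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// execer is a hypothetical stand-in for the node's query runner.
type execer interface {
	Exec(query string) error
}

// recorder logs every statement and fails on one chosen statement.
type recorder struct {
	executed []string
	failOn   string
}

func (r *recorder) Exec(query string) error {
	r.executed = append(r.executed, query)
	if query == r.failOn {
		return errors.New("exec failed: " + query)
	}
	return nil
}

// Assumed statements; the PR does not show the actual queries.
const (
	qSemiSyncDisable = "SET GLOBAL rpl_semi_sync_master_enabled = 0"
	qOfflineOn       = "SET GLOBAL offline_mode = ON"
)

// setOfflineForce mirrors Node.SetOfflineForce: disable semi-sync, then
// enable offline mode, returning the first error encountered.
func setOfflineForce(db execer) error {
	if err := db.Exec(qSemiSyncDisable); err != nil {
		return err
	}
	return db.Exec(qOfflineOn)
}

func main() {
	r := &recorder{}
	fmt.Println(setOfflineForce(r), r.executed)
}
```

If the real queries do toggle MySQL's `offline_mode` variable (the name `queryDisableOfflineMode` suggests so), the MySQL reference manual describes its effect: connected clients without the `CONNECTION_ADMIN` or `SUPER` privilege are disconnected on their next request, and new connections from such clients are refused. That would explain how the change helps a switchover when many connections appear.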

tests/features/switchover_from.feature (9 additions, 1 deletion)

```diff
@@ -37,7 +37,11 @@ Feature: manual switchover from old master
 }
 """
 
-Scenario: if switchover was approved, it will not be rejected
+Scenario Outline: if switchover was approved, it will not be rejected
+Given cluster environment is
+"""
+FORCE_SWITCHOVER=<force_switchover>
+"""
 Given cluster is up and running
 Then zookeeper node "/test/active_nodes" should match json_exactly within "20" seconds
 """
@@ -71,6 +75,10 @@ Feature: manual switchover from old master
 }
 }
 """
+Examples:
+| force_switchover |
+| true |
+| false |
 
 Scenario Outline: switchover from works on healthy cluster
 Given cluster environment is
```

tests/features/switchover_to.feature (9 additions, 2 deletions)

```diff
@@ -1,7 +1,11 @@
 Feature: manual switchover to new master
 
 
-Scenario: switchover on kill all running query on old master
+Scenario Outline: switchover on kill all running query on old master
+Given cluster environment is
+"""
+FORCE_SWITCHOVER=<force_switchover>
+"""
 Given cluster is up and running
 Then mysql host "mysql1" should be master
 And mysql host "mysql2" should be replica of "mysql1"
@@ -54,6 +58,10 @@ Feature: manual switchover to new master
 And mysql host "mysql3" should have variable "rpl_semi_sync_master_enabled" set to "0"
 And mysql replication on host "mysql3" should run fine within "3" seconds
 And mysql host "mysql3" should be read only
+Examples:
+| force_switchover |
+| true |
+| false |
 
 Scenario Outline: switchover to works on healthy cluster
 Given cluster environment is
@@ -274,4 +282,3 @@ Feature: manual switchover to new master
 """
 mysql2 is not active
 """
-
```

tests/images/mysql/mysync.yaml (1 addition, 0 deletions)

```diff
@@ -62,3 +62,4 @@ replication_channel: ''
 external_replication_type: 'external'
 show_only_gtid_diff: false
 repl_mon: ${REPL_MON:-false}
+force_switchover: ${FORCE_SWITCHOVER:-false}
```
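
In the test image, the flag is read from the `FORCE_SWITCHOVER` environment variable and falls back to `false`, which is presumably how the `FORCE_SWITCHOVER=<force_switchover>` lines in the feature files reach MySync. The diff does not say which component expands `${VAR:-default}`; the sketch below assumes shell-style semantics (use the default when the variable is unset or empty) and implements them with Go's `os.Expand`, which passes everything between the braces to the mapping function. `expandWithDefaults` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandWithDefaults resolves ${NAME} and ${NAME:-default} placeholders.
// os.Expand hands the mapping function the full text between the braces,
// so "FORCE_SWITCHOVER:-false" arrives as one string and is split here.
// lookup is injected for testability; pass os.LookupEnv in normal use.
func expandWithDefaults(s string, lookup func(string) (string, bool)) string {
	return os.Expand(s, func(expr string) string {
		name, def, hasDefault := strings.Cut(expr, ":-")
		if v, ok := lookup(name); ok && v != "" {
			return v
		}
		if hasDefault {
			return def
		}
		return ""
	})
}

func main() {
	line := "force_switchover: ${FORCE_SWITCHOVER:-false}"
	fmt.Println(expandWithDefaults(line, os.LookupEnv))
}
```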