Address flakiness of vtgate_vindex.prefixfanout tests (vitessio#10216)
* Wait for vtgate and tablets to be healthy in prefixfanout tests

Signed-off-by: Matt Lord <[email protected]>

* Setup already waits for vtgate process to be healthy

Signed-off-by: Matt Lord <[email protected]>

* Let's give tablets more time to become healthy

Sometimes GitHub Actions is *super* slow and our tests should
still be able to pass.

Signed-off-by: Matt Lord <[email protected]>

* Wait longer and check more frequently

Signed-off-by: Matt Lord <[email protected]>

* Mark vtgate_vindex test as heavy

Signed-off-by: Matt Lord <[email protected]>

* 60s is a more than reasonable upper limit in tablet+mysqld startup

Signed-off-by: Matt Lord <[email protected]>

* Also rename 17->vtgate_general and mark as heavy

Signed-off-by: Matt Lord <[email protected]>

* Update test config to match workflow renames

Signed-off-by: Matt Lord <[email protected]>

* Rename 20 to xb_backup

And get related files aligned

Signed-off-by: Matt Lord <[email protected]>

* Actually wait for all tablets in all shards to be healthy

We were waiting for 1 replica tablet when the cluster defined
for the test did not have any replica tablets.

Signed-off-by: Matt Lord <[email protected]>

* We need to reset the replica and rdonly tablet counts for each shard

Signed-off-by: Matt Lord <[email protected]>

* Add log msg and get rid of extra flags added

Signed-off-by: Matt Lord <[email protected]>

* run CI tests again

Signed-off-by: Matt Lord <[email protected]>

* run CI tests one last time for goodness sake

Signed-off-by: Matt Lord <[email protected]>

* Minor correction to new Info log msg

Signed-off-by: Matt Lord <[email protected]>
mattlord authored May 5, 2022
1 parent 46cb467 commit f7d6c0d
Showing 8 changed files with 90 additions and 37 deletions.
@@ -1,9 +1,9 @@
# DO NOT MODIFY: THIS FILE IS GENERATED USING "make generate_ci_workflows"

name: Cluster (17)
name: Cluster (vtgate_general_heavy)
on: [push, pull_request]
concurrency:
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (17)')
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (vtgate_general_heavy)')
cancel-in-progress: true

env:
@@ -13,7 +13,7 @@ env:

jobs:
build:
name: Run endtoend tests on Cluster (17)
name: Run endtoend tests on Cluster (vtgate_general_heavy)
runs-on: ubuntu-18.04

steps:
@@ -93,8 +93,27 @@ jobs:
set -x
# Increase our local ephemeral port range as we could exhaust this
sudo sysctl -w net.ipv4.ip_local_port_range="22768 61999"
# Increase our open file descriptor limit as we could hit this
ulimit -n 65536
cat <<-EOF>>./config/mycnf/mysql57.cnf
innodb_buffer_pool_dump_at_shutdown=OFF
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_size=64M
innodb_doublewrite=OFF
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT
innodb_numa_interleave=ON
innodb_adaptive_hash_index=OFF
sync_binlog=0
sync_relay_log=0
performance_schema=OFF
slow-query-log=OFF
EOF
# run the tests however you normally do, then produce a JUnit XML file
eatmydata -- go run test.go -docker=false -follow -shard 17 | tee -a output.txt | go-junit-report -set-exit-code > report.xml
eatmydata -- go run test.go -docker=false -follow -shard vtgate_general_heavy | tee -a output.txt | go-junit-report -set-exit-code > report.xml
- name: Print test output and Record test result in launchable
if: steps.changes.outputs.end_to_end == 'true' && always()
@@ -1,9 +1,9 @@
# DO NOT MODIFY: THIS FILE IS GENERATED USING "make generate_ci_workflows"

name: Cluster (vtgate_vindex)
name: Cluster (vtgate_vindex_heavy)
on: [push, pull_request]
concurrency:
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (vtgate_vindex)')
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (vtgate_vindex_heavy)')
cancel-in-progress: true

env:
@@ -13,7 +13,7 @@ env:

jobs:
build:
name: Run endtoend tests on Cluster (vtgate_vindex)
name: Run endtoend tests on Cluster (vtgate_vindex_heavy)
runs-on: ubuntu-18.04

steps:
@@ -93,8 +93,27 @@ jobs:
set -x
# Increase our local ephemeral port range as we could exhaust this
sudo sysctl -w net.ipv4.ip_local_port_range="22768 61999"
# Increase our open file descriptor limit as we could hit this
ulimit -n 65536
cat <<-EOF>>./config/mycnf/mysql57.cnf
innodb_buffer_pool_dump_at_shutdown=OFF
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_size=64M
innodb_doublewrite=OFF
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT
innodb_numa_interleave=ON
innodb_adaptive_hash_index=OFF
sync_binlog=0
sync_relay_log=0
performance_schema=OFF
slow-query-log=OFF
EOF
# run the tests however you normally do, then produce a JUnit XML file
eatmydata -- go run test.go -docker=false -follow -shard vtgate_vindex | tee -a output.txt | go-junit-report -set-exit-code > report.xml
eatmydata -- go run test.go -docker=false -follow -shard vtgate_vindex_heavy | tee -a output.txt | go-junit-report -set-exit-code > report.xml
- name: Print test output and Record test result in launchable
if: steps.changes.outputs.end_to_end == 'true' && always()
@@ -1,9 +1,9 @@
# DO NOT MODIFY: THIS FILE IS GENERATED USING "make generate_ci_workflows"

name: Cluster (20)
name: Cluster (xb_backup)
on: [push, pull_request]
concurrency:
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (20)')
group: format('{0}-{1}', ${{ github.ref }}, 'Cluster (xb_backup)')
cancel-in-progress: true

env:
@@ -19,7 +19,7 @@ env:

jobs:
build:
name: Run endtoend tests on Cluster (20)
name: Run endtoend tests on Cluster (xb_backup)
runs-on: ubuntu-18.04

steps:
@@ -112,7 +112,7 @@ jobs:
set -x
# run the tests however you normally do, then produce a JUnit XML file
eatmydata -- go run test.go -docker=false -follow -shard 20 | tee -a output.txt | go-junit-report -set-exit-code > report.xml
eatmydata -- go run test.go -docker=false -follow -shard xb_backup | tee -a output.txt | go-junit-report -set-exit-code > report.xml
- name: Print test output and Record test result in launchable
if: steps.changes.outputs.end_to_end == 'true' && always()
31 changes: 19 additions & 12 deletions go/test/endtoend/cluster/cluster_process.go
@@ -683,25 +683,32 @@ func (cluster *LocalProcessCluster) RestartVtgate() (err error) {
 	return err
 }

-// WaitForTabletsToHealthyInVtgate waits for all tablets in all shards to be healthy as per vtgate
+// WaitForTabletsToHealthyInVtgate waits for all tablets in all shards to be seen as
+// healthy and serving in vtgate.
+// For each shard:
+// - It must have 1 (and only 1) healthy primary tablet so we always wait for that
+// - For replica and rdonly tablets, which are optional, we wait for as many as we
+// should have based on how the cluster was defined.
 func (cluster *LocalProcessCluster) WaitForTabletsToHealthyInVtgate() (err error) {
-	var isRdOnlyPresent bool
 	for _, keyspace := range cluster.Keyspaces {
 		for _, shard := range keyspace.Shards {
-			isRdOnlyPresent = false
-			if err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.primary", keyspace.Name, shard.Name), 1); err != nil {
-				return err
+			rdonlyTabletCount, replicaTabletCount := 0, 0
+			for _, tablet := range shard.Vttablets {
+				switch strings.ToLower(tablet.Type) {
+				case "replica":
+					replicaTabletCount++
+				case "rdonly":
+					rdonlyTabletCount++
+				}
 			}
-			if err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.replica", keyspace.Name, shard.Name), 1); err != nil {
+			if err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.primary", keyspace.Name, shard.Name), 1); err != nil {
 				return err
 			}
-			for _, tablet := range shard.Vttablets {
-				if tablet.Type == "rdonly" {
-					isRdOnlyPresent = true
-				}
+			if replicaTabletCount > 0 {
+				err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.replica", keyspace.Name, shard.Name), replicaTabletCount)
 			}
-			if isRdOnlyPresent {
-				err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.rdonly", keyspace.Name, shard.Name), 1)
+			if rdonlyTabletCount > 0 {
+				err = cluster.VtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.rdonly", keyspace.Name, shard.Name), rdonlyTabletCount)
 			}
 			if err != nil {
 				return err
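The per-shard counting in this hunk can be tried in isolation. The following is a minimal sketch, not the code from the commit: `Tablet` and `expectedCounts` are hypothetical names introduced here, and only the `Type` field and the case-insensitive `switch` mirror what the diff shows.

```go
package main

import (
	"fmt"
	"strings"
)

// Tablet is a hypothetical stand-in for the cluster package's tablet type;
// only the Type field matters for this sketch.
type Tablet struct {
	Type string
}

// expectedCounts returns how many replica and rdonly tablets a shard defines.
// The counters are local to the call, so they start at zero for every shard.
func expectedCounts(tablets []Tablet) (replicas, rdonlys int) {
	for _, t := range tablets {
		switch strings.ToLower(t.Type) {
		case "replica":
			replicas++
		case "rdonly":
			rdonlys++
		}
	}
	return replicas, rdonlys
}

func main() {
	shard := []Tablet{{Type: "primary"}, {Type: "REPLICA"}, {Type: "rdonly"}, {Type: "replica"}}
	replicas, rdonlys := expectedCounts(shard)
	fmt.Printf("replicas=%d rdonlys=%d\n", replicas, rdonlys)
}
```

A shard with only a primary yields zero for both counts, which is the case the old code got wrong by always waiting for 1 replica.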
4 changes: 3 additions & 1 deletion go/test/endtoend/cluster/vtgate_process.go
@@ -178,7 +178,9 @@ func (vtgate *VtgateProcess) GetStatusForTabletOfShard(name string, endPointsCou
 // WaitForStatusOfTabletInShard function waits till status of a tablet in shard is 1
 // endPointsCount: how many endpoints to wait for
 func (vtgate *VtgateProcess) WaitForStatusOfTabletInShard(name string, endPointsCount int) error {
-	timeout := time.Now().Add(15 * time.Second)
+	log.Infof("Waiting for healthy status of %d %s tablets in cell %s",
+		endPointsCount, name, vtgate.Cell)
+	timeout := time.Now().Add(30 * time.Second)
 	for time.Now().Before(timeout) {
 		if vtgate.GetStatusForTabletOfShard(name, endPointsCount) {
 			return nil
6 changes: 6 additions & 0 deletions go/test/endtoend/vtgate/prefixfanout/main_test.go
@@ -140,10 +140,16 @@ func TestMain(m *testing.M) {
 		}

 		// Start vtgate
+		// This waits for the vtgate process to be healthy
 		if err := clusterInstance.StartVtgate(); err != nil {
 			return 1
 		}

+		// Wait for the cluster to be running and healthy
+		if err := clusterInstance.WaitForTabletsToHealthyInVtgate(); err != nil {
+			return 1
+		}
+
 		return m.Run()
 	}()
 	os.Exit(exitCode)
8 changes: 4 additions & 4 deletions test/ci_workflow_gen.go
@@ -52,9 +52,9 @@ var (
 		"ers_prs_newfeatures_heavy",
 		"15",
 		"shardedrecovery_stress_verticalsplit_heavy",
-		"17",
+		"vtgate_general_heavy",
 		"19",
-		"20",
+		"xb_backup",
 		"21",
 		"22",
 		"worker_vault_heavy",
@@ -92,7 +92,7 @@ var (
 		"vtgate_topo_etcd",
 		"vtgate_transaction",
 		"vtgate_unsharded",
-		"vtgate_vindex",
+		"vtgate_vindex_heavy",
 		"vtgate_vschema",
 		"vtgate_queries",
 		"vtgate_schema_tracker",
@@ -116,7 +116,7 @@ var (
 	}
 	clusterDockerList = []string{}
 	clustersRequiringXtraBackup = []string{
-		"20",
+		"xb_backup",
 		"xb_recovery",
 	}
 	clustersRequiringMakeTools = []string{
16 changes: 8 additions & 8 deletions test/config.json
@@ -152,7 +152,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/backup/xtrabackup"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "20",
+			"Shard": "xb_backup",
 			"RetryMax": 2,
 			"Tags": []
 		},
@@ -161,7 +161,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/backup/xtrabackupstream"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "20",
+			"Shard": "xb_backup",
 			"RetryMax": 1,
 			"Tags": []
 		},
@@ -647,7 +647,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "17",
+			"Shard": "vtgate_general_heavy",
 			"RetryMax": 2,
 			"Tags": []
 		},
@@ -809,7 +809,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate/sequence"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "17",
+			"Shard": "vtgate_general_heavy",
 			"RetryMax": 1,
 			"Tags": []
 		},
@@ -926,7 +926,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate/createdb_plugin"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "17",
+			"Shard": "vtgate_general_heavy",
 			"RetryMax": 1,
 			"Tags": []
 		},
@@ -989,7 +989,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate/errors_as_warnings"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "17",
+			"Shard": "vtgate_general_heavy",
 			"RetryMax": 1,
 			"Tags": []
 		},
@@ -998,7 +998,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate/prefixfanout"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "vtgate_vindex",
+			"Shard": "vtgate_vindex_heavy",
 			"RetryMax": 1,
 			"Tags": []
 		},
@@ -1007,7 +1007,7 @@
 			"Args": ["vitess.io/vitess/go/test/endtoend/vtgate/vindex_bindvars"],
 			"Command": [],
 			"Manual": false,
-			"Shard": "vtgate_vindex",
+			"Shard": "vtgate_vindex_heavy",
 			"RetryMax": 2,
 			"Tags": []
 		},
