-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pxc 4469 sst clone #1961
base: 8.0
Are you sure you want to change the base?
Pxc 4469 sst clone #1961
Conversation
This version has a patch to go straight to ist so is not fully functional unless removing that bypass
SSL is NOT supported
bug. Wrong comma
fixing config missed wsrep allowed methods
fixing config missed wsrep allowed methods
fixing config missed wsrep allowed methods
…tcat is not running
I have a question... |
Need to ad extension or make script will not identify them correctly
…the user Some edit/cleanup as suggested by Kamil
@@ -1,7 +1,7 @@ | |||
--echo Performing State Transfer on a server that has been killed and restarted | |||
--echo while a DDL was in progress on it | |||
|
|||
--source include/have_debug.inc | |||
#source include/have_debug.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will cause another tests to fail
[sst] | ||
netcat_port=4442 | ||
wsrep-debug=true | ||
clone_instance_port=4444 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need clone_instance_port
parameter at all. Please see following comments in sst script for justification/reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do.
This may go through firewall in real life, and the use of port and ranges can be regulated. So better to keep the flexibility to define ports from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a way to configure this port.
See my other comment about wsrep_sst_receive_address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove these new params from cnf after rework
|
||
|
||
|
||
GRANT ALTER, CREATE, SELECT, INSERT ON PERCONA_SCHEMA.xtrabackup_history TO 'mysql.pxc.sst.role'@localhost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Idea it was there, not related to changes I have add.
@@ -1230,6 +1230,8 @@ int wsrep_remove_sst_user(bool initialize_thread) { | |||
nullptr, | |||
"SET SESSION lock_wait_timeout = 1;", | |||
nullptr, | |||
"REVOKE mysql.pxc.sst.role FROM 'mysql.pxc.sst.user'@'localhost';", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discuss this.
The correct way to remove the user when having a role, is to revoke the role and then drop the user.
Just dropping the user will work, but the role cleanup is delegated to MySQL, and as we have identified with mtr tests on very fast machines this looks like to be asynchronous.
So to avoid problems and to be on the safe side, we should operate following the right execution. Role first then drop user.
As we spoke today, we need to test it with encrypted tables and keyring component/plugin enabled |
- removed the need to perform the restart to clean up the final instance. - use wsrep-sst-receive-address to get IP and port definition - use one port for Netcat and MySQL - some code cleanup - Add message suppression when reporting clone Status %. It will be reported only if different from previous value.
If you outline what kind of operation are you expecting to test when talking about data encryption I can see how clone deal with it. Also if think this can be step 2 |
@@ -0,0 +1,1113 @@ | |||
#!/usr/bin/env bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the wsrep_sst*.sh
files have 775 as the default file permission. Can you also do the same for wsrep_sst_clone.sh
as well?
-rw-rw-r-- 1 venki venki 42263 Nov 21 22:17 wsrep_sst_clone.sh
-rwxrwxr-x 1 venki venki 35144 Jan 26 2024 wsrep_sst_common.sh*
-rwxrwxr-x 1 venki venki 23466 Jun 28 2023 wsrep_sst_rsync.sh*
-rwxrwxr-x 1 venki venki 102235 Mar 7 2024 wsrep_sst_xtrabackup-v2.sh*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also if I was expecting to have these thing standardized by the building merging process. but ok will do
CALL mtr.add_suppression("Found an entry in the 'role_edges' table with unknown authorization ID '`mysql.pxc.sst.user`@`localhost`'" ); | ||
CALL mtr.add_suppression("Found an entry in the 'default_roles' table with unknown authorization ID '`mysql.pxc.sst.user`@`localhost`'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this error means that something is not correct with the mysql.pxc.sst.user
. We need to do figure out why we get this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed because you guys were not cleaning up the user correctly.
The change in wsrep_sst about revoke role is exactly for this.
# │ Joiner starts │ | ||
# └────────┬─────────┘ | ||
# ┌────────▼─────────┐ | ||
# │open NetCat listnr│ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# │open NetCat listnr│ | |
# │Open NetCat listnr│ |
readonly EINVAL=22 | ||
readonly EPIPE=32 | ||
readonly ETIMEDOUT=110 | ||
progress_monitor="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
progress_monitor
appears unused. Please check once.
CLEANUP_CLONE_SSL="" | ||
CLONE_USER="" | ||
NC_PID="" | ||
RP_PURGED_EMERGENCY="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment what's this variable is for?
|
||
# we stay on hold now, waiting for the Joiner to expose the service | ||
wsrep_log_info "-> WAIT for Joiner MySQL to be available nc -w 1 -i 1 $SST_HOST_STRIPPED $WSREP_SST_OPT_REMOTE_HOSTPORT" | ||
while [ 1 == 1 ]; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better readability
while [ 1 == 1 ]; do | |
while true; do |
CLONE_SSL_CA="NULL" | ||
else | ||
CLEANUP_CLONE_PLUGIN="" | ||
CLONE_SSL_CERT=`$MYSQL_ACLIENT -e "SELECT @@clone_ssl_cert"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my tests, the following command failed with the following error and thus SSL was not used.
2024-11-22T05:39:30.038722Z 0 [Note] [MY-000000] [WSREP-SST] ERROR 1193 (HY000) at line 1: Unknown system variable 'clone_ssl_cert'
Is there any race condition because of which this can happen?
if [ -z "$CLIENT_SSL_CERT" -o -z "$CLIENT_SSL_KEY" ] | ||
then | ||
CLIENT_SSL_CERT=$(parse_cnf client ssl_cert "") | ||
CLIENT_SSL_KEY=$(parse_cnf client ssl_key "") | ||
CLIENT_SSL_CA=$(parse_cnf client ssl_ca "") | ||
CLIENT_SSL_MODE=$(parse_cnf client ssl_mode "") | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a fallback method, why dont we read in the mysqld
section for the ssl certificates?
The XB SST script reads the certs from mysql section if it was not specified in the [sst] section
CLONE_EXECUTE_SQL=$(wsrep_mktemp_in_dir "$WSREP_SST_OPT_DATA" --suffix=.sql clone_execute_XXXX) | ||
CLONE_PREPARE_SQL=$(wsrep_mktemp_in_dir "$WSREP_SST_OPT_DATA" --suffix=.sql clone_prepare_XXXX) | ||
|
||
cleanup_donor # Remove potentially existing clone user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we call cleanup_donor
to remove the potentially existing clone user, however this will run a DROP USER IF EXISTS 'clone_sst'@'%'
in TOI and is replicated to other nodes as well. Is this something we want?
2024-11-22T08:58:32.975616Z 0 [Note] [MY-000000] [WSREP-SST] (debug) Cleanup MySQL ADMIN_PSWD: yx9!A-ezgMrQz5Q7HdKLwD7rEqHzJ0Ot
2024-11-22T08:58:32.975760Z 0 [Note] [MY-000000] [WSREP-SST] (debug) Cleanup MySQL MYSQL_ACLIENT: /home/venki/work/pxc/merge/80/bld/bin/mysql --no-defaults -umysql.pxc.sst.user -S/home/venki/work/pxc/merge/80/bld/mysql-test/var/tmp/mysqld.1.sock --batch --skip_column_names --silent
2024-11-22T08:58:32.986956Z 13 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:32.987820Z 0 [Note] [MY-000000] [WSREP-SST] ERROR 1193 (HY000) at line 1: Unknown system variable 'clone_ssl_cert'
2024-11-22T08:58:32.997468Z 14 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.004851Z 14 [Note] [MY-000000] [WSREP] Executing Query (DROP USER IF EXISTS 'clone_sst'@'%') with write-set (-1) and exec_mode: local in TO Isolation mode
2024-11-22T08:58:33.004913Z 14 [Note] [MY-000000] [WSREP] wsrep: initiating TOI for write set (-1)
2024-11-22T08:58:33.005813Z 14 [Note] [MY-000000] [WSREP] Query (DROP USER IF EXISTS 'clone_sst'@'%') with write-set (3) and exec_mode: toi replicated in TO Isolation mode
2024-11-22T08:58:33.005835Z 14 [Note] [MY-000000] [WSREP] wsrep: TO isolation initiated for write set (3)
2024-11-22T08:58:33.011278Z 14 [Note] [MY-000000] [WSREP] TO END: 3: DROP USER IF EXISTS 'clone_sst'@'%'
2024-11-22T08:58:33.011306Z 14 [Note] [MY-000000] [WSREP] wsrep: completed TOI write set (3)
2024-11-22T08:58:33.011323Z 14 [Note] [MY-000000] [WSREP] Setting WSREPXid (InnoDB): f0d030d8-a8af-11ef-927b-272a1375415c:3
2024-11-22T08:58:33.011372Z 14 [Note] [MY-000000] [WSREP] Updating WSREPXid: f0d030d8-a8af-11ef-927b-272a1375415c:3
2024-11-22T08:58:33.013101Z 14 [Note] [MY-000000] [WSREP] TO END: -1
2024-11-22T08:58:33.013812Z 14 [Warning] [MY-000000] [WSREP] Toggling wsrep_on to OFF will affect sql_log_bin. Check manual for more details
2024-11-22T08:58:33.019913Z 0 [Note] [MY-000000] [WSREP-SST] Cleanup DONOR DONE.
2024-11-22T08:58:33.019989Z 0 [Note] [MY-000000] [WSREP-SST] (debug) -> PREPARE DONOR
2024-11-22T08:58:33.033988Z 15 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.034750Z 15 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.040082Z 0 [Note] [MY-000000] [WSREP-SST] Installing CLONE plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
This is probably better to run this on the donor only!
wsrep_log_info "Server SSL settings on $ROLE: CERT=$SERVER_SSL_CERT, KEY=$SERVER_SSL_KEY, CA=$SERVER_SSL_CA" | ||
wsrep_log_info "Client SSL settings on $ROLE: CERT=$CLIENT_SSL_CERT, KEY=$CLIENT_SSL_KEY, CA=$CLIENT_SSL_CA" | ||
wsrep_log_info "CLONE SSL settings on $ROLE: CERT=$CLONE_SSL_CERT, KEY=$CLONE_SSL_KEY, CA=$CLONE_SSL_CA" | ||
REQUIRE_SSL="REQUIRE SSL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we set the REQURIE_SSL variable to REQUIRE SSL
, but it gets overwritten in line number 428? Is this expected?
This is what we will get in the error log even when both server and client have the ssl certificates in place.
2024-11-22T09:05:44.965420Z 0 [Note] [MY-000000] [WSREP-SST] Installing CLONE plugin
2024-11-22T09:05:45.086583Z 0 [Note] [MY-000000] [WSREP-SST] CLONE SSL not configured. Checking general MySQL SSL settings.
2024-11-22T09:05:45.086607Z 0 [Note] [MY-000000] [WSREP-SST] Using SSL configuration from MySQL Server.
2024-11-22T09:05:45.112218Z 0 [Note] [MY-000000] [WSREP-SST] Server SSL settings on recipient: CERT=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-cert.pem, KEY=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-key.pem, CA=/home/venki/work/pxc/merge/80/mysql-test/std_data/cacert.pem
2024-11-22T09:05:45.112236Z 0 [Note] [MY-000000] [WSREP-SST] Client SSL settings on recipient: CERT=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-cert.pem, KEY=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-key.pem, CA=/home/venki/work/pxc/merge/80/mysql-test/std_data/cacert.pem
2024-11-22T09:05:45.112256Z 0 [Note] [MY-000000] [WSREP-SST] CLONE SSL settings on recipient: CERT=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-cert.pem, KEY=/home/venki/work/pxc/merge/80/mysql-test/std_data/server-key.pem, CA=/home/venki/work/pxc/merge/80/mysql-test/std_data/cacert.pem
2024-11-22T09:05:45.112321Z 0 [Note] [MY-000000] [WSREP-SST] SSL Not supported yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I have intentionally disabled SSL until we are ok with the global process.
Then will focus on SSL. Have SSL between the Donor/joiner is not a big problem if the certificates are correctly installed. More cumbersome is to deal with the temp instance doing the clone action. But most of the logic is already in the script, I just ignore it in favour of the clone mechanics.
[sst] | ||
netcat_port=4442 | ||
wsrep-debug=true | ||
clone_instance_port=4444 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove these new params from cnf after rework
# │Waits for IST │ | ||
# └─────────────────────────┘ | ||
#---------------------------------------------------------------------------- | ||
# Config variables: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant anymore
WSREP_SST_OPT_REMOTE_JOINER_PSWD="" | ||
|
||
echo "continue" # donor can resume updating data | ||
WSREP_SST_OPT_REMOTE_AUTH=$(echo $WSREP_SST_OPT_ADDR_LOCAL | cut -d '@' -f 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now --address
parameter is in format clone_user:clone_passwd@joiner_ip:joiner:port
Why does it not conform to the format expected by --address
in sst_common (ip:port/path
)?
We are doing the full parsing here instead of taking the advantage of
what we already have in wsrep_sst_common.sh
If it was like joiner_ip:joiner_port/clone_user:clone_passwd
wsrep_sst_common, parses this into tokens
WSREP_SST_OPT_HOST -> joiner_ip
WSREP_SST_OPT_PORT -> joiner_port
WSREP_SST_OPT_PATH -> clone_user:clone_passwd
Then we just need to split WSREP_SST_OPT_PATH here.
WSREP_SST_OPT_REMOTE_HOST_WITH_PORT=$(echo $WSREP_SST_OPT_ADDR_LOCAL | cut -d '@' -f 2) | ||
WSREP_SST_OPT_REMOTE_HOST=$(echo $WSREP_SST_OPT_REMOTE_HOST_WITH_PORT | cut -d ':' -f 1) | ||
WSREP_SST_OPT_REMOTE_HOSTPORT=$(echo $WSREP_SST_OPT_REMOTE_HOST_WITH_PORT | cut -d ':' -f 2) | ||
TSST_PORT=$WSREP_SST_OPT_REMOTE_HOSTPORT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need two variables for the same. Let's remove TSST_PORT
and use WSREP_SST_OPT_REMOTE_HOSTPORT
cat $CLONE_PREPARE_SQL >> /dev/stderr | ||
exit $RC | ||
fi | ||
#Before waiting for the Joiner Clone mysql we send out the message this is a SST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple occurences.
Comments format is inconsistent. Please add a space after leading hash.
JOINER_CLONE_HOST="$WSREP_SST_OPT_ADDR_LOCAL" | ||
JOINER_CLONE_PORT=$CLONE_INSTANCE_PORT | ||
fi | ||
TSST_PORT=$JOINER_CLONE_PORT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two variables for the same. Let's use JOINER_CLONE_PORT
(or rename it just to JOINER_SST_PORT
and remove TSST_PORT
|
||
# Report clone credentials/address to the caller | ||
wsrep_log_debug "-> ready passing string |$CLONE_USER:$CLONE_PSWD]$JOINER_CLONE_HOST:$JOINER_CLONE_PORT|" | ||
echo "ready $CLONE_USER:$CLONE_PSWD]$JOINER_CLONE_HOST:$JOINER_CLONE_PORT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just pass it as $JOINER_CLONE_HOST:$JOINER_CLONE_PORT/$CLONE_USER:$CLONE_PSWD
This way it will be a lot easier to parse it on donor side (already in sst_common). Please see other related comments.
|
||
wsrep_log_debug "-> WAIT DIR DONOR MESSAGE DONE" | ||
|
||
if [[ ! `cat $WSREP_SST_OPT_DATA/XST_FILE.txt` =~ "SST@" ]];then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still didn't solve the problem of not full message in file
My proposal is to send from Donor such a string:
SST@${WSREP_SST_OPT_GTID}<EOF>
and then wait in a loop above until we receive <EOF>
substring in a file
then we can test the content of the file.
BTW. In the case of IST, shouldn't Donor signal by sending something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we have 2 messages.
SST or IST, there is NO other. So if not SST is IST, or as you fear a broken one. No need to add code to check the obvious (IST)
wsrep_log_debug "-> WAIT DIR DONOR MESSAGE DONE" | ||
|
||
if [[ ! `cat $WSREP_SST_OPT_DATA/XST_FILE.txt` =~ "SST@" ]];then | ||
wsrep_log_info "DONOR SAY IST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything being sent by Donor in case of IST. We should send something like
IST<EOF>
. This will allow the Joiner to immediately leave SST script.
@@ -1540,7 +1542,7 @@ static bool is_sst_request_valid(const std::string &msg) { | |||
Instead of this we will just allow alpha-num + a few special characters | |||
(colon, slash, dot, underscore, square brackets, hyphen). */ | |||
std::string data = msg.substr(method_len + 1, data_len); | |||
static const std::regex allowed_chars_regex("[\\w:/.[\\]-]+"); | |||
static const std::regex allowed_chars_regex("[\\w:/.[\\]@-]+"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this change if we pass the address in the form ip:port/path
Adding preview POC for the Clone method for SST.
The script is based on the one from codership but then need to refactor almost in full to have it working with PXC.
birdeye flow