MariaDB Replication Is Broken #1921
Where is this log? |
It is on the same path where the MariaDB data is stored:

root@b74866364bac:/data# pwd
/data
root@b74866364bac:/data# ls -la
total 131148
drwx------ 7 root root 4096 Jun 24 13:30 .
drwxr-xr-x 1 root root 4096 Jun 24 11:44 ..
-rw-r----- 1 root root 4096 Jun 24 11:44 ON.000001
-rw-r----- 1 root root 4096 Jun 24 11:44 ON.index
-rw-rw---- 1 root root 430080 Jun 24 11:51 aria_log.00000001
-rw-rw---- 1 root root 4096 Jun 24 11:44 aria_log_control
-rw-r----- 1 root root 24576 Jun 24 13:32 ddl_recovery.log
-rw-rw---- 1 root root 4096 Jun 24 11:44 ib_buffer_pool
-rw-r----- 1 root root 101715968 Jun 24 13:32 ib_logfile0
-rw-rw---- 1 root root 12718080 Jun 24 11:44 ibdata1
-rw-r----- 1 root root 12718080 Jun 24 11:44 ibtmp1
-rw-r----- 1 root root 0 Jun 24 11:51 master.info
-rw-r----- 1 root root 4096 Jun 24 11:44 multi-master.info
drwx------ 2 root root 4096 Jun 24 11:44 mysql
-rw-r--r-- 1 root root 4096 Jun 24 11:44 mysql_upgrade_info
-rw-r----- 1 root root 4096 Jun 24 11:51 mysqld-relay-bin.000001
-rw-r----- 1 root root 6610944 Jun 25 08:56 mysqld-relay-bin.000002
-rw-r----- 1 root root 4096 Jun 24 11:51 mysqld-relay-bin.index
drwx------ 2 root root 4096 Jun 24 11:44 performance_schema
-rw-r----- 1 root root 4096 Jun 24 11:51 relay-log.info
drwx------ 2 root root 4096 Jun 24 13:32 sbtest
drwx------ 2 root root 12288 Jun 24 11:44 sys
drwx------ 2 root root 4096 Jun 24 13:29 test |
The manifest file in the linked repo (https://github.com/enclaive/enclaive-docker-mariadb-sgx/blob/main/mariadb.manifest.template) doesn't seem to have the mount for that path. Could you show the manifest file that you're currently using? |
It is here actually; I am using the same file. |
The only changes I have made to that repository are:
git diff
diff --git a/conf/my.cnf b/conf/my.cnf
index 6161d85..c77d492 100644
--- a/conf/my.cnf
+++ b/conf/my.cnf
@@ -14,6 +14,7 @@ basedir = /usr
tmpdir = /tmp
bind-address = 0.0.0.0
skip-external-locking
+log_bin = ON
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
diff --git a/entrypoint.sh b/entrypoint.sh
index 543afaa..b8f4027 100755
--- a/entrypoint.sh
+++ b/entrypoint.sh
@@ -2,5 +2,5 @@
/aesmd.sh
-gramine-sgx-get-token --output mariadb.token --sig mariadb.sig
-gramine-sgx mariadb
+#gramine-sgx-get-token --output mariadb.token --sig mariadb.sig
+gramine-direct mariadb |
Ah, sorry, I missed line 20 in the manifest file. Now I can see that all files under /var/lib/mysql are mounted as encrypted. This is unfortunately not supported currently: Encrypted Files are assumed to be used by a single Gramine instance (a single SGX enclave). In your case, the encrypted file is used by two Gramine instances. Even though the encryption key is the same, the internal format of Encrypted Files is single-process as of now. For example, the file size is cached inside the Gramine instance and is not updated from the outside. This means that one Gramine instance can, for example, grow the encrypted file from 4KB to 8KB, but the other Gramine instance will still think that the encrypted file is only 4KB in size. So I think this is what happens in your MariaDB replication case: two MariaDB replicas try to access the same encrypted file in parallel. This is currently not supported, and I'm not sure Gramine will ever support such complex cases. |
But if I do not encrypt the data volume, I still get the same error. If encryption were the cause, wouldn't removing it solve the issue? I made this change in the manifest, built another image, and followed the same steps, and I still have the same issue:

git diff mariadb.manifest.template
diff --git a/mariadb.manifest.template b/mariadb.manifest.template
index 185ae17..7d50467 100644
--- a/mariadb.manifest.template
+++ b/mariadb.manifest.template
@@ -17,7 +17,7 @@ fs.mounts = [
{ path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
{ path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
{ path = "/app", uri = "file:/app" },
- { path = "/var/lib/mysql", uri = "file:/data", type = "encrypted", key_name = "default" },
+ { path = "/var/lib/mysql", uri = "file:/data" },
]
fs.insecure__keys.default = "ffeeddccbbaa99887766554433221100"
The manifest file in the Docker container:

grep data mariadb.manifest.template
{ path = "/var/lib/mysql", uri = "file:/data" },

The error:

MariaDB [sbtest]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.17.0.2
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: ON.000001
Read_Master_Log_Pos: 2098883
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 1050305
Relay_Master_Log_File: ON.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 1050013
Relay_Log_Space: 2099485
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-9
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Slave_DDL_Groups: 5
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 3 |
You also need to specify that whole directory as sgx.allowed_files.
Otherwise Gramine doesn't know what kind of files are contained in this directory (not encrypted, not trusted and not allowed -- so Gramine conservatively forbids access to these files). Note that this is only for debugging purposes; it is insecure! Also, did you purge all your Docker images and containers? It looks to me like you still have the old files. |
but I did not remove { path = "/var/lib/mysql", uri = "file:/data" },
Yes, I did. Since I use the same configuration it creates new files with the same names |
Yes, that's normal. It's because some FS mounts are self-sufficient (the ones with an explicit type, such as encrypted or tmpfs), while plain passthrough mounts additionally need their files listed as trusted or allowed. |
OK, I added it to trusted_files, purged all remnants from the older tests, and re-ran the same steps; however, I still have the same issue:

git diff mariadb.manifest.template
diff --git a/mariadb.manifest.template b/mariadb.manifest.template
index 185ae17..16a87ce 100644
--- a/mariadb.manifest.template
+++ b/mariadb.manifest.template
@@ -17,7 +17,7 @@ fs.mounts = [
{ path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
{ path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
{ path = "/app", uri = "file:/app" },
- { path = "/var/lib/mysql", uri = "file:/data", type = "encrypted", key_name = "default" },
+ { path = "/var/lib/mysql", uri = "file:/data" },
]
fs.insecure__keys.default = "ffeeddccbbaa99887766554433221100"
@@ -38,6 +38,7 @@ sgx.trusted_files = [
"file:/app/mariadbd",
"file:/app/init.sql",
"file:/app/mysql.tar.gz",
+ "file:/var/lib/mysql/",
"file:/etc/my.cnf",
"file:/usr/lib/mysql/plugin/", the same error, MariaDB [sbtest]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.17.0.2
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: ON.000001
Read_Master_Log_Pos: 2098754
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 1050176
Relay_Master_Log_File: ON.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0 |
By the way, regarding the statement above about two replicas accessing the same file in parallel: I have a single replica only, and it downloads the binary log from the master over TCP and replays the logs locally. So there are no multiple replicas here.
|
Unfortunately you can't add this path as trusted_files: trusted files are verified against hashes computed at build time and are read-only inside Gramine, so a writable database directory can't be trusted. By the way, since you added that path to trusted_files, doesn't MariaDB fail when it tries to create or modify files there? |
Oh, so the master process writes to the same relay log file that the replica then reads?
|
No, I added that path to trusted_files and the database can create new files on that path. This is probably because these are SGX-only options, but I use gramine-direct.
No, the master creates its binary log, which is called ON.000001, and the replica downloads it over the network and writes it into its own relay log (mysqld-relay-bin.*).
I moved the path to the sgx.allowed_files list. These are the database files on the replica before I create a replication stream; you can see the relay log file is not there yet since there is no replication:

/data# ls -la
total 12752
drwx------ 5 root root 4096 Jun 26 10:30 .
drwxr-xr-x 1 root root 4096 Jun 26 10:30 ..
-rw-r----- 1 root root 627 Jun 26 10:30 ON.000001
-rw-r----- 1 root root 12 Jun 26 10:30 ON.index
-rw-rw---- 1 root root 417792 Jun 26 10:30 aria_log.00000001
-rw-rw---- 1 root root 52 Jun 26 10:30 aria_log_control
-rw-r----- 1 root root 9 Jun 26 10:30 ddl_recovery.log
-rw-rw---- 1 root root 906 Jun 26 10:30 ib_buffer_pool
-rw-r----- 1 root root 100663296 Jun 26 10:30 ib_logfile0
-rw-rw---- 1 root root 12582912 Jun 26 10:30 ibdata1
-rw-r----- 1 root root 12582912 Jun 26 10:30 ibtmp1
-rw-r----- 1 root root 0 Jun 26 10:30 multi-master.info
drwx------ 2 root root 4096 Jun 26 10:30 mysql
-rw-r--r-- 1 root root 15 Jun 26 10:30 mysql_upgrade_info
drwx------ 2 root root 4096 Jun 26 10:30 performance_schema
drwx------ 2 root root 12288 Jun 26 10:30 sys

And once I create the replication stream and create a test database on the master, these are the database files on the replica. Here the relay log files have been created by MariaDB since we have a replication stream:

drwx------ 6 root root 4096 Jun 26 10:37 .
drwxr-xr-x 1 root root 4096 Jun 26 10:30 ..
-rw-r----- 1 root root 627 Jun 26 10:30 ON.000001
-rw-r----- 1 root root 12 Jun 26 10:30 ON.index
-rw-rw---- 1 root root 417792 Jun 26 10:35 aria_log.00000001
-rw-rw---- 1 root root 52 Jun 26 10:30 aria_log_control
-rw-r----- 1 root root 9 Jun 26 10:30 ddl_recovery.log
-rw-rw---- 1 root root 906 Jun 26 10:30 ib_buffer_pool
-rw-r----- 1 root root 100663296 Jun 26 10:37 ib_logfile0
-rw-rw---- 1 root root 12582912 Jun 26 10:30 ibdata1
-rw-r----- 1 root root 12582912 Jun 26 10:30 ibtmp1
-rw-r----- 1 root root 134 Jun 26 10:35 master.info
-rw-r----- 1 root root 0 Jun 26 10:30 multi-master.info
drwx------ 2 root root 4096 Jun 26 10:30 mysql
-rw-r--r-- 1 root root 15 Jun 26 10:30 mysql_upgrade_info
-rw-r----- 1 root root 310 Jun 26 10:35 mysqld-relay-bin.000001
-rw-r----- 1 root root 1052 Jun 26 10:37 mysqld-relay-bin.000002
-rw-r----- 1 root root 52 Jun 26 10:35 mysqld-relay-bin.index
drwx------ 2 root root 4096 Jun 26 10:30 performance_schema
-rw-r----- 1 root root 46 Jun 26 10:35 relay-log.info
drwx------ 2 root root 4096 Jun 26 10:37 sbtest
drwx------ 2 root root 12288 Jun 26 10:30 sys

The manifest file:

grep -A 3 allowed mariadb.manifest.template
sgx.allowed_files = [
"file:/var/lib/mysql/",
]

And the same replication issue:

MariaDB [sbtest]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.17.0.2
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: ON.000001
Read_Master_Log_Pos: 2098754
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 1050176
Relay_Master_Log_File: ON.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 1049884
Relay_Log_Space: 2099356
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-8
Replicate_Do_Domain_Ids: |
Sorry, yes, my stupid mistake. I forgot that you run with gramine-direct.
Can you check if the relay log file itself is already corrupted on disk? |
yes, |
Could you do some more investigation into whether the problem is in (2) -- the download of the binary log from the master -- or in (3) -- the writing/reading of the local relay log on the replica? See my previous comment. |
When I create another replica without Gramine on the same network, it downloads the binary log from the Gramine master without any issues. Another test to eliminate (2): if I use the mariadb-binlog utility to download the master's binary log file onto the replica that runs in the Gramine container, it downloads without any issues with the following command. I think these two tests show that the issue is not related to (2).

./mariadb-binlog --read-from-remote-server --host=172.17.0.2 --user=<sanitized> --password=<sanitized> ON.000001 > binlog
root@sek-idp-gramine-test:/tmp# echo $?
0
root@sek-idp-gramine-test:/tmp# head binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#240626 10:29:55 server id 1 end_log_pos 256 CRC32 0xe65ed405 Start: binlog v 4, server v 10.6.19-MariaDB-log created 240626 10:29:55 at startup
ROLLBACK/*!*/;
BINLOG '
o+17Zg8BAAAA/AAAAAABAAAAAAQAMTAuNi4xOS1NYXJpYURCLWxvZwAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACj7XtmEzgNAAgAEgAEBAQEEgAA5AAEGggAAAAICAgCAAAACgoKAAAAAAAA
|
@harunkucuk5 Ok, the problem seems quite complicated... Do you think you can take a look into the MariaDB code and see how exactly the download and the reading from the downloaded file happen? Very likely something low-level is going on. Also, can you show the "trace" logs of Gramine on the MariaDB replica? |
@dimakuv, MariaDB replication has two threads: the first one (IO_THREAD) is responsible for downloading the binary log events from the master into the relay log file, and the other one (SQL_THREAD) is responsible for reading the events from this relay log file and applying them to the replica database. If I set up replication and stop the SQL_THREAD, the replica downloads the binary logs without issues. If I stop the IO_THREAD and start the SQL_THREAD, the SQL_THREAD applies the downloaded binary logs without issues. But if I let them run at the same time, it ends up with the replication error that I shared before (the corresponding thread commands are sketched after this comment). I compiled MariaDB with debug and reproduced the issue; that time it crashed with the following trace.

mariadbd: /server/mysys/mf_iocache.c:1723: my_b_flush_io_cache: Assertion `info->end_of_file == inline_mysql_file_tell("/server/mysys/mf_iocache.c", 1723, info->file, (myf) (0))' failed.
240703 11:51:27 [ERROR] mysqld got signal 6 ;
Sorry, we probably made a mistake, and this is a bug.
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.6.19-MariaDB-debug-log source revision: f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 468166 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x24582abdb098
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x24582abd9b48 thread_stack 0x49000
/app/mariadbd(my_print_stacktrace+0x44)[0x2458481fa5e9]
/app/mariadbd(handle_fatal_signal+0x421)[0x2458477d7b76]
libc_sigaction.c:0(__restore_rt)[0x2458461b8ed0]
pthread_kill.c:0(__pthread_kill_implementation)[0x245846204766]
/lib/libc.so.6(raise+0x16)[0x2458461b8e26]
/lib/libc.so.6(abort+0xc5)[0x2458461a37e5]
loadmsgcat.c:0(_nl_load_domain.cold)[0x2458461a371b]
/lib/libc.so.6(+0x3acc6)[0x2458461b1cc6]
/app/mariadbd(my_b_flush_io_cache+0x2a4)[0x2458481ce3ec]
/app/mariadbd(_ZN13MYSQL_BIN_LOG14flush_and_syncEPb+0x110)[0x245847937cc2]
/app/mariadbd(_ZN13MYSQL_BIN_LOG18write_event_bufferEPhj+0x509)[0x245847937ad5]
/app/mariadbd(+0x8f4353)[0x2458472f1353]
/app/mariadbd(handle_slave_io+0x13c2)[0x2458472e9f0f]
/app/mariadbd(+0x1198ccb)[0x245847b95ccb]
pthread_create.c:0(start_thread)[0x245846202a17]
/lib/libc.so.6(clone+0x51)[0x245846284d61]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 6
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
So it points to this part of the MariaDB server code:

info->end_of_file= eof;
info->append_read_pos= info->write_buffer;
DBUG_ASSERT(info->end_of_file == mysql_file_tell(info->file, MYF(0)));
}
else

It is in mf_iocache.c, which has a comment section describing how the shared IO_CACHE is used by a reader thread and a writer thread.
Basically, there is a writer thread and a reader thread, and they both use the same file. There is also a buffer: the reader thread first reads from the disk and then reads the buffer. Maybe reading the buffer under Gramine causes some conflicts. I will keep reading the code, but please note that I am not a C developer or an expert on this topic; I will try to find some more details. I also set the Gramine log level to trace and reproduced the issue again. Sending the logs in the attachment. |
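For reference, the per-thread isolation test described above boils down to the standard slave-thread commands; this is only a sketch, with connection options omitted:

# IO thread only: the relay log is downloaded but never read back by the applier
mariadb -e "STOP SLAVE; START SLAVE IO_THREAD;"

# SQL thread only: the already-downloaded relay log is applied, nothing new is fetched
mariadb -e "STOP SLAVE; START SLAVE SQL_THREAD;"

# both threads at once -- the case that ends in the corrupted relay log
mariadb -e "START SLAVE;"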
Thanks @harunkucuk5, this was a great exploration on your part! Without your thorough experiments and debugging, we wouldn't have been able to root-cause this bug. So I got it: Gramine currently does not correctly support O_APPEND when one thread writes to a file while another thread reads from it. |
Basically, there is a reader thread that messes with the current position (the file offset) of the relay log while the writer thread appends to it; with Gramine's current emulation of O_APPEND, this can make the append land at the wrong offset and corrupt the file. I'll create a draft PR with a quick fix now, just to check that your problem will be solved. |
FIXME:
- Add all checks and return errors for incorrect O_APPEND usage
- Add libos/test/fs tests
- Make sure that read before first write starts with pos=0 (?)
- Make sure that read after first write starts with pos=size
- Make sure that lseek+read in one thread and write in another thread don't lead to corrupted data (reproducer of issue #1921)
- Make sure that all kinds of FS mounts work fine with O_APPEND
- Make it explicit that O_APPEND is currently emulated only for a single process; maybe mark O_APPEND files as poisoned in the child process

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
@harunkucuk5 Can you try PR #1935 ? |
@dimakuv, thanks for the quick PR. I tried to test it, but I am probably missing something, as I get the following error when I try to create a container from the image:

docker run --name replica --security-opt seccomp=/root/docker_seccomp_aug_2022.json gsc-gramine-mariadb:v17-unsigned bash
Invalid application path specified (/gramine/app_files/entrypoint.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.

Here is what I did to use your branch. I modified the Dockerfile of the enclaive-docker-mariadb-sgx project, changing the enclaive/gramine-os base image to a plain ubuntu image, since I will use GSC to graminize the image with your branch:

~/enclaive-docker-mariadb-sgx# git diff Dockerfile
diff --git a/Dockerfile b/Dockerfile
index d013e6f..5acd5d0 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -66,7 +66,7 @@ RUN gcc main.c -Wall -Wextra -Werror -Wpedantic -Wno-unused-parameter -O3 \
# final stage
-FROM enclaive/gramine-os:jammy-7e9d6925
+FROM ubuntu:jammy
RUN apt-get update \
&& apt-get install -y --no-install-recommends liblzo2-2 \
@@ -86,10 +86,6 @@ COPY --from=middlemain /build/launcher /app/
COPY ./mariadb.manifest.template ./entrypoint.sh ./conf/init.sql /app/
COPY ./conf/my.cnf /etc/
-RUN gramine-argv-serializer "/app/mariadbd" "--init-file=/app/init.sql" > ./argv \
- && gramine-manifest -Darch_libdir=/lib/x86_64-linux-gnu mariadb.manifest.template mariadb.manifest \
- && gramine-sgx-sign --key "$SGX_SIGNER_KEY" --manifest mariadb.manifest --output mariadb.manifest.sgx
-
VOLUME /data/
EXPOSE 3306/tcp

I also modified the manifest file, since otherwise I got an error like "gramine is not a known variable":

~/enclaive-docker-mariadb-sgx# git diff mariadb.manifest.template
diff --git a/mariadb.manifest.template b/mariadb.manifest.template
index 185ae17..8a922fe 100644
--- a/mariadb.manifest.template
+++ b/mariadb.manifest.template
@@ -1,9 +1,7 @@
-loader.entrypoint = "file:{{ gramine.libos }}"
-
libos.entrypoint = "/app/launcher"
loader.argv_src_file = "file:/app/argv"
-loader.log_level = "error"
+loader.log_level = "trace"
#loader.log_file = "log.txt"
loader.env.LD_LIBRARY_PATH = "/lib:{{ arch_libdir }}"
@@ -14,10 +12,10 @@ fs.mounts = [
{ path = "/run/mysqld", uri = "file:/run/mysqld", type = "tmpfs" },
{ path = "/etc", uri = "file:/etc" },
{ path = "/usr", uri = "file:/usr" },
- { path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
+ { path = "/lib", uri = "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime" },
{ path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
{ path = "/app", uri = "file:/app" },
- { path = "/var/lib/mysql", uri = "file:/data", type = "encrypted", key_name = "default" },
+ { path = "/var/lib/mysql", uri = "file:/data" },
]
fs.insecure__keys.default = "ffeeddccbbaa99887766554433221100"
@@ -29,8 +27,8 @@ sgx.isvprodid = 1
sgx.isvsvn = 1
sgx.trusted_files = [
- "file:{{ gramine.libos }}",
- "file:{{ gramine.runtimedir() }}/",
+ "file:/usr/local/lib/x86_64-linux-gnu/gramine/libsysdb.so",
+ "file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime",
"file:{{ arch_libdir }}/",
"file:/app/argv", And cloned the gsc project and updated the config template to use your branch ~/gsc# git diff
diff --git a/config.yaml.template b/config.yaml.template
index ef64080..a74f374 100644
--- a/config.yaml.template
+++ b/config.yaml.template
@@ -29,7 +29,7 @@ Registry: ""
# branch is guaranteed to work with current Gramine `master` branch).
Gramine:
Repository: "https://github.com/gramineproject/gramine.git"
- Branch: "master"
+ Branch: "dimakuv/add-oappend-only-singleprocess"
# Specify the Intel SGX driver installed on your machine (more specifically, on the machine where
# the graminized Docker container will run); there are several variants of the SGX driver:

Then I built the Docker image with the following command first:

~/enclaive-docker-mariadb-sgx# docker build -t gramine-mariadb:v17 .

and then graminized this image with the following command:

./gsc build -c config.yaml.template gramine-mariadb:v17 /root/enclaive-docker-mariadb-sgx/mariadb.manifest.template
I have the following images:

docker images|grep v17
gsc-gramine-mariadb v17-unsigned 9accc8c3d2b0 38 minutes ago 222MB
gramine-mariadb v17 889a841b215f 52 minutes ago 112MB

However, when I try to create a container, I get the following error:

docker run --name replica --security-opt seccomp=/root/docker_seccomp_aug_2022.json gsc-gramine-mariadb:v17-unsigned bash
Invalid application path specified (/gramine/app_files/entrypoint.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions. |
You seem to have forgotten to do the gsc sign-image step. You need to sign the image (with whatever signing key; you can create a dummy one, see the example in the docs), otherwise GSC won't finalize the resulting Docker image. |
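For reference, a minimal signing flow per the GSC docs looks roughly like this (the key file name below is an arbitrary placeholder):

# create a dummy 3072-bit RSA-3 SGX signing key, as shown in the GSC docs
openssl genrsa -3 -out enclave-key.pem 3072

# sign the unsigned graminized image; this produces the final gsc-gramine-mariadb:v17
./gsc sign-image -c config.yaml.template gramine-mariadb:v17 enclave-key.pem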
Thanks @dimakuv, I signed the image and made it work. Unfortunately, the same issue still exists. I will enable trace logging and send you the Gramine trace again.

2024-07-05 14:11:27 8 [ERROR] Error in Log_event::read_log_event(): 'Replication event checksum verification failed while reading from a log file', data_len: 524293, event_type: 2
2024-07-05 14:11:27 8 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
2024-07-05 14:11:27 8 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Internal MariaDB error code: 1594
2024-07-05 14:11:27 8 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ON.000001' position 26227194; GTID position '0-1-59'
2024-07-05 14:11:27 8 [Note] Slave SQL thread exiting, replication stopped in log 'ON.000001' at position 26227194; GTID position '0-1-59', master: 172.30.43.5:3306
2024-07-05 14:11:27 7 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
2024-07-05 14:11:27 7 [Note] Slave I/O thread exiting, read up to log 'ON.000001', position 26751666; GTID position 0-1-60, master 172.30.43.5:3306

*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.30.43.5
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: ON.000001
Read_Master_Log_Pos: 26751666
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 26227486
Relay_Master_Log_File: ON.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 26227194
Relay_Log_Space: 26752268
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-60
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Slave_DDL_Groups: 8
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 52 |
@dimakuv , the gramine trace logs are in the attached file. |
@harunkucuk5 Unfortunately I don't find anything interesting in the logs. Could you build MariaDB in debug mode, like you did previously? Maybe we'll hit some assert, which will give us more information on what's going on. |
Actually, @harunkucuk5, can you check #1935 again? I modified it today. |
@dimakuv, thanks for the PR. I could not reproduce the issue with this PR 🚀 I will run broader tests to ensure it is fixed and get back to you. I am adding Gramine trace logs and the MariaDB debug logs in case you need to check something. |
@dimakuv, it seems it is working without issues. Thanks for the fix |
Hi @dimakuv, we are currently using Gramine v1.6. Would it be possible to receive this fix in our current version (1.6.x), or would it be advisable to upgrade Gramine to the latest version once the fix becomes available? |
The latter (upgrade to latest version). We don't do backports, i.e., we will not release a 1.6.x version. |
@dimakuv can you give us some visibility into when the fix will be officially released? |
@oliver-steinbrecher If things go well, the next release of Gramine (v1.8) will happen around September. Unless some critical issues will be detected in the code review process for the fix, I expect the fix to be incorporated in that release. |
@dimakuv can you confirm whether this fix is included in the 1.8 tag? |
@oliver-steinbrecher @molk-ibm Correct, the fix is not yet merged. Sorry, this had lower priority in our task queue, and it wasn't yet reviewed by other maintainers. |
hey @dimakuv thanks for the reply! |
@molk-ibm This depends on reviewers: @kailun-qin and @mkow mainly |
Probably in the next two/three months, but we're low on resources and high on high-priority tasks, unfortunately... |
Can you help us understand what is left to be done in order to integrate it? I'm asking because we need this fix for a production solution. We're considering building a fork version now. |
@mkow already explained: it needs time for review. Yes, review takes time and we have limited time allocated. It's as simple as that. |
Description of the problem
When we use MariaDB replication with Gramine, it fails with the relay-log read/write errors shown below under "Actual results".
Without Gramine all is good, but with Gramine we hit this error. It is also not related to SGX itself, because we get the same error with gramine-direct as well.
Steps to reproduce
I used the https://github.com/enclaive/enclaive-docker-mariadb-sgx repository to reproduce this issue; the issue exists on Gramine 1.5, 1.6 and 1.7 as well.
Clone the repository
Build docker image
docker build -t gramine-mariadb:v2 .
Then create two instances from this Docker image and set up replication between them (a sketch is shown below).
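A rough outline of that setup; container names, IPs and credentials are placeholders, not the exact commands we used:

# master and replica from the same image
docker run -d --name master gramine-mariadb:v2
docker run -d --name replica gramine-mariadb:v2

# on the replica, point it at the master and start GTID-based replication
# (Using_Gtid: Slave_Pos, as seen in the SHOW SLAVE STATUS output above)
mariadb -h <replica-ip> -u root -p -e "
  CHANGE MASTER TO
    MASTER_HOST='<master-ip>',
    MASTER_USER='root',
    MASTER_PASSWORD='<password>',
    MASTER_USE_GTID=slave_pos;
  START SLAVE;"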
Once replication is started, all is good, but some time later it fails with this error. You can reproduce the replication error by running a sysbench write workload against the master, roughly along the lines of the sketch below.
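The exact sysbench invocation isn't reproduced here; any sustained write workload works, for example (all connection parameters are illustrative):

# create the default sbtest schema and tables on the master
sysbench oltp_read_write --mysql-host=<master-ip> --mysql-user=root --mysql-password=<password> --tables=4 --table-size=100000 prepare

# generate write traffic that the replica has to stream and apply
sysbench oltp_read_write --mysql-host=<master-ip> --mysql-user=root --mysql-password=<password> --tables=4 --table-size=100000 --time=300 run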
Expected results
A running replication stream
Actual results
replication status
If we check the relay logs with mariadb-binlog (sketched below), we see they are corrupted.
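The exact command isn't preserved in this report; the check amounts to parsing the relay log on the replica with mariadb-binlog (a.k.a. mysqlbinlog), e.g.:

# a corrupted relay log makes the parser bail out with a non-zero exit code
mariadb-binlog /data/mysqld-relay-bin.000002 > /dev/null
echo $?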
logs:
Gramine commit hash
10e9353