
Missing records !! #89

Closed
shahzaibcb opened this issue Nov 28, 2018 · 11 comments

Comments

shahzaibcb commented Nov 28, 2018

Hi,

We're running MySQL 5.7 with master-slave replication based on GTID mode. The database is large, 11 GB in size, and we need to convert it to PostgreSQL. To do this, we used the chameleon tool and set up replication with a MySQL slave (without GTID). On init_replica the data took some time and replicated to PostgreSQL, but on checking we found that records are missing, e.g. there is one table 'users' whose count in MySQL is 500,000 (5 lac) while its count in the replicated PostgreSQL database is 100,000 (1 lac). Are there any buffers that we need to tweak to migrate a large database with chameleon?

PS: With pgloader all records were migrated correctly, but it would require a significant amount of downtime during a live migration from MySQL to Postgres, which is why we want to go with the pg_chameleon solution.

Awaiting your response!
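(For context, the discrepancy described above can be spot-checked by comparing counts on both sides once init_replica finishes. A minimal sketch follows; the destination schema name is an assumption, since pg_chameleon maps each MySQL schema to a configurable PostgreSQL schema:)

```sql
-- Run on MySQL (source):
SELECT COUNT(*) FROM users;

-- Run on PostgreSQL (destination); 'domain_001' stands in for whatever
-- schema the pg_chameleon mapping targets and is an assumption here:
SELECT COUNT(*) FROM domain_001.users;
```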

@shahzaibcb (Author)

Update: I just came to know that after 100,000 (1 lac) rows are inserted, the following errors start to appear for the other entries in the same table:

http://prntscr.com/lo1897

Is there any fix for this?

@the4thdoctor (Owner)

Hi, can you please give me more details about the table? It seems like a wrong NOT NULL constraint set in the PostgreSQL database.

@the4thdoctor (Owner)

Did you set the row image to FULL on MySQL?

@shahzaibcb (Author)

> Did you set the row image to FULL on MySQL?

Hi, our replication structure is as below:

Master -> Slave1 -> Slave2 (Slave2 is a slave of Slave1; this is the slave accessed by pg_chameleon for the migration)

Now the master is in production and our live application reads/writes on it. Changing its row image to FULL would require a restart, which would make our application inaccessible for a few seconds. So instead of changing the row image on the master, we've changed it on Slave2 (the slave of Slave1), which will be accessed by chameleon for the migration.

Is that fine?
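(For reference, a minimal sketch of checking and setting the row image on the replica that chameleon reads from. Note that binlog_row_image is a dynamic variable in MySQL 5.7, so SET GLOBAL takes effect without a restart, though it only applies to sessions opened afterwards and should also be persisted in my.cnf to survive a restart:)

```sql
-- Check the current row image on the server chameleon connects to:
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';

-- binlog_row_image is dynamic in MySQL 5.7: no restart required,
-- but only sessions started after this point pick up the new value:
SET GLOBAL binlog_row_image = 'FULL';
```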

@shahzaibcb (Author)

> Hi, can you please give me more details about the table? It seems like a wrong NOT NULL constraint set in the PostgreSQL database.

Hi, it was these values creating the issue: 0000-00-00 00:00:00

We fixed these values in MySQL before running chameleon and the error went away. However, we now have another error on a different table:

"sprite_thumbs_no","tr_stored_thumbs","mass_embed_status","is_hd","ebay_epn_keywords","ebay_pre_desc","unique_embed_code","mature_content","remote_play_url","server_ip","file_server_path","files_thumbs_path","file_thumbs_count","has_hq","has_mobile","filegrp_size","process_status","has_hd","file_directory","extras","test") VALUES (\'203471\',\'3B18NBXRUN9W\',\'3\',\'\',\' \',\'6123\',\'Woe To Me \xe1\xb4\xb4\xe1\xb4\xb0 - O That I Regret\',\'1374303078076cd\',\'Woe To Me \xe1\xb4\xb4\xe1\xb4\xb0 - O That I Regret\',\'Woe To Me \xe1\xb4\xb4\xe1\xb4\xb0 - O That I Regret\',\'#24# \',\'24\',\'public\',\'\',\'2013-07-20\',\'0\',\'yes\',\'0\',\'0\',\'0\',\'0\',\'\',\'yes\',\'yes\',\'0\',\'2018-11-28 11:09:23\',\'pending\',\'no\',\'2018-11-28 11:06:51\',\'\',\'yes\',\'yes\',\'0\',\'0\',\'1\',\'2018-11-28 11:11:58\',\'2013-07-20 02:53:47\',\'no\',\'255.76\',\'\',\'none\',NULL,\'1\',\'\',\'none\',\'\',\'0\',\'174.34.243.230\',\'0\',\'\',\'0\',\'0\',\'0\',\'no\',\'no\',\'\',\'\',\'\',\'no\',\'none\',\'95.211.16.250\',\'http://lwx003.domain.com/files\',\'http://lwx003.domain.com/files/thumbs\',\'25\',\'no\',\'yes\',\'40886689\',\'2\',\'no\',\'2013/07/20//\',\'{"redirect":"","invactive_message":"","redirect_id":""}\',NULL);' 2018-11-28 14:00:31 MainProcess ERROR pg_lib.py (3509): SQLCODE: 22P02 SQLERROR: ERROR: invalid input value for enum _cb_tune_01_tmp.enum_cb_video_status: "" LINE 1: ...28 11:11:58','2013-07-20 02:55:30','no','1830.77','','none',...

the4thdoctor (Owner) commented Nov 29, 2018

The 0000-00-00 00:00:00 value is an issue with the library :( it converts the output to null. There is a PR, julien-duponchelle/python-mysql-replication#228, to have it work differently, but it has been waiting since 2017 :(

For the other error, can you provide the table structure and an example record so I can reproduce it?

thanks

@shahzaibcb (Author)

> For the other error, can you provide the table structure and an example record so I can reproduce it?

It was related to some enum values; we ran a MySQL query to clean them up (a sketch of that kind of cleanup follows the log below) and then did the migration, which fixed this issue. Now we have a table which is falling back from COPY to INSERT and taking a significantly long time to complete. Is there a way to know why it is falling back to insert? Here are some warnings this table generates during migration:

2018-11-29 05:22:27 MainProcess DEBUG mysql_lib.py (509): estimating rows in domain_001.cb_action_log
2018-11-29 05:22:27 MainProcess DEBUG mysql_lib.py (539): The table domain_001.cb_action_log will be copied in 2 estimated slice(s) of 3932160 rows
2018-11-29 05:22:27 MainProcess DEBUG mysql_lib.py (477): locking the table domain_001.cb_action_log
2018-11-29 05:22:27 MainProcess DEBUG mysql_lib.py (479): collecting the master's coordinates for table domain_001.cb_action_log
2018-11-29 05:22:27 MainProcess DEBUG mysql_lib.py (548): Executing query for table domain_001.cb_action_log
2018-11-29 05:23:55 MainProcess INFO mysql_lib.py (570): Table _cb_domain_01_tmp.cb_action_log error in PostgreSQL copy, saving slice number for the fallback to insert statements
2018-11-29 05:23:55 MainProcess INFO mysql_lib.py (644): Table domain_001.cb_action_log copied 1 slice of 2
2018-11-29 05:24:46 MainProcess INFO mysql_lib.py (644): Table domain_001.cb_action_log copied 2 slice of 2
2018-11-29 05:24:46 MainProcess INFO mysql_lib.py (621): Executing inserts in _cb_domain_01_tmp.cb_action_log. Slice 1. Rows per slice 3932160.
2018-11-29 05:35:07 MainProcess WARNING pg_lib.py (3512): character mismatch when inserting the data, trying to cleanup the row data
2018-11-29 05:35:07 MainProcess WARNING pg_lib.py (3512): character mismatch when inserting the data, trying to cleanup the row data
2018-11-29 05:35:07 MainProcess WARNING pg_lib.py (3512): character mismatch when inserting the data, trying to cleanup the row data
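(For reference, a hedged sketch of the kind of enum cleanup mentioned above. The earlier PostgreSQL error named the type _cb_tune_01_tmp.enum_cb_video_status rejecting an empty string, so the cleanup presumably replaced '' with a valid member of the enum; the table name, column name, and replacement value below are all assumptions:)

```sql
-- Replace empty-string enum values with a valid member before migrating;
-- 'cb_video', 'video_status', and 'pending' are illustrative assumptions:
UPDATE cb_video
   SET video_status = 'pending'
 WHERE video_status = '';
```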

@the4thdoctor (Owner)

Mmm, it should show the failing row. I'll have a look ASAP. In the meantime, what error is logged by PostgreSQL?
Thanks

@shahzaibcb (Author)

> Mmm, it should show the failing row. I'll have a look ASAP. In the meantime, what error is logged by PostgreSQL?
> Thanks

Hi, that's what Postgres showed in the logs; chameleon didn't show anything during the migration though:

2018-11-29 05:55:27.746 EST [60170] ERROR: invalid byte sequence for encoding "UTF8": 0x00
2018-11-29 05:55:27.746 EST [60170] CONTEXT: COPY cb_action_log, line 1221141
2018-11-29 05:55:27.746 EST [60170] STATEMENT: COPY "_cb_tune_01_tmp"."cb_action_log" ("action_id","action_type","action_username","action_userid","action_useremail","action_userlevel","action_ip","date_added","action_success","action_details","action_link","action_obj_id","action_done_id") FROM STDIN WITH NULL 'NULL' CSV QUOTE '"' DELIMITER ',' ESCAPE '"' ;

@the4thdoctor (Owner)

OK, this is a classic issue with the NUL character (0x00), which the system handles by falling back to inserts.
If you want to speed up the copy you should clean up the data on MySQL, removing the NUL characters.
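(For reference, a hedged sketch of that cleanup on the MySQL side. The table name comes from the COPY statement in the PostgreSQL log above; which text column actually contains the 0x00 bytes is an assumption:)

```sql
-- CHAR(0) is the NUL byte in MySQL; strip it from the suspect text
-- column ('action_details' is an assumption) before re-running the copy:
UPDATE cb_action_log
   SET action_details = REPLACE(action_details, CHAR(0), '')
 WHERE INSTR(action_details, CHAR(0)) > 0;
```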

@the4thdoctor (Owner)

I'm closing the ticket as there's no follow-up.
