Made QueryEvent decode options class variables #296
Conversation
That's an elegant solution to a problem I've been complaining about for way too long :D So thank you! Would you mind sharing how you use this? Are you using a subclass or just injecting QueryEvent.charset as a global variable?
Re-pushed because I forgot to add self. to charset and on_errors. Also added keywords to make it more explicit.
Sorry about the re-pushes. It was a bit late and I missed the
@baloo Just set the proper charset/on_errors right after import. An example that I've tested is as follows:
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.event import QueryEvent
QueryEvent.charset = 'cp932'
QueryEvent.on_errors = 'ignore'
stream = BinLogStreamReader(
    connection_settings={'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'passwd': 'nopasswd'},
    blocking=True,
    server_id=100)
for e in stream:
    e.dump()
For those not sure why this is needed, try executing create table test_table (some_column text comment 'コラムおおおおお') engine=innodb; encoded in cp932 (e.g. using iconv -t cp932), then run the above test script with the charset and on_errors settings commented out. You should get a UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 47: invalid start byte
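The failure described above can be reproduced without a MySQL server. This is a minimal sketch, assuming only that the statement text reaches the decoder as cp932-encoded bytes; the byte offset reported in the error will differ from the one seen in a real binlog packet.

```python
# Encode the statement the way a cp932 client would send it.
statement = (
    "create table test_table "
    "(some_column text comment 'コラムおおおおお') engine=innodb;"
)
raw = statement.encode("cp932")

# Decoding with a hardcoded utf-8 default fails on the first katakana byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the matching charset round-trips the statement.
decoded = raw.decode("cp932")

# errors="ignore" suppresses the exception but silently drops
# every byte that is not valid utf-8, so the result is lossy.
lossy = raw.decode("utf-8", errors="ignore")
```

Note that `errors='ignore'` combined with the wrong charset hides the exception rather than recovering the text, which is why the example in this comment sets both `charset` and `on_errors`.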
On second thoughts, I would prefer to not call decode altogether, break the api, and store and return a bytestring to the user of this. This change as you propose would make it impossible to have a different charset per schema, and I would sincerely prefer to push that responsibility to the consumer of the api. @noplay any thoughts?
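To illustrate what pushing decoding to the consumer could look like, here is a rough sketch. It assumes the event would expose the raw query bytes and the schema name; `SCHEMA_CHARSETS` and `decode_query` are hypothetical names for illustration, not part of the library.

```python
# Hypothetical mapping maintained by the consumer: schema name -> Python codec.
SCHEMA_CHARSETS = {
    "legacy_jp": "cp932",
    "app": "utf-8",
}


def decode_query(schema: str, raw_query: bytes, default: str = "utf-8") -> str:
    """Decode raw query bytes with the codec configured for the schema."""
    codec = SCHEMA_CHARSETS.get(schema, default)
    # errors="replace" keeps the consumer running and marks undecodable
    # bytes with U+FFFD instead of raising.
    return raw_query.decode(codec, errors="replace")


jp_query = decode_query("legacy_jp", "comment 'コラム'".encode("cp932"))
app_query = decode_query("app", "comment 'コラム'".encode("utf-8"))
```

With this shape, each schema can use its own charset, which a single class-level setting cannot express.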
It seems possible to get the information.
https://dev.mysql.com/doc/internals/en/query-event.html
In the status_vars there are a Q_CHARSET_CODE and a Q_CHARSET_DATABASE_CODE.
I didn't find the mapping to charset names, but I guess it can be extracted from this.
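A rough sketch of pulling those two entries out of a status_vars byte string follows. It assumes the layout as I read it from the linked internals page: each entry is a one-byte code followed by a payload, Q_CHARSET_CODE (0x04) carries three little-endian 2-byte ids, and Q_CHARSET_DATABASE_CODE (0x08) carries one. Only a few fixed-length codes are handled, and the id-to-codec table is a partial, hand-written assumption that should be checked against information_schema.COLLATIONS.

```python
import struct

Q_FLAGS2_CODE = 0x00            # 4-byte payload
Q_SQL_MODE_CODE = 0x01          # 8-byte payload
Q_CHARSET_CODE = 0x04           # 3 x 2-byte ids
Q_CHARSET_DATABASE_CODE = 0x08  # 1 x 2-byte id

# Payload sizes for the codes this sketch understands (assumed, partial).
FIXED_LENGTHS = {
    Q_FLAGS2_CODE: 4,
    Q_SQL_MODE_CODE: 8,
    Q_CHARSET_CODE: 6,
    Q_CHARSET_DATABASE_CODE: 2,
}

# Assumed, partial mapping from MySQL collation id to Python codec name.
COLLATION_TO_CODEC = {8: "latin1", 33: "utf-8", 45: "utf-8", 95: "cp932"}


def extract_charset_ids(status_vars: bytes) -> dict:
    """Return the charset-related ids found in status_vars."""
    found = {}
    pos = 0
    while pos < len(status_vars):
        code = status_vars[pos]
        pos += 1
        length = FIXED_LENGTHS.get(code)
        if length is None:
            break  # unknown or variable-length entry: stop rather than guess
        payload = status_vars[pos:pos + length]
        if code == Q_CHARSET_CODE:
            client, connection, server = struct.unpack("<HHH", payload)
            found["client"] = client
            found["connection"] = connection
            found["server"] = server
        elif code == Q_CHARSET_DATABASE_CODE:
            (found["database"],) = struct.unpack("<H", payload)
        pos += length
    return found


# Hand-built sample bytes, not captured from a real binlog.
sample = (
    bytes([Q_FLAGS2_CODE]) + b"\x00" * 4
    + bytes([Q_CHARSET_CODE]) + struct.pack("<HHH", 95, 95, 33)
    + bytes([Q_CHARSET_DATABASE_CODE]) + struct.pack("<H", 95)
)
ids = extract_charset_ids(sample)
codec = COLLATION_TO_CODEC.get(ids["client"], "utf-8")
```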
On Mon, 15 Jul 2019 at 21:45, Arthur Gautier <[email protected]> wrote:
… On second thoughts, I would prefer to *not* call decode altogether,
break the api, and store and return a bytestring to the user of this.
This change as you propose would make it impossible to have a different
charset per schema, and I would sincerely prefer to push that
responsibility to the consumer of the api.
Just wondering, have you checked the binlog recently to see if the charset
was specified in the transaction in the replication stream or something?
That's what I would expect, but I haven't checked.
@noplay <https://github.com/noplay> any thoughts?
I got this parsed:
the patch was straightforward enough, BUT:
@baloo You’re right. This is only a temporary solution that at least allows my project to move forward, since all schemas have the same encoding. What about directly querying the server for charset names?
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "example_schema";
@baloo I am dealing with a similar issue. When a query inserting binary data gets parsed, the reader is failing with the same exception @joy13975 reported above. Is it possible to make a change to allow the
@joy13975 well no, that does not work; you can't query the server to get this, afaict. @cmayo117 I would prefer to go the call decode with "values passed from status_vars" way. But I do not have time to handle this myself. If you'd like to throw in a PR, that would be great.
@baloo Unfortunately I don't think that will solve our issue as it isn't a problem of using the wrong charset. Some queries in our case have embedded binary data which cannot be decoded easily. Instead, it would be helpful to have the option of just ignoring decoding errors altogether.
QueryEvent decodes the packet using utf-8, which is fine except it's hardcoded and there's no option to set the errors argument for decode().
In a project I'm working on, we need cp932 plus the errors='ignore' argument. In general it can just be set through class variables right after importing them.
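The class-variable approach can be sketched in isolation as follows. `FakeQueryEvent` is a hypothetical stand-in used only to show the pattern (it is not the library's QueryEvent), and it assumes the decode call reads `self.charset` and `self.on_errors`.

```python
class FakeQueryEvent:
    """Hypothetical stand-in for an event class; not the library's code."""

    # Class-level defaults, overridable without touching __init__.
    charset = "utf-8"
    on_errors = "strict"

    def __init__(self, raw_query: bytes):
        # Looking the options up through self lets both subclass overrides
        # and class-level assignment take effect.
        self.query = raw_query.decode(self.charset, errors=self.on_errors)


# Option 1: override in a subclass.
class Cp932QueryEvent(FakeQueryEvent):
    charset = "cp932"
    on_errors = "ignore"


raw = "コメント".encode("cp932")
via_subclass = Cp932QueryEvent(raw).query

# Option 2: assign on the class right after import (process-wide effect).
FakeQueryEvent.charset = "cp932"
FakeQueryEvent.on_errors = "ignore"
via_class_attr = FakeQueryEvent(raw).query
```

Option 2 matches the usage shown in the conversation. It affects every event created afterwards, so it only suits setups where all schemas share one encoding.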