While testing data extraction from a MariaDB (v10.5) database in Meltano (which uses pipelinewise-tap-mysql, which in turn uses this package), I ran into the following error when using Meltano's preferred cursor, pymysql.cursors.SSCursor:
Traceback (most recent call last):
  File "bin/tap-mysql", line 10, in <module>
    sys.exit(main())
  File "lib/python3.9/site-packages/tap_mysql/__init__.py", line 443, in main
    raise exc
  File "lib/python3.9/site-packages/tap_mysql/__init__.py", line 440, in main
    main_impl()
  File "lib/python3.9/site-packages/tap_mysql/__init__.py", line 429, in main_impl
    do_sync(mysql_conn, args.config, args.catalog, state)
  File "lib/python3.9/site-packages/tap_mysql/__init__.py", line 385, in do_sync
    sync_binlog_streams(mysql_conn, binlog_catalog, config, state)
  File "lib/python3.9/site-packages/tap_mysql/__init__.py", line 368, in sync_binlog_streams
    binlog.sync_binlog_stream(mysql_conn, config, binlog_streams_map, state)
  File "lib/python3.9/site-packages/tap_mysql/sync_strategies/binlog.py", line 882, in sync_binlog_stream
    _run_binlog_sync(mysql_conn, reader, binlog_streams_map, state, config, end_log_file, end_log_pos)
  File "lib/python3.9/site-packages/tap_mysql/sync_strategies/binlog.py", line 607, in _run_binlog_sync
    for binlog_event in reader:
  File "lib/python3.9/site-packages/pymysqlreplication/binlogstream.py", line 496, in fetchone
    binlog_event = BinLogPacketWrapper(pkt, self.table_map,
  File "lib/python3.9/site-packages/pymysqlreplication/packet.py", line 136, in __init__
    self.event = event_class(self, event_size_without_header, table_map,
  File "lib/python3.9/site-packages/pymysqlreplication/row_event.py", line 628, in __init__
    if i != (column_schema['ORDINAL_POSITION'] - 1):
TypeError: tuple indices must be integers or slices, not str
After debugging a bit, I think that the error originates in pymysqlreplication/row_event.py in this piece of logic:
From what I gather, if the row event touches a table whose schema we already know (the if branch is taken), we use the column schema stored in table_map, which is a dict (originally from pymysqlreplication/table.py):
But if we have no table data on hand (the else branch is taken), we end up in this logic in pymysqlreplication/binlogstream.py:BinLogStreamReader:_get_table_information(...), which fetches that information from information_schema:
...
cur.execute("""
    SELECT
        COLUMN_NAME, COLLATION_NAME, CHARACTER_SET_NAME,
        COLUMN_COMMENT, COLUMN_TYPE, COLUMN_KEY, ORDINAL_POSITION
    FROM
        information_schema.columns
    WHERE
        table_schema = %s AND table_name = %s
    ORDER BY ORDINAL_POSITION
    """, (schema, table))
return cur.fetchall()
...
This final call is where things apparently go south: with SSCursor, cur.fetchall() returns each row as a plain tuple containing only the values from the query, not as a dict mapping column names to values. What's confusing is that the return type of fetchall() differs between cursor implementations: with the cursor the lookup normally gets, it returns dicts as expected, whereas SSCursor.fetchall() returns tuples. In PyMySQL the row type is determined by the cursor class rather than by fetchall() itself: DictCursor and SSDictCursor yield dicts, while the plain Cursor and SSCursor yield tuples.
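The failure can be reproduced without a database. The sketch below builds one made-up result row for the information_schema query quoted above, once as a plain tuple (the SSCursor shape) and once as a dict (the DictCursor shape), and runs the `ORDINAL_POSITION` check from row_event.py against both. The row values are invented for illustration.

```python
# Column order of the information_schema query quoted above.
COLUMNS = ("COLUMN_NAME", "COLLATION_NAME", "CHARACTER_SET_NAME",
           "COLUMN_COMMENT", "COLUMN_TYPE", "COLUMN_KEY", "ORDINAL_POSITION")

# One made-up result row, first as a plain tuple (SSCursor-style) ...
tuple_row = ("id", None, None, "", "int(11)", "PRI", 1)
# ... then as a dict keyed by column name (DictCursor-style).
dict_row = dict(zip(COLUMNS, tuple_row))

# The check from row_event.py works on the dict row ...
i = 0
print(i != (dict_row["ORDINAL_POSITION"] - 1))  # False

# ... but fails on the tuple row with the TypeError from the trace.
try:
    mismatch = i != (tuple_row["ORDINAL_POSITION"] - 1)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)  # tuple indices must be integers or slices, not str
```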
I'm not sure whether this inconsistency should be checked for and handled in this package or in PyMySQL, so I'm raising it here for discussion. Given that the different cursors serve different purposes (SSCursor to stream large result sets without buffering them in memory, DictCursor to return dicts, etc.), it might make sense for this package to be explicit about the row format it expects, and to convert rows to that format wherever an unexpected one is found.
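As a rough sketch of the "convert rows where an unexpected format is found" idea: the hypothetical helper `as_dict_rows` below (not part of pymysqlreplication or PyMySQL) accepts rows in either shape and returns dicts, using the column names from the DB-API `cursor.description`, whose entries start with the column name. It assumes the caller passes the description of the cursor that produced the rows.

```python
from typing import Any, Dict, List, Mapping, Sequence


def as_dict_rows(description: Sequence[Sequence[Any]],
                 rows: Sequence[Any]) -> List[Dict[str, Any]]:
    """Return every row as a dict keyed by column name (hypothetical helper).

    Rows that are already mappings (as DictCursor / SSDictCursor return them)
    are copied unchanged; tuple rows (as Cursor / SSCursor return them) are
    zipped with the column names taken from the first item of each
    ``cursor.description`` entry.
    """
    names = [column[0] for column in description]
    normalized = []
    for row in rows:
        if isinstance(row, Mapping):
            normalized.append(dict(row))
        else:
            normalized.append(dict(zip(names, row)))
    return normalized


# Made-up rows: one tuple (SSCursor-style) and one dict (DictCursor-style).
# Real description entries are 7-tuples; only the first item (the name) is used.
description = [("COLUMN_NAME",), ("ORDINAL_POSITION",)]
rows = [("id", 1), {"COLUMN_NAME": "name", "ORDINAL_POSITION": 2}]
for column_schema in as_dict_rows(description, rows):
    print(column_schema["COLUMN_NAME"], column_schema["ORDINAL_POSITION"] - 1)
```

With a helper like this, the lookup could end with something like `return as_dict_rows(cur.description, cur.fetchall())`. A simpler alternative, where the package controls the cursor, would be to request a dict-producing cursor explicitly, since PyMySQL's `Connection.cursor()` accepts a cursor class, e.g. `conn.cursor(pymysql.cursors.DictCursor)` (or `SSDictCursor` to keep unbuffered streaming).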