-
Notifications
You must be signed in to change notification settings - Fork 69
Troubleshooting_ETL
General tips for troubleshooting the XDMoD ETL pipeline.
In many cases, the XDMoD ETL will query a data source, transform the data, and load it into one or more destination tables using LOAD DATA INFILE
. Data is typically queried using an unbuffered query and loaded in manageable chunks (e.g., 250k records at a time) while processing the source data. For large result sets, an unbuffered query is needed since the data cannot be placed into memory all at once.
while ( $record = $unbufferedStmt->fetch() ) {
$transformedRecord = transform($record);
writeRecordToFile($transformedRecord);
$numRecordsInFile++;
if ( $numRecordsInFile == self::MAX_RECORDS_PER_INFILE ) {
loadDataInfile();
$numRecordsInFile = 0;
}
}
On a busy server or slower hardware, it is possible that the mysql server will close the connection on us if we are using an unbuffered query and the processing of the LOAD DATA INFILE
takes longer than the net_write_timeout (See net_write_timeout).
When the number of records hits a threshold, the files are loaded and the client process does not read from the connection during loading. If loading takes longer than the net_write_timeout
, the server will close the connection on us. The client will read whatever is left in the result buffer and assume that all is well until another operation is attempted on the connection at which point we will get the ambiguous MySQL server has gone away error. Presumably there is some buffer when the server writes result data to the connection so we do not immediately see an error.
The ETL process attempts to adjust the value of net_write_timeout
based on the number of destination files that it must write to, but if these errors persist try increasing the value of net_write_timeout
in the mysql server configuration file. The location of the configuration file varies according to Linux distribution: Centos/RedHat use /etc/my.cnf.d/server.cnf
and Debian/Ubuntu use /etc/mysql/my.cnf
.