Skip to content

Troubleshooting_ETL

Joe White edited this page Oct 22, 2021 · 4 revisions

Troubleshooting the XDMoD ETL

General tips for troubleshooting the XDMoD ETL pipeline.

"MySQL server has gone away" error during ingestion

In many cases, the XDMoD ETL will query a data source, transform the data, and load it into one or more destination tables using LOAD DATA INFILE. Data is typically queried using an unbuffered query and loaded in manageable chunks (e.g., 250k records at a time) while processing the source data. For large result sets, an unbuffered query is needed since the data cannot be placed into memory all at once.

while ( $record = $unbufferedStmt->fetch() ) {
    $transformedRecord = transform($record);
    writeRecordToFile($transformedRecord);
    $numRecordsInFile++;
    if ( $numRecordsInFile == self::MAX_RECORDS_PER_INFILE ) {
        loadDataInfile();
        $numRecordsInFile = 0;
    }
}

On a busy server or slower hardware, it is possible that the mysql server will close the connection on us if we are using an unbuffered query and the processing of the LOAD DATA INFILE takes longer than the net_write_timeout (See net_write_timeout).

When the number of records hits a threshold, the files are loaded and the client process does not read from the connection during loading. If loading takes longer than the net_write_timeout, the server will close the connection on us. The client will read whatever is left in the result buffer and assume that all is well until another operation is attempted on the connection at which point we will get the ambiguous MySQL server has gone away error. Presumably there is some buffer when the server writes result data to the connection so we do not immediately see an error.

Solution

The ETL process attempts to adjust the value of net_write_timeout based on the number of destination files that it must write to, but if these errors persist try increasing the value of net_write_timeout in the mysql server configuration file. The location of the configuration file varies according to Linux distribution: Centos/RedHat use /etc/my.cnf.d/server.cnf and Debian/Ubuntu use /etc/mysql/my.cnf.