Investigate/mitigate acquisition lock timeouts #1600

Open
@melange396

Description

There appears to be a conflict between covidcast acquisition and the "coverage_crossref_updater": acquisition hits a lock timeout (mysql.connector.errors.DatabaseError: 1205 (HY000): Lock wait timeout exceeded; try restarting transaction), which in turn causes some input files to fail. The failing files can be re-acquired, but doing so requires manual intervention.
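As a stopgap for the manual re-acquisition step, one option is to retry a file automatically when it fails with error 1205. The following is a minimal sketch, not the acquisition code itself: `run_with_lock_retry` and its parameters are hypothetical names, and it assumes only that the raised exception exposes the MySQL error code as `.errno`, as mysql.connector exceptions do.

```python
import time

LOCK_WAIT_TIMEOUT_ERRNO = 1205  # MySQL "Lock wait timeout exceeded"


def run_with_lock_retry(operation, max_attempts=3, base_delay_s=5.0, sleep=time.sleep):
    """Call operation(); retry only if it fails with MySQL error 1205.

    `operation` is a hypothetical zero-argument callable, e.g. "load one
    input file inside its own transaction".
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # mysql.connector exceptions carry the server error code in `.errno`.
            is_lock_timeout = getattr(exc, "errno", None) == LOCK_WAIT_TIMEOUT_ERRNO
            if not is_lock_timeout or attempt == max_attempts:
                raise
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

By default InnoDB rolls back only the timed-out statement, not the whole transaction, so `operation` should roll back and redo the full unit of work for the file on each attempt. Retrying treats the symptom only; it does not remove the underlying contention.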

This hypothesis comes from a pattern in the timing of errors during some patching job runs (see the Slack thread for more detail and links to logs). It is plausible because the updater has a long-running query that reads from the same table that acquisition writes into.

Look into this to verify that it is indeed the cause and, if so, figure out a way around it (which could be as simple as "figure out how to prevent these jobs from running concurrently in our scheduler").
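If the scheduler cannot easily enforce this, one alternative is to have both jobs take the same MySQL named lock (`GET_LOCK` / `RELEASE_LOCK`) around their work. This is a sketch under assumptions: `named_db_lock` and the lock name in the usage note are hypothetical, and `cursor` is assumed to be a DB-API cursor such as one from mysql.connector.

```python
from contextlib import contextmanager


@contextmanager
def named_db_lock(cursor, name, timeout_s=60):
    """Hold a MySQL named (advisory) lock for the duration of the with-block.

    Hypothetical helper: both jobs must use the same `name` for this to
    serialize them.
    """
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_s))
    row = cursor.fetchone()
    # GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
    if row is None or row[0] != 1:
        raise RuntimeError(f"could not obtain lock {name!r}")
    try:
        yield
    finally:
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()  # consume the result so the cursor can be reused
```

Each job would then wrap its run in something like `with named_db_lock(cursor, "acquisition_vs_crossref"):`. Named locks belong to the database session, so the connection that took the lock has to stay open for the whole run, and the lock is released automatically if that session ends.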

Metadata

    Labels

    acquisition: changes acquisition logic
    bug
    devops: building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc
    mysql: mysql database related
