IntegrityError('(psycopg2.IntegrityError) duplicate key value violates unique constraint "records_bibcode_key"\nDETAIL: Key (bibcode)=(2017arXiv170902052H) already exists.\n',) #31

Open
romanchyla opened this issue Dec 13, 2017 · 0 comments

@romanchyla
Contributor

This is a very curious error: it only affected the set of ~700K bibcodes that were submitted to the fulltext pipeline.

This is the stack trace:

{"asctime": "2017-12-13T20:43:06.492Z", "msecs": 492.58899688720703, "levelname": "INFO", "process": 28524, "threadName": "MainThread", "filename": "__init__.py", "lineno": 459, "message": "Retrying 9eca350f-f79d-4307-9b01-c8cb3a0f2aac because of exc=(psycopg2.IntegrityError) duplicate key value violates unique constraint \"records_bibcode_key\"\nDETAIL:  Key (bibcode)=(2017arXiv170902022O) already exists.\n [SQL: 'INSERT INTO records (bibcode, bib_data, orcid_claims, nonbib_data, fulltext, metrics, bib_data_updated, orcid_claims_updated, nonbib_data_updated, fulltext_updated, metrics_updated, created, updated, processed, solr_processed, metrics_processed, status) VALUES (%(bibcode)s, %(bib_data)s, %(orcid_claims)s, %(nonbib_data)s, %(fulltext)s, %(metrics)s, %(bib_data_updated)s, %(orcid_claims_updated)s, %(nonbib_data_updated)s, %(fulltext_updated)s, %(metrics_updated)s, %(created)s, %(updated)s, %(processed)s, %(solr_processed)s, %(metrics_processed)s, %(status)s) RETURNING records.id'] [parameters: {'status': None, 'nonbib_data': None, 'fulltext': '{\"body\": \"Superposition as a Relativistic Filter\\\\n\\\\narXiv:1709.02022v1 [quant-ph] 6 Sep 2017\\\\n\\\\nG. N. Ord\\\\nDepartment of Mathematics\\\\nRyerson U ... (43326 characters truncated) ... uum limits and\\\\nschr\\\\u00f6dinger\\\\u2019s equation. Phys. Rev. 
A., 54(5):3772\\\\u20133778, 1996.\\\\n\\\\n19\\\\n\\\\n\\\\f\", \"bibcode\": \"2017arXiv170902022O\"}', 'bibcode': u'2017arXiv170902022O', 'created': datetime.datetime(2017, 12, 13, 20, 43, 6, 487915, tzinfo=tzutc()), 'orcid_claims_updated': None, 'updated': datetime.datetime(2017, 12, 13, 20, 43, 6, 486608, tzinfo=tzutc()), 'bib_data_updated': None, 'metrics': None, 'nonbib_data_updated': None, 'processed': None, 'fulltext_updated': datetime.datetime(2017, 12, 13, 20, 43, 6, 486608, tzinfo=tzutc()), 'metrics_processed': None, 'bib_data': None, 'orcid_claims': None, 'metrics_updated': None, 'solr_processed': None}]", "timestamp": "2017-12-13T20:43:06.492Z", "hostname": "adsvm07"}
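To gauge how many bibcodes actually hit the constraint, one could scan the worker log for these retry messages. Below is a minimal sketch. It assumes the log holds one JSON object per line with the error text in the `message` field, as in the entry above. The function name is made up for illustration and is not part of the pipeline.

```python
import json
import re

# Matches the DETAIL part of the PostgreSQL error embedded in the log message.
DUPLICATE_KEY = re.compile(r"Key \(bibcode\)=\(([^)]+)\) already exists")


def duplicate_bibcodes(log_lines):
    """Return the set of bibcodes that hit the records_bibcode_key violation."""
    found = set()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue  # skip lines that are not a complete JSON document
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("message") or "")
        if "records_bibcode_key" not in message:
            continue
        match = DUPLICATE_KEY.search(message)
        if match:
            found.add(match.group(1))
    return found
```

Because the same bibcode can be retried several times, collecting into a set counts each bibcode only once.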

It looks like a race condition: it only happens when multiple workers are consuming the 'update-records' queue.
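Here is a deterministic replay of the suspected race. It uses an in-memory SQLite table as a stand-in for `records`, modelling only `bibcode` and `fulltext`. The helpers are hypothetical, not the pipeline's actual code. Two workers each check whether the bibcode exists before either one inserts. Both therefore decide to INSERT, and the second trips the unique constraint while the first worker's row survives.

```python
import sqlite3


def make_db():
    # In-memory stand-in for the `records` table; only the unique bibcode matters here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER PRIMARY KEY,"
        " bibcode TEXT UNIQUE NOT NULL,"
        " fulltext TEXT)"
    )
    return conn


def record_exists(conn, bibcode):
    row = conn.execute(
        "SELECT 1 FROM records WHERE bibcode = ?", (bibcode,)
    ).fetchone()
    return row is not None


def simulate_race(bibcode):
    """Replay one unlucky interleaving of two workers doing check-then-insert.

    Returns (number of rows for the bibcode, list of (worker, error message)).
    """
    conn = make_db()
    # Step 1: both workers look the bibcode up before either has written it.
    decisions = [("A", record_exists(conn, bibcode)), ("B", record_exists(conn, bibcode))]
    errors = []
    # Step 2: each worker acts on its now-stale lookup.
    for worker, exists in decisions:
        if not exists:
            try:
                conn.execute(
                    "INSERT INTO records (bibcode, fulltext) VALUES (?, ?)",
                    (bibcode, "fulltext from worker " + worker),
                )
            except sqlite3.IntegrityError as exc:
                errors.append((worker, str(exc)))
    rows = conn.execute(
        "SELECT COUNT(*) FROM records WHERE bibcode = ?", (bibcode,)
    ).fetchone()[0]
    return rows, errors
```

The result is one row plus one `IntegrityError` from worker B. That matches what the log shows: the record exists afterwards, yet a worker still failed on it.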

If I run it with a single worker, the errors disappear.
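This is consistent with the race theory. With a single consumer, messages for the same bibcode are handled strictly one after another. A common way to keep that guarantee per bibcode while still running several workers is to shard the queue by key. The sketch below is hypothetical. It assumes N queues named `update-records-0` through `update-records-<N-1>`, each consumed by exactly one worker, with every producer choosing the queue through this function.

```python
import zlib


def queue_for(bibcode, n_queues, prefix="update-records"):
    """Pick a queue name so that all messages for one bibcode land on the same queue.

    Uses CRC32 rather than the built-in hash(), which is salted per process for
    str and would send the same bibcode to different queues from different producers.
    """
    if n_queues < 1:
        raise ValueError("n_queues must be at least 1")
    shard = zlib.crc32(bibcode.encode("utf-8")) % n_queues
    return "%s-%d" % (prefix, shard)
```

This serializes work per bibcode rather than globally. It only helps if every producer, including any retry path, routes messages through the same function.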

What is also interesting is that the record does get created (so one of the workers succeeded), and that it only affects deleted records: I checked a few, and they had all been deleted by the import pipeline.
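This would also explain why only deleted records are affected, assuming the worker follows an update-else-insert pattern. When a bibcode still has a row, concurrent workers take the update path and nothing conflicts. A deleted bibcode has no row, so every concurrent message for it takes the insert path. Collapsing the check and the write into one atomic statement closes that window. PostgreSQL 9.5+ supports `INSERT ... ON CONFLICT (bibcode) DO UPDATE`, and SQLAlchemy exposes this as `sqlalchemy.dialects.postgresql.insert(...).on_conflict_do_update(...)`. The sketch below runs the same statement on SQLite (3.24+) against a hypothetical, reduced `records` schema.

```python
import sqlite3


def make_db(existing=()):
    # Stand-in `records` table; `existing` pre-populates rows that were never deleted.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER PRIMARY KEY,"
        " bibcode TEXT UNIQUE NOT NULL,"
        " fulltext TEXT,"
        " fulltext_updated TEXT)"
    )
    for bibcode in existing:
        conn.execute("INSERT INTO records (bibcode) VALUES (?)", (bibcode,))
    return conn


# Insert the row, or update it in place if the bibcode already exists.
# With psycopg2 the same SQL works using %s placeholders instead of ?.
UPSERT_FULLTEXT = (
    "INSERT INTO records (bibcode, fulltext, fulltext_updated) VALUES (?, ?, ?) "
    "ON CONFLICT (bibcode) DO UPDATE SET "
    "fulltext = excluded.fulltext, "
    "fulltext_updated = excluded.fulltext_updated"
)


def upsert_fulltext(conn, bibcode, fulltext, updated):
    # A single statement: no gap between "does it exist?" and "write it".
    conn.execute(UPSERT_FULLTEXT, (bibcode, fulltext, updated))
```

Both a missing (deleted) bibcode and an existing one end up as exactly one row. The last writer wins; if messages can arrive out of order, the `DO UPDATE` could take a `WHERE` clause that compares timestamps.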

import_pipeline=> select * from change_log where key = 'deleted' and oldvalue = '2017arXiv170902057K';
   id   |          created           |   key   | newvalue |      oldvalue       
--------+----------------------------+---------+----------+---------------------
 336520 | 2017-12-11 21:44:40.231005 | deleted |          | 2017arXiv170902057K
(1 row)

import_pipeline=> select * from change_log where key = 'deleted' and oldvalue = '2017arXiv170902017E';
   id   |          created           |   key   | newvalue |      oldvalue       
--------+----------------------------+---------+----------+---------------------
 239759 | 2017-12-11 21:41:56.536064 | deleted |          | 2017arXiv170902017E
(1 row)

import_pipeline=> select * from change_log where key = 'deleted' and oldvalue = '2017arXiv170902037B';
   id   |          created           |   key   | newvalue |      oldvalue       
--------+----------------------------+---------+----------+---------------------
 668949 | 2017-12-11 21:53:51.495622 | deleted |          | 2017arXiv170902037B
(1 row)

import_pipeline=> select * from change_log where key = 'deleted' and oldvalue = '2017arXiv170901605M';
   id   |          created           |   key   | newvalue |      oldvalue       
--------+----------------------------+---------+----------+---------------------
 635200 | 2017-12-11 21:52:55.371403 | deleted |          | 2017arXiv170901605M

master_pipeline=> select bibcode,created,updated,bib_data_updated,fulltext_updated from records where bibcode in ('2017arXiv170901605M', '2017arXiv170902017E', '2017arXiv170902037B', '2017arXiv170901605M');
       bibcode       |          created           |          updated           | bib_data_updated |      fulltext_updated      
---------------------+----------------------------+----------------------------+------------------+----------------------------
 2017arXiv170901605M | 2017-12-13 20:43:08.320375 | 2017-12-13 20:43:12.340331 |                  | 2017-12-13 20:43:12.340331
 2017arXiv170902017E | 2017-12-13 20:43:06.659393 | 2017-12-13 20:43:16.810655 |                  | 2017-12-13 20:43:16.810655
 2017arXiv170902037B | 2017-12-13 20:43:06.520504 | 2017-12-13 20:43:15.767556 |                  | 2017-12-13 20:43:15.767556
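The timestamps are consistent with this. Every record returned was created in `master_pipeline` about two days after the import pipeline deleted it, and within about two seconds of the failing insert in the log (20:43:06). `change_log` lives in `import_pipeline` and `records` in `master_pipeline`, which appear to be separate databases, so a plain query cannot join them. A small script can do the comparison instead. The values below are copied from the output above, on the assumption that both sets of timestamps share a time zone; with a two-day gap, the exact offset hardly matters. Note that the `master_pipeline` query lists `2017arXiv170901605M` twice and never asks for `2017arXiv170902057K`, so that bibcode simply isn't covered.

```python
from datetime import datetime

# Deletion times from import_pipeline.change_log (copied from the output above).
deleted_in_import = {
    "2017arXiv170902057K": datetime(2017, 12, 11, 21, 44, 40, 231005),
    "2017arXiv170902017E": datetime(2017, 12, 11, 21, 41, 56, 536064),
    "2017arXiv170902037B": datetime(2017, 12, 11, 21, 53, 51, 495622),
    "2017arXiv170901605M": datetime(2017, 12, 11, 21, 52, 55, 371403),
}

# Creation times from master_pipeline.records (copied from the output above).
created_in_master = {
    "2017arXiv170901605M": datetime(2017, 12, 13, 20, 43, 8, 320375),
    "2017arXiv170902017E": datetime(2017, 12, 13, 20, 43, 6, 659393),
    "2017arXiv170902037B": datetime(2017, 12, 13, 20, 43, 6, 520504),
}


def recreated_after_delete(deleted, created):
    """Bibcodes whose master record was created after the import pipeline deleted them."""
    return sorted(
        bibcode
        for bibcode, created_at in created.items()
        if bibcode in deleted and created_at > deleted[bibcode]
    )
```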