Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint #411

Open
lightjay opened this issue Aug 15, 2024 · 5 comments
Labels
Awaiting Action/Author bug Something isn't working

Comments

@lightjay
Copy link

lightjay commented Aug 15, 2024

I'm getting this error when running a pipeline from a custom tap build on the SDK (tap-pipedrive).

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "leads_pkey"
DETAIL:  Key (id)=(8e7e49f0-affa-11ee-842e-53335752a2c3) already exists.

I've double checked the results of the tap and it doesn't seem to be sending duplicates. This error occurs on the first record of the stream. Does this target not overwrite pre-existing rows?

Supposedly this issue was already raised with #28 and fixed with #29 but if so, I'm not sure why I'm getting this ...

@edgarrmondragon
Copy link
Member

Hey @lightjay 👋!

Do you have a stream of Singer messages that can be used to reproduce this? Meaning both

  1. does meltano invoke tap-pipedrive > singer.jsonl followed by cat singer.jsonl | meltano invoke target-postgres also fail?

  2. and if so, is there a minimal singer.jsonl you can share that reproduces this error?

This would make it easier and quicker for us to debug and patch.

Thanks for reporting!

@edgarrmondragon edgarrmondragon added bug Something isn't working Awaiting Action/Author labels Aug 15, 2024
@edgarrmondragon
Copy link
Member

Also, someone pointed out in Slack that this may be relevant: psycopg/psycopg2#1036.

@lightjay
Copy link
Author

@edgarrmondragon yes running cat singer.jsonl | meltano invoke target-postgres will produce the same error.

Minimal singer.jsonl file:

{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"files","schema":{"properties":{"active_flag":{"type":["boolean","null"]},"activity_id":{"type":["integer","null"]},"add_time":{"format":"date-time","type":["string","null"]},"cid":{"type":["string","null"]},"deal_id":{"type":["integer","null"]},"deal_name":{"type":["string","null"]},"description":{"type":["string","null"]},"file_name":{"type":["string","null"]},"file_size":{"type":["integer","null"]},"file_type":{"type":["string","null"]},"id":{"type":["integer","null"]},"inline_flag":{"type":["boolean","null"]},"lead_id":{"type":["integer","null"]},"lead_name":{"type":["string","null"]},"log_id":{"type":["integer","null"]},"mail_message_id":{"type":["integer","null"]},"mail_template_id":{"type":["integer","null"]},"name":{"type":["string","null"]},"org_id":{"type":["integer","null"]},"org_name":{"type":["string","null"]},"person_id":{"type":["integer","null"]},"person_name":{"type":["string","null"]},"product_id":{"type":["integer","null"]},"product_name":{"type":["string","null"]},"remote_id":{"type":["string","null"]},"remote_location":{"type":["string","null"]},"s3_bucket":{"type":["string","null"]},"update_time":{"format":"date-time","type":["string","null"]},"url":{"type":["string","null"]},"user_id":{"type":["integer","null"]}},"type":"object"},"key_properties":["id"]}
{"type":"RECORD","stream":"files","record":{"id":1},"time_extracted":"2024-08-15T23:17:56.678252+00:00"}
{"type":"STATE","value":{"bookmarks":{"files":{}}}}

Be aware that this table in postgres already has pretty much all the rows that are trying to be inserted. Though only trying to update one of those rows appears to produce the error.

Here are some (perhaps) relevant logs:

2024-08-15 17:27:38,539 | INFO     | target-postgres      | Target 'target-postgres' is listening for input from tap.
2024-08-15 17:27:38,540 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:38,540 | INFO     | target-postgres.activities | Initializing target sink for stream 'activities'...
2024-08-15 17:27:41,729 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:41,729 | INFO     | target-postgres.activity_types | Initializing target sink for stream 'activity_types'...
2024-08-15 17:27:43,488 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:43,488 | INFO     | target-postgres.deals | Initializing target sink for stream 'deals'...
2024-08-15 17:27:45,300 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:23:14,128 | INFO     | target-postgres.files | Initializing target sink for stream 'files'...
2024-08-15 17:23:17,192 | INFO     | target-postgres      | Target sink for 'files' is full. Current size is '10000'. Draining...
2024-08-15 17:23:18,936 | INFO     | target-postgres.files | Inserting with SQL: INSERT INTO "8f74096d_2298_40f0_b823_b454af8a61c1" (active_flag, activity_id, add_time, cid, deal_id, deal_name, description, file_name, file_size, file_type, id, inline_flag, lead_id, lead_name, log_id, mail_message_id, mail_template_id, name, org_id, org_name, person_id, person_name, product_id, product_name, remote_id, remote_location, s3_bucket, update_time, url, user_id, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) VALUES (:active_flag, :activity_id, :add_time, :cid, :deal_id, :deal_name, :description, :file_name, :file_size, :file_type, :id, :inline_flag, :lead_id, :lead_name, :log_id, :mail_message_id, :mail_template_id, :name, :org_id, :org_name, :person_id, :person_name, :product_id, :product_name, :remote_id, :remote_location, :s3_bucket, :update_time, :url, :user_id, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :_sdc_sync_started_at)
Traceback (most recent call last):
  File "/Users/jl/Documents/code/investor-nest-elt/meltano/.meltano/loaders/target-postgres/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/jl/Documents/code/investor-nest-elt/meltano/.meltano/loaders/target-postgres/venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "files_pkey"
DETAIL:  Key (id)=(1) already exists.

...

[SQL: UPDATE pipedrive.files SET active_flag="8f74096d_2298_40f0_b823_b454af8a61c1".active_flag, activity_id="8f74096d_2298_40f0_b823_b454af8a61c1".activity_id, add_time="8f74096d_2298_40f0_b823_b454af8a61c1".add_time, cid="8f74096d_2298_40f0_b823_b454af8a61c1".cid, deal_id="8f74096d_2298_40f0_b823_b454af8a61c1".deal_id, deal_name="8f74096d_2298_40f0_b823_b454af8a61c1".deal_name, description="8f74096d_2298_40f0_b823_b454af8a61c1".description, file_name="8f74096d_2298_40f0_b823_b454af8a61c1".file_name, file_size="8f74096d_2298_40f0_b823_b454af8a61c1".file_size, file_type="8f74096d_2298_40f0_b823_b454af8a61c1".file_type, id="8f74096d_2298_40f0_b823_b454af8a61c1".id, inline_flag="8f74096d_2298_40f0_b823_b454af8a61c1".inline_flag, lead_id="8f74096d_2298_40f0_b823_b454af8a61c1".lead_id, lead_name="8f74096d_2298_40f0_b823_b454af8a61c1".lead_name, log_id="8f74096d_2298_40f0_b823_b454af8a61c1".log_id, mail_message_id="8f74096d_2298_40f0_b823_b454af8a61c1".mail_message_id, mail_template_id="8f74096d_2298_40f0_b823_b454af8a61c1".mail_template_id, name="8f74096d_2298_40f0_b823_b454af8a61c1".name, org_id="8f74096d_2298_40f0_b823_b454af8a61c1".org_id, org_name="8f74096d_2298_40f0_b823_b454af8a61c1".org_name, person_id="8f74096d_2298_40f0_b823_b454af8a61c1".person_id, person_name="8f74096d_2298_40f0_b823_b454af8a61c1".person_name, product_id="8f74096d_2298_40f0_b823_b454af8a61c1".product_id, product_name="8f74096d_2298_40f0_b823_b454af8a61c1".product_name, remote_id="8f74096d_2298_40f0_b823_b454af8a61c1".remote_id, remote_location="8f74096d_2298_40f0_b823_b454af8a61c1".remote_location, s3_bucket="8f74096d_2298_40f0_b823_b454af8a61c1".s3_bucket, update_time="8f74096d_2298_40f0_b823_b454af8a61c1".update_time, url="8f74096d_2298_40f0_b823_b454af8a61c1".url, user_id="8f74096d_2298_40f0_b823_b454af8a61c1".user_id, _sdc_extracted_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_extracted_at, _sdc_received_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_received_at, _sdc_batched_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_batched_at, _sdc_deleted_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_deleted_at, _sdc_sequence="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_sequence, _sdc_table_version="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_table_version, _sdc_sync_started_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_sync_started_at FROM "8f74096d_2298_40f0_b823_b454af8a61c1" WHERE "8f74096d_2298_40f0_b823_b454af8a61c1".id = pipedrive.files.id]

As for the linked issue: you can see from the logs that it isn't trying to run a CREATE statement but an UPDATE or INSERT INTO. Don't know if that really excludes it from consideration, though ...

@acarter24
Copy link

acarter24 commented Aug 20, 2024

Just FYI I tried setting

config:
    parallelism: 1
    parallelism_max: 1

But still received same error.

EDIT: Sorry, this relates to pipelinewise variant

@visch
Copy link
Member

visch commented Aug 21, 2024

@edgarrmondragon yes running cat singer.jsonl | meltano invoke target-postgres will produce the same error.

Minimal singer.jsonl file:

{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"files","schema":{"properties":{"active_flag":{"type":["boolean","null"]},"activity_id":{"type":["integer","null"]},"add_time":{"format":"date-time","type":["string","null"]},"cid":{"type":["string","null"]},"deal_id":{"type":["integer","null"]},"deal_name":{"type":["string","null"]},"description":{"type":["string","null"]},"file_name":{"type":["string","null"]},"file_size":{"type":["integer","null"]},"file_type":{"type":["string","null"]},"id":{"type":["integer","null"]},"inline_flag":{"type":["boolean","null"]},"lead_id":{"type":["integer","null"]},"lead_name":{"type":["string","null"]},"log_id":{"type":["integer","null"]},"mail_message_id":{"type":["integer","null"]},"mail_template_id":{"type":["integer","null"]},"name":{"type":["string","null"]},"org_id":{"type":["integer","null"]},"org_name":{"type":["string","null"]},"person_id":{"type":["integer","null"]},"person_name":{"type":["string","null"]},"product_id":{"type":["integer","null"]},"product_name":{"type":["string","null"]},"remote_id":{"type":["string","null"]},"remote_location":{"type":["string","null"]},"s3_bucket":{"type":["string","null"]},"update_time":{"format":"date-time","type":["string","null"]},"url":{"type":["string","null"]},"user_id":{"type":["integer","null"]}},"type":"object"},"key_properties":["id"]}
{"type":"RECORD","stream":"files","record":{"id":1},"time_extracted":"2024-08-15T23:17:56.678252+00:00"}
{"type":"STATE","value":{"bookmarks":{"files":{}}}}

Be aware that this table in postgres already has pretty much all the rows that are trying to be inserted. Though only trying to update one of those rows appears to produce the error.

Here are some (perhaps) relevant logs:

2024-08-15 17:27:38,539 | INFO     | target-postgres      | Target 'target-postgres' is listening for input from tap.
2024-08-15 17:27:38,540 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:38,540 | INFO     | target-postgres.activities | Initializing target sink for stream 'activities'...
2024-08-15 17:27:41,729 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:41,729 | INFO     | target-postgres.activity_types | Initializing target sink for stream 'activity_types'...
2024-08-15 17:27:43,488 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:27:43,488 | INFO     | target-postgres.deals | Initializing target sink for stream 'deals'...
2024-08-15 17:27:45,300 | INFO     | target-postgres      | Initializing 'target-postgres' target sink...
2024-08-15 17:23:14,128 | INFO     | target-postgres.files | Initializing target sink for stream 'files'...
2024-08-15 17:23:17,192 | INFO     | target-postgres      | Target sink for 'files' is full. Current size is '10000'. Draining...
2024-08-15 17:23:18,936 | INFO     | target-postgres.files | Inserting with SQL: INSERT INTO "8f74096d_2298_40f0_b823_b454af8a61c1" (active_flag, activity_id, add_time, cid, deal_id, deal_name, description, file_name, file_size, file_type, id, inline_flag, lead_id, lead_name, log_id, mail_message_id, mail_template_id, name, org_id, org_name, person_id, person_name, product_id, product_name, remote_id, remote_location, s3_bucket, update_time, url, user_id, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) VALUES (:active_flag, :activity_id, :add_time, :cid, :deal_id, :deal_name, :description, :file_name, :file_size, :file_type, :id, :inline_flag, :lead_id, :lead_name, :log_id, :mail_message_id, :mail_template_id, :name, :org_id, :org_name, :person_id, :person_name, :product_id, :product_name, :remote_id, :remote_location, :s3_bucket, :update_time, :url, :user_id, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :_sdc_sync_started_at)
Traceback (most recent call last):
  File "/Users/jl/Documents/code/investor-nest-elt/meltano/.meltano/loaders/target-postgres/venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/jl/Documents/code/investor-nest-elt/meltano/.meltano/loaders/target-postgres/venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "files_pkey"
DETAIL:  Key (id)=(1) already exists.

...

[SQL: UPDATE pipedrive.files SET active_flag="8f74096d_2298_40f0_b823_b454af8a61c1".active_flag, activity_id="8f74096d_2298_40f0_b823_b454af8a61c1".activity_id, add_time="8f74096d_2298_40f0_b823_b454af8a61c1".add_time, cid="8f74096d_2298_40f0_b823_b454af8a61c1".cid, deal_id="8f74096d_2298_40f0_b823_b454af8a61c1".deal_id, deal_name="8f74096d_2298_40f0_b823_b454af8a61c1".deal_name, description="8f74096d_2298_40f0_b823_b454af8a61c1".description, file_name="8f74096d_2298_40f0_b823_b454af8a61c1".file_name, file_size="8f74096d_2298_40f0_b823_b454af8a61c1".file_size, file_type="8f74096d_2298_40f0_b823_b454af8a61c1".file_type, id="8f74096d_2298_40f0_b823_b454af8a61c1".id, inline_flag="8f74096d_2298_40f0_b823_b454af8a61c1".inline_flag, lead_id="8f74096d_2298_40f0_b823_b454af8a61c1".lead_id, lead_name="8f74096d_2298_40f0_b823_b454af8a61c1".lead_name, log_id="8f74096d_2298_40f0_b823_b454af8a61c1".log_id, mail_message_id="8f74096d_2298_40f0_b823_b454af8a61c1".mail_message_id, mail_template_id="8f74096d_2298_40f0_b823_b454af8a61c1".mail_template_id, name="8f74096d_2298_40f0_b823_b454af8a61c1".name, org_id="8f74096d_2298_40f0_b823_b454af8a61c1".org_id, org_name="8f74096d_2298_40f0_b823_b454af8a61c1".org_name, person_id="8f74096d_2298_40f0_b823_b454af8a61c1".person_id, person_name="8f74096d_2298_40f0_b823_b454af8a61c1".person_name, product_id="8f74096d_2298_40f0_b823_b454af8a61c1".product_id, product_name="8f74096d_2298_40f0_b823_b454af8a61c1".product_name, remote_id="8f74096d_2298_40f0_b823_b454af8a61c1".remote_id, remote_location="8f74096d_2298_40f0_b823_b454af8a61c1".remote_location, s3_bucket="8f74096d_2298_40f0_b823_b454af8a61c1".s3_bucket, update_time="8f74096d_2298_40f0_b823_b454af8a61c1".update_time, url="8f74096d_2298_40f0_b823_b454af8a61c1".url, user_id="8f74096d_2298_40f0_b823_b454af8a61c1".user_id, _sdc_extracted_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_extracted_at, _sdc_received_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_received_at, _sdc_batched_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_batched_at, _sdc_deleted_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_deleted_at, _sdc_sequence="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_sequence, _sdc_table_version="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_table_version, _sdc_sync_started_at="8f74096d_2298_40f0_b823_b454af8a61c1"._sdc_sync_started_at FROM "8f74096d_2298_40f0_b823_b454af8a61c1" WHERE "8f74096d_2298_40f0_b823_b454af8a61c1".id = pipedrive.files.id]

As for the linked issue: you can see from the logs that it isn't trying to run a CREATE statement but an UPDATE or INSERT INTO. Don't know if that really excludes it from consideration, though ...

I ran this locally and couldn't replicate the issue. Can you make sure you're on the latest target-postgres version with a meltano install --clean maybe share you meltano.yml

Steps to try to replicate

visch@DESKTOP-9BDPA9T:~/git/target-postgres$ cat test_411_out | meltano invoke target-postgres
visch@DESKTOP-9BDPA9T:~/git/target-postgres$ cat test_411_out | meltano invoke target-postgres

Both succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Action/Author bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

4 participants