Adding a new "precombine" upsert method to redshift #1284

bnimam · 2022-04-18T19:56:38Z


Hello awslabs team.

I figured I'd start a discussion on this before writing an implementation.

Our team needs a more complex upsert method for Redshift, borrowing an idea from Apache Hudi called a precombine field.

Essentially, we receive files whose records carry IDs used as primary keys, and we want to update existing records from each file we receive. Guaranteeing file processing order is harder for us to implement and maintain than upserting the data more intelligently. The precombine field would let us specify a column whose highest value decides which row to keep when the same key appears in both the target table and the stage table (the file being processed).

Imagine we have a table users like

ID	Name	Date
1	bob	2022-04-02
2	sally	2022-04-02
3	sue	2022-04-02

And we want to use Date as the precombine field, which is extracted from our processed file. So if we had a candidate file like

ID	Name	Date
1	bob	2022-04-02
2	sal	2022-04-03
3	suzy	2022-04-01

Our resulting table would be

ID	Name	Date
1	bob	2022-04-02
2	sal	2022-04-03
3	sue	2022-04-02
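
To make the intended semantics concrete, here is a minimal in-memory sketch in Python. The function name `precombine_merge` and the dict-based row layout are hypothetical and only for illustration; they are not part of any existing library API. Ties keep the existing target row.

```python
def precombine_merge(target, stage, key, precombine):
    """Return target rows merged with stage rows, keeping the row with the
    highest precombine value per key. On a tie the existing row is kept."""
    merged = {row[key]: row for row in target}
    for row in stage:
        current = merged.get(row[key])
        # A staged row wins only if the key is new or its value is strictly higher.
        if current is None or row[precombine] > current[precombine]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


users = [
    {"id": 1, "name": "bob", "date": "2022-04-02"},
    {"id": 2, "name": "sally", "date": "2022-04-02"},
    {"id": 3, "name": "sue", "date": "2022-04-02"},
]
candidate = [
    {"id": 1, "name": "bob", "date": "2022-04-02"},
    {"id": 2, "name": "sal", "date": "2022-04-03"},
    {"id": 3, "name": "suzy", "date": "2022-04-01"},
]

# ISO-8601 date strings compare correctly as plain strings.
result = precombine_merge(users, candidate, key="id", precombine="date")
for row in result:
    print(row["id"], row["name"], row["date"])
```

Note that this sketch also collapses duplicate keys within the candidate file, which the SQL approach would not do on its own.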

This could be achieved by first deleting from the stage table every row that is not newer than its match in the target table, with a query like

DELETE FROM users_stage
USING users
WHERE users.id = users_stage.id
AND users.date >= users_stage.date

Result (users_stage)

ID	Name	Date
2	sal	2022-04-03

and then deleting from the target table the rows that a remaining staged row supersedes

DELETE FROM users
USING users_stage
WHERE users.id = users_stage.id
AND users.date < users_stage.date

Result (users)

ID	Name	Date
1	bob	2022-04-02
3	sue	2022-04-02

Finally, inserting the remaining staged rows into the target table

INSERT INTO users
(SELECT * FROM users_stage)
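
As a rough check of the three-step sequence above, the sketch below runs it end to end against an in-memory SQLite database using Python's standard `sqlite3` module. SQLite does not support `DELETE ... USING`, so the two deletes are rewritten as equivalent correlated `EXISTS` subqueries; this is a substitute for local testing, not the Redshift syntax. The helper name `precombine_upsert` is hypothetical, and the table and column names are assumed to be trusted identifiers rather than user input.

```python
import sqlite3


def precombine_upsert(conn, target, stage, key, precombine):
    """Run the three-step precombine upsert inside one transaction.
    Identifiers are interpolated directly, so they must be trusted."""
    cur = conn.cursor()
    # Step 1: drop staged rows that are not newer than the matching target row.
    cur.execute(
        f"DELETE FROM {stage} WHERE EXISTS ("
        f"SELECT 1 FROM {target} t WHERE t.{key} = {stage}.{key} "
        f"AND t.{precombine} >= {stage}.{precombine})"
    )
    # Step 2: drop target rows superseded by a remaining staged row.
    cur.execute(
        f"DELETE FROM {target} WHERE EXISTS ("
        f"SELECT 1 FROM {stage} s WHERE s.{key} = {target}.{key} "
        f"AND {target}.{precombine} < s.{precombine})"
    )
    # Step 3: insert whatever is left in the stage table.
    cur.execute(f"INSERT INTO {target} SELECT * FROM {stage}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, date TEXT)")
conn.execute("CREATE TABLE users_stage (id INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "bob", "2022-04-02"), (2, "sally", "2022-04-02"), (3, "sue", "2022-04-02")],
)
conn.executemany(
    "INSERT INTO users_stage VALUES (?, ?, ?)",
    [(1, "bob", "2022-04-02"), (2, "sal", "2022-04-03"), (3, "suzy", "2022-04-01")],
)

precombine_upsert(conn, "users", "users_stage", "id", "date")
rows = conn.execute("SELECT id, name, date FROM users ORDER BY id").fetchall()
print(rows)
```

Running the three statements in a single transaction matters on a real cluster as well, since readers should never observe the target table between the delete and the insert.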

If this feature would be an accepted addition, I will go ahead and implement it; otherwise I will not spend time on it.

jaidisido · 2022-04-22T10:55:20Z

Maintainer

Hi @bnimam, thank you for this suggestion. It could certainly be an interesting additional pattern to include in our Redshift API methods. I have transferred this to an issue. Looking forward to your PR and to collaborating on this.
