Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve table permissions on SaveMode.overwrite #126

Open
lakshman-kollipara opened this issue Apr 4, 2023 · 3 comments
Open

Preserve table permissions on SaveMode.overwrite #126

lakshman-kollipara opened this issue Apr 4, 2023 · 3 comments

Comments

@lakshman-kollipara
Copy link

Hi

We have a use case of overwriting data in existing Redshift table. Using "SaveMode.Overwrite" is dropping table metadata and hence losing all the table permissions. We cannot really use the workaround of truncating the table in preActions and populate the table because the current implementation runs each query in preActions and postActions in separate individual transactions by setting auto-commit to true.

Do we have any alternative to preserve table permissions when using "SaveMode.Overwrite" or a way to run preActions and postActions within a single transaction block?

@jsleight
Copy link
Collaborator

jsleight commented Apr 4, 2023

Can you write the data into a staging table, then merge the staging table into your main table?

We have auto-commit as false though each of the pre-/post-Actions is executed serially via jdbc.

SaveMode.Overwrite drops the table if it exists, so makes sense that it would also drop the metadata. Saving the metadata before dropping it would be a nice improvement -- though that is kind of the staging table approach I guess?

@lakshman-kollipara
Copy link
Author

lakshman-kollipara commented Apr 4, 2023

The problem with staging table is same as mentioned earlier. If postActions are executing in individual transaction blocks, It could cause dirty-reads because Truncate happens in one transaction and loading from stage table happens in another transaction. And usually the INSERT into SELECT queries take significant amount of time for medium-to-huge datasets.

Currently, The autocommit is set to false before doRedshiftLoad and we are committing if the method didn't return any errors. Does that mean All the preActions, Drop table, createTable and postActions are running within a single transaction block?

@jsleight
Copy link
Collaborator

jsleight commented Apr 4, 2023

Caveat that I'm not being too familiar with the RedshiftWriter and the difference between the jdbc query execution and the transaction blocks is messing with me.

But from what I understand, we only do the commit once at the end -- so it should it be a single transaction block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants