-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] Added get_partition pagination #82
base: master
Are you sure you want to change the base?
Changes from 9 commits
bffec3f
3835cc4
2c517c9
3dfff0c
f237ea2
02ac5f3
59a886b
aa99e66
daad664
7e21301
49eedb4
e8b1995
3667334
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,14 +58,19 @@ def clean_up_partitions( | |
with boto3_client_lock: | ||
glue_client = boto3.client('glue', region_name=client.region_name) | ||
s3_resource = boto3.resource('s3', region_name=client.region_name) | ||
partitions = glue_client.get_partitions( | ||
# CatalogId='123456789012', # Need to make this configurable if it is different from default AWS Account ID | ||
DatabaseName=database_name, | ||
TableName=table_name, | ||
Expression=where_condition | ||
) | ||
p = re.compile('s3://([^/]*)/(.*)') | ||
for partition in partitions["Partitions"]: | ||
paginator = glue_client.get_paginator("get_partitions") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you have a chance, can you check what would happen if the current user doesn't have enough permission to fetch the partitions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You bet. I should have some time to look into it tomorrow to check the permission case and run through the proper dbt-athena testing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have validated that the dbt run fails before doing any work when permission to access the permissions is not available:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @DVerzal this is an issue due to missing permissions. You are missing a |
||
partition_params = { | ||
"DatabaseName": database_name, | ||
"TableName": table_name, | ||
"Expression": where_condition, | ||
"ExcludeColumnSchema": True, | ||
} | ||
partition_pg = paginator.paginate(**partition_params) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. alternatively you could do this
Doing so you avoid a for loop, that will happen automatically when calling |
||
partitions = [] | ||
for pg in partition_pg: | ||
partitions.extend(pg["Partitions"]) | ||
p = re.compile("s3://([^/]*)/(.*)") | ||
for partition in partitions: | ||
logger.debug("Deleting objects for partition '{}' at '{}'", partition["Values"], partition["StorageDescriptor"]["Location"]) | ||
m = p.match(partition["StorageDescriptor"]["Location"]) | ||
if m is not None: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a chance, can you check what would happen if the current user doesn't have enough permission to fetch the partitions?