Specifying external_location isn't cleaning up the existing data. #107
Comments
The tables not getting cleaned up was resolved by #73. For making external_location dynamic across environments, I specify a different staging directory for dev vs. prod and use the target.s3_staging_dir variable when setting the external location: {{ config(materialized='table', external_location=target.s3_staging_dir+'_tables/[table name]') }}
I saw that one, but I updated to the latest version of dbt and it's still happening. Has #73 not been released yet? Also, I can't use the same bucket for staging as I do for external_location (a constraint from my IT).
But that's no bueno; it fails with: 'ex_loc' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
Did you update just dbt or the dbt-athena plugin? I just updated the dbt-athena plugin and it's working fine for me. I'm not sure if there is a way to get around the different buckets, but I did see #50 still open, which looks like what we really need.
@mrshu Well, I'm still getting this issue. /shrug I'll try.
Okay, so I think I found the problem. clean_up_table is using my default credentials (which don't have permission to get the table for this environment, not even remotely related actually), so it's failing. And since the error code was I changed my profile to use default and changed that to the credentials that actually matter for this, and it's running fine, with or without external_location. I also changed the function to this and it worked even with default being unrelated.

@available
def clean_up_table(
    self, database_name: str, table_name: str
):
    # Look up the table in Glue and clean up its S3 data
    conn = self.connections.get_thread_connection()
    # Key change: build the boto3 session from the profile configured in profiles.yml
    session = boto3.session.Session(profile_name=conn._credentials.aws_profile_name)
    with boto3_client_lock:
        glue_client = session.client('glue')
    try:
        print("trying to get table")
        table = glue_client.get_table(
            DatabaseName=database_name,
            Name=table_name
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'EntityNotFoundException':
            logger.debug("Table '{}' does not exist - Ignoring", table_name)
            return
        raise  # surface unexpected errors instead of falling through with table unbound

    if table is not None:
        logger.debug("Deleting table data from '{}'", table["Table"]["StorageDescriptor"]["Location"])
        p = re.compile('s3://([^/]*)/(.*)')
        m = p.match(table["Table"]["StorageDescriptor"]["Location"])
        if m is not None:
            bucket_name = m.group(1)
            prefix = m.group(2)
            s3_resource = session.resource('s3')
            s3_bucket = s3_resource.Bucket(bucket_name)
            s3_bucket.objects.filter(Prefix=prefix).delete()

Edit: Oh, but this won't work for your CI pipeline since you use env vars. Crap.
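For the env-var/CI case, boto3 can still pick up credentials from its default chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY and friends) when no profile is passed. Below is a minimal sketch of that fallback, assuming a hypothetical build_session helper; this is not the adapter's actual code.

import boto3

def build_session(aws_profile_name=None, region_name=None):
    # Prefer an explicit profile (e.g. from profiles.yml) when one is set;
    # otherwise let boto3 resolve credentials from env vars, shared config,
    # or an instance/role, which is what a CI pipeline typically relies on.
    if aws_profile_name:
        return boto3.session.Session(profile_name=aws_profile_name, region_name=region_name)
    return boto3.session.Session(region_name=region_name)

# e.g. inside clean_up_table:
#   session = build_session(getattr(conn._credentials, "aws_profile_name", None))
#   glue_client = session.client("glue")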
@VDFaller Thank you for sharing your research in the issue!! |
Problem
I'm not understanding the accepted way to make external_location dynamic from dev to production. I unfortunately can't use the s3_staging_dir from profiles.yml because of a constraint of my IT. Also, the standard {s3_staging_dir}/tables/uuid doesn't make it easy to know which data goes with which table while looking through S3. Because right now, if I run it twice, I'm getting
my tbl_test is simply
I'm hoping I don't have to manually drop the table.
Pipe dream
I'd love it if I could have the external_location in the form of s3://my-bucket/some_extra_path/tbl_schema/tbl_name, where I give the external_location prefix (a different one for dev, qa, and prod) and the file would know what table schema it is going to and what the table name is without me having to configure it.
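For illustration, here is a minimal sketch of the kind of location construction described above; build_external_location and its parameters are hypothetical and not part of dbt-athena.

def build_external_location(prefix: str, schema: str, table: str) -> str:
    # Derive the external location from a per-target prefix (dev/qa/prod)
    # plus the relation's schema and name.
    return f"{prefix.rstrip('/')}/{schema}/{table}"

# e.g.:
#   build_external_location("s3://my-bucket/some_extra_path", "tbl_schema", "tbl_test")
#   -> "s3://my-bucket/some_extra_path/tbl_schema/tbl_test"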
Possible fix
PR 74 seems to fix the issue.