Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support s3_data_dir and s3_data_naming #4

Closed

Conversation

thenaturalist
Copy link

Ported PR from Tomme/dbt-athena#50 by @chronitis

@thenaturalist thenaturalist changed the base branch from main to release/1.0.4 October 25, 2022 19:30
@nicor88 nicor88 added enhancement enhancement New feature or request labels Oct 26, 2022
@mattiamatrix
Copy link
Contributor

@thenaturalist @nicor88 what is preventing this from behind merged? My team has been waiting for this for a very long time ;)

if creds.s3_data_dir is not None:
return creds.s3_data_dir
else:
return f"{creds.s3_staging_dir}tables/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If s3_staging_dir has a trailing slash or it miss a / we can have issues. We can do simply:
os.path.join(creds.s3_staging_dir, 'tables/')

@nicor88
Copy link
Contributor

nicor88 commented Nov 7, 2022

@mattiamatrix we wanted to full test the feature with all supported cases, the implementation looks actually good to me, I left a tiny comment. Also the README needs to be updated a bit more to make the usage a bit smoother.

@mattiamatrix
Copy link
Contributor

We might also want to ping the people (@chronitis @fbrueck) who were initially involved if they are still interested.

@chronitis
Copy link

I originally opened this, and we used it in production for maybe 6 months before we moved away from Athena. So I'm not in a position to test it further now, but if it's useful to anyone else, please feel free to merge.

It was workable and having predictable S3 paths made organising the resulting data much easier, although one issue encountered was that (at the time, I haven't checked if this has now changed) the temporary table used when building an incremental table always had a fixed name (table + _dbt_tmp) and hence a fixed path. That meant it wasn't possible to generate two independent sets of partitions at the same time. Fixed in northvolt/dbt-athena@223d2cc

@chronitis
Copy link

nb, one other feature I wrote which might be of interest: using the Glue API for listing tables, schemas, columns instead of INFORMATION_SCHEMA queries, which were generally a lot slower. I don't think I ever opened a PR for it, but again, you're welcome to pull it if it looks useful: northvolt/dbt-athena@49c7111

@nicor88
Copy link
Contributor

nicor88 commented Nov 7, 2022

Regarding the not unique path, I also add a workaround in the Iceberg implementation, I think that needs to be revisit a bit, but was similar to what you suggested.

Regarding #4 (comment) definitely worth to add it.

@mattiamatrix and @chronitis I'm wondering if you both want to be added as maintainers to help us out to deliver feature faster, we fork the repo to enable faster merge, but also me and @thenaturalist are quite busy. Feel free to reach out to me in Slack dbt.

@nicor88 nicor88 deleted the branch dbt-labs:release/1.0.4 November 15, 2022 07:50
@nicor88 nicor88 closed this Nov 15, 2022
@rumbin
Copy link

rumbin commented Nov 15, 2022

@nicor88 could you please provide a hint to why this PR has been closed?

@nicor88
Copy link
Contributor

nicor88 commented Nov 15, 2022

@rumbin just a mistake when we relesed the 1.0.4, the branch was deleted and the pr closed, we will re-open again.

@thenaturalist in case you have time, you can re-open it.

@mattiamatrix
Copy link
Contributor

Hey @nicor88, what is the plan for this PR?

@nicor88
Copy link
Contributor

nicor88 commented Nov 18, 2022

@mattiamatrix feel free to re-open it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants