Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support s3_data_dir and s3_data_naming #39

Merged
merged 11 commits into from
Nov 21, 2022
Merged

Support s3_data_dir and s3_data_naming #39

merged 11 commits into from
Nov 21, 2022

Conversation

nicor88
Copy link
Contributor

@nicor88 nicor88 commented Nov 19, 2022

What

This PR brings back #4 originally covered in Tomme/dbt-athena#50

Compare to the original PR, this PR changes add the possibility to setup the s3_data_naming on the model config and fallback to the global target value. Also this change is not introducing breaking changes, as we should set data dir is optional and use the default staging dir as fallback.

I refactored also iceberg implementation to have the same beahvior, as before we created another way to store the data, this is more uniformed.

To test I setup this in my profile:

athena:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://my_bucket/tmp/
      s3_data_dir: s3://my_bucket/
      s3_data_naming: schema_table
      region_name: eu-central-1
      schema: silver
      database: awsdatacatalog
      work_group: dbt_athena_v3
      num_retries: 1

Models used for testing

uuid

{{ config(
    materialized='table',
    s3_data_naming='uuid'
) }}


SELECT 'A' AS user_id, 'active' AS status, 'pi' AS name
UNION ALL
SELECT 'B' AS user_id, 'active' AS status, 'sh' AS name
UNION ALL
SELECT 'C' AS user_id, 'active' AS status, 'zh' AS name

the default table location was s3://my_bucket/a700e6ba-508b-4037-aa11-52239c3b6931

schema_table

{{ config(
    materialized='table',
    s3_data_naming='schema_table'
) }}


SELECT 'A' AS user_id, 'active' AS status, 'pi' AS name
UNION ALL
SELECT 'B' AS user_id, 'active' AS status, 'sh' AS name
UNION ALL
SELECT 'C' AS user_id, 'active' AS status, 'zh' AS name

the default table location was s3://my_bucket/my_schema/my_model

schema_table_unique

{{ config(
    materialized='table',
    s3_data_naming='schema_table_unique'
) }}


SELECT 'A' AS user_id, 'active' AS status, 'pi' AS name
UNION ALL
SELECT 'B' AS user_id, 'active' AS status, 'sh' AS name
UNION ALL
SELECT 'C' AS user_id, 'active' AS status, 'zh' AS name

the default table location was s3://my_bucket/my_schema/example/96e19633-6794-40fc-bbc6-28d99d43ab44

external_location

{{ config(
    materialized='table',
    external_location='s3://' + env_var('SILVER_S3_BUCKET') + '/custom_example/',
    s3_data_naming='schema_table_unique'
) }}


SELECT 'A' AS user_id, 'active' AS status, 'pi' AS name
UNION ALL
SELECT 'B' AS user_id, 'active' AS status, 'sh' AS name
UNION ALL
SELECT 'C' AS user_id, 'active' AS status, 'zh' AS name

the s3_data_naming is ignored and then the external location is used.

I also tested to don't set the s3_data_dir, and by default the staging_dir is used.

@nicor88 nicor88 marked this pull request as draft November 19, 2022 19:40
@nicor88 nicor88 added enhancement enhancement New feature or request labels Nov 19, 2022
@nicor88 nicor88 marked this pull request as ready for review November 19, 2022 22:53
@nicor88 nicor88 self-assigned this Nov 19, 2022
@nicor88
Copy link
Contributor Author

nicor88 commented Nov 20, 2022

FYI @jessedobbelaere

@nicor88 nicor88 marked this pull request as draft November 20, 2022 14:28
@nicor88 nicor88 marked this pull request as ready for review November 20, 2022 19:45
@nicor88 nicor88 requested review from jessedobbelaere and Jrmyy and removed request for jessedobbelaere November 20, 2022 19:45
@nicor88 nicor88 merged commit eb49b61 into main Nov 21, 2022
@nicor88 nicor88 deleted the feature/data-dir branch November 21, 2022 08:17
@VDFaller
Copy link

VDFaller commented Dec 8, 2022

For me, the location is coming up as s3://my_bucket/dbt/my_schemamymodel with no directory between schema and model.

dbt-athena-community 1.3.3

athena:
  outputs:
    dev:
      aws_profile_name: somethingfake
      database: awsdatacatalog
      region_name: us-west-2
      s3_data_dir: s3://my_bucket/dbt/
      s3_data_naming: schema_table
      s3_staging_dir: s3://my_staging_bucket/dbt/
      schema: my_schema
      type: athena
      work_group: somethingfake
      threads: 8
      num_retries: 1

@nicor88
Copy link
Contributor Author

nicor88 commented Dec 8, 2022

@VDFaller there was no change on that section of the code between 1.3.2 and 1.3.3. I just tested with one of my testing models and seems working fine for me.
Could you share your model config??

@VDFaller
Copy link

VDFaller commented Dec 8, 2022

None. Just default.
Edit oh, I think I still had the old version of dbt-athena (non-community) installed as well. Let me double check this. Sorry .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants