fix: cross account catalog_id glue client function calls #370

Merged
nicor88 merged 10 commits into dbt-labs:main on Sep 7, 2023

Conversation

brunofaustino
Contributor

Description

Using Athena with Glue in cross-account environments results in the error below:

[Screenshot of the error: CleanShot 2023-08-11 at 12 12 51@2x]

Context: I'm using Athena in one AWS account (Account A) to query data registered in a Glue Catalog in another AWS account (Account B). By default, the Glue client assumes the catalog belongs to the active account and uses that account's ID. In this setup that default is wrong: the catalog ID must be the ID of the account where the tables reside (Account B in this case), and the mismatch causes the error.

In this scenario, it is advisable to provide the catalog_id when invoking the Glue client function.
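As a rough illustration of what passing the catalog ID involves, here is a minimal sketch. It assumes the data catalog is described by a dictionary shaped like the `DataCatalog` structure returned by Athena's `GetDataCatalog` operation, where a `GLUE` catalog carries its owner in `Parameters["catalog-id"]`. The helper names `extract_catalog_id` and `build_get_table_kwargs` are hypothetical and are not the functions changed in this PR.

```python
from typing import Any, Dict, Optional


def extract_catalog_id(data_catalog: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the Glue catalog ID stored on an Athena data catalog, if present.

    Hypothetical helper: only GLUE-type catalogs are expected to carry a
    "catalog-id" parameter, so anything else yields None.
    """
    if data_catalog and data_catalog.get("Type") == "GLUE":
        return data_catalog.get("Parameters", {}).get("catalog-id")
    return None


def build_get_table_kwargs(
    database: str, table: str, catalog_id: Optional[str]
) -> Dict[str, str]:
    """Build keyword arguments for the Glue client's get_table call.

    CatalogId is added only when known; when it is omitted, Glue falls back
    to the caller's own account ID.
    """
    kwargs = {"DatabaseName": database, "Name": table}
    if catalog_id:
        kwargs["CatalogId"] = catalog_id
    return kwargs


# Usage with boto3 (not executed here, since it needs AWS credentials):
#   data_catalog = athena_client.get_data_catalog(Name="producer_catalog")["DataCatalog"]
#   kwargs = build_get_table_kwargs("analytics", "orders", extract_catalog_id(data_catalog))
#   glue_client.get_table(**kwargs)
```

The account ID, catalog name, database, and table in the usage comment are placeholders.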

Models used to test - Optional

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@brunofaustino
Contributor Author

brunofaustino commented Aug 11, 2023

I've noticed potentially redundant sections due to the repeated calls to:

data_catalog = self._get_data_catalog(relation.database)
catalog_id = get_catalog_id(data_catalog)

I'm not familiar with the detailed structure of dbt's objects and classes, so I'm uncertain whether it's feasible to add catalog_id directly as an attribute of the Relation. That would enable a more streamlined call such as glue_client.get_table(CatalogId=relation.catalog_id, ...). Are there any recommendations for improving this approach?
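To make the idea concrete, here is a minimal sketch of a relation that carries its catalog ID. `RelationSketch` is a hypothetical stand-in and not dbt's actual Relation class. It assumes, as the snippet above suggests, that `relation.database` holds the data catalog name, and further assumes that `relation.schema` maps to the Glue database; `lookup` stands for whatever resolves a catalog name to a catalog ID.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class RelationSketch:
    """Hypothetical stand-in for a relation; not dbt's actual Relation class."""

    database: str  # assumed: the Athena data catalog name
    schema: str  # assumed: the Glue database name
    identifier: str  # the table name
    catalog_id: Optional[str] = None


def with_catalog_id(
    relation: RelationSketch, lookup: Callable[[str], Optional[str]]
) -> RelationSketch:
    """Resolve the catalog ID once and return a copy of the relation carrying it."""
    return replace(relation, catalog_id=lookup(relation.database))


def glue_get_table_kwargs(relation: RelationSketch) -> Dict[str, str]:
    """Build get_table keyword arguments straight from the relation."""
    kwargs = {"DatabaseName": relation.schema, "Name": relation.identifier}
    if relation.catalog_id:
        kwargs["CatalogId"] = relation.catalog_id
    return kwargs
```

With this shape the lookup happens once, where the relation is built, instead of before every Glue call. Whether dbt's relation classes can be extended this way is exactly the open question.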

@nicor88
Contributor

nicor88 commented Aug 25, 2023

@brunofaustino I have a similar setup: I use account B (consumer) to query data from a table in account A (producer), and everything works fine for me. Did you create a resource link for your database in Lake Formation?
Also, based on your scenario, are you planning to write to the table in the account from which you use Athena?

I'm not 100% sure that these changes are needed.

@brunofaustino
Contributor Author

@nicor88 I write and read tables in account A (producer), performing these operations through account B (consumer).
Therefore, no table is written directly to account B (consumer).

Due to specific characteristics of the project I'm working on, I'm not using Lake Formation (where I would use a resource link).
Instead, I'm leveraging Athena's cross-account AWS Glue catalog feature to register the pre-existing AWS Glue catalog in account A within the environment of account B. By doing so, we register the catalog (from account A) as an Athena DataCatalog resource within account B (consumer).

As a result, we use Athena in account B (consumer) to register the catalog that originates from account A (producer), thereby enabling Athena to execute cross-account queries.

In this scenario, the execution of dbt-athena fails when calling the glue_client.get_table function, requiring the CatalogId parameter to be provided.
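The catalog registration described above can be sketched as follows. This is an illustration under stated assumptions, not the setup actually used in this project: the catalog name and account ID are placeholders, and `build_glue_data_catalog_request` is a hypothetical helper that only assembles the arguments for Athena's `CreateDataCatalog` operation.

```python
from typing import Any, Dict


def build_glue_data_catalog_request(name: str, producer_account_id: str) -> Dict[str, Any]:
    """Build keyword arguments for Athena's CreateDataCatalog operation.

    For GLUE-type catalogs, the "catalog-id" parameter is the ID of the
    AWS account that owns the Glue Data Catalog (the producer account).
    """
    return {
        "Name": name,
        "Type": "GLUE",
        "Parameters": {"catalog-id": producer_account_id},
    }


# Usage with boto3, run in the consumer account (not executed here, since it
# needs AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   athena.create_data_catalog(
#       **build_glue_data_catalog_request("producer_catalog", "111122223333")
#   )
```

Cross-account access also depends on Glue resource policies and IAM permissions, which are outside this sketch.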

@nicor88
Contributor

nicor88 commented Aug 25, 2023

@brunofaustino thanks. I believe that your implementation should work, please fix the CI.

@nicor88 nicor88 added the enable-functional-tests label (Label to trigger functional testing) on Aug 29, 2023
@brunofaustino
Contributor Author

brunofaustino commented Sep 5, 2023

@nicor88
I checked, and the CI is now failing with "An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.".

Could you please validate this?

@nicor88
Contributor

nicor88 commented Sep 5, 2023

@brunofaustino the issue is most probably that the AWS calls are not being mocked as they should be, so I believe the unit tests need to be adapted.
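One way to keep a unit test from reaching AWS is to inject a stand-in client, so no real credentials are ever needed. The sketch below uses only the standard library's `unittest.mock`; the project's test suite may rely on a different mocking approach, and `fetch_table` and `make_fake_glue_client` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional
from unittest.mock import MagicMock


def fetch_table(
    glue_client: Any, database: str, table: str, catalog_id: Optional[str] = None
) -> Dict[str, Any]:
    """Fetch a table definition, passing CatalogId only when it is known."""
    kwargs = {"DatabaseName": database, "Name": table}
    if catalog_id:
        kwargs["CatalogId"] = catalog_id
    return glue_client.get_table(**kwargs)["Table"]


def make_fake_glue_client() -> MagicMock:
    """Return a mock Glue client whose get_table never touches the network."""
    client = MagicMock()
    client.get_table.return_value = {"Table": {"Name": "orders"}}
    return client


# In a test, the fake client replaces the real one and records the call:
fake = make_fake_glue_client()
table = fetch_table(fake, "analytics", "orders", catalog_id="111122223333")
fake.get_table.assert_called_once_with(
    DatabaseName="analytics", Name="orders", CatalogId="111122223333"
)
```

Because the code under test receives the client as an argument, the assertion can check that `CatalogId` was forwarded without any call to STS or Glue.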

@nicor88
Contributor

nicor88 commented Sep 6, 2023

@svdimchenko @Jrmyy do you want to have another look?

@nicor88
Contributor

nicor88 commented Sep 7, 2023

I tried this PR with the setup that I currently use, where I access cross-account tables via resource links and Lake Formation sharing, and everything still worked fine.

@nicor88 nicor88 merged commit 905746f into dbt-labs:main Sep 7, 2023
8 checks passed
3 participants