Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: multi-thread issues by removing global boto3 session #45

Merged

Conversation

jessedobbelaere
Copy link
Contributor

@jessedobbelaere jessedobbelaere commented Nov 21, 2022

I run a dbt project usually with threads: 4 or threads: 8. When I switched my projects away from the Tomme/dbt-athena to this adapter, I received:

18:29:33.190966 [info ] [MainThread]: Found 79 models, 9 tests, 0 snapshots, 0 analyses, 510 macros, 0 operations, 0 seed files, 64 sources, 0 exposures, 0 metrics
18:29:33.193761 [info ] [MainThread]: 
18:29:33.194168 [debug] [MainThread]: Acquiring new athena connection "master"
18:29:33.197412 [debug] [ThreadPool]: Acquiring new athena connection "list_awsdatacatalog"
18:29:33.198456 [debug] [ThreadPool]: Acquiring new athena connection "list_awsdatacatalog"
18:29:33.198652 [debug] [ThreadPool]: Acquiring new athena connection "list_awsdatacatalog"
18:29:33.198835 [debug] [ThreadPool]: Acquiring new athena connection "list_awsdatacatalog"
18:29:33.199121 [debug] [ThreadPool]: Opening a new connection, currently in state init
18:29:33.202800 [debug] [ThreadPool]: Acquiring new athena connection "list_awsdatacatalog"
18:29:33.203007 [debug] [ThreadPool]: Opening a new connection, currently in state init
18:29:33.236405 [debug] [ThreadPool]: Opening a new connection, currently in state init
18:29:33.243815 [debug] [ThreadPool]: Opening a new connection, currently in state init
18:29:33.249740 [debug] [ThreadPool]: Opening a new connection, currently in state init
18:29:33.250769 [error] [ThreadPool]: Athena adapter: Got an error when attempting to open a Athena connection due to 'credential_provider'
Traceback (most recent call last):
  File "/Users/jesse.dobbelaere/Code/side-projects/dbt-athena/dbt/adapters/athena/connections.py", line 155, in open
    handle = AthenaConnection(
  File "/Users/jesse.dobbelaere/XXX/.venv/lib/python3.10/site-packages/pyathena/connection.py", line 139, in __init__
    self._client = self._session.client(
  File "/Users/jesse.dobbelaere/XXX/.venv/lib/python3.10/site-packages/boto3/session.py", line 299, in client
    return self._session.create_client(
  File "/Users/jesse.dobbelaere/XXX/.venv/lib/python3.10/site-packages/botocore/session.py", line 951, in create_client
    credentials = self.get_credentials()
  File "/Users/jesse.dobbelaere/XXX/.venv/lib/python3.10/site-packages/botocore/session.py", line 507, in get_credentials
    self._credentials = self._components.get_component(
  File "/Users/jesse.dobbelaere/XXX/.venv/lib/python3.10/site-packages/botocore/session.py", line 1112, in get_component
    del self._deferred[name]
KeyError: 'credential_provider'

I did some investigation in #43 and found that the global session we use (singleton) is probably the culprit. Each thread needs its own session. Re-using a session is not thread-safe.

This is nicely mentioned on the documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-or-multiprocessing-with-resources

...each thread would have its own Boto3 session...

Therefore, let's remove the global session and then each thread will make a new AthenaConnection (with a new session)?

Fixes #43

@jessedobbelaere jessedobbelaere force-pushed the feature/remove-global-boto-session branch from d213f24 to e51c4d0 Compare November 21, 2022 19:48
nicor88
nicor88 previously approved these changes Nov 21, 2022
Copy link
Contributor

@nicor88 nicor88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @Jrmyy could you have a look too. I think that using boto3 sessions should fix the original issue why a global session was introduced.

@nicor88 nicor88 requested a review from Jrmyy November 21, 2022 20:48
@Jrmyy
Copy link
Contributor

Jrmyy commented Nov 22, 2022

I have to test it. Last time I try that solution, with MFA, it was an horror, a code was asked for every request, and since we can only use a code one time and a code is issued every 30 seconds it can be very annoying.

@nicor88
Copy link
Contributor

nicor88 commented Nov 22, 2022

@Jrmyy with MFA did you try to use aws-vault? it should keep the session for some time, I think more than 15 minutes.
I generally run my dbt models like this aws-vault exec my_profile -- dbt run --models my-model

@Jrmyy
Copy link
Contributor

Jrmyy commented Nov 22, 2022

I did not know about that. Thanks for the heads up ! I test it today.

@Jrmyy
Copy link
Contributor

Jrmyy commented Nov 22, 2022

So I tested without aws-vault and he is indeed asking me at each operation the MFA code, which is definitely not great from a developer perspective ... I will take a look at aws-vault.

@jessedobbelaere
Copy link
Contributor Author

I use AWS control tower and AWS SSO to assume a role (aws sso get-role-credentials) and get temporary credentials. Did not experience that issue. I haven't looked into aws-vault but maybe you can get role credentials (and not a session token) with aws-vault exec your-role --no-session -- dbt run --models my-model (see this table)?

@jessedobbelaere jessedobbelaere changed the title Fix multi-thread issues by removing global boto3 session fix: multi-thread issues by removing global boto3 session Nov 22, 2022
@nicor88
Copy link
Contributor

nicor88 commented Nov 23, 2022

@Jrmyy aws-vault have --no-session and --duration, maybe you can play around with that?

@Jrmyy
Copy link
Contributor

Jrmyy commented Nov 23, 2022

I think I am struggling with getting aws-vault to work with my AWS Configuration / MFA and aws_profile_name from the adapter ... Do you put any aws_profile_name in the adapter conf ?

@nicor88
Copy link
Contributor

nicor88 commented Nov 23, 2022

I'm not using any AWS profile on dbt profile in athena. When you cann aws-vault exec your_profile it store bunch of AWS_ env variables that then are used under the hood by whatever API call your are going to make it.

easy way to see if aws-vault works is to run this aws-vault exec your_profile -- aws s3 ls, if it's all fine, you should see all your buckets.

Copy link
Contributor

@Jrmyy Jrmyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with leapp, works like a charm

@Jrmyy Jrmyy merged commit e11876d into dbt-labs:main Nov 24, 2022
@jessedobbelaere jessedobbelaere deleted the feature/remove-global-boto-session branch November 24, 2022 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-threading issues on information_schema queries
3 participants