Credentials in kedro #1646
merelcht added the Type: Technical DR 💾 (Decision Records — technical decisions made) label on Aug 2, 2022
A user recently told me that they struggled a bit with Snowflake authentication in particular.

Moved this to a wiki page: https://github.com/kedro-org/kedro/wiki/Credentials-in-kedro
How do credentials currently work in kedro?
The basic pattern is as follows: a catalog entry references, by name, a set of credentials defined in `conf/local/credentials.yml`.
To be concrete, here's an example for Azure Blob Storage:
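The example itself did not survive extraction; a representative pair of entries (the `dev_abs` key, account values and dataset name here are illustrative) would look like:

```yaml
# conf/local/credentials.yml -- kept out of version control
dev_abs:
  account_name: my_storage_account
  account_key: my_storage_key

# conf/base/catalog.yml -- references the credentials by key
weather:
  type: pandas.CSVDataSet
  filepath: "abfs://container/weather.csv"
  credentials: dev_abs
```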
The `credentials` key is injected into the call that instantiates `pandas.CSVDataSet` when kedro is run. Specifically, here: `kedro/kedro/io/data_catalog.py`, line 276 in a925fd5.
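As a rough illustration of the injection step (a minimal sketch, not kedro's actual implementation — the entry and key names are made up):

```python
def resolve_credentials(dataset_config: dict, credentials: dict) -> dict:
    """Replace a string `credentials` reference in a catalog entry with
    the matching dict from the credentials config, before the dataset
    class is instantiated with the resulting kwargs."""
    config = dict(dataset_config)  # don't mutate the caller's entry
    creds_ref = config.get("credentials")
    if isinstance(creds_ref, str):
        try:
            config["credentials"] = credentials[creds_ref]
        except KeyError:
            raise KeyError(f"Unable to find credentials '{creds_ref}'")
    return config


catalog_entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "abfs://container/weather.csv",
    "credentials": "dev_abs",  # a string reference, not the secrets themselves
}
credentials_yml = {
    "dev_abs": {"account_name": "my_account", "account_key": "my_key"},
}

resolved = resolve_credentials(catalog_entry, credentials_yml)
print(resolved["credentials"]["account_name"])  # -> my_account
```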
Note: `credentials` is a special reserved keyword; this doesn't work for any other key name, and it is not a modification that `ConfigLoader` makes to how yaml is parsed. In-file variable injection is (kind of) supported in yaml using anchors, but injecting a variable from another file is not. The mechanism that does the injection here is entirely defined by kedro: credentials live in `local` and are injected into the catalog at runtime.

Does this work well?
In my experience and from talking to users: in the case that credentials can be stored in a file, yes. Very little confusion is caused by the custom behaviour of injecting credentials.
What are the problems with this?
The biggest problem is that credentials might not be stored in a file. Alternatives are:

- Environment variables, injected using `TemplatedConfigLoader`. This works ok but feels hacky and is so common it shouldn't really require a workaround.
- Credentials objects: `APIDataSet` works with a `requests.auth.AuthBase` object for credentials; `pandas.GBQTableDataSet` works with `google.oauth2.credentials.Credentials`. This is handled by instantiating the corresponding credentials class in the dataset using the kwargs given in the credentials.yml file. This works ok but is awkward and not done consistently throughout kedro (e.g. Additional options for APIDataSet (e.g. proxies) #711; Adding BigQuery authentication to credentials.yml #1621).
- Cloud-native secrets storage. It's not clear that the `TemplatedConfigLoader` trick as used for env vars would work here. See Cloud native credentials storage #1280 and Global credentials file for multiple pipelines #930 for more.

Another problem with credentials is that the way they are handled for `PartitionedDataSet` is pretty complicated. I'm not sure we'll be able to solve that here but would be nice if we could.

Possible solutions
Environment variables
At a bare minimum I think we need a way of directly injecting environment variables into credentials. Given how common this is outside credentials files also (using `TemplatedConfigLoader`), my opinion is that this mechanism should not be credentials-specific but instead common across all kedro configuration. e.g. with OmegaConf you'd do this as:
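The OmegaConf snippet did not survive extraction; with OmegaConf's built-in `oc.env` resolver it would presumably look something like this (key names are illustrative):

```yaml
# credentials.yml -- ${oc.env:...} is resolved by OmegaConf at load time
dev_s3:
  client_kwargs:
    aws_access_key_id: ${oc.env:AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${oc.env:AWS_SECRET_ACCESS_KEY}
```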
Quotes from #770:
Beyond environment variables
So far the best discussion of this is in #1280. From @Galileo-Galilei:
Also worth noting the factory approach of @daBlesr discussed in #711 (comment) and following comments.
I don't yet have any particular ideas myself here so I'd love to hear what others think and hear @Galileo-Galilei's idea in more detail 🚀 It would be especially great to hear from people who use cloud-native credentials systems like AWS secrets. This is a bit of a blindspot for us at the moment I think.