
Update documentation around credentials management #2669

Open
noklam opened this issue Jun 12, 2023 · 6 comments
Labels
Component: Documentation 📄 · Hacktoberfest

Comments

@noklam (Contributor) commented Jun 12, 2023

Description


parquet_dataset:
  type: dask.ParquetDataSet
  filepath: "s3://bucket_name/path/to/folder"
  credentials:
    client_kwargs:
      aws_access_key_id: YOUR_KEY
      aws_secret_access_key: "YOUR SECRET"

This is how Kedro's documentation shows providing credentials. However, fsspec updated this API quite a while ago: with newer versions of fsspec you should use key and secret instead of aws_access_key_id and aws_secret_access_key.

This may only affect s3fs (which is how I ran into the error), but it potentially affects gcs and more.
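
For reference, the corrected snippet would presumably look like this (a sketch using the current s3fs parameter names; the placeholder values are the same as above):

parquet_dataset:
  type: dask.ParquetDataSet
  filepath: "s3://bucket_name/path/to/folder"
  credentials:
    key: YOUR_KEY          # formerly aws_access_key_id
    secret: "YOUR SECRET"  # formerly aws_secret_access_key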

Context

The docs on credentials are out of date and mention the wrong key names. All documentation chapters that mention credentials should be updated to use the correct keys.

@astrojuanlu astrojuanlu changed the title Update outdated documentation in Kedro's documentation Update documentation around credentials management Jun 12, 2023
@astrojuanlu astrojuanlu added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Aug 26, 2023
@astrojuanlu (Member)

Today I was helping @ricardopicon-mck and it was not clear how to use Google Cloud credentials. There are excellent examples of how to set up the catalog.yml:

https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-an-excel-file-from-google-cloud-storage

But what does credentials.yml look like in that case?

For the record, this did the trick for me:

gcp_credentials:
  token: gcp_credentials.json

But this only worked with a flat file structure. With a full-fledged Kedro project using conf/base and conf/local, I had to specify the absolute path:

gcp_credentials:
  token: /Users/juan_cano/Projects/QuantumBlack Labs/tmp/test-credentials/conf/local/gcp_credentials.json

I'm sure there is a better way.

In general, the credentials page is not very useful: https://docs.kedro.org/en/stable/configuration/credentials.html

It places a lot of emphasis on how to load them from code, but I'd consider this "advanced" or "programmatic" usage, which is not how most users experience Kedro.

(see also fsspec/gcsfs#583)

@stichbury (Contributor)

That's a good point, and this page needs a clean-up to bring it up to the same standard as the recent data catalog updates.

@datajoely (Contributor)

See this for reference
#3164

@astrojuanlu (Member)

We might also need to document how credentials work during development vs in production; see this response by @noklam to a Prefect user: https://linen-slack.kedro.org/t/16019525/hi-another-question-is-there-a-way-to-directly-store-the-con#146bb5db-314d-414f-947a-fd9d64f4d223
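
One pattern we could document (a sketch, assuming the OmegaConfigLoader and its oc.env resolver, which as far as I know is enabled for credentials files): read the values from environment variables, so the same credentials.yml works in development and in production and no secrets live in the repo.

# credentials.yml -- hypothetical example; the variable names are placeholders
dev_s3:
  key: "${oc.env:AWS_ACCESS_KEY_ID}"
  secret: "${oc.env:AWS_SECRET_ACCESS_KEY}"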

@astrojuanlu (Member)

There are more problems with the snippet @noklam shared. This is a setup that worked for me:

# catalog.yml
executive_summary:
  type: text.TextDataset
  filepath: s3://social-summarizer/executive-summary.txt
  versioned: true
  credentials: minio_fsspec

# credentials.yml
minio_fsspec:
  endpoint_url: "http://127.0.0.1:9010"
  key: "minioadmin"
  secret: "minioadmin"

This worked fine. But if I put endpoint_url, key, and secret inside client_kwargs, then I get

DatasetError: Failed while loading data from data set TextDataset(filepath=social-summarizer/executive-summary.txt, protocol=s3, 
version=Version(load=None, save='2023-11-25T10.02.34.586Z')).
AioSession._create_client() got an unexpected keyword argument 'key'
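
If I understand the s3fs behaviour correctly, key and secret are S3FileSystem constructor arguments, while client_kwargs is forwarded verbatim to the underlying aiobotocore client, which doesn't know the name key. So an equivalent setup would presumably be (a sketch; untested):

minio_fsspec:
  key: "minioadmin"
  secret: "minioadmin"
  client_kwargs:
    endpoint_url: "http://127.0.0.1:9010"  # botocore does accept endpoint_url here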

The fact that our dataset code is so contrived doesn't help:

# Credentials and fs_args are merged and passed straight through to the
# fsspec filesystem constructor, so credential keys must match whatever
# the underlying filesystem class (e.g. s3fs.S3FileSystem) accepts.
_fs_args = deepcopy(fs_args) or {}
_fs_open_args_load = _fs_args.pop("open_args_load", {})
_fs_open_args_save = _fs_args.pop("open_args_save", {})
_credentials = deepcopy(credentials) or {}
protocol, path = get_protocol_and_path(filepath, version)
if protocol == "file":
    _fs_args.setdefault("auto_mkdir", True)
self._protocol = protocol
self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)

(the "copy paste" problems mentioned in #1778)

For the record, I'm using fsspec==2023.10.0.

@astrojuanlu (Member)

I think we should do this after #3811

@merelcht merelcht added the Hacktoberfest Issue suitable for hacktoberfest label Oct 7, 2024