Docs: Updated the docs for OAuth verification. #1040

Closed
67 changes: 64 additions & 3 deletions docs/website/docs/dlt-ecosystem/verified-sources/filesystem.md
@@ -23,6 +23,7 @@ Sources and resources that can be used with this verified source are:
| read_jsonl | Resource-transformer | Reads jsonl file content and extracts the data |
| read_parquet | Resource-transformer | Reads parquet file content and extracts the data with **Pyarrow** |


## Setup Guide

### Grab credentials
@@ -49,8 +50,11 @@ For more info, see
[AWS official documentation.](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html)

#### Google Cloud Storage / Google Drive credentials
There are two ways to access GDrive, using:
- Service account credentials, which are the preferred option for those who have a GCP account.
- OAuth credentials, which are suitable for those who don't have a GCP account.
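
The two options can be told apart by which keys are filled in. The following is a minimal sketch, assuming the key names and the "Please set me up!" placeholder from the `secrets.toml` template in this change. The helper `detect_gcp_credential_type` is hypothetical and is not part of `dlt`.

```python
from typing import Mapping

# Key names as they appear in the secrets.toml template of this change.
SERVICE_ACCOUNT_KEYS = ("client_email", "private_key", "project_id")
OAUTH_KEYS = ("client_id", "client_secret", "refresh_token", "project_id")
PLACEHOLDER = "Please set me up!"


def _is_set(creds: Mapping[str, object], key: str) -> bool:
    """Return True if the key holds a non-empty string that is not the placeholder."""
    value = creds.get(key)
    return isinstance(value, str) and value.strip() != "" and value != PLACEHOLDER


def detect_gcp_credential_type(creds: Mapping[str, object]) -> str:
    """Hypothetical helper: classify credentials by which keys are filled in."""
    if all(_is_set(creds, key) for key in SERVICE_ACCOUNT_KEYS):
        return "service_account"
    if all(_is_set(creds, key) for key in OAUTH_KEYS):
        return "oauth"
    return "incomplete"
```

A check like this could run before the pipeline starts, so a half-filled template is reported early rather than surfacing as an authentication failure.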

To get GCS/GDrive access:
To get GCS/GDrive access using **Service account** credentials:

1. Log in to [console.cloud.google.com](http://console.cloud.google.com/).
1. Create a [service account](https://cloud.google.com/iam/docs/service-accounts-create#creating).
@@ -63,6 +67,57 @@ To get GCS/GDrive access:
For more info, see how to
[create service account](https://support.google.com/a/answer/7378726?hl=en).

To get GCS/GDrive access using **OAuth** credentials:

To get OAuth credentials you need a GCP account, so create one if you don't have it. To set up
the credentials, follow these steps:

1. Initialize the verified source as explained in ["initialize-the-verified-source"](#initialize-the-verified-source) to set up OAuth credentials.
Contributor comment:

I understand that Markdown renders the list fine when every item is numbered `1.`, but shouldn't these still be enumerated the usual way (1, 2, 3, ...)?


1. Open a GCP project in your GCP account.

1. Enable the Google Drive API in the project.

1. Search for "credentials" in the search bar and go to Credentials.

1. Go to Credentials -> OAuth client ID, select Desktop App as the Application type, and give it an
appropriate name.

1. Download the credentials and fill "client_id", "client_secret" and "project_id" in
"secrets.toml". You can comment out the "refresh_token" field since we will grab it
in the next steps.

1. Go back to credentials and select the OAuth consent screen on the left.

1. Fill in the App name, user support email (your email), authorized domain (localhost.com), and dev
contact info (your email again).

1. To access GDrive, add the following scope:

```
"https://www.googleapis.com/auth/drive.readonly"
```

1. To access GCS, add the following scope:

```
"https://www.googleapis.com/auth/devstorage.read_only"
```

1. Add your email as a test user.

1. Generate `refresh_token`:

After configuring "client_id", "client_secret" and "project_id" in "secrets.toml", generate
the refresh token by running the following script from the root folder:

```bash
python filesystem/setup_script_gcp_oauth.py
```

Once you have executed the script and completed the authentication, you will receive a "refresh
token". Copy it into the "refresh_token" field of ".dlt/secrets.toml".
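
To illustrate how the scopes above feed into the consent step, here is a minimal sketch that builds a Google OAuth 2.0 authorization URL using only the standard library. It is not the content of `filesystem/setup_script_gcp_oauth.py`, which is not shown in this change. The function name `build_consent_url` and the default redirect URI are assumptions made for illustration. Requesting `access_type=offline` is what asks Google to issue a refresh token.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# Scopes listed in the steps above.
SCOPES = {
    "gdrive": "https://www.googleapis.com/auth/drive.readonly",
    "gcs": "https://www.googleapis.com/auth/devstorage.read_only",
}


def build_consent_url(client_id, targets, redirect_uri="http://localhost:8080/"):
    """Hypothetical helper: build the URL a user opens to grant read-only access.

    `targets` is a list of keys from SCOPES; `redirect_uri` is an assumed
    local address, not a value taken from the setup script.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        # Multiple scopes are sent as one space-separated string.
        "scope": " ".join(SCOPES[target] for target in targets),
        # "offline" requests a refresh token in addition to the access token.
        "access_type": "offline",
        "prompt": "consent",
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```

The authorization code returned after consent still has to be exchanged for tokens, which is the part the setup script is expected to handle.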

#### Azure Blob Storage credentials

To obtain Azure blob storage access:
@@ -111,11 +166,17 @@ For more information, read the
aws_access_key_id="Please set me up!"
aws_secret_access_key="Please set me up!"

# For GCS bucket / Google Drive access:
# For GCS bucket / Google Drive access (Service account method):
client_email="Please set me up!"
private_key="Please set me up!"
project_id="Please set me up!"

# For GCS bucket / Google Drive access (OAuth method):
client_id = "Please set me up!"
client_secret = "Please set me up!"
refresh_token = "Please set me up!"
project_id = "Please set me up!"

# For Azure blob storage access:
azure_storage_account_name="Please set me up!"
azure_storage_account_key="Please set me up!"
@@ -164,7 +225,7 @@ For more information, read the
```bash
pip install "adlfs>=2023.9.0"
```
- GCS storage: No separate module needed.
- GCS, GDrive storage: No separate module needed.

1. You're now ready to run the pipeline! To get started, run the following command:

@@ -63,6 +63,8 @@ You need to create a GCP service account to get API credentials if you don't hav
To get OAuth credentials you need a GCP account, so create one if you don't have it. To set up
the credentials, follow these steps:

1. Initialize the verified source as explained in ["initialize-the-verified-source"](#initialize-the-verified-source) to set up OAuth credentials.

1. Ensure your email used for the GCP account has access to the GA4 property.

1. Open a GCP project in your GCP account.
@@ -75,7 +77,8 @@ follow these steps:
appropriate name.

1. Download the credentials and fill "client_id", "client_secret" and "project_id" in
"secrets.toml".
"secrets.toml". You can comment out the "refresh_token" field since we will grab it
in the next steps.

1. Go back to credentials and select the OAuth consent screen on the left.

@@ -68,6 +68,8 @@ You need to create a GCP service account to get API credentials if you don't hav
To get OAuth credentials you need a GCP account, so create one if you don't have it. To set up
the credentials, follow these steps:

1. Initialize the verified source as explained in ["initialize-the-verified-source"](#initialize-the-verified-source) to set up OAuth credentials.

1. Open a GCP project in your GCP account.

1. Enable the Sheets API in the project.
@@ -78,7 +80,8 @@ follow these steps:
appropriate name.

1. Download the credentials and fill "client_id", "client_secret" and "project_id" in
"secrets.toml".
"secrets.toml". You can comment out the "refresh_token" field since we will grab it
in the next steps.

1. Go back to credentials and select the OAuth consent screen on the left.
