From 90d650a54c9dcfd79524dbdd6620a5cd9b7818de Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Sat, 17 Feb 2024 08:51:40 +0000
Subject: [PATCH 1/6] Small correction to docs

---
 .../docs/dlt-ecosystem/verified-sources/google_sheets.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index ea0acc6824..4b99c0db66 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -68,8 +68,6 @@ You need to create a GCP service account to get API credentials if you don't hav
 You need to create a GCP account to get OAuth credentials if you don't have one. To create one,
 follow these steps:
 
-1. Ensure your email used for the GCP account has access to the GA4 property.
-
 1. Open a GCP project in your GCP account.
 
 1. Enable the Sheets API in the project.

From 3503d349c4789086135e3be4ef8ccdd6f4151c8f Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Sat, 17 Feb 2024 11:08:28 +0000
Subject: [PATCH 2/6] Updated documentation for data types

---
 .../verified-sources/google_sheets.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index 4b99c0db66..e3c50e7e11 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -321,6 +321,25 @@ For more information, read the guide on [how to run a pipeline](../../walkthroug
 ## Data types
 
 The `dlt` normalizer uses the first row of data to infer types and attempts to coerce subsequent rows, creating variant columns if unsuccessful. This is standard behavior. It also recognizes date and time types using additional metadata from the first row.
+To handle mixed data types in a single column, you can provide a type hint for the affected column in the resource. Use the `apply_hints` method on the resource to achieve this. Here's how you can do it:
+
+```
+for resource in resources:
+    resource.apply_hints(columns={"total_amount": {"data_type": "double"}})
+```
+In this example, the total_amount column is enforced to be of type double. This will ensure that all values in the total_amount column are treated as double, regardless of whether they are integers or decimals in the original Google Sheets data.
+
+for a single resource (say Sheet1), you can simply use:
+```
+source.Sheet1.apply_hints(columns={"total_amount": {"data_type": "double"}})
+```
+
+To get name of resources you can use:
+```
+print(source.resources.keys())
+```
+
+To read more, please refer to the documentation:
 
 ## Sources and resources
 

From d4aec690903dc2e2973ba240af75cf59513f001f Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Thu, 22 Feb 2024 01:19:19 +0000
Subject: [PATCH 3/6] Updated for comments.

---
 .../docs/dlt-ecosystem/verified-sources/google_sheets.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index e3c50e7e11..c9f8619013 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -327,9 +327,9 @@ To handle mixed data types in a single column, you can provide a type hint for t
 for resource in resources:
     resource.apply_hints(columns={"total_amount": {"data_type": "double"}})
 ```
-In this example, the total_amount column is enforced to be of type double. This will ensure that all values in the total_amount column are treated as double, regardless of whether they are integers or decimals in the original Google Sheets data.
+In this example, the `total_amount` column is enforced to be of type double. This will ensure that all values in the `total_amount` column are treated as `double`, regardless of whether they are integers or decimals in the original Google Sheets data.
 
-for a single resource (say Sheet1), you can simply use:
+For a single resource (ex. Sheet1), you can simply use:
 ```
 source.Sheet1.apply_hints(columns={"total_amount": {"data_type": "double"}})
 ```
@@ -339,7 +339,7 @@ To get name of resources you can use:
 print(source.resources.keys())
 ```
 
-To read more, please refer to the documentation:
+To read more, please refer to the documentation: [Customize sources](/docs/website/docs/general-usage/source#customize-sources).
 
 ## Sources and resources
 

From 4f89e0d556988675201b4de89154ec221546f6e4 Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Thu, 22 Feb 2024 01:38:01 +0000
Subject: [PATCH 4/6] Fixing broken links

---
 .../docs/dlt-ecosystem/verified-sources/google_sheets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index c9f8619013..deefe4dc1f 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -339,7 +339,7 @@ To get name of resources you can use:
 print(source.resources.keys())
 ```
 
-To read more, please refer to the documentation: [Customize sources](/docs/website/docs/general-usage/source#customize-sources).
+To read more, please refer to the documentation: [Customize sources](../../general-usage/source#customize-sources).
 
 ## Sources and resources
 

From 8a33e410ae6afcbd8c338b458fe51bfd07c23665 Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Thu, 22 Feb 2024 12:44:54 +0000
Subject: [PATCH 5/6] Updated link.

---
 .../docs/dlt-ecosystem/verified-sources/google_sheets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index deefe4dc1f..9f414550ec 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -339,7 +339,7 @@ To get name of resources you can use:
 print(source.resources.keys())
 ```
 
-To read more, please refer to the documentation: [Customize sources](../../general-usage/source#customize-sources).
+To read more about tables, columns and datatypes, please refer to [our documentation here.](../../general-usage/schema#tables-and-columns)
 
 ## Sources and resources
 

From 2816701beb4f77d9d16098de926fc04606503a16 Mon Sep 17 00:00:00 2001
From: AstrakhantsevaAA
Date: Mon, 26 Feb 2024 12:00:33 +0100
Subject: [PATCH 6/6] add more info about time types and refactor

---
 .../verified-sources/google_sheets.md | 44 ++++++++++++++-----
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
index 672c798420..2a5d4b03ab 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md
@@ -303,7 +303,7 @@ For more information, read the [General Usage: Credentials.](../../general-usage
 1. You're now ready to run the pipeline! To get started, run the following command:
 
    ```bash
-   python3 google_sheets_pipeline.py
+   python google_sheets_pipeline.py
    ```
 
 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using
@@ -320,26 +320,46 @@ For more information, read the guide on [how to run a pipeline](../../walkthroug
 
 ## Data types
 
-The `dlt` normalizer uses the first row of data to infer types and attempts to coerce subsequent rows, creating variant columns if unsuccessful. This is standard behavior. It also recognizes date and time types using additional metadata from the first row.
-To handle mixed data types in a single column, you can provide a type hint for the affected column in the resource. Use the `apply_hints` method on the resource to achieve this. Here's how you can do it:
+The `dlt` normalizer uses the first row of data to infer types and attempts to coerce subsequent rows, creating variant columns if unsuccessful. This is standard behavior.
+If `dlt` did not correctly determine the data type in the column, or you want to change the data type for other reasons,
+then you can provide a type hint for the affected column in the resource.
+Also, since recently `dlt`'s no longer recognizing date and time types, so you have to designate it yourself as `timestamp`.
 
-```
+Use the `apply_hints` method on the resource to achieve this.
+Here's how you can do it:
+
+```python
 for resource in resources:
-    resource.apply_hints(columns={"total_amount": {"data_type": "double"}})
+    resource.apply_hints(columns={
+            "total_amount": {"data_type": "double"},
+            "date": {"data_type": "timestamp"},
+        })
 ```
-In this example, the `total_amount` column is enforced to be of type double. This will ensure that all values in the `total_amount` column are treated as `double`, regardless of whether they are integers or decimals in the original Google Sheets data.
+In this example, the `total_amount` column is enforced to be of type double and `date` is enforced to be of type timestamp.
+This will ensure that all values in the `total_amount` column are treated as `double`, regardless of whether they are integers or decimals in the original Google Sheets data.
+And `date` column will be represented as dates, not integers.
 
-For a single resource (ex. Sheet1), you can simply use:
-```
-source.Sheet1.apply_hints(columns={"total_amount": {"data_type": "double"}})
+For a single resource (e.g. `Sheet1`), you can simply use:
+```python
+source.Sheet1.apply_hints(columns={
+    "total_amount": {"data_type": "double"},
+    "date": {"data_type": "timestamp"},
+})
 ```
 
-To get name of resources you can use:
-```
+To get the name of resources, you can use:
+```python
 print(source.resources.keys())
 ```
 
-To read more about tables, columns and datatypes, please refer to [our documentation here.](../../general-usage/schema#tables-and-columns)
+To read more about tables, columns, and datatypes, please refer to [our documentation here.](../../general-usage/schema#tables-and-columns)
+
+:::caution
+`dlt` will **not modify** tables after they are created.
+So if you changed data types with hints,
+then you need to **delete the dataset**
+or set `full_refresh=True`.
+:::
 
 ## Sources and resources
 
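
The "Data types" text added in this series describes first-row type inference, variant columns on failed coercion, and column hints that override the inferred type. The following is a minimal, standard-library-only sketch of that idea. It is a simplified model and not `dlt`'s implementation: the helper names (`infer_type`, `to_bigint`, `normalize`), the plain `{column: type}` hint format, and the `__v_<type>` variant suffix are assumptions made for illustration.

```python
from datetime import datetime


def infer_type(value):
    """Map a Python value to an illustrative column type name."""
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"


def to_bigint(value):
    """Coerce to int, but refuse values that would lose a fractional part."""
    number = float(value)
    if not number.is_integer():
        raise ValueError(f"{value!r} is not an integer")
    return int(number)


# Hypothetical coercion table; type names mirror the ones used in the docs.
CASTERS = {
    "bigint": to_bigint,
    "double": float,
    "text": str,
    "timestamp": datetime.fromisoformat,
}


def normalize(rows, hints=None):
    """Type columns from hints or the first row, then coerce every row.

    A value that cannot be coerced is stored in a separate variant column
    instead of failing the whole load.
    """
    hints = hints or {}
    if not rows:
        return {}, []
    # Hints win over whatever the first row would suggest.
    schema = {
        name: hints.get(name, infer_type(value))
        for name, value in rows[0].items()
    }
    normalized = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            column_type = schema.get(name, "text")
            try:
                new_row[name] = CASTERS[column_type](value)
            except (TypeError, ValueError):
                # Illustrative variant column name, e.g. total_amount__v_double.
                new_row[f"{name}__v_{infer_type(value)}"] = value
        normalized.append(new_row)
    return schema, normalized


rows = [
    {"total_amount": 10, "date": "2024-02-26T12:00:33"},
    {"total_amount": 12.5, "date": "2024-02-27T08:00:00"},
]

# Without hints, the integer in the first row fixes the column type,
# so the decimal in the second row ends up in a variant column.
print(normalize(rows))

# With hints, both columns get the intended type for every row.
print(normalize(rows, hints={"total_amount": "double", "date": "timestamp"}))
```

In this model, the hinted run keeps every `total_amount` value in one `double` column, which is the outcome the `apply_hints` examples in the patch aim for.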