Added info about how to reorder the columns to adjust a schema (#1364)
* Added info about how to reorder the columns

* Updated rest_api.md with configuration examples

* Update docs/website/docs/walkthroughs/adjust-a-schema.md

* Updated ../website/docs/dlt-ecosystem/verified-sources/rest_api.md

* fix naming convention for bigquery custom destination

---------

Co-authored-by: Anton Burnashev <[email protected]>
Co-authored-by: AstrakhantsevaAA <[email protected]>
3 people authored May 23, 2024
1 parent bf92ada commit bab9e90
Showing 2 changed files with 36 additions and 10 deletions.
@@ -5,13 +5,13 @@
keywords: [destination, credentials, example, bigquery, custom destination]
---
In this example, you'll find a Python script that demonstrates how to load to bigquey with the custom destination.
In this example, you'll find a Python script that demonstrates how to load to BigQuery with the custom destination.
We'll learn how to:
- use [built-in credentials](../general-usage/credentials/config_specs#gcp-credentials)
- use the [custom destination](../dlt-ecosystem/destinations/destination.md)
- Use pyarrow tables to create complex column types on bigquery
- Use bigquery `autodetect=True` for schema inference from parquet files
- Use [built-in credentials.](../general-usage/credentials/config_specs#gcp-credentials)
- Use the [custom destination.](../dlt-ecosystem/destinations/destination.md)
- Use pyarrow tables to create complex column types on BigQuery.
- Use BigQuery `autodetect=True` for schema inference from parquet files.
"""

@@ -38,7 +38,7 @@
def resource(url: str):
# load pyarrow table with pandas
table = pa.Table.from_pandas(pd.read_csv(url))
# we add a list type column to demontrate bigquery lists
# we add a list type column to demonstrate bigquery lists
table = table.append_column(
"tags",
pa.array(
@@ -57,12 +57,12 @@ def resource(url: str):
yield table


# dlt biquery custom destination
# dlt bigquery custom destination
# we can use the dlt provided credentials class
# to retrieve the gcp credentials from the secrets
@dlt.destination(name="bigquery", loader_file_format="parquet", batch_size=0)
@dlt.destination(name="bigquery", loader_file_format="parquet", batch_size=0, naming_convention="snake_case")
def bigquery_insert(
items, table, credentials: GcpServiceAccountCredentials = dlt.secrets.value
items, table=BIGQUERY_TABLE_ID, credentials: GcpServiceAccountCredentials = dlt.secrets.value
) -> None:
client = bigquery.Client(
credentials.project_id, credentials.to_native_credentials(), location="US"
@@ -74,7 +74,7 @@ def bigquery_insert(
)
# since we have set the batch_size to 0, we get a filepath and can load the file directly
with open(items, "rb") as f:
load_job = client.load_table_from_file(f, BIGQUERY_TABLE_ID, job_config=job_config)
load_job = client.load_table_from_file(f, table, job_config=job_config)
load_job.result() # Waits for the job to complete.


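For context on how such a custom destination is used: the function decorated with `@dlt.destination` is passed directly as the pipeline's destination, and `dlt` calls it once per batch (or, with `batch_size=0`, once per file). Below is a minimal sketch of the wiring, not part of this commit; the pipeline name and CSV URL are illustrative.

```py
import dlt

# Hypothetical wiring for the example above; `resource` and `bigquery_insert`
# are the functions shown in the diff, the pipeline name and URL are illustrative.
pipeline = dlt.pipeline(
    pipeline_name="csv_to_bigquery",
    destination=bigquery_insert,  # the custom destination defined above
)

# With batch_size=0 and loader_file_format="parquet", bigquery_insert receives
# the path of a parquet file per table and loads it with the BigQuery client.
load_info = pipeline.run(resource(url="https://example.com/data.csv"))
print(load_info)
```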
docs/website/docs/walkthroughs/adjust-a-schema.md (26 additions, 0 deletions)
@@ -121,6 +121,32 @@ Do not rename the tables or columns in the yaml file. `dlt` infers those from th
You can [adjust the schema](../general-usage/resource.md#adjust-schema) in Python before resource is loaded.
:::

### Reorder columns
To reorder the columns in your dataset, follow these steps:

1. Initial Run: Execute the pipeline to obtain the import and export schemas.
1. Modify Export Schema: Adjust the column order as desired in the export schema.
1. Sync Import Schema: Ensure that these changes are mirrored in the import schema to maintain consistency.
1. Delete Dataset: Remove the existing dataset to prepare for the reload.
1. Reload Data: Reload the data. The dataset should now reflect the new column order as specified in the import YAML.

These steps ensure that the column order in your dataset matches your specifications.
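If the pipeline does not produce import and export schema files yet, they are enabled by pointing the pipeline at schema folders. A minimal sketch, assuming a local `duckdb` destination; the pipeline name, dataset name, and folder paths are illustrative:

```py
import dlt

# Illustrative configuration: after a run, dlt writes the inferred schema to the
# export folder and picks up any edits you make in the import folder on the next run.
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="duckdb",
    dataset_name="my_dataset",
    import_schema_path="schemas/import",
    export_schema_path="schemas/export",
)
```

After the initial run, reorder the column entries in the exported YAML, mirror the change in the corresponding file under the import folder, drop the dataset, and run the pipeline again.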

**Another approach** to reorder columns is to use the `add_map` function. For instance, to rearrange `column1`, `column2`, and `column3`, you can proceed as follows:

```py
# Define the data source and reorder columns using add_map
data_source = resource().add_map(lambda row: {
'column3': row['column3'],
'column1': row['column1'],
'column2': row['column2']
})
# Run the pipeline
load_info = pipeline.run(data_source)
```

In this example, the `add_map` function reorders columns by defining a new mapping. The lambda function specifies the desired order by rearranging the key-value pairs. When the pipeline runs, the data will load with the columns in the new order.

### Load data as json instead of generating child table or columns from flattened dicts

