Added info about how to reorder the columns to adjust a schema (#1364)
* Added info about how to reorder the columns

* Updated rest_api.md with configuration examples

* Update docs/website/docs/walkthroughs/adjust-a-schema.md

* Updated ../website/docs/dlt-ecosystem/verified-sources/rest_api.md

* fix naming convention for bigquery custom destination

---------

Co-authored-by: Anton Burnashev <[email protected]>
Co-authored-by: AstrakhantsevaAA <[email protected]>
3 people authored May 23, 2024
1 parent bf92ada commit bab9e90
Showing 2 changed files with 36 additions and 10 deletions.
@@ -5,13 +5,13 @@
keywords: [destination, credentials, example, bigquery, custom destination]
---
In this example, you'll find a Python script that demonstrates how to load to bigquey with the custom destination.
In this example, you'll find a Python script that demonstrates how to load to BigQuery with the custom destination.
We'll learn how to:
- use [built-in credentials](../general-usage/credentials/config_specs#gcp-credentials)
- use the [custom destination](../dlt-ecosystem/destinations/destination.md)
- Use pyarrow tables to create complex column types on bigquery
- Use bigquery `autodetect=True` for schema inference from parquet files
- Use [built-in credentials.](../general-usage/credentials/config_specs#gcp-credentials)
- Use the [custom destination.](../dlt-ecosystem/destinations/destination.md)
- Use pyarrow tables to create complex column types on BigQuery.
- Use BigQuery `autodetect=True` for schema inference from parquet files.
"""

@@ -38,7 +38,7 @@
def resource(url: str):
# load pyarrow table with pandas
table = pa.Table.from_pandas(pd.read_csv(url))
# we add a list type column to demontrate bigquery lists
# we add a list type column to demonstrate bigquery lists
table = table.append_column(
"tags",
pa.array(
@@ -57,12 +57,12 @@ def resource(url: str):
yield table


# dlt biquery custom destination
# dlt bigquery custom destination
# we can use the dlt provided credentials class
# to retrieve the gcp credentials from the secrets
@dlt.destination(name="bigquery", loader_file_format="parquet", batch_size=0)
@dlt.destination(name="bigquery", loader_file_format="parquet", batch_size=0, naming_convention="snake_case")
def bigquery_insert(
items, table, credentials: GcpServiceAccountCredentials = dlt.secrets.value
items, table=BIGQUERY_TABLE_ID, credentials: GcpServiceAccountCredentials = dlt.secrets.value
) -> None:
client = bigquery.Client(
credentials.project_id, credentials.to_native_credentials(), location="US"
@@ -74,7 +74,7 @@ def bigquery_insert(
)
# since we have set the batch_size to 0, we get a filepath and can load the file directly
with open(items, "rb") as f:
load_job = client.load_table_from_file(f, BIGQUERY_TABLE_ID, job_config=job_config)
load_job = client.load_table_from_file(f, table, job_config=job_config)
load_job.result() # Waits for the job to complete.


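For context on how such a custom destination is used: the function decorated with `@dlt.destination` is passed directly as the pipeline's destination, and `dlt` calls it once per batch (or, with `batch_size=0`, once per file). Below is a minimal sketch of the wiring, not part of this commit; the pipeline name and CSV URL are illustrative.

```py
import dlt

# Hypothetical wiring for the example above; `resource` and `bigquery_insert`
# are the functions shown in the diff, the pipeline name and URL are illustrative.
pipeline = dlt.pipeline(
    pipeline_name="csv_to_bigquery",
    destination=bigquery_insert,  # the custom destination defined above
)

# With batch_size=0 and loader_file_format="parquet", bigquery_insert receives
# the path of a parquet file per table and loads it with the BigQuery client.
load_info = pipeline.run(resource(url="https://example.com/data.csv"))
print(load_info)
```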
docs/website/docs/walkthroughs/adjust-a-schema.md (26 additions, 0 deletions)
@@ -121,6 +121,32 @@ Do not rename the tables or columns in the yaml file. `dlt` infers those from th
You can [adjust the schema](../general-usage/resource.md#adjust-schema) in Python before resource is loaded.
:::

### Reorder columns
To reorder the columns in your dataset, follow these steps:

1. Initial Run: Execute the pipeline to obtain the import and export schemas.
1. Modify Export Schema: Adjust the column order as desired in the export schema.
1. Sync Import Schema: Ensure that these changes are mirrored in the import schema to maintain consistency.
1. Delete Dataset: Remove the existing dataset to prepare for the reload.
1. Reload Data: Reload the data. The dataset should now reflect the new column order as specified in the import YAML.

These steps ensure that the column order in your dataset matches your specifications.
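If the pipeline does not produce import and export schema files yet, they are enabled by pointing the pipeline at schema folders. A minimal sketch, assuming a local `duckdb` destination; the pipeline name, dataset name, and folder paths are illustrative:

```py
import dlt

# Illustrative configuration: after a run, dlt writes the inferred schema to the
# export folder and picks up any edits you make in the import folder on the next run.
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="duckdb",
    dataset_name="my_dataset",
    import_schema_path="schemas/import",
    export_schema_path="schemas/export",
)
```

After the initial run, reorder the column entries in the exported YAML, mirror the change in the corresponding file under the import folder, drop the dataset, and run the pipeline again.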

**Another approach** to reorder columns is to use the `add_map` function. For instance, to rearrange `column1`, `column2`, and `column3`, you can proceed as follows:

```py
# Define the data source and reorder columns using add_map
data_source = resource().add_map(lambda row: {
'column3': row['column3'],
'column1': row['column1'],
'column2': row['column2']
})
# Run the pipeline
load_info = pipeline.run(data_source)
```

In this example, the `add_map` function reorders columns by defining a new mapping. The lambda function specifies the desired order by rearranging the key-value pairs. When the pipeline runs, the data will load with the columns in the new order.

### Load data as json instead of generating child table or columns from flattened dicts

