examples for docs (#616)
* remove snipsync (#613)

* add custom snippets element

* remove snipsync

* migrate performance snippets

* add missing init files

* refine snippet watcher

* docs examples

* fixes chess example

* fixes DLT -> dlt

* more work on transformers example

* make header smaller

* example for zendesk incremental loading

* move incremental loading example to right location

* added text and output example to incremental zendesk


* allow secrets files in examples

* add zendesk credentials

* correct text and code snippets for zendesk example

* add main clause

* add config example

* pytest marker to skip tests running in PRs from forks on github

* removes more typings and adds comments to zendesk example

* shortens example titles

---------

Co-authored-by: Marcin Rudolf <[email protected]>
Co-authored-by: AstrakhantsevaAA <[email protected]>
3 people authored Oct 4, 2023
1 parent 513e73b commit 58f8ad1
Showing 95 changed files with 935 additions and 81 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/test_doc_snippets.yml
@@ -18,6 +18,9 @@ env:
DESTINATION__WEAVIATE__VECTORIZER: text2vec-contextionary
DESTINATION__WEAVIATE__MODULE_CONFIG: "{\"text2vec-contextionary\": {\"vectorizeClassName\": false, \"vectorizePropertyName\": true}}"

# zendesk vars for example
SOURCES__ZENDESK__CREDENTIALS: ${{ secrets.ZENDESK__CREDENTIALS }}

jobs:

run_lint:
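
For local runs outside CI, the same credentials can be provided through environment variables instead of a repository secret. A minimal sketch, assuming dlt's usual double-underscore mapping between environment variables and `secrets.toml` keys (the exact payload of the `ZENDESK__CREDENTIALS` repository secret is not shown here, so the per-field variables below are placeholders):

```python
import os

# equivalent to the [sources.zendesk.credentials] table in secrets.toml;
# dlt's environment provider resolves SECTION__SUBSECTION__KEY names
os.environ["SOURCES__ZENDESK__CREDENTIALS__EMAIL"] = "[email protected]"
os.environ["SOURCES__ZENDESK__CREDENTIALS__PASSWORD"] = "<zendesk password or api token>"
os.environ["SOURCES__ZENDESK__CREDENTIALS__SUBDOMAIN"] = "my-subdomain"
```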
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ experiments/*
# !experiments/pipeline/
# !experiments/pipeline/*
secrets.toml
!docs/**/secrets.toml
*.session.sql
*.duckdb
*.wal
2 changes: 2 additions & 0 deletions Makefile
@@ -27,6 +27,8 @@ help:
@echo " tests all components using local destinations: duckdb and postgres"
@echo " test-common"
@echo " tests common components"
@echo " test-and-lint-snippets"
@echo " tests and lints snippets and examples in docs"
@echo " build-library"
@echo " makes dev and then builds dlt package for distribution"
@echo " publish-library"
2 changes: 1 addition & 1 deletion dlt/common/schema/schema.py
@@ -420,7 +420,7 @@ def _coerce_non_null_value(self, table_columns: TTableSchemaColumns, table_name:
raise CannotCoerceColumnException(table_name, col_name, py_type, table_columns[col_name]["data_type"], v)
# otherwise we must create variant extension to the table
# pass final=True so no more auto-variants can be created recursively
# TODO: generate callback so DLT user can decide what to do
# TODO: generate callback so dlt user can decide what to do
variant_col_name = self.naming.shorten_fragments(col_name, VARIANT_FIELD_FORMAT % py_type)
return self._coerce_non_null_value(table_columns, table_name, variant_col_name, v, is_variant=True)

2 changes: 1 addition & 1 deletion dlt/common/storages/exceptions.py
@@ -51,7 +51,7 @@ class SchemaStorageException(StorageException):

class InStorageSchemaModified(SchemaStorageException):
def __init__(self, schema_name: str, storage_path: str) -> None:
msg = f"Schema {schema_name} in {storage_path} was externally modified. This is not allowed as that would prevent correct version tracking. Use import/export capabilities of DLT to provide external changes."
msg = f"Schema {schema_name} in {storage_path} was externally modified. This is not allowed as that would prevent correct version tracking. Use import/export capabilities of dlt to provide external changes."
super().__init__(msg)


2 changes: 1 addition & 1 deletion dlt/helpers/airflow_helper.py
@@ -38,7 +38,7 @@

class PipelineTasksGroup(TaskGroup):
"""
Represents a DLT Airflow pipeline task group.
Represents a dlt Airflow pipeline task group.
"""

def __init__(
2 changes: 1 addition & 1 deletion dlt/helpers/streamlit_helper.py
@@ -208,7 +208,7 @@ def write_data_explorer_page(pipeline: Pipeline, schema_name: str = None, show_d
#### Args:
pipeline (Pipeline): Pipeline instance to use.
schema_name (str, optional): Name of the schema to display. If None, default schema is used.
show_dlt_tables (bool, optional): Should show DLT internal tables. Defaults to False.
show_dlt_tables (bool, optional): Should show dlt internal tables. Defaults to False.
example_query (str, optional): Example query to be displayed in the SQL Query box.
show_charts (bool, optional): Should automatically show charts for the queries from SQL Query box. Defaults to True.
26 changes: 26 additions & 0 deletions docs/examples/.dlt/secrets.toml
@@ -0,0 +1,26 @@
# here is a file with the secrets for all the example pipelines in the `examples` folder

[sources]
# redshift password for query tables example
query_table.credentials.password="8P5gyDPNo9zo582rQG6a"
query_sql.credentials.password="8P5gyDPNo9zo582rQG6a"

# google sheets example
[sources.google_spreadsheet.credentials]
project_id="chat-analytics-317513"
client_email="[email protected]"
private_key="-----BEGIN PRIVATE KEY-----\nMIIEuwIBADANBgkqhkiG9w0BAQEFAASCBKUwggShAgEAAoIBAQCNEN0bL39HmD+S\n7inCg8CdRKEMZ/q7Rv5uUiTyUMjQLNXySOPRSSJBSXBPpLJPbcmfxCYgOPWadA3F\noa54WJFR3Uxd+SjAC848dGz5+JEL5u2rHcjzL1IbjDd5oH9rap/QxYm/R9Q5eSdC\nlGiiFh4zH+U9nhWWUovn+ofixbQkhrMFOfgHt+Jvdh/2m7Sdz47itjWFC258R1Ki\nH9vPVtHxw0LrcUZN7HACV3NICRUkvX8U2+JC25esmPxc/qqmxoFlQ7ono/NeMjQa\nq2TyTyNSh4pDtf30PAW4cU2AUbtaSPhIdRloDuwzW9d88VUUbVelqHHeJRgGjkSX\nQz2MCBuFAgMBAAECgf8zlepWYEgm8xtdim2ydB3mdhR/zoZmZM/Q1NthJ8u/IbdO\nb2HPEXxGbDKIIJzhAA17Un98leBLwYKuLZhOpdB+igyJlTG8XlCRF88XiUosJWR3\niHmiuMkndHA7WyTXDc0n3GpUFYWkGGng2cKLx7V/OFmpMwhC9LEKMNOrBKnf9U6Z\n/9nanIerFZU4m5mWbNW/ZRc+qvd+1zGw/JYM6ntdkKLo/TwNOmOS5FS01yLvx7Xw\nm12f9I3VceGXWyrYEh+UCWk0gsEb8xnGGZKy3op5I6trsXzH8I3HCXvapkeWSaFe\n/gmT3CHZIK9hang6f4yMG+niuNtZE2/METgvcjkCgYEAwTg1SZAYSaL6LoFV92Kq\nyHV0DP8KivDIKrByOym9oikPK/2ZMNi9NivVmSALuR54wj7pFxFmyEj6UTklSeLb\nRvOjcPnZEMbFspRHIzkySfsnfScoHZXOeakjOub1K5FehYsLXQIfe7iwRg/mcd/2\noFVyJrm2aNXcvNuug4scEE0CgYEAuuaRmGY5mKv+viuZ/zzOky7IjDnp4w2BMJt0\noMXznKuLHJpexnQ9A/ZHxpAp6Bi6Glk0XLi2uaI+ggXlEUfNa3DHMQu7xg1RaCqN\n957WGRO0ETtIWdp1BHhWPtT5kdOrjSZy9vRSZ0vh2WnZe5SgKRVCqQsV7ExcEltz\nUc9WlBkCgYA9MaQOzEgk6iz6FZQ4aVNVcX1zsEKShneerYtAGZQpi392mzatNbeX\nNILNoEyWMIRmYK5J1AUNYa+FkeexYtu3uOoGmdqZaZqrWDK/gRngPF7hUElwNUXT\nWjICMatsRPn+qW7L4iQ+dtu9FMQTRK9DUEx6305aHYFvftPibWhR8QKBgQCAd3GG\nNmXKihaMsr2kUjCPvG1+7WPVfHfbaE9PHyFnBAaXv4f7kvRJn+QQGRGlBjINYFl8\njj6S9HFQwCqGqTsKabeQ/8auyIK3PeDdXqE9FW0FFyGRGXarfueRQqTU1pCpcc89\n7gwiEmeIIJiruCoqcwGh3gvQo1/6AkAO8JxLKQKBgF0T8P0hRctXFejcFf/4EikS\n2+WA/gNSQITC1m+8nWNnU+bDmRax+pIkzlvjkG5kyNfWvB7i2A5Y5OnCo92y5aDQ\nzbGHLwZj0HXqLFXhbAv/0xZPXlZ71NWpi2BpCJRnzU65ftsjePfydfvN6g4mPQ28\nkHQsYKUZk5HPC8FlPvQe\n-----END PRIVATE KEY-----\n"

[destination]
# all postgres destinations for all examples
postgres.credentials = "postgres://loader:loader@localhost:5432/dlt_data"
# all redshift destinations for all examples
redshift.credentials = "postgres://loader:8P5gyDPNo9zo582rQG6a@chat-analytics.czwteevq7bpe.eu-central-1.redshift.amazonaws.com:5439/chat_analytics_rasa"

# all the bigquery destinations
[destination.bigquery.credentials]
project_id="chat-analytics-317513"
client_email="[email protected]"
private_key="-----BEGIN PRIVATE KEY-----\nMIIEuwIBADANBgkqhkiG9w0BAQEFAASCBKUwggShAgEAAoIBAQCNEN0bL39HmD+S\n7inCg8CdRKEMZ/q7Rv5uUiTyUMjQLNXySOPRSSJBSXBPpLJPbcmfxCYgOPWadA3F\noa54WJFR3Uxd+SjAC848dGz5+JEL5u2rHcjzL1IbjDd5oH9rap/QxYm/R9Q5eSdC\nlGiiFh4zH+U9nhWWUovn+ofixbQkhrMFOfgHt+Jvdh/2m7Sdz47itjWFC258R1Ki\nH9vPVtHxw0LrcUZN7HACV3NICRUkvX8U2+JC25esmPxc/qqmxoFlQ7ono/NeMjQa\nq2TyTyNSh4pDtf30PAW4cU2AUbtaSPhIdRloDuwzW9d88VUUbVelqHHeJRgGjkSX\nQz2MCBuFAgMBAAECgf8zlepWYEgm8xtdim2ydB3mdhR/zoZmZM/Q1NthJ8u/IbdO\nb2HPEXxGbDKIIJzhAA17Un98leBLwYKuLZhOpdB+igyJlTG8XlCRF88XiUosJWR3\niHmiuMkndHA7WyTXDc0n3GpUFYWkGGng2cKLx7V/OFmpMwhC9LEKMNOrBKnf9U6Z\n/9nanIerFZU4m5mWbNW/ZRc+qvd+1zGw/JYM6ntdkKLo/TwNOmOS5FS01yLvx7Xw\nm12f9I3VceGXWyrYEh+UCWk0gsEb8xnGGZKy3op5I6trsXzH8I3HCXvapkeWSaFe\n/gmT3CHZIK9hang6f4yMG+niuNtZE2/METgvcjkCgYEAwTg1SZAYSaL6LoFV92Kq\nyHV0DP8KivDIKrByOym9oikPK/2ZMNi9NivVmSALuR54wj7pFxFmyEj6UTklSeLb\nRvOjcPnZEMbFspRHIzkySfsnfScoHZXOeakjOub1K5FehYsLXQIfe7iwRg/mcd/2\noFVyJrm2aNXcvNuug4scEE0CgYEAuuaRmGY5mKv+viuZ/zzOky7IjDnp4w2BMJt0\noMXznKuLHJpexnQ9A/ZHxpAp6Bi6Glk0XLi2uaI+ggXlEUfNa3DHMQu7xg1RaCqN\n957WGRO0ETtIWdp1BHhWPtT5kdOrjSZy9vRSZ0vh2WnZe5SgKRVCqQsV7ExcEltz\nUc9WlBkCgYA9MaQOzEgk6iz6FZQ4aVNVcX1zsEKShneerYtAGZQpi392mzatNbeX\nNILNoEyWMIRmYK5J1AUNYa+FkeexYtu3uOoGmdqZaZqrWDK/gRngPF7hUElwNUXT\nWjICMatsRPn+qW7L4iQ+dtu9FMQTRK9DUEx6305aHYFvftPibWhR8QKBgQCAd3GG\nNmXKihaMsr2kUjCPvG1+7WPVfHfbaE9PHyFnBAaXv4f7kvRJn+QQGRGlBjINYFl8\njj6S9HFQwCqGqTsKabeQ/8auyIK3PeDdXqE9FW0FFyGRGXarfueRQqTU1pCpcc89\n7gwiEmeIIJiruCoqcwGh3gvQo1/6AkAO8JxLKQKBgF0T8P0hRctXFejcFf/4EikS\n2+WA/gNSQITC1m+8nWNnU+bDmRax+pIkzlvjkG5kyNfWvB7i2A5Y5OnCo92y5aDQ\nzbGHLwZj0HXqLFXhbAv/0xZPXlZ71NWpi2BpCJRnzU65ftsjePfydfvN6g4mPQ28\nkHQsYKUZk5HPC8FlPvQe\n-----END PRIVATE KEY-----\n"


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -3,10 +3,10 @@
import dlt
from dlt.destinations import bigquery, postgres

from docs.examples.sources.jsonl import jsonl_files
from docs.examples.sources.rasa import rasa
from .sources.jsonl import jsonl_files
from .sources.rasa import rasa

from docs.examples._helpers import pub_bigquery_credentials
from ._helpers import pub_bigquery_credentials

# let's load to bigquery, here we provide the credentials for our public project
# credentials = pub_bigquery_credentials
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -95,6 +95,6 @@ def singer_messages() -> Iterator[TDataItem]:
os.path.abspath(catalog_file_path),
*state_params
)
yield from get_source_from_stream(pipe_iterator, state) # type: ignore
yield from get_source_from_stream(pipe_iterator, state)

return singer_messages
File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
4 changes: 4 additions & 0 deletions docs/examples/incremental_loading/.dlt/secrets.toml
@@ -0,0 +1,4 @@
[sources.zendesk.credentials]
password = ""
subdomain = ""
email = ""
Empty file.
126 changes: 126 additions & 0 deletions docs/examples/incremental_loading/zendesk.py
@@ -0,0 +1,126 @@
from typing import Iterator, Optional, Dict, Any, Tuple

import dlt
from dlt.common import pendulum
from dlt.common.time import ensure_pendulum_datetime
from dlt.common.typing import TDataItem, TDataItems, TAnyDateTime
from dlt.extract.source import DltResource
from dlt.sources.helpers.requests import client


@dlt.source(max_table_nesting=2)
def zendesk_support(
    credentials: Dict[str, str] = dlt.secrets.value,
    start_date: Optional[TAnyDateTime] = pendulum.datetime(year=2000, month=1, day=1),  # noqa: B008
    end_date: Optional[TAnyDateTime] = None,
):
    """
    Retrieves data from Zendesk Support for ticket events.
    Args:
        credentials: Zendesk credentials (default: dlt.secrets.value)
        start_date: Start date for data extraction (default: 2000-01-01)
        end_date: End date for data extraction (default: None).
            If end time is not provided, the incremental loading will be
            enabled, and after the initial run, only new data will be retrieved.
    Returns:
        DltResource.
    """
    # Convert start_date and end_date to Pendulum datetime objects
    start_date_obj = ensure_pendulum_datetime(start_date)
    end_date_obj = ensure_pendulum_datetime(end_date) if end_date else None

    # Convert Pendulum datetime objects to Unix timestamps
    start_date_ts = start_date_obj.int_timestamp
    end_date_ts: Optional[int] = None
    if end_date_obj:
        end_date_ts = end_date_obj.int_timestamp

    # Extract credentials from secrets dictionary
    auth = (credentials["email"], credentials["password"])
    subdomain = credentials["subdomain"]
    url = f"https://{subdomain}.zendesk.com"

    # we use `append` write disposition, because objects in ticket_events endpoint are never updated
    # so we do not need to merge
    # we set primary_key to allow deduplication of events by the `incremental` below in the rare case
    # when two events have the same timestamp
    @dlt.resource(primary_key="id", write_disposition="append")
    def ticket_events(
        timestamp: dlt.sources.incremental[int] = dlt.sources.incremental(
            "timestamp",
            initial_value=start_date_ts,
            end_value=end_date_ts,
            allow_external_schedulers=True,
        ),
    ):
        # URL for ticket events
        # 'https://d3v-dlthub.zendesk.com/api/v2/incremental/ticket_events.json?start_time=946684800'
        event_pages = get_pages(
            url=url,
            endpoint="/api/v2/incremental/ticket_events.json",
            auth=auth,
            data_point_name="ticket_events",
            params={"start_time": timestamp.last_value},
        )
        for page in event_pages:
            yield page
            # stop loading when using end_value and end is reached.
            # unfortunately, Zendesk API does not have the "end_time" parameter, so we stop iterating ourselves
            if timestamp.end_out_of_range:
                return

    return ticket_events


def get_pages(
    url: str,
    endpoint: str,
    auth: Tuple[str, str],
    data_point_name: str,
    params: Optional[Dict[str, Any]] = None,
):
    """
    Makes a request to a paginated endpoint and returns a generator of data items per page.
    Args:
        url: The base URL.
        endpoint: The url to the endpoint, e.g. /api/v2/calls
        auth: Credentials for authentication.
        data_point_name: The key which data items are nested under in the response object (e.g. calls)
        params: Optional dict of query params to include in the request.
    Returns:
        Generator of pages, each page is a list of dict data items.
    """
    # update the page size to enable cursor pagination
    params = params or {}
    params["per_page"] = 1000
    headers = None

    # make request and keep looping until there is no next page
    get_url = f"{url}{endpoint}"
    while get_url:
        response = client.get(
            get_url, headers=headers, auth=auth, params=params
        )
        response.raise_for_status()
        response_json = response.json()
        result = response_json[data_point_name]
        yield result

        get_url = None
        # See https://developer.zendesk.com/api-reference/ticketing/ticket-management/incremental_exports/#json-format
        if not response_json["end_of_stream"]:
            get_url = response_json["next_page"]


if __name__ == "__main__":
    # create dlt pipeline
    pipeline = dlt.pipeline(
        pipeline_name="zendesk", destination="duckdb", dataset_name="zendesk_data"
    )

    load_info = pipeline.run(zendesk_support())
    print(load_info)
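
The `start_date` and `end_date` parameters of `zendesk_support` also allow a one-off backfill of a fixed range. A short sketch, assuming it runs next to `zendesk.py` (the dates are illustrative):

```python
import dlt
from dlt.common import pendulum

from zendesk import zendesk_support  # the source defined above

# load a fixed historical window; with end_date set, ticket_events stops
# as soon as the incremental cursor leaves the requested range
pipeline = dlt.pipeline(
    pipeline_name="zendesk_backfill", destination="duckdb", dataset_name="zendesk_data"
)
load_info = pipeline.run(
    zendesk_support(
        start_date=pendulum.datetime(year=2023, month=1, day=1),
        end_date=pendulum.datetime(year=2023, month=2, day=1),
    )
)
print(load_info)
```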
16 changes: 16 additions & 0 deletions docs/examples/transformers/.dlt/config.toml
@@ -0,0 +1,16 @@
[runtime]
log_level="WARNING"

[extract]
# use 2 workers to extract sources in parallel
worker=2
# allow 10 async items to be processed in parallel
max_parallel_items=10

[normalize]
# use 3 worker processes to process 3 files in parallel
workers=3

[load]
# have 50 concurrent load jobs
workers=50
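
The same knobs can be supplied per run via environment variables, which dlt resolves with the same section names. A sketch mirroring the file above (setting them in the process before the pipeline is created is an assumption of this sketch, not something the example requires):

```python
import os

# equivalent to the [runtime], [extract], [normalize] and [load] sections above
os.environ["RUNTIME__LOG_LEVEL"] = "WARNING"
os.environ["EXTRACT__MAX_PARALLEL_ITEMS"] = "10"
os.environ["NORMALIZE__WORKERS"] = "3"
os.environ["LOAD__WORKERS"] = "50"
```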
Empty file.
61 changes: 61 additions & 0 deletions docs/examples/transformers/pokemon.py
@@ -0,0 +1,61 @@
import dlt
from dlt.sources.helpers import requests


@dlt.source(max_table_nesting=2)
def source(pokemon_api_url: str):
    """A dlt source that loads Pokemon details and species from the PokeAPI."""

    # note that we deselect `pokemon_list` - we do not want it to be loaded
    @dlt.resource(write_disposition="replace", selected=False)
    def pokemon_list():
        """Retrieve a first page of Pokemons and yield it. We do not retrieve all the pages in this example"""
        yield requests.get(pokemon_api_url).json()["results"]

    # transformer that retrieves a list of objects in parallel
    @dlt.transformer
    def pokemon(pokemons):
        """Yields details for a list of `pokemons`"""

        # @dlt.defer marks a function to be executed in parallel
        # in a thread pool
        @dlt.defer
        def _get_pokemon(_pokemon):
            return requests.get(_pokemon["url"]).json()

        # call and yield the function result normally, the @dlt.defer takes care of parallelism
        for _pokemon in pokemons:
            yield _get_pokemon(_pokemon)

    # a special case where just one item is retrieved in transformer
    # a whole transformer may be marked for parallel execution
    @dlt.transformer
    @dlt.defer
    def species(pokemon_details):
        """Yields species details for a pokemon"""
        species_data = requests.get(pokemon_details["species"]["url"]).json()
        # link back to pokemon so we have a relation in loaded data
        species_data["pokemon_id"] = pokemon_details["id"]
        # just return the results; if you yield,
        # the generator will be evaluated in the main thread
        return species_data

    # create two simple pipelines with the | operator
    # 1. send the list of pokemons into the `pokemon` transformer to get pokemon details
    # 2. send pokemon details into the `species` transformer to get species details
    # NOTE: dlt is smart enough to evaluate `pokemon_list` and the `pokemon` details only once,
    # even though they feed both pipelines

    return (
        pokemon_list | pokemon,
        pokemon_list | pokemon | species
    )


if __name__ == "__main__":
    # build a duckdb pipeline
    pipeline = dlt.pipeline(
        pipeline_name="pokemon", destination="duckdb", dataset_name="pokemon_data"
    )

    # the pokemon_list resource does not need to be loaded
    load_info = pipeline.run(source("https://pokeapi.co/api/v2/pokemon"))
    print(load_info)
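
Since the source exposes several resources, a subset can also be selected at run time. A small sketch using `with_resources`, assuming the script sits next to `pokemon.py` and that resource names match the function names above:

```python
import dlt

from pokemon import source  # the source defined above

pipeline = dlt.pipeline(
    pipeline_name="pokemon_species_only", destination="duckdb", dataset_name="pokemon_data"
)
# load only species details; the upstream pokemon_list | pokemon steps
# still run because the species transformer depends on them
load_info = pipeline.run(
    source("https://pokeapi.co/api/v2/pokemon").with_resources("species")
)
print(load_info)
```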
Empty file.
4 changes: 2 additions & 2 deletions docs/technical/create_pipeline.md
@@ -420,12 +420,12 @@ The Python function that yields is not a function but magical object that `dlt`

```python
def lazy_function(endpoint_name):
# INIT - this will be executed only once when DLT wants!
# INIT - this will be executed only once when dlt wants!
get_configuration()
from_item = dlt.current.state.get("last_item", 0)
l = get_item_list_from_api(api_key, endpoint_name)

# ITERATOR - this will be executed many times also when DLT wants more data!
# ITERATOR - this will be executed many times also when dlt wants more data!
for item in l:
yield requests.get(url, api_key, "%s?id=%s" % (endpoint_name, item["id"])).json()
# CLEANUP
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/athena.md
@@ -6,7 +6,7 @@ keywords: [aws, athena, glue catalog]

# AWS Athena / Glue Catalog

The athena destination stores data as parquet files in s3 buckets and creates [external tables in aws athena](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html). You can then query those tables with athena sql commands which will then scan the whole folder of parquet files and return the results. This destination works very similar to other sql based destinations, with the exception of the merge write disposition not being supported at this time. DLT metadata will be stored in the same bucket as the parquet files, but as iceberg tables.
The athena destination stores data as parquet files in s3 buckets and creates [external tables in aws athena](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html). You can then query those tables with athena sql commands which will then scan the whole folder of parquet files and return the results. This destination works very similar to other sql based destinations, with the exception of the merge write disposition not being supported at this time. dlt metadata will be stored in the same bucket as the parquet files, but as iceberg tables.
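
A minimal pipeline sketch for this destination, assuming the S3 bucket and Athena credentials are already configured in `secrets.toml` and that filesystem staging is used (both are assumptions of this sketch, not something shown on this page):

```python
import dlt

# parquet files land in the configured s3 bucket via filesystem staging,
# and athena external tables are created on top of them
pipeline = dlt.pipeline(
    pipeline_name="athena_example",
    destination="athena",
    staging="filesystem",
    dataset_name="athena_example_data",
)
load_info = pipeline.run([{"id": 1, "name": "example"}], table_name="items")
print(load_info)
```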

## Setup Guide
### 1. Initialize the dlt project
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/bigquery.md
@@ -118,7 +118,7 @@ BigQuery supports the following [column hints](https://dlthub.com/docs/general-u

## Staging Support

BigQuery supports gcs as a file staging destination. DLT will upload files in the parquet format to gcs and ask BigQuery to copy their data directly into the db. Please refer to the [Google Storage filesystem documentation](./filesystem.md#google-storage) to learn how to set up your gcs bucket with the bucket_url and credentials. If you use the same service account for gcs and your redshift deployment, you do not need to provide additional authentication for BigQuery to be able to read from your bucket.
BigQuery supports gcs as a file staging destination. dlt will upload files in the parquet format to gcs and ask BigQuery to copy their data directly into the db. Please refer to the [Google Storage filesystem documentation](./filesystem.md#google-storage) to learn how to set up your gcs bucket with the bucket_url and credentials. If you use the same service account for gcs and your redshift deployment, you do not need to provide additional authentication for BigQuery to be able to read from your bucket.
```toml
```
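
A hedged sketch of enabling gcs staging for BigQuery; the bucket URL is a placeholder, and configuring it through an environment variable rather than `config.toml` is an assumption of this sketch:

```python
import os

import dlt

# placeholder bucket; with filesystem staging configured, dlt uploads parquet
# files to gcs and BigQuery copies them directly into the dataset
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "gs://my-staging-bucket"

pipeline = dlt.pipeline(
    pipeline_name="bigquery_staging_example",
    destination="bigquery",
    staging="filesystem",
    dataset_name="bigquery_staging_data",
)
load_info = pipeline.run([{"id": 1}], table_name="items")
print(load_info)
```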
