`dlt` sources are iterators or lists, and writing them does not require any knowledge beyond basic Python. `dlt` sources are also pythonic in nature: they are simple and can be chained, pipelined and composed like any other Python iterator or sequence.
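For illustration, a minimal sketch of that idea with a local `duckdb` destination (the names here are illustrative, not taken from the examples):

```python
import dlt

# any plain python generator (or list) can be loaded by a pipeline
def numbers():
    for i in range(3):
        yield {"n": i, "squared": i * i}

pipeline = dlt.pipeline(pipeline_name="demo", destination="duckdb", dataset_name="demo_data")
info = pipeline.run(numbers(), table_name="numbers")
print(info)
```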
- `quickstart` loads a nested json document into `duckdb` and then queries it with the built-in `sql_client`, demonstrating parent-child table joins (see the sketch after this list).
- `sql_query` source and `read_table` example. This source iterates over the results of any `SELECT` statement made against a database system supported by `SqlAlchemy`. The example connects to Redshift, iterates a table containing Ethereum transactions and shows the inferred schema (which nicely preserves typing). Mind that our source is a one-liner :) A similar iterator is sketched after this list.
- `rasa` example and `rasa_tracker_store` source extract rasa tracker store events into a set of inferred tables. It shows a few common patterns (sketched after this list):
  - how to pipeline resources: it depends on a "head" resource that reads the base data (i.e. events from kafka/postgres/file); the dependent resource is called a `transformer`
  - how to write a stream resource that creates table schemas and sends data to those tables depending on the event type
  - how to store `last_timestamp_value` in the state
- `singer_tap`, `stdout` and `singer_tap_example` is a fully functional wrapper for any singer/meltano source (a minimal pipe reader is sketched after this list). It:
  - clones the desired tap, installs it and runs it in a virtual env
  - passes the catalog and config files
  - like rasa, is a transformer (on a stdio pipe) and a stream resource
  - stores singer state in `dlt` state
- `singer_tap_jsonl_example` works like the above, but instead of a process pipe it reads singer messages from a file. It creates a huge hubspot schema.
- `google_sheets` is a source that returns values from a specified sheet. The example takes a sheet, infers a schema, loads it to BigQuery/Redshift and displays the inferred schema. It uses `secrets.toml` to manage credentials and is an example of a one-liner pipeline (sketched after this list).
- `chess` is an example of a pipeline project with its own config and credential files. It also demonstrates how transformers are connected to resources, and resource selection. It should be run from the `examples/chess` folder. It also shows how to use a retry decorator and how to run resources/transformers in parallel with a decorator (see the sketch after this list).
- `chess/chess_dbt.py`: an example of a `dbt` transformations package working with a dataset loaded by `dlt`. The package incrementally processes the loaded data, following the new load packages stored in the `_dlt_loads` table at the end of every pipeline run. Note the automatic use of an isolated virtual environment to run dbt and the sharing of credentials.
- `run_dbt_jaffle` runs dbt's jaffle shop example, taken directly from the github repo, and queries the results with `sql_client`. A `duckdb` database is used to load and transform the data. The database write access is passed from `dlt` to `dbt` and back (see the dbt runner sketch after this list).
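A minimal sketch of the `quickstart` flow. The child table name and the `_dlt_id`/`_dlt_parent_id` link columns follow dlt's nesting conventions; the document itself is made up:

```python
import dlt

doc = {"name": "parent", "children": [{"name": "child 1"}, {"name": "child 2"}]}

pipeline = dlt.pipeline(pipeline_name="quickstart_demo", destination="duckdb", dataset_name="data")
pipeline.run([doc], table_name="docs")

# the nested list lands in a child table (docs__children) joined via _dlt_parent_id
with pipeline.sql_client() as client:
    with client.execute_query(
        "SELECT d.name, c.name FROM docs AS d "
        "JOIN docs__children AS c ON c._dlt_parent_id = d._dlt_id"
    ) as cursor:
        print(cursor.fetchall())
```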
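The `sql_query` source itself lives in the examples; below is a hedged sketch of the same idea written with plain `SqlAlchemy` (the connection string and query are placeholders):

```python
import dlt
import sqlalchemy as sa

def query_rows(credentials: str, query: str):
    # yield each row as a dict so dlt can infer the schema, typing included
    engine = sa.create_engine(credentials)
    with engine.connect() as conn:
        for row in conn.execute(sa.text(query)).mappings():
            yield dict(row)

rows = query_rows("redshift+psycopg2://user:pass@host:5439/db", "SELECT * FROM transactions")
dlt.pipeline(destination="duckdb", dataset_name="ethereum").run(rows, table_name="transactions")
```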
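The `rasa` patterns, sketched with the current `dlt` decorators (`dlt.transformer`, a callable `table_name`, `dlt.current.resource_state`); the events and key names are made up:

```python
import dlt

@dlt.resource
def events():
    # the "head" resource: stands in for events read from kafka/postgres/file
    yield from [
        {"event": "user", "timestamp": 1, "text": "hi"},
        {"event": "bot", "timestamp": 2, "text": "hello"},
    ]

# a callable table_name routes each event to a table named after its type
@dlt.transformer(table_name=lambda e: "event_" + e["event"])
def tracker_store(event):
    # keep the newest timestamp in dlt state for incremental extraction
    state = dlt.current.resource_state()
    state["last_timestamp_value"] = max(state.get("last_timestamp_value", 0), event["timestamp"])
    yield event

dlt.pipeline(destination="duckdb", dataset_name="rasa").run(events() | tracker_store)
```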
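A hedged sketch of reading a singer tap over a stdio pipe; the tap command is hypothetical, and real singer output also carries `SCHEMA` messages that are skipped here:

```python
import json
import subprocess
import dlt

@dlt.resource(table_name=lambda r: r["stream"])
def singer_records(tap_cmd):
    # run the tap and stream its stdout line by line
    with subprocess.Popen(tap_cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            msg = json.loads(line)
            if msg["type"] == "RECORD":
                # route each record to a table named after its stream
                yield {"stream": msg["stream"], **msg["record"]}
            elif msg["type"] == "STATE":
                # persist singer state in dlt state so the next run can resume
                dlt.current.resource_state()["singer_state"] = msg["value"]

dlt.pipeline(destination="duckdb", dataset_name="tap_data").run(
    singer_records(["tap-example", "--config", "config.json"])
)
```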
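The credential handling in `google_sheets` can be sketched via dlt's config injection: a `dlt.secrets.value` default is filled from `.dlt/secrets.toml` at call time. The fetching helper below is a placeholder, not the real Sheets client:

```python
import dlt

def fetch_rows(api_key, sheet_id):
    # placeholder for the actual Google Sheets API call
    yield {"col_a": 1, "col_b": "x"}

@dlt.resource(table_name="sheet1")
def sheet_values(sheet_id, api_key=dlt.secrets.value):
    # api_key is injected from .dlt/secrets.toml; never hardcode credentials
    yield from fetch_rows(api_key, sheet_id)

# a one-liner pipeline: infer the schema and load (the example targets BigQuery/Redshift)
print(dlt.pipeline(destination="duckdb", dataset_name="sheets").run(sheet_values("<sheet_id>")))
```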
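The retry and parallelism decorators shown in `chess` can be sketched like this, assuming `tenacity` for retries and `dlt.defer` for thread-pool parallelism; the API call is a stub:

```python
import dlt
from tenacity import retry, stop_after_attempt, wait_exponential

@dlt.defer  # each call is evaluated in a thread pool, so players load in parallel
@retry(stop=stop_after_attempt(3), wait=wait_exponential())  # retry flaky API calls
def get_player(name):
    # stub for an HTTP request to the chess API
    return {"name": name}

@dlt.resource
def players(names=("magnuscarlsen", "rpragchess")):
    for name in names:
        yield get_player(name)

dlt.pipeline(destination="duckdb", dataset_name="chess").run(players())
```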
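Both dbt examples rely on dlt's dbt runner; a sketch following the documented `dlt.dbt.package` helper (the jaffle shop repo URL is real; `customers` is one of the models that package builds):

```python
import dlt

pipeline = dlt.pipeline(pipeline_name="jaffle", destination="duckdb", dataset_name="jaffle_shop")

# dlt clones the dbt package, runs it in an isolated venv and shares the credentials
dbt = dlt.dbt.package(pipeline, "https://github.com/dbt-labs/jaffle_shop.git")
for m in dbt.run_all():
    print(f"{m.model_name}: {m.status} {m.message}")

# query the transformed data back through the same credentials
with pipeline.sql_client() as client:
    with client.execute_query("SELECT count(1) FROM customers") as cursor:
        print(cursor.fetchall())
```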
Not yet ported:
- `discord_iterator` is an example that loads sample discord data (messages, channels) into a warehouse from supplied files. It shows several auxiliary pipeline functions and an example of pipelining iterators (with the `map` function, sketched below). You can also see that the produced schema is quite complicated due to several layers of nesting.
- `ethereum` source shows that you can build highly scalable, parallel and robust sources as simple iterators.
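The iterator pipelining mentioned for `discord_iterator` is just standard Python; a tiny illustration with made-up messages:

```python
import dlt

messages = [{"content": "hi", "channel": "general"}, {"content": "yo", "channel": "random"}]

def add_length(message):
    message["content_length"] = len(message["content"])
    return message

# map() produces just another iterator, which the pipeline consumes directly
dlt.pipeline(destination="duckdb", dataset_name="discord").run(
    map(add_length, messages), table_name="messages"
)
```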