
Docs: Document incremental definition via config.toml #1118

Merged: 7 commits, Mar 25, 2024

Changes from 2 commits
29 changes: 29 additions & 0 deletions docs/website/docs/general-usage/incremental-loading.md
@@ -614,6 +614,35 @@ Before `dlt` starts executing incremental resources, it looks for `data_interval
You can run DAGs manually, but you must remember to specify the Airflow logical date of the run in the past (use the "Run with config" option). For such a run, `dlt` will load all data from that past date until now.
If you do not specify a past date, a run with the range (now, now) will happen, yielding no data.
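
For context, here is a minimal sketch of a resource that lets the scheduler drive the incremental range. The `allow_external_schedulers` flag and the `events`/`updated_at` names are assumptions for illustration, not part of this PR; if the flag behaves as assumed, `dlt` takes the start and end of the incremental range from the Airflow data interval rather than from stored state.

```py
import dlt


@dlt.resource(table_name="events")
def events(
    updated_at=dlt.sources.incremental(
        "updated_at",
        # Assumption: with allow_external_schedulers=True, dlt reads the range
        # boundaries from the Airflow data interval instead of its own state.
        allow_external_schedulers=True,
    )
):
    # Illustrative in-memory data; a real resource would query an API or database
    # and filter on updated_at.last_value / updated_at.end_value.
    yield from [
        {"id": 1, "updated_at": "2024-03-01T00:00:00Z"},
        {"id": 2, "updated_at": "2024-03-02T00:00:00Z"},
    ]
```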

### Reading incremental loading parameters from configuration

Consider the following example, which reads incremental loading parameters from the configuration file:

1. In "config.toml", define the parameter `id_after` as follows:
```toml
# Configuration snippet for an incremental resource
[incs.sources.pipeline.inc_res.id_after]
cursor_path = "idAfter"
initial_value = 10
```
The `id_after` parameter is defined with a `cursor_path` and an `initial_value`. The `cursor_path` points to the field that `dlt` uses to track the resource's progress, and `initial_value` sets the starting point for the first run.

1. In the resource, declare `id_after` as a `dlt.sources.incremental` parameter and use `dlt.config.value` as its default, so that its settings are resolved from the configuration:
```py
import dlt


@dlt.resource(table_name="incremental_records")
def inc_res(id_after: dlt.sources.incremental = dlt.config.value):
    # `id_after` is resolved from config.toml (cursor_path = "idAfter", initial_value = 10)
    for i in range(100):
        yield {"id": i, "idAfter": i, "name": "name-" + str(i)}


pipeline = dlt.pipeline(
    pipeline_name="pipeline_with_incremental",
    destination="duckdb",
)

pipeline.run(inc_res)
```
The `inc_res` resource yields 100 records, each a dictionary with the keys `id`, `idAfter`, and `name`. The `idAfter` field is the one referenced by `cursor_path`, so it is what the incremental uses to track progress.
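
For comparison, here is a minimal sketch of the same resource with the incremental defined explicitly in code instead of being read from `config.toml`. Assuming `dlt.sources.incremental("idAfter", initial_value=10)` mirrors the `cursor_path` and `initial_value` set in the configuration above, the two variants should behave the same; the `inc_res_explicit` name is just for illustration.

```py
import dlt


@dlt.resource(table_name="incremental_records")
def inc_res_explicit(
    # Equivalent to cursor_path = "idAfter" and initial_value = 10 in config.toml
    id_after=dlt.sources.incremental("idAfter", initial_value=10)
):
    for i in range(100):
        yield {"id": i, "idAfter": i, "name": "name-" + str(i)}


pipeline = dlt.pipeline(
    pipeline_name="pipeline_with_incremental",
    destination="duckdb",
)
pipeline.run(inc_res_explicit)
```

Defining the incremental in configuration keeps these values out of the code, so the cursor field or the starting point can be changed per deployment without editing the resource.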

**Contributor:** Also here we may need to change the `table_name` parameter and the name of the function to something meaningful.

**Collaborator (Author):** Done.

**Contributor:** Also the function name, please :)

**Collaborator (Author):** Done.

**Contributor:** And in the text as well.

**Collaborator (Author):** Okay.


## Doing a full refresh
