Docs: update the introduction, add the rest_api tutorial #1729

Merged: 35 commits, Sep 14, 2024
Commits
c5d88e6
Add intro, and rest api tutorial
burnash Aug 22, 2024
8fb941d
Update docs/website/docs/tutorial/rest-api.md
burnash Sep 9, 2024
972d31d
Update docs/website/docs/tutorial/rest-api.md
burnash Sep 9, 2024
e7c82ad
Update docs/website/docs/intro.md
burnash Sep 10, 2024
a9d3c83
Fix a broken link
burnash Sep 10, 2024
9c60f9f
Remove the reference to daily updates.
burnash Sep 10, 2024
4e884ad
Update docs/website/docs/tutorial/rest-api.md
burnash Sep 10, 2024
dd488e0
Update docs/website/docs/tutorial/rest-api.md
burnash Sep 10, 2024
ff43a8a
Update docs/website/docs/tutorial/rest-api.md
burnash Sep 11, 2024
b43cfe9
Rework the "merging" section
burnash Sep 11, 2024
d5305b2
Restructure intro, sidebar and dlt tutorial
burnash Sep 12, 2024
2393f9d
Rework why dlt and getting started sections
burnash Sep 12, 2024
990a546
Bring back google colab link
burnash Sep 12, 2024
b25fc09
Merge branch 'devel' into enh/docs/introduction-rest-sql-file
burnash Sep 12, 2024
8f6a7db
Add a missing comma
burnash Sep 12, 2024
3ba59fd
Fix the docs path
burnash Sep 12, 2024
d1cc062
Fix more links
burnash Sep 12, 2024
fb26b98
Update docs/website/docs/intro.md
burnash Sep 12, 2024
838e29b
Fix links
burnash Sep 12, 2024
65dc303
Update docs/website/docs/intro.md
burnash Sep 12, 2024
ed6153f
Update the custom pipeline tutorial
burnash Sep 12, 2024
1601437
Merge branch 'devel' into enh/docs/introduction-rest-sql-file
burnash Sep 13, 2024
71b3e11
Fix a link
burnash Sep 13, 2024
abcc346
Rename sql database page url, add links
burnash Sep 13, 2024
0bf4f76
Incorporate the groupping resources page into the python ds tutorial
burnash Sep 13, 2024
99ca612
Remove legacy tutorial intro and incorporate it to the ds tutorial
burnash Sep 13, 2024
33dd9ca
Elaborate on pds tutorual
burnash Sep 13, 2024
7444c7d
Remove absolute links and hanging whitespace
burnash Sep 13, 2024
c9378ce
Replace the screenshot
burnash Sep 13, 2024
fca18de
Add description and fix text style
burnash Sep 13, 2024
069e4de
Enable module imports in the intro snippets
burnash Sep 13, 2024
b7fa8ca
Format the snippet code
burnash Sep 13, 2024
fc75ccb
Add active version class to document body
burnash Sep 13, 2024
21c6bbc
Revert "Add active version class to document body"
burnash Sep 13, 2024
3c162ab
Add "stable" icons
burnash Sep 14, 2024
Rework the "merging" section
burnash committed Sep 11, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit b43cfe9e1d842b8349fa0fbfc294a72f624902d5
16 changes: 12 additions & 4 deletions docs/website/docs/tutorial/rest-api.md
@@ -4,7 +4,9 @@ description: How to extract data from a REST API using dlt's REST API source
 keywords: [tutorial, api, github, duckdb, rest api, source, pagination, authentication]
 ---
 
-This tutorial demonstrates how to extract data from a REST API using dlt's REST API source and load it into a destination. You will learn how to build a data pipeline that loads data from the [GitHub API](https://docs.github.com/en/) into a local DuckDB database.
+This tutorial demonstrates how to extract data from a REST API using dlt's REST API source and load it into a destination. You will learn how to build a data pipeline that loads data from the [Pokemon API](https://pokeapi.co/) and the [GitHub API](https://docs.github.com/en/) into a local DuckDB database.
+
+Extracting data from an API is straightforward with dlt: provide the base URL, define the resources you want to fetch, and dlt will handle the pagination, authentication, and data loading.
 
 ## What you will learn
 
@@ -233,23 +235,29 @@ pokemon_source = rest_api_source(
                     "limit": 1000,
                 },
             },
-            # For the `berry` and `location` resources, we set
-            # the write disposition to `replace`
+            # For the `berry` and `location` resources, we keep
+            # the `replace` write disposition
             "write_disposition": "replace",
         },
         "resources": [
+            # We create a specific configuration for the `pokemon` resource
+            # using a dictionary instead of a string to configure
+            # the primary key and write disposition
             {
                 "name": "pokemon",
-                "primary_key": "id", # Setting the primary key for the `pokemon` resource
+                "primary_key": "id",
                 "write_disposition": "merge",
             },
+            # The `berry` and `location` resources will use the default
             "berry",
             "location",
         ],
     }
 )
 ```
 
+Run the pipeline with `python rest_api_pipeline.py`. The data for the `pokemon` resource will be merged with the existing data in the destination table based on the `id` field.
+
## Loading data incrementally

When working with some APIs, you may need to load data incrementally to avoid fetching the entire dataset every time and to reduce the load time. APIs that support incremental loading usually provide a way to fetch only new or changed data (most often by using a timestamp field like `updated_at`, `created_at`, or incremental IDs).
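
To make the two ideas in this part of the tutorial concrete, the following is a minimal, library-independent sketch of merging on a primary key combined with cursor-based incremental selection. It is not dlt's implementation: the function names (`merge_by_key`, `select_new_rows`, `incremental_run`), the `state` dictionary, and the sample rows are hypothetical. It assumes timestamps are ISO 8601 strings in UTC with a uniform format, so they can be compared as plain strings. It also filters on the client side, whereas a real API would typically be asked for only the new records through a query parameter.

```python
def merge_by_key(existing, incoming, key="id"):
    """Upsert rows: a matching key replaces the old row, a new key is appended."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())


def select_new_rows(rows, cursor_field, last_value):
    """Keep only rows whose cursor field is strictly greater than the last seen value."""
    if last_value is None:
        # First run: nothing has been loaded yet, so take everything.
        return list(rows)
    return [row for row in rows if row[cursor_field] > last_value]


def incremental_run(table, api_rows, state, cursor_field="updated_at"):
    """Simulate one pipeline run: pick new rows, advance the cursor, merge into the table."""
    new_rows = select_new_rows(api_rows, cursor_field, state.get("last_value"))
    if new_rows:
        # Remember the highest cursor value so the next run can skip older rows.
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return merge_by_key(table, new_rows)


if __name__ == "__main__":
    state = {}  # hypothetical stand-in for persisted pipeline state
    first_batch = [
        {"id": 1, "updated_at": "2024-09-01T00:00:00Z", "name": "bulbasaur"},
        {"id": 2, "updated_at": "2024-09-02T00:00:00Z", "name": "ivysaur"},
    ]
    table = incremental_run([], first_batch, state)

    second_batch = first_batch + [
        {"id": 2, "updated_at": "2024-09-03T00:00:00Z", "name": "ivysaur-v2"},
        {"id": 3, "updated_at": "2024-09-04T00:00:00Z", "name": "venusaur"},
    ]
    table = incremental_run(table, second_batch, state)
    for row in table:
        print(row)
```

In the second run, the two rows from the first batch are skipped because their `updated_at` values are not newer than the stored cursor. The row with `id` 2 is updated in place and the row with `id` 3 is appended, which is the behaviour the `merge` write disposition with a primary key is meant to provide.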