Docs: update the introduction, add the rest_api tutorial #1729

Merged: 35 commits, Sep 14, 2024

Changes from 1 commit.

Commits (35, all by burnash):
- c5d88e6 Add intro, and rest api tutorial (Aug 22, 2024)
- 8fb941d Update docs/website/docs/tutorial/rest-api.md (Sep 9, 2024)
- 972d31d Update docs/website/docs/tutorial/rest-api.md (Sep 9, 2024)
- e7c82ad Update docs/website/docs/intro.md (Sep 10, 2024)
- a9d3c83 Fix a broken link (Sep 10, 2024)
- 9c60f9f Remove the reference to daily updates. (Sep 10, 2024)
- 4e884ad Update docs/website/docs/tutorial/rest-api.md (Sep 10, 2024)
- dd488e0 Update docs/website/docs/tutorial/rest-api.md (Sep 10, 2024)
- ff43a8a Update docs/website/docs/tutorial/rest-api.md (Sep 11, 2024)
- b43cfe9 Rework the "merging" section (Sep 11, 2024)
- d5305b2 Restructure intro, sidebar and dlt tutorial (Sep 12, 2024)
- 2393f9d Rework why dlt and getting started sections (Sep 12, 2024)
- 990a546 Bring back google colab link (Sep 12, 2024)
- b25fc09 Merge branch 'devel' into enh/docs/introduction-rest-sql-file (Sep 12, 2024)
- 8f6a7db Add a missing comma (Sep 12, 2024)
- 3ba59fd Fix the docs path (Sep 12, 2024)
- d1cc062 Fix more links (Sep 12, 2024)
- fb26b98 Update docs/website/docs/intro.md (Sep 12, 2024)
- 838e29b Fix links (Sep 12, 2024)
- 65dc303 Update docs/website/docs/intro.md (Sep 12, 2024)
- ed6153f Update the custom pipeline tutorial (Sep 12, 2024)
- 1601437 Merge branch 'devel' into enh/docs/introduction-rest-sql-file (Sep 13, 2024)
- 71b3e11 Fix a link (Sep 13, 2024)
- abcc346 Rename sql database page url, add links (Sep 13, 2024)
- 0bf4f76 Incorporate the groupping resources page into the python ds tutorial (Sep 13, 2024)
- 99ca612 Remove legacy tutorial intro and incorporate it to the ds tutorial (Sep 13, 2024)
- 33dd9ca Elaborate on pds tutorual (Sep 13, 2024)
- 7444c7d Remove absolute links and hanging whitespace (Sep 13, 2024)
- c9378ce Replace the screenshot (Sep 13, 2024)
- fca18de Add description and fix text style (Sep 13, 2024)
- 069e4de Enable module imports in the intro snippets (Sep 13, 2024)
- b7fa8ca Format the snippet code (Sep 13, 2024)
- fc75ccb Add active version class to document body (Sep 13, 2024)
- 21c6bbc Revert "Add active version class to document body" (Sep 13, 2024)
- 3c162ab Add "stable" icons (Sep 14, 2024)

Add intro, and rest api tutorial
burnash committed Sep 11, 2024

Verified: This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit c5d88e65b081bb564be3b952f54c52cced920bad
158 changes: 96 additions & 62 deletions docs/website/docs/intro.md
@@ -10,112 +10,146 @@ import snippets from '!!raw-loader!./intro-snippets.py';

![dlt pacman](/img/dlt-pacman.gif)

## What is `dlt`?
## What is dlt?

dlt is a Python library that simplifies how you move data between various sources and destinations. It offers a lightweight interface for extracting data from [REST APIs](./tutorial/rest-api), [SQL databases](./tutorial/sql-database), [cloud storages](./tutorial/filesystem), [Python data structures](getting-started), and more.

dlt is designed to be easy to use, flexible, and scalable:

- dlt infers [schemas](./general-usage/schema) and [data types](./general-usage/schema/#data-types), [normalizes the data](./general-usage/schema/#data-normalizer), and handles nested data structures.
- dlt supports a variety of [popular destinations](./dlt-ecosystem/destinations/) and has an interface to add [custom destinations](./dlt-ecosystem/destinations/destination) to create reverse ETL pipelines.
- Use dlt locally or [in the cloud](./walkthroughs/deploy-a-pipeline) to build data pipelines, data lakes, and data warehouses.

To get started with dlt, install the library using pip:

`dlt` is an open-source library that you can add to your Python scripts to load data
from various and often messy data sources into well-structured, live datasets. To get started, install it with:
```sh
pip install dlt
```
:::tip
We recommend using a clean virtual environment for your experiments! Here are [detailed instructions](/reference/installation).
We recommend using a clean virtual environment for your experiments! Here are [detailed instructions](/reference/installation) on how to set one up.
:::

Unlike other solutions, with dlt, there's no need to use any backends or containers. Simply import `dlt` in a Python file or a Jupyter Notebook cell, and create a pipeline to load data into any of the [supported destinations](dlt-ecosystem/destinations/). You can load data from any source that produces Python data structures, including APIs, files, databases, and more. `dlt` also supports building a [custom destination](dlt-ecosystem/destinations/destination.md), which you can use as reverse ETL.

The library will create or update tables, infer data types, and handle nested data automatically. Here are a few example pipelines:
## Load data with dlt from …

<Tabs
groupId="source-type"
defaultValue="api"
defaultValue="rest-api"
values={[
{"label": "Data from an API", "value": "api"},
{"label": "Data from a dlt Source", "value": "source"},
{"label": "Data from CSV/XLS/Pandas", "value": "csv"},
{"label": "Data from a Database", "value":"database"}
{"label": "REST APIs", "value": "rest-api"},
{"label": "SQL databases", "value": "sql-database"},
{"label": "Cloud storages or files", "value": "filesystem"},
{"label": "Python data structures", "value": "python-data"},
]}>
<TabItem value="api">
<TabItem value="rest-api">

:::tip
Looking to use a REST API as a source? Explore our new [REST API generic source](dlt-ecosystem/verified-sources/rest_api) for a declarative way to load data.
:::
Use dlt's [REST API source](tutorial/rest-api) to extract data from any REST API. Define the API endpoints you’d like to fetch data from, the pagination method, and authentication, and dlt will handle the rest:

<!--@@@DLT_SNIPPET api-->
```py
# from dlt.sources import rest_api

source = rest_api({
"client": {
"base_url": "https://api.example.com/",
"auth": {
"token": dlt.secrets["your_api_token"],
},
"paginator": {
"type": "json_response",
"next_url_path": "paging.next",
},
},
"resources": [
"posts",
"comments"
]
})

pipeline = dlt.pipeline(
pipeline_name="rest_api_example",
destination="duckdb",
dataset_name="rest_api_data",
)

Copy this example to a file or a Jupyter Notebook and run it. To make it work with the DuckDB destination, you'll need to install the **duckdb** dependency (the default `dlt` installation is really minimal):
```sh
pip install "dlt[duckdb]"
load_info = pipeline.run(source)
```
Now **run** your Python file or Notebook cell.

How does it work? The library extracts data from a [source](general-usage/glossary.md#source) (here: the **chess.com REST API**), inspects its structure to create a
[schema](general-usage/glossary.md#schema), structures, normalizes, and verifies the data, and then
loads it into a [destination](general-usage/glossary.md#destination) (here: **duckdb**, into the database schema **player_data** and the table **player**).
Follow the [REST API source tutorial](tutorial/rest-api) to learn more about the source configuration and pagination methods.
</TabItem>
<TabItem value="sql-database">

Use the [SQL source](tutorial/sql-database) to extract data from databases like PostgreSQL, MySQL, SQLite, Oracle, and more.

</TabItem>
```py
# from dlt.sources.sql import sql_database

<TabItem value="source">
source = sql_database(
"mysql+pymysql://[email protected]:4497/Rfam"
)

Initialize the [Slack source](dlt-ecosystem/verified-sources/slack) with the `dlt init` command:
pipeline = dlt.pipeline(
pipeline_name="sql_database_example",
destination="duckdb",
dataset_name="sql_data",
)

```sh
dlt init slack duckdb
load_info = pipeline.run(source)
```

Create and run a pipeline:
Follow the [SQL source tutorial](tutorial/sql-database) to learn more about the source configuration and supported databases.

</TabItem>
<TabItem value="filesystem">

The [filesystem](./tutorial/filesystem) source extracts data from AWS S3, Google Cloud Storage, Google Drive, Azure, or a local file system.

```py
import dlt
# from dlt.sources.filesystem import filesystem

from slack import slack_source
source = filesystem(
bucket_url="s3://example-bucket",
file_glob="*.csv"
)

pipeline = dlt.pipeline(
pipeline_name="slack",
pipeline_name="filesystem_example",
destination="duckdb",
dataset_name="slack_data"
)

source = slack_source(
start_date=datetime(2023, 9, 1),
end_date=datetime(2023, 9, 8),
page_size=100,
dataset_name="filesystem_data",
)

load_info = pipeline.run(source)
print(load_info)
```

</TabItem>
<TabItem value="csv">

Pass anything that you can load with Pandas to `dlt`

<!--@@@DLT_SNIPPET csv-->

Follow the [filesystem source tutorial](./tutorial/filesystem) to learn more about the source configuration and supported storage services.

</TabItem>
<TabItem value="database">
<TabItem value="python-data">

:::tip
Use our verified [SQL database source](dlt-ecosystem/verified-sources/sql_database)
to sync your databases with warehouses, data lakes, or vector stores.
:::
dlt can load data from Python generators or directly from Python data structures:

<!--@@@DLT_SNIPPET db-->
```py
import dlt

@dlt.resource
def foo():
for i in range(10):
yield {"id": i, "name": f"This is item {i}"}

Install **pymysql** driver:
```sh
pip install sqlalchemy pymysql
pipeline = dlt.pipeline(
pipeline_name="python_data_example",
destination="duckdb",
)

load_info = pipeline.run(foo)
```

Check out the [getting started guide](getting-started) to learn more about working with Python data.

</TabItem>

</Tabs>


## Why use `dlt`?
## Why use dlt?

- Automated maintenance - with schema inference, schema evolution, and alerts, and with short declarative
code, maintenance becomes simple.
@@ -124,18 +158,18 @@ external APIs, backends, or containers, scales on micro and large infra alike.
- User-friendly, declarative interface that removes knowledge obstacles for beginners
while empowering senior professionals.

## Getting started with `dlt`
1. Dive into our [Getting started guide](getting-started.md) for a quick intro to the essentials of `dlt`.
## Getting started with dlt
Reviewer comment (Contributor): Just a note to revisit these steps after we finalise the structure of the docs

1. Dive into our [Getting started guide](getting-started.md) for a quick intro to the essentials of dlt.
2. Play with the
[Google Colab demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing).
This is the simplest way to see `dlt` in action.
This is the simplest way to see dlt in action.
3. Read the [Tutorial](tutorial/intro) to learn how to build a pipeline that loads data from an API.
4. Check out the [How-to guides](walkthroughs/) for recipes on common use cases for creating, running, and deploying pipelines.
5. Ask us on
[Slack](https://dlthub.com/community)
if you have any questions about use cases or the library.

## Join the `dlt` community
## Join the dlt community

1. Give the library a ⭐ and check out the code on [GitHub](https://github.com/dlt-hub/dlt).
1. Ask questions and share how you use the library on
27 changes: 27 additions & 0 deletions docs/website/docs/tutorial/filesystem.md
@@ -0,0 +1,27 @@
---
title: Load data from Filesystem or Cloud Storage
description: How to extract and load data from a filesystem or cloud storage using dlt
keywords: [tutorial, filesystem, cloud storage, dlt, python, data pipeline, incremental loading]
---

## What you will learn

- How to set up a filesystem or cloud storage source
- Configuration basics for filesystems and cloud storage
- Loading methods
- Incremental loading of data from filesystems or cloud storage

## Prerequisites

- Python 3.9 or higher installed
- Virtual environment set up

## Installing dlt

## Setting up a new project
## Installing dependencies
## Running the pipeline
## Configuring filesystem source
## Appending, replacing, and merging loaded data
## Loading data incrementally
## What's next?
2 changes: 1 addition & 1 deletion docs/website/docs/tutorial/intro.md
@@ -1,5 +1,5 @@
---
title: Tutorial
title: Tutorials
description: Build a data pipeline with dlt
keywords: [tutorial, api, github, duckdb, pipeline]
---
2 changes: 1 addition & 1 deletion docs/website/docs/tutorial/load-data-from-an-api.md
@@ -1,5 +1,5 @@
---
title: Load data from an API
title: "Building a custom dlt pipeline"
description: quick start with dlt
keywords: [getting started, quick start, basic examples]
---
320 changes: 320 additions & 0 deletions docs/website/docs/tutorial/rest-api.md
@@ -0,0 +1,320 @@
---
title: Load data from a REST API
description: How to extract data from a REST API using dlt's generic REST API source
keywords: [tutorial, api, github, duckdb, rest api, source, pagination, authentication]
---

This tutorial shows how to extract data from a REST API using dlt's generic REST API source. The tutorial will guide you through the basics of setting up and configuring the source to load data from the API into a destination.

As a practical example, we'll build a data pipeline that loads data from the [Pokemon](https://pokeapi.co/) and [GitHub](https://docs.github.com/en/rest) APIs into a [DuckDB](https://duckdb.org) database.

## What you will learn

- How to set up a REST API source
- Configuration basics for API endpoints
- Handling pagination and authentication
- Configuring the destination database
- Relationships between different resources
- How to append, replace, and merge data in the destination
- Loading data incrementally by fetching only new or updated data

## Prerequisites

- Python 3.9 or higher installed
- Virtual environment set up

## Installing dlt
Reviewer comment (Contributor): I think I would just put this into the prerequisites category vs having its own section.


Before we start, make sure you have a Python virtual environment set up. Follow the instructions in the [installation guide](../reference/installation) to create a new virtual environment and install dlt.

Verify that dlt is installed by running the following command in your terminal:

```sh
dlt --version
```

If you see the version number (such as "dlt 0.5.3"), you're ready to proceed.

## Setting up a new project

Initialize a new dlt project with REST API source and DuckDB destination:

```sh
dlt init rest_api duckdb
```

`dlt init` creates multiple files and a directory for your project. Let's take a look at the project structure:

```sh
rest_api_pipeline.py
requirements.txt
.dlt/
config.toml
secrets.toml
```

Here's what each file and directory contains:

- `rest_api_pipeline.py`: This is the main script where you'll define your data pipeline. It contains two basic pipeline examples for Pokemon and GitHub APIs. You can modify or rename this file as needed.
- `requirements.txt`: This file lists all the Python dependencies required for your project.
- `.dlt/`: This directory contains the [configuration files](../general-usage/credentials/) for your project:
  - `secrets.toml`: This file stores your API keys, tokens, and other sensitive information (see the short sketch after this list for how these values are read in code).
- `config.toml`: This file contains the configuration settings for your dlt project.
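
For a quick, hedged illustration of how `secrets.toml` is used: assuming you add a top-level key named `your_api_token` to `.dlt/secrets.toml`, the pipeline script can read it through `dlt.secrets` (the key name here is just an example):

```py
# Assumed contents of .dlt/secrets.toml (illustrative key name and value):
#
#   your_api_token = "your-secret-value"
#
# The value can then be read in rest_api_pipeline.py via dlt.secrets:
import dlt

api_token = dlt.secrets["your_api_token"]
```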

## Installing dependencies

Before we proceed, let's install the required dependencies for this tutorial. Run the following command to install the dependencies listed in the `requirements.txt` file:

```sh
pip install -r requirements.txt
```

## Running the pipeline

Let's verify that the pipeline is working as expected. Run the following command to execute the pipeline:

```sh
python rest_api_pipeline.py
```

You should see the output of the pipeline execution in the terminal. The output will also display the location of the DuckDB database file where the data is stored:

```sh
Pipeline rest_api_pokemon load step completed in 1.08 seconds
1 load package(s) were loaded to destination duckdb and into dataset rest_api_data
The duckdb destination used duckdb:////home/user-name/quick_start/rest_api_pokemon.duckdb location to store data
Load package 1692364844.9254808 is LOADED and contains no failed jobs
```

Reviewer comment (Contributor): As mentioned above this should be the Github API throughout the tutorial

## Exploring the data

Now that the pipeline has run successfully, let's explore the data loaded into DuckDB. dlt comes with a built-in browser application that allows you to interact with the data. To enable it, run the following command:

```sh
pip install streamlit
```

Next, run the following command to start the data browser:

```sh
dlt pipeline rest_api_pokemon show
```

The command opens a new browser window with the data browser application. `rest_api_pokemon` is the name of the pipeline defined in the `rest_api_pipeline.py` file.
You can explore the loaded data, run queries and see some pipeline execution details:

![Streamlit Explore data](/img/streamlit-new.png)

**>TODO: Update the image with a Pokemon API example screenshot<**
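
If you prefer to inspect the data without the Streamlit app, you can also query the DuckDB database file directly from Python. A minimal sketch, assuming the file name and dataset name from the pipeline output above (`rest_api_pokemon.duckdb` and `rest_api_data`):

```py
import duckdb

# Connect to the database file created by the pipeline. The path is an
# assumption; use the location printed in your own pipeline output.
conn = duckdb.connect("rest_api_pokemon.duckdb")

# Count the rows loaded into the `pokemon` table of the `rest_api_data` dataset.
print(conn.sql("SELECT count(*) FROM rest_api_data.pokemon").fetchone())
```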

## Configuring the REST API source

Now that your environment and the project are set up, let's take a closer look at the configuration of the REST API source. Open the `rest_api_pipeline.py` file in your code editor and locate the following code snippet:

```py
def load_pokemon() -> None:
pipeline = dlt.pipeline(
pipeline_name="rest_api_pokemon",
destination="duckdb",
dataset_name="rest_api_data",
)

pokemon_source = rest_api_source(
{
"client": {
"base_url": "https://pokeapi.co/api/v2/"
},
"resource_defaults": {
"endpoint": {
"params": {
"limit": 1000,
},
},
},
"resources": [
"pokemon",
"berry",
"location",
],
}
)

...

load_info = pipeline.run(pokemon_source)
print(load_info)
```

Here's what's happening in the code:

1. With `dlt.pipeline()` we define a new pipeline named `rest_api_pokemon` with DuckDB as the destination and `rest_api_data` as the dataset name.
2. The `rest_api_source()` function creates a new REST API source object.
3. We pass this source object to the `pipeline.run()` method to start the pipeline execution. Inside the `run()` method, dlt will fetch data from the API and load it into the DuckDB database.
4. The `print(load_info)` outputs the pipeline execution details to the console.

Let's break down the configuration of the REST API source. It consists of three main parts: `client`, `resource_defaults`, and `resources`.

```py
config: RESTAPIConfig = {
"client": {
...
},
"resource_defaults": {
...
},
"resources": [
...
],
}
```

- The `client` configuration is used to connect to the web server and authenticate if necessary. For our simple example, we only need to specify the `base_url` of the API: `https://pokeapi.co/api/v2/`.
- The `resource_defaults` configuration allows you to set default parameters for all resources. Normally you would set common parameters here, such as pagination limits. In our Pokemon API example, we set the `limit` parameter to 1000 for all resources to retrieve more data in a single request and reduce the number of HTTP API calls.
- The `resources` list contains the names of the resources you want to load from the API. The REST API source uses naming conventions to determine the endpoint URL based on the resource name. For example, the resource name `pokemon` is translated to the endpoint URL `https://pokeapi.co/api/v2/pokemon`. The sketch below shows the equivalent explicit form.
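
To make the naming convention concrete: a plain string in `resources` is shorthand for a resource object with an explicit `endpoint.path` (the explicit form mirrors the GitHub example later in this tutorial; the two definitions below describe the same endpoint):

```py
# Shorthand form: the endpoint path defaults to the resource name.
resources_shorthand = ["pokemon"]

# Equivalent explicit form: the path is appended to the client's base_url.
resources_explicit = [
    {
        "name": "pokemon",
        "endpoint": {
            "path": "pokemon",
        },
    },
]
```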

:::note
### Pagination
You may have noticed that we didn't specify any pagination configuration in the `rest_api_source()` function. That's because for REST APIs that follow best practices, dlt can automatically detect and handle pagination. Read more about [configuring pagination](../dlt-ecosystem/verified-sources/rest_api.md#pagination) in the REST API source documentation, or see the sketch right after this note for an explicit paginator configuration.
:::
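
If automatic detection doesn't work for a particular API, you can configure the paginator explicitly in the `client` section. A hedged sketch, reusing the paginator style shown in the intro; the paginator type and the `paging.next` path are examples, and the right values depend on how your API exposes the next-page link:

```py
# Illustrative client configuration with an explicit paginator.
client_config = {
    "base_url": "https://api.example.com/",
    "paginator": {
        "type": "json_response",         # paginator type (see the REST API source docs)
        "next_url_path": "paging.next",  # JSON path to the next-page URL in the response
    },
}
```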

## Appending, replacing, and merging loaded data

Try running the pipeline again with `python rest_api_pipeline.py`. You will notice that
the data in all the tables is duplicated. This happens because, by default, dlt appends data to the destination table. This is useful, for example, when you ingest daily data updates. In dlt you can control how data is loaded into the destination table by setting the `write_disposition` parameter in the resource configuration. The possible values are listed below, followed by a short generic sketch:
- `append`: Appends the data to the destination table. This is the default.
- `replace`: Replaces the data in the destination table with the new data.
- `merge`: Merges the new data with the existing data in the destination table based on the primary key.
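
The sections below set `write_disposition` inside the REST API source configuration. For reference, here is a minimal, generic sketch of the same parameter on a plain dlt resource (the resource and field names are made up for illustration):

```py
import dlt

@dlt.resource(write_disposition="merge", primary_key="id")
def users():
    yield {"id": 1, "name": "Alice"}
    yield {"id": 2, "name": "Bob"}

pipeline = dlt.pipeline(pipeline_name="write_disposition_demo", destination="duckdb")

pipeline.run(users)  # first run inserts the rows
pipeline.run(users)  # second run merges on "id", so the rows are not duplicated
```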

### Replacing the data

In our case, we don't want to append the data every time we run the pipeline. Let's start with the simpler `replace` write disposition.

To change the write disposition to `replace`, update the `resource_defaults` configuration in the `rest_api_pipeline.py` file:

```py
...
pokemon_source = rest_api_source(
{
"client": {
"base_url": "https://pokeapi.co/api/v2/",
},
"resource_defaults": {
"endpoint": {
"params": {
"limit": 1000,
},
},
"write_disposition": "replace", # Setting the write disposition to `replace`
},
"resources": [
"pokemon",
"berry",
"location",
],
}
)
...
```

Run the pipeline again with `python rest_api_pipeline.py`. This time, the data will be replaced in the destination table instead of being appended.

### Merging the data

When you want to update the existing data as new data is loaded, you can use the `merge` write disposition. This requires specifying a primary key for the resource. The primary key is used to match the new data with the existing data in the destination table.

Let's update our example to use the `merge` write disposition. We need to specify the primary key for the `pokemon` resource and set the write disposition to `merge`:

```py
...
pokemon_source = rest_api_source(
{
"client": {
"base_url": "https://pokeapi.co/api/v2/",
},
"resource_defaults": {
"endpoint": {
"params": {
"limit": 1000,
},
},
# For the `berry` and `location` resources, we set
# the write disposition to `replace`
"write_disposition": "replace",
},
"resources": [
{
"name": "pokemon",
"primary_key": "id", # Setting the primary key for the `pokemon` resource
"write_disposition": "merge",
},
"berry",
"location",
],
}
)
```

## Loading data incrementally

When working with some APIs, you may need to load data incrementally to avoid fetching the entire dataset every time and to reduce the load time. APIs that support incremental loading usually provide a way to fetch only new or changed data (most often by using a timestamp field like `updated_at`, `created_at`, or incremental IDs).

To illustrate incremental loading, let's consider the GitHub API. In the `rest_api_pipeline.py` file, you can find an example of how to load data from the GitHub API incrementally. Let's take a look at the configuration:

```py
pipeline = dlt.pipeline(
pipeline_name="rest_api_github",
destination="duckdb",
dataset_name="rest_api_data",
)

github_source = rest_api_source({
"client": {
"base_url": "https://api.github.com/repos/dlt-hub/dlt/",
},
"resource_defaults": {
"primary_key": "id",
"write_disposition": "merge",
"endpoint": {
"params": {
"per_page": 100,
},
},
},
"resources": [
{
"name": "issues",
"endpoint": {
"path": "issues",
"params": {
"sort": "updated",
"direction": "desc",
"state": "open",
"since": {
"type": "incremental",
"cursor_path": "updated_at",
"initial_value": "2024-01-25T11:21:28Z",
},
},
},
},
],
})

load_info = pipeline.run(github_source)
print(load_info)
```

In this configuration, the `since` parameter is defined as a special incremental parameter. The `cursor_path` field specifies the JSON path to the field that is used to track updates, and `initial_value` sets the starting value for the incremental parameter. This value is used in the first request to fetch the data.

When the pipeline runs, dlt will automatically update the `since` parameter with the latest value from the response data. This way, you can fetch only the new or updated data from the API.

Read more about [incremental loading](../dlt-ecosystem/verified-sources/rest_api.md#incremental-loading) in the REST API source documentation.

## What's next?

Congratulations on completing the tutorial! You've learned how to set up a REST API source in dlt and run a data pipeline to load the data into DuckDB.

Interested in learning more about dlt? Here are some suggestions:

- Learn more about the REST API source configuration in the [REST API source documentation](../dlt-ecosystem/verified-sources/rest_api.md)
- Learn how to [create a custom source](./load-data-from-an-api.md) in the advanced tutorial
27 changes: 27 additions & 0 deletions docs/website/docs/tutorial/sql-database.md
@@ -0,0 +1,27 @@
---
title: Load data from a SQL database
description: How to extract and load data from a SQL database using dlt
keywords: [tutorial, sql database, dlt, python, data pipeline, incremental loading]
---

## What you will learn

- How to set up a SQL database source
- Configuration basics for SQL databases
- Loading methods
- Incremental loading of data from SQL databases

## Prerequisites

- Python 3.9 or higher installed
- Virtual environment set up

## Installing dlt

## Setting up a new project
## Installing dependencies
## Running the pipeline
## Configuring the SQL database source
## Appending, replacing, and merging loaded data
## Loading data incrementally
## What's next?
5 changes: 4 additions & 1 deletion docs/website/sidebars.js
@@ -32,12 +32,15 @@ const sidebars = {
'getting-started',
{
type: 'category',
label: 'Tutorial',
label: 'Tutorials',
link: {
type: 'doc',
id: 'tutorial/intro',
},
items: [
'tutorial/rest-api',
'tutorial/sql-database',
'tutorial/filesystem',
'tutorial/load-data-from-an-api',
'tutorial/grouping-resources',
]