
Commit

Docs: Tutorials formatting + from scratch connector tutorial cleanup (#33839)

Co-authored-by: Marcos Marx
natikgadzhi and marcosmarxm authored Mar 7, 2024
1 parent 4fcff41 commit e4ccffb
Showing 15 changed files with 1,042 additions and 521 deletions.
183 changes: 127 additions & 56 deletions docs/connector-development/tutorials/building-a-java-destination.md

Large diffs are not rendered by default.

248 changes: 178 additions & 70 deletions docs/connector-development/tutorials/building-a-python-source.md

Large diffs are not rendered by default.

81 changes: 57 additions & 24 deletions docs/connector-development/tutorials/cdk-speedrun.md
@@ -2,9 +2,11 @@

## CDK Speedrun \(HTTP API Source Creation Any Route\)

This is a blazing-fast guide to building an HTTP source connector. Think of it as the TL;DR version
of [this tutorial](cdk-tutorial-python-http/getting-started.md).

If you are a visual learner and want to see a video version of this guide going over each part in
detail, check it out below.

[A speedy CDK overview.](https://www.youtube.com/watch?v=kJ3hLoNfz_E)

@@ -19,9 +21,9 @@ If you are a visual learner and want to see a video version of this guide going

```bash
# # clone the repo if you haven't already
# git clone --depth 1 https://github.com/airbytehq/airbyte/
# cd airbyte # start from repo root
cd airbyte-integrations/connector-templates/generator
./generate.sh
```

@@ -40,7 +42,8 @@ poetry install
```bash
cd source_python_http_example
```

We're working with the PokeAPI, so we need to define our input schema to reflect that. Open the
`spec.yaml` file here and replace it with:

```yaml
documentationUrl: https://docs.airbyte.com/integrations/sources/pokeapi
@@ -61,9 +64,14 @@ connectionSpecification:
- snorlax
```
Our input schema has a single input, `pokemon_name`, and it is required. Input schemas normally
contain information such as API keys and client secrets that need to be passed down to all
endpoints or streams.
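The `connectionSpecification` in `spec.yaml` is a JSON Schema, so a config only passes if it supplies every key listed under `required`. The sketch below checks just that one rule. It runs against a trimmed-down, assumed copy of the spec, because the full YAML is collapsed in this diff. `missing_required_fields` is a hypothetical helper, not part of the CDK, and real validation covers the whole schema.

```python
from typing import Any, Dict, List, Mapping

# Assumed, trimmed-down version of the connectionSpecification from spec.yaml,
# expressed as a Python dict. Only the parts this sketch needs are included.
CONNECTION_SPECIFICATION: Dict[str, Any] = {
    "type": "object",
    "required": ["pokemon_name"],
    "properties": {
        "pokemon_name": {"type": "string", "examples": ["snorlax"]},
    },
}


def missing_required_fields(spec: Mapping[str, Any], config: Mapping[str, Any]) -> List[str]:
    """Return the required spec fields that the config does not supply (hypothetical helper)."""
    return [field for field in spec.get("required", []) if field not in config]


print(missing_required_fields(CONNECTION_SPECIFICATION, {}))  # ['pokemon_name']
print(missing_required_fields(CONNECTION_SPECIFICATION, {"pokemon_name": "snorlax"}))  # []
```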

OK, let's write a function that checks the inputs we just defined. Nuke the contents of `source.py`
and add this code to it. To save time later, we define all the imports we'll need up front. Also
note that your `AbstractSource` class name must be a camel-cased version of the name you gave in the
generation phase. In our case, this is `SourcePythonHttpExample`.

```python
from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple
@@ -94,7 +102,9 @@ class SourcePythonHttpExample(AbstractSource):
        return [Pokemon(pokemon_name=config["pokemon_name"])]
```
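The body of `check_connection` is collapsed in this diff. The standalone sketch below shows the idea, not the tutorial's exact code. A config is accepted only when its `pokemon_name` appears in a small allow-list, and the result uses the same `(success, error_message)` tuple shape that `check_connection` returns. `check_pokemon_config` is hypothetical, and the three-entry list is an assumption made only for illustration.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple

# Hypothetical allow-list used only for this sketch.
POKEMON_LIST = ["ditto", "luxray", "snorlax"]


def check_pokemon_config(
    config: Mapping[str, Any], allowed: Iterable[str] = POKEMON_LIST
) -> Tuple[bool, Optional[str]]:
    """Return (True, None) for a known Pokemon, otherwise (False, error_message)."""
    pokemon_name = config.get("pokemon_name")
    if pokemon_name not in set(allowed):
        return False, f"Input Pokemon {pokemon_name} is invalid. Please input a valid Pokemon."
    return True, None


print(check_pokemon_config({"pokemon_name": "snorlax"}))  # (True, None)
print(check_pokemon_config({"pokemon_name": "missingno"})[0])  # False
```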

Create a new file called `pokemon_list.py` at the same level. It holds the list we validate input
against, so that invalid Pokemon names are rejected. Let's start with a very limited list; any
Pokemon not included in it will be rejected.

```python
"""
```

@@ -133,7 +143,8 @@ Expected output:

### Define your Stream

In your `source.py` file, add this `Pokemon` class. This stream represents an endpoint you want to
hit, which, in our case, is the single [Pokemon endpoint](https://pokeapi.co/docs/v2#pokemon).

```python
class Pokemon(HttpStream):
@@ -151,7 +162,7 @@ class Pokemon(HttpStream):
        return None

    def path(
        self,
    ) -> str:
        return ""  # TODO
@@ -161,17 +172,25 @@ class Pokemon(HttpStream):
        return None  # TODO
```
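The `path()` stub above still returns an empty string. An `HttpStream` builds each request URL from its `url_base` plus the value `path()` returns, so for the single Pokemon endpoint the path would typically be `pokemon/<name>`. The sketch below shows that joining step in plain Python. It assumes the PokeAPI v2 base URL `https://pokeapi.co/api/v2/`. `build_request_url` is hypothetical and only approximates what the CDK does internally.

```python
from urllib.parse import urljoin

# Assumed base URL for PokeAPI v2. The trailing slash matters: without it, urljoin
# would replace the last segment ("v2") instead of appending to it.
URL_BASE = "https://pokeapi.co/api/v2/"


def build_request_url(url_base: str, pokemon_name: str) -> str:
    """Hypothetical helper: join a stream's base URL with its path() value."""
    path = f"pokemon/{pokemon_name}"  # what a filled-in path() might return
    return urljoin(url_base, path)


print(build_request_url(URL_BASE, "snorlax"))  # https://pokeapi.co/api/v2/pokemon/snorlax
```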

Now download [this file](./cdk-speedrun-assets/pokemon.json). Name it `pokemon.json` and place it in
`/source_python_http_example/schemas`.

This file defines the output schema of the stream, and every endpoint you implement needs one. This
is usually the most time-consuming part of the connector development process, because it requires
defining the endpoint's output exactly. It matters because Airbyte needs clear expectations for what
the stream will output. Note that the stream name must match between the JSON schema file and the
`HttpStream` class: `pokemon.json` and `Pokemon`, respectively, in this case. Learn more about schema
creation
[here](https://docs.airbyte.com/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema).
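By default, the Python CDK derives a stream's name from its class name in snake_case and loads the schema file with that name from `schemas/`. That is why `Pokemon` pairs with `pokemon.json`. The sketch below reproduces the naming rule with regular expressions so you can predict the file name for multi-word classes. It illustrates the convention and is not the CDK's own code. `schema_file_for` is a hypothetical helper.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case (e.g. PokemonSpecies -> pokemon_species)."""
    # Insert "_" before an uppercase letter that starts a lowercase run: "PokemonSpecies" -> "Pokemon_Species".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert "_" between a lowercase letter or digit and an uppercase letter, then lowercase everything.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def schema_file_for(stream_class_name: str) -> str:
    """Hypothetical helper: the schema path a stream class would be matched with."""
    return f"schemas/{camel_to_snake(stream_class_name)}.json"


print(schema_file_for("Pokemon"))  # schemas/pokemon.json
print(schema_file_for("PokemonSpecies"))  # schemas/pokemon_species.json
```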

Test your discover function. You should receive a fairly large JSON object in return.

```bash
poetry run source-python-http-example discover --config sample_files/config.json
```

Note that the `pokemon_name` config variable reaches the `Pokemon` stream because the `streams`
method passes it in when it creates the stream, and the stream stores it in its `__init__` function.
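The discover command prints Airbyte protocol messages as JSON lines, and the catalog arrives in a message whose `type` is `CATALOG`. To confirm from a script that the `pokemon` stream is present, you could use a small parser like the one below. `stream_names_from_discover` is a hypothetical helper. The abbreviated sample message is written by hand for illustration and is not copied from real output.

```python
import json
from typing import Iterable, List


def stream_names_from_discover(lines: Iterable[str]) -> List[str]:
    """Collect stream names from the CATALOG message in `discover` output (hypothetical helper)."""
    names: List[str] = []
    for line in lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        if isinstance(message, dict) and message.get("type") == "CATALOG":
            names.extend(stream["name"] for stream in message["catalog"]["streams"])
    return names


# Abbreviated, hand-written CATALOG message (the real json_schema is much larger).
sample = json.dumps(
    {
        "type": "CATALOG",
        "catalog": {
            "streams": [
                {"name": "pokemon", "json_schema": {}, "supported_sync_modes": ["full_refresh"]}
            ]
        },
    }
)
print(stream_names_from_discover([sample]))  # ['pokemon']
```

You could pass it the captured stdout of the discover command, split into lines.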

### Reading Data from the Source

@@ -220,7 +239,13 @@ class Pokemon(HttpStream):
```python
        return None
```

We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download
the catalog file [here](./cdk-speedrun-assets/configured_catalog_pokeapi.json) and save it as
`sample_files/configured_catalog.json`. This file tells Airbyte which streams (endpoints) the
connector supports and which sync modes Airbyte can run each of them in. Learn more about the
AirbyteCatalog
[here](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) and learn more
about sync modes [here](https://docs.airbyte.com/understanding-airbyte/connections#sync-modes).
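The downloaded file is the one to use. To show the rough shape of such a file, the sketch below builds a minimal configured catalog for the single `pokemon` stream. Each entry wraps a stream with the sync mode to run it in and a destination sync mode. It assumes full refresh with overwrite, and the empty `json_schema` is a placeholder where the real file carries the full schema.

```python
import json

# Minimal configured catalog for the single "pokemon" stream. Assumptions:
# full_refresh / overwrite sync modes and an empty placeholder json_schema.
configured_catalog = {
    "streams": [
        {
            "stream": {
                "name": "pokemon",
                "json_schema": {},
                "supported_sync_modes": ["full_refresh"],
            },
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
        }
    ]
}

print(json.dumps(configured_catalog, indent=2))
```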

Let's read some data.

@@ -230,24 +255,30 @@ poetry run source-python-http-example read --config sample_files/config.json --c

If all goes well, containerize it so you can use it in the UI:

**Option A: Building the Docker image with `airbyte-ci`**

This is the preferred method for building and testing connectors.

If you want to open-source your connector, we encourage you to use our
[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
tool to build it. `airbyte-ci` does not use a Dockerfile. Instead, it builds the connector image
from our
[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
using our internal build logic and your Python connector code.

Running `airbyte-ci connectors --name source-<source-name> build` will build your connector image.
Once the command is done, you will find your connector image in your local Docker host:
`airbyte/source-<source-name>:dev`.

**Option B: Building the Docker image with a Dockerfile**

If you don't want to rely on `airbyte-ci` to build your connector, you can build the Docker image
using your own Dockerfile. This method is not preferred and is not supported for certified
connectors.

Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
something like this:

```Dockerfile
FROM airbyte/python-connector-base:1.1.0
```

@@ -263,13 +294,15 @@ RUN pip install ./airbyte/integration_code

Use this only as an example; it is not optimized.

Build your image:

```bash
docker build . -t airbyte/source-example-python:dev
```
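Once the image is built, you can run the same connector commands inside the container by mounting your local files into it. The sketch below only assembles the `docker run` argument list in Python, and you can pass the list to `subprocess.run` to execute it. The image tag matches the build command above. Mounting `sample_files` at `/sample_files` is an assumption you can adapt. `docker_run_args` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

IMAGE = "airbyte/source-example-python:dev"  # tag from the docker build command above


def docker_run_args(command: str, config_dir: Path, image: str = IMAGE) -> List[str]:
    """Build argv for a connector command in Docker, with config_dir mounted at /sample_files."""
    args = [
        "docker", "run", "--rm",
        "-v", f"{config_dir.resolve()}:/sample_files",
        image,
        command,
    ]
    if command != "spec":
        # spec needs no config; check, discover and read all take --config.
        args += ["--config", "/sample_files/config.json"]
    if command == "read":
        args += ["--catalog", "/sample_files/configured_catalog.json"]
    return args


print(" ".join(docker_run_args("check", Path("sample_files"))))
# To execute: subprocess.run(docker_run_args("check", Path("sample_files")), check=True)
```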


You're done. Stop the clock :\)

## Further reading

If you have enjoyed the above example and would like to explore the Python CDK in even more detail,
you may be interested in
[how to build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api).
@@ -2,10 +2,18 @@

The second operation in the Airbyte Protocol that we'll implement is the `check` operation.

This operation verifies that the input configuration supplied by the user can be used to connect to
the underlying data source. Note that this user-supplied configuration has the values described in
the `spec.yaml` filled in. In other words, if the `spec.yaml` says that the source requires a
`username` and `password`, the config object might be
`{ "username": "airbyte", "password": "password123" }`. You should then implement something that
returns a JSON object reporting whether we were able to connect to the source with the credentials
in the config.

To make requests to the API, we need to supply our access key. For this tutorial, though, we keep
the check trivial: instead of calling the API, let's verify that the user-input `base` currency is
a legitimate currency. In `source.py` we'll find the following autogenerated source:

```python
class SourcePythonHttpTutorial(AbstractSource):
@@ -26,7 +34,8 @@ class SourcePythonHttpTutorial(AbstractSource):
...
```

Following the docstring instructions, we'll change the implementation to verify that the input
currency is a real currency:

```python
def check_connection(self, logger, config) -> Tuple[bool, any]:
@@ -38,9 +47,19 @@ Following the docstring instructions, we'll change the implementation to verify
return True, None
```
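Only the last line of the updated method is visible in this diff. The standalone sketch below shows the kind of logic the text describes: it validates `config["base"]` against a fixed set of currencies. The five currencies and the message wording follow the sample `check` output further down. `check_base_currency` is a hypothetical stand-in for the method body. The currency list in the message is sorted so the output is deterministic, which makes it differ slightly from the sample's set ordering.

```python
from typing import Any, Mapping, Optional, Tuple

# Currencies accepted by this sketch (the same five listed in the sample check output).
ACCEPTED_CURRENCIES = {"USD", "JPY", "BGN", "CZK", "DKK"}


def check_base_currency(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) for a known base currency, otherwise (False, error_message)."""
    input_currency = config.get("base")
    if input_currency not in ACCEPTED_CURRENCIES:
        return False, (
            f"Input currency {input_currency} is invalid. "
            f"Please input one of the following currencies: {sorted(ACCEPTED_CURRENCIES)}"
        )
    return True, None


print(check_base_currency({"base": "USD"}))  # (True, None)
print(check_base_currency({"base": "BTC"})[0])  # False
```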

:::info

In a real implementation, you should write code that connects to the API to validate connectivity,
not just validate inputs. For an example, see `check_connection` in the
[OneSignal source connector implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-onesignal/source_onesignal/source.py).

:::
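A connectivity check along those lines could make one lightweight request and treat anything other than HTTP 200 as a failure. The sketch below uses only the standard library. The endpoint URL and the `access_key`/`base` query parameters are assumptions, so replace them with whatever your API documents. `fetch` is injectable so that the logic can be exercised without network access. Both functions are hypothetical helpers, not CDK APIs.

```python
from typing import Any, Callable, Mapping, Optional, Tuple
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: substitute the real URL documented by your API.
ASSUMED_URL = "https://api.example.com/latest"


def fetch_status(url: str) -> int:
    """Issue a GET request and return its HTTP status code."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; report the code instead.
        return error.code


def check_connectivity(
    config: Mapping[str, Any],
    fetch: Callable[[str], int] = fetch_status,
    url: str = ASSUMED_URL,
) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the API answers with HTTP 200, else (False, reason)."""
    # Assumed query parameters; adjust to the API's authentication scheme.
    query = urlencode({"access_key": config["access_key"], "base": config["base"]})
    try:
        status = fetch(f"{url}?{query}")
    except OSError as error:
        # URLError and socket timeouts are both OSError subclasses (DNS, refused, timeout).
        return False, f"Could not reach the API: {error}"
    if status != 200:
        return False, f"API responded with HTTP {status}; check your access key and base currency."
    return True, None
```

For example, `check_connectivity({"access_key": "<your key>", "base": "USD"}, fetch=lambda url: 200)` returns `(True, None)` without touching the network.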

Let's test this implementation by creating two configs, one valid and one invalid, and passing each
of them to the connector. For this section, you will need to take the API access key generated
earlier and add it to both configs. Because these configs contain secrets, we recommend storing them
in the `secrets` directory (for example, `secrets/config.json`), which is gitignored by default.

```bash
mkdir sample_files
@@ -60,4 +79,5 @@ You should see output like the following:
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid. Please input one of the following currencies: {'DKK', 'USD', 'CZK', 'BGN', 'JPY'}"}}
```
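To check the result from a script instead of by eye, you can scan the connector's output for the `CONNECTION_STATUS` message. The hypothetical helper below returns the status and message from the first such line, or `None` if the output contains no such line.

```python
import json
from typing import Iterable, Optional, Tuple


def connection_status(lines: Iterable[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (status, message) from the first CONNECTION_STATUS message, or None if absent."""
    for line in lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if isinstance(message, dict) and message.get("type") == "CONNECTION_STATUS":
            status = message["connectionStatus"]
            return status["status"], status.get("message")
    return None
```

You might feed it `subprocess.run(cmd, capture_output=True, text=True).stdout.splitlines()`, where `cmd` is the check command you ran above.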

While developing, we recommend storing configs which contain secrets in `secrets/config.json`
because the `secrets` directory is gitignored by default.
@@ -8,9 +8,15 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
```bash
$ ./generate.sh
```

This will bring up an interactive helper application. Use the arrow keys to pick a template from the
list. Select the `Python HTTP API Source` template and then input the name of your connector. The
application will create a new directory in `airbyte/airbyte-integrations/connectors/` with the name
of your new connector.

For this walk-through we will refer to our source as `python-http-example`. The finalized source
code for this tutorial can be found
[here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial).
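The generator derives several names from the one you type in. The names used in these tutorials are the `source_python_http_example` package, the `source-python-http-example` command, and the `SourcePythonHttpExample` class. Based on those, the mapping appears to follow the pattern in the hypothetical sketch below. It reproduces the pattern and is not the generator's own code. The `source-<name>` directory name is an assumption consistent with the command name.

```python
from typing import Dict


def connector_names(name: str) -> Dict[str, str]:
    """Derive directory, package, and class names from a connector name like "python-http-example"."""
    directory = f"source-{name}"  # also matches the console command name
    return {
        "directory": directory,
        "package": directory.replace("-", "_"),
        "class": "".join(part.capitalize() for part in directory.split("-")),
    }


print(connector_names("python-http-example"))
# {'directory': 'source-python-http-example', 'package': 'source_python_http_example', 'class': 'SourcePythonHttpExample'}
```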

The source we will build in this tutorial will pull data from the
[Rates API](https://exchangeratesapi.io/), a free and open API which documents historical exchange
rates for fiat currencies.