diff --git a/docs/connector-development/tutorials/building-a-java-destination.md b/docs/connector-development/tutorials/building-a-java-destination.md
index fb91b4f52c1f..d2cb3f1c0bcc 100644
--- a/docs/connector-development/tutorials/building-a-java-destination.md
+++ b/docs/connector-development/tutorials/building-a-java-destination.md
@@ -2,24 +2,26 @@

## Summary

-This article provides a checklist for how to create a Java destination. Each step in the checklist has a link to a more detailed explanation below.
+This article provides a checklist for how to create a Java destination. Each step in the checklist
+has a link to a more detailed explanation below.

## Requirements

-Docker and Java with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md).
+Docker and Java with the versions listed in the
+[tech stack section](../../understanding-airbyte/tech-stack.md).

## Checklist

### Creating a destination

-* Step 1: Create the destination using the template generator
-* Step 2: Build the newly generated destination
-* Step 3: Implement `spec` to define the configuration required to run the connector
-* Step 4: Implement `check` to provide a way to validate configurations provided to the connector
-* Step 5: Implement `write` to write data to the destination
-* Step 6: Set up Acceptance Tests
-* Step 7: Write unit tests or integration tests
-* Step 8: Update the docs \(in `docs/integrations/destinations/<destination-name>.md`\)
+- Step 1: Create the destination using the template generator
+- Step 2: Build the newly generated destination
+- Step 3: Implement `spec` to define the configuration required to run the connector
+- Step 4: Implement `check` to provide a way to validate configurations provided to the connector
+- Step 5: Implement `write` to write data to the destination
+- Step 6: Set up Acceptance Tests
+- Step 7: Write unit tests or integration tests
+- Step 8: Update the docs \(in `docs/integrations/destinations/<destination-name>.md`\)

:::info

@@ -29,7 +31,8 @@ All `./gradlew` commands must be run from the root of the airbyte project.

:::info

-If you need help with any step of the process, feel free to submit a PR with your progress and any questions you have, or ask us on [slack](https://slack.airbyte.io).
+If you need help with any step of the process, feel free to submit a PR with your progress and any
+questions you have, or ask us on [slack](https://slack.airbyte.io).

:::

@@ -44,7 +47,9 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
$ ./generate.sh
```

-Select the `Java Destination` template and then input the name of your connector. We'll refer to the destination as `<name>-destination` in this tutorial, but you should replace `<name>` with the actual name you used for your connector e.g: `BigQueryDestination` or `bigquery-destination`.
+Select the `Java Destination` template and then input the name of your connector. We'll refer to the
+destination as `<name>-destination` in this tutorial, but you should replace `<name>` with the
+actual name you used for your connector, e.g. `BigQueryDestination` or `bigquery-destination`.

### Step 2: Build the newly generated destination

@@ -55,11 +60,14 @@ You can build the destination by running:

```text
./gradlew :airbyte-integrations:connectors:destination-<name>:build
```

-This compiles the Java code for your destination and builds a Docker image with the connector. At this point, we haven't implemented anything of value yet, but once we do, you'll use this command to compile your code and Docker image.
+This compiles the Java code for your destination and builds a Docker image with the connector. At
+this point, we haven't implemented anything of value yet, but once we do, you'll use this command to
+compile your code and Docker image.

:::info

-Airbyte uses Gradle to manage Java dependencies. To add dependencies for your connector, manage them in the `build.gradle` file inside your connector's directory.
+Airbyte uses Gradle to manage Java dependencies. To add dependencies for your connector, manage them
+in the `build.gradle` file inside your connector's directory.

:::

@@ -67,38 +75,52 @@ Airbyte uses Gradle to manage Java dependencies. To add dependencies for your co

We recommend the following ways of iterating on your connector as you're making changes:

-* Test-driven development \(TDD\) in Java
-* Test-driven development \(TDD\) using Airbyte's Acceptance Tests
-* Directly running the docker image
+- Test-driven development \(TDD\) in Java
+- Test-driven development \(TDD\) using Airbyte's Acceptance Tests
+- Directly running the docker image

#### Test-driven development in Java

-This should feel like a standard flow for a Java developer: you make some code changes then run java tests against them. You can do this directly in your IDE, but you can also run all unit tests via Gradle by running the command to build the connector:
+This should feel like a standard flow for a Java developer: you make some code changes, then run
+Java tests against them. You can do this directly in your IDE, but you can also run all unit tests
+via Gradle by running the command to build the connector:

```text
./gradlew :airbyte-integrations:connectors:destination-<name>:build
```

-This will build the code and run any unit tests. This approach is great when you are testing local behaviors and writing unit tests.
+This will build the code and run any unit tests. This approach is great when you are testing local
+behaviors and writing unit tests.

#### TDD using acceptance tests & integration tests

-Airbyte provides a standard test suite \(dubbed "Acceptance Tests"\) that runs against every destination connector. They are "free" baseline tests to ensure the basic functionality of the destination. When developing a connector, you can simply run the tests between each change and use the feedback to guide your development.
+Airbyte provides a standard test suite \(dubbed "Acceptance Tests"\) that runs against every
+destination connector. They are "free" baseline tests to ensure the basic functionality of the
+destination. When developing a connector, you can simply run the tests between each change and use
+the feedback to guide your development.

-If you want to try out this approach, check out Step 6 which describes what you need to do to set up the acceptance Tests for your destination.
+If you want to try out this approach, check out Step 6, which describes what you need to do to set
+up the Acceptance Tests for your destination.

-The nice thing about this approach is that you are running your destination exactly as Airbyte will run it in the CI. The downside is that the tests do not run very quickly. As such, we recommend this iteration approach only once you've implemented most of your connector and are in the finishing stages of implementation. Note that Acceptance Tests are required for every connector supported by Airbyte, so you should make sure to run them a couple of times while iterating to make sure your connector is compatible with Airbyte.
+The nice thing about this approach is that you are running your destination exactly as Airbyte will
+run it in the CI. The downside is that the tests do not run very quickly. As such, we recommend this
+iteration approach only once you've implemented most of your connector and are in the finishing
+stages of implementation. Note that Acceptance Tests are required for every connector supported by
+Airbyte, so you should make sure to run them a couple of times while iterating to make sure your
+connector is compatible with Airbyte.

#### Directly running the destination using Docker

-If you want to run your destination exactly as it will be run by Airbyte \(i.e. within a docker container\), you can use the following commands from the connector module directory \(`airbyte-integrations/connectors/destination-<name>`\):
+If you want to run your destination exactly as it will be run by Airbyte \(i.e. within a docker
+container\), you can use the following commands from the connector module directory
+\(`airbyte-integrations/connectors/destination-<name>`\):

```text
# First build the container
./gradlew :airbyte-integrations:connectors:destination-<name>:build

# Then use the following commands to run it
-# Runs the "spec" command, used to find out what configurations are needed to run a connector
docker run --rm airbyte/destination-<name>:dev spec

# Runs the "check" command, used to validate if the input configurations are valid
@@ -108,54 +130,72 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/destination-<name>:dev check
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/destination-<name>:dev write --config /secrets/config.json --catalog /sample_files/configured_catalog.json
```

-Note: Each time you make a change to your implementation you need to re-build the connector image via `./gradlew :airbyte-integrations:connectors:destination-<name>:build`.
+Note: Each time you make a change to your implementation, you need to re-build the connector image
+via `./gradlew :airbyte-integrations:connectors:destination-<name>:build`.

-The nice thing about this approach is that you are running your destination exactly as it will be run by Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the connector between each change.
+The nice thing about this approach is that you are running your destination exactly as it will be
+run by Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the
+connector between each change.

#### Handling Exceptions

-In order to best propagate user-friendly error messages and log error information to the platform, the [Airbyte Protocol](../../understanding-airbyte/airbyte-protocol.md#The Airbyte Protocol) implements AirbyteTraceMessage.
+In order to best propagate user-friendly error messages and log error information to the platform,
+the [Airbyte Protocol](../../understanding-airbyte/airbyte-protocol.md#the-airbyte-protocol)
+implements AirbyteTraceMessage.

-We recommend using AirbyteTraceMessages for known errors, as in these cases you can likely offer the user a helpful message as to what went wrong and suggest how they can resolve it.
+We recommend using AirbyteTraceMessages for known errors, as in these cases you can likely offer the
+user a helpful message as to what went wrong and suggest how they can resolve it.
+
+Airbyte provides a static utility class, `io.airbyte.integrations.base.AirbyteTraceMessageUtility`,
+to give you a clear and straightforward way to emit these AirbyteTraceMessages. Example usage:

-Airbyte provides a static utility class, `io.airbyte.integrations.base.AirbyteTraceMessageUtility`, to give you a clear and straight-forward way to emit these AirbyteTraceMessages. Example usage:

```java
try {
  // some connector code responsible for doing X
-}
+}
catch (ExceptionIndicatingIncorrectCredentials credErr) {
  AirbyteTraceMessageUtility.emitConfigErrorTrace(
    credErr, "Connector failed due to incorrect credentials while doing X. Please check your connection is using valid credentials.");
  throw credErr;
-}
+}
catch (ExceptionIndicatingKnownErrorY knownErr) {
  AirbyteTraceMessageUtility.emitSystemErrorTrace(
    knownErr, "Connector failed because of reason Y while doing X. Please check/do/make ... to resolve this.");
  throw knownErr;
-}
+}
catch (Exception e) {
  AirbyteTraceMessageUtility.emitSystemErrorTrace(
    e, "Connector failed while doing X. Possible reasons for this could be ...");
-  throw e
+  throw e;
}
```

Note the two different error trace methods.
- Where possible `emitConfigErrorTrace` should be used when we are certain the issue arises from a problem with the user's input configuration, e.g. invalid credentials.
+
+- Where possible, `emitConfigErrorTrace` should be used when we are certain the issue arises from a
+  problem with the user's input configuration, e.g. invalid credentials.
- For everything else or if unsure, use `emitSystemErrorTrace`.

### Step 3: Implement `spec`

-Each destination contains a specification written in JsonSchema that describes its inputs. Defining the specification is a good place to start when developing your destination. Check out the documentation [here](https://json-schema.org/) to learn the syntax. Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json) of what the `spec.json` looks like for the postgres destination.
+Each destination contains a specification written in JsonSchema that describes its inputs. Defining
+the specification is a good place to start when developing your destination. Check out the
+documentation [here](https://json-schema.org/) to learn the syntax. Here's
+[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json)
+of what the `spec.json` looks like for the Postgres destination.

-Your generated template should have the spec file in `airbyte-integrations/connectors/destination-<name>/src/main/resources/spec.json`. The generated connector will take care of reading this file and converting it to the correct output. Edit it and you should be done with this step.
+Your generated template should have the spec file in
+`airbyte-integrations/connectors/destination-<name>/src/main/resources/spec.json`. The generated
+connector will take care of reading this file and converting it to the correct output. Edit it and
+you should be done with this step.

-For more details on what the spec is, you can read about the Airbyte Protocol [here](../../understanding-airbyte/airbyte-protocol.md).
+For more details on what the spec is, you can read about the Airbyte Protocol
+[here](../../understanding-airbyte/airbyte-protocol.md).
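+
+As an illustrative sketch only (the `destination_path` property here is hypothetical; model your
+real spec on the Postgres example linked above), a minimal `spec.json` might look like:
+
+```json
+{
+  "documentationUrl": "https://docs.airbyte.com/integrations/destinations/example",
+  "connectionSpecification": {
+    "$schema": "http://json-schema.org/draft-07/schema#",
+    "title": "Example Destination Spec",
+    "type": "object",
+    "required": ["destination_path"],
+    "properties": {
+      "destination_path": {
+        "type": "string",
+        "description": "Path under which files will be written.",
+        "examples": ["/local/data"]
+      }
+    }
+  }
+}
+```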
See the `spec` operation in action:

```bash
-# First build the connector
+# First build the connector
./gradlew :airbyte-integrations:connectors:destination-<name>:build

# Run the spec operation
@@ -164,11 +204,17 @@ docker run --rm airbyte/destination-<name>:dev spec

### Step 4: Implement `check`

-The check operation accepts a JSON object conforming to the `spec.json`. In other words if the `spec.json` said that the destination requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the destination.
+The check operation accepts a JSON object conforming to the `spec.json`. In other words, if the
+`spec.json` said that the destination requires a `username` and `password`, the config object might
+be `{ "username": "airbyte", "password": "password123" }`. It returns a JSON object that reports,
+given the credentials in the config, whether we were able to connect to the destination.

-While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets` directory in the Airbyte repo is gitignored by default.
+While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets`
+directory in the Airbyte repo is gitignored by default.

-Implement the `check` method in the generated file `Destination.java`. Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L94) from the BigQuery destination.
+Implement the `check` method in the generated file `Destination.java`. Here's an
+[example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L94)
+from the BigQuery destination.

Verify that the method is working by placing your config in `secrets/config.json` then running:

@@ -182,41 +228,66 @@ docker run -v $(pwd)/secrets:/secrets --rm airbyte/destination-<name>:dev check

### Step 5: Implement `write`

-The `write` operation is the main workhorse of a destination connector: it reads input data from the source and writes it to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog: the file used to describe the schema of the incoming data and how it should be written to the destination. Its "output" is two things:
+The `write` operation is the main workhorse of a destination connector: it reads input data from the
+source and writes it to the underlying destination. It takes as input the config file used to run
+the connector as well as the configured catalog: the file used to describe the schema of the
+incoming data and how it should be written to the destination. Its "output" is two things:

1. Data written to the underlying destination
-2. `AirbyteMessage`s of type `AirbyteStateMessage`, written to stdout to indicate which records have been written so far during a sync. It's important to output these messages when possible in order to avoid re-extracting messages from the source. See the [write operation protocol reference](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#write) for more information.
+2. `AirbyteMessage`s of type `AirbyteStateMessage`, written to stdout to indicate which records have
+   been written so far during a sync. It's important to output these messages when possible in order
+   to avoid re-extracting messages from the source. See the
+   [write operation protocol reference](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#write)
+   for more information.

-To implement the `write` Airbyte operation, implement the `getConsumer` method in your generated `Destination.java` file. Here are some example implementations from different destination conectors:
+To implement the `write` Airbyte operation, implement the `getConsumer` method in your generated
+`Destination.java` file. Here are some example implementations from different destination
+connectors:

-* [BigQuery](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L188)
-* [Google Pubsub](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-pubsub/src/main/java/io/airbyte/integrations/destination/pubsub/PubsubDestination.java#L98)
-* [Local CSV](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-csv/src/main/java/io/airbyte/integrations/destination/csv/CsvDestination.java#L90)
-* [Postgres](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/java/io/airbyte/integrations/destination/postgres/PostgresDestination.java)
+- [BigQuery](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L188)
+- [Google Pubsub](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-pubsub/src/main/java/io/airbyte/integrations/destination/pubsub/PubsubDestination.java#L98)
+- [Local CSV](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-csv/src/main/java/io/airbyte/integrations/destination/csv/CsvDestination.java#L90)
+- [Postgres](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/java/io/airbyte/integrations/destination/postgres/PostgresDestination.java)

:::info

-The Postgres destination leverages the `AbstractJdbcDestination` superclass which makes it extremely easy to create a destination for a database or data warehouse if it has a compatible JDBC driver. If the destination you are implementing has a JDBC driver, be sure to check out `AbstractJdbcDestination`.
+The Postgres destination leverages the `AbstractJdbcDestination` superclass, which makes it
+extremely easy to create a destination for a database or data warehouse if it has a compatible JDBC
+driver. If the destination you are implementing has a JDBC driver, be sure to check out
+`AbstractJdbcDestination`.

:::

-For a brief overview on the Airbyte catalog check out [the Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
+For a brief overview of the Airbyte catalog, check out
+[the Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
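+
+To make the shape of this method concrete, here is a rough, hypothetical sketch of a `getConsumer`
+implementation (error handling, batching, and the exact base classes are simplified here; see the
+connectors linked above for real implementations):
+
+```java
+@Override
+public AirbyteMessageConsumer getConsumer(JsonNode config,
+                                          ConfiguredAirbyteCatalog catalog,
+                                          Consumer<AirbyteMessage> outputRecordCollector) {
+  return new AirbyteMessageConsumer() {
+
+    @Override
+    public void start() {
+      // Open connections and prepare any staging resources here.
+    }
+
+    @Override
+    public void accept(AirbyteMessage message) {
+      if (message.getType() == AirbyteMessage.Type.RECORD) {
+        // Write message.getRecord() to the underlying destination.
+      } else if (message.getType() == AirbyteMessage.Type.STATE) {
+        // Echo the state message back once the records before it are durably
+        // written, so the platform can checkpoint the sync.
+        outputRecordCollector.accept(message);
+      }
+    }
+
+    @Override
+    public void close() {
+      // Flush any buffered records and release resources.
+    }
+  };
+}
+```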

### Step 6: Set up Acceptance Tests

-The Acceptance Tests are a set of tests that run against all destinations. These tests are run in the Airbyte CI to prevent regressions and verify a baseline of functionality. The test cases are contained and documented in the [following file](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/standard-destination-test/src/main/java/io/airbyte/integrations/standardtest/destination/DestinationAcceptanceTest.java).
+The Acceptance Tests are a set of tests that run against all destinations. These tests are run in
+the Airbyte CI to prevent regressions and verify a baseline of functionality. The test cases are
+contained and documented in the
+[following file](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/standard-destination-test/src/main/java/io/airbyte/integrations/standardtest/destination/DestinationAcceptanceTest.java).

-To setup acceptance Tests for your connector, follow the `TODO`s in the generated file `DestinationAcceptanceTest.java`. Once setup, you can run the tests using `./gradlew :airbyte-integrations:connectors:destination-<name>:integrationTest`. Make sure to run this command from the Airbyte repository root.
+To set up the Acceptance Tests for your connector, follow the `TODO`s in the generated file
+`DestinationAcceptanceTest.java`. Once set up, you can run the tests using
+`./gradlew :airbyte-integrations:connectors:destination-<name>:integrationTest`. Make sure to run
+this command from the Airbyte repository root.

### Step 7: Write unit tests and/or integration tests

-The Acceptance Tests are meant to cover the basic functionality of a destination. Think of it as the bare minimum required for us to add a destination to Airbyte. You should probably add some unit testing or custom integration testing in case you need to test additional functionality of your destination.
+The Acceptance Tests are meant to cover the basic functionality of a destination. Think of them as
+the bare minimum required for us to add a destination to Airbyte. You should probably add some unit
+testing or custom integration testing in case you need to test additional functionality of your
+destination.

#### Step 8: Update the docs

-Each connector has its own documentation page. By convention, that page should have the following path: in `docs/integrations/destinations/<destination-name>.md`. For the documentation to get packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from existing connectors.
+Each connector has its own documentation page. By convention, that page should have the following
+path: `docs/integrations/destinations/<destination-name>.md`. For the documentation to get
+packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match
+doing that from existing connectors.

## Wrapping up

-Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line.
-
+Well done on making it this far! If you'd like your connector to ship with Airbyte by default,
+create a PR against the Airbyte repo and we'll work with you to get it across the finish line.
diff --git a/docs/connector-development/tutorials/building-a-python-source.md b/docs/connector-development/tutorials/building-a-python-source.md
index 49a12872363b..e83aeec9d0ac 100644
--- a/docs/connector-development/tutorials/building-a-python-source.md
+++ b/docs/connector-development/tutorials/building-a-python-source.md
@@ -2,15 +2,20 @@

## Summary

-This article provides a checklist for how to create a python source. Each step in the checklist has a link to a more detailed explanation below.
+This article provides a checklist for how to create a Python source. Each step in the checklist has
+a link to a more detailed explanation below.

## Requirements

-Docker, Python, and Java with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md).
+Docker, Python, and Java with the versions listed in the
+[tech stack section](../../understanding-airbyte/tech-stack.md).

:::info

-All the commands below assume that `python` points to a version of python >3.7. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3` . Otherwise, make sure to install Python 3 before beginning.
+All the commands below assume that `python` points to a version of Python >3.7. On some systems,
+`python` points to a Python 2 installation and `python3` points to Python 3. If this is the case on
+your machine, substitute all `python` commands in this guide with `python3`. Otherwise, make sure
+to install Python 3 before beginning.

:::

@@ -18,18 +23,21 @@ All the commands below assume that `python` points to a version of python >3.

### Creating a Source

-* Step 1: Create the source using template
-* Step 2: Build the newly generated source
-* Step 3: Set up your Airbyte development environment
-* Step 4: Implement `spec` \(and define the specification for the source `airbyte-integrations/connectors/source-<source-name>/spec.yaml`\)
-* Step 5: Implement `check`
-* Step 6: Implement `discover`
-* Step 7: Implement `read`
-* Step 8: Set up Connector Acceptance Tests
-* Step 9: Write unit tests or integration tests
-* Step 10: Update the `README.md` \(If API credentials are required to run the integration, please document how they can be obtained or link to a how-to guide.\)
-* Step 11: Update the `metadata.yaml` file with accurate information about your connector. These metadata will be used to add the connector to Airbyte's connector registry.
-* Step 12: Add docs \(in `docs/integrations/sources/<source-name>.md`\)
+- Step 1: Create the source using the template
+- Step 2: Build the newly generated source
+- Step 3: Set up your Airbyte development environment
+- Step 4: Implement `spec` \(and define the specification for the source
+  `airbyte-integrations/connectors/source-<source-name>/spec.yaml`\)
+- Step 5: Implement `check`
+- Step 6: Implement `discover`
+- Step 7: Implement `read`
+- Step 8: Set up Connector Acceptance Tests
+- Step 9: Write unit tests or integration tests
+- Step 10: Update the `README.md` \(If API credentials are required to run the integration, please
+  document how they can be obtained or link to a how-to guide.\)
+- Step 11: Update the `metadata.yaml` file with accurate information about your connector. This
+  metadata will be used to add the connector to Airbyte's connector registry.
+- Step 12: Add docs \(in `docs/integrations/sources/<source-name>.md`\)

:::info Each step of the Creating a Source checklist is explained in more detail below.
@@ -41,14 +49,24 @@ All `./gradlew` commands must be run from the root of the airbyte project.

### Submitting a Source to Airbyte

-* If you need help with any step of the process, feel free to submit a PR with your progress and any questions you have.
-* Submit a PR.
-* To run integration tests, Airbyte needs access to a test account/environment. Coordinate with an Airbyte engineer \(via the PR\) to add test credentials so that we can run tests for the integration in the CI. \(We will create our own test account once you let us know what source we need to create it for.\)
-* Once the config is stored in Github Secrets, edit `.github/workflows/test-command.yml` and `.github/workflows/publish-command.yml` to inject the config into the build environment.
-* Edit the `airbyte/tools/bin/ci_credentials.sh` script to pull the script from the build environment and write it to `secrets/config.json` during the build.
+- If you need help with any step of the process, feel free to submit a PR with your progress and any
+  questions you have.
+- Submit a PR.
+- To run integration tests, Airbyte needs access to a test account/environment. Coordinate with an
+  Airbyte engineer \(via the PR\) to add test credentials so that we can run tests for the
+  integration in the CI. \(We will create our own test account once you let us know what source we
+  need to create it for.\)
+- Once the config is stored in GitHub Secrets, edit `.github/workflows/test-command.yml` and
+  `.github/workflows/publish-command.yml` to inject the config into the build environment.
+- Edit the `airbyte/tools/bin/ci_credentials.sh` script to pull the script from the build
+  environment and write it to `secrets/config.json` during the build.

:::info
-If you have a question about a step the Submitting a Source to Airbyte checklist include it in your PR or ask it on [#help-connector-development channel on Slack](https://airbytehq.slack.com/archives/C027KKE4BCZ).
+
+If you have a question about a step of the Submitting a Source to Airbyte checklist, include it
+in your PR or ask it on the
+[#help-connector-development channel on Slack](https://airbytehq.slack.com/archives/C027KKE4BCZ).
+
:::

## Explaining Each Step

@@ -62,7 +80,8 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
$ ./generate.sh
```

-Select the `python` template and then input the name of your connector. For this walk through we will refer to our source as `example-python`
+Select the `python` template and then input the name of your connector. For this walkthrough, we
+will refer to our source as `example-python`.

### Step 2: Install the newly generated source

@@ -73,40 +92,58 @@ cd airbyte-integrations/connectors/source-<source-name>
poetry install
```

-This step sets up the initial python environment.
-
### Step 3: Set up your Airbyte development environment

-The generator creates a file `source_<source_name>/source.py`. This will be where you implement the logic for your source. The templated `source.py` contains extensive comments explaining each method that needs to be implemented. Briefly here is an overview of each of these methods.
+The generator creates a file `source_<source_name>/source.py`. This will be where you implement the
+logic for your source. The templated `source.py` contains extensive comments explaining each method
+that needs to be implemented. Briefly, here is an overview of each of these methods.

1. `spec`: declares the user-provided credentials or configuration needed to run the connector
-2. `check`: tests if with the user-provided configuration the connector can connect with the underlying data source.
+2. `check`: tests whether the connector can connect to the underlying data source with the
+   user-provided configuration.
3. `discover`: declares the different streams of data that this connector can output
4. `read`: reads data from the underlying data source \(The stock ticker API\)

#### Dependencies

-Python dependencies for your source should be declared in `airbyte-integrations/connectors/source-<source-name>/setup.py` in the `install_requires` field. You will notice that a couple of Airbyte dependencies are already declared there. Do not remove these; they give your source access to the helper interface that is provided by the generator.
+Python dependencies for your source should be declared in
+`airbyte-integrations/connectors/source-<source-name>/setup.py` in the `install_requires` field. You
+will notice that a couple of Airbyte dependencies are already declared there. Do not remove these;
+they give your source access to the helper interface that is provided by the generator.

-You may notice that there is a `requirements.txt` in your source's directory as well. Do not touch this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be declared in `setup.py`.
+You may notice that there is a `requirements.txt` in your source's directory as well. Do not touch
+this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be
+declared in `setup.py`.

#### Development Environment

-The commands we ran above created a virtual environment for your source. If you want your IDE to auto complete and resolve dependencies properly, point it at the virtual env `airbyte-integrations/connectors/source-<source-name>/.venv`. Also anytime you change the dependencies in the `setup.py` make sure to re-run the build command. The build system will handle installing all dependencies in the `setup.py` into the virtual environment.
+The commands we ran above created a virtual environment for your source. If you want your IDE to
+auto-complete and resolve dependencies properly, point it at the virtual env
+`airbyte-integrations/connectors/source-<source-name>/.venv`. Also, anytime you change the
+dependencies in the `setup.py`, make sure to re-run the build command. The build system will handle
+installing all dependencies in the `setup.py` into the virtual environment.

-Pretty much all it takes to create a source is to implement the `Source` interface. The template fills in a lot of information for you and has extensive docstrings describing what you need to do to implement each method. The next 4 steps are just implementing that interface.
+Pretty much all it takes to create a source is to implement the `Source` interface. The template
+fills in a lot of information for you and has extensive docstrings describing what you need to do to
+implement each method. The next 4 steps are just implementing that interface.

:::info
-All logging should be done through the `logger` object passed into each method. Otherwise, logs will not be shown in the Airbyte UI.
+
+All logging should be done through the `logger` object passed into each method. Otherwise,
+logs will not be shown in the Airbyte UI.
+
:::

#### Iterating on your implementation

-Everyone develops differently but here are 3 ways that we recommend iterating on a source. Consider using whichever one matches your style.
+Everyone develops differently, but here are 3 ways that we recommend iterating on a source. Consider
+using whichever one matches your style.

**Run the source using python**

-You'll notice in your source's directory that there is a python file called `main.py`. This file exists as convenience for development. You can call it from within the virtual environment mentioned above `. ./.venv/bin/activate` to test out that your source works.
+You'll notice in your source's directory that there is a python file called `main.py`. This file
+exists as a convenience for development. You can call it from within the virtual environment
+mentioned above \(`. ./.venv/bin/activate`\) to test out that your source works.

```bash
# from airbyte-integrations/connectors/source-<source-name>
@@ -116,30 +153,38 @@ poetry run source-<source-name> discover --config secrets/config.json
poetry run source-<source-name> read --config secrets/config.json --catalog sample_files/configured_catalog.json
```

-The nice thing about this approach is that you can iterate completely within in python. The downside is that you are not quite running your source as it will actually be run by Airbyte. Specifically you're not running it from within the docker container that will house it.
-
+The nice thing about this approach is that you can iterate completely within Python. The downside
+is that you are not quite running your source as it will actually be run by Airbyte. Specifically,
+you're not running it from within the docker container that will house it.

**Build the source docker image**

-You have to build a docker image for your connector if you want to run your source exactly as it will be run by Airbyte.
+You have to build a docker image for your connector if you want to run your source exactly as it
+will be run by Airbyte.

**Option A: Building the docker image with `airbyte-ci`**

This is the preferred method for building and testing connectors.

-If you want to open source your connector we encourage you to use our [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) tool to build your connector.
-It will not use a Dockerfile but will build the connector image from our [base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md) and use our internal build logic to build an image from your Python connector code.
+If you want to open source your connector, we encourage you to use our
+[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
+tool to build your connector. It will not use a Dockerfile but will build the connector image from
+our
+[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
+and use our internal build logic to build an image from your Python connector code.

Running `airbyte-ci connectors --name source-<source-name> build` will build your connector image.
-Once the command is done, you will find your connector image in your local docker host: `airbyte/source-<source-name>:dev`.
-
-
+Once the command is done, you will find your connector image in your local docker host:
+`airbyte/source-<source-name>:dev`.

**Option B: Building the docker image with a Dockerfile**

-If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image using your own Dockerfile. This method is not preferred, and is not supported for certified connectors.
+If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image
+using your own Dockerfile. This method is not preferred, and is not supported for certified
+connectors.
-Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look something like this:
+Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
+something like this:

```Dockerfile

@@ -156,6 +201,7 @@ RUN pip install ./airbyte/integration_code

Please use this as an example. This is not optimized.

Build your image:
+
```bash
docker build . -t airbyte/source-example-python:dev
```

@@ -170,20 +216,28 @@ docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files
```

:::info
-Each time you make a change to your implementation you need to re-build the connector image. This ensures the new python code is added into the docker container.
+
+Each time you make a change to your implementation, you need to re-build the connector image.
+This ensures the new Python code is added into the docker container.
+
:::

-The nice thing about this approach is that you are running your source exactly as it will be run by Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the connector between each change.
+The nice thing about this approach is that you are running your source exactly as it will be run by
+Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the
+connector between each change.

**Detailed Debug Messages**

-During development of your connector, you can enable the printing of detailed debug information during a sync by specifying the `--debug` flag. This will allow you to get a better picture of what is happening during each step of your sync.
+During development of your connector, you can enable the printing of detailed debug information
+during a sync by specifying the `--debug` flag. This will allow you to get a better picture of what
+is happening during each step of your sync.

```bash
poetry run source-<source-name> read --config secrets/config.json --catalog sample_files/configured_catalog.json --debug
```

-In addition to the preset CDK debug statements, you can also emit custom debug information from your connector by introducing your own debug statements:
+In addition to the preset CDK debug statements, you can also emit custom debug information from your
+connector by introducing your own debug statements:

```python
self.logger.debug(
@@ -197,50 +251,87 @@

**TDD using acceptance tests & integration tests**

-Airbyte provides an acceptance test suite that is run against every source. The objective of these tests is to provide some "free" tests that can sanity check that the basic functionality of the source works. One approach to developing your connector is to simply run the tests between each change and use the feedback from them to guide your development.
+Airbyte provides an acceptance test suite that is run against every source. The objective of these
+tests is to provide some "free" tests that can sanity check that the basic functionality of the
+source works. One approach to developing your connector is to simply run the tests between each
+change and use the feedback from them to guide your development.

-If you want to try out this approach, check out Step 8 which describes what you need to do to set up the standard tests for your source.
+If you want to try out this approach, check out Step 8, which describes what you need to do to set
+up the standard tests for your source.

-The nice thing about this approach is that you are running your source exactly as Airbyte will run it in the CI. The downside is that the tests do not run very quickly.
+The nice thing about this approach is that you are running your source exactly as Airbyte will run
+it in the CI. The downside is that the tests do not run very quickly.

### Step 4: Implement `spec`

-Each source contains a specification that describes what inputs it needs in order for it to pull data. This file can be found in `airbyte-integrations/connectors/source-<source-name>/spec.yaml`. This is a good place to start when developing your source. Using JsonSchema define what the inputs are \(e.g. username and password\). Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml) of what the `spec.yaml` looks like for the stripe source.
+Each source contains a specification that describes what inputs it needs in order for it to pull
+data. This file can be found in `airbyte-integrations/connectors/source-<source-name>/spec.yaml`.
+This is a good place to start when developing your source. Using JsonSchema, define what the inputs
+are \(e.g. username and password\). Here's
+[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml)
+of what the `spec.yaml` looks like for the Stripe source.

-For more details on what the spec is, you can read about the Airbyte Protocol [here](../../understanding-airbyte/airbyte-protocol.md).
+For more details on what the spec is, you can read about the Airbyte Protocol
+[here](../../understanding-airbyte/airbyte-protocol.md).

-The generated code that Airbyte provides, handles implementing the `spec` method for you. It assumes that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have declared the necessary JsonSchema in `spec.yaml` you should be done with this step.
+The generated code that Airbyte provides handles implementing the `spec` method for you. It assumes
+that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have
+declared the necessary JsonSchema in `spec.yaml`, you should be done with this step.

### Step 5: Implement `check`

-As described in the template code, this method takes in a json object called config that has the values described in the `spec.yaml` filled in. In other words if the `spec.yaml` said that the source requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the source. For example, with the given credentials could the source connect to the database server.
+As described in the template code, this method takes in a JSON object called `config` that has the
+values described in the `spec.yaml` filled in. In other words, if the `spec.yaml` said that the
+source requires a `username` and `password`, the config object might be
+`{ "username": "airbyte", "password": "password123" }`. It returns a JSON object that reports, given
+the credentials in the config, whether we were able to connect to the source. For example, with the
+given credentials, could the source connect to the database server?

-While developing, we recommend storing this object in `secrets/config.json`. The `secrets` directory is gitignored by default.
+While developing, we recommend storing this object in `secrets/config.json`. The `secrets` directory
+is gitignored by default.
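+
+As a sketch (the endpoint and the `api_key` field here are hypothetical and depend on your
+`spec.yaml`), a `check` implementation might look like:
+
+```python
+import requests
+from airbyte_cdk.models import AirbyteConnectionStatus, Status
+
+
+def check(self, logger, config) -> AirbyteConnectionStatus:
+    try:
+        # Make the cheapest possible authenticated request to validate the config.
+        response = requests.get(
+            "https://api.example.com/ping",  # hypothetical endpoint
+            headers={"Authorization": f"Bearer {config['api_key']}"},
+        )
+        response.raise_for_status()
+        return AirbyteConnectionStatus(status=Status.SUCCEEDED)
+    except Exception as e:
+        return AirbyteConnectionStatus(status=Status.FAILED, message=f"Could not connect: {repr(e)}")
+```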
### Step 6: Implement `discover`

-As described in the template code, this method takes in the same config object as `check`. It then returns a json object called a `catalog` that describes what data is available and metadata on what options are available for how to replicate it.
+As described in the template code, this method takes in the same config object as `check`. It then
+returns a JSON object called a `catalog` that describes what data is available and metadata on what
+options are available for how to replicate it.

-For a brief overview on the catalog check out [Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
+For a brief overview of the catalog, check out
+[Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).

### Step 7: Implement `read`

-As described in the template code, this method takes in the same config object as the previous methods. It also takes in a "configured catalog". This object wraps the catalog emitted by the `discover` step and includes configuration on how the data should be replicated. For a brief overview on the configured catalog check out [Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md). It then returns a generator which returns each record in the stream.
+As described in the template code, this method takes in the same config object as the previous
+methods. It also takes in a "configured catalog". This object wraps the catalog emitted by the
+`discover` step and includes configuration on how the data should be replicated. For a brief
+overview of the configured catalog, check out
+[Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
+It then returns a generator which returns each record in the stream.
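+
+As an illustrative sketch (the stream name and the `fetch_records` helper are hypothetical), a
+`read` implementation yields one protocol message per record:
+
+```python
+from datetime import datetime
+
+from airbyte_cdk.models import AirbyteMessage, AirbyteRecordMessage, Type
+
+
+def read(self, logger, config, catalog, state):
+    for row in fetch_records(config):  # hypothetical helper that calls your API
+        yield AirbyteMessage(
+            type=Type.RECORD,
+            record=AirbyteRecordMessage(
+                stream="example_stream",  # must match a stream declared by `discover`
+                data=row,
+                emitted_at=int(datetime.now().timestamp()) * 1000,
+            ),
+        )
+```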

### Step 8: Set up Connector Acceptance Tests (CATs)

-The Connector Acceptance Tests are a set of tests that run against all sources. These tests are run in the Airbyte CI to prevent regressions. They also can help you sanity check that your source works as expected. The following [article](../testing-connectors/connector-acceptance-tests-reference.md) explains Connector Acceptance Tests and how to run them.
+The Connector Acceptance Tests are a set of tests that run against all sources. These tests are run
+in the Airbyte CI to prevent regressions. They can also help you sanity check that your source works
+as expected. The following [article](../testing-connectors/connector-acceptance-tests-reference.md)
+explains Connector Acceptance Tests and how to run them.

You can run the tests using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md): `airbyte-ci connectors --name source-<source-name> test --only-step=acceptance`

:::info
-In some rare cases we make exceptions and allow a source to not need to pass all the standard tests. If for some reason you think your source cannot reasonably pass one of the tests cases, reach out to us on github or slack, and we can determine whether there's a change we can make so that the test will pass or if we should skip that test for your source.
+
+In some rare cases we make exceptions and allow a source to not need to pass all the
+standard tests. If for some reason you think your source cannot reasonably pass one of the test
+cases, reach out to us on GitHub or Slack, and we can determine whether there's a change we can make
+so that the test will pass or if we should skip that test for your source.
+
:::

### Step 9: Write unit tests and/or integration tests

-The connector acceptance tests are meant to cover the basic functionality of a source. Think of it as the bare minimum required for us to add a source to Airbyte. In case you need to test additional functionality of your source, write unit or integration tests.
+The connector acceptance tests are meant to cover the basic functionality of a source. Think of them
+as the bare minimum required for us to add a source to Airbyte. In case you need to test additional
+functionality of your source, write unit or integration tests.

#### Unit Tests

@@ -250,32 +341,49 @@ You can run the tests using `poetry run pytest tests/unit_tests`

#### Integration Tests

-Place any integration tests in the `integration_tests` directory such that they can be [discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery).
+Place any integration tests in the `integration_tests` directory such that they can be
+[discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery).

You can run the tests using `poetry run pytest tests/integration_tests`

### Step 10: Update the `README.md`

-The template fills in most of the information for the readme for you. Unless there is a special case, the only piece of information you need to add is how one can get the credentials required to run the source. e.g. Where one can find the relevant API key, etc.
+The template fills in most of the information for the readme for you. Unless there is a special
+case, the only piece of information you need to add is how one can get the credentials required to
+run the source, e.g. where one can find the relevant API key.

### Step 11: Add the connector to the API/UI
+
There are multiple ways to use the connector you have built.

-If you are self hosting Airbyte (OSS) you are able to use the Custom Connector feature. This feature allows you to run any Docker container that implements the Airbye protocol. You can read more about it [here](https://docs.airbyte.com/integrations/custom-connectors/).
+If you are self-hosting Airbyte (OSS), you can use the Custom Connector feature. This feature
+allows you to run any Docker container that implements the Airbyte protocol. You can read more about
+it [here](https://docs.airbyte.com/integrations/custom-connectors/).

-If you are using Airbyte Cloud (or OSS), you can submit a PR to add your connector to the Airbyte repository. Once the PR is merged, the connector will be available to all Airbyte Cloud users. You can read more about it [here](https://docs.airbyte.com/contributing-to-airbyte/submit-new-connector).
+If you are using Airbyte Cloud (or OSS), you can submit a PR to add your connector to the Airbyte
+repository. Once the PR is merged, the connector will be available to all Airbyte Cloud users. You
+can read more about it
+[here](https://docs.airbyte.com/contributing-to-airbyte/submit-new-connector).

Note that when submitting an Airbyte connector, you will need to ensure that

-1. The connector passes the CAT suite. See [Set up Connector Acceptance Tests](#step-8-set-up-connector-acceptance-tests-\(cats\)).
-2. The metadata.yaml file (created by our generator) is filed out and valid. See [Connector Metadata File](https://docs.airbyte.com/connector-development/connector-metadata-file).
-3. You have created appropriate documentation for the connector. See [Add docs](#step-12-add-docs).
+1. The connector passes the CAT suite. See
+   [Set up Connector Acceptance Tests](<#step-8-set-up-connector-acceptance-tests-(cats)>).
+2. The metadata.yaml file (created by our generator) is filled out and valid. See
+   [Connector Metadata File](https://docs.airbyte.com/connector-development/connector-metadata-file).
+3. You have created appropriate documentation for the connector. See [Add docs](#step-12-add-docs).

### Step 12: Add docs

-Each connector has its own documentation page. By convention, that page should have the following path: in `docs/integrations/sources/<source-name>.md`. For the documentation to get packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from existing connectors.
+Each connector has its own documentation page. By convention, that page should have the following
+path: `docs/integrations/sources/<source-name>.md`. For the documentation to get packaged with
+the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from
+existing connectors.

## Related tutorials
+
-For additional examples of how to use the Python CDK to build an Airbyte source connector, see the following tutorials:
+For additional examples of how to use the Python CDK to build an Airbyte source connector, see the
+following tutorials:
+
- [Python CDK Speedrun: Creating a Source](https://docs.airbyte.com/connector-development/tutorials/cdk-speedrun)
- [Build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api)

diff --git a/docs/connector-development/tutorials/cdk-speedrun.md b/docs/connector-development/tutorials/cdk-speedrun.md
index d9fc6bc82ffd..35a9543d2e53 100644
--- a/docs/connector-development/tutorials/cdk-speedrun.md
+++ b/docs/connector-development/tutorials/cdk-speedrun.md
@@ -2,9 +2,11 @@

## CDK Speedrun \(HTTP API Source Creation Any Route\)

-This is a blazing fast guide to building an HTTP source connector. Think of it as the TL;DR version of [this tutorial.](cdk-tutorial-python-http/getting-started.md)
+This is a blazing fast guide to building an HTTP source connector. Think of it as the TL;DR version
+of [this tutorial](cdk-tutorial-python-http/getting-started.md).

-If you are a visual learner and want to see a video version of this guide going over each part in detail, check it out below.
+If you are a visual learner and want to see a video version of this guide going over each part in
+detail, check it out below.

[A speedy CDK overview.](https://www.youtube.com/watch?v=kJ3hLoNfz_E)

@@ -19,9 +21,9 @@ If you are a visual learner and want to see a video version of this guide going

```bash
# # clone the repo if you havent already
-# git clone --depth 1 https://github.com/airbytehq/airbyte/
+# git clone --depth 1 https://github.com/airbytehq/airbyte/
# cd airbyte # start from repo root
-cd airbyte-integrations/connector-templates/generator
+cd airbyte-integrations/connector-templates/generator
./generate.sh
```

@@ -40,7 +42,8 @@ poetry install
cd source_python_http_example
```

-We're working with the PokeAPI, so we need to define our input schema to reflect that. 
Open the `spec.yaml` file here and replace it with: +We're working with the PokeAPI, so we need to define our input schema to reflect that. Open the +`spec.yaml` file here and replace it with: ```yaml documentationUrl: https://docs.airbyte.com/integrations/sources/pokeapi @@ -61,9 +64,14 @@ connectionSpecification: - snorlax ``` -As you can see, we have one input to our input schema, which is `pokemon_name`, which is required. Normally, input schemas will contain information such as API keys and client secrets that need to get passed down to all endpoints or streams. +As you can see, we have one input to our input schema, which is `pokemon_name`, which is required. +Normally, input schemas will contain information such as API keys and client secrets that need to +get passed down to all endpoints or streams. -Ok, let's write a function that checks the inputs we just defined. Nuke the `source.py` file. Now add this code to it. For a crucial time skip, we're going to define all the imports we need in the future here. Also note that your `AbstractSource` class name must be a camel-cased version of the name you gave in the generation phase. In our case, this is `SourcePythonHttpExample`. +Ok, let's write a function that checks the inputs we just defined. Nuke the `source.py` file. Now +add this code to it. For a crucial time skip, we're going to define all the imports we need in the +future here. Also note that your `AbstractSource` class name must be a camel-cased version of the +name you gave in the generation phase. In our case, this is `SourcePythonHttpExample`. ```python from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple @@ -94,7 +102,9 @@ class SourcePythonHttpExample(AbstractSource): return [Pokemon(pokemon_name=config["pokemon_name"])] ``` -Create a new file called `pokemon_list.py` at the same level. This will handle input validation for us so that we don't input invalid Pokemon. Let's start with a very limited list - any Pokemon not included in this list will get rejected. +Create a new file called `pokemon_list.py` at the same level. This will handle input validation for +us so that we don't input invalid Pokemon. Let's start with a very limited list - any Pokemon not +included in this list will get rejected. ```python """ @@ -133,7 +143,8 @@ Expected output: ### Define your Stream -In your `source.py` file, add this `Pokemon` class. This stream represents an endpoint you want to hit, which in our case, is the single [Pokemon endpoint](https://pokeapi.co/docs/v2#pokemon). +In your `source.py` file, add this `Pokemon` class. This stream represents an endpoint you want to +hit, which in our case, is the single [Pokemon endpoint](https://pokeapi.co/docs/v2#pokemon). ```python class Pokemon(HttpStream): @@ -151,7 +162,7 @@ class Pokemon(HttpStream): return None def path( - self, + self, ) -> str: return "" # TODO @@ -161,9 +172,16 @@ class Pokemon(HttpStream): return None # TODO ``` -Now download [this file](./cdk-speedrun-assets/pokemon.json). Name it `pokemon.json` and place it in `/source_python_http_example/schemas`. +Now download [this file](./cdk-speedrun-assets/pokemon.json). Name it `pokemon.json` and place it in +`/source_python_http_example/schemas`. -This file defines your output schema for every endpoint that you want to implement. Normally, this will likely be the most time-consuming section of the connector development process, as it requires defining the output of the endpoint exactly. 
This is really important, as Airbyte needs to have clear expectations for what the stream will output. Note that the name of this stream will be consistent in the naming of the JSON schema and the `HttpStream` class, as `pokemon.json` and `Pokemon` respectively in this case. Learn more about schema creation [here](https://docs.airbyte.com/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema).
+This file defines your output schema for every endpoint that you want to implement. Normally, this
+will likely be the most time-consuming section of the connector development process, as it requires
+defining the output of the endpoint exactly. This is really important, as Airbyte needs to have
+clear expectations for what the stream will output. Note that the stream's name is used
+consistently in the JSON schema file and the `HttpStream` class name: `pokemon.json` and
+`Pokemon`, respectively, in this case. Learn more about schema creation
+[here](https://docs.airbyte.com/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema).

Test your discover function. You should receive a fairly large JSON object in return.

```bash
poetry run source-python-http-example discover --config sample_files/config.json
```

-Note that our discover function is using the `pokemon_name` config variable passed in from the `Pokemon` stream when we set it in the `__init__` function.
+Note that our discover function is using the `pokemon_name` config variable passed in from the
+`Pokemon` stream when we set it in the `__init__` function.

### Reading Data from the Source

@@ -220,7 +239,13 @@ class Pokemon(HttpStream):
         return None
 ```

-We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download that file [here](./cdk-speedrun-assets/configured_catalog_pokeapi.json). Place it in `/sample_files` named as `configured_catalog.json`. More clearly, this is where we tell Airbyte all the streams/endpoints we support for the connector and in which sync modes Airbyte can run the connector on. Learn more about the AirbyteCatalog [here](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) and learn more about sync modes [here](https://docs.airbyte.com/understanding-airbyte/connections#sync-modes).
+We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download
+that file [here](./cdk-speedrun-assets/configured_catalog_pokeapi.json). Place it in `/sample_files`
+named `configured_catalog.json`. Put simply, this is where we tell Airbyte all the
+streams/endpoints the connector supports and which sync modes Airbyte can run the connector in.
+Learn more about the AirbyteCatalog
+[here](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) and learn more
+about sync modes [here](https://docs.airbyte.com/understanding-airbyte/connections#sync-modes).

Let's read some data.

```bash
poetry run source-python-http-example read --config sample_files/config.json --catalog sample_files/configured_catalog.json
```

If all goes well, containerize it so you can use it in the UI:

- **Option A: Building the docker image with `airbyte-ci`** This is the preferred method for building and testing connectors. 
-If you want to open source your connector we encourage you to use our [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) tool to build your connector.
-It will not use a Dockerfile but will build the connector image from our [base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md) and use our internal build logic to build an image from your Python connector code.
+If you want to open source your connector, we encourage you to use our
+[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
+tool to build your connector. It will not use a Dockerfile but will build the connector image from
+our
+[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
+and use our internal build logic to build an image from your Python connector code.

Running `airbyte-ci connectors --name source- build` will build your connector image.

-Once the command is done, you will find your connector image in your local docker host: `airbyte/source-:dev`.
-
-
+Once the command is done, you will find your connector image in your local docker host:
+`airbyte/source-:dev`.

**Option B: Building the docker image with a Dockerfile**

-If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image using your own Dockerfile. This method is not preferred, and is not supported for certified connectors.
+If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image
+using your own Dockerfile. This method is not preferred, and is not supported for certified
+connectors.
+
+Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
+something like this:

-Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look something like this:

```Dockerfile
FROM airbyte/python-connector-base:1.1.0

@@ -263,13 +294,15 @@ RUN pip install ./airbyte/integration_code
Please use this as an example. This is not optimized.

Build your image:
+
```bash
docker build . -t airbyte/source-example-python:dev
```

- You're done. Stop the clock :\)

## Further reading

-If you have enjoyed the above example, and would like to explore the Python CDK in even more detail, you may be interested looking at [how to build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api)
+If you have enjoyed the above example and would like to explore the Python CDK in even more detail,
+you may be interested in looking at
+[how to build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api)
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/connection-checking.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/connection-checking.md
index 2e34eb1adf30..984082e7a60b 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/connection-checking.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/connection-checking.md
@@ -2,10 +2,18 @@

The second operation in the Airbyte Protocol that we'll implement is the `check` operation.

-This operation verifies that the input configuration supplied by the user can be used to connect to the underlying data source. Note that this user-supplied configuration has the values described in the `spec.yaml` filled in. 
In other words if the `spec.yaml` said that the source requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. You should then implement something that returns a json object reporting, given the credentials in the config, whether we were able to connect to the source.
-
-In order to make requests to the API, we need to specify the access.
-In our case, this is a fairly trivial check since the API requires no credentials. Instead, let's verify that the user-input `base` currency is a legitimate currency. In `source.py` we'll find the following autogenerated source:
+This operation verifies that the input configuration supplied by the user can be used to connect to
+the underlying data source. Note that this user-supplied configuration has the values described in
+the `spec.yaml` filled in. In other words, if the `spec.yaml` said that the source requires a
+`username` and `password`, the config object might be
+`{ "username": "airbyte", "password": "password123" }`. You should then implement something that
+returns a JSON object reporting, given the credentials in the config, whether we were able to
+connect to the source.
+
+In order to make requests to the API, we need to verify that we can access it. In our case, this
+check is fairly trivial since the API requires no credentials. Instead, let's verify that the
+user-input `base` currency is a legitimate currency. In `source.py` we'll find the following
+autogenerated source:

```python
class SourcePythonHttpTutorial(AbstractSource):

    def check_connection(self, logger, config) -> Tuple[bool, any]:
@@ -26,7 +34,8 @@ class SourcePythonHttpTutorial(AbstractSource):
        ...
```

-Following the docstring instructions, we'll change the implementation to verify that the input currency is a real currency:
+Following the docstring instructions, we'll change the implementation to verify that the input
+currency is a real currency:

```python
def check_connection(self, logger, config) -> Tuple[bool, any]:
@@ -38,9 +47,19 @@ Following the docstring instructions, we'll change the implementation to verify
        return True, None
```

-Note: in a real implementation you should write code to connect to the API to validate connectivity and not just validate inputs - for an example see `check_connection` in the [OneSignal source connector implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-onesignal/source_onesignal/source.py)
+:::info
+
+In a real implementation, you should write code to connect to the API to validate connectivity
+and not just validate inputs. For an example, see `check_connection` in the
+[OneSignal source connector implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-onesignal/source_onesignal/source.py).
+
+:::

-Let's test out this implementation by creating two objects: a valid and an invalid config and attempt to give them as input to the connector. For this section, you will need to take the API access key generated earlier and add it to both configs. Because these configs contain secrets, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default.
+Let's test out this implementation by creating two config objects, one valid and one invalid, and
+attempting to give them as input to the connector. For this section, you will need to take the API
+access key generated earlier and add it to both configs. Because these configs contain secrets, we
+recommend storing them in `secrets/config.json`, as the `secrets` directory is gitignored by
+default.

```bash
mkdir sample_files
@@ -60,4 +79,5 @@ You should see output like the following:
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid. Please input one of the following currencies: {'DKK', 'USD', 'CZK', 'BGN', 'JPY'}"}}
```

-While developing, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default.
+While developing, we recommend storing configs which contain secrets in `secrets/config.json`
+because the `secrets` directory is gitignored by default.
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/creating-the-source.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/creating-the-source.md
index bead7be49423..ed4ff875bc38 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/creating-the-source.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/creating-the-source.md
@@ -8,9 +8,15 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
 $ ./generate.sh
 ```

-This will bring up an interactive helper application. Use the arrow keys to pick a template from the list. Select the `Python HTTP API Source` template and then input the name of your connector. The application will create a new directory in airbyte/airbyte-integrations/connectors/ with the name of your new connector.
+This will bring up an interactive helper application. Use the arrow keys to pick a template from the
+list. Select the `Python HTTP API Source` template and then input the name of your connector. The
+application will create a new directory in airbyte/airbyte-integrations/connectors/ with the name of
+your new connector.

-For this walk-through we will refer to our source as `python-http-example`. The finalized source code for this tutorial can be found [here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial).
-
-The source we will build in this tutorial will pull data from the [Rates API](https://exchangeratesapi.io/), a free and open API which documents historical exchange rates for fiat currencies.
+For this walk-through we will refer to our source as `python-http-example`. The finalized source
+code for this tutorial can be found
+[here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial).
+
+The source we will build in this tutorial will pull data from the
+[Rates API](https://exchangeratesapi.io/), a free and open API which documents historical exchange
+rates for fiat currencies.
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/declare-schema.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/declare-schema.md
index b97aeb1b587b..54f15a72e5c3 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/declare-schema.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/declare-schema.md
@@ -1,15 +1,26 @@
 # Step 5: Declare the Schema

-The `discover` method of the Airbyte Protocol returns an `AirbyteCatalog`: an object which declares all the streams output by a connector and their schemas. It also declares the sync modes supported by the stream \(full refresh or incremental\). 
See the [catalog tutorial](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) for more information. +The `discover` method of the Airbyte Protocol returns an `AirbyteCatalog`: an object which declares +all the streams output by a connector and their schemas. It also declares the sync modes supported +by the stream \(full refresh or incremental\). See the +[catalog tutorial](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) for +more information. -This is a simple task with the Airbyte CDK. For each stream in our connector we'll need to: +This is a simple task with the Airbyte CDK. For each stream in our connector we'll need to: -1. Create a python `class` in `source.py` which extends `HttpStream`. -2. Place a `.json` file in the `source_/schemas/` directory. The name of the file should be the snake\_case name of the stream whose schema it describes, and its contents should be the JsonSchema describing the output from that stream. +1. Create a python `class` in `source.py` which extends `HttpStream`. +2. Place a `.json` file in the `source_/schemas/` directory. The name of the file + should be the snake_case name of the stream whose schema it describes, and its contents should be + the JsonSchema describing the output from that stream. -Let's create a class in `source.py` which extends `HttpStream`. You'll notice there are classes with extensive comments describing what needs to be done to implement various connector features. Feel free to read these classes as needed. But for the purposes of this tutorial, let's assume that we are adding classes from scratch either by deleting those generated classes or editing them to match the implementation below. +Let's create a class in `source.py` which extends `HttpStream`. You'll notice there are classes with +extensive comments describing what needs to be done to implement various connector features. Feel +free to read these classes as needed. But for the purposes of this tutorial, let's assume that we +are adding classes from scratch either by deleting those generated classes or editing them to match +the implementation below. -We'll begin by creating a stream to represent the data that we're pulling from the Exchange Rates API: +We'll begin by creating a stream to represent the data that we're pulling from the Exchange Rates +API: ```python class ExchangeRates(HttpStream): @@ -23,9 +34,9 @@ class ExchangeRates(HttpStream): return None def path( - self, - stream_state: Mapping[str, Any] = None, - stream_slice: Mapping[str, Any] = None, + self, + stream_state: Mapping[str, Any] = None, + stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None ) -> str: return "" # TODO @@ -40,7 +51,9 @@ class ExchangeRates(HttpStream): return None # TODO ``` -Note that this implementation is entirely empty -- we haven't actually done anything. We'll come back to this in the next step. But for now we just want to declare the schema of this stream. We'll declare this as a stream that the connector outputs by returning it from the `streams` method: +Note that this implementation is entirely empty -- we haven't actually done anything. We'll come +back to this in the next step. But for now we just want to declare the schema of this stream. 
We'll
+declare this as a stream that the connector outputs by returning it from the `streams` method:

```python
from airbyte_cdk.sources.streams.http.auth import NoAuth
@@ -53,26 +66,32 @@ class SourcePythonHttpTutorial(AbstractSource):

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        # NoAuth just means there is no authentication required for this API and is included for completeness.
        # Skip passing an authenticator if no authentication is required.
-        # Other authenticators are available for API token-based auth and Oauth2. 
-        auth = NoAuth() 
+        # Other authenticators are available for API token-based auth and Oauth2.
+        auth = NoAuth()
        return [ExchangeRates(authenticator=auth)]
```

-Having created this stream in code, we'll put a file `exchange_rates.json` in the `schemas/` folder. You can download the JSON file describing the output schema [here](./exchange_rates_schema.json) for convenience and place it in `schemas/`.
+Having created this stream in code, we'll put a file `exchange_rates.json` in the `schemas/` folder.
+You can download the JSON file describing the output schema [here](./exchange_rates_schema.json) for
+convenience and place it in `schemas/`.

-With `.json` schema file in place, let's see if the connector can now find this schema and produce a valid catalog:
+With the `.json` schema file in place, let's see if the connector can now find this schema and
+produce a valid catalog:

-```text
+```bash
poetry run source-python-http-example discover --config secrets/config.json # this is not a mistake, the schema file is found via the snake_case naming convention as specified above
```

You should see some output like:

-```text
+```json
{"type": "CATALOG", "catalog": {"streams": [{"name": "exchange_rates", "json_schema": {"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": {"base": {"type": "string"}, "rates": {"type": "object", "properties": {"GBP": {"type": "number"}, "HKD": {"type": "number"}, "IDR": {"type": "number"}, "PHP": {"type": "number"}, "LVL": {"type": "number"}, "INR": {"type": "number"}, "CHF": {"type": "number"}, "MXN": {"type": "number"}, "SGD": {"type": "number"}, "CZK": {"type": "number"}, "THB": {"type": "number"}, "BGN": {"type": "number"}, "EUR": {"type": "number"}, "MYR": {"type": "number"}, "NOK": {"type": "number"}, "CNY": {"type": "number"}, "HRK": {"type": "number"}, "PLN": {"type": "number"}, "LTL": {"type": "number"}, "TRY": {"type": "number"}, "ZAR": {"type": "number"}, "CAD": {"type": "number"}, "BRL": {"type": "number"}, "RON": {"type": "number"}, "DKK": {"type": "number"}, "NZD": {"type": "number"}, "EEK": {"type": "number"}, "JPY": {"type": "number"}, "RUB": {"type": "number"}, "KRW": {"type": "number"}, "USD": {"type": "number"}, "AUD": {"type": "number"}, "HUF": {"type": "number"}, "SEK": {"type": "number"}}}, "date": {"type": "string"}}}, "supported_sync_modes": ["full_refresh"]}]}}
```

-It's that simple! Now the connector knows how to declare your connector's stream's schema. We declare only one stream since our source is simple, but the principle is exactly the same if you had many streams.
-
-You can also dynamically define schemas, but that's beyond the scope of this tutorial. See the [schema docs](../../cdk-python/full-refresh-stream.md#defining-the-streams-schema) for more information.
+It's that simple! Now the connector knows how to declare its stream's schema. We declare only one
+stream since our source is simple, but the principle is exactly the same if you had many
+streams. 
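+To make the multi-stream case concrete, here is a minimal sketch of what `streams` might look like
+if the source also exposed a second endpoint. The `HistoricalRates` class is a hypothetical
+example, not part of this tutorial:
+
+```python
+from airbyte_cdk.sources.streams.http.auth import NoAuth
+
+
+class SourcePythonHttpTutorial(AbstractSource):
+
+    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
+        auth = NoAuth()
+        # One entry per stream; each stream needs its own snake_case schema file,
+        # e.g. schemas/exchange_rates.json and schemas/historical_rates.json.
+        return [
+            ExchangeRates(authenticator=auth),
+            HistoricalRates(authenticator=auth),  # hypothetical second stream
+        ]
+```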
+You can also dynamically define schemas, but that's beyond the scope of this tutorial. See the
+[schema docs](../../cdk-python/full-refresh-stream.md#defining-the-streams-schema) for more
+information.
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/define-inputs.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/define-inputs.md
index 0cbe0bce93c9..956a45219430 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/define-inputs.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/define-inputs.md
@@ -1,14 +1,25 @@
 # Step 3: Define Inputs

-Each connector declares the inputs it needs to read data from the underlying data source. This is the Airbyte Protocol's `spec` operation.
+Each connector declares the inputs it needs to read data from the underlying data source. This is
+the Airbyte Protocol's `spec` operation.

-The simplest way to implement this is by creating a `spec.yaml` file in `source_/spec.yaml` which describes your connector's inputs according to the [ConnectorSpecification](https://github.com/airbytehq/airbyte/blob/master/docs/understanding-airbyte/airbyte-protocol.md#spec) schema. This is a good place to start when developing your source. Using JsonSchema, define what the inputs are \(e.g. username and password\). Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml) of what the `spec.yaml` looks like for the Stripe API source.
+The simplest way to implement this is by creating a `spec.yaml` file in `source_/spec.yaml`
+which describes your connector's inputs according to the
+[ConnectorSpecification](https://github.com/airbytehq/airbyte/blob/master/docs/understanding-airbyte/airbyte-protocol.md#spec)
+schema. This is a good place to start when developing your source. Using JsonSchema, define what the
+inputs are \(e.g. username and password\). Here's
+[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml)
+of what the `spec.yaml` looks like for the Stripe API source.

-For more details on what the spec is, you can read about the Airbyte Protocol [here](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol).
+For more details on what the spec is, you can read about the Airbyte Protocol
+[here](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol).

-The generated code that Airbyte provides, handles implementing the `spec` method for you. It assumes that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have declared the necessary JsonSchema in `spec.yaml` you should be done with this step.
+The generated code that Airbyte provides handles implementing the `spec` method for you. It assumes
+that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have
+declared the necessary JsonSchema in `spec.yaml`, you should be done with this step.

-Given that we'll pulling currency data for our example source, we'll define the following `spec.yaml`:
+Given that we'll be pulling currency data for our example source, we'll define the following
+`spec.yaml`:

```yaml
documentationUrl: https://docs.airbyte.com/integrations/sources/exchangeratesapi
@@ -36,12 +47,13 @@ connectionSpecification:
      examples:
        - USD
        - EUR
-      description: "ISO reference currency. See here."
+      description:
+        'ISO reference currency. See here.' 
``` In addition to metadata, we define three inputs: -* `apikey`: The API access key used to authenticate requests to the API -* `start_date`: The beginning date to start tracking currency exchange rates from -* `base`: The currency whose rates we're interested in tracking - +- `apikey`: The API access key used to authenticate requests to the API +- `start_date`: The beginning date to start tracking currency exchange rates from +- `base`: The currency whose rates we're interested in tracking diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/getting-started.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/getting-started.md index 57b2fb4624f9..f97c65bd6352 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/getting-started.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/getting-started.md @@ -2,30 +2,37 @@ ## Summary -This is a step-by-step guide for how to create an Airbyte source in Python to read data from an HTTP API. We'll be using the Exchange Rates API as an example since it is simple and demonstrates a lot of the capabilities of the CDK. +This is a step-by-step guide for how to create an Airbyte source in Python to read data from an HTTP +API. We'll be using the Exchange Rates API as an example since it is simple and demonstrates a lot +of the capabilities of the CDK. ## Requirements - * Python >= 3.9 * [Poetry](https://python-poetry.org/) * Docker -All the commands below assume that `python` points to a version of python >=3.9.0. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3`. +All the commands below assume that `python` points to a version of python >=3.9.0. On some +systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the +case on your machine, substitute all `python` commands in this guide with `python3`. ## Exchange Rates API Setup -For this guide we will be making API calls to the Exchange Rates API. In order to generate the API access key that will be used by the new connector, you will have to follow steps on the [Exchange Rates Data API](https://apilayer.com/marketplace/exchangerates_data-api/) by signing up for the Free tier plan. Once you have an API access key, you can continue with the guide. +For this guide we will be making API calls to the Exchange Rates API. In order to generate the API +access key that will be used by the new connector, you will have to follow steps on the +[Exchange Rates Data API](https://apilayer.com/marketplace/exchangerates_data-api/) by signing up +for the Free tier plan. Once you have an API access key, you can continue with the guide. ## Checklist -* Step 1: Create the source using the template -* Step 2: Install dependencies for the new source -* Step 3: Define the inputs needed by your connector -* Step 4: Implement connection checking -* Step 5: Declare the schema of your streams -* Step 6: Implement functionality for reading your streams -* Step 7: Use the connector in Airbyte -* Step 8: Write unit tests or integration tests - -Each step of the Creating a Source checklist is explained in more detail in the following steps. We also mention how you can submit the connector to be included with the general Airbyte release at the end of the tutorial. 
- +- Step 1: Create the source using the template +- Step 2: Install dependencies for the new source +- Step 3: Define the inputs needed by your connector +- Step 4: Implement connection checking +- Step 5: Declare the schema of your streams +- Step 6: Implement functionality for reading your streams +- Step 7: Use the connector in Airbyte +- Step 8: Write unit tests or integration tests + +Each step of the Creating a Source checklist is explained in more detail in the following steps. We +also mention how you can submit the connector to be included with the general Airbyte release at the +end of the tutorial. diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/install-dependencies.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/install-dependencies.md index 3d7e50e22377..04a835a3c783 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/install-dependencies.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/install-dependencies.md @@ -7,7 +7,6 @@ cd ../../connectors/source- poetry install ``` - Let's verify everything is working as intended. Run: ```bash @@ -16,32 +15,43 @@ poetry run source- spec You should see some output: -```text +```json {"type": "SPEC", "spec": {"documentationUrl": "https://docsurl.com", "connectionSpecification": {"$schema": "http://json-schema.org/draft-07/schema#", "title": "Python Http Tutorial Spec", "type": "object", "required": ["TODO"], "properties": {"TODO: This schema defines the configuration required for the source. This usually involves metadata such as database and/or authentication information.": {"type": "string", "description": "describe me"}}}}} ``` -We just ran Airbyte Protocol's `spec` command! We'll talk more about this later, but this is a simple sanity check to make sure everything is wired up correctly. - +We just ran Airbyte Protocol's `spec` command! We'll talk more about this later, but this is a +simple sanity check to make sure everything is wired up correctly. ## Notes on iteration cycle ### Dependencies -Python dependencies for your source should be declared in `airbyte-integrations/connectors/source-/setup.py` in the `install_requires` field. You will notice that a couple of Airbyte dependencies are already declared there. Do not remove these; they give your source access to the helper interfaces provided by the generator. +Python dependencies for your source should be declared in +`airbyte-integrations/connectors/source-/setup.py` in the `install_requires` field. You +will notice that a couple of Airbyte dependencies are already declared there. Do not remove these; +they give your source access to the helper interfaces provided by the generator. -You may notice that there is a `requirements.txt` in your source's directory as well. Don't edit this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be declared in `setup.py`. +You may notice that there is a `requirements.txt` in your source's directory as well. Don't edit +this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be +declared in `setup.py`. ### Development Environment -The commands we ran above created a [Python virtual environment](https://docs.python.org/3/tutorial/venv.html) for your source. If you want your IDE to auto complete and resolve dependencies properly, point it at the virtual env `airbyte-integrations/connectors/source-/.venv`. 
Also anytime you change the dependencies in the `setup.py` make sure to re-run `pip install -r requirements.txt`. +The commands we ran above created a +[Python virtual environment](https://docs.python.org/3/tutorial/venv.html) for your source. If you +want your IDE to auto complete and resolve dependencies properly, point it at the virtual env +`airbyte-integrations/connectors/source-/.venv`. Also anytime you change the +dependencies in the `setup.py` make sure to re-run `pip install -r requirements.txt`. ### Iterating on your implementation -There are two ways we recommend iterating on a source. Consider using whichever one matches your style. +There are two ways we recommend iterating on a source. Consider using whichever one matches your +style. **Run the source using python** -You'll notice in your source's directory that there is a python file called `main.py`. This file exists as convenience for development. You run it to test that your source works: +You'll notice in your source's directory that there is a python file called `main.py`. This file +exists as convenience for development. You run it to test that your source works: ```bash # from airbyte-integrations/connectors/source- @@ -51,11 +61,15 @@ poetry run source- discover --config secrets/config.json poetry run source- read --config secrets/config.json --catalog sample_files/configured_catalog.json ``` -The nice thing about this approach is that you can iterate completely within python. The downside is that you are not quite running your source as it will actually be run by Airbyte. Specifically, you're not running it from within the docker container that will house it. +The nice thing about this approach is that you can iterate completely within python. The downside is +that you are not quite running your source as it will actually be run by Airbyte. Specifically, +you're not running it from within the docker container that will house it. **Run the source using docker** -If you want to run your source exactly as it will be run by Airbyte \(i.e. within a docker container\), you can use the following commands from the connector module directory \(`airbyte-integrations/connectors/source-python-http-example`\): +If you want to run your source exactly as it will be run by Airbyte \(i.e. within a docker +container\), you can use the following commands from the connector module directory +\(`airbyte-integrations/connectors/source-python-http-example`\): ```bash # First build the container @@ -68,7 +82,14 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-:dev discover -- docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/source-:dev read --config /secrets/config.json --catalog /sample_files/configured_catalog.json ``` -Note: Each time you make a change to your implementation you need to re-build the connector image via `docker build . -t airbyte/source-:dev`. This ensures the new python code is added into the docker container. +:::info + +Each time you make a change to your implementation you need to re-build the connector image +via `docker build . -t airbyte/source-:dev`. This ensures the new python code is added into +the docker container. -The nice thing about this approach is that you are running your source exactly as it will be run by Airbyte. The tradeoff is iteration is slightly slower, as the connector is re-built between each change. +::: +The nice thing about this approach is that you are running your source exactly as it will be run by +Airbyte. 
The tradeoff is that iteration is slightly slower, as the connector is re-built between each
+change.
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/read-data.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/read-data.md
index 0417bcdbde25..a2bcfee77562 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/read-data.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/read-data.md
@@ -1,36 +1,45 @@
 # Step 6: Read Data

-Describing schemas is good and all, but at some point we have to start reading data! So let's get to work. But before, let's describe what we're about to do:
+Describing schemas is good and all, but at some point we have to start reading data! So let's get to
+work. But before, let's describe what we're about to do:

-The `HttpStream` superclass, like described in the [concepts documentation](../../cdk-python/http-streams.md), is facilitating reading data from HTTP endpoints. It contains built-in functions or helpers for:
+The `HttpStream` superclass, as described in the
+[concepts documentation](../../cdk-python/http-streams.md), facilitates reading data from HTTP
+endpoints. It contains built-in functions or helpers for:

-* authentication
-* pagination
-* handling rate limiting or transient errors
-* and other useful functionality
+- authentication
+- pagination
+- handling rate limiting or transient errors
+- and other useful functionality

 In order for it to be able to do this, we have to provide it with a few inputs:

-* the URL base and path of the endpoint we'd like to hit
-* how to parse the response from the API
-* how to perform pagination
+- the URL base and path of the endpoint we'd like to hit
+- how to parse the response from the API
+- how to perform pagination

 Optionally, we can provide additional inputs to customize requests:

-* request parameters and headers
-* how to recognize rate limit errors, and how long to wait \(by default it retries 429 and 5XX errors using exponential backoff\)
-* HTTP method and request body if applicable
-* configure exponential backoff policy
+- request parameters and headers
+- how to recognize rate limit errors, and how long to wait \(by default it retries 429 and 5XX
+  errors using exponential backoff\)
+- HTTP method and request body if applicable
+- how to configure the exponential backoff policy

 Backoff policy options:

-* `retry_factor` Specifies factor for exponential backoff policy \(by default is 5\)
-* `max_retries` Specifies maximum amount of retries for backoff policy \(by default is 5\)
-* `raise_on_http_errors` If set to False, allows opting-out of raising HTTP code exception \(by default is True\)
+- `retry_factor` Specifies the factor for the exponential backoff policy \(5 by default\)
+- `max_retries` Specifies the maximum number of retries for the backoff policy \(5 by default\)
+- `raise_on_http_errors` If set to False, allows opting out of raising an exception on HTTP error
+  codes \(True by default\)

-There are many other customizable options - you can find them in the [`airbyte_cdk.sources.streams.http.HttpStream`](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py) class.
+There are many other customizable options - you can find them in the
+[`airbyte_cdk.sources.streams.http.HttpStream`](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py)
+class. 
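+As a rough sketch of how the backoff options above can be tuned, the properties can be overridden
+on a stream class. The values below are arbitrary, the base URL is a placeholder, and this assumes
+the CDK version used in this tutorial:
+
+```python
+class ExchangeRates(HttpStream):
+    url_base = "https://example.com/api/"  # placeholder; use your API's base URL
+    primary_key = None
+
+    @property
+    def retry_factor(self) -> float:
+        # Back off more aggressively than the default factor of 5.
+        return 10.0
+
+    @property
+    def max_retries(self) -> int:
+        # Give up after 3 retries instead of the default 5.
+        return 3
+
+    def path(self, **kwargs) -> str:
+        return "latest"
+
+    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
+        return [response.json()]
+
+    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
+        return None
+```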
-So in order to read data from the exchange rates API, we'll fill out the necessary information for the stream to do its work. First, we'll implement a basic read that just reads the last day's exchange rates, then we'll implement incremental sync using stream slicing. +So in order to read data from the exchange rates API, we'll fill out the necessary information for +the stream to do its work. First, we'll implement a basic read that just reads the last day's +exchange rates, then we'll implement incremental sync using stream slicing. Let's begin by pulling data for the last day's rates by using the `/latest` endpoint: @@ -47,13 +56,13 @@ class ExchangeRates(HttpStream): def path( - self, - stream_state: Mapping[str, Any] = None, - stream_slice: Mapping[str, Any] = None, + self, + stream_state: Mapping[str, Any] = None, + stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None ) -> str: # The "/latest" path gives us the latest currency exchange rates - return "latest" + return "latest" def request_headers( self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None @@ -77,23 +86,30 @@ class ExchangeRates(HttpStream): stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None, ) -> Iterable[Mapping]: - # The response is a simple JSON whose schema matches our stream's schema exactly, + # The response is a simple JSON whose schema matches our stream's schema exactly, # so we just return a list containing the response return [response.json()] def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]: - # The API does not offer pagination, + # The API does not offer pagination, # so we return None to indicate there are no more pages in the response return None ``` -This may look big, but that's just because there are lots of \(unused, for now\) parameters in these methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really we just added a few lines of "significant" code: - -1. Added a constructor `__init__` which stores the `base` currency to query for and the `apikey` used for authentication. -2. `return {'base': self.base}` to add the `?base=` query parameter to the request based on the `base` input by the user. -3. `return {'apikey': self.apikey}` to add the header `apikey=` to the request based on the `apikey` input by the user. -4. `return [response.json()]` to parse the response from the API to match the schema of our schema `.json` file. -5. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the latest exchange rate data. +This may look big, but that's just because there are lots of \(unused, for now\) parameters in these +methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really +we just added a few lines of "significant" code: + +1. Added a constructor `__init__` which stores the `base` currency to query for and the `apikey` + used for authentication. +2. `return {'base': self.base}` to add the `?base=` query parameter to the request based + on the `base` input by the user. +3. `return {'apikey': self.apikey}` to add the header `apikey=` to the request based + on the `apikey` input by the user. +4. `return [response.json()]` to parse the response from the API to match the schema of our schema + `.json` file. +5. 
`return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the
+   latest exchange rate data.

Let's also pass the config specified by the user to the stream class:

@@ -105,7 +121,11 @@ Let's also pass the config specified by the user to the stream class:

 We're now ready to query the API!

-To do this, we'll need a [ConfiguredCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md). We've prepared one [here](https://github.com/airbytehq/airbyte/blob/master/docs/connector-development/tutorials/cdk-tutorial-python-http/configured_catalog.json) -- download this and place it in `sample_files/configured_catalog.json`. Then run:
+To do this, we'll need a
+[ConfiguredCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md). We've prepared
+one
+[here](https://github.com/airbytehq/airbyte/blob/master/docs/connector-development/tutorials/cdk-tutorial-python-http/configured_catalog.json)
+-- download this and place it in `sample_files/configured_catalog.json`. Then run:

```bash
poetry run source- --config secrets/config.json --catalog sample_files/configured_catalog.json
@@ -119,20 +139,25 @@ you should see some output lines, one of which is a record from the API:

There we have it - a stream which reads data in just a few lines of code!

-We theoretically _could_ stop here and call it a connector. But let's give adding incremental sync a shot.
+We theoretically _could_ stop here and call it a connector. But let's give adding incremental sync a
+shot.

## Adding incremental sync

-To add incremental sync, we'll do a few things:
-1. Pass the `start_date` param input by the user into the stream.
-2. Declare the stream's `cursor_field`.
+To add incremental sync, we'll do a few things:
+
+1. Pass the `start_date` param input by the user into the stream.
+2. Declare the stream's `cursor_field`.
 3. Declare the stream's property `_cursor_value` to hold the state value
-4. Add `IncrementalMixin` to the list of the ancestors of the stream and implement setter and getter of the `state`.
-5. Implement the `stream_slices` method.
-6. Update the `path` method to specify the date to pull exchange rates for.
+4. Add `IncrementalMixin` to the list of the ancestors of the stream and implement setter and getter
+   of the `state`.
+5. Implement the `stream_slices` method.
+6. Update the `path` method to specify the date to pull exchange rates for.
 7. Update the configured catalog to use `incremental` sync when we're testing the stream.

-We'll describe what each of these methods do below. Before we begin, it may help to familiarize yourself with how incremental sync works in Airbyte by reading the [docs on incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md).
+We'll describe what each of these methods does below. Before we begin, it may help to familiarize
+yourself with how incremental sync works in Airbyte by reading the
+[docs on incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md).

To keep things concise, we'll only show functions as we edit them one by one.

@@ -166,11 +191,18 @@ class ExchangeRates(HttpStream, IncrementalMixin):
         self._cursor_value = None
```

-Declaring the `cursor_field` informs the framework that this stream now supports incremental sync. The next time you run `python main_dev.py discover --config secrets/config.json` you'll find that the `supported_sync_modes` field now also contains `incremental`.
+Declaring the `cursor_field` informs the framework that this stream now supports incremental sync. 
+The next time you run `python main_dev.py discover --config secrets/config.json` you'll find that
+the `supported_sync_modes` field now also contains `incremental`.

-But we're not quite done with supporting incremental, we have to actually emit state! We'll structure our state object very simply: it will be a `dict` whose single key is `'date'` and value is the date of the last day we synced data from. For example, `{'date': '2021-04-26'}` indicates the connector previously read data up until April 26th and therefore shouldn't re-read anything before April 26th.
+But we're not quite done with supporting incremental: we have to actually emit state! We'll
+structure our state object very simply: it will be a `dict` whose single key is `'date'` and whose
+value is the date of the last day we synced data from. For example, `{'date': '2021-04-26'}`
+indicates the connector previously read data up until April 26th and therefore shouldn't re-read
+anything before April 26th.

-Let's do this by implementing the getter and setter for the `state` inside the `ExchangeRates` class.
+Let's do this by implementing the getter and setter for the `state` inside the `ExchangeRates`
+class.

```python
    @property
    def state(self) -> Mapping[str, Any]:
        if self._cursor_value:
            return {self.cursor_field: self._cursor_value.strftime('%Y-%m-%d')}
        else:
            return {self.cursor_field: self.start_date.strftime('%Y-%m-%d')}
-    
+
    @state.setter
    def state(self, value: Mapping[str, Any]):
        self._cursor_value = datetime.strptime(value[self.cursor_field], '%Y-%m-%d')
@@ -197,9 +229,11 @@ Update internal state `cursor_value` inside `read_records` method

```

-This implementation compares the date from the latest record with the date in the current state and takes the maximum as the "new" state object.
+This implementation compares the date from the latest record with the date in the current state and
+takes the maximum as the "new" state object.

-We'll implement the `stream_slices` method to return a list of the dates for which we should pull data based on the stream state if it exists:
+We'll implement the `stream_slices` method to return a list of the dates for which we should pull
+data based on the stream state if it exists:

```python
    def _chunk_date_range(self, start_date: datetime) -> List[Mapping[str, Any]]:
@@ -218,18 +252,24 @@ We'll implement the `stream_slices` method to return a list of the dates for whi
        return self._chunk_date_range(start_date)
```

-Each slice will cause an HTTP request to be made to the API. We can then use the information present in the `stream_slice` parameter \(a single element from the list we constructed in `stream_slices` above\) to set other configurations for the outgoing request like `path` or `request_params`. For more info about stream slicing, see [the slicing docs](../../cdk-python/stream-slices.md).
+Each slice will cause an HTTP request to be made to the API. We can then use the information present
+in the `stream_slice` parameter \(a single element from the list we constructed in `stream_slices`
+above\) to set other configurations for the outgoing request like `path` or `request_params`. For
+more info about stream slicing, see [the slicing docs](../../cdk-python/stream-slices.md).

-In order to pull data for a specific date, the Exchange Rates API requires that we pass the date as the path component of the URL. 
Let's override the `path` method to achieve this: +In order to pull data for a specific date, the Exchange Rates API requires that we pass the date as +the path component of the URL. Let's override the `path` method to achieve this: ```python def path(self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None) -> str: return stream_slice['date'] ``` -With these changes, your implementation should look like the file [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/source.py). +With these changes, your implementation should look like the file +[here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/source.py). -The last thing we need to do is change the `sync_mode` field in the `sample_files/configured_catalog.json` to `incremental`: +The last thing we need to do is change the `sync_mode` field in the +`sample_files/configured_catalog.json` to `incremental`: ```text "sync_mode": "incremental", @@ -243,7 +283,8 @@ Let's try it out: poetry run source- --config secrets/config.json --catalog sample_files/configured_catalog.json ``` -You should see a bunch of `RECORD` messages and `STATE` messages. To verify that incremental sync is working, pass the input state back to the connector and run it again: +You should see a bunch of `RECORD` messages and `STATE` messages. To verify that incremental sync is +working, pass the input state back to the connector and run it again: ```bash # Save the latest state to sample_files/state.json @@ -253,7 +294,7 @@ poetry run source- --config secrets/config.json --catalog sample_files/con poetry run source- --config secrets/config.json --catalog sample_files/configured_catalog.json --state sample_files/state.json ``` -You should see that only the record from the last date is being synced! This is acceptable behavior, since Airbyte requires at-least-once delivery of records, so repeating the last record twice is OK. +You should see that only the record from the last date is being synced! This is acceptable behavior, +since Airbyte requires at-least-once delivery of records, so repeating the last record twice is OK. With that, we've implemented incremental sync for our connector! - diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/test-your-connector.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/test-your-connector.md index 521d8b05821f..c6fe41cc6265 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/test-your-connector.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/test-your-connector.md @@ -1,4 +1,4 @@ -# Step 8: Test Connector +# Step 8: Test the Connector ## Unit Tests @@ -8,15 +8,21 @@ You can run the tests using `poetry run pytest tests/unit_tests`. ## Integration Tests -Place any integration tests in the `integration_tests` directory such that they can be [discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery). +Place any integration tests in the `integration_tests` directory such that they can be +[discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery). You can run the tests using `poetry run pytest tests/integration_tests`. 
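+As a hedged illustration, an integration test for this tutorial's source could exercise the check
+operation against the live API. The file name and config path below are assumptions that follow the
+conventions used earlier in this tutorial:
+
+```python
+# integration_tests/test_source.py -- illustrative sketch
+import json
+from pathlib import Path
+
+from source_python_http_tutorial.source import SourcePythonHttpTutorial
+
+
+def test_check_connection_succeeds_with_valid_config():
+    # Reads the gitignored config created in the connection-checking step.
+    config = json.loads(Path("secrets/config.json").read_text())
+    # The tutorial's implementation never touches the logger, so None is fine here.
+    ok, error = SourcePythonHttpTutorial().check_connection(logger=None, config=config)
+    assert ok, f"check_connection failed: {error}"
+```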
-More information on integration testing can be found on [the Testing Connectors doc](https://docs.airbyte.com/connector-development/testing-connectors/#running-integration-tests).
+More information on integration testing can be found on
+[the Testing Connectors doc](https://docs.airbyte.com/connector-development/testing-connectors/#running-integration-tests).

-## Standard Tests
+## Connector Acceptance Tests

-Standard tests are a fixed set of tests Airbyte provides that every Airbyte source connector must pass. While they're only required if you intend to submit your connector to Airbyte, you might find them helpful in any case. See [Testing your connectors](../../testing-connectors/)
-
-If you want to submit this connector to become a default connector within Airbyte, follow steps 8 onwards from the [Python source checklist](../building-a-python-source.md#step-8-set-up-standard-tests)
+Connector Acceptance Tests (CATs) are a fixed set of tests Airbyte provides that every Airbyte
+source connector must pass. While they're only required if you intend to submit your connector
+to Airbyte, you might find them helpful in any case. See
+[Testing your connectors](../../testing-connectors/).
+
+If you want to submit this connector to become a default connector within Airbyte, follow steps 8
+onwards from the
+[Python source checklist](../building-a-python-source.md#step-8-set-up-standard-tests).
diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte.md
index db190ea87d3e..7772bcbebc1d 100644
--- a/docs/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte.md
+++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte.md
@@ -1,26 +1,32 @@
 # Step 7: Use the Connector in Airbyte

-To use your connector in your own installation of Airbyte you have to build the docker image for your connector.
-
-
+To use your connector in your own installation of Airbyte, you have to build the docker image for
+your connector.

**Option A: Building the docker image with `airbyte-ci`**

This is the preferred method for building and testing connectors.

-If you want to open source your connector we encourage you to use our [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) tool to build your connector.
-It will not use a Dockerfile but will build the connector image from our [base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md) and use our internal build logic to build an image from your Python connector code.
+If you want to open source your connector, we encourage you to use our
+[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
+tool to build your connector. It will not use a Dockerfile but will build the connector image from
+our
+[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
+and use our internal build logic to build an image from your Python connector code.

Running `airbyte-ci connectors --name source- build` will build your connector image.

-Once the command is done, you will find your connector image in your local docker host: `airbyte/source-:dev`.
-
-
+Once the command is done, you will find your connector image in your local docker host:
+`airbyte/source-:dev`. 
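+Once you have the image, you can sanity-check it by invoking the Airbyte Protocol `spec` command
+inside the container, as in the earlier docker examples. A small sketch of automating that check in
+Python; the image tag is whatever your build produced:
+
+```python
+import json
+import subprocess
+
+# Run the `spec` command inside the built connector image and confirm a SPEC message comes back.
+result = subprocess.run(
+    ["docker", "run", "--rm", "airbyte/source-example-python:dev", "spec"],
+    capture_output=True, text=True, check=True,
+)
+messages = [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
+assert any(m.get("type") == "SPEC" for m in messages), "no SPEC message emitted"
+```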
**Option B: Building the docker image with a Dockerfile** -If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image using your own Dockerfile. This method is not preferred, and is not supported for certified connectors. +If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image +using your own Dockerfile. This method is not preferred, and is not supported for certified +connectors. + +Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look +something like this: -Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look something like this: ```Dockerfile FROM airbyte/python-connector-base:1.1.0 @@ -36,11 +42,15 @@ RUN pip install ./airbyte/integration_code Please use this as an example. This is not optimized. Build your image: + ```bash docker build . -t airbyte/source-example-python:dev ``` -Then, follow the instructions from the [building a Python source tutorial](../building-a-python-source.md#step-11-add-the-connector-to-the-api-ui) for using the connector in the Airbyte UI, replacing the name as appropriate. - -Note: your built docker image must be accessible to the `docker` daemon running on the Airbyte node. If you're doing this tutorial locally, these instructions are sufficient. Otherwise you may need to push your Docker image to Dockerhub. +Then, follow the instructions from the +[building a Python source tutorial](../building-a-python-source.md#step-11-add-the-connector-to-the-api-ui) +for using the connector in the Airbyte UI, replacing the name as appropriate. +Note: your built docker image must be accessible to the `docker` daemon running on the Airbyte node. +If you're doing this tutorial locally, these instructions are sufficient. Otherwise you may need to +push your Docker image to Dockerhub. diff --git a/docs/connector-development/tutorials/profile-java-connector-memory.md b/docs/connector-development/tutorials/profile-java-connector-memory.md index e18eb9f21bd1..608e234f6b68 100644 --- a/docs/connector-development/tutorials/profile-java-connector-memory.md +++ b/docs/connector-development/tutorials/profile-java-connector-memory.md @@ -1,97 +1,119 @@ # Profile Java Connector Memory Usage -This tutorial demos how to profile the memory usage of a Java connector with Visual VM. Such profiling can be useful when we want to debug memory leaks, or optimize the connector's memory footprint. +This tutorial demos how to profile the memory usage of a Java connector with Visual VM. Such +profiling can be useful when we want to debug memory leaks, or optimize the connector's memory +footprint. -The example focuses on docker deployment, because it is more straightforward. It is also possible to apply the same procedure to Kubernetes deployments. +The example focuses on docker deployment, because it is more straightforward. It is also possible to +apply the same procedure to Kubernetes deployments. ## Prerequisite + - [Docker](https://www.docker.com/products/personal) running locally. - [VisualVM](https://visualvm.github.io/) preinstalled. ## Step-by-Step -1. Enable JMX in `airbyte-integrations/connectors//build.gradle`, and expose it on port 6000. The port is chosen arbitrary, and can be port number that's available. - - `` examples: `source-mysql`, `source-github`, `destination-snowflake`. - - ```groovy - application { - mainClass = 'io.airbyte.integrations.' 
diff --git a/docs/connector-development/tutorials/profile-java-connector-memory.md b/docs/connector-development/tutorials/profile-java-connector-memory.md
index e18eb9f21bd1..608e234f6b68 100644
--- a/docs/connector-development/tutorials/profile-java-connector-memory.md
+++ b/docs/connector-development/tutorials/profile-java-connector-memory.md
@@ -1,97 +1,119 @@
# Profile Java Connector Memory Usage

-This tutorial demos how to profile the memory usage of a Java connector with Visual VM. Such profiling can be useful when we want to debug memory leaks, or optimize the connector's memory footprint.
+This tutorial demonstrates how to profile the memory usage of a Java connector with Visual VM. Such
+profiling can be useful when we want to debug memory leaks, or optimize the connector's memory
+footprint.

-The example focuses on docker deployment, because it is more straightforward. It is also possible to apply the same procedure to Kubernetes deployments.
+The example focuses on docker deployment because it is more straightforward. It is also possible to
+apply the same procedure to Kubernetes deployments.

## Prerequisite
+
- [Docker](https://www.docker.com/products/personal) running locally.
- [VisualVM](https://visualvm.github.io/) preinstalled.

## Step-by-Step
-1. Enable JMX in `airbyte-integrations/connectors//build.gradle`, and expose it on port 6000. The port is chosen arbitrary, and can be port number that's available.
-   - `` examples: `source-mysql`, `source-github`, `destination-snowflake`.
-
-   ```groovy
-   application {
-     mainClass = 'io.airbyte.integrations.'
-     applicationDefaultJvmArgs = [
-       '-XX:+ExitOnOutOfMemoryError',
-       '-XX:MaxRAMPercentage=75.0',
-
-       // add the following JVM arguments to enable JMX:
-       '-XX:NativeMemoryTracking=detail',
-       '-XX:+UsePerfData',
-       '-Djava.rmi.server.hostname=localhost',
-       '-Dcom.sun.management.jmxremote=true',
-       '-Dcom.sun.management.jmxremote.port=6000',
-       "-Dcom.sun.management.jmxremote.rmi.port=6000",
-       '-Dcom.sun.management.jmxremote.local.only=false',
-       '-Dcom.sun.management.jmxremote.authenticate=false',
-       '-Dcom.sun.management.jmxremote.ssl=false',
-
-       // optionally, add a max heap size to limit the memory usage
-       '-Xmx2000m',
-     ]
+
+1. Enable JMX in `airbyte-integrations/connectors//build.gradle`, and expose it on
+   port 6000. The port is chosen arbitrarily and can be any available port number.
+
+   - `` examples: `source-mysql`, `source-github`, `destination-snowflake`.
+
+   ```groovy
+   application {
+     mainClass = 'io.airbyte.integrations.'
+     applicationDefaultJvmArgs = [
+       '-XX:+ExitOnOutOfMemoryError',
+       '-XX:MaxRAMPercentage=75.0',
+
+       // add the following JVM arguments to enable JMX:
+       '-XX:NativeMemoryTracking=detail',
+       '-XX:+UsePerfData',
+       '-Djava.rmi.server.hostname=localhost',
+       '-Dcom.sun.management.jmxremote=true',
+       '-Dcom.sun.management.jmxremote.port=6000',
+       '-Dcom.sun.management.jmxremote.rmi.port=6000',
+       '-Dcom.sun.management.jmxremote.local.only=false',
+       '-Dcom.sun.management.jmxremote.authenticate=false',
+       '-Dcom.sun.management.jmxremote.ssl=false',
+
+       // optionally, add a max heap size to limit the memory usage
+       '-Xmx2000m',
+     ]
   }
   ```

2. Modify `airbyte-integrations/connectors//Dockerfile` to expose the JMX port.

-   ```dockerfile
-   // optionally install procps to enable the ps command in the connector container
-   RUN apt-get update && apt-get install -y procps && rm -rf /var/lib/apt/lists/*

-   // expose the same JMX port specified in the previous step
-   EXPOSE 6000
-   ```
+   ```dockerfile
+   # optionally install procps to enable the ps command in the connector container
+   RUN apt-get update && apt-get install -y procps && rm -rf /var/lib/apt/lists/*
+
+   # expose the same JMX port specified in the previous step
+   EXPOSE 6000
+   ```

-3. Expose the same port in `airbyte-workers/src/main/java/io/airbyte/workers/process/DockerProcessFactory.java`.
+3. Expose the same port in
+   `airbyte-workers/src/main/java/io/airbyte/workers/process/DockerProcessFactory.java`.

-   ```java
-   // map local 6000 to the JMX port from the container
-   if (imageName.startsWith("airbyte/")) {
-     LOGGER.info("Exposing image {} port 6000", imageName);
-     cmd.add("-p");
-     cmd.add("6000:6000");
-   }
-   ```
+   ```java
+   // map local 6000 to the JMX port from the container
+   if (imageName.startsWith("airbyte/")) {
+     LOGGER.info("Exposing image {} port 6000", imageName);
+     cmd.add("-p");
+     cmd.add("6000:6000");
+   }
+   ```

-   Disable the [`host` network mode](https://docs.docker.com/network/host/) by _removing_ the following code block in the same file. This is necessary because under the `host` network mode, published ports are discarded.
+   Disable the [`host` network mode](https://docs.docker.com/network/host/) by _removing_ the
+   following code block in the same file. This is necessary because under the `host` network mode,
+   published ports are discarded.
-   ```java
-   if (networkName != null) {
-     cmd.add("--network");
-     cmd.add(networkName);
-   }
-   ```
+   ```java
+   if (networkName != null) {
+     cmd.add("--network");
+     cmd.add(networkName);
+   }
+   ```

-   (This [commit](https://github.com/airbytehq/airbyte/pull/10394/commits/097ec57869a64027f5b7858aa8bb9575844e8b76) can be used as a reference. It reverts them. So just do the opposite.)
+   (This
+   [commit](https://github.com/airbytehq/airbyte/pull/10394/commits/097ec57869a64027f5b7858aa8bb9575844e8b76)
+   can be used as a reference. It reverts these changes, so apply the opposite.)

-4. Build and launch Airbyte locally. It is necessary to build it because we have modified the `DockerProcessFactory.java`.
+4. Build and launch Airbyte locally. It is necessary to build it because we have modified
+   `DockerProcessFactory.java`.

-   ```sh
-   SUB_BUILD=PLATFORM ./gradlew build -x test
-   VERSION=dev docker compose up
-   ```
+   ```sh
+   SUB_BUILD=PLATFORM ./gradlew build -x test
+   VERSION=dev docker compose up
+   ```

-5. Build the connector to be profiled locally. It will create a `dev` version local image: `airbyte/:dev`.
+5. Build the connector to be profiled locally. It will create a `dev` version local image:
+   `airbyte/:dev`.

-   ```sh
-   ./gradlew :airbyte-integrations:connectors::airbyteDocker
-   ```
+   ```sh
+   ./gradlew :airbyte-integrations:connectors::airbyteDocker
+   ```

-6. Connect to the launched local Airbyte server at `localhost:8000`, go to the `Settings` page, and change the version of the connector to be profiled to `dev` which was just built in the previous step.
+6. Connect to the launched local Airbyte server at `localhost:8000`, go to the `Settings` page, and
+   change the version of the connector to be profiled to `dev`, which was just built in the previous
+   step.

7. Create a connection using the connector to be profiled.
-   - The `Replication frequency` of this connector should be `manual` so that we can control when it starts.
-   - We can use the e2e test connectors as either the source or destination for convenience.
-   - The e2e test connectors are usually very reliable, and requires little configuration.
-   - For example, if we are profiling a source connector, create an e2e test destination at the other end of the connection.
+
+   - The `Replication frequency` of this connector should be `manual` so that we can control when it
+     starts.
+   - We can use the e2e test connectors as either the source or destination for convenience.
+   - The e2e test connectors are usually very reliable and require little configuration.
+   - For example, if we are profiling a source connector, create an e2e test destination at the
+     other end of the connection.

8. Profile the connector in question.
-   - Launch a data sync run.
-   - After the run starts, open Visual VM, and click `File` / `Add JMX Connection...`. A modal will show up. Type in `localhost:6000`, and click `OK`.
-   - Now we can see a new connection shows up under the `Local` category on the left, and the information about the connector's JVM gets retrieved.
-     ![visual vm screenshot](https://visualvm.github.io/images/visualvm_screenshot_20.png)
+
+   - Launch a data sync run.
+   - After the run starts, open Visual VM, and click `File` / `Add JMX Connection...`. A modal will
+     show up. Type in `localhost:6000`, and click `OK`.
+   - Now a new connection shows up under the `Local` category on the left, and information about the
+     connector's JVM is retrieved.
+
+   ![visual vm screenshot](https://visualvm.github.io/images/visualvm_screenshot_20.png)
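+
+If the JMX connection cannot be established, it helps to first confirm that the port is reachable at
+all. A quick sketch (assuming the port mapping of 6000 configured in the steps above):
+
+```python
+import socket
+
+# Sanity check: can we open a TCP connection to the published JMX port?
+try:
+    with socket.create_connection(("localhost", 6000), timeout=5):
+        print("JMX port 6000 is reachable")
+except OSError as err:
+    print(f"cannot reach localhost:6000: {err}")
+```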
diff --git a/docs/connector-development/tutorials/adding-incremental-sync.md b/docs/connector-development/tutorials/the-hard-way/adding-incremental-sync.md
similarity index 78%
rename from docs/connector-development/tutorials/adding-incremental-sync.md
rename to docs/connector-development/tutorials/the-hard-way/adding-incremental-sync.md
index 8a454049a7dd..f3d3be401ed6 100644
--- a/docs/connector-development/tutorials/adding-incremental-sync.md
+++ b/docs/connector-development/tutorials/the-hard-way/adding-incremental-sync.md
@@ -2,13 +2,26 @@

## Overview

-This tutorial will assume that you already have a working source. If you do not, feel free to refer to the [Building a Toy Connector](build-a-connector-the-hard-way.md) tutorial. This tutorial will build directly off the example from that article. We will also assume that you have a basic understanding of how Airbyte's Incremental-Append replication strategy works. We have a brief explanation of it [here](/using-airbyte/core-concepts/sync-modes/incremental-append.md).
+This tutorial will assume that you already have a working source. If you do not, feel free to refer
+to the [Building a Toy Connector](build-a-connector-the-hard-way.md) tutorial. This tutorial will
+build directly off the example from that article. We will also assume that you have a basic
+understanding of how Airbyte's Incremental-Append replication strategy works. We have a brief
+explanation of it [here](../../../using-airbyte/core-concepts/sync-modes/incremental-append.md).

## Update Catalog in `discover`

-First we need to identify a given stream in the Source as supporting incremental. This information is declared in the catalog that the `discover` method returns. You will notice in the stream object contains a field called `supported_sync_modes`. If we are adding incremental to an existing stream, we just need to add `"incremental"` to that array. This tells Airbyte that this stream can either be synced in an incremental fashion. In practice, this will mean that in the UI, a user will have the ability to configure this type of sync.
+First, we need to identify a given stream in the Source as supporting incremental. This information
+is declared in the catalog that the `discover` method returns. You will notice that the stream
+object contains a field called `supported_sync_modes`. If we are adding incremental to an existing
+stream, we just need to add `"incremental"` to that array. This tells Airbyte that this stream can
+be synced in an incremental fashion. In practice, this will mean that in the UI, a user will have
+the ability to configure this type of sync.

-In the example we used in the Toy Connector tutorial, the `discover` method would not look like this. Note: that "incremental" has been added to the `supported_sync_modes` array. We also set `source_defined_cursor` to `True` and `default_cursor_field` to `["date"]` to declare that the Source knows what field to use for the cursor, in this case the date field, and does not require user input. Nothing else has changed.
+In the example we used in the Toy Connector tutorial, the `discover` method would now look like
+this. Note that "incremental" has been added to the `supported_sync_modes` array. We also set
+`source_defined_cursor` to `True` and `default_cursor_field` to `["date"]` to declare that the
+Source knows what field to use for the cursor, in this case the date field, and does not require
+user input. Nothing else has changed.
```python
def discover():
@@ -38,6 +51,7 @@
```

Also, create a file called `incremental_configured_catalog.json` with the following content:
+
```javascript
{
  "streams": [
@@ -73,7 +87,11 @@ Also, create a file called `incremental_configured_catalog.json` with the follow

Next we will adapt the `read` method that we wrote previously. We need to change three things.

-First, we need to pass it information about what data was replicated in the previous sync. In Airbyte this is called a `state` object. The structure of the state object is determined by the Source. This means that each Source can construct a state object that makes sense to it and does not need to worry about adhering to any other convention. That being said, a pretty typical structure for a state object is a map of stream name to the last value in the cursor field for that stream.
+First, we need to pass it information about what data was replicated in the previous sync. In
+Airbyte this is called a `state` object. The structure of the state object is determined by the
+Source. This means that each Source can construct a state object that makes sense to it and does not
+need to worry about adhering to any other convention. That being said, a pretty typical structure
+for a state object is a map of stream name to the last value in the cursor field for that stream.

In this case we might choose something like this:

@@ -85,9 +103,11 @@ In this case we might choose something like this:
}
```

-The second change we need to make to the `read` method is to use the state object so that we only emit new records.
+The second change we need to make to the `read` method is to use the state object so that we only
+emit new records.

-Lastly, we need to emit an updated state object, so that the next time this Source runs we do not resend messages that we have already sent.
+Lastly, we need to emit an updated state object, so that the next time this Source runs we do not
+resend messages that we have already sent.

Here's what our updated `read` method would look like.

@@ -150,12 +170,14 @@ def read(config, catalog, state):
```

That code requires to add a new library import in the `source.py` file:
+
```python
from datetime import timezone
```

-We will also need to parse `state` argument in the `run` method. In order to do that, we will modify the code that
-calls `read` method from `run` method:
+We will also need to parse the `state` argument in the `run` method. To do that, we will modify
+the code in the `run` method that calls the `read` method:
+
```python
elif command == "read":
    config = read_json(get_input_file_path(parsed_args.config))
@@ -166,19 +188,25 @@ calls `read` method from `run` method:
    read(config, configured_catalog, state)
```

-Finally, we need to pass more arguments to our `_call_api` method in order to fetch only new prices for incremental sync:
+
+Finally, we need to pass more arguments to our `_call_api` method in order to fetch only new prices
+for incremental sync:
+
```python
def _call_api(ticker, token, from_day, to_day):
    return requests.get(f"https://api.polygon.io/v2/aggs/ticker/{ticker}/range/1/day/{from_day}/{to_day}?sort=asc&limit=120&apiKey={token}")
```
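+
+How `from_day` and `to_day` get derived from the saved state is up to the connector. A hedged
+sketch (the helper name `get_from_day` is ours, not part of the tutorial's `source.py`), based on
+the state shape shown above:
+
+```python
+from datetime import date, timedelta
+
+def get_from_day(state, default_days_back=7):
+    # Resume from the day after the last synced date, e.g. {"stock_prices": {"date": "2022-03-11"}};
+    # otherwise fall back to a seven-day lookback, like the full-refresh read.
+    cursor = state.get("stock_prices", {}).get("date")
+    if cursor:
+        return (date.fromisoformat(cursor) + timedelta(days=1)).isoformat()
+    return (date.today() - timedelta(days=default_days_back)).isoformat()
+```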
-You will notice that in order to test these changes you need a `state` object. If you run an incremental sync
-without passing a state object, the new code will output a state object that you can use with the next sync. If you run this:
+You will notice that in order to test these changes you need a `state` object. If you run an
+incremental sync without passing a state object, the new code will output a state object that you
+can use with the next sync. If you run this:
+
```bash
python source.py read --config secrets/valid_config.json --catalog incremental_configured_catalog.json
```

The output will look like following:
+
```bash
{"type": "RECORD", "record": {"stream": "stock_prices", "data": {"date": "2022-03-07", "stock_ticker": "TSLA", "price": 804.58}, "emitted_at": 1647294277000}}
{"type": "RECORD", "record": {"stream": "stock_prices", "data": {"date": "2022-03-08", "stock_ticker": "TSLA", "price": 824.4}, "emitted_at": 1647294277000}}
@@ -189,25 +217,30 @@ The output will look like following:
```

Notice that the last line of output is the state object. Copy the state object:
+
```json
-{"stock_prices": {"date": "2022-03-11"}}
+{ "stock_prices": { "date": "2022-03-11" } }
```
+
and paste it into a new file (i.e. `state.json`). Now you can run an incremental sync:
+
```bash
-python source.py read --config secrets/valid_config.json --catalog incremental_configured_catalog.json --state state.json
+python source.py read --config secrets/valid_config.json --catalog incremental_configured_catalog.json --state state.json
```

## Run the incremental tests

-The [Source Acceptance Test (SAT) suite](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) also includes test cases to ensure that incremental mode is working correctly.
+The
+[Connector Acceptance Test (CAT) suite](../../testing-connectors/connector-acceptance-tests-reference)
+also includes test cases to ensure that incremental mode is working correctly.

To enable these tests, modify the existing `acceptance-test-config.yml` by adding the following:

```yaml
-  incremental:
-    - config_path: "secrets/valid_config.json"
-      configured_catalog_path: "incremental_configured_catalog.json"
-      future_state_path: "abnormal_state.json"
+incremental:
+  - config_path: "secrets/valid_config.json"
+    configured_catalog_path: "incremental_configured_catalog.json"
+    future_state_path: "abnormal_state.json"
```

Your full `acceptance-test-config.yml` should look something like this:

@@ -240,13 +273,16 @@ tests:
      future_state_path: "abnormal_state.json"
```

-You will also need to create an `abnormal_state.json` file with a date in the future, which should not produce any records:
+You will also need to create an `abnormal_state.json` file with a date in the future, which should
+not produce any records:

-```
+```json
{"stock_prices": {"date": "2121-01-01"}}
```

-And lastly you need to modify the `check` function call to include the new parameters `from_day` and `to_day` in `source.py`:
+And lastly, you need to modify the `check` function call to include the new parameters `from_day`
+and `to_day` in `source.py`:
+
```python
def check(config):
    # Validate input configuration by attempting to get the daily closing prices of the input stock ticker
@@ -272,8 +308,8 @@ Run the tests once again:

And finally, you should see a successful test summary:

```
-collecting ...
- test_core.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 86% ████████▋
+collecting ...
+ test_core.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 86% ████████▋
 test_full_refresh.py ✓ 91% █████████▏
 test_incremental.py ✓✓ 100% ██████████

@@ -285,14 +321,15 @@ Results (8.90s):

That's all you need to do to add incremental functionality to the stock ticker Source.

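+Copying the state by hand, as above, also works in a script. A hedged sketch (assuming, per the
+Airbyte protocol, that the read output contains STATE messages of the form
+`{"type": "STATE", "state": {"data": ...}}`):
+
+```python
+import json
+import sys
+
+# Pipe the connector's read output through this script to persist the final state,
+# e.g.: python source.py read ... | python save_state.py state.json
+last_state = None
+for line in sys.stdin:
+    message = json.loads(line)
+    if message.get("type") == "STATE":
+        last_state = message["state"]["data"]
+if last_state is not None:
+    with open(sys.argv[1], "w") as f:
+        json.dump(last_state, f)
+```
+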
You can deploy the new version of your connector simply by running:
+
```bash
./gradlew clean :airbyte-integrations:connectors:source-stock-ticker-api:build
```

Bonus points: go to Airbyte UI and reconfigure the connection to use incremental sync.

-Incremental definitely requires more configurability than full refresh, so your implementation may deviate slightly depending on whether your cursor
-field is source defined or user-defined. If you think you are running into one of those cases, check out
-our [incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md) documentation for more information on different types of
-configuration.
-
+Incremental definitely requires more configurability than full refresh, so your implementation may
+deviate slightly depending on whether your cursor field is source-defined or user-defined. If you
+think you are running into one of those cases, check out our
+[incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md) documentation for more
+information on different types of configuration.
diff --git a/docs/connector-development/tutorials/build-a-connector-the-hard-way.md b/docs/connector-development/tutorials/the-hard-way/build-a-connector-the-hard-way.md
similarity index 76%
rename from docs/connector-development/tutorials/build-a-connector-the-hard-way.md
rename to docs/connector-development/tutorials/the-hard-way/build-a-connector-the-hard-way.md
index 5f9edd2d0d58..e1713854eb4f 100644
--- a/docs/connector-development/tutorials/build-a-connector-the-hard-way.md
+++ b/docs/connector-development/tutorials/the-hard-way/build-a-connector-the-hard-way.md
@@ -1,38 +1,48 @@
---
-description: Building a source connector without using any helpers to learn the Airbyte Specification for sources
+description:
+  Building a source connector without using any helpers to learn the Airbyte Specification for
+  sources
---

# Building a Source Connector: The Hard Way

-This tutorial walks you through building a simple Airbyte source without using any helpers to demonstrate the following concepts in action:
+This tutorial walks you through building a simple Airbyte source without using any helpers to
+demonstrate the following concepts in action:

-- [The Airbyte Specification](../../understanding-airbyte/airbyte-protocol.md) and the interface implemented by a source connector
-- [The AirbyteCatalog](../../understanding-airbyte/beginners-guide-to-catalog.md)
+- [The Airbyte Specification](../../../understanding-airbyte/airbyte-protocol.md) and the interface
+  implemented by a source connector
+- [The AirbyteCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
- [Packaging your connector](https://docs.airbyte.com/connector-development#1.-implement-and-package-the-connector)
-- [Testing your connector](../testing-connectors/connector-acceptance-tests-reference.md)
+- [Testing your connector](../../testing-connectors/connector-acceptance-tests-reference.md)

:::warning
-**This tutorial is meant for those interested in learning how the Airbyte Specification works in detail,
-not for creating production connectors**.
-If you're building a real source, you should start with using the [Connector Builder](../connector-builder-ui/overview), or
-the [Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials).
+
+**This tutorial is meant for those interested in learning how the Airbyte Specification
+works in detail, not for creating production connectors**. If you're building a real source, you
+should start with the [Connector Builder](../../connector-builder-ui/overview), or the
+[Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials).
+
:::

## Requirements

To run this tutorial, you'll need:

-- Docker, Python, and Java with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md).
-- The `requests` Python package installed via `pip install requests` \(or `pip3` if `pip` is linked to a Python2 installation on your system\)
+- Docker, Python, and Java with the versions listed in the
+  [tech stack section](../../../understanding-airbyte/tech-stack.md).
+- The `requests` Python package installed via `pip install requests` \(or `pip3` if `pip` is linked
+  to a Python2 installation on your system\)

## Our connector: a stock ticker API

-The connector will output the daily price of a stock since a given date.
-We'll leverage [Polygon.io API](https://polygon.io/) for this.
+The connector will output the daily price of a stock since a given date. We'll leverage the
+[Polygon.io API](https://polygon.io/) for this.

:::info
-We'll use Python to implement the connector, but you could build an Airbyte
-connector in any language.
+
+We'll use Python to implement the connector, but you could build an Airbyte connector in any
+language.
+
:::

Here's the outline of what we'll do to build the connector:

@@ -40,7 +50,8 @@ Here's the outline of what we'll do to build the connector:
1. Use the Airbyte connector template to bootstrap the connector package
2. Implement the methods required by the Airbyte Specification for our connector:
   1. `spec`: declares the user-provided credentials or configuration needed to run the connector
-   2. `check`: tests if the connector can connect with the underlying data source with the user-provided configuration
+   2. `check`: tests if the connector can connect with the underlying data source with the
+      user-provided configuration
   3. `discover`: declares the different streams of data that this connector can output
   4. `read`: reads data from the underlying data source \(The stock ticker API\)
3. Package the connector in a Docker image
@@ -49,10 +60,10 @@ Here's the outline of what we'll do to build the connector:

[Part 2 of this article](adding-incremental-sync.md) covers:

-- Support [incremental sync](../../using-airbyte/core-concepts/sync-modes/incremental-append.md)
+- Support [incremental sync](../../../using-airbyte/core-concepts/sync-modes/incremental-append.md)
- Add custom integration tests

-Let's get started!
+Let's get started!

---

### 1. Bootstrap the connector package

@@ -65,7 +76,8 @@
$ pwd
/Users/sherifnada/code/airbyte
```

-Airbyte provides a code generator which bootstraps the scaffolding for our connector. Let's use it by running:
+Airbyte provides a code generator which bootstraps the scaffolding for our connector. Let's use it
+by running:

```bash
$ cd airbyte-integrations/connector-templates/generator
@@ -74,14 +86,15 @@ $ ./generate.sh

Select the `Generic Source` template and call the connector `stock-ticker-api`:

-![](../../.gitbook/assets/newsourcetutorial_plop.gif)
+![](../../../.gitbook/assets/newsourcetutorial_plop.gif)

:::info
-This tutorial uses the bare-bones `Generic Source` template to illustrate how all the pieces of a connector
-work together. For real connectors, the generator provides `Python` and `Python HTTP API` source templates, they use
-[Airbyte CDK](../cdk-python/README.md).
-:::
+This tutorial uses the bare-bones `Generic Source` template to illustrate how all the pieces
+of a connector work together. For real connectors, the generator provides `Python` and
+`Python HTTP API` source templates, which use the [Airbyte CDK](../../cdk-python/README.md).
+
+:::

```bash
$ cd ../../connectors/source-stock-ticker-api
@@ -91,7 +104,8 @@ Dockerfile README.md acceptance-test-config.yml

### 2. Implement the connector in line with the Airbyte Specification

-In the connector package directory, create a single Python file `source.py` that will hold our implementation:
+In the connector package directory, create a single Python file `source.py` that will hold our
+implementation:

```bash
touch source.py
@@ -99,20 +113,27 @@ touch source.py

#### Implement the spec operation

-The `spec` operation is described in the [Airbyte Protocol](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#spec).
-It's a way for the connector to tell Airbyte what user inputs it needs in order to connecto to the source (the stock
-ticker API in our case). Airbyte expects the command to output a connector specification in `AirbyteMessage` format.
+The `spec` operation is described in the
+[Airbyte Protocol](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#spec). It's a
+way for the connector to tell Airbyte what user inputs it needs in order to connect to the source
+(the stock ticker API in our case). Airbyte expects the command to output a connector specification
+in `AirbyteMessage` format.

To contact the stock ticker API, we need two things:

1. Which stock ticker we're interested in
-2. The API key to use when contacting the API \(you can obtain a free API token from [Polygon.io](https://polygon.io/dashboard/signup) free plan\)
+2. The API key to use when contacting the API \(you can obtain a free API token from the
+   [Polygon.io](https://polygon.io/dashboard/signup) free plan\)
+
+:::info
+
+For reference, the API docs we'll be using
+[can be found here](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to).

-:::info
-For reference, the API docs we'll be using [can be found here](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to).
:::

-Let's create a [JSONSchema](http://json-schema.org/) file `spec.json` encoding these two requirements:
+Let's create a [JSONSchema](http://json-schema.org/) file `spec.json` encoding these two
+requirements:

```javascript
{
@@ -139,11 +160,15 @@ Let's create a [JSONSchema](http://json-schema.org/) file `spec.json` encoding t
}
```

-- `documentationUrl` is the URL that will appear in the UI for the user to gain more info about this connector. Typically this points to `docs.airbyte.com/integrations/sources/source-` but to keep things simple we won't show adding documentation
-- `title` is the "human readable" title displayed in the UI. Without this field, The Stock Ticker field will have the title `stock_ticker` in the UI
+- `documentationUrl` is the URL that will appear in the UI for the user to gain more info about this
+  connector. Typically this points to
+  `docs.airbyte.com/integrations/sources/source-` but to keep things simple we won't
+  show adding documentation
+- `title` is the "human readable" title displayed in the UI. Without this field, the Stock Ticker
+  field will have the title `stock_ticker` in the UI
- `description` will be shown in the Airbyte UI under each field to help the user understand it
-- `airbyte_secret` used by Airbyte to determine if the field should be displayed as a password \(e.g: `********`\) in the UI and not readable from the API
-
+- `airbyte_secret` is used by Airbyte to determine if the field should be displayed as a password
+  \(e.g: `********`\) in the UI and not readable from the API
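+
+To sanity-check that a config satisfies this schema, one option is the third-party `jsonschema`
+package (an assumption — the tutorial itself doesn't use it), and we also assume `spec.json` wraps
+the schema in the standard `connectionSpecification` key:
+
+```python
+import json
+
+from jsonschema import validate  # pip install jsonschema
+
+# Validate a sample config against the schema we just wrote in spec.json.
+with open("spec.json") as f:
+    spec = json.load(f)
+
+sample_config = {"stock_ticker": "TSLA", "api_key": "put_your_key_here"}
+validate(instance=sample_config, schema=spec["connectionSpecification"])
+print("sample config matches spec.json")
+```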
```bash
$ ls -1
@@ -155,7 +180,8 @@ metadata.yaml
spec.json
```

-Now, let's edit `source.py` to detect if the program was invoked with the `spec` argument and if so, output the connector specification:
+Now, let's edit `source.py` to detect if the program was invoked with the `spec` argument and if so,
+output the connector specification:

```python
# source.py
@@ -228,10 +254,13 @@ if __name__ == "__main__":

Some notes on the above code:

-1. As described in the [specification](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#key-takeaways),
-   Airbyte connectors are CLIs which communicate via stdout, so the output of the command is simply a JSON string
-   formatted according to the Airbyte Specification. So to "return" a value we use `print` to output the return value to stdout.
-2. All Airbyte commands can output log messages that take the form `{"type":"LOG", "log":"message"}`, so we create a helper method `log(message)` to allow logging.
+1. As described in the
+   [specification](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#key-takeaways),
+   Airbyte connectors are CLIs which communicate via stdout, so the output of the command is simply
+   a JSON string formatted according to the Airbyte Specification. So to "return" a value, we use
+   `print` to output the return value to stdout.
+2. All Airbyte commands can output log messages that take the form
+   `{"type":"LOG", "log":"message"}`, so we create a helper method `log(message)` to allow logging.
3. All Airbyte commands can output error messages that take the form
   `{"type":"TRACE", "trace": {"type": "ERROR", "emitted_at": current_time_in_ms, "error": {"message": error_message}}}}`,
   so we create a helper method `log_error(message)` to allow error messages.
@@ -245,17 +274,21 @@ python source.py spec

#### Implementing check connection

-The second command to implement is the [check operation](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check) `check --config `,
-which tells the user whether a config file they gave us is correct. In our case, "correct" means they input a valid
-stock ticker and a correct API key like we declare via the `spec` operation.
+The second command to implement is the
+[check operation](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check)
+`check --config `, which tells the user whether a config file they gave us is correct.
+In our case, "correct" means they input a valid stock ticker and a correct API key, as declared
+via the `spec` operation.

To achieve this, we'll:

-1. Create valid and invalid configuration files to test the success and failure cases with our connector.
-   We'll place config files in the `secrets/` directory which is gitignored everywhere in the Airbyte monorepo by
-   default to avoid accidentally checking in API keys.
-2. Add a `check` method which calls the Polygon.io API to verify if the provided token & stock ticker are correct and output the correct airbyte message.
-3. Extend the argument parser to recognize the `check --config ` command and call the `check` method when the `check` command is invoked.
+1. Create valid and invalid configuration files to test the success and failure cases with our
+   connector. We'll place config files in the `secrets/` directory, which is gitignored everywhere
+   in the Airbyte monorepo by default to avoid accidentally checking in API keys.
+2. Add a `check` method which calls the Polygon.io API to verify if the provided token & stock
+   ticker are correct and outputs the corresponding Airbyte message.
+3. Extend the argument parser to recognize the `check --config ` command and call the
+   `check` method when the `check` command is invoked.

Let's first add the configuration files:

@@ -265,7 +298,8 @@ $ echo '{"api_key": "put_your_key_here", "stock_ticker": "TSLA"}' > secrets/vali
$ echo '{"api_key": "not_a_real_key", "stock_ticker": "TSLA"}' > secrets/invalid_config.json
```

-Make sure to add your actual API key instead of the placeholder value `` when following the tutorial.
+Make sure to add your actual API key instead of the placeholder value `` when
+following the tutorial.

Then we'll add the `check` method:

@@ -297,8 +331,8 @@ def check(config):
    print(json.dumps(output_message))
```

-In Airbyte, the contract for input files is that they will be available in the current working directory if they are not provided as an absolute path.
-This method helps us achieve that:
+In Airbyte, the contract for input files is that they will be available in the current working
+directory if they are not provided as an absolute path. This method helps us achieve that:

```python
def get_input_file_path(path):
@@ -352,19 +386,30 @@ $ python source.py check --config secrets/invalid_config.json
{'type': 'CONNECTION_STATUS', 'connectionStatus': {'status': 'FAILED', 'message': 'API Key is incorrect.'}}
```

-Our connector is able to detect valid and invalid configs correctly. Two methods down, two more to go!
+Our connector is able to detect valid and invalid configs correctly. Two methods down, two more to
+go!

#### Implementing Discover

-The `discover` command outputs a Catalog, a struct that declares the Streams and Fields \(Airbyte's equivalents of tables and columns\) output by the connector. It also includes metadata around which features a connector supports \(e.g. which sync modes\). In other words it describes what data is available in the source. If you'd like to read a bit more about this concept check out our [Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md) or for a more detailed treatment read the [Airbyte Specification](../../understanding-airbyte/airbyte-protocol.md).
+The `discover` command outputs a Catalog, a struct that declares the Streams and Fields \(Airbyte's
+equivalents of tables and columns\) output by the connector. It also includes metadata around which
+features a connector supports \(e.g. which sync modes\). In other words, it describes what data is
+available in the source. If you'd like to read a bit more about this concept, check out our
+[Beginner's Guide to the Airbyte Catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
+or, for a more detailed treatment, read the
+[Airbyte Specification](../../../understanding-airbyte/airbyte-protocol.md).

-The stock ticker connector outputs records belonging to exactly one Stream \(table\).
-Each record contains three Fields \(columns\): `date`, `price`, and `stock_ticker`, corresponding to the price of a stock on a given day.
+The stock ticker connector outputs records belonging to exactly one Stream \(table\). Each record
+contains three Fields \(columns\): `date`, `price`, and `stock_ticker`, corresponding to the price
+of a stock on a given day.

To implement `discover`, we'll:

-1. Add a method `discover` in `source.py` which outputs the Catalog. To better understand what a catalog is, check out our [Beginner's Guide to the AirbyteCatalog](../../understanding-airbyte/beginners-guide-to-catalog.md)
-2. Extend the arguments parser to use detect the `discover --config ` command and call the `discover` method
+1. Add a method `discover` in `source.py` which outputs the Catalog. To better understand what a
+   catalog is, check out our
+   [Beginner's Guide to the AirbyteCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
+2. Extend the arguments parser to detect the `discover --config ` command and call
+   the `discover` method

Let's implement `discover` by adding the following in `source.py`:

@@ -416,8 +461,15 @@ We need to update our list of available commands:

```python
log("Invalid command. Allowable commands: [spec, check, discover]")
```
+
:::info
-You may be wondering why `config` is a required input to `discover` if it's not used. This is done for consistency: the Airbyte Specification requires `--config` as an input to `discover` because many sources require it \(e.g: to discover the tables available in a Postgres database, you must supply a password\). So instead of guessing whether the flag is required depending on the connector, we always assume it is required, and the connector can choose whether to use it.
+
+You may be wondering why `config` is a required input to `discover` if it's not used. This
+is done for consistency: the Airbyte Specification requires `--config` as an input to `discover`
+because many sources require it \(e.g: to discover the tables available in a Postgres database, you
+must supply a password\). So instead of guessing whether the flag is required depending on the
+connector, we always assume it is required, and the connector can choose whether to use it.
+
:::

The full run method is now below:

@@ -473,27 +525,43 @@ With that, we're done implementing the `discover` command.

#### Implementing the read operation

-We've done a lot so far, but a connector ultimately exists to read data! This is where the [`read` command](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#read) comes in. The format of the command is:
+We've done a lot so far, but a connector ultimately exists to read data! This is where the
+[`read` command](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#read) comes in.
+The format of the command is:

```bash
python source.py read --config --catalog [--state ]
```

-Each of these are described in the Airbyte Specification in detail, but we'll give a quick description of the two options we haven't seen so far:
-
-- `--catalog` points to a Configured Catalog. The Configured Catalog contains the contents for the Catalog \(remember the Catalog we output from discover?\). It also contains some configuration information that describes how the data will by replicated. For example, we had `supported_sync_modes` in the Catalog. In the Configured Catalog, we select which of the `supported_sync_modes` we want to use by specifying the `sync_mode` field. \(This is the most complicated concept when working Airbyte, so if it is still not making sense that's okay for now. If you're just dying to understand how the Configured Catalog works checkout the [Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md)\).
-- `--state` points to a state file. The state file is only relevant when some Streams are synced with the sync mode `incremental`, so we'll cover the state file in more detail in the incremental section below.
-
-Our connector only supports one Stream, `stock_prices`, so we'd expect the input catalog to contain that stream configured to sync in full refresh.
-Since our connector doesn't support incremental sync yet, we'll ignore the state option for now.
+Each of these is described in the Airbyte Specification in detail, but we'll give a quick
+description of the two options we haven't seen so far:
+
+- `--catalog` points to a Configured Catalog. The Configured Catalog contains the contents for the
+  Catalog \(remember the Catalog we output from discover?\). It also contains some configuration
+  information that describes how the data will be replicated. For example, we had
+  `supported_sync_modes` in the Catalog. In the Configured Catalog, we select which of the
+  `supported_sync_modes` we want to use by specifying the `sync_mode` field. \(This is the most
+  complicated concept when working with Airbyte, so if it is still not making sense that's okay for
+  now. If you're just dying to understand how the Configured Catalog works, check out the
+  [Beginner's Guide to the Airbyte Catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)\).
+- `--state` points to a state file. The state file is only relevant when some Streams are synced
+  with the sync mode `incremental`, so we'll cover the state file in more detail in the incremental
+  section below.
+
+Our connector only supports one Stream, `stock_prices`, so we'd expect the input catalog to contain
+that stream configured to sync in full refresh. Since our connector doesn't support incremental sync
+yet, we'll ignore the state option for now.

To read data in our connector, we'll:

-1. Create a configured catalog which tells our connector that we want to sync the `stock_prices` stream
-2. Implement a method `read` in `source.py`. For now we'll always read the last 7 days of a stock price's data
+1. Create a configured catalog which tells our connector that we want to sync the `stock_prices`
+   stream
+2. Implement a method `read` in `source.py`. For now, we'll always read the last 7 days of a stock
+   price's data
3. Extend the arguments parser to recognize the `read` command and its arguments

-First, let's create a configured catalog `fullrefresh_configured_catalog.json` to use as test input for the read operation:
+First, let's create a configured catalog `fullrefresh_configured_catalog.json` to use as test input
+for the read operation:

```javascript
{
@@ -573,7 +641,9 @@ def read(config, catalog):
    print(json.dumps(output_message))
```

-After doing some input validation, the code above calls the API to obtain daily prices for the input stock ticker, then outputs the prices. As always, our output is formatted according to the Airbyte Specification. Let's update our args parser with the following blocks:
+After doing some input validation, the code above calls the API to obtain daily prices for the input
+stock ticker, then outputs the prices. As always, our output is formatted according to the Airbyte
+Specification.
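+
+As an aside, the configured catalog is plain JSON, so it is easy to inspect which streams and sync
+modes a sync run asks for. A small sketch (the helper name is ours, not part of `source.py`):
+
+```python
+import json
+
+def selected_streams(configured_catalog):
+    # Map each configured stream's name to the sync mode chosen for it.
+    return {
+        s["stream"]["name"]: s["sync_mode"]
+        for s in configured_catalog["streams"]
+    }
+
+with open("fullrefresh_configured_catalog.json") as f:
+    print(selected_streams(json.load(f)))  # {'stock_prices': 'full_refresh'}
+```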
+Let's update our args parser with the following blocks:

```python
# Accept the read command
@@ -667,7 +737,8 @@ $ python source.py read --config secrets/valid_config.json --catalog fullrefresh
{'type': 'RECORD', 'record': {'stream': 'stock_prices', 'data': {'date': '2020-12-21', 'stock_ticker': 'TSLA', 'price': 649.86}, 'emitted_at': 1608626365000}}
```

-With this method, we now have a fully functioning connector! Let's pat ourselves on the back for getting there.
+With this method, we now have a fully functioning connector! Let's pat ourselves on the back for
+getting there.

For reference, the full `source.py` file now looks like this:

@@ -868,13 +939,15 @@ if __name__ == "__main__":
    main()
```

-A full connector in about 200 lines of code. Not bad! We're now ready to package & test our connector then use it in the Airbyte UI.
+A full connector in about 200 lines of code. Not bad! We're now ready to package & test our
+connector, then use it in the Airbyte UI.

---

### 3. Package the connector in a Docker image

-Our connector is very lightweight, so the Dockerfile needed to run it is very light as well. Edit the `Dockerfile` as follows:
+Our connector is very lightweight, so the Dockerfile needed to run it is very light as well. Edit
+the `Dockerfile` as follows:

```Dockerfile
FROM python:3.9-slim
@@ -905,8 +978,10 @@ Once we save the `Dockerfile`, we can build the image by running:
docker build . -t airbyte/source-stock-ticker-api:dev
```

-To run any of our commands, we'll need to mount all the inputs into the Docker container first, then refer to their _mounted_ paths when invoking the connector.
-This allows the connector to access your secrets without having to build them into the container. For example, we'd run `check` or `read` as follows:
+To run any of our commands, we'll need to mount all the inputs into the Docker container first, then
+refer to their _mounted_ paths when invoking the connector. This allows the connector to access your
+secrets without having to build them into the container. For example, we'd run `check` or `read` as
+follows:

```bash
$ docker run airbyte/source-stock-ticker-api:dev spec
@@ -930,11 +1005,17 @@ $ docker run -v $(pwd)/secrets/valid_config.json:/data/config.json -v $(pwd)/ful

### 4. Test the connector

-The minimum requirement for testing your connector is to pass the [Connector Acceptance Test](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) suite. The connector acceptence test is a blackbox test suite containing a number of tests that validate your connector behaves as intended by the Airbyte Specification. You're encouraged to add custom test cases for your connector where it makes sense to do so e.g: to test edge cases that are not covered by the standard suite. But at the very least, your connector must pass Airbyte's acceptance test suite.
+The minimum requirement for testing your connector is to pass the
+[Connector Acceptance Test](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference)
+suite. The connector acceptance test is a blackbox test suite containing a number of tests that
+validate your connector behaves as intended by the Airbyte Specification. You're encouraged to add
+custom test cases for your connector where it makes sense to do so, e.g., to test edge cases that
+are not covered by the standard suite. But at the very least, your connector must pass Airbyte's
+acceptance test suite.
-The code generator makes a minimal acceptance test configuration. Let's modify it as follows to setup
-tests for each operation with valid and invalid credentials. Edit `acceptance-test-config.yaml` to look
-as follows:
+The code generator makes a minimal acceptance test configuration. Let's modify it as follows to
+set up tests for each operation with valid and invalid credentials. Edit
+`acceptance-test-config.yaml` to look as follows:

```yaml
# See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference)
@@ -969,8 +1050,11 @@ acceptance_tests:
#     configured_catalog_path: "integration_tests/configured_catalog.json"
#     future_state_path: "integration_tests/abnormal_state.json"
```
-To run the test suite, we'll use [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md).
-You can build and install `airbyte-ci` locally from Airbyte repository root by running `make`. Assuming you have it already:
+
+To run the test suite, we'll use
+[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md).
+You can build and install `airbyte-ci` locally from the Airbyte repository root by running `make`.
+Assuming you have it already:

```shell
airbyte-ci connectors --name= --use-remote-secrets=false test
@@ -978,7 +1062,8 @@ airbyte-ci connectors --name= -

`airbyte-ci` will build and then test your connector, and provide a report on the test results.

-That's it! We've created a fully functioning connector. Now let's get to the exciting part: using it from the Airbyte UI.
+That's it! We've created a fully functioning connector. Now let's get to the exciting part: using it
+from the Airbyte UI.

---

@@ -992,22 +1077,28 @@ Let's recap what we've achieved so far:

To use it from the Airbyte UI, we need to:

-1. Publish our connector's Docker image somewhere accessible by Airbyte Core \(Airbyte's server, scheduler, workers, and webapp infrastructure\)
-2. Add the connector via the Airbyte UI and setup a connection from our new connector to a local CSV file for illustration purposes
+1. Publish our connector's Docker image somewhere accessible by Airbyte Core \(Airbyte's server,
+   scheduler, workers, and webapp infrastructure\)
+2. Add the connector via the Airbyte UI and set up a connection from our new connector to a local
+   CSV file for illustration purposes
3. Run a sync and inspect the output

#### 1. Publish the Docker image

-Since we're running this tutorial locally, Airbyte will have access to any Docker images available to your local `docker` daemon. So all we need to do is build & tag our connector.
-For real production connectors to be available on Airbyte Cloud, you'd need to publish them on DockerHub.
+Since we're running this tutorial locally, Airbyte will have access to any Docker images available
+to your local `docker` daemon. So all we need to do is build & tag our connector. For real
+production connectors to be available on Airbyte Cloud, you'd need to publish them on DockerHub.

-Airbyte's build system builds and tags your connector's image correctly by default as part of the connector's standard `build` process. **From the Airbyte repo root**, run:
+Airbyte's build system builds and tags your connector's image correctly by default as part of the
+connector's standard `build` process. **From the Airbyte repo root**, run:
```bash
./gradlew clean :airbyte-integrations:connectors:source-stock-ticker-api:build
```

-This is the equivalent of running `docker build . -t airbyte/source-stock-ticker-api:dev` from the connector root, where the tag `airbyte/source-stock-ticker-api` is extracted from the label `LABEL io.airbyte.name` inside your `Dockerfile`.
+This is the equivalent of running `docker build . -t airbyte/source-stock-ticker-api:dev` from the
+connector root, where the tag `airbyte/source-stock-ticker-api` is extracted from the label
+`LABEL io.airbyte.name` inside your `Dockerfile`.

Verify the image was built by running:

@@ -1020,17 +1111,20 @@ $ docker images | head
    1caf57c72afd   3 hours ago     121MB
```

-`airbyte/source-stock-ticker-api` was built and tagged with the `dev` tag. Now let's head to the last step.
+`airbyte/source-stock-ticker-api` was built and tagged with the `dev` tag. Now let's head to the
+last step.

#### 2. Add the connector via the Airbyte UI

-If the Airbyte server isn't already running, start it by running **from the Airbyte repository root**:
+If the Airbyte server isn't already running, start it by running **from the Airbyte repository
+root**:

```bash
docker compose up
```

-When Airbyte server is done starting up, it prints the following banner in the log output \(it can take 10-20 seconds for the server to start\):
+When the Airbyte server is done starting up, it prints the following banner in the log output \(it
+can take 10-20 seconds for the server to start\):

```bash
airbyte-server | 2022-03-11 18:38:33 INFO i.a.s.ServerApp(start):121 -
@@ -1047,79 +1141,90 @@ airbyte-server | Version: dev
airbyte-server |
```

-After you see the above banner printed out in the terminal window where you are running `docker compose up`, visit [http://localhost:8000](http://localhost:8000) in your browser and log in with the default credentials: username `airbyte` and password `password`.
+After you see the above banner printed out in the terminal window where you are running
+`docker compose up`, visit [http://localhost:8000](http://localhost:8000) in your browser and log in
+with the default credentials: username `airbyte` and password `password`.

-If this is the first time using the Airbyte UI, then you will be prompted to go through a first-time wizard. To skip it, click the "Skip Onboarding" button.
+If this is the first time using the Airbyte UI, then you will be prompted to go through a first-time
+wizard. To skip it, click the "Skip Onboarding" button.

In the UI, click the "Settings" button in the left side bar:

-![](../../.gitbook/assets/newsourcetutorial_sidebar_settings.png)
+![](../../../.gitbook/assets/newsourcetutorial_sidebar_settings.png)

Then on the Settings page, select Sources

-![](../../.gitbook/assets/newsourcetutorial_settings_page.png)
+![](../../../.gitbook/assets/newsourcetutorial_settings_page.png)

Then on the Settings/Sources page, click "+ New Connector" button at the top right:

-![](../../.gitbook/assets/newsourcetutorial_settings_sources_newconnector.png)
+![](../../../.gitbook/assets/newsourcetutorial_settings_sources_newconnector.png)

On the modal that pops up, enter the following information then click "Add"

-![](../../.gitbook/assets/newsourcetutorial_new_connector_modal.png)
+![](../../../.gitbook/assets/newsourcetutorial_new_connector_modal.png)

-After you click "Add", the modal will close and you will be back at the Settings page.
-Now click "Sources" in the navigation bar on the left:
+After you click "Add", the modal will close and you will be back at the Settings page. Now click
+"Sources" in the navigation bar on the left:

-![](../../.gitbook/assets/newsourcetutorial_sources_navbar.png)
+![](../../../.gitbook/assets/newsourcetutorial_sources_navbar.png)

-You will be redirected to Sources page, which, if you have not set up any connections, will be empty.
-On the Sources page click "+ new source" in the top right corner:
+You will be redirected to the Sources page, which, if you have not set up any connections, will be
+empty. On the Sources page, click "+ new source" in the top right corner:

-![](../../.gitbook/assets/newsourcetutorial_sources_page.png)
+![](../../../.gitbook/assets/newsourcetutorial_sources_page.png)

A new modal will prompt you for details of the new source. Type "Stock Ticker" in the Name field.

-Then, find your connector in the Source type dropdown. We have lots of connectors already, so it might be easier
-to find your connector by typing part of its name:
+Then, find your connector in the Source type dropdown. We have lots of connectors already, so it
+might be easier to find your connector by typing part of its name:

-![](../../.gitbook/assets/newsourcetutorial_find_your_connector.png)
+![](../../../.gitbook/assets/newsourcetutorial_find_your_connector.png)

-After you select your connector in the Source type dropdown, the modal will show two more fields: API Key and Stock Ticker.
-Remember that `spec.json` file you created at the very beginning of this tutorial? These fields should correspond to the `properties`
-section of that file. Copy-paste your Polygon.io API key and a stock ticker into these fields and then click "Set up source"
-button at the bottom right of the modal.
+After you select your connector in the Source type dropdown, the modal will show two more fields:
+API Key and Stock Ticker. Remember that `spec.json` file you created at the very beginning of this
+tutorial? These fields should correspond to the `properties` section of that file. Copy-paste your
+Polygon.io API key and a stock ticker into these fields and then click the "Set up source" button at
+the bottom right of the modal.

-![](../../.gitbook/assets/newsourcetutorial_source_config.png)
+![](../../../.gitbook/assets/newsourcetutorial_source_config.png)

-Once you click "Set up source", Airbyte will spin up your connector and run "check" method to verify the configuration.
-You will see a progress bar briefly and if the configuration is valid, you will see a success message,
-the modal will close and you will see your connector on the updated Sources page.
+Once you click "Set up source", Airbyte will spin up your connector and run the "check" method to
+verify the configuration. You will see a progress bar briefly, and if the configuration is valid,
+you will see a success message; the modal will close, and you will see your connector on the updated
+Sources page.

-![](../../.gitbook/assets/newsourcetutorial_sources_stock_ticker.png)
+![](../../../.gitbook/assets/newsourcetutorial_sources_stock_ticker.png)

-Next step is to add a destination. On the same page, click "add destination" and then click "+ add a new destination":
+The next step is to add a destination. On the same page, click "add destination" and then click
+"+ add a new destination":

-![](../../.gitbook/assets/newsourcetutorial_add_destination_new_destination.png)
+![](../../../.gitbook/assets/newsourcetutorial_add_destination_new_destination.png)

-"New destination" wizard will show up. Type a name (e.g. "Local JSON") into the Name field and select "Local JSON" in Destination type drop-down.
-After you select the destination type, type `/local/tutorial_json` into Destination path field.
-When we run syncs, we'll find the output on our local filesystem in `/tmp/airbyte_local/tutorial_json`.
+The "New destination" wizard will show up. Type a name (e.g. "Local JSON") into the Name field and
+select "Local JSON" in the Destination type drop-down. After you select the destination type, type
+`/local/tutorial_json` into the Destination path field. When we run syncs, we'll find the output on
+our local filesystem in `/tmp/airbyte_local/tutorial_json`.

Click "Set up destination" at the lower right of the form.

-![](../../.gitbook/assets/newsourcetutorial_add_destination.png)
+![](../../../.gitbook/assets/newsourcetutorial_add_destination.png)

-After that Airbyte will test the destination and prompt you to configure the connection between Stock Ticker source and Local JSON destination.
-Select "Mirror source structure" in the Destination Namespace, check the checkbox next to the stock_prices stream, and click "Set up connection" button at the bottom of the form:
+After that, Airbyte will test the destination and prompt you to configure the connection between
+the Stock Ticker source and the Local JSON destination. Select "Mirror source structure" in the
+Destination Namespace, check the checkbox next to the stock_prices stream, and click the "Set up
+connection" button at the bottom of the form:

-![](../../.gitbook/assets/newsourcetutorial_configure_connection.png)
+![](../../../.gitbook/assets/newsourcetutorial_configure_connection.png)

-Ta-da! Your connection is now configured to sync once a day. You will see your new connection on the next screen:
+Ta-da! Your connection is now configured to sync once a day. You will see your new connection on the
+next screen:

-![](../../.gitbook/assets/newsourcetutorial_connection_done.png)
+![](../../../.gitbook/assets/newsourcetutorial_connection_done.png)

-Airbyte will run the first sync job as soon as your connection is saved. Navigate to "Connections" in the side bar and wait for the first sync to succeed:
+Airbyte will run the first sync job as soon as your connection is saved. Navigate to "Connections"
+in the side bar and wait for the first sync to succeed:

-![](../../.gitbook/assets/newsourcetutorial_first_sync.png)
+![](../../../.gitbook/assets/newsourcetutorial_first_sync.png)

Let's verify the output. From your shell, run:

@@ -1132,14 +1237,17 @@ $ cat /tmp/airbyte_local/tutorial_json/_airbyte_raw_stock_prices.jsonl
{"_airbyte_ab_id":"0b7a8d33-4500-4a6d-9d74-11716bd22f01","_airbyte_emitted_at":1647026803000,"_airbyte_data":{"date":"2022-03-10","stock_ticker":"TSLA","price":838.3}}
```

-Congratulations! We've successfully written a fully functioning Airbyte connector. You're an Airbyte contributor now ;\)
+Congratulations! We've successfully written a fully functioning Airbyte connector. You're an Airbyte
+contributor now ;\)

1. Follow the [next tutorial](adding-incremental-sync.md) to implement incremental sync.
-2. Implement another connector using the Low-code CDK, [Connector Builder](../connector-builder-ui/overview), or [Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials)
-3. We welcome low-code configuration based connector contributions! If you make a connector in the connector builder
-   and want to share it with everyone using Airbyte, pull requests are welcome!
+2. Implement another connector using the Low-code CDK,
+   [Connector Builder](../../connector-builder-ui/overview.md), or
+   [Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials)
+3. We welcome low-code configuration-based connector contributions! If you make a connector in the
+   connector builder and want to share it with everyone using Airbyte, pull requests are welcome!

## Additional guides

-- [Building a Python Source](https://docs.airbyte.com/connector-development/tutorials/building-a-python-source)
+- [Building a Python Source](https://docs.airbyte.com/connector-development/tutorials/building-a-python-source)
- [Building a Java Destination](https://docs.airbyte.com/connector-development/tutorials/building-a-java-destination)