From 2da3a0b378d0e7b188984d2d6f8d2e02b3614ba1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Barroso?= Date: Thu, 1 Aug 2024 16:09:25 +0100 Subject: [PATCH] refactor: architecture documentation (#1732) * refactor: architecture reference * update release version * updates * uses markdownlint-cli2 * updates * fix * update references to conduit site docs --- .github/pull_request_template.md | 4 +- README.md | 17 +- .../20230123-pipeline-configuration-files.md | 2 +- docs/architecture.md | 107 ---------- docs/package_structure.md | 38 ++++ docs/pipeline_configuration_files.md | 100 ---------- docs/pipeline_semantics.md | 182 ------------------ pkg/web/openapi/README.md | 4 +- 8 files changed, 52 insertions(+), 402 deletions(-) delete mode 100644 docs/architecture.md create mode 100644 docs/package_structure.md delete mode 100644 docs/pipeline_configuration_files.md delete mode 100644 docs/pipeline_semantics.md diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 3ec9230c9..dbeacc353 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -6,9 +6,9 @@ List any dependencies that are required for this change. Fixes # (issue) -### Quick checks: +### Quick checks - [ ] I have followed the [Code Guidelines](https://github.com/ConduitIO/conduit/blob/main/docs/code_guidelines.md). - [ ] There is no other [pull request](https://github.com/ConduitIO/conduit/pulls) for the same update/change. - [ ] I have written unit tests. -- [ ] I have made sure that the PR is of reasonable size and can be easily reviewed. \ No newline at end of file +- [ ] I have made sure that the PR is of reasonable size and can be easily reviewed. diff --git a/README.md b/README.md index 002827b74..8db32b023 100644 --- a/README.md +++ b/README.md @@ -115,7 +115,7 @@ Download the right `.deb` file for your machine architecture from the [latest release](https://github.com/conduitio/conduit/releases/latest), then run: ```sh -dpkg -i conduit_0.8.0_Linux_x86_64.deb +dpkg -i conduit_0.10.0_Linux_x86_64.deb ``` ### RPM @@ -124,7 +124,7 @@ Download the right `.rpm` file for your machine architecture from the [latest release](https://github.com/conduitio/conduit/releases/latest), then run: ```sh -rpm -i conduit_0.8.0_Linux_x86_64.rpm +rpm -i conduit_0.10.0_Linux_x86_64.rpm ``` ### Build from source @@ -310,21 +310,23 @@ For more information about the UI refer to the [Readme](ui/README.md) in `/ui`. ## Documentation To learn more about how to use Conduit -visit [docs.Conduit.io](https://docs.conduit.io). +visit [Conduit.io/docs](https://conduit.io/docs). If you are interested in internals of Conduit we have prepared some technical documentation: -- [Pipeline Semantics](docs/pipeline_semantics.md) explains the internals of how +- [Pipeline Semantics](https://conduit.io/docs/features/pipeline-semantics) explains the internals of how a Conduit pipeline works. -- [Pipeline Configuration Files](docs/pipeline_configuration_files.md) +- [Pipeline Configuration Files](https://conduit.io/docs/pipeline-configuration-files) explains how you can define pipelines using YAML files. - [Processors](https://conduit.io/docs/processors/getting-started) contains examples and more information about Conduit processors. -- [Conduit Architecture](docs/architecture.md) +- [Conduit Architecture](https://conduit.io/docs/getting-started/architecture) will give you a high-level overview of Conduit. - [Conduit Metrics](docs/metrics.md) provides more information about how Conduit exposes metrics. +- [Conduit Package structure](docs/package_structure.md) + provides more information about the different packages in Conduit. ## Contributing @@ -345,8 +347,7 @@ We also value contributions in form of pull requests. When opening a PR please ensure: - You have followed - the [Code Guidelines](https://github.com/ConduitIO/conduit/blob/main/docs/code_guidelines.md) - . + the [Code Guidelines](https://github.com/ConduitIO/conduit/blob/main/docs/code_guidelines.md). - There is no other [pull request](https://github.com/ConduitIO/conduit/pulls) for the same update/change. - You have written unit tests. diff --git a/docs/architecture-decision-records/20230123-pipeline-configuration-files.md b/docs/architecture-decision-records/20230123-pipeline-configuration-files.md index 6f453e7d1..9b7439185 100644 --- a/docs/architecture-decision-records/20230123-pipeline-configuration-files.md +++ b/docs/architecture-decision-records/20230123-pipeline-configuration-files.md @@ -65,4 +65,4 @@ can be ( only incompatible changes cause the state to be deleted). ## Related -[Pipeline configuration files documentation](../pipeline_configuration_files.md) +[Pipeline configuration files documentation](https://conduit.io/docs/pipeline-configuration-files) diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 7a4e3a2de..000000000 --- a/docs/architecture.md +++ /dev/null @@ -1,107 +0,0 @@ -# Conduit Architecture - -This document describes the Conduit architecture. - -## Vocabulary - -- **Pipeline** - a pipeline receives records from one or multiple source connectors, pushes them through zero or - multiple processors until they reach one or multiple destination connectors. -- **Connector** - a connector is the internal entity that communicates with a connector plugin and either pushes records - from the plugin into the pipeline (source connector) or the other way around (destination connector). -- **Connector plugin** - sometimes also referred to as "plugin", is an external process which communicates with Conduit - and knows how to read/write records from/to a data source/destination (e.g. a database). -- **Processor** - a component that executes an operation on a single record that flows through the pipeline. It can - either change the record or filter it out based on some criteria. -- **Record** - a record represents a single piece of data that flows through a pipeline (e.g. one database row). - -## High level overview - -![Component diagram](data/component_diagram_full.svg) - -Conduit is split in the following layers: - -- **API layer** - exposes the public APIs used to communicate with Conduit. It exposes 2 types of APIs: - - **gRPC** - this is the main API provided by Conduit. The gRPC API definition can be found in - [api.proto](../proto/api/v1/api.proto), it can be used to generate code for the client. - - **HTTP** - the HTTP API is generated using [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway) and - forwards the requests to the gRPC API. Conduit exposes an - [openapi](../pkg/web/openapi/swagger-ui/api/v1/api.swagger.json) definition that describes the HTTP API, which is - also exposed through Swagger UI on `http://localhost:8080/openapi/`. -- **Orchestration layer** - the orchestration layer is responsible for coordinating the flow of operations between the - core services. It also takes care of transactions, making sure that changes made to specific entities are not visible - to the outside until the whole operation succeeded. There are 3 orchestrators, each responsible for actions related - to one of the 3 main entities - pipelines, connectors and processors. -- **Core** - we regard the core to be the combination of the entity management layer and the pipeline engine. It - provides functionality to the orchestrator layer and does not concern itself with where requests come from and how - single operations are combined into more complex flows. - - **Entity management** - this layer is concerned with the creation, editing, deletion and storage of the main - entities. You can think about this as a simple CRUD layer. It can be split up further using the main entities: - - **Pipeline** - this is the central entity managed by the Pipeline Service that ties together all other components. - A pipeline contains the configuration that defines how pipeline nodes should be connected together in a running - pipeline. It has references to at least one source and one destination connector and zero or multiple processors, - a pipeline that does not meet the criteria is regarded as incomplete and can't be started. A pipeline can be - either running, stopped or degraded (stopped because of an error). The pipeline can only be edited if it's not in - a running state. - - **Connector** - a connector takes care of receiving or forwarding records to connector plugins, depending on its - type (source or destination). It is also responsible for tracking the connector state as records flow through it. - The Connector Service manages the creation of connectors and permanently stores them in the Connector Store. A - connector can be configured to reference a number of processors, which will be executed only on records that are - received from or forwarded to that specific connector. - - **Processor** - processors are stateless components that operate on a single record and can execute arbitrary - actions before forwarding the record to the next node in the pipeline. A processor can also choose to drop a - record without forwarding it. They can be attached either to a connector or to a pipeline, based on that they are - either processing only records that flow from/to a connector or all records that flow through a pipeline. - - **Pipeline Engine** - the pipeline engine consists of nodes that can be connected together with Go channels to form - a data pipeline. - - **Node** - a node is a lightweight component that runs in its own goroutine and runs as long as the incoming channel - is open. As soon as the previous node stops forwarding records and closes its out channel, the current node also - stops running and closes its out channel. This continues down the pipeline until all records are drained and the - pipeline gracefully stops. In case a node experiences an error all other nodes will be notified and stop running - as soon as possible without draining the pipeline. -- **Persistence** - this layer is used directly by the orchestration layer and indirectly by the core layer (through - stores) to persist data. It provides the functionality of creating transactions and storing, retrieving and deleting - arbitrary data like configurations or state. -- **Plugins** - while this is not a layer in the same sense as the other layers, it is a component separate from - everything else. It interfaces with the connector on one side and with Conduit plugins on the other and facilitates - the communication between them. A Conduit plugin is a separate process that implements the interface defined in - [plugins.proto](https://github.com/ConduitIO/conduit/blob/main/pkg/plugins/proto/plugins.proto) and provides the - read/write functionality for a specific resource (e.g. a database). - -## Package structure - -- `cmd` - Contains main applications. The directory name for each application should match the name of the executable - (e.g. `cmd/conduit` produces an executable called `conduit`). It is the responsibility of main applications to do 3 - things, it should not include anything else: - 1. Read the configuration (from a file, the environment or arguments). - 2. Instantiate, wire up and run internal services. - 3. Listen for signals (i.e. SIGTERM, SIGINT) and forward them to internal services to ensure a graceful shutdown - (e.g. via a closed context). - - `conduit` - The entrypoint for the main Conduit executable. -- `pkg` - The internal libraries and services that Conduit runs. - - `conduit` - Defines the main runtime that ties all Conduit layers together. - - `connector` - Code regarding connectors, including connector store, connector service, connector configurations - and running instances. - - `foundation` - Foundation contains reusable code. Should not contain any business logic. A few honorable mentions: - - `assert` - Exposes common assertions for testing. - - `cerrors` - Exposes error creation and wrapping functionality. This is the only package for errors used in Conduit. - - `database` - Exposes functionality for storing values. - - `log` - Exposes a logger. This is the logger used throughout Conduit. - - `metrics` - Exposes functionality for gathering and exposing metrics. - - `orchestrator` - Code regarding the orchestration layer. - - `pipeline` - Code regarding pipelines, including pipeline store, pipeline service, running pipeline instances. - - `plugin` - Currently contains all logic related to plugins as well as the plugins themselves. In the future a lot of - this code will be extracted into separate repositories, what will be left is a plugin service that manages built-in - and external plugins. - - `processor` - Provides the types for processing a `Record`. A common abbreviation for transforms is `txf`. - - `transform/txfbuiltin` - Contains built-in transforms. - - `transform/txfjs` - Provides the functionality for implementing a transform in JavaScript. - - `record` - Everything regarding a `Record`, that is the central entity that is pushed through a Conduit pipeline. - This includes a record `Schema`. - - `web` - Everything related to Conduit APIs or hosted pages like the UI or Swagger. - -Other folders that don't contain Go code: - -- `docs` - Documentation regarding Conduit. -- `proto` - Protobuf files (e.g. gRPC API definition). -- `test` - Contains configurations needed for integration tests. -- `ui` - A subproject containing the web UI for Conduit. diff --git a/docs/package_structure.md b/docs/package_structure.md new file mode 100644 index 000000000..4ff1096ad --- /dev/null +++ b/docs/package_structure.md @@ -0,0 +1,38 @@ +# Package structure + +- `cmd` - Contains main applications. The directory name for each application should match the name of the executable + (e.g. `cmd/conduit` produces an executable called `conduit`). It is the responsibility of main applications to do 3 + things, it should not include anything else: + 1. Read the configuration (from a file, the environment or arguments). + 2. Instantiate, wire up and run internal services. + 3. Listen for signals (i.e. SIGTERM, SIGINT) and forward them to internal services to ensure a graceful shutdown +(e.g. via a closed context). + - `conduit` - The entrypoint for the main Conduit executable. +- `pkg` - The internal libraries and services that Conduit runs. + - `conduit` - Defines the main runtime that ties all Conduit layers together. + - `connector` - Code regarding connectors, including connector store, connector service, connector configurations + and running instances. + - `foundation` - Foundation contains reusable code. Should not contain any business logic. A few honorable mentions: + - `cerrors` - Exposes error creation and wrapping functionality. This is the only package for errors used in Conduit. + - `ctxutil` - Utilities to operate with context.Context. + - `grpcutil` - Utilities to operate with GRPC requests. + - `log` - Exposes a logger. This is the logger used throughout Conduit. + - `metrics` - Exposes functionality for gathering and exposing metrics. + - `inspector` - Exposes a service that can be used to inspect and operate with the state of Conduit. + - `orchestrator` - Code regarding the orchestration layer. + - `pipeline` - Code regarding pipelines, including pipeline store, pipeline service, running pipeline instances. + - `plugin` - Currently contains all logic related to plugins as well as the plugins themselves. In the future a lot of + this code will be extracted into separate repositories, what will be left is a plugin service that manages built-in + and external plugins. + - `processor` - Code regarding processors, including processor store, processor service, running processor instances. + - `provisioning` - Exposes a provisioning service that can be used to provision Conduit resources. + - `schemaregistry` - Code regarding the schema registry. + - `web` - Everything related to Conduit APIs or hosted pages like the UI or Swagger. + +Other folders that don't contain Go code: + +- `docs` - Documentation regarding Conduit that's specific to this repository (more documentation can be found at [Conduit.io/docs](https://conduit.io/docs)). +- `proto` - Protobuf files (e.g. gRPC API definition). +- `scripts` - Contains scripts that are useful for development. +- `test` - Contains configurations needed for integration tests. +- `ui` - A subproject containing the web UI for Conduit. diff --git a/docs/pipeline_configuration_files.md b/docs/pipeline_configuration_files.md deleted file mode 100644 index 9565d736f..000000000 --- a/docs/pipeline_configuration_files.md +++ /dev/null @@ -1,100 +0,0 @@ -# Pipeline Configuration Files - -Pipeline configuration files give you the ability to define pipelines that are provisioned by Conduit at startup. -It's as simple as creating a YAML file that defines pipelines, connectors, processors, and their corresponding configurations. - -## Getting started - -Create a folder called `pipelines` at the same level as your Conduit binary file, add all your YAML files -there, then run Conduit using the command: - -```sh -./conduit -``` - -Conduit will only search for files with `.yml` or `.yaml` extensions, recursively in all sub-folders. - -If you have your YAML files in a different directory, or want to provision only one file, then simply run Conduit with -the CLI flag `pipelines.path` and point to your file or directory: - -```sh -./conduit -pipeline.path ../my-directory -``` - -If your directory does not exist, Conduit will fail with an error: `"pipelines.path" config value is invalid` - -### YAML Schema - -The file in general has two root keys, the `version`, and the `pipelines` map. The map consists of other elements like -`status` and `name`, which are configurations for the pipeline itself. - -To create connectors in that pipeline, simply add another map under the pipeline map, and call it `connectors`. - -To create processors, either add a `processors` map under a pipeline ID, or under a connector ID, depending on its parent. -Check this YAML file example with explanation for each field: - -```yaml -version: 1.0 # parser version, the only supported version for now is 1.0 [mandatory] - -pipelines: # a map of pipelines IDs and their configurations. - pipeline1: # pipeline ID, has to be unique. - status: running # pipelines status at startup, either running or stopped. [mandatory] - name: pipeline1 # pipeline name, if not specified, pipeline ID will be used as name. [optional] - description: desc # pipeline description. [optional] - connectors: # a map of connectors IDs and their configurations. - con1: # connector ID, has to be unique per pipeline. - type: source # connector type, either "source" or "destination". [mandatory] - plugin: builtin:file # connector plugin. [mandatory] - name: con3 # connector name, if not specified, connector ID will be used as name. [optional] - settings: # map of configurations keys and their values. - path: ./file1.txt # for this example, the plugin "bultin:file" has only one configuration, which is path. - con2: - type: destination - plugin: builtin:file - name: file-dest - settings: - path: ./file2.txt - processors: # a map of processor IDs and their configurations, "con2" is the processor parent. - proc1: # processor ID, has to be unique for each parent - type: js # processor type. [mandatory] - settings: # map of processor configurations and values - Prop1: string - processors: # processor IDs, that have the pipeline "pipeline1" as a parent. - proc2: - type: js - settings: - prop1: ${ENV_VAR} # yon can use environmental variables by wrapping them in a dollar sign and curly braces ${}. -``` - -If the file is invalid (missed a mandatory field, or has an invalid configuration value), then the pipeline that has the -invalid value will be skipped, with an error message logged. - -If two pipelines in one file have the same ID, or the `version` field was not specified, then the file would be -non-parsable and will be skipped with an error message logged. - -If two pipelines from different files have the same ID, the second pipeline will be skipped, with an error message -specifying which pipeline was not provisioned. - -**_Note_**: Connector IDs and processor IDs will get their parent ID prefixed, so if you specify a connector ID as `con1` -and its parent is `pipeline1`, then the provisioned connector will have the ID `pipeline1:con1`. Same goes for processors, -if the processor has a pipeline parent, then the processor ID will be `connectorID:processorID`, and if a processor -has a connector parent, then the processor ID will be `pipelineID:connectorID:processorID`. - -## Pipelines Immutability - -Pipelines provisioned by configuration files are **immutable**, any updates needed on a provisioned pipeline have to be -done through the configuration file it was provisioned from. You can only control stopping and starting a pipeline -through the UI or API. - -### Updates and Deletes - -Updates and deletes for a pipeline provisioned by configuration files can only be done through the configuration files. -Changes should be made to the files, then Conduit has to be restarted to reload the changes. Any updates or deletes done -through the API or UI will be prohibited. - -- To delete a pipeline: simply delete it from the `pipelines` map from the configuration file, then run Conduit again. -- To update a pipeline: change any field value from the configuration file, and run Conduit again to address these updates. - -Updates will preserve the status of the pipeline, and will continue working from where it stopped. However, the pipeline -will start from the beginning of the source and will not continue from where it stopped, if one of these values were updated: -{`pipeline ID`, `connector ID`, `connector plugin`, `connector type`}. diff --git a/docs/pipeline_semantics.md b/docs/pipeline_semantics.md deleted file mode 100644 index de14c1114..000000000 --- a/docs/pipeline_semantics.md +++ /dev/null @@ -1,182 +0,0 @@ -# Pipeline semantics - -This document describes the inner workings of a Conduit pipeline, its structure, and behavior. It also describes a -Conduit message and its lifecycle as it flows through the pipeline. - -**NOTE**: Some parts of this document describe behavior that is not yet fully implemented (e.g. DLQs, order of acks). -For more information see [#383](https://github.com/ConduitIO/conduit/pull/383). This note -should be removed once the new behavior is implemented. - -## Pipeline structure - -A Conduit pipeline is a directed acyclic graph of nodes. Each node runs in its own goroutine and is connected to other -nodes via unbuffered Go channels that can transmit messages. In theory, we could create arbitrarily complex graphs of -nodes, but for the sake of a simpler API we expose the ability to create graphs with the following structure: - -![pipeline](data/pipeline_example.svg) - -In the diagram above we see 7 sections: - -- **Source connectors** - represents the code that communicates with a 3rd party system and continuously fetches records - and sends them to Conduit (e.g. [kafka connector](https://github.com/conduitio/conduit-connector-kafka)). Every source - connector is managed by a source node that receives records, wraps them in a message, and sends them downstream to the - next node. A pipeline requires at least one source connector to be runnable. -- **Source processors** - these processors only receive messages originating at a specific source connector. Source - processors are created by specifying the corresponding source connector as the parent entity. Source processors are not - required for starting a pipeline. -- **Fan-in node** - this node is essentially a multiplexer that receives messages produced by all source connectors and - sends them into one output channel. The order of messages coming from different connectors is nondeterministic. A - fan-in node is automatically created for all pipelines. -- **Pipeline processors** - these processors receive all messages that flow through the pipeline, regardless of the - source or destination. Pipeline processors are created by specifying the pipeline as the parent entity. Pipeline processors - are not required for starting a pipeline. -- **Fan-out node** - this node is the counterpart to the fan-in node and acts as a demultiplexer that sends messages - coming from a single input to multiple downstream nodes (one for each destination). The fan-out node does not buffer - messages, instead, it waits for a message to be sent to all downstream nodes before processing the next message (see - [backpressure](#backpressure)). A fan-out node is automatically created for all pipelines. -- **Destination processors** - these processors receive only messages that are meant to be sent to a specific - destination connector. Destination processors are created by specifying the corresponding destination connector as the - parent entity. Destination processors are not required for starting a pipeline. -- **Destination connectors** - represents the code that communicates with a 3rd party system and continuously receives - records from Conduit and writes them to the destination (e.g. - [kafka connector](https://github.com/conduitio/conduit-connector-kafka)). Every destination connector is managed by a - destination node that receives records and sends them to the connector. A pipeline requires at least one destination - connector to be runnable. - -There are additional internal nodes that Conduit adds to each pipeline not shown in the diagram, as they are -inconsequential for the purpose of this document (e.g. nodes for gathering metrics, nodes for managing acknowledgments, -etc.). - -## Message - -A message is a wrapper around a record that manages the record's lifecycle as it flows through the pipeline. Messages -are created in source nodes when they receive records from the source connector, and they are passed down the pipeline -between nodes until they are acked or nacked. Nodes are only allowed to hold a reference to a single message at a time, -meaning that they need to pass the message to the next node before taking another message¹. This also means there is no -explicit buffer in Conduit, a pipeline can only hold only as many messages as there are nodes in the pipeline (see -[backpressure](#backpressure) for more information). - -¹This might change in the future if we decide to add support for multi-message transforms. - -### Message states - -A message can be in one of 3 states: - -- **Open** - all messages start in the open state. This means the message is currently in processing, either by a node - or a destination connector. A pipeline won't stop until all messages transition from the open state into one of the - other two states. -- **Acked** - a message was successfully processed and acknowledged. This can be done either by a processor (e.g. it - filtered the message out) or by a destination. If a pipeline contains multiple destinations, the message needs to be - acknowledged by all destinations before it is marked as acked. Acks are propagated back to the source connector and - can be used to advance the position in the source system if applicable. -- **Nacked** - the processing of the message failed and resulted in an error, so the message was negatively - acknowledged. This can be done either by a processor (e.g. a transform failed) or by a destination. If a pipeline - contains multiple destinations, the message needs to be negatively acknowledged by at least one destination before it - is marked as nacked. When a message is nacked, the message is passed to the [DLQ](#dead-letter-queue) handler, which - essentially controls what happens after a message is nacked (stop pipeline, drop message and continue running or store - message in DLQ and continue running). - -**Important**: if a message gets nacked and the DLQ handler successfully processes the nack (e.g. stores the message in -the dead letter queue), the source connector will receive an ack as if the message was successfully processed, even -though Conduit marks it internally as nacked. In other words, the source connector will receive an ack every time -Conduit handled a message end-to-end and it can be safely discarded from the source. - -Pipeline nodes will either leave the message open and send it to the next node for processing or ack/nack it and not -send it further down the pipeline. If the ack/nack fails, the node will stop running and return an error that will -consequently stop the whole pipeline. The returned error is stored in the pipeline for further inspection by the user. - -### Message state change handlers - -A pipeline node can register state change handlers on a message that will be called when the message state changes. This -is used for example to register handlers that reroute nacked messages to a dead letter queue or to update metrics when a -message reaches the end of the pipeline. If a message state change handler returns an error, the node that triggered the -ack/nack will stop running, essentially causing the whole pipeline to stop. - -## Semantics - -### Messages are delivered in order - -Since messages are passed between nodes in channels and a node only processes one message at a time, it is guaranteed -that messages from a single source connector will flow through the Conduit pipeline in the same order that was produced -by that source. - -There are two caveats: - -- If a pipeline contains multiple source connectors, the order of two messages coming from different connectors is - nondeterministic. Messages coming from the same source connector are still guaranteed to retain their order. -- If a dead letter queue is configured, negatively acknowledged messages will be removed from the stream while the - pipeline will keep running, thus impacting the order of messages. - -The order guarantee only holds inside of Conduit. Once a message reaches a destination connector, it is allowed to buffer -messages and batch write them to 3rd party systems. Normally the connector would retain the order, although we can't -vouch for badly written connectors that don't follow this behavior. - -### Messages are delivered at least once - -Between pipeline restarts, it is guaranteed that any message that is processed successfully by all nodes and not -filtered out will be delivered to a destination connector at least once. Multiple deliveries can occur in pipelines with -multiple destinations that stopped because of a negatively acknowledged record, or pipelines where a destination -negatively acknowledged a record and processed more messages after that. For this reason, we strongly recommend -implementing the write operation of a destination connector in an idempotent way (if possible). - -The delivery guarantee can be changed to "at most once" by adding a [dead letter queue](#dead-letter-queue) handler that -drops unsuccessfully processed messages. - -### Acks are delivered in order - -Conduit ensures that acknowledgments are sent to the source connector in the exact same order as records produced by the -connector. This guarantee still holds, even if a badly implemented destination connector acknowledges records in a -different order, or if a processor filters out a record (i.e. acks the message) while a message that came before it is -still being processed. - -### Acks are delivered at most once - -Acknowledgments are sent back to the source connector at most once. This means that if a message gets negatively -acknowledged and is not successfully processed by a DLQ handler, the acknowledgment for that message won't be delivered -to the source connector. Acknowledgments of all messages produced after this message also won't be delivered to the -source connector, otherwise the order delivery guarantee would be violated. The absence of an acknowledgment after the -source connector teardown is initiated can be interpreted as a negative acknowledgment. - -### Backpressure - -The usage of unbuffered channels between nodes and the absence of explicit buffers results in backpressure. This means -that the speed of the destination connector dictates the speed of the whole pipeline. - -There is an implicit buffer that needs to be filled up before backpressure takes effect. The buffer is equal to the -number of nodes in a pipeline - similar to how a longer garden hose holds more water, a longer Conduit pipeline can hold -more messages. There are two exceptions to this rule: - -- Conduit is using gRPC streams to communicate with standalone connectors, which internally buffers requests before - sending them over the wire, thus creating another implicit buffer (we are aware of this - issue: [#211](https://github.com/ConduitIO/conduit/issues/211)). -- Destination connectors are allowed to collect multiple records and write them in batches to the destination, which - creates a buffer that depends on the connector implementation. - -If there are multiple destinations the fanout node won't fetch a new message from the upstream node until the current -message was successfully sent to all downstream nodes (this doesn't mean the message was necessarily processed, just -that all downstream nodes received the message). As a consequence, the speed of the pipeline will be throttled to -accommodate the abilities of the slowest destination connector. - -### Dead letter queue - -Messages that get negatively acknowledged can be rerouted to another destination called a dead letter queue (DLQ) where -they are stored and can be reprocessed at a later point in time after manual intervention. If rerouting is set up and -the message successfully reaches the DLQ the message will be internally nacked, but an acknowledgment will be sent to -the source connector since Conduit handled the message and it can be discarded from the source. The user has the option -to configure a DLQ that simply logs a warning and drops messages to achieve "at most once" delivery guarantees. - -### Pipeline stop - -A pipeline can be stopped in two ways - either it's stopped gracefully or forcefully. - -- A graceful stop is initiated either by Conduit shutting down or by the user requesting the pipeline to stop. Only the - source connector nodes will receive the signal to stop running. The source nodes will stop running and close their - outgoing channels, notifying the downstream nodes that there will be no more messages. This behavior propagates down - the pipeline until the last node stops running. Any messages that were being processed while the pipeline received a - stop signal will be processed normally and written to all destinations. -- A forceful stop is initiated when a node stops running because it experienced an unrecoverable error (e.g. it nacked a - message and received an error because no DLQ is configured, or the connector plugin returned an unexpected error). In - that case, the context that is shared by all nodes will get canceled, signaling to all nodes simultaneously that they - should stop running as soon as possible. Messages that are in the pipeline won't be drained, instead, they are dropped - and will be requested from the source again once the pipeline is restarted. The error returned from the first node - that failed will be stored in the pipeline and can be retrieved through the API. diff --git a/pkg/web/openapi/README.md b/pkg/web/openapi/README.md index fb4f86d0a..6366250d3 100644 --- a/pkg/web/openapi/README.md +++ b/pkg/web/openapi/README.md @@ -2,9 +2,9 @@ This directory contains an abbreviated copy of the following repository: -* `swagger-ui` - https://github.com/swagger-api/swagger-ui 8e6824cb452ae4b268f45f5203575194217c6653 (LICENSE, dist/) +- `swagger-ui` - 8e6824cb452ae4b268f45f5203575194217c6653 (LICENSE, dist/) -The `swagger-ui` directory contains HTML, Javascript, and CSS assets that dynamically generate Swagger documentation +The `swagger-ui` directory contains HTML, JavaScript, and CSS assets that dynamically generate Swagger documentation from a Swagger-compliant API definition in [api.swagger.json](./swagger-ui/api/v1/api.swagger.json) file. That file is auto-generated by running `make proto` in the root of this repository. The static assets are copied from [this dist folder](https://github.com/swagger-api/swagger-ui/tree/master/dist) of the swagger-ui project. After