rely on conduit.io/docs (#1972)

ConduitIO · Nov 13, 2024 · ed6ea11 · ed6ea11
1 parent 5c42c30
commit ed6ea11
Showing 1 changed file with 2 additions and 261 deletions.
diff --git a/docs/writing-a-connector-guidelines.md b/docs/writing-a-connector-guidelines.md
@@ -1,263 +1,4 @@
 # Guidelines for writing a connector
 
-This document describes patterns and processes that have been successfully used
-in many Conduit connectors over the past few years. The document assumes basic
-knowledge about how Conduit works.
-
-## Start with the connector template
-
-The GitHub
-repository [template](https://github.com/ConduitIO/conduit-connector-template/)
-for Conduit connectors provides a basic project setup that includes placeholders
-for the source connector code, destination connector code, connector
-specifications, an example of a configuration, etc.
-
-Also included is a `Makefile` with commonly used targets and GitHub workflows
-for linting the code, running unit tests, automatically merging minor dependabot
-upgrades, etc.
-
-## Research the 3rd party system
-
-Researching the 3rd party system for which a Conduit connector is built helps
-understand the capabilities and limitations of the system, which in turn results
-in a better connector design and better work organization.
-
-Some questions that typically need to be answered:
-
-1. **How is the data organized in the system?**
-
-   The way data is organized in the system will affect how the connector is
-   configured (e.g. what data is a user able to read or write to) and how the
-   hierarchy is mapped to records and collections in Conduit.
-2. **What parameters are needed to connect to the system?**
-
-   We generally recommend using connection strings/URIs, if available. The
-   reason for this is that modifying a connector's parameters is a matter of
-   changing an existing string, and not separate configuration parameters in a
-   connector.
-3. **What APIs or drivers are available?**
-
-   The choice is influenced by a number of factors, such as:
-   - Is there a driver available? If a public API and a driver are available, we
-     recommend using a driver, since it's a higher level of abstraction.
-   - In which language should the connector be written?
-   - If the language of choice is Go, is a pure Go driver available? A CGo
-     driver may make the usage and deployment of a connector more complex.
-   - Does the driver expose all the functionalities needed?
-   - How mature is the driver?
-4. **What authentication methods should be supported?**
-
-   If multiple authentication methods are available, then a decision needs to be
-   made about the methods that should be implemented first. Additionally, it
-   should be understood how expired credentials should be handled. For example,
-   a connector won't be able to handle an expired password, but a token can
-   sometimes be refreshed.
-5. **Can the connector be isolated from other clients of the system?**
-
-   In some cases a connector, as a client using a system, might affect other
-   clients, for example when getting messages from a message broker's queue, the
-   message will be delivered to the connector, while other clients might expect
-   the message.
-6. **Are the source/destination specific features supported?**
-
-   Source and destination connectors may have specific requirements (some of
-   them are outlined in later sections). When researching, attention should be
-   paid if those requirements can be fulfilled.
-
-## Development
-
-### Familiarize with the Connector SDK/connector protocol
-
-Conduit's Connector SDK is the Go software development kit for implementing a
-connector for Conduit. If you want to implement a connector in another language
-please check
-the [connector protocol](https://github.com/conduitio/conduit-connector-protocol).
-
-### Write a test pipeline
-
-Having a test pipeline helps see the results of development sooner rather than
-later. When working on a source connector, a file or log destination can be used
-to build a test pipeline. When working on a destination, a file or generator
-source can be used to manually or automatically generate some test data for the
-destination connector.
-
-### Use built-in connectors as examples
-
-Conduit's [built-in connectors](https://conduit.io/docs/using/connectors/list/)
-are great references that can be used when writing connectors. They are also the
-first ones to be updated with latest SDK and protocol changes.
-
-### Configuration
-
-A connector's configuration is a map in which keys are parameter names and
-values are parameter values. Every source and destination needs to return the
-parameters that it accepts, and, optionally, the validations that need to be run
-on those.
-
-Conduit
-offers [ParamGen](https://github.com/ConduitIO/conduit-commons/tree/main/paramgen),
-a tool that generates the parameters map from a configuration struct. The SDK
-contains a function, `sdk.Util.ParseConfig`, that can parse and validate a
-user's configuration map into the configuration struct.
-
-### Middleware
-
-Conduit's Connector SDK adds default middleware that, by default, enables some
-functionality. A connector developer should be familiar with the middleware,
-especially with
-the [schema related middleware](https://conduit.io/docs/connectors/configuration-parameters/schema-extraction).
-
-### Source connectors
-
-The following section summarizes best practices that are specific to source
-connectors.
-
-#### Deciding how should a record position look like
-
-Every record read by a source connector has
-a [position](https://conduit.io/docs/features/opencdc-record#fields) attached.
-If a connector is stopped after that record is read, its position will be given
-to the connector's `Open()` method the next time it starts. Hence, a position
-needs to contain enough information for a source connector to resume reading
-records from where it exactly stopped.
-
-A position is a slice of bytes that can represent any data structure. In Conduit
-connectors, it's common to see that a position is actually a `struct`, that's
-marshalled into a JSON string.
-
-#### Implementing snapshots
-
-Firstly, it should be clarified if supporting snapshots is a requirement or if
-it's possible to do at all. If a connector is required to support snapshots,
-then it's recommended to make it possible to turn off snapshots through the
-connector configuration.
-
-The following things need to be taken into account when implementing a snapshot
-procedure:
-
-- The snapshot needs to be consistent.
-- The set of the existing data can be quite large.
-- Restarting a connector during a snapshot should **not** require re-reading all
-  the data again. This is because in some destination connectors it may cause
-  data duplication, and it could be a significant performance overhead.
-
-#### Implementing change data capture (CDC)
-
-Change Data Capture (CDC) should be implemented so that the following criteria
-is met:
-
-1. If a snapshot is needed, changes that happened while the
-   snapshot was running should be captured too.
-2. Different types of changes might be possible (new data inserted, existing
-   data updated or deleted).
-3. Changes that happened while a connector was stopped need to be read by the
-   connector when it starts up (assuming that the changes are still present in
-   the source system).
-
-Some source systems may provide a change log (e.g. WAL in PostgreSQL, binlog in
-MySQL, etc.). Others may not and a different way to capture changes is needed.
-Specifically, for connectors written for SQL databases, two patterns can be
-used:
-
-1. **Triggers**
-
-   With triggers, it's possible to capture all types of changes (creates,
-   updates and deletes). A trigger will write the event with necessary
-   metadata (operation performed, timestamp, etc.) into a "trigger table", that
-   will be read by the connector.
-
-   An advantage of this approach is that all types of operations can be
-   captures. However, it may incur a performance penalty.
-2. **A timestamp-based query**
-
-   If the table a source connector is reading from has a timestamp column that
-   is updated whenever a new row is inserted or an existing row is updated, then
-   a query can be used to fetch changes. A disadvantage of this approach is that
-   delete operations cannot be captured.
-
-#### Iterator pattern
-
-A pattern that we found to be useful when writing source connectors is the
-iterator pattern. The basic idea is that a source connector's `Read()` method
-reads records through an iterator:
-
-```go
-type Iterator interface {
-	HasNext() bool
-	Next() opencdc.Record
-}
-
-type Source struct {}
-func (s *Source) Read(ctx context.Context) (opencdc.Record, error) {
-	if s.iterator.HasNext(ctx) {
-        return opencdc.Record{}, sdk.ErrBackoffRetry
-   }
-
-    return s.iterator.Next(ctx)
-}
-```
-
-There are three implementations of the iterator interface:
-
-- `SnapshotIterator`: used when performing a snapshot
-- `CDCIterator`: used in CDC
-- `CombinedIterator`: an iterator that combines the above two and is able to
-  perform a snapshot and then switch to CDC.
-
-The iterator to be used is determined in the source's `Open()` method based on
-the connector's configuration: if a snapshot is required, then a
-`CombinedIterator` will be used, if not, then the `CDCIterator` will be used.
-
-There are multiple advantages of this approach. The source connector remains "
-focused" on the higher level operations, such as configuration, opening the
-connectors, tearing down, etc. The iterators contain the code that deals with
-the source system itself and convert the data into `opencdc.Record`s. They also
-take care of switching from snapshot mode to CDC mode.
-
-### Destination connectors
-
-#### Batching
-
-Batching can considerably improve the write performance and should be considered
-when developing a destination connector. The Connector SDK
-provides [batching middleware](https://conduit.io/docs/connectors/configuration-parameters/batching)
-that automatically builds batches.
-
-When writing batches into a destination system, the order of records should
-**not** be changed, even if grouping records would be useful in a way. For
-example, in some connectors, it's useful to group records by operation, because
-the destination system has different procedures for those operations.
-
-If a destination connector receives the following batch:
-
-```text
-[create_1, update_1, update_2, create_2]
-```
-
-then this batch can be split into following batches:
-
-```text
-Batch 1: [create_1]
-Batch 2: [update_1, update_2]
-Batch 3: [create_2]
-```
-
-This way, ordering is preserved and the connector can still take advantage of
-batching.
-
-### Acceptance tests
-
-Conduit's Connector SDK contains a set of acceptance tests that verify many of a
-connector's methods, such as:
-
-- Are missing configuration parameters detected?
-- Can a source that is stopped during CDC be successfully restarted?
-- Can a destination write a batch of records?
-
-An example of implemented acceptance tests can be found in
-the [Kafka connector](https://github.com/ConduitIO/conduit-connector-kafka/blob/3233c37392e7249e9d64c9cae0374a0b47ca840f/acceptance_test.go).
-
-### Debugging the connector
-
-The steps for debugging a standalone connector have been
-described [here](https://github.com/ConduitIO/conduit/discussions/1724#discussioncomment-10178484).
+> [!IMPORTANT]  
+> The content of this page has been moved to [Conduit's documentation website](https://conduit.io/docs/developing/connectors).