Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documentation #3

Merged
merged 2 commits into from
Dec 8, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Improve documentation: README
amotl committed Dec 8, 2023
commit 59dda4e1fcf1195da99dbd610d4e66e17ad8ae96
204 changes: 193 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -9,26 +9,208 @@
[![PyPI](https://img.shields.io/pypi/v/meltano-target-cratedb.svg)](https://pypi.org/project/meltano-target-cratedb/)
[![Downloads](https://pepy.tech/badge/meltano-target-cratedb/month)](https://pepy.tech/project/meltano-target-cratedb/)


## About

A [Singer] target for [CrateDB], built with the [Meltano SDK] for custom extractors
and loaders, and based on the PostgreSQL target [meltanolabs-target-postgres].
and loaders, and based on the [Meltano PostgreSQL target]. It connects a library of
[600+ connectors] with CrateDB, and vice versa.

In Singer ELT jargon, a "target" conceptually wraps a data sink, where you
"load" data into.

## Details
Singer, Meltano, and PipelineWise provide foundational components and
an integration engine for composable Open Source ETL with [600+ connectors].
On the database integration side, they are heavily based on [SQLAlchemy].

In Singer ELT jargon, a "target" conceptionally wraps a data sink, where you
"load" data into.

CrateDB is a distributed and scalable SQL database for storing and analyzing
### CrateDB

[CrateDB] is a distributed and scalable SQL database for storing and analyzing
massive amounts of data in near real-time, even with complex queries. It is
PostgreSQL-compatible, and based on Apache Lucene.
PostgreSQL-compatible, and based on [Apache Lucene].

CrateDB offers a Python SQLAlchemy dialect, in order to plug into the
comprehensive Python data-science and -wrangling ecosystems.

### Singer

_The open-source standard for writing scripts that move data._

[Singer] is an open source specification and software framework for [ETL]/[ELT]
data exchange between a range of different systems. For talking to SQL databases,
it employs a metadata subsystem based on SQLAlchemy.

Singer reads and writes Singer-formatted messages, following the [Singer Spec].
Effectively, those are JSONL files.

### Meltano

_Unlock all the data that powers your data platform._

_Say goodbye to writing, maintaining, and scaling your own API integrations
with Meltano's declarative code-first data integration engine, bringing
600+ APIs and DBs to the table._

[Meltano] builds upon Singer technologies, uses configuration files in YAML
syntax instead of JSON, adds an improved SDK and other components, and runs
the central addon registry, [meltano | Hub].

### PipelineWise

[PipelineWise] is another Data Pipeline Framework using the Singer.io
specification to ingest and replicate data from various sources to
various destinations. The list of [PipelineWise Taps] include another
20+ high-quality data-source and -sink components.

### SQLAlchemy

[SQLAlchemy] is the leading Python SQL toolkit and Object Relational Mapper
that gives application developers the full power and flexibility of SQL.

It provides a full suite of well known enterprise-level persistence patterns,
designed for efficient and high-performing database access, adapted into a
simple and Pythonic domain language.


## Install

Usually, you will not install this package directly, but on behalf
of a Meltano definition instead, for example. A corresponding snippet
is outlined in the next section. After adding it to your `meltano.yml`
configuration file, you can install all defined components and their
dependencies.
```
meltano install
```


## Usage

You can run the CrateDB Singer target `target-cratedb` by itself, or
in a pipeline using Meltano.

### Meltano

Using the `meltano add` subcommand, you can add the plugin to your
Meltano project.
```shell
meltano add loader target-cratedb
```
NB: It will only work like this when released and registered on Meltano Hub.
In the meanwhile, please add the configuration snippet manually.


#### CrateDB Cloud

In order to connect to [CrateDB Cloud], configure the `sqlalchemy_url` setting
within your `meltano.yml` configuration file like this.
```yaml
- name: target-cratedb
namespace: cratedb
variant: cratedb
pip_url: meltano-target-cratedb
config:
sqlalchemy_url: "crate://admin:K4IgMXNvQBJM3CiElOiPHuSp6CiXPCiQYhB4I9dLccVHGvvvitPSYr1vTpt4@example.aks1.westeurope.azure.cratedb.net:4200?ssl=true"}
add_record_metadata: true
```


#### On localhost
In order to connect to a standalone or on-premise instance of CrateDB, configure
the `sqlalchemy_url` setting within your `meltano.yml` configuration file like this.
```yaml
- name: target-cratedb
namespace: cratedb
variant: cratedb
pip_url: meltano-target-cratedb
config:
sqlalchemy_url: crate://crate@localhost/
add_record_metadata: true
```

Then, invoke the pipeline by using `meltano run`, similar like this.
```shell
meltano run tap-xyz target-cratedb
```

### Standalone

You can also invoke it standalone by using the `target-cratedb` program.
This example demonstrates how to load a file into the database.

First, acquire an example file in Singer format, including the list of
countries of the world.
```shell
wget https://github.com/MeltanoLabs/target-postgres/raw/v0.0.9/target_postgres/tests/data_files/tap_countries.singer
```

Now, define the database connection string including credentials in
SQLAlchemy format.
```shell
echo '{"sqlalchemy_url": "crate://crate@localhost/"}' > settings.json
```

By using Unix pipes, load the data file into the database, referencing
the path to the configuration file.
```shell
cat tap_countries.singer | target-cratedb --config=settings.json
```

Using the interactive terminal program, `crash`, you can run SQL
statements on CrateDB.
```shell
pip install crash
crash --hosts localhost:4200
```

Now, you can verify that the data has been loaded correctly.
```sql
SELECT
"code", "name", "capital", "emoji", "languages[1]"
FROM
"melty"."countries"
ORDER BY
"name"
LIMIT
42;
```


## Development

In order to work on this adapter dialect on behalf of a real pipeline definition,
link your sandbox to a development installation of [meltano-target-cratedb], and
configure the `pip_url` of the component to point to a different location than the
[vanilla package on PyPI].

Use this URL to directly point to a specific Git repository reference.
```yaml
pip_url: git+https://github.com/crate-workbench/meltano-target-cratedb.git@main
```

Singer and Meltano provide foundational components and an integration engine for
composable Open Source ETL with 600+ connectors
Use a `pip`-like notation to link the CrateDB Singer target in development mode,
so you can work on it at the same time while running the pipeline, and iterating
on its definition.
```yaml
pip_url: --editable=/path/to/sources/meltano-target-cratedb
```


[CrateDB]: https://github.com/crate/crate
[600+ connectors]: https://hub.meltano.com/
[Apache Lucene]: https://lucene.apache.org/
[CrateDB]: https://cratedb.com/product
[CrateDB Cloud]: https://console.cratedb.cloud/
[ELT]: https://en.wikipedia.org/wiki/Extract,_load,_transform
[ETL]: https://en.wikipedia.org/wiki/Extract,_transform,_load
[Meltano]: https://meltano.com/
[meltano | Hub]: https://hub.meltano.com/
[Meltano SDK]: https://github.com/meltano/sdk
[meltanolabs-target-postgres]: https://pypi.org/project/meltanolabs-target-postgres/
[Singer]: https://github.com/singer-io
[Meltano PostgreSQL target]: https://pypi.org/project/meltanolabs-target-postgres/
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb
[Singer]: https://www.singer.io/
[Singer Spec]: https://hub.meltano.com/singer/spec/
[PipelineWise]: https://transferwise.github.io/pipelinewise/
[PipelineWise Taps]: https://transferwise.github.io/pipelinewise/user_guide/yaml_config.html
[SQLAlchemy]: https://www.sqlalchemy.org/
[vanilla package on PyPI]: https://pypi.org/project/meltano-target-cratedb/