Skip to content

Commit

Permalink
Improve documentation: README
Browse files Browse the repository at this point in the history
  • Loading branch information
amotl committed Dec 8, 2023
1 parent 9df9dbd commit 59dda4e
Showing 1 changed file with 193 additions and 11 deletions.
204 changes: 193 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,26 +9,208 @@
[![PyPI](https://img.shields.io/pypi/v/meltano-target-cratedb.svg)](https://pypi.org/project/meltano-target-cratedb/)
[![Downloads](https://pepy.tech/badge/meltano-target-cratedb/month)](https://pepy.tech/project/meltano-target-cratedb/)


## About

A [Singer] target for [CrateDB], built with the [Meltano SDK] for custom extractors
and loaders, and based on the PostgreSQL target [meltanolabs-target-postgres].
and loaders, and based on the [Meltano PostgreSQL target]. It connects a library of
[600+ connectors] with CrateDB, and vice versa.

In Singer ELT jargon, a "target" conceptually wraps a data sink, where you
"load" data into.

## Details
Singer, Meltano, and PipelineWise provide foundational components and
an integration engine for composable Open Source ETL with [600+ connectors].
On the database integration side, they are heavily based on [SQLAlchemy].

In Singer ELT jargon, a "target" conceptionally wraps a data sink, where you
"load" data into.

CrateDB is a distributed and scalable SQL database for storing and analyzing
### CrateDB

[CrateDB] is a distributed and scalable SQL database for storing and analyzing
massive amounts of data in near real-time, even with complex queries. It is
PostgreSQL-compatible, and based on Apache Lucene.
PostgreSQL-compatible, and based on [Apache Lucene].

CrateDB offers a Python SQLAlchemy dialect, in order to plug into the
comprehensive Python data-science and -wrangling ecosystems.

### Singer

_The open-source standard for writing scripts that move data._

[Singer] is an open source specification and software framework for [ETL]/[ELT]
data exchange between a range of different systems. For talking to SQL databases,
it employs a metadata subsystem based on SQLAlchemy.

Singer reads and writes Singer-formatted messages, following the [Singer Spec].
Effectively, those are JSONL files.

### Meltano

_Unlock all the data that powers your data platform._

_Say goodbye to writing, maintaining, and scaling your own API integrations
with Meltano's declarative code-first data integration engine, bringing
600+ APIs and DBs to the table._

[Meltano] builds upon Singer technologies, uses configuration files in YAML
syntax instead of JSON, adds an improved SDK and other components, and runs
the central addon registry, [meltano | Hub].

### PipelineWise

[PipelineWise] is another Data Pipeline Framework using the Singer.io
specification to ingest and replicate data from various sources to
various destinations. The list of [PipelineWise Taps] include another
20+ high-quality data-source and -sink components.

### SQLAlchemy

[SQLAlchemy] is the leading Python SQL toolkit and Object Relational Mapper
that gives application developers the full power and flexibility of SQL.

It provides a full suite of well known enterprise-level persistence patterns,
designed for efficient and high-performing database access, adapted into a
simple and Pythonic domain language.


## Install

Usually, you will not install this package directly, but on behalf
of a Meltano definition instead, for example. A corresponding snippet
is outlined in the next section. After adding it to your `meltano.yml`
configuration file, you can install all defined components and their
dependencies.
```
meltano install
```


## Usage

You can run the CrateDB Singer target `target-cratedb` by itself, or
in a pipeline using Meltano.

### Meltano

Using the `meltano add` subcommand, you can add the plugin to your
Meltano project.
```shell
meltano add loader target-cratedb
```
NB: It will only work like this when released and registered on Meltano Hub.
In the meanwhile, please add the configuration snippet manually.


#### CrateDB Cloud

In order to connect to [CrateDB Cloud], configure the `sqlalchemy_url` setting
within your `meltano.yml` configuration file like this.
```yaml
- name: target-cratedb
namespace: cratedb
variant: cratedb
pip_url: meltano-target-cratedb
config:
sqlalchemy_url: "crate://admin:K4IgMXNvQBJM3CiElOiPHuSp6CiXPCiQYhB4I9dLccVHGvvvitPSYr1vTpt4@example.aks1.westeurope.azure.cratedb.net:4200?ssl=true"}
add_record_metadata: true
```
#### On localhost
In order to connect to a standalone or on-premise instance of CrateDB, configure
the `sqlalchemy_url` setting within your `meltano.yml` configuration file like this.
```yaml
- name: target-cratedb
namespace: cratedb
variant: cratedb
pip_url: meltano-target-cratedb
config:
sqlalchemy_url: crate://crate@localhost/
add_record_metadata: true
```

Then, invoke the pipeline by using `meltano run`, similar like this.
```shell
meltano run tap-xyz target-cratedb
```

### Standalone

You can also invoke it standalone by using the `target-cratedb` program.
This example demonstrates how to load a file into the database.

First, acquire an example file in Singer format, including the list of
countries of the world.
```shell
wget https://github.com/MeltanoLabs/target-postgres/raw/v0.0.9/target_postgres/tests/data_files/tap_countries.singer
```

Now, define the database connection string including credentials in
SQLAlchemy format.
```shell
echo '{"sqlalchemy_url": "crate://crate@localhost/"}' > settings.json
```

By using Unix pipes, load the data file into the database, referencing
the path to the configuration file.
```shell
cat tap_countries.singer | target-cratedb --config=settings.json
```

Using the interactive terminal program, `crash`, you can run SQL
statements on CrateDB.
```shell
pip install crash
crash --hosts localhost:4200
```

Now, you can verify that the data has been loaded correctly.
```sql
SELECT
"code", "name", "capital", "emoji", "languages[1]"
FROM
"melty"."countries"
ORDER BY
"name"
LIMIT
42;
```


## Development

In order to work on this adapter dialect on behalf of a real pipeline definition,
link your sandbox to a development installation of [meltano-target-cratedb], and
configure the `pip_url` of the component to point to a different location than the
[vanilla package on PyPI].

Use this URL to directly point to a specific Git repository reference.
```yaml
pip_url: git+https://github.com/crate-workbench/meltano-target-cratedb.git@main
```

Singer and Meltano provide foundational components and an integration engine for
composable Open Source ETL with 600+ connectors
Use a `pip`-like notation to link the CrateDB Singer target in development mode,
so you can work on it at the same time while running the pipeline, and iterating
on its definition.
```yaml
pip_url: --editable=/path/to/sources/meltano-target-cratedb
```


[CrateDB]: https://github.com/crate/crate
[600+ connectors]: https://hub.meltano.com/
[Apache Lucene]: https://lucene.apache.org/
[CrateDB]: https://cratedb.com/product
[CrateDB Cloud]: https://console.cratedb.cloud/
[ELT]: https://en.wikipedia.org/wiki/Extract,_load,_transform
[ETL]: https://en.wikipedia.org/wiki/Extract,_transform,_load
[Meltano]: https://meltano.com/
[meltano | Hub]: https://hub.meltano.com/
[Meltano SDK]: https://github.com/meltano/sdk
[meltanolabs-target-postgres]: https://pypi.org/project/meltanolabs-target-postgres/
[Singer]: https://github.com/singer-io
[Meltano PostgreSQL target]: https://pypi.org/project/meltanolabs-target-postgres/
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb
[Singer]: https://www.singer.io/
[Singer Spec]: https://hub.meltano.com/singer/spec/
[PipelineWise]: https://transferwise.github.io/pipelinewise/
[PipelineWise Taps]: https://transferwise.github.io/pipelinewise/user_guide/yaml_config.html
[SQLAlchemy]: https://www.sqlalchemy.org/
[vanilla package on PyPI]: https://pypi.org/project/meltano-target-cratedb/

0 comments on commit 59dda4e

Please sign in to comment.