chore(docs): update Data Catalog documentation #1662

Merged
merged 11 commits into from
Sep 27, 2024
97 changes: 43 additions & 54 deletions docs/runtime_suite_templates/data-catalog/20_configuration.md
Expand Up @@ -186,49 +186,6 @@ The configuration has the following main sections:
}
```

### Secret support

In Kubernetes environments, secrets can be injected into a running workload as an environment variable,
as a standalone file, or as an INI key in a standalone file. Such secrets may be base64 encoded.

_Data Catalog Agent_ configuration supports referencing such secrets inline in selected fields of the
JSON configuration file. When a field supports secrets, you may write either a plain string or an object.

A string is treated as a `plain` secret: its value is written directly in the config file.
In case of an object with `env` guard like:

```json
{
"type": "env",
"key": "MY_SECRET_ENV_VAR"
}
```

the agent will use the content of the env var `MY_SECRET_ENV_VAR`. An extra `encoding` field set
to `base64` tells the agent to base64-decode the value it has read before using it.

In case of an object with `file` guard like:

```json
{
"type": "file",
"path": "/path/to/secret"
}
```

it will use the content of the file at that `path`. If the file is formatted as an `ini` file, a `key` may
be specified:

```json
{
"type": "file",
"path": "/path/to/secret",
"key": "CONNECTION_STRING"
}
```

An extra `encoding` field set to `base64` tells the agent to base64-decode the value it has read before using it.

Secretable fields are marked in the following sections.
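
The resolution rules described in this section can be sketched in Python. This is only an illustration, not the agent's actual implementation: the names `resolve_secret` and `_ini_lookup` are hypothetical, and the sketch assumes that `base64` decoding is applied to the value after it has been read (for `ini` files, after the key lookup).

```python
import base64
import configparser
import os
from typing import Union

# A secretable field is either a plain string or an object with a "type" guard.
Secret = Union[str, dict]


def _ini_lookup(text: str, key: str) -> str:
    """Return the value of `key` from INI-formatted text (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep keys case-sensitive
    # Assumption: the file may have no section header, so add a dummy one.
    parser.read_string("[__root__]\n" + text)
    for section in parser.sections():
        if key in parser[section]:
            return parser[section][key]
    raise KeyError(key)


def resolve_secret(secret: Secret) -> str:
    """Resolve a secretable field to its final string value."""
    if isinstance(secret, str):
        return secret  # plain secret, written inline in the config file

    kind = secret.get("type")
    if kind == "env":
        raw = os.environ[secret["key"]]
    elif kind == "file":
        with open(secret["path"], encoding="utf-8") as fh:
            raw = fh.read()
        if "key" in secret:
            raw = _ini_lookup(raw, secret["key"])
    else:
        raise ValueError(f"unsupported secret type: {kind!r}")

    # Assumption: decoding happens on the value that was just read.
    if secret.get("encoding") == "base64":
        raw = base64.b64decode(raw).decode("utf-8")
    return raw
```

For example, `resolve_secret({"type": "env", "key": "MY_SECRET_ENV_VAR"})` would return the content of that environment variable, while `resolve_secret("my-password")` returns the string unchanged.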

## Connections
Expand Down Expand Up @@ -305,7 +262,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### Oracle

Expand Down Expand Up @@ -461,7 +418,7 @@ Also the environment variable must be set:

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### MS SQL server

Expand Down Expand Up @@ -505,7 +462,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### MySQL

Expand Down Expand Up @@ -549,7 +506,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### Mia CRUD Service

Expand Down Expand Up @@ -714,22 +671,27 @@ Now you should have everything you need to fill out the configuration parameters

#### Secretable fields

`clientId`, `username`, `clientSecret`, `password`, `securityToken` or `privateKey` support secrets
`clientId`, `username`, `clientSecret`, `password`, `securityToken` or `privateKey` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

## Targets

There are 4 targets available:

1. [**default**] stdout
2. file
3. mia-console
1. [**default**] `stdout`
2. `mongodb`
3. `file`
4. `mia-console`

For each listed connection, after metadata is retrieved, `agent` **sequentially** sends data to the target as:

- `json` for `stdout` and `file`
- [`ndjson`](https://github.com/ndjson/ndjson-spec) for `mia-console`
- `json` for `stdout` and `file`;
- [`ndjson`](https://github.com/ndjson/ndjson-spec) for `mia-console`;
- [`BSON`](https://bsonspec.org/) for `mongodb`.

The final content is an `array` of models. Model spec is given in the form of a <a download target="_blank" href="/docs_files_to_download/data-catalog/model.schema.json">JSON schema</a>.
The final content is an `array` of models, where the format of its records changes according to the target:

- `stdout`, `file` and `mia-console`: the models are written in the native agent format, which is defined in the following <a download target="_blank" href="/docs_files_to_download/data-catalog/agent.model.schema.json">JSON schema</a>;
- `mongodb`: the models are written in a format that is supported by the [Data Catalog](/data_catalog/overview.mdx) application, as defined in the following <a download target="_blank" href="/docs_files_to_download/data-catalog/catalog.model.schema.json">JSON schema</a>.
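
To make the difference between the `json` and `ndjson` serializations concrete, here is a minimal Python sketch. The function names and the sample model fields are hypothetical (real models follow the JSON schemas linked above), and it assumes that each `ndjson` line carries exactly one model.

```python
import json
from typing import Any, Dict, List

Model = Dict[str, Any]


def to_json_array(models: List[Model]) -> str:
    """Serialize all models as a single JSON array (stdout / file style)."""
    return json.dumps(models, indent=2)


def to_ndjson(models: List[Model]) -> str:
    """Serialize one model per line, each line a complete JSON document."""
    return "".join(json.dumps(model) + "\n" for model in models)


if __name__ == "__main__":
    # Hypothetical, heavily simplified models used only for illustration.
    sample = [{"name": "customers"}, {"name": "orders"}]
    print(to_json_array(sample))
    print(to_ndjson(sample), end="")
```

A consumer of the `json` form has to parse the whole array at once, whereas the `ndjson` form can be processed line by line as it arrives.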

### Standard Output

Expand All @@ -744,6 +706,29 @@ To explicitly configure the `stdout` target use:
}
```

### MongoDB

The MongoDB target enables Data Catalog Agent to feed data from external sources to the [Data Catalog](/data_catalog/overview.mdx) application.

To configure the `mongodb` target use:

```js
{
// ...
"target": {
"type": "mongodb",
"url": "mongodb://test:27017/?replicaSet=rs", // 👈 mongodb connection string: the database must be a replica set
"database": "test_database", // 👈 if defined, it will be used as default database to store the models
}
}
```

The target will write the content of the connections to a MongoDB replica set database, in a collection named `open-lineage-datasets`.

:::tip
To enforce document validation on that collection, be sure to run [Data Catalog Configuration Scripts](/data_catalog/database_setup.mdx) before executing the agent.
:::
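
Since the target requires a replica set, it can be handy to sanity-check the configuration before running the agent. The Python sketch below is a heuristic under stated assumptions, not part of the agent: the function `check_mongodb_target` is hypothetical, it treats `url` as required and `database` as optional, and it only checks whether a `mongodb://` connection string carries a `replicaSet` option. It cannot tell whether the server is actually deployed as a replica set.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def check_mongodb_target(target: dict) -> List[str]:
    """Return a list of human-readable problems found in a `mongodb` target."""
    problems: List[str] = []

    if target.get("type") != "mongodb":
        problems.append('"type" must be "mongodb"')

    url = target.get("url")
    if not isinstance(url, str) or not url:
        # Assumption: a connection string is mandatory for this target.
        problems.append('"url" must be a non-empty connection string')
        return problems

    parts = urlsplit(url)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append('"url" must start with mongodb:// or mongodb+srv://')
    elif parts.scheme == "mongodb" and "replicaSet" not in parse_qs(parts.query):
        # Heuristic only: mongodb+srv URLs may get this option from DNS records.
        problems.append('"url" has no replicaSet option; the target needs a replica set')

    return problems
```

Calling it with the example target above returns an empty list, while a connection string such as `mongodb://test:27017/` is reported as missing the `replicaSet` option.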

### File

To configure the `file` target use:
Expand Down Expand Up @@ -771,6 +756,10 @@ which will save output files in the folder `./output`. To override this use:

### MIA Console

:::caution
This target has been **deprecated** in favour of [`mongodb`](#mongodb), which supports the [Data Catalog](/data_catalog/overview.mdx) solution.
:::

To configure the `mia-console` target use:

```json
Expand Down
28 changes: 28 additions & 0 deletions docs/runtime_suite_templates/data-catalog/changelog.md
Expand Up @@ -15,6 +15,34 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.3.2] - 2024-09-20

### Added

#### Targets

- `mongodb` target. Models will be stored on a dedicated collection with the following target configuration:
```js
{
// ...
"target": {
"type": "mongodb",
"url": "mongodb://test:27017/?replicaSet=rs", // 👈 mongodb connection string: the database must be a replica set
"database": "test_database", // 👈 if defined, it will be used as default database to store the models
}
}
```

The records will be stored in a collection named `open-lineage-datasets`.

> **NOTE:**
>
> To use MongoDB as a target, the database must be configured as a replica set.

### Updated

- _Data Catalog Agent_ bumped to version `0.6.4`

## [1.3.1] - 2024-07-31

### Updated
Expand Down