chore(docs): update Data Catalog documentation (#1662)
* chore(docs): update Data Catalog documentation

* remove files related to the template

---------

Co-authored-by: Ian <[email protected]>
bot-targa and Ian authored Sep 27, 2024
1 parent 0f11fff commit bc941eb
Showing 7 changed files with 481 additions and 54 deletions.
19 changes: 19 additions & 0 deletions docs/runtime_suite_applications/data-catalog/10_overview.md
@@ -0,0 +1,19 @@
---
id: overview
title: Overview
sidebar_label: Overview
---

<!--
WARNING: this file was automatically generated by Mia-Platform Doc Aggregator.
DO NOT MODIFY IT BY HAND.
Instead, modify the source file and run the aggregator to regenerate this file.
-->

_Data Catalog_ is a Mia-Platform Marketplace application that configures the components of the
[Data Catalog](/docs/data-catalog/overview.mdx) solution in your Console project.
It streamlines adding the necessary microservices, endpoints and configuration maps, providing a blueprint
that can be further customized to build and deploy an ad-hoc Data Catalog solution.

An in-depth explanation of what Mia-Platform Data Catalog is, what its components are and how to configure them
can be found in the documentation section dedicated to the [product](/docs/data-catalog/overview.mdx).
4 changes: 4 additions & 0 deletions docs/runtime_suite_applications/data-catalog/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Data Catalog",
"position": 10
}
20 changes: 20 additions & 0 deletions docs/runtime_suite_applications/data-catalog/changelog.md
@@ -0,0 +1,20 @@
---
id: changelog
title: Changelog
sidebar_label: CHANGELOG
---

<!--
WARNING: this file was automatically generated by Mia-Platform Doc Aggregator.
DO NOT MODIFY IT BY HAND.
Instead, modify the source file and run the aggregator to regenerate this file.
-->

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

Initial release of the Data Catalog application.
97 changes: 43 additions & 54 deletions docs/runtime_suite_templates/data-catalog/20_configuration.md
@@ -186,49 +186,6 @@ The configuration has the following main sections:
}
```

### Secret support

In Kubernetes environments, secrets can be injected into a running workload as an environment variable,
as a standalone file, or as an INI key in a standalone file. Such secrets may be base64-encoded.

_Data Catalog Agent_ configuration supports referencing such secrets inline in selected fields of the
JSON configuration file. When a field supports secrets, you may write either a plain string or an object.

In case of a string, the secret is considered `plain` and is written as-is in the config file.
In case of an object with `env` guard like:

```json
{
"type": "env",
"key": "MY_SECRET_ENV_VAR"
}
```

the agent will use the content of the env var `MY_SECRET_ENV_VAR`. An extra `encoding` field equal
to `base64` can be used to have the value base64-decoded before it is used.

In case of an object with `file` guard like:

```json
{
"type": "file",
"path": "/path/to/secret"
}
```

the agent will use the content of the file at that `path`. If the file is formatted as an `ini` file, a `key` may
be specified:

```json
{
"type": "file",
"path": "/path/to/secret",
"key": "CONNECTION_STRING"
}
```

An extra `encoding` field equal to `base64` can be used to have the value base64-decoded before it is used.

Secretable fields are marked in the following sections.
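
The three secret forms described above (plain string, `env` object, `file` object with an optional `key` and `encoding`) can be illustrated with a minimal Python sketch. This is not the agent's implementation: the function name `resolve_secret` is hypothetical, and the sketch assumes that a keyed file is a section-less INI file made of `KEY=VALUE` lines and that `base64` decoding is applied to the value after it has been read.

```python
import base64
import configparser
import os
from typing import Union

Secret = Union[str, dict]


def resolve_secret(secret: Secret) -> str:
    """Resolve a secretable field (illustrative sketch, not the agent's code)."""
    if isinstance(secret, str):
        # Plain secret: the value is written directly in the config file.
        return secret

    kind = secret.get("type")
    if kind == "env":
        # Read the secret from the named environment variable.
        value = os.environ[secret["key"]]
    elif kind == "file":
        with open(secret["path"], encoding="utf-8") as fh:
            content = fh.read()
        if "key" in secret:
            # Assumption: a section-less INI file with KEY=VALUE lines.
            parser = configparser.ConfigParser(interpolation=None)
            parser.optionxform = str  # keep the case of the keys
            parser.read_string("[root]\n" + content)
            value = parser["root"][secret["key"]]
        else:
            value = content
    else:
        raise ValueError(f"unsupported secret type: {kind!r}")

    if secret.get("encoding") == "base64":
        # Assumption: decoding is applied to the resolved value.
        value = base64.b64decode(value).decode("utf-8")
    return value
```

For example, `resolve_secret({"type": "env", "key": "MY_SECRET_ENV_VAR"})` returns the content of that environment variable, while passing a plain string returns it unchanged.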

## Connections
@@ -305,7 +262,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### Oracle

@@ -461,7 +418,7 @@ Also the environment variable must be set:

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### MS SQL server

@@ -505,7 +462,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### MySQL

@@ -549,7 +506,7 @@ Other keys are `host` and `port` which for a **PostgreSQL** connection are defau

#### Secretable fields

`uid`, `pwd` or `params` support secrets
`uid`, `pwd` or `params` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

### Mia CRUD Service

@@ -714,22 +671,27 @@ Now you should have everything you need to fill out the configuration parameters

#### Secretable fields

`clientId`, `username`, `clientSecret`, `password`, `securityToken` or `privateKey` support secrets
`clientId`, `username`, `clientSecret`, `password`, `securityToken` or `privateKey` support [secrets resolution](/fast_data/configuration/secrets_resolution.md).

## Targets

There are 4 targets available:

1. [**default**] stdout
2. file
3. mia-console
1. [**default**] `stdout`
2. `mongodb`
3. `file`
4. `mia-console`

For each listed connection, after metadata is retrieved, `agent` **sequentially** sends data to the target as:

- `json` for `stdout` and `file`
- [`ndjson`](https://github.com/ndjson/ndjson-spec) for `mia-console`
- `json` for `stdout` and `file`;
- [`ndjson`](https://github.com/ndjson/ndjson-spec) for `mia-console`;
- [`BSON`](https://bsonspec.org/) for `mongodb`.

The final content is an `array` of models. Model spec is given in the form of a <a download target="_blank" href="/docs_files_to_download/data-catalog/model.schema.json">JSON schema</a>.
The final content is an `array` of models, where the format of its records changes according to the target:

- `stdout`, `file` and `mia-console`: the models are written in the native agent format, which is defined in the following <a download target="_blank" href="/docs_files_to_download/data-catalog/agent.model.schema.json">JSON schema</a>;
- `mongodb`: the models are written in a format that is supported by the [Data Catalog](/data_catalog/overview.mdx) application, as defined in the following <a download target="_blank" href="/docs_files_to_download/data-catalog/catalog.model.schema.json">JSON schema</a>.
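
To illustrate the difference between the `json` and `ndjson` serializations mentioned above, here is a minimal Python sketch. The two models are hypothetical and heavily simplified (the real shape is defined by the linked JSON schemas), and the helper names `to_json` and `to_ndjson` are made up for this example. BSON is left out because it requires a third-party library.

```python
import json

# Hypothetical, simplified models: the real structure is defined by the JSON schemas.
models = [
    {"name": "customers"},
    {"name": "orders"},
]


def to_json(items: list) -> str:
    """Serialize all models as a single JSON array (as for `stdout` and `file`)."""
    return json.dumps(items)


def to_ndjson(items: list) -> str:
    """Serialize one JSON document per line, each terminated by a newline (ndjson)."""
    return "".join(json.dumps(item) + "\n" for item in items)


if __name__ == "__main__":
    print(to_json(models))
    print(to_ndjson(models), end="")
```

A consumer of the `json` form parses the whole payload at once, whereas a consumer of the `ndjson` form can parse it one line at a time.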

### Standard Output

@@ -744,6 +706,29 @@ To explicitly configure the `stdout` target use:
}
```

### MongoDB

The MongoDB target enables Data Catalog Agent to feed data from external sources to the [Data Catalog](/data_catalog/overview.mdx) application.

To configure the `mongodb` target use:

```js
{
// ...
"target": {
"type": "mongodb",
"url": "mongodb://test:27017/?replicaSet=rs", // 👈 mongodb connection string: the database must be a replica set
"database": "test_database", // 👈 if defined, it will be used as default database to store the models
}
}
```

The target will write the content of the connections to a MongoDB replica set database, in a collection named `open-lineage-datasets`.

:::tip
To enforce document validation on that collection, be sure to run [Data Catalog Configuration Scripts](/data_catalog/database_setup.mdx) before executing the agent.
:::
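
Since the `mongodb` target requires a replica set, a quick sanity check of the target configuration before running the agent may be useful. The following Python sketch is only a heuristic and not part of the agent: the function name `check_mongodb_target` is hypothetical, and it assumes that the replica set is declared through the `replicaSet` option of a `mongodb://` connection string, which is common but not the only way to connect to a replica set.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def check_mongodb_target(target: dict) -> List[str]:
    """Return a list of likely problems in a `mongodb` target (heuristic sketch)."""
    problems = []
    if target.get("type") != "mongodb":
        problems.append("target type is not 'mongodb'")

    parts = urlsplit(target.get("url", ""))
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append("url is not a MongoDB connection string")
    elif parts.scheme == "mongodb" and "replicaSet" not in parse_qs(parts.query):
        # Heuristic only: the server may still be a replica set member.
        problems.append("url does not declare a replicaSet option")
    return problems


if __name__ == "__main__":
    target = {
        "type": "mongodb",
        "url": "mongodb://test:27017/?replicaSet=rs",
        "database": "test_database",
    }
    print(check_mongodb_target(target))
```

An empty list means that none of the checks in this sketch found a problem. It does not guarantee that the deployment is actually a replica set.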

### File

To configure the `file` target use:
@@ -771,6 +756,10 @@ which will save output files in the folder `./output`. To override this use:

### MIA Console

:::caution
This target has been **deprecated** in favour of [`mongodb`](#mongodb) to support the [Data Catalog](/data_catalog/overview.mdx) solution.
:::

To configure the `mia-console` target use:

```json
28 changes: 28 additions & 0 deletions docs/runtime_suite_templates/data-catalog/changelog.md
@@ -15,6 +15,34 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.3.2] - 2024-09-20

### Added

#### Targets

- `mongodb` target. Models will be stored on a dedicated collection with the following target configuration:
```js
{
// ...
"target": {
"type": "mongodb",
"url": "mongodb://test:27017/?replicaSet=rs", // 👈 mongodb connection string: the database must be a replica set
"database": "test_database", // 👈 if defined, it will be used as default database to store the models
}
}
```

The records will be stored in a collection named `open-lineage-datasets`.

> **NOTE:**
>
> To use MongoDB as a target, the database must be configured as a replica set.

### Updated

- _Data Catalog Agent_ bumped to version `0.6.4`

## [1.3.1] - 2024-07-31

### Updated