docs: Improve #80

Merged
merged 1 commit into from
Jun 11, 2024
14 changes: 14 additions & 0 deletions CONTRIBUTING.md
@@ -2,6 +2,20 @@

First off, thank you for taking the time to contribute!

## Local development

For local development, this should get you going:

```sh
git clone https://github.com/netwerk-digitaal-erfgoed/ld-workbench.git
cd ld-workbench
npm i
npm run compile
npm run ld-workbench -- --configDir static/example
```

The configuration of this project is validated and defined by [JSON Schema](https://json-schema.org). The schema is located in `./static/ld-workbench.schema.json`. To create the types from this schema, run `npm run util:json-schema-to-typescript`. This regenerates `./src/types/LDWorkbenchConfiguration.d.ts`; do not modify this file by hand.

## Committing changes

This repository follows [Semantic Versioning](https://semver.org). Tags and [releases](/releases) are
115 changes: 26 additions & 89 deletions README.md
@@ -20,13 +20,32 @@ LD Workbench is **scalable** due to its iterator/generator approach:
LD Workbench is **extensible** because it uses pure SPARQL queries (instead of code) for configuring transformation pipelines.
Each pipeline is a sequence of stages; each stage consists of an iterator and generator.

## Configuration
## Usage

To get started with LD Workbench, first install [NodeJS](https://nodejs.org), then run:

```sh
npx @netwerk-digitaal-erfgoed/ld-workbench@latest --init
```

This creates an example LD Workbench pipeline in the `pipelines/configurations/example` directory
and runs that pipeline right away. The output is written to `pipelines/data`.

To run the pipeline again:

```sh
npx @netwerk-digitaal-erfgoed/ld-workbench@latest
```

Your workbench is now ready for use. You can continue by creating your own pipeline configurations.

### Configuration

An LD Workbench pipeline is defined with a YAML configuration file. The configuration is validated by a JSON Schema. The schema is part of this repository ([link](https://github.com/netwerk-digitaal-erfgoed/ld-workbench/blob/main/static/ld-workbench.schema.json)). The YAML and JSON Schema combination is tested to work in the VSCode editor.
An LD Workbench pipeline is defined with a YAML configuration file, validated by a [JSON Schema](https://json-schema.app/view/%23?url=https%3A%2F%2Fraw.githubusercontent.com%2Fnetwerk-digitaal-erfgoed%2Fld-workbench%2Fmain%2Fstatic%2Fld-workbench.schema.json).

A pipeline must have a name, one or more stages, and optionally a description. Multiple pipelines can be configured as long as they have unique names. See the [example configuration file](https://github.com/netwerk-digitaal-erfgoed/ld-workbench/blob/main/static/example/config.yml) for a boilerplate configuration file. A visualization of the schema gives more insights on required and optional properties can be [found here](https://json-schema.app/view/%23?url=https%3A%2F%2Fraw.githubusercontent.com%2Fnetwerk-digitaal-erfgoed%2Fld-workbench%2Fmain%2Fstatic%2Fld-workbench.schema.json).
A pipeline must have a name, one or more stages, and optionally a description. Multiple pipelines can be configured as long as they have unique names. See the [example configuration file](https://github.com/netwerk-digitaal-erfgoed/ld-workbench/blob/main/static/example/config.yml) for a boilerplate configuration file.

### Example YAML File For Configuration Options
#### Example YAML File For Configuration Options

```yaml
name: MyPipeline
@@ -56,92 +75,10 @@ stages:
destination: output/stage2-result.ttl
```

### Configuration Options Table

| Section | Variable | Description | Required |
|----------------------------------|--------------------|---------------------------------------------------------------------------------------------------------------------|----------|
| General Configuration File | name | The name of your pipeline, it must be unique over all your configurations. | Yes |
| | description | An optional description for your pipeline. | No |
| | destination | The file where the final result of your pipeline is saved. | No |
| Stage | name | The name of your pipeline step, it must be unique within one configuration. | Yes |
| | destination | The file where the results are saved. This is not a required property; if omitted, a temporary file will be created automatically. | No |
| Iterator | query | Path (prefixed with "file://") of SPARQL Query `.rq` file or SPARQL Query string that makes the iterator using SPARQL select. | Yes |
| | endpoint | The SPARQL endpoint for the iterator. If it starts with "file://", a local RDF file is queried. If omitted, the result of the previous stage is used. | No |
| | batchSize | Overrule the iterator's behavior of fetching 10 results per request, regardless of any limits in your query. | No |
| | delay | Human-readable time delay for the iterator's SPARQL endpoint requests (e.g., '5ms', '100 milliseconds', '1s'). | No |
| Generator | query | Path (prefixed with "file://") of SPARQL Query `.rq` file or SPARQL Query string that makes the generator using SPARQL construct. | Yes |
| | endpoint | The SPARQL endpoint for the generator. If it starts with "file://", a local RDF file is queried. If omitted, the endpoint of the Iterator is used. | No |
| | batchSize | Overrule the generator's behavior of fetching results for 10 bindings of $this per request. | No |

## Installation

1. Install Node.js 20.10.0 or larger, by going to <https://nodejs.org> and following the instructions for your OS.

Run the following command to test whether the installation succeeded:

```sh
npm --version
node --version
```

2. Install LD Workbench:

```sh
npx @netwerk-digitaal-erfgoed/ld-workbench --init
```

Your workbench is now ready for use.

## Usage

Once installed, an example workbench is present that can be run with the following command:

```sh
npx @netwerkdigitaalergoed/ld-workbench
```

### Configuring a workbench pipeline

To keep your workbench workspace clean, create a folder for each pipeline that contains the configuration and the SPARQL Select and Construct queries. Use the `static` directory for this.
#### Configuration options

Here is an example of how your file structure may look:

```sh
ld-workbench
|-- static
| |-- my-pipeline
| | |-- configuration.yaml
| | |-- select.rq
| | |-- construct.rq
```
For a full overview of configuration options, please see the [schema](https://json-schema.app/view/%23?url=https%3A%2F%2Fraw.githubusercontent.com%2Fnetwerk-digitaal-erfgoed%2Fld-workbench%2Fmain%2Fstatic%2Fld-workbench.schema.json).

## Development

For local development, the following command should get you going:

```sh
git clone https://github.com/netwerk-digitaal-erfgoed/ld-workbench.git
cd ld-workbench
npm i
npm run compile
```

To start the CLI tool you can use this command:

```sh
npm run ld-workbench -- --configDir static/example
```

Since this project is written in Typescript, your code needs to be transpiled to Javascript before you can run it (using `npm run compile`). With `npm run dev` the transpiler will watch changes in the Typescript code an transpiles on each change.

The configuration of this project is validated and defined by [JSON Schema](https://json-schema.org). The schema is located in `./static/ld-workbench-schema.json`. To create the types from this schema, run `npm run util:json-schema-to-typescript`. This will regenerate `./src/types/LDWorkbenchConfiguration.d.ts`, do not modify this file by hand.

## Workflow & Class Descriptions

### Workflow

This figure represents the workflow of the LD Workbench application:

![Workflow of the LD-Workbench application](static/figures/diagram.svg)

A Pipeline can have multiple Stages, specified in the configuration file. A Stage has one Iterator and can have multiple Generators in it's configuration. An Iterator has to be connected to a SPARQL endpoint, when none is specified for the Generator(s), the Generator reuses the same SPARQL endpoint to generate linked data, when a different endpoint is specified in the Generator's configuration, this endpoint is used instead.
If you want to help develop LD Workbench, please see the [CONTRIBUTING.md](CONTRIBUTING.md) file.
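
The README text in this diff states that multiple pipelines can be configured as long as they have unique names. A minimal TypeScript sketch of that check is shown here. The `PipelineConfig` interface and the `findDuplicateNames` helper are hypothetical names used for illustration; they are not taken from the LD Workbench source, which may enforce uniqueness differently.

```typescript
// Hypothetical shape: only the fields named in the README are modelled.
interface PipelineConfig {
  name: string;
  description?: string;
  stages: unknown[];
}

// Returns each pipeline name that occurs more than once, in first-seen order.
function findDuplicateNames(configs: PipelineConfig[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const config of configs) {
    if (seen.has(config.name) && duplicates.indexOf(config.name) === -1) {
      duplicates.push(config.name);
    }
    seen.add(config.name);
  }
  return duplicates;
}
```

A caller could reject a configuration directory when `findDuplicateNames(configs).length > 0` and list the offending names in the error message.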
2 changes: 1 addition & 1 deletion src/utils/loadPipelines.ts
@@ -13,7 +13,7 @@ export default function loadPipelines(
throw new Error(
`Configuration directory/file ${chalk.italic(
configDirOrFile
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx ld-workbench --init\` to setup an example workbench project.`
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx @netwerk-digitaal-erfgoed/ld-workbench@latest --init\` to set up an example workbench project.`
);

const files: string[] = [];
4 changes: 2 additions & 2 deletions src/utils/tests/utilities.test.ts
@@ -191,12 +191,12 @@ describe('Utilities', () => {
expect(() => loadPipelines(nonExistentConfFile)).to.throw(
`Configuration directory/file ${chalk.italic(
nonExistentConfFile
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx ld-workbench --init\` to setup an example workbench project.`
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx @netwerk-digitaal-erfgoed/ld-workbench@latest --init\` to set up an example workbench project.`
);
expect(() => loadPipelines(nonExistentDirWithFile)).to.throw(
`Configuration directory/file ${chalk.italic(
nonExistentDirWithFile
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx ld-workbench --init\` to setup an example workbench project.`
)} could not be found.\nIf this is your first run of LDWorkbench, you might want to use \`npx @netwerk-digitaal-erfgoed/ld-workbench@latest --init\` to set up an example workbench project.`
);
});
it('should throw if directory has no .yml configuration file', () => {
4 changes: 0 additions & 4 deletions static/figures/diagram.svg

This file was deleted.

22 changes: 12 additions & 10 deletions static/ld-workbench.schema.json
@@ -8,31 +8,32 @@
"properties": {
"name": {
"type": "string",
"description": "The name of your pipeline, it must be unique over all your configurations."
"description": "The name of your pipeline. It must be unique over all your configurations."
},
"description": {
"type": "string",
"description": "An optional description for your pipeline."
},
"baseDir": {
"type": "string",
"description": "The base directory for files referenced by file://... paths. Defaults to the parent directory of the YAML config file."
"description": "An optional base directory for files referenced by `file://...` paths.",
"default": "The directory that contains the YAML config file."
},
"destination": {
"type": "string",
"description": "The file where the final result of your pipeline is saved."
},
"stages": {
"type": "array",
"description": "This is where you define the individual iterator/generator for each step.",
"description": "A pipeline stage consists of an iterator and one or more generators.",
"minItems": 1,
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"name": {
"type": "string",
"description": "The name of your pipeline step, it must be unique within one configuration."
"description": "The name of the stage. It must be unique within the pipeline."
},
"iterator": {
"type": "object",
@@ -41,20 +42,21 @@
"properties": {
"query": {
"type": "string",
"description": "Path (prefixed with \"file://\") or SPARQL Query \nthat makes the iterator using SPARQL select."
"description": "SPARQL SELECT query that returns a `$this` binding for each URI that will be passed to the generator(s). Either an inline string (`SELECT $this WHERE {...}`) or a reference to a file (`file://...`) that contains the query."
},
"endpoint": {
"type": "string",
"description": "The SPARQL endpoint for the iterator. \nIf it starts with \"file://\", a local RDF file is queried.\nIf ommmitted the result of the previous file is used."
"description": "SPARQL endpoint for the iterator. If it starts with `file://`, a local RDF file is queried. If omitted, the result of the previous stage is used."
},
"batchSize": {
"type": "number",
"minimum": 1,
"description": "Overrule the iterator's behaviour of fetching 10 results per request, regardless of any limit's in your query."
"description": "Number of `$this` bindings retrieved per query.",
"default": "The LIMIT value of your iterator query or 10 if no LIMIT is present."
},
"delay": {
"type": "string",
"description": "Human readable time delay for the iterator's SPARQL endpoint requests (e.g. '5ms', '100 milliseconds', '1s'). "
"description": "Human-readable time delay for requests to the iterator's SPARQL endpoint (e.g. `5ms`, `100 milliseconds`, `1s`)."
}
}
},
@@ -68,7 +70,7 @@
"properties": {
"query": {
"type": "string",
"description": "Path (prefixed with \"file://\") or SPARQL Query \nthat makes the generator using SPARQL construct."
"description": "SPARQL CONSTRUCT query that takes a `$this` binding from the iterator and generates triples for it. Either an inline string (`CONSTRUCT $this schema:name ?name WHERE {$this ...}`) or a reference to a file (`file://...`) that contains the query."
},
"endpoint": {
"type": "string",
@@ -84,7 +86,7 @@
},
"destination": {
"type": "string",
"description": "The file where the results are saved. \nThis is not a required property, \nif ommitted a temporary file will be created automatically."
"description": "The optional path where the results are saved. If omitted, a temporary file will be created."
}
},
"required": ["name", "iterator", "generator"]
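
The new `batchSize` default in the schema above ("The LIMIT value of your iterator query or 10 if no LIMIT is present") can be illustrated with a small TypeScript sketch. `resolveBatchSize` is a hypothetical helper, not a function from the LD Workbench source, and the regular expression is a simplification that assumes the first `LIMIT` clause in the query text is the relevant one; the actual implementation may parse the query instead.

```typescript
// Fallback used when neither batchSize nor a LIMIT clause is present (per the schema text).
const FALLBACK_BATCH_SIZE = 10;

// Hypothetical helper: an explicit batchSize wins, then the query's LIMIT, then 10.
function resolveBatchSize(query: string, batchSize?: number): number {
  if (batchSize !== undefined) {
    return batchSize;
  }
  // Naive match; does not account for LIMIT inside comments, strings, or subqueries.
  const match = /\bLIMIT\s+(\d+)/i.exec(query);
  return match ? parseInt(match[1], 10) : FALLBACK_BATCH_SIZE;
}
```

Under these assumptions, an iterator query ending in `LIMIT 25` with no `batchSize` configured resolves to 25, and the same query with `batchSize: 5` resolves to 5.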