Replies: 2 comments
Thanks @lwinfree. I can help with writing a JSON Schema if there is interest/need.
The declarative pipeline is intended to be a first-class citizen in Frictionless Framework - https://framework.frictionlessdata.io/docs/guides/transform-guide#transforming-pipeline - along with data schemas like Data Package/Resource, Table Schema, and validation Inquiry/Report. Currently, the schemas are not fully finished and there is no static JSON Schema published yet (for validation, the framework uses internal profiles stored as Python objects). Here is an example of a pipeline:

    name: pipeline
    tasks:
      - name: pipeline-task
        type: resource
        source:
          # FD resource descriptor
          path: data/transform.csv
        steps:
          - code: cell-set
            fieldName: population
            value: 100

Every pipeline consists of a list of pipeline tasks (extracted from above):

    name: pipeline-task
    type: resource
    source:
      # FD resource descriptor
      path: data/transform.csv
    steps:
      - code: cell-set
        fieldName: population
        value: 100

The main idea is that, instead of rolling out an independent spec, the pipeline task just uses the Data Resource or Data Package spec as its source property (the same as a validation Inquiry). So you can use any valid data package or resource as a source.

The second part is "steps", each of which is a pair of the step's code and its arguments. The exact steps are harder to standardize because they depend on the implementation; for example, here are the steps available in Python - https://github.com/frictionlessdata/frictionless-py/tree/main/frictionless/steps. Every step has

Probably we need to submit these materials (and also the ones regarding validation) as patterns in the specs for easier consumption across different platforms. Also, I can generate a JSON Schema for that if needed. Currently, we only have a "container" one - https://github.com/frictionlessdata/frictionless-py/blob/main/frictionless/assets/profiles/pipeline.json
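
Until a static JSON Schema is published, the structure described above can be checked with a few lines of plain Python. This is only a minimal sketch, not the Frictionless API: `check_pipeline`, `split_step`, and `TASK_TYPES` are hypothetical names, and it assumes that a task type is either `resource` or `package` and that `source` is given inline as a mapping.

```python
# The example pipeline from this thread, written as a plain Python dict.
PIPELINE = {
    "name": "pipeline",
    "tasks": [
        {
            "name": "pipeline-task",
            "type": "resource",
            "source": {"path": "data/transform.csv"},  # FD resource descriptor
            "steps": [
                {"code": "cell-set", "fieldName": "population", "value": 100},
            ],
        }
    ],
}

# Assumption: task types mirror the two source specs (Data Resource, Data Package).
TASK_TYPES = {"resource", "package"}


def check_pipeline(descriptor):
    """Return a list of human-readable problems (empty if none were found)."""
    tasks = descriptor.get("tasks")
    if not isinstance(tasks, list) or not tasks:
        return ["pipeline: 'tasks' must be a non-empty list"]
    errors = []
    for index, task in enumerate(tasks):
        where = f"tasks[{index}]"
        if not isinstance(task, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        if task.get("type") not in TASK_TYPES:
            errors.append(f"{where}: 'type' must be one of {sorted(TASK_TYPES)}")
        # Assumption: the source is an inline descriptor, not a path string.
        if not isinstance(task.get("source"), dict):
            errors.append(f"{where}: 'source' must be a resource or package descriptor")
        steps = task.get("steps")
        if not isinstance(steps, list):
            errors.append(f"{where}: 'steps' must be a list")
            continue
        for step_index, step in enumerate(steps):
            if not isinstance(step, dict) or not isinstance(step.get("code"), str):
                errors.append(f"{where}.steps[{step_index}]: needs a string 'code'")
    return errors


def split_step(step):
    """Split a step into its code and the remaining keys (its arguments)."""
    arguments = {key: value for key, value in step.items() if key != "code"}
    return step["code"], arguments
```

Calling `check_pipeline(PIPELINE)` on the example returns an empty list, and `split_step` separates `cell-set` from its `fieldName` and `value` arguments. The step codes themselves are deliberately not checked here, since the set of available steps depends on the implementation.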
Asked by Dan F in the community call on 26 August: "What I was looking for was some sort of schema for the pipeline or task objects. I’m working on a PHP application for data management and solving similar problems -- want to reuse frictionless schemas/standards as much as possible for interoperability"
@roll tagging you here to start a discussion about this! :-)