From e63e8a0cda82b18dd4766e72a440a92e08f0f12e Mon Sep 17 00:00:00 2001 From: Jonathan Matthews Date: Thu, 26 Oct 2023 12:23:46 +0100 Subject: [PATCH] concept: import cuelang.org/docs/usecases/ This change imports each of the 6 pages under https://cuelang.org/docs/usecases/ as its own Concept Guide, as of commit 4b2c4880862e096ea38517da3c6a938b434581bf. Whilst the words on most of the use case pages are valuable and worth preserving (perhaps with the slight exception of the "Scripting" page!), they're not really "Concept Guides". This is the closest category that fits their content right now, but is a sign that their content should probably be teased apart and made into more Diataxis-friendly pieces of documentation, over time. For now, let's import them here. Each file was only changed sufficiently to allow alpha's preprocessor and hugo version to parse the source successfully. A diff of each can be generated with these commands: git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/generate.md content/docs/concept/code-generation-and-extraction-use-case/en.md git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/configuration.md content/docs/concept/configuration-use-case/en.md git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/validation.md content/docs/concept/data-validation-use-case/en.md git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/query.md content/docs/concept/querying-use-case/en.md git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/datadef.md content/docs/concept/schema-definition-use-case/en.md git diff 4b2c4880862e096ea38517da3c6a938b434581bf:content/en/docs/usecases/scripting.md content/docs/concept/scripting-use-case/en.md During import, cue-lang/cue#2666 was opened to request the preprocessor change how it formats multi-document YAML files. 
Closes cue-lang/docs-and-content#23 Preview-Path: /docs/concept/code-generation-and-extraction-use-case/ Preview-Path: /docs/concept/configuration-use-case/ Preview-Path: /docs/concept/data-validation-use-case/ Preview-Path: /docs/concept/querying-use-case/ Preview-Path: /docs/concept/schema-definition-use-case/ Preview-Path: /docs/concept/scripting-use-case/ Signed-off-by: Jonathan Matthews Change-Id: Iabf035fa1eaed25c663811f9ab1a3208d3c2dac9 Dispatch-Trailer: {"type":"trybot","CL":1171276,"patchset":7,"ref":"refs/changes/76/1171276/7","targetBranch":"alpha"} --- .../en.md | 118 ++++++++ .../gen_cache.cue | 20 ++ .../page.cue | 3 + .../docs/concept/configuration-use-case/en.md | 269 +++++++++++++++++ .../configuration-use-case/gen_cache.cue | 18 ++ .../concept/configuration-use-case/page.cue | 3 + .../concept/data-validation-use-case/en.md | 138 +++++++++ .../data-validation-use-case/gen_cache.cue | 19 ++ .../concept/data-validation-use-case/page.cue | 3 + content/docs/concept/querying-use-case/en.md | 31 ++ .../docs/concept/querying-use-case/page.cue | 3 + .../concept/schema-definition-use-case/en.md | 282 ++++++++++++++++++ .../schema-definition-use-case/gen_cache.cue | 20 ++ .../schema-definition-use-case/page.cue | 3 + content/docs/concept/scripting-use-case/en.md | 20 ++ .../docs/concept/scripting-use-case/page.cue | 3 + .../index.md | 115 +++++++ .../concept/configuration-use-case/index.md | 268 +++++++++++++++++ .../concept/data-validation-use-case/index.md | 136 +++++++++ .../docs/concept/querying-use-case/index.md | 31 ++ .../schema-definition-use-case/index.md | 281 +++++++++++++++++ .../docs/concept/scripting-use-case/index.md | 20 ++ 22 files changed, 1804 insertions(+) create mode 100644 content/docs/concept/code-generation-and-extraction-use-case/en.md create mode 100644 content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue create mode 100644 content/docs/concept/code-generation-and-extraction-use-case/page.cue create mode 100644 
content/docs/concept/configuration-use-case/en.md create mode 100644 content/docs/concept/configuration-use-case/gen_cache.cue create mode 100644 content/docs/concept/configuration-use-case/page.cue create mode 100644 content/docs/concept/data-validation-use-case/en.md create mode 100644 content/docs/concept/data-validation-use-case/gen_cache.cue create mode 100644 content/docs/concept/data-validation-use-case/page.cue create mode 100644 content/docs/concept/querying-use-case/en.md create mode 100644 content/docs/concept/querying-use-case/page.cue create mode 100644 content/docs/concept/schema-definition-use-case/en.md create mode 100644 content/docs/concept/schema-definition-use-case/gen_cache.cue create mode 100644 content/docs/concept/schema-definition-use-case/page.cue create mode 100644 content/docs/concept/scripting-use-case/en.md create mode 100644 content/docs/concept/scripting-use-case/page.cue create mode 100644 hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md create mode 100644 hugo/content/en/docs/concept/configuration-use-case/index.md create mode 100644 hugo/content/en/docs/concept/data-validation-use-case/index.md create mode 100644 hugo/content/en/docs/concept/querying-use-case/index.md create mode 100644 hugo/content/en/docs/concept/schema-definition-use-case/index.md create mode 100644 hugo/content/en/docs/concept/scripting-use-case/index.md diff --git a/content/docs/concept/code-generation-and-extraction-use-case/en.md b/content/docs/concept/code-generation-and-extraction-use-case/en.md new file mode 100644 index 0000000000..c6143a58f2 --- /dev/null +++ b/content/docs/concept/code-generation-and-extraction-use-case/en.md @@ -0,0 +1,118 @@ +--- +title: "Code Generation and Extraction use case" +description: "Converting CUE constraints to and from definitions in other languages" +toc_hide: true +--- + +Code generation and extraction is a broad topic and, for instance, overlaps +with the topics discussed in +[Schema 
Definition]({{< relref "/docs/concept/schema-definition-use-case" >}}) and +[Go](/docs/integrations/go). + + +In this section we emphasize the role of CUE in a code-generation pipeline, +that is, using CUE as an interlingua for the extraction from and the +generation to multiple sources. + + +## Core issues addressed by CUE + +### Extract data definition from existing sources + +When one identifies the need to define interchangeable data schema +one usually already has some code base to deal with. + +CUE can currently extract definitions from: + + +- [Go code](/docs/integrations/go#extract-cue-from-go) +- Protobuf definitions. + +Moreover, CUE can combine and reduce the constraints from various sources +and report if there are any inconsistencies. + + +### Enhance existing standards + +CUE also allows annotating existing sources with CUE expressions. +This allows one to keep using existing sources or allow for a smoother +transition into taking a CUE-centric approach. +For instance, a project might be quite reliant on protobuf definitions +as the source of truth of at least one aspect of schema definition. +For this particular case, CUE allows annotating Protobuf field declarations +with CUE expressions using field options. + +{{{with code "en" "proto-1"}}} +-- in.proto -- +message Server { + int32 port = 1 [(cue.val) = ">5000 & <10_000"]; +} +{{{end}}} + +A similar approach is supported for Go: + +{{{with code "en" "go"}}} +-- in.go -- +type Sum struct { + A int `cue:"c-b" json:"a,omitempty"` + B int `cue:"c-a" json:"b,omitempty"` + C int `cue:"a+b" json:"c,omitempty"` +} +{{{end}}} + +In both cases, the constraints will be included in the extraction to CUE. +In the case of Go, the constraints specified in the field tags can also +be used to validate Go structs directly. + + +### Convert CUE to other standards + +Currently, CUE supports converting CUE to OpenAPI and Go, although it is +certainly not limited to these cases. 
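As a rough illustration of what the Protobuf and Go annotations shown earlier amount to, the same constraints could be written by hand in CUE along the following lines. This is a sketch, not the output of CUE's extraction tooling: the definition names `#Server` and `#Sum`, the lower-case field names, and the `ok` value are assumptions made for illustration, and real extracted CUE may differ in detail.

```cue
// Hand-written sketch; actual extraction output may differ.
#Server: {
	// Mirrors the (cue.val) field option on the Protobuf field.
	port: int32 & >5000 & <10_000
}

#Sum: {
	// Mirrors the `cue:"..."` struct tags on the Go fields.
	a: int & c-b
	b: int & c-a
	c: int & a+b
}

// A concrete value that is consistent with all three constraints.
ok: #Sum & {a: 1, b: 2, c: 3}
```

Because each field of `#Sum` is expressed in terms of the other two, a value is only accepted when all three relations hold together.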
+ + +## Comparisons + +### CEL + +The [Common Expression Language](https://github.com/google/cel-spec), +or CEL, defines a simple expression language that can be used as a +standardization of constraints. +It focuses on simplicity, speed, termination guarantees and +being able to run embedded in applications. + +Unification of basic typed-feature structures has pseudo-linear run +time complexity. +The addition of comprehensions makes the operation polynomial. +Allowing recursion would make CUE Turing complete. +The addition of sum types in CUE makes certain operations NP-complete. +The NP-completeness manifests itself only when reasoning over incomplete types. +Trying to optimize a CEL expression would generally suffer from the same issue. +The same problem does not exist when applying CUE to concrete values. + +That said, CUE is currently not optimized for embedded running. +Currently, generated Go stubs embed a CUE interpreter into the code. +These stubs are compatible with a mode where CUE generates native code, +which would give it similar characteristics. + +CEL allows embedded implementations to add arbitrary functions. +CUE does not. +CUE keeps tight control over the pureness or hermeticity of evaluation +to ensure the properties of the value lattice are not broken. +It would be possible, however, to provide the ability to add custom functions +restricted to concrete values. + + +### Protoc-gen-validate (PGV) + +PGV also allows annotating Protobuf fields with validation code, +with implementations for Go and Java and an experimental version for C++ +as of this writing. 
+ + +{{{with code "en" "proto-2"}}} +-- in.proto -- +message Server { + int32 port = 1 [(validate.rules).int32 = { gte: 5000, lte: 10000 }]; +} +{{{end}}} diff --git a/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue b/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue new file mode 100644 index 0000000000..10846853a6 --- /dev/null +++ b/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue @@ -0,0 +1,20 @@ +package site +{ + content: { + docs: { + concept: { + "code-generation-and-extraction-use-case": { + page: { + cache: { + code: { + "proto-1": "WJqxJBmZxawhD1zanSswGQOV/CRvJ/Q8MPj5XjlN0D8=" + go: "i710rUh7cCg46orXaxdFaeJgm2G32l6Mwj3a2GfQTrc=" + "proto-2": "hftRHuNwzV8FyU3H4oto4/ealvn4RCAJHN7xdxPX0Po=" + } + } + } + } + } + } + } +} diff --git a/content/docs/concept/code-generation-and-extraction-use-case/page.cue b/content/docs/concept/code-generation-and-extraction-use-case/page.cue new file mode 100644 index 0000000000..2034070f1a --- /dev/null +++ b/content/docs/concept/code-generation-and-extraction-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "code-generation-and-extraction-use-case": {} diff --git a/content/docs/concept/configuration-use-case/en.md b/content/docs/concept/configuration-use-case/en.md new file mode 100644 index 0000000000..839c198372 --- /dev/null +++ b/content/docs/concept/configuration-use-case/en.md @@ -0,0 +1,269 @@ +--- +title: "Configuration use case" +description: "Managing text-based files to define a desired state of a system" +toc_hide: true +--- + +Arguably, validation should be the foremost task of any configuration language. +Most configuration languages, however, focus on boilerplate removal. +CUE is different in that it takes the validation first stance. +But CUE's constraints are also effective at reducing boilerplate, +although the approach it takes is quite different from conventional +data templating languages. 
+ +CUE's basic operation merges configurations in a way that the outcome is +always the same regardless of the order in which it is carried out +(it is associative, commutative and idempotent). +This property is the foundation for many other favorable properties, as discussed below. + + +## Core issues addressed by CUE + +### Type checking + + For large code bases, no one will question a requirement to + have a compiled/typed language. + Why should one not require the same kind of rigor for data? + +Many configuration languages, including GCL and its offspring, focus on +reducing boilerplate as the primary task of configuration. +Support for typing, however, is minimal or almost non-existent. + +Some languages do add typing support, but it is usually +limited to validating basic types, as is common with programming languages. +For data, however, this is insufficient. +Evidence of this is the rise of standards like CDDL and OpenAPI that +go beyond basic typing. + +In CUE, types and values are a unified concept, which gives it very +expressive, yet intuitive and compact, typing capabilities. + +{{{with code "en" "spec"}}} +-- in.cue -- +#Spec: { + kind: string + + name: { + first: !="" // must be specified and non-empty + middle?: !="" // optional, but must be non-empty when specified + last: !="" + } + + // The minimum must be strictly smaller than the maximum and vice versa. + minimum?: int & <maximum + maximum?: int & >minimum +} + +// A spec is of type #Spec +spec: #Spec +spec: { + knid: "Homo Sapiens" // error, misspelled field + + name: first: "Jane" + name: last: "Doe" +} +{{{end}}} + +### Simplicity at Scale + +When using a configuration language to reduce boilerplate +one should consider whether the reduced verbosity is worth the +increased complexity. +Most configuration languages use an override model to reduce boilerplate: +an existing configuration is used as a base and modified to result in +a new configuration. +This is often in the form of inheritance. 
+ +For small-scale projects, +using inheritance can be too complex, and the simplicity of +spelling everything out is often a superior approach. +For large-scale projects, however, using inheritance often leads to deep +layerings of modifications, making it very hard to see where values come from. +In the end, it is again questionable whether the added complexity is worth it. + +Like with other configuration languages, CUE can add complexity if values +are organized to come from multiple places. +However, as CUE disallows overrides, deep layerings are naturally prevented. +More importantly, CUE can also enhance readability. +A definition in one file may apply to values in many other files. +Where one would usually have to open all these files to verify validity; +with CUE one can see it at a glance. + +CUE's approach has been battle-tested in computational linguistics where it +has been used for decades to describe human languages; +effectively very large, complex and irregular configurations. + + +### Abstractions versus Direct Access + +A common debate for configuration languages is whether a language should +provide an abstraction layer for APIs. +On the one hand, abstraction layers allow for protecting the user against misuse. +On the other hand, they need to keep up with API changes and are +inevitably prone to drift. +So it goes. + +CUE addresses both issues. +On the one hand, its fine-grained typing allows layering detailed constraints +on top of native APIs, without the need for an abstraction layer. +New features can be used without support of existing definitions. + +On the other hand, CUE's order independence allows abstraction layers +to inject arbitrary raw API in a controlled manner, +allowing a general escape hatch to support new or uncovered features. +See the Manual section of the +[Kubernetes tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md) +for an example. 
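The following sketch illustrates what "no overrides" and layering constraints on top of existing values look like in practice. The field names and numbers are invented for illustration and do not come from any real API.

```cue
// Two declarations of the same field are unified, not overridden.
// Writing them in the opposite order gives the same result.
service: replicas: int & >=1
service: replicas: 3

// A further constraint can be layered on from another file or package
// without touching the declarations above.
service: replicas: <=10

// A conflicting declaration is reported as an error rather than
// silently replacing the earlier value:
// service: replicas: 4
```

Since a value can only be narrowed, never replaced, finding any one declaration of `replicas` already tells the reader something that holds for the final configuration.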
+ +### Tooling + +A configuration language usually transforms its configurations to +a lower-level representation, like JSON, YAML, or Protobuf so that +it can be consumed by tools taking in these languages. +Piping such output to the needed tools works initially, +but sooner or later one will want to automate this, +usually in the form of some kind of tool. + +And so it goes. +The rise of systems requiring advanced configuration has been paired +with a rise of even more specialized command line tools. +The core structure of all these tools is more or less the same. +More annoyingly, many have overlapping functionality yet are hardly extendable +or interoperable. +In the latter case, one may see the need to layer on yet another set of tools. + +Having tools like `kubectl` or `etcdctl` that directly control +core infrastructure makes sense, but at higher levels of +abstraction one needs a more open approach. + +CUE attempts to address this by providing an open, +declarative scripting layer on top of the configuration layer. +Aside from the above-mentioned case, it is designed to address various +other issues: + +- inject environmental data into configuration, something not allowed + in CUE itself (it is pure, or hermetic, or side-effect free) +- inject computed data into configurations as part of a pipeline +- allow composability of tool integration + +Again, the ability to deterministically merge data from different sources +makes this a natural task for CUE. + + +## Comparisons + +### Inheritance-based configuration languages + +Inheritance is +not commutative and idempotent in the general case. In other words, order +matters. +This makes it hard to track where values are coming from. +This is not only true for humans, but also machines. +It makes it very complicated, if not impossible, to do any kind of +automation. +The complexity of inheritance is even greater if values +can enter an object from one of several directions (super, overlay, etc.). 
+ +The basic operation of CUE is commutative, associative and idempotent. +This order independence helps both +humans and machines. The resulting model is much less complex. + +{{< info >}} +### Inheritance in CUE +Although CUE does not have inheritance in the override sense, it does have +the notion of one value being an instance of another. +In fact, this is a core principle. + +Let's use a real-world example to make this distinction clear: +In the override model of inheritance, one can take an existing template, +say a dog, and modify it to become a cat. +Trim the ears, dry off the nose, and what have you. + +In CUE, it is a matter of classification. +Cats and dogs are both instances of animals, but once an entity is defined +to be a cat, it can never become a dog. +To most humans (aka computer scientists that have not become accustomed +to inheritance) this makes total sense. +{{< /info >}} + +Although one can create instances of values (remember, types are values), +one cannot alter any of the values of a parent. +A template acts as a type. +Just as in statically typed languages where one cannot assign an integer to +a string, one cannot violate the properties of a type in CUE. + +These restrictions reduce flexibility, but also enhance clarity. +To ensure that a configuration holds a certain property, just declare it +in any file included in the project to make it so. +There is no need to look at other files. +As we saw, the imposed restrictions can also improve, rather than hurt, +the ability to remove boilerplate compared to inheritance-based languages. + +The complexity of inheritance-based models also hampers automation. +The introduction of GCL was paired with the promise of advanced tooling. +The mantra of declarative languages was even repeated with some of +its offspring. +The tooling never materialized, though, as the model made it intractable. 
+ +CUE already provides power tools like trim, and its API provides +unify and subsumption operations for incomplete configurations, the building +blocks for powerful analysis. + + + + +### Jsonnet/ GCL + +Like Jsonnet, CUE is a superset of JSON. +They also are both influenced by GCL. +CUE, in turn is influenced by Jsonnet. +This may give the semblance that the languages are very similar. +At the core, though, they are very different. + +CUE's focus is data validation whereas Jsonnet focuses on data templating +(boilerplate removal). +Jsonnet was not designed with validation in mind. + +Jsonnet and GCL can be quite powerful at reducing boilerplate. +The goal of CUE is not to be better at boilerplate removal than Jsonnet or GCL. +CUE was designed to be an answer to two major shortcomings of these approaches: +complexity and lack of typing. +Jsonnet reduces some of the complexities of GCL, but largely falls into the +same category. +For CUE, the tradeoff was to add typing and reduce complexity +(for humans and machines), at the expense of giving up flexibility. + + +### HCL + +HCL has some striking similarities with GCL. +But whether this was a coincidence or deliberate, it removes the core +source of complexity of GCL: inheritance. + +It does introduce a poor man's version of inheritance: file overlays. +Fields may be defined in multiple files that get overwritten in a certain +order of the file names. +Although not nearly as complex as GCL, it does have some of the same issues. + +Also, whether the removal of inheritance was a coincidence or great insight, +there is no construct given in return that one might need for larger scale +configuration management. +This means the use of HCL may hit a ceiling for medium to larger setups. + +So what CUE has to offer to users of HCL is: typing, better growth prospects +to larger scale operations, and eliminating the peculiarities of file overlays. 
+ +CUE does borrow one construct from HCL: the folding of single-field objects +onto a single line was directly inspired by HCL's very similar approach. + + + diff --git a/content/docs/concept/configuration-use-case/gen_cache.cue b/content/docs/concept/configuration-use-case/gen_cache.cue new file mode 100644 index 0000000000..9decb89def --- /dev/null +++ b/content/docs/concept/configuration-use-case/gen_cache.cue @@ -0,0 +1,18 @@ +package site +{ + content: { + docs: { + concept: { + "configuration-use-case": { + page: { + cache: { + code: { + spec: "Py6Q0BhvScCMMX3oUJR1vFfjBVTg43b9qYS1q+VHM3M=" + } + } + } + } + } + } + } +} diff --git a/content/docs/concept/configuration-use-case/page.cue b/content/docs/concept/configuration-use-case/page.cue new file mode 100644 index 0000000000..2b24433cb3 --- /dev/null +++ b/content/docs/concept/configuration-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "configuration-use-case": {} diff --git a/content/docs/concept/data-validation-use-case/en.md b/content/docs/concept/data-validation-use-case/en.md new file mode 100644 index 0000000000..5e0185eab7 --- /dev/null +++ b/content/docs/concept/data-validation-use-case/en.md @@ -0,0 +1,138 @@ +--- +title: "Data Validation use case" +description: "Validate text-based or programmatic data" +toc_hide: true +--- + +By far the most straightforward approach to specify data is in plain +JSON or YAML files. +Every value can be looked up right where it needs to be defined. +But even at small scales, one will soon have to deal with +consistency issues. + +Data validation tools allow verifying the consistency of such data +based on a schema. + + +## Core issues addressed by CUE + +### Client-side validation + +There are not too many handy tools to verify plain data files. +Often, validation is relied upon to be done server side. 
+If it is done client side, it either relies on rather verbose schema +definitions or using custom tools that verify schema for a specific domain. + +The `cue` command line tool provides a fairly straightforward way to +define schema and verify them against a collection of data files. + +Given these two files, the `cue vet` command can verify that the values in +`ranges.yaml` are correct by just mentioning the two files on the command line. + +{{{with code "en" "client-side-validation"}}} +#nofmt(ranges.yaml) https://github.com/cue-lang/cue/issues/2666: multi-document yaml files + +! exec cue vet ranges.yaml check.cue +cmp stderr output +-- check.cue -- +min?: *0 | number // 0 if undefined +max?: number & >min // must be strictly greater than min if defined. +-- ranges.yaml -- +min: 5 +max: 10 +--- +min: 10 +max: 5 +-- output -- +max: invalid value 5 (out of bound >10): + ./check.cue:2:16 + ./ranges.yaml:5:7 +{{{end}}} + + +### Validating document-oriented databases + +Document-oriented databases like Mongo and many others are characterized +by having flexible schema. +Some of them, like Mongo, optionally allow schema definitions, often in the +form of JSON schema. + +CUE constraints can be used to verify document-oriented databases. +Its default mechanism and expression syntax allow for filling in missing +values for an older version of a schema. +More importantly, CUE's order independence allows +"patch" specifications to be separated from the main schema definition. +CUE can take care of merging these and report if there are any inconsistencies +in the definitions, even before they are applied to a concrete case. + +CUE can be applied directly on the data in code using its API, +but it can also be used to compute JSON schemas from CUE definitions. +(See [cuelang.org/go/encoding/openapi](https://pkg.go.dev/cuelang.org/go/encoding/openapi).) +If a document-oriented database natively supports JSON schema it will likely +have its benefits to do so. 
+Using CUE to generate the schema has several advantages over doing so directly: + +- CUE is far less verbose. +- CUE can extract base definitions from other sources, like Go and Protobuf. +- It allows annotating validation code in these other sources + (e.g. field tags for Go, options for Protobuf). +- CUE's ability to merge, validate, and normalize configurations, + allows separation of concerns between main schema and patches for + older version, for instance. +- CUE can morph definitions in several forms, such as the structural OpenAPI + needed for Kubernetes' CRDs as of version 1.15. + + + + + + +### Migration path + + +As discussed in +["Be useful at all scales"](/docs/about#be-useful-at-all-scales), +there is a high cost to changing languages as one reaches the limits +with a certain approach. + +CUE adds the benefit of type checking to plain data files. +Once in use, it allows the same, +familiar tools to move to something more structured +as this approach reaches its limits. +CUE provides automated rewrite tools, such as `cue import` and `cue trim` +to aid in such migration. + + +## Comparisons + +### JSON Schema + +The closest approach to validating JSON and YAML with schema is the use +of JSON schema and accompanying tools. + +Compared to CUE, JSON schema does not have a unified type and value model. +This makes the ability to use JSON schema for boilerplate reduction minimal. +As it is specified in JSON itself (it is not a DSL) it can be quite verbose. + +Overall CUE is a more concise, yet more powerful schema language. +For instance, in CUE one can specify that two fields need to be identical to +one another: + +{{{with code "en" "jsonschema"}}} +-- in.cue -- +point: { + x: number + y: number +} + +diagonal: point & { + x: y + y: x +} +{{{end}}} + +Such a thing is not possible in JSON schema (or most configuration languages +for that matter). 
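To make the `diagonal` constraint above more concrete, here is a small sketch of how it behaves when applied to data. The `point` and `diagonal` declarations are repeated so the fragment stands on its own; the names `onDiagonal` and `offDiagonal` are made up for illustration.

```cue
point: {
	x: number
	y: number
}

diagonal: point & {
	x: y
	y: x
}

// Supplying one coordinate is enough: the other is inferred,
// because x and y are constrained to be the same value.
onDiagonal: diagonal & {x: 3}

// Unequal coordinates cannot both satisfy the constraint,
// so uncommenting this is expected to produce a conflict:
// offDiagonal: diagonal & {x: 1, y: 2}
```

The constraint therefore does double duty: it validates complete data and fills in values for partially specified data.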
+ +More on JSON Schema and its subset, OpenAPI, +in [Schema Definition]({{< relref "/docs/concept/schema-definition-use-case#json-schema--openapi" >}}). diff --git a/content/docs/concept/data-validation-use-case/gen_cache.cue b/content/docs/concept/data-validation-use-case/gen_cache.cue new file mode 100644 index 0000000000..c46545088d --- /dev/null +++ b/content/docs/concept/data-validation-use-case/gen_cache.cue @@ -0,0 +1,19 @@ +package site +{ + content: { + docs: { + concept: { + "data-validation-use-case": { + page: { + cache: { + code: { + "client-side-validation": "7pUvsY+ACV4j/OlB2trgIBbKYtNkjEPZV4R8e2lvyEY=" + jsonschema: "wV6Z2nmqslUZOHfuyOIQnwpFNDC+iExo9/HsTEKVt2g=" + } + } + } + } + } + } + } +} diff --git a/content/docs/concept/data-validation-use-case/page.cue b/content/docs/concept/data-validation-use-case/page.cue new file mode 100644 index 0000000000..42618615eb --- /dev/null +++ b/content/docs/concept/data-validation-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "data-validation-use-case": {} diff --git a/content/docs/concept/querying-use-case/en.md b/content/docs/concept/querying-use-case/en.md new file mode 100644 index 0000000000..506a3febf5 --- /dev/null +++ b/content/docs/concept/querying-use-case/en.md @@ -0,0 +1,31 @@ +--- +title: "Querying use case" +description: "Find data matching certain criteria" +toc_hide: true +--- + +CUE orders all values in a value lattice. +A value more at the top of a hierarchy is what programming languages would +refer to as a type. +Concrete value or constraints on such a "type" are all instances of that type. + +In other words, CUE constraints can be used to find patterns in data. +`cue vet` is a simple instance of this. + +But more elaborate querying in the form of a `find` or `query` subcommand +is certainly possible. +We would love to hear about your envisioned use cases to plan out +such a subcommand. 
+ +## Programmatic Querying + +In the meantime, you can query data programmatically using the CUE API. +You will need to: + +- Load data and constraints using + `cuelang.org/go/cue.Runtime` or + `cuelang.org/go/cue/load.Instances`. +- Walk over data using `cuelang.org/go/cue.Value`'s `Walk` method + or look up specific values. +- Call `pattern.Subsumes(value)`, where `pattern` and `value` are + `cue.Value`s, to see if `value` is an instance of `pattern`. diff --git a/content/docs/concept/querying-use-case/page.cue b/content/docs/concept/querying-use-case/page.cue new file mode 100644 index 0000000000..2aee765375 --- /dev/null +++ b/content/docs/concept/querying-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "querying-use-case": {} diff --git a/content/docs/concept/schema-definition-use-case/en.md b/content/docs/concept/schema-definition-use-case/en.md new file mode 100644 index 0000000000..6f1f85f36a --- /dev/null +++ b/content/docs/concept/schema-definition-use-case/en.md @@ -0,0 +1,282 @@ +--- +title: "Schema Definition use case" +description: "Defining schema to communicate an API or standard" +toc_hide: true +--- + +A data definition language describes the structure of data. +The structure defined by such a language can, in turn, be used +to verify implementations, validate inputs, or generate code. + +Most modern dedicated data definition languages or standards allow +more than just describing whether a field is an integer or a string. +Standards like OpenAPI and CDDL allow defining things like default +values, ranges, and various other constraints. +OpenAPI even allows for complex logical combinators. + +A key difference, however, is that these standards do not unify schema +and values, the thing that makes CUE so powerful. +There is no value lattice. +This limits these standards in various ways. 
+ + +## Core issues addressed by CUE + +### Validating backwards compatibility + +CUE's model makes it easy to verify that newer versions of schema +are backwards-compatible with older versions. + +Consider the following versions of the same API: + +{{{with code "en" "api-cue"}}} +-- in.cue -- +// Release notes: +// - You can now specify your age and your hobby! +#V1: { + age: >=0 & <=100 + hobby: string +} + +// Release notes: +// - People get to be older than 100, so we relaxed it. +// - It seems not many people have a hobby, so we made it optional. +#V2: { + age: >=0 & <=150 // people get older now + hobby?: string // some people don't have a hobby +} + +// Release notes: +// - Actually no one seems to have a hobby nowadays anymore, so we dropped the field. +#V3: { + age: >=0 & <=150 +} +{{{end}}} + +Declarations with a name starting with `#` are definitions. +Definitions are not emitted when converting to data, for instance when +exporting to JSON, and thus do not need to be concrete in such cases. +Definitions implicitly define closed structs, which means a user may +only use fields that are explicitly defined. + +In CUE, an API is backwards compatible if it subsumes the older one, or, +equivalently, if the old one is an instance of the new one. + +This can be computed using the API: + +{{{with code "en" "api-go"}}} +-- in.go -- +inst, err := r.Compile("apis", /* text of the above API */) +if err != nil { + // handle error +} +v1, err1 := inst.LookupField("V1") +v2, err2 := inst.LookupField("V2") +v3, err3 := inst.LookupField("V3") +if err1 != nil || err2 != nil || err3 != nil { + // handle errors +} + +// Check if V2 is backwards compatible with V1 +fmt.Println(v2.Value.Subsumes(v1.Value)) // true + +// Check if V3 is backwards compatible with V2 +fmt.Println(v3.Value.Subsumes(v2.Value)) // false +{{{end}}} + +It is as simple as that. +This is the kind of thing that is made possible +by ordering all values in a lattice, like CUE does. 
+For CUE, checking whether one API is an instance of another is like checking +whether 3 is less than 4. + +Note that `V2` strictly relaxed the API relative to `V1`. +It allowed specifying a wider age range and made the `hobby` field optional. +In `V3` the `hobby` field is explicitly disallowed. +This is not backwards compatible, as it breaks existing data that did +contain a `hobby` field. + +The current API only reports a _yea_ or _nay_. +The plan is to give full actionable reports. +Feedback welcome! + + +### Combining constraints from different sources + +Most data definition languages are not +explicitly designed for commutativity. +For instance, CDDL, although much less expressive than CUE, introduces operators +that break commutativity. + +The additive property obtained by commutativity is of great value for +data definition. +Constraints often come from many sources. +For instance, one can have constraints from a base template, from code, +policies provided by different departments and policies provided by +a client. + +CUE's additive nature of constraints allows piling up constraints, +in any order, to obtain a new definition. +This leads us to the next topic. + +### Normalization of data definitions + +Adding constraints from many sources can result in a lot of redundancy. +Even worse, constraints can be specified in different logical forms, +making their additive form verbose and unwieldy. +This is fine if all a system does using these constraints is validate data. +But this is problematic if the added constraints are to form the basis for, +say, human consumption. + +CUE's logical inference engine automatically reduces constraints. +Its API makes it possible to compute and select between +various normal forms to optimize for a certain representation. +This is used in [CUE's OpenAPI generator](/docs/integrations/openapi), +for instance.
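The additive, order-independent behaviour described in the two sections above can be illustrated with a toy model. The Go sketch below is not CUE's implementation and does not use its API; the `Range` type and the `Unify` and `UnifyAll` functions are hypothetical names introduced only for illustration. It models a numeric constraint as a closed interval and shows that piling up constraints yields the same reduced result regardless of order.

```go
package main

import (
	"errors"
	"fmt"
)

// Range is a hypothetical closed-interval constraint: Min <= x <= Max.
type Range struct {
	Min, Max int
}

// ErrConflict reports that two constraints admit no common value.
var ErrConflict = errors.New("conflicting constraints")

// Unify intersects two ranges. Intersection is commutative,
// associative and idempotent, so the order of combination is irrelevant.
func Unify(a, b Range) (Range, error) {
	r := a
	if b.Min > r.Min {
		r.Min = b.Min
	}
	if b.Max < r.Max {
		r.Max = b.Max
	}
	if r.Min > r.Max {
		return Range{}, ErrConflict
	}
	return r, nil
}

// UnifyAll folds Unify over any number of constraints.
func UnifyAll(first Range, rest ...Range) (Range, error) {
	out := first
	for _, r := range rest {
		var err error
		if out, err = Unify(out, r); err != nil {
			return Range{}, err
		}
	}
	return out, nil
}

func main() {
	base := Range{Min: 0, Max: 150}    // e.g. from a base template
	policy := Range{Min: 18, Max: 200} // e.g. from a department policy
	client := Range{Min: 0, Max: 65}   // e.g. from a client

	a, _ := UnifyAll(base, policy, client)
	b, _ := UnifyAll(client, base, policy)
	fmt.Println(a, b, a == b) // {18 65} {18 65} true

	_, err := Unify(Range{Min: 0, Max: 10}, Range{Min: 20, Max: 30})
	fmt.Println(err) // conflicting constraints
}
```

The three redundant bounds collapse into the single interval `{18 65}`, which is the toy counterpart of reducing constraints to a normal form; CUE's real value lattice is far richer than intervals.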
+ + +## Comparisons + +### JSON Schema / OpenAPI + +JSON Schema and OpenAPI are purely data-driven data definition standards. +OpenAPI originates from Swagger. +As of version 3, OpenAPI is more or less a subset of JSON Schema. +OpenAPI is used to define Kubernetes Custom Resource Definitions. +As of 1.15, this requires a variant of OpenAPI called Structural OpenAPI. +We will collectively refer to these as OpenAPI henceforth. + +OpenAPI does not have any expressions or references. +They have powerful logical operators, though, +that make them remarkably expressive. + +{{< info >}} +### On logical not +OpenAPI defines a `not` operator. +These get fuzzy when defined on structs, which OpenAPI allows. +CUE doesn't have such a construct, partly to avoid its logical pitfalls. +However, it can get a good approximation by interpreting `¬P` as `P→⊥`. +{{< /info >}} + +An advantage of OpenAPI is that it is purely defined in terms of data (JSON). +This allows sending it over the wire. +It is defined such that implementing an interpreter is fairly straightforward. + +One disadvantage is that it is very verbose. +Compare the following two equivalent schema definitions: + +{{{with code "en" "openapi-comparison"}}} +exec true +-- native.cue -- +// Definitions. + +// Info describes... +Info: { + // Name of the adapter. + name: string + + // Templates. + templates?: [...string] + + // Max is the limit. 
+ max?: uint & <100 +} +-- openapi.json -- +{ + "openapi": "3.0.0", + "info": { + "title": "Definitions.", + "version": "v1beta1" + }, + "components": { + "schemas": { + "Info": { + "description": "Info describes...", + "type": "object", + "required": [ + "name" + ], + "properties": { + "name": { + "description": "Name of the adapter.", + "type": "string", + "format": "string" + }, + "templates": { + "description": "Templates.", + "type": "array", + "items": { + "type": "string", + "format": "string" + } + }, + "max": { + "description": "Max is the limit", + "type": "integer", + "minimum": 0, + "exclusiveMaximum": 100 + } + } + } + } + } +} +{{{end}}} + +The difference gets more extreme as more constraints and logical +combinators are used. + +OpenAPI and CUE both have their roles. +The JSON format of OpenAPI makes it a good interchange standard. +CUE, on the other hand, can serve as an engine to generate and interpret +OpenAPI constraints. +Note that CUE is generally more expressive and many CUE constraints will +not be encodable in OpenAPI. + + +### OPA / Rego + +Although not designed as a data definition language, Rego, the language +used for Open Policy Agent (OPA), also solves the issue of being able to +add constraints from multiple sources. + +Rego, like CUE, has its roots in logic programming. +It is based on Datalog, a restricted form of Prolog, whereas CUE is based on +typed-feature structures, or graph unification. +Typed-feature structures were designed to deal with the shortcomings +of Prolog for applications in encoding human languages. + +Using a Datalog variant for what is essentially a constraint +validation task is somewhat curious. +Datalog makes an excellent query language. +But for constraint enforcement, it is a bit cumbersome as one effectively +first needs to query values to which to apply the constraints. +CUE collates the constraints with the location of the data to which they apply.
+As a result, CUE constraints look a lot like the data they constrain, +unlike Rego, which is more reminiscent of a Datalog program. + +But more importantly, CUE's approach is more amenable to finding normalized +and simplified representations of constraints, which makes it more suitable +for creating OpenAPI from them. + + +### CDDL + +The Concise Data Definition Language (CDDL) is used to define +the structure of CBOR or JSON data. +CDDL shares many constructs with CUE, including +disjunctions, embedding, optional fields, and definitions. + +CDDL, however, has no value lattice and does not define mathematical +properties of its data. +There are several other aspects of CDDL that conflict with the use of a value lattice +or make one harder to define. +Overall this restricts the expressiveness of CDDL compared to CUE +while complicating the ability to combine constraints on types +from multiple sources. + +Unlike OpenAPI, CDDL is a domain-specific language (DSL). +It needs a specific interpreter. +It also has some non-trivial aspects to its evaluation, making it much harder +than OpenAPI to implement.
+ diff --git a/content/docs/concept/schema-definition-use-case/gen_cache.cue b/content/docs/concept/schema-definition-use-case/gen_cache.cue new file mode 100644 index 0000000000..18e08143f6 --- /dev/null +++ b/content/docs/concept/schema-definition-use-case/gen_cache.cue @@ -0,0 +1,20 @@ +package site +{ + content: { + docs: { + concept: { + "schema-definition-use-case": { + page: { + cache: { + code: { + "api-cue": "EaxLFdnMbGd3BHjH9ZghDaSqSuGmNwuRaqm5CPyEL1c=" + "api-go": "EEibYnSR5uYcZhmNEHu9CSW2LTc2HpnQubZwlPRN6X4=" + "openapi-comparison": "NYcksiI/gwxoslm/S5XiSe1TTyM/eAioCg+1b41sXdA=" + } + } + } + } + } + } + } +} diff --git a/content/docs/concept/schema-definition-use-case/page.cue b/content/docs/concept/schema-definition-use-case/page.cue new file mode 100644 index 0000000000..36f8c769bd --- /dev/null +++ b/content/docs/concept/schema-definition-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "schema-definition-use-case": {} diff --git a/content/docs/concept/scripting-use-case/en.md b/content/docs/concept/scripting-use-case/en.md new file mode 100644 index 0000000000..1923afeb9e --- /dev/null +++ b/content/docs/concept/scripting-use-case/en.md @@ -0,0 +1,20 @@ +--- +title: "Scripting use case" +description: "Make static data come to life" +toc_hide: true +--- + +CUE has a powerful scripting layer. + +More on this later. + +For now, we refer to the documentation included in the CUE tool itself: +``` +$ cue help cmd +``` +or the "Define Commands" section of the +[Kubernetes Tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md). + +{{< warning >}} +User-defined command line flags are not yet supported. 
+{{< /warning >}} diff --git a/content/docs/concept/scripting-use-case/page.cue b/content/docs/concept/scripting-use-case/page.cue new file mode 100644 index 0000000000..485a8ccbaf --- /dev/null +++ b/content/docs/concept/scripting-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "scripting-use-case": {} diff --git a/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md b/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md new file mode 100644 index 0000000000..890c8f62e7 --- /dev/null +++ b/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md @@ -0,0 +1,115 @@ +--- +title: "Code Generation and Extraction use case" +description: "Converting CUE constraints to and from definitions in other languages" +toc_hide: true +--- + +Code generation and extraction is a broad topic and, for instance, overlaps +with the topics discussed in +[Schema Definition]({{< relref "/docs/concept/schema-definition-use-case" >}}) and +[Go](/docs/integrations/go). + + +In this section we emphasize the role of CUE in a code-generation pipeline, +that is using CUE as an interlingua for the extraction from and the +generation to multiple sources. + + +## Core issues addressed by CUE + +### Extract data definition from existing sources + +When one identifies the need to define interchangeable data schema +one usually already has some code base to deal with. + +CUE can currently extract definitions from: + + +- [Go code](/docs/integrations/go#extract-cue-from-go) +- Protobuf definitions. + +Moreover, CUE can combine and reduce the constraints from various sources +and report if there are any inconsistencies. + + +### Enhance existing standards + +CUE also allows annotating existing sources with CUE expressions. +This allows one to keep using existing sources or allow for a smoother +transition into taking a CUE-centric approach. 
+For instance, a project might be quite reliant on Protobuf definitions +as the source of truth of at least one aspect of schema definition. +For this particular case, CUE allows annotating Protobuf field declarations +with CUE expressions using field options. + +```proto +message Server { + int32 port = 1 [(cue.val) = ">5000 & <10_000"]; +} +``` + +A similar approach is supported for Go: + +```go +type Sum struct { + A int `cue:"c-b" json:"a,omitempty"` + B int `cue:"c-a" json:"b,omitempty"` + C int `cue:"a+b" json:"c,omitempty"` +} +``` + +In both cases, the constraints will be included in the extraction to CUE. +In the case of Go, the constraints specified in the field tags can also +be used to validate Go structs directly. + + +### Convert CUE to other standards + +Currently, CUE supports converting CUE to OpenAPI and Go, although it is +certainly not limited to these cases. + + +## Comparisons + +### CEL + +The [Common Expression Language](https://github.com/google/cel-spec), +or CEL, defines a simple expression language that can be used as a +standardization of constraints. +It focuses on simplicity, speed, termination guarantees and +being able to run embedded in applications. + +Unification of basic typed-feature structures has pseudo-linear +run-time complexity. +The addition of comprehensions makes the operation polynomial. +Allowing recursion would make CUE Turing complete. +The addition of sum types in CUE makes certain operations NP-complete. +The NP-completeness manifests itself only when reasoning over incomplete types. +Trying to optimize a CEL expression would generally suffer from the same issue. +The same problem does not exist when applying CUE to concrete values. + +That said, CUE is currently not optimized for embedded running. +Currently, generated Go stubs embed a CUE interpreter into the code. +These stubs are compatible with a mode where CUE generates native code, +which would give it similar characteristics.
+ +CEL allows embedded implementations to add arbitrary functions. +CUE does not. +CUE keeps tight control over the pureness or hermeticity of evaluation +to ensure that the properties of the value lattice are not broken. +It would be possible, however, to provide the ability to add custom functions +restricted to concrete values. + + +### Protoc-gen-validate (PGV) + +PGV also allows annotating Protobuf fields with validation code, +with implementations for Go and Java and an experimental version for C++ +as of this writing. + + +```proto +message Server { + int32 port = 1 [(validate.rules).int32 = { gte: 5000, lte: 10000 }]; +} +``` diff --git a/hugo/content/en/docs/concept/configuration-use-case/index.md b/hugo/content/en/docs/concept/configuration-use-case/index.md new file mode 100644 index 0000000000..352215c426 --- /dev/null +++ b/hugo/content/en/docs/concept/configuration-use-case/index.md @@ -0,0 +1,268 @@ +--- +title: "Configuration use case" +description: "Managing text-based files to define a desired state of a system" +toc_hide: true +--- + +Arguably, validation should be the foremost task of any configuration language. +Most configuration languages, however, focus on boilerplate removal. +CUE is different in that it takes a validation-first stance. +But CUE's constraints are also effective at reducing boilerplate, +although the approach it takes is quite different from conventional +data templating languages. + +CUE's basic operation merges configurations in a way that the outcome is +always the same regardless of the order in which it is carried out +(it is associative, commutative and idempotent). +This property is the foundation for many other favorable properties, as discussed below. + + +## Core issues addressed by CUE + +### Type checking + + For large code bases, no one will question a requirement to + have a compiled/typed language. + Why should one not require the same kind of rigor for data?
+ +Many configuration languages, including GCL and its offspring, focus on +reducing boilerplate as the primary task of configuration. +Support for typing, however, is minimal or almost non-existent. + +Some languages do add typing support, but it is usually +limited to validating basic types, as is common with programming languages. +For data, however, this is insufficient. +Evidence of this is the rise of standards like CDDL and OpenAPI that +go beyond basic typing. + +In CUE, types and values are a unified concept, which gives it very +expressive, yet intuitive and compact, typing capabilities. + +```text +#Spec: { + kind: string + + name: { + first: !="" // must be specified and non-empty + middle?: !="" // optional, but must be non-empty when specified + last: !="" + } + + // The minimum must be strictly smaller than the maximum and vice versa. + minimum?: int & <maximum + maximum?: int & >minimum +} + +// A spec is of type #Spec +spec: #Spec +spec: { + knid: "Homo Sapiens" // error, misspelled field + + name: first: "Jane" + name: last: "Doe" +} +``` + +### Simplicity at Scale + +When using a configuration language to reduce boilerplate +one should consider whether the reduced verbosity is worth the +increased complexity. +Most configuration languages use an override model to reduce boilerplate: +an existing configuration is used as a base and modified to result in +a new configuration. +This is often in the form of inheritance. + +For small-scale projects, +using inheritance can be too complex, and the simplicity of +spelling everything out is often a superior approach. +For large-scale projects, however, using inheritance often leads to deep +layerings of modifications, making it very hard to see where values come from. +In the end, it is again questionable whether the added complexity is worth it. + +As with other configuration languages, CUE can add complexity if values +are organized to come from multiple places.
+However, as CUE disallows overrides, deep layerings are naturally prevented. +More importantly, CUE can also enhance readability. +A definition in one file may apply to values in many other files. +Where one would usually have to open all these files to verify validity, +with CUE one can see it at a glance. + +CUE's approach has been battle-tested in computational linguistics, where it +has been used for decades to describe human languages, +which are effectively very large, complex and irregular configurations. + + +### Abstractions versus Direct Access + +A common debate for configuration languages is whether a language should +provide an abstraction layer for APIs. +On the one hand, abstraction layers allow for protecting the user against misuse. +On the other hand, they need to keep up with API changes and are +inevitably prone to drift. +So it goes. + +CUE addresses both issues. +On the one hand, its fine-grained typing allows layering detailed constraints +on top of native APIs, without the need for an abstraction layer. +New features can be used without support from existing definitions. + +On the other hand, CUE's order independence allows abstraction layers +to inject arbitrary raw API in a controlled manner, +allowing a general escape hatch to support new or uncovered features. +See the Manual section of the +[Kubernetes tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md) +for an example. + +### Tooling + +A configuration language usually transforms its configurations to +a lower-level representation, like JSON, YAML, or Protobuf so that +it can be consumed by tools taking in these languages. +Piping such output to the needed tools works initially, +but sooner or later one will want to automate this, +usually in the form of some kind of tool. + +And so it goes. +The rise of systems requiring advanced configuration has been paired +with a rise of even more specialized command line tools.
+The core structure of all these tools is more or less the same. +More annoyingly, many have overlapping functionality yet are hardly extendable +or interoperable. +In the latter case, one may see the need to layer on yet another set of tools. + +Having tools like `kubectl` or `etcdctl` that directly control +core infrastructure makes sense, but at higher levels of +abstraction one needs a more open approach. + +CUE attempts to address this by providing an open, +declarative scripting layer on top of the configuration layer. +Aside from the above-mentioned case, it is designed to address various +other issues: + +- inject environmental data into configuration, something not allowed + in CUE itself (it is pure, or hermetic, or side-effect free) +- inject computed data into configurations as part of a pipeline +- allow composability of tool integration + +Again, the ability to deterministically merge data from different sources +makes this a shoo-in task for CUE. + + +## Comparisons + +### Inheritance-based configuration languages + +Inheritance is +not commutative and idempotent in the general case. In other words, order +matters. +This makes it hard to track where values are coming from. +This is not only true for humans, but also for machines. +It makes it very complicated, if not impossible, to do any kind of +automation. +The complexity of inheritance is even greater if values +can enter an object from one of several directions (super, overlay, etc.). + +The basic operation of CUE is commutative, associative and idempotent. +This order independence helps both +humans and machines. The resulting model is much less complex. + +{{< info >}} +### Inheritance in CUE +Although CUE does not have inheritance in the override sense, it does have +the notion of one value being an instance of another. +In fact, this is a core principle.
+ +Let's use a real-world example to make this distinction clear: +In the override model of inheritance, one can take an existing template, +say a dog, and modify it to become a cat. +Trim the ears, dry off the nose, and what have you. + +In CUE, it is a matter of classification. +Cats and dogs are both instances of animals, but once an entity is defined +to be a cat, it can never become a dog. +To most humans (aka computer scientists that have not become accustomed +to inheritance) this makes total sense. +{{< /info >}} + +Although one can create instances of values (remember, types are values), +one cannot alter any of the values of a parent. +A template acts as a type. +Just as in statically typed languages where one cannot assign an integer to +a string, one cannot violate the properties of a type in CUE. + +These restrictions reduce flexibility, but also enhance clarity. +To ensure that a configuration holds a certain property, just declare it +in any file included in the project to make it so. +There is no need to look at other files. +As we saw, the imposed restrictions can also improve, rather than hurt, +the ability to remove boilerplate compared to inheritance-based languages. + +The complexity of inheritance-based models also hampers automation. +The introduction of GCL was paired with the promise of advanced tooling. +The mantra of declarative languages was even repeated with some of +its offspring. +The tooling never materialized, though, as the model made it intractable. + +CUE already provides power tools like `cue trim`, and its API provides +unify and subsumption operations for incomplete configurations, the building +blocks for powerful analysis. + + + + +### Jsonnet/ GCL + +Like Jsonnet, CUE is a superset of JSON. +They also are both influenced by GCL. +CUE, in turn, is influenced by Jsonnet. +This may give the impression that the languages are very similar. +At the core, though, they are very different.
+ +CUE's focus is data validation whereas Jsonnet focuses on data templating +(boilerplate removal). +Jsonnet was not designed with validation in mind. + +Jsonnet and GCL can be quite powerful at reducing boilerplate. +The goal of CUE is not to be better at boilerplate removal than Jsonnet or GCL. +CUE was designed to be an answer to two major shortcomings of these approaches: +complexity and lack of typing. +Jsonnet reduces some of the complexities of GCL, but largely falls into the +same category. +For CUE, the tradeoff was to add typing and reduce complexity +(for humans and machines), at the expense of giving up flexibility. + + +### HCL + +HCL has some striking similarities with GCL. +But whether this was a coincidence or deliberate, it removes the core +source of complexity of GCL: inheritance. + +It does introduce a poor man's version of inheritance: file overlays. +Fields may be defined in multiple files that get overwritten in a certain +order of the file names. +Although not nearly as complex as GCL, it does have some of the same issues. + +Also, whether the removal of inheritance was a coincidence or great insight, +there is no construct given in return that one might need for larger scale +configuration management. +This means the use of HCL may hit a ceiling for medium to larger setups. + +So what CUE has to offer to users of HCL is: typing, better growth prospects +to larger scale operations, and eliminating the peculiarities of file overlays. + +CUE does borrow one construct from HCL: the folding of single-field objects +onto a single line was directly inspired by HCL's very similar approach. 
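The contrast drawn on this page between override-style layering (inheritance, file overlays) and CUE's order-independent merging can be sketched with a toy model. The Go code below is only an illustration under simplifying assumptions: it treats configurations as flat string maps, and `Config`, `Overlay` and `Unify` are hypothetical names, not part of CUE, HCL or GCL.

```go
package main

import "fmt"

// Config is a hypothetical flat configuration.
type Config map[string]string

// Overlay applies layers in order; later layers silently win.
// The result therefore depends on the order of the layers.
func Overlay(layers ...Config) Config {
	out := Config{}
	for _, l := range layers {
		for k, v := range l {
			out[k] = v
		}
	}
	return out
}

// Unify merges layers without overrides: two layers may set the same
// key only if they agree, otherwise the merge fails. Whether it
// succeeds, and the merged result, do not depend on the order.
func Unify(layers ...Config) (Config, error) {
	out := Config{}
	for _, l := range layers {
		for k, v := range l {
			if prev, ok := out[k]; ok && prev != v {
				return nil, fmt.Errorf("conflict on %q: %q vs %q", k, prev, v)
			}
			out[k] = v
		}
	}
	return out, nil
}

func main() {
	base := Config{"replicas": "3"}
	prod := Config{"replicas": "5", "image": "nginx"}

	fmt.Println(Overlay(base, prod)["replicas"]) // 5
	fmt.Println(Overlay(prod, base)["replicas"]) // 3

	_, err := Unify(base, prod)
	fmt.Println(err != nil) // true: the disagreement is reported, not hidden

	merged, _ := Unify(base, Config{"image": "nginx"})
	fmt.Println(merged) // map[image:nginx replicas:3]
}
```

With overlays, finding the effective value of `replicas` requires knowing the layer order; with the unification-style merge, any file that states a value either agrees with the rest or produces an error.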
+ + + diff --git a/hugo/content/en/docs/concept/data-validation-use-case/index.md b/hugo/content/en/docs/concept/data-validation-use-case/index.md new file mode 100644 index 0000000000..71886d0d43 --- /dev/null +++ b/hugo/content/en/docs/concept/data-validation-use-case/index.md @@ -0,0 +1,136 @@ +--- +title: "Data Validation use case" +description: "Validate text-based or programmatic data" +toc_hide: true +--- + +By far the most straightforward approach to specify data is in plain +JSON or YAML files. +Every value can be looked up right where it needs to be defined. +But even at small scales, one will soon have to deal with +consistency issues. + +Data validation tools allow verifying the consistency of such data +based on a schema. + + +## Core issues addressed by CUE + +### Client-side validation + +There are not too many handy tools to verify plain data files. +Often, validation is relied upon to be done server side. +If it is done client side, it either relies on rather verbose schema +definitions or using custom tools that verify schema for a specific domain. + +The `cue` command line tool provides a fairly straightforward way to +define schema and verify them against a collection of data files. + +Given these two files, the `cue vet` command can verify that the values in +`ranges.yaml` are correct by just mentioning the two files on the command line. + +{{< code-tabs >}} +{{< code-tab name="check.cue" language="text" area="top-left" >}} +min?: *0 | number // 0 if undefined +max?: number & >min // must be strictly greater than min if defined. 
+{{< /code-tab >}} +{{< code-tab name="ranges.yaml" language="yaml" area="top-right" >}} +min: 5 +max: 10 +--- +min: 10 +max: 5 +{{< /code-tab >}} +{{< code-tab name="output" language="" area="bottom" >}} +max: invalid value 5 (out of bound >10): + ./check.cue:2:16 + ./ranges.yaml:5:7 +{{< /code-tab >}} +{{< /code-tabs >}} + + +### Validating document-oriented databases + +Document-oriented databases like Mongo and many others are characterized +by having flexible schema. +Some of them, like Mongo, optionally allow schema definitions, often in the +form of JSON schema. + +CUE constraints can be used to verify document-oriented databases. +Its default mechanism and expression syntax allow for filling in missing +values for an older version of a schema. +More importantly, CUE's order independence allows +"patch" specifications to be separated from the main schema definition. +CUE can take care of merging these and report if there are any inconsistencies +in the definitions, even before they are applied to a concrete case. + +CUE can be applied directly on the data in code using its API, +but it can also be used to compute JSON schemas from CUE definitions. +(See [cuelang.org/go/encoding/openapi](https://pkg.go.dev/cuelang.org/go/encoding/openapi).) +If a document-oriented database natively supports JSON schema it will likely +have its benefits to do so. +Using CUE to generate the schema has several advantages over doing so directly: + +- CUE is far less verbose. +- CUE can extract base definitions from other sources, like Go and Protobuf. +- It allows annotating validation code in these other sources + (e.g. field tags for Go, options for Protobuf). +- CUE's ability to merge, validate, and normalize configurations, + allows separation of concerns between main schema and patches for + older version, for instance. +- CUE can morph definitions in several forms, such as the structural OpenAPI + needed for Kubernetes' CRDs as of version 1.15. 
+ + + + + + +### Migration path + + +As discussed in +["Be useful at all scales"](/docs/about#be-useful-at-all-scales), +there is a high cost to changing languages as one reaches the limits +with a certain approach. + +CUE adds the benefit of type checking to plain data files. +Once in use, it allows one to keep using the same, +familiar tools when moving to something more structured +as this approach reaches its limits. +CUE provides automated rewrite tools, such as `cue import` and `cue trim`, +to aid in such migration. + + +## Comparisons + +### JSON Schema + +The closest approach to validating JSON and YAML with schema is the use +of JSON Schema and accompanying tools. + +Compared to CUE, JSON Schema does not have a unified type and value model. +This makes the ability to use JSON Schema for boilerplate reduction minimal. +As it is specified in JSON itself (it is not a DSL) it can be quite verbose. + +Overall CUE is a more concise, yet more powerful schema language. +For instance, in CUE one can specify that two fields need to be identical to +one another: + +```text +point: { + x: number + y: number +} + +diagonal: point & { + x: y + y: x +} +``` + +Such a thing is not possible in JSON Schema (or most configuration languages +for that matter). + +More on JSON Schema and its subset, OpenAPI, +in [Schema Definition]({{< relref "/docs/concept/schema-definition-use-case#json-schema--openapi" >}}). diff --git a/hugo/content/en/docs/concept/querying-use-case/index.md b/hugo/content/en/docs/concept/querying-use-case/index.md new file mode 100644 index 0000000000..506a3febf5 --- /dev/null +++ b/hugo/content/en/docs/concept/querying-use-case/index.md @@ -0,0 +1,31 @@ +--- +title: "Querying use case" +description: "Find data matching certain criteria" +toc_hide: true +--- + +CUE orders all values in a value lattice. +A value nearer the top of the hierarchy is what programming languages would +refer to as a type. +Concrete values or constraints on such a "type" are all instances of that type.
+ +In other words, CUE constraints can be used to find patterns in data. +`cue vet` is a simple instance of this. + +But more elaborate querying in the form of a `find` or `query` subcommand +is certainly possible. +We would love to hear about your envisioned use cases to plan out +such a subcommand. + +## Programmatic Querying + +In the meantime, you can query data programmatically using the CUE API. +What you will need to do is: + +- Load data and constraints using + `cuelang.org/go/cue.Runtime` or + `cuelang.org/go/cue/load.Instances`. +- Walk over data using `cuelang.org/go/cue.Value`'s `Walk` method + or look up specific values. +- Call `pattern.Subsumes(value)`, where `pattern` and `value` are + `cue.Value`s, to see if `value` is an instance of `pattern`. diff --git a/hugo/content/en/docs/concept/schema-definition-use-case/index.md b/hugo/content/en/docs/concept/schema-definition-use-case/index.md new file mode 100644 index 0000000000..4f1e4b4438 --- /dev/null +++ b/hugo/content/en/docs/concept/schema-definition-use-case/index.md @@ -0,0 +1,281 @@ +--- +title: "Schema Definition use case" +description: "Defining schema to communicate an API or standard" +toc_hide: true +--- + +A data definition language describes the structure of data. +The structure defined by such a language can, in turn, be used +to verify implementations, validate inputs, or generate code. + +Most modern dedicated data definition languages or standards allow +more than just describing whether a field is an integer or a string. +Standards like OpenAPI and CDDL allow defining things like default +values, ranges, and various other constraints. +OpenAPI even allows for complex logical combinators. + +A key difference, however, is that these standards do not unify schema +and values, which is what makes CUE so powerful. +There is no value lattice. +This limits these standards in various ways.
+ + +## Core issues addressed by CUE + +### Validating backwards compatibility + +CUE's model makes it easy to verify that newer versions of schema +are backwards-compatible with older versions. + +Consider the following versions of the same API: + +```text +// Release notes: +// - You can now specify your age and your hobby! +#V1: { + age: >=0 & <=100 + hobby: string +} + +// Release notes: +// - People get to be older than 100, so we relaxed it. +// - It seems not many people have a hobby, so we made it optional. +#V2: { + age: >=0 & <=150 // people get older now + hobby?: string // some people don't have a hobby +} + +// Release notes: +// - Actually no one seems to have a hobby nowadays anymore, so we dropped the field. +#V3: { + age: >=0 & <=150 +} +``` + +Declarations with a name starting with `#` are definitions. +Definitions are not emitted when converting to data, for instance when +exporting to JSON, and thus do not need to be concrete in such cases. +Definitions are implicitly closed structs, which means a user may +only use fields that are explicitly defined. + +In CUE, an API is backwards compatible if it subsumes the older one, or, +equivalently, if the old one is an instance of the new one. + +This can be computed using the API: + +```go +inst, err := r.Compile("apis", /* text of the above API */) +if err != nil { + // handle error +} +v1, err1 := inst.LookupField("V1") +v2, err2 := inst.LookupField("V2") +v3, err3 := inst.LookupField("V3") +if err1 != nil || err2 != nil || err3 != nil { + // handle errors +} + +// Check if V2 is backwards compatible with V1 +fmt.Println(v2.Value.Subsumes(v1.Value)) // true + +// Check if V3 is backwards compatible with V2 +fmt.Println(v3.Value.Subsumes(v2.Value)) // false +``` + +It is as simple as that. +This is the kind of thing that is made possible +by ordering all values in a lattice, like CUE does. +For CUE, checking whether one API is an instance of another is like checking +whether 3 is less than 4.
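To make the ordering intuition concrete, here is a toy model of the compatibility check in plain Go. It deliberately does not use the CUE API: `Field`, `Schema` and `Subsumes` are hypothetical names, ages are simplified to integers, and only the features used by the three versions above are modelled (closed structs, inclusive ranges, optional fields).

```go
package main

import "fmt"

// Field is a hypothetical, simplified field constraint.
type Field struct {
	Kind     string // "int" or "string"
	Min, Max int    // inclusive bounds, used only when Kind is "int"
	Optional bool
}

// Schema models a closed struct: only the listed fields are allowed.
type Schema map[string]Field

// Subsumes reports whether every value accepted by older is also
// accepted by newer, i.e. whether newer is backwards compatible.
func Subsumes(newer, older Schema) bool {
	for name, of := range older {
		nf, ok := newer[name]
		if !ok || nf.Kind != of.Kind {
			return false // newer rejects data that older allowed
		}
		if of.Kind == "int" && (nf.Min > of.Min || nf.Max < of.Max) {
			return false // newer range does not cover the older one
		}
		if of.Optional && !nf.Optional {
			return false // newer requires what older let you omit
		}
	}
	for name, nf := range newer {
		if _, ok := older[name]; !ok && !nf.Optional {
			return false // newer requires a field older never had
		}
	}
	return true
}

func main() {
	v1 := Schema{
		"age":   {Kind: "int", Min: 0, Max: 100},
		"hobby": {Kind: "string"},
	}
	v2 := Schema{
		"age":   {Kind: "int", Min: 0, Max: 150},
		"hobby": {Kind: "string", Optional: true},
	}
	v3 := Schema{
		"age": {Kind: "int", Min: 0, Max: 150},
	}

	fmt.Println(Subsumes(v2, v1)) // true
	fmt.Println(Subsumes(v3, v2)) // false
}
```

The second check fails for the same reason as in CUE: the toy `v3` is closed and has no `hobby` entry, so data that `v2` accepted with a `hobby` field is no longer valid.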
+ +Note that `V2` strictly relaxed the API relative to `V1`. +It allowed specifying a wider age range and made the `hobby` field optional. +In `V3` the `hobby` field is explicitly disallowed. +This is not backwards compatible, as it breaks previous data that did +contain a `hobby` field. + +The current API only reports a _yea_ or _nay_. +The plan is to give full actionable reports. +Feedback welcome! + + +### Combining constraints from different sources + +Most data definition languages are not +explicitly designed for commutativity. +For instance, CDDL, although much less expressive than CUE, introduces operators +that break commutativity. + +The additive property obtained by commutativity is of great value for +data definition. +Constraints often come from many sources. +For instance, one can have constraints from a base template, from code, +policies provided by different departments and policies provided by +a client. + +CUE's additive nature of constraints allows piling up constraints, +in any order, to obtain a new definition. +Which leads us to the next topic. + +### Normalization of data definitions + +Adding constraints from many sources can result in a lot of redundancy. +Even worse, constraints can be specified in different logical forms, +making their additive form verbose and unwieldy. +This is fine if all a system does using these constraints is validate data. +But this is problematic if the added constraints are to form the basis for, +say, human consumption. + +CUE's logical inference engine automatically reduces constraints. +Its API makes it possible to compute and select between +various normal forms to optimize for a certain representation. +This is used in [CUE's OpenAPI generator](/docs/integrations/openapi), +for instance. + + +## Comparisons + +### JSON Schema / OpenAPI + +JSON Schema and OpenAPI are purely data-driven data definition standards. +OpenAPI originates from Swagger. 
+As of version 3, OpenAPI is more or less a subset of JSON Schema. +OpenAPI is used to define Kubernetes Custom Resource Definitions. +As of 1.15, this requires a variant of OpenAPI called Structural OpenAPI. +We will collectively refer to these as OpenAPI henceforth. + +OpenAPI does not have any expressions or references. +It does have powerful logical operators, though, +that make it remarkably expressive. + +{{< info >}} +### On logical not +OpenAPI defines a `not` operator. +These get fuzzy when defined on structs, which OpenAPI allows. +CUE doesn't have such a construct, partly to avoid its logical pitfalls. +However, it can get a good approximation by interpreting `¬P` as `P→⊥`. +{{< /info >}} + +An advantage of OpenAPI is that it is purely defined in terms of data (JSON). +This allows sending it over the wire. +It is defined such that implementing an interpreter is fairly straightforward. + +One disadvantage is that it is very verbose. +Compare the following two equivalent schema definitions: + +{{< code-tabs >}} +{{< code-tab name="native.cue" language="text" area="top-left" >}} +// Definitions. + +// Info describes... +Info: { + // Name of the adapter. + name: string + + // Templates. + templates?: [...string] + + // Max is the limit. 
+ max?: uint & <100 +} +{{< /code-tab >}} +{{< code-tab name="openapi.json" language="json" area="top-right" >}} +{ + "openapi": "3.0.0", + "info": { + "title": "Definitions.", + "version": "v1beta1" + }, + "components": { + "schemas": { + "Info": { + "description": "Info describes...", + "type": "object", + "required": [ + "name" + ], + "properties": { + "name": { + "description": "Name of the adapter.", + "type": "string", + "format": "string" + }, + "templates": { + "description": "Templates.", + "type": "array", + "items": { + "type": "string", + "format": "string" + } + }, + "max": { + "description": "Max is the limit", + "type": "integer", + "minimum": 0, + "exclusiveMaximum": 100 + } + } + } + } + } +} +{{< /code-tab >}} +{{< /code-tabs >}} + +The difference gets more extreme as more constraints and logical +combinators are used. + +OpenAPI and CUE both have their roles. +The JSON format of OpenAPI makes it a good interchange standard. +CUE, on the other hand, can serve as an engine to generate and interpret +OpenAPI constraints. +Note that CUE is generally more expressive and many CUE constraints will +not be encodeable in OpenAPI. + + +### OPA / Rego + +Although not designed as a data definition language, Rego, the language +used for Open Policy Agent (OPA), also solves the issue of being able to +add constraints from multiple sources. + +Rego, like CUE, has its roots in logic programming. +It is based on Datalog, a restricted form of Prolog, whereas CUE is based on +typed-feature structure or graph unification. +Typed-feature structures were designed to deal with the shortcomings +of Prolog for applications in encoding human languages. + +Using a Datalog variant for what is essentially a constraint +validation task is somewhat curious. +Datalog makes an excellent query language. +But for constraint enforcement, it is a bit cumbersome as one effectively +first needs to query values to which to apply the constraints. 
+CUE collates the constraints with the location of the data to which they apply. +As a result, CUE constraints look a lot like the data they constrain, +unlike Rego which will be more reminiscent of a Datalog program. + +But more importantly, CUE's approach is more amenable to finding normalized +and simplified representations of constraints, which makes it more suitable +for creating OpenAPI from them. + + +### CDDL + +The Concise Data Definition Language (CDDL) is used to define +the structure of CBOR or JSON data. +CDDL shares many constructs with CUE, including +disjunctions, embedding, optional fields, and definitions. + +CDDL, however, has no value lattice and does not define mathematical +properties of its data. +There are several other aspects in CDDL that contradict the use of a value lattice +or make it harder to do so. +Overall this restricts the expressiveness of CDDL compared to CUE +while complicating the ability to combine constraints on types +from multiple sources. + +Unlike OpenAPI, CDDL is a domain-specific language (DSL). +It needs a specific interpreter. +It also has some non-trivial aspects to its evaluation, making it much harder +than OpenAPI to implement. + diff --git a/hugo/content/en/docs/concept/scripting-use-case/index.md b/hugo/content/en/docs/concept/scripting-use-case/index.md new file mode 100644 index 0000000000..1923afeb9e --- /dev/null +++ b/hugo/content/en/docs/concept/scripting-use-case/index.md @@ -0,0 +1,20 @@ +--- +title: "Scripting use case" +description: "Make static data come to life" +toc_hide: true +--- + +CUE has a powerful scripting layer. + +More on this later. + +For now, we refer to the documentation included in the CUE tool itself: +``` +$ cue help cmd +``` +or the "Define Commands" section of the +[Kubernetes Tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md). + +{{< warning >}} +User-defined command line flags are not yet supported. +{{< /warning >}}