diff --git a/content/docs/concept/code-generation-and-extraction-use-case/en.md b/content/docs/concept/code-generation-and-extraction-use-case/en.md
new file mode 100644
index 0000000000..c6143a58f2
--- /dev/null
+++ b/content/docs/concept/code-generation-and-extraction-use-case/en.md
@@ -0,0 +1,118 @@
+---
+title: "Code Generation and Extraction use case"
+description: "Converting CUE constraints to and from definitions in other languages"
+toc_hide: true
+---
+
+Code generation and extraction is a broad topic that overlaps, for instance,
+with the topics discussed in
+[Schema Definition]({{< relref "/docs/concept/schema-definition-use-case" >}}) and
+[Go](/docs/integrations/go).
+
+In this section we emphasize the role of CUE in a code-generation pipeline,
+that is, using CUE as an interlingua for extracting definitions from, and
+generating definitions for, multiple sources.
+
+## Core issues addressed by CUE
+
+### Extract data definition from existing sources
+
+When one identifies the need to define an interchangeable data schema,
+one usually already has an existing code base to deal with.
+
+CUE can currently extract definitions from:
+
+- [Go code](/docs/integrations/go#extract-cue-from-go)
+- Protobuf definitions
+
+Moreover, CUE can combine and reduce the constraints from various sources
+and report any inconsistencies.
+
+### Enhance existing standards
+
+CUE also allows annotating existing sources with CUE expressions.
+This lets one keep using existing sources, or allows for a smoother
+transition to a CUE-centric approach.
+For instance, a project might rely heavily on Protobuf definitions
+as the source of truth for at least one aspect of its schema definition.
+For this particular case, CUE allows annotating Protobuf field declarations
+with CUE expressions using field options.
+
+{{{with code "en" "proto-1"}}}
+-- in.proto --
+message Server {
+  int32 port = 1 [(cue.val) = ">5000 & <10_000"];
+}
+{{{end}}}
+
+A similar approach is supported for Go:
+
+{{{with code "en" "go"}}}
+-- in.go --
+type Sum struct {
+	A int `cue:"c-b" json:"a,omitempty"`
+	B int `cue:"c-a" json:"b,omitempty"`
+	C int `cue:"a+b" json:"c,omitempty"`
+}
+{{{end}}}
+
+In both cases, the constraints will be included in the extraction to CUE.
+In the case of Go, the constraints specified in the field tags can also
+be used to validate Go structs directly.
+
+### Convert CUE to other standards
+
+Currently, CUE supports converting CUE to OpenAPI and Go, although the
+approach is not limited to these targets.
+
+## Comparisons
+
+### CEL
+
+The [Common Expression Language](https://github.com/google/cel-spec),
+or CEL, defines a simple expression language that can be used as a
+standardization of constraints.
+It focuses on simplicity, speed, termination guarantees, and
+the ability to run embedded in applications.
+
+Unification of basic typed feature structures has pseudo-linear
+time complexity.
+The addition of comprehensions makes the operation polynomial.
+Allowing recursion, which CUE disallows, would make CUE Turing complete.
+The addition of sum types in CUE makes certain operations NP-complete.
+The NP-completeness manifests itself only when reasoning over incomplete types.
+Trying to optimize a CEL expression would generally suffer from the same issue.
+The same problem does not exist when applying CUE to concrete values.
+
+That said, CUE is not currently optimized for running embedded.
+Generated Go stubs currently embed a CUE interpreter into the code.
+These stubs are compatible with a mode where CUE generates native code,
+which would give it characteristics similar to those of CEL.
+
+CEL allows applications that embed it to add arbitrary functions.
+CUE does not.
+CUE keeps tight control over the purity, or hermeticity, of evaluation
+to ensure that the properties of the value lattice are not broken.
+It would be possible, however, to provide the ability to add custom functions
+restricted to concrete values.
+
+### Protoc-gen-validate (PGV)
+
+PGV also allows annotating Protobuf fields with validation code,
+with implementations for Go and Java and an experimental version for C++
+as of this writing.
+
+{{{with code "en" "proto-2"}}}
+-- in.proto --
+message Server {
+  int32 port = 1 [(validate.rules).int32 = { gte: 5000, lte: 10000 }];
+}
+{{{end}}}
diff --git a/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue b/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue
new file mode 100644
index 0000000000..10846853a6
--- /dev/null
+++ b/content/docs/concept/code-generation-and-extraction-use-case/gen_cache.cue
@@ -0,0 +1,20 @@
+package site
+{
+	content: {
+		docs: {
+			concept: {
+				"code-generation-and-extraction-use-case": {
+					page: {
+						cache: {
+							code: {
+								"proto-1": "WJqxJBmZxawhD1zanSswGQOV/CRvJ/Q8MPj5XjlN0D8="
+								go:        "i710rUh7cCg46orXaxdFaeJgm2G32l6Mwj3a2GfQTrc="
+								"proto-2": "hftRHuNwzV8FyU3H4oto4/ealvn4RCAJHN7xdxPX0Po="
+							}
+						}
+					}
+				}
+			}
+		}
+	}
+}
diff --git a/content/docs/concept/code-generation-and-extraction-use-case/page.cue b/content/docs/concept/code-generation-and-extraction-use-case/page.cue
new file mode 100644
index 0000000000..2034070f1a
--- /dev/null
+++ b/content/docs/concept/code-generation-and-extraction-use-case/page.cue
@@ -0,0 +1,3 @@
+package site
+
+content: docs: concept: "code-generation-and-extraction-use-case": {}
diff --git a/content/docs/concept/configuration-use-case/en.md b/content/docs/concept/configuration-use-case/en.md
new file mode 100644
index 0000000000..839c198372
--- /dev/null
+++ b/content/docs/concept/configuration-use-case/en.md
@@ -0,0 +1,269 @@
+---
+title: "Configuration use case"
+description: "Managing text-based files to define a desired state of a system"
+toc_hide: true
+---
+
+Arguably, validation should be the foremost task of any configuration language.
+Most configuration languages, however, focus on boilerplate removal.
+CUE is different in that it takes a validation-first stance.
+But CUE's constraints are also effective at reducing boilerplate,
+although the approach it takes is quite different from that of conventional
+data templating languages.
+
+CUE's basic operation merges configurations in such a way that the outcome is
+always the same regardless of the order in which it is carried out
+(it is associative, commutative, and idempotent).
+This property is the foundation for many other favorable properties, as discussed below.
+
+## Core issues addressed by CUE
+
+### Type checking
+
+> For large code bases, no one will question a requirement to
+> have a compiled/typed language.
+> Why should one not require the same kind of rigor for data?
+
+Many configuration languages, including GCL and its offspring, focus on
+reducing boilerplate as the primary task of configuration.
+Support for typing, however, is minimal or non-existent.
+
+Some languages do add typing support, but it is usually
+limited to validating basic types, as is common with programming languages.
+For data, however, this is insufficient.
+Evidence of this is the rise of standards like CDDL and OpenAPI that
+go beyond basic typing.
+
+In CUE, types and values are a unified concept, which gives it very
+expressive, yet intuitive and compact, typing capabilities.
+
+{{{with code "en" "spec"}}}
+-- in.cue --
+#Spec: {
+	kind: string
+
+	name: {
+		first:   !="" // must be specified and non-empty
+		middle?: !="" // optional, but must be non-empty when specified
+		last:    !=""
+	}
+
+	// The minimum must be strictly smaller than the maximum and vice versa.
+	minimum?: int & <maximum
+	maximum?: int & >minimum
+}
+
+// A spec is of type #Spec
+spec: #Spec
+spec: {
+	knid: "Homo Sapiens" // error, misspelled field
+
+	name: first: "Jane"
+	name: last: "Doe"
+}
+{{{end}}}
+
+### Simplicity at Scale
+
+When using a configuration language to reduce boilerplate,
+one should consider whether the reduced verbosity is worth the
+increased complexity.
+Most configuration languages use an override model to reduce boilerplate:
+an existing configuration is used as a base and modified to result in
+a new configuration.
+This often takes the form of inheritance.
+
+For small-scale projects,
+using inheritance can be too complex, and the simplicity of
+spelling everything out is often a superior approach.
+For large-scale projects, however, using inheritance often leads to deep
+layerings of modifications, making it very hard to see where values come from.
+In the end, it is again questionable whether the added complexity is worth it.
+
+As with other configuration languages, CUE can add complexity if values
+are organized to come from multiple places.
+However, as CUE disallows overrides, deep layerings are naturally prevented.
+More importantly, CUE can also enhance readability.
+A definition in one file may apply to values in many other files.
+Where one would usually have to open all these files to verify validity,
+with CUE one can see it at a glance.
+
+CUE's approach has been battle-tested in computational linguistics, where it
+has been used for decades to describe human languages:
+effectively very large, complex, and irregular configurations.
+
+### Abstractions versus Direct Access
+
+A common debate for configuration languages is whether a language should
+provide an abstraction layer for APIs.
+On the one hand, abstraction layers protect the user against misuse.
+On the other hand, they need to keep up with API changes and are
+inevitably prone to drift.
+So it goes.
+
+CUE addresses both issues.
+On the one hand, its fine-grained typing allows layering detailed constraints
+on top of native APIs, without the need for an abstraction layer.
+New API features can be used even before existing definitions support them.
+
+On the other hand, CUE's order independence allows abstraction layers
+to inject arbitrary raw API values in a controlled manner,
+providing a general escape hatch to support new or uncovered features.
+See the Manual section of the
+[Kubernetes tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md)
+for an example.
+
+### Tooling
+
+A configuration language usually transforms its configurations to
+a lower-level representation, like JSON, YAML, or Protobuf, so that
+it can be consumed by tools that take these formats as input.
+Piping such output to the needed tools works initially,
+but sooner or later one will want to automate this,
+usually by writing some kind of tool.
+
+And so it goes.
+The rise of systems requiring advanced configuration has been paired
+with a rise in ever more specialized command line tools.
+The core structure of all these tools is more or less the same.
+More annoyingly, many have overlapping functionality yet are hardly extensible
+or interoperable.
+As a result, one may see the need to layer on yet another set of tools.
+
+Having tools like `kubectl` or `etcdctl` that directly control
+core infrastructure makes sense, but at higher levels of
+abstraction one needs a more open approach.
+
+CUE attempts to address this by providing an open,
+declarative scripting layer on top of the configuration layer.
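+
+As a minimal sketch of this scripting layer, assuming a package that defines a
+hypothetical `objects` list of Kubernetes resources, a command in a file whose
+name ends in `_tool.cue` could pipe those resources to `kubectl`
+(`cue help cmd` describes the exact rules):
+
+```cue
+package deploy
+
+import (
+	"encoding/yaml"
+	"tool/exec"
+)
+
+// objects is a hypothetical list of Kubernetes resources,
+// assumed to be populated elsewhere in the package.
+objects: [...{...}]
+
+// Running `cue cmd apply` executes the task below.
+command: apply: {
+	kubectl: exec.Run & {
+		cmd:   ["kubectl", "apply", "-f", "-"]
+		stdin: yaml.MarshalStream(objects)
+	}
+}
+```
+
+The command is itself ordinary CUE, so the data it consumes is evaluated and
+validated like any other configuration before the task runs.
+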
+
+Aside from the automation case described above, the scripting layer is designed
+to address several other issues:
+
+- injecting environmental data into configurations, something not allowed
+  in CUE itself (it is pure, or hermetic, or side-effect free)
+- injecting computed data into configurations as part of a pipeline
+- allowing composability of tool integrations
+
+Again, the ability to deterministically merge data from different sources
+makes CUE a natural fit for this task.
+
+## Comparisons
+
+### Inheritance-based configuration languages
+
+Inheritance is neither commutative nor idempotent in the general case.
+In other words, order matters.
+This makes it hard to track where values are coming from.
+This is true not only for humans, but also for machines.
+It makes any kind of automation very complicated, if not impossible.
+The complexity of inheritance is even greater if values
+can enter an object from one of several directions (super, overlay, etc.).
+
+The basic operation of CUE is commutative, associative, and idempotent.
+This order independence helps both
+humans and machines. The resulting model is much less complex.
+
+{{< info >}}
+### Inheritance in CUE
+Although CUE does not have inheritance in the override sense, it does have
+the notion of one value being an instance of another.
+In fact, this is a core principle.
+
+Let's use a real-world example to make this distinction clear.
+In the override model of inheritance, one can take an existing template,
+say a dog, and modify it to become a cat.
+Trim the ears, dry off the nose, and what have you.
+
+In CUE, it is a matter of classification.
+Cats and dogs are both instances of animals, but once an entity is defined
+to be a cat, it can never become a dog.
+To most humans (that is, anyone who has not become accustomed
+to inheritance) this makes total sense.
+{{< /info >}}
+
+Although one can create instances of values (remember, types are values),
+one cannot alter any of the values of a parent.
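+
+The animal example can be sketched in CUE as follows. This is a minimal
+illustration with hypothetical `#Animal` and `#Cat` definitions: `#Cat` is an
+instance of `#Animal`, and a value declared to be a `#Cat` can only be refined
+further, never turned into something else.
+
+```cue
+#Animal: {
+	legs:  int & >=0
+	sound: string
+}
+
+// #Cat narrows the fields of #Animal; it cannot override them.
+#Cat: #Animal & {
+	legs:  4
+	sound: "meow"
+}
+
+tom: #Cat
+
+// Uncommenting the next line makes evaluation fail,
+// because "woof" conflicts with the "meow" fixed by #Cat.
+// tom: sound: "woof"
+```
+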
+
+A template acts as a type.
+Just as in statically typed languages, where one cannot assign an integer to
+a string, one cannot violate the properties of a type in CUE.
+
+These restrictions reduce flexibility, but also enhance clarity.
+To ensure that a configuration holds a certain property, just declare it
+in any file included in the project to make it so.
+There is no need to look at other files.
+As we saw, the imposed restrictions can also improve, rather than hurt,
+the ability to remove boilerplate compared to inheritance-based languages.
+
+The complexity of inheritance-based models also hampers automation.
+The introduction of GCL was paired with the promise of advanced tooling.
+This promise, a mantra of declarative languages, was even repeated for some
+of its offspring.
+The tooling never materialized, though, as the model made it intractable.
+
+CUE already provides power tools like `cue trim`, and its API provides
+unification and subsumption operations for incomplete configurations, the
+building blocks for powerful analysis.
+
+### Jsonnet/ GCL
+
+Like Jsonnet, CUE is a superset of JSON.
+Both are also influenced by GCL.
+CUE, in turn, is influenced by Jsonnet.
+This may give the impression that the languages are very similar.
+At the core, though, they are very different.
+
+CUE's focus is data validation, whereas Jsonnet focuses on data templating
+(boilerplate removal).
+Jsonnet was not designed with validation in mind.
+
+Jsonnet and GCL can be quite powerful at reducing boilerplate.
+The goal of CUE is not to be better at boilerplate removal than Jsonnet or GCL.
+CUE was designed to be an answer to two major shortcomings of these approaches:
+complexity and lack of typing.
+Jsonnet reduces some of the complexities of GCL, but largely falls into the
+same category.
+For CUE, the tradeoff was to add typing and reduce complexity
+(for humans and machines), at the expense of giving up flexibility.
+
+### HCL
+
+HCL has some striking similarities with GCL.
+But, whether by coincidence or by design, it removes the core
+source of complexity of GCL: inheritance.
+
+It does introduce a poor man's version of inheritance: file overlays.
+Fields may be defined in multiple files, which override each other in an
+order determined by the file names.
+Although not nearly as complex as GCL, it does have some of the same issues.
+
+Also, whether the removal of inheritance was a coincidence or a great insight,
+HCL provides no replacement construct of the kind one might need for
+larger-scale configuration management.
+This means the use of HCL may hit a ceiling for medium to large setups.
+
+So what CUE has to offer users of HCL is typing, better prospects for growing
+to larger-scale operations, and the elimination of the peculiarities of file
+overlays.
+
+CUE does borrow one construct from HCL: the folding of single-field objects
+onto a single line was directly inspired by HCL's very similar approach.
+
diff --git a/content/docs/concept/configuration-use-case/gen_cache.cue b/content/docs/concept/configuration-use-case/gen_cache.cue
new file mode 100644
index 0000000000..9decb89def
--- /dev/null
+++ b/content/docs/concept/configuration-use-case/gen_cache.cue
@@ -0,0 +1,18 @@
+package site
+{
+	content: {
+		docs: {
+			concept: {
+				"configuration-use-case": {
+					page: {
+						cache: {
+							code: {
+								spec: "Py6Q0BhvScCMMX3oUJR1vFfjBVTg43b9qYS1q+VHM3M="
+							}
+						}
+					}
+				}
+			}
+		}
+	}
+}
diff --git a/content/docs/concept/configuration-use-case/page.cue b/content/docs/concept/configuration-use-case/page.cue
new file mode 100644
index 0000000000..2b24433cb3
--- /dev/null
+++ b/content/docs/concept/configuration-use-case/page.cue
@@ -0,0 +1,3 @@
+package site
+
+content: docs: concept: "configuration-use-case": {}
diff --git a/content/docs/concept/data-validation-use-case/en.md b/content/docs/concept/data-validation-use-case/en.md
new file mode 100644
index 0000000000..5e0185eab7
--- /dev/null
+++ b/content/docs/concept/data-validation-use-case/en.md
@@ -0,0 +1,138 @@
+---
+title: "Data Validation use case"
+description: "Validate text-based or programmatic data"
+toc_hide: true
+---
+
+By far the most straightforward approach to specifying data is to use plain
+JSON or YAML files.
+Every value can be looked up right where it needs to be defined.
+But even at small scales, one will soon have to deal with
+consistency issues.
+
+Data validation tools allow verifying the consistency of such data
+based on a schema.
+
+## Core issues addressed by CUE
+
+### Client-side validation
+
+There are few handy tools for verifying plain data files.
+Often, validation is left to the server side.
+If it is done client side, it relies either on rather verbose schema
+definitions or on custom tools that verify schemas for a specific domain.
+
+The `cue` command line tool provides a fairly straightforward way to
+define schemas and verify them against a collection of data files.
+
+Given the two files below, the `cue vet` command can verify that the values in
+`ranges.yaml` are correct by simply passing both files on the command line.
+
+{{{with code "en" "client-side-validation"}}}
+#nofmt(ranges.yaml) https://github.com/cue-lang/cue/issues/2666: multi-document yaml files
+
+! exec cue vet ranges.yaml check.cue
+cmp stderr output
+-- check.cue --
+min?: *0 | number // 0 if undefined
+max?: number & >min // must be strictly greater than min if defined.
+-- ranges.yaml --
+min: 5
+max: 10
+---
+min: 10
+max: 5
+-- output --
+max: invalid value 5 (out of bound >10):
+    ./check.cue:2:16
+    ./ranges.yaml:5:7
+{{{end}}}
+
+### Validating document-oriented databases
+
+Document-oriented databases, such as MongoDB, are characterized
+by flexible schemas.
+Some of them, MongoDB included, optionally allow schema definitions, often in
+the form of JSON Schema.
+
+CUE constraints can be used to verify document-oriented databases.
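+
+As a minimal sketch, with a hypothetical `#User` schema for a "users"
+collection, a default fills in a field that older documents lack, while a
+separately declared "patch" constraint is unified with the main definition
+wherever it appears:
+
+```cue
+// Main schema for a hypothetical "users" collection.
+#User: {
+	name:  string
+	email: string
+	// Added in a later schema version; older documents default to true.
+	active: bool | *true
+}
+
+// A patch, which could live in a separate file, tightens the schema.
+#User: email: =~"@"
+
+// A document written before the "active" field existed.
+doc: #User & {
+	name:  "jane"
+	email: "jane@example.com"
+}
+```
+
+Exporting this file with `cue export` should produce `doc` with
+`"active": true` filled in, while an `email` value without an `@` would be
+reported as an error.
+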
+
+CUE's default mechanism and expression syntax allow filling in missing
+values for documents written against an older version of a schema.
+More importantly, CUE's order independence allows
+"patch" specifications to be separated from the main schema definition.
+CUE can take care of merging these and report any inconsistencies
+in the definitions, even before they are applied to a concrete case.
+
+CUE can be applied directly to the data in code using its API,
+but it can also be used to compute JSON schemas from CUE definitions.
+(See [cuelang.org/go/encoding/openapi](https://pkg.go.dev/cuelang.org/go/encoding/openapi).)
+If a document-oriented database natively supports JSON Schema, it is likely
+beneficial to use that support.
+Using CUE to generate the schema has several advantages over writing it directly:
+
+- CUE is far less verbose.
+- CUE can extract base definitions from other sources, like Go and Protobuf.
+- CUE allows annotating validation code in these other sources
+  (e.g. field tags for Go, options for Protobuf).
+- CUE's ability to merge, validate, and normalize configurations
+  allows separation of concerns between the main schema and, for instance,
+  patches for older versions.
+- CUE can convert definitions into several forms, such as the structural OpenAPI
+  needed for Kubernetes CRDs as of version 1.15.
+
+### Migration path
+
+As discussed in
+["Be useful at all scales"](/docs/about#be-useful-at-all-scales),
+there is a high cost to changing languages as one reaches the limits
+of a certain approach.
+
+CUE adds the benefit of type checking to plain data files.
+Once CUE is in use, the same familiar tool allows moving to something more
+structured as the plain-data approach reaches its limits.
+CUE provides automated rewrite tools, such as `cue import` and `cue trim`,
+to aid in such a migration.
+
+## Comparisons
+
+### JSON Schema
+
+The closest approach to validating JSON and YAML with a schema is the use
+of JSON Schema and accompanying tools.
+
+Compared to CUE, JSON Schema does not have a unified type and value model.
+This leaves little room for using JSON Schema to reduce boilerplate.
+As it is specified in JSON itself (it is not a DSL), it can be quite verbose.
+
+Overall, CUE is a more concise, yet more powerful, schema language.
+For instance, in CUE one can specify that two fields need to be identical to
+one another:
+
+{{{with code "en" "jsonschema"}}}
+-- in.cue --
+point: {
+	x: number
+	y: number
+}
+
+diagonal: point & {
+	x: y
+	y: x
+}
+{{{end}}}
+
+Such a constraint cannot be expressed in JSON Schema (or in most configuration
+languages, for that matter).
+
+More on JSON Schema and its subset, OpenAPI,
+in [Schema Definition]({{< relref "/docs/concept/schema-definition-use-case#json-schema--openapi" >}}).
diff --git a/content/docs/concept/data-validation-use-case/gen_cache.cue b/content/docs/concept/data-validation-use-case/gen_cache.cue
new file mode 100644
index 0000000000..c46545088d
--- /dev/null
+++ b/content/docs/concept/data-validation-use-case/gen_cache.cue
@@ -0,0 +1,19 @@
+package site
+{
+	content: {
+		docs: {
+			concept: {
+				"data-validation-use-case": {
+					page: {
+						cache: {
+							code: {
+								"client-side-validation": "7pUvsY+ACV4j/OlB2trgIBbKYtNkjEPZV4R8e2lvyEY="
+								jsonschema:               "wV6Z2nmqslUZOHfuyOIQnwpFNDC+iExo9/HsTEKVt2g="
+							}
+						}
+					}
+				}
+			}
+		}
+	}
+}
diff --git a/content/docs/concept/data-validation-use-case/page.cue b/content/docs/concept/data-validation-use-case/page.cue
new file mode 100644
index 0000000000..42618615eb
--- /dev/null
+++ b/content/docs/concept/data-validation-use-case/page.cue
@@ -0,0 +1,3 @@
+package site
+
+content: docs: concept: "data-validation-use-case": {}
diff --git a/content/docs/concept/querying-use-case/en.md b/content/docs/concept/querying-use-case/en.md
new file mode 100644
index 0000000000..506a3febf5
--- /dev/null
+++ b/content/docs/concept/querying-use-case/en.md
@@ -0,0 +1,31 @@
+---
+title: "Querying use case"
+description: "Find data matching certain criteria"
+toc_hide: true
+---
+
+CUE orders all values in a value lattice.
+A value closer to the top of the lattice is what programming languages would
+refer to as a type.
+Concrete values, or further constraints on such a "type", are all instances
+of that type.
+
+In other words, CUE constraints can be used to find patterns in data.
+`cue vet` is a simple instance of this.
+
+More elaborate querying, in the form of a `find` or `query` subcommand,
+is certainly possible as well.
+We would love to hear about your envisioned use cases to plan out
+such a subcommand.
+
+## Programmatic Querying
+
+In the meantime, you can query data programmatically using the CUE API.
+To do so:
+
+- Load data and constraints using `cuelang.org/go/cue/load.Instances` and
+  build them with a `cue.Context` created by `cuelang.org/go/cue/cuecontext.New`.
+- Walk over data using `cuelang.org/go/cue.Value`'s `Walk` method,
+  or look up specific values.
+- Call `pattern.Subsume(value)`, where `pattern` and `value` are
+  `cue.Value`s; it returns `nil` if `value` is an instance of `pattern`.
diff --git a/content/docs/concept/querying-use-case/page.cue b/content/docs/concept/querying-use-case/page.cue
new file mode 100644
index 0000000000..2aee765375
--- /dev/null
+++ b/content/docs/concept/querying-use-case/page.cue
@@ -0,0 +1,3 @@
+package site
+
+content: docs: concept: "querying-use-case": {}
diff --git a/content/docs/concept/schema-definition-use-case/en.md b/content/docs/concept/schema-definition-use-case/en.md
new file mode 100644
index 0000000000..6f1f85f36a
--- /dev/null
+++ b/content/docs/concept/schema-definition-use-case/en.md
@@ -0,0 +1,282 @@
+---
+title: "Schema Definition use case"
+description: "Defining schema to communicate an API or standard"
+toc_hide: true
+---
+
+A data definition language describes the structure of data.
+The structure defined by such a language can, in turn, be used
+to verify implementations, validate inputs, or generate code.
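+
+As a minimal sketch in CUE, with a hypothetical `#Server` definition, a single
+definition can describe a field's type together with a default value and an
+allowed range:
+
+```cue
+#Server: {
+	host: string
+	// An integer port in the non-privileged range, defaulting to 8080.
+	port: *8080 | (int & >=1024 & <=65535)
+	// An optional list of tags.
+	tags?: [...string]
+}
+```
+
+Used for input validation, unifying `#Server` with `{host: "example.com", port: 80}`
+would fail, as `80` lies outside the allowed range.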
+
+Most modern dedicated data definition languages or standards allow
+more than just describing whether a field is an integer or a string.
+Standards like OpenAPI and CDDL allow defining things like default
+values, ranges, and various other constraints.
+OpenAPI even allows for complex logical combinators.
+
+A key difference, however, is that these standards do not unify schema
+and values, which is precisely what makes CUE so powerful.
+There is no value lattice.
+This limits these standards in various ways.
+
+## Core issues addressed by CUE
+
+### Validating backwards compatibility
+
+CUE's model makes it easy to verify that newer versions of a schema
+are backwards compatible with older versions.
+
+Consider the following versions of the same API:
+
+{{{with code "en" "api-cue"}}}
+-- in.cue --
+// Release notes:
+// - You can now specify your age and your hobby!
+#V1: {
+	age:   >=0 & <=100
+	hobby: string
+}
+
+// Release notes:
+// - People get to be older than 100, so we relaxed it.
+// - It seems not many people have a hobby, so we made it optional.
+#V2: {
+	age:    >=0 & <=150 // people get older now
+	hobby?: string      // some people don't have a hobby
+}
+
+// Release notes:
+// - Actually no one seems to have a hobby nowadays anymore, so we dropped the field.
+#V3: {
+	age: >=0 & <=150
+}
+{{{end}}}
+
+Declarations with a name starting with `#` are definitions.
+Definitions are not emitted when converting to data, for instance when
+exporting to JSON, and thus do not need to be concrete in such cases.
+Definitions are closed by default, which means a user may
+only use fields that are explicitly defined.
+
+In CUE, an API is backwards compatible if it subsumes the older one,
+that is, if the old one is an instance of the new one.
+
+This can be computed using the Go API:
+
+{{{with code "en" "api-go"}}}
+-- in.go --
+package main
+
+import (
+	"fmt"
+
+	"cuelang.org/go/cue"
+	"cuelang.org/go/cue/cuecontext"
+)
+
+// apis holds the text of the above API (release notes omitted).
+const apis = `
+#V1: {age: >=0 & <=100, hobby: string}
+#V2: {age: >=0 & <=150, hobby?: string}
+#V3: {age: >=0 & <=150}
+`
+
+func main() {
+	v := cuecontext.New().CompileString(apis)
+	if err := v.Err(); err != nil {
+		panic(err) // handle error
+	}
+	v1 := v.LookupPath(cue.ParsePath("#V1"))
+	v2 := v.LookupPath(cue.ParsePath("#V2"))
+	v3 := v.LookupPath(cue.ParsePath("#V3"))
+
+	// Subsume returns nil if its argument is an instance of the receiver.
+	// Check if V2 is backwards compatible with V1.
+	fmt.Println(v2.Subsume(v1) == nil) // true
+
+	// Check if V3 is backwards compatible with V2.
+	fmt.Println(v3.Subsume(v2) == nil) // false
+}
+{{{end}}}
+
+It is as simple as that.
+This is the kind of thing that is made possible
+by ordering all values in a lattice, as CUE does.
+For CUE, checking whether one API is an instance of another is like checking
+whether 3 is less than 4.
+
+Note that `V2` strictly relaxed the API relative to `V1`.
+It allowed specifying a wider age range and made the `hobby` field optional.
+In `V3`, the `hobby` field is no longer allowed, because definitions are closed.
+This is not backwards compatible, as it breaks previous values that did
+contain a `hobby` field.
+
+The current API only reports a _yea_ or _nay_.
+The plan is to give full actionable reports.
+Feedback welcome!
+
+### Combining constraints from different sources
+
+Most data definition languages are not
+explicitly designed with commutativity in mind.
+For instance, CDDL, although much less expressive than CUE, introduces operators
+that break commutativity.
+
+The additive property obtained by commutativity is of great value for
+data definition.
+Constraints often come from many sources.
+For instance, one can have constraints from a base template, from code,
+policies provided by different departments, and policies provided by
+a client.
+
+The additive nature of CUE's constraints allows piling them up,
+in any order, to obtain a new definition.
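+
+The following minimal sketch, with a hypothetical `#Deployment` definition,
+declares constraints from three sources. CUE unifies all declarations of the
+same field, so the result is the same whatever the order, or the files, in
+which they appear:
+
+```cue
+// From a base template.
+#Deployment: {
+	replicas: int & >=1
+	image:    string
+}
+
+// From a departmental policy.
+#Deployment: replicas: <=10
+
+// From a client.
+#Deployment: image: =~"^registry\\.example\\.com/"
+
+prod: #Deployment & {
+	replicas: 3
+	image:    "registry.example.com/app:v1"
+}
+```
+
+Setting `replicas: 20` in `prod` would violate the departmental policy,
+regardless of where that policy is declared.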
+
+### Normalization of data definitions
+
+Adding constraints from many sources can result in a lot of redundancy.
+Even worse, constraints can be specified in different logical forms,
+making their combined form verbose and unwieldy.
+This is fine if all a system does with these constraints is validate data.
+It is problematic, however, if the combined constraints are to form the basis
+for, say, human consumption.
+
+CUE's logical inference engine automatically reduces constraints.
+Its API makes it possible to compute and select between
+various normal forms to optimize for a certain representation.
+This is used in [CUE's OpenAPI generator](/docs/integrations/openapi),
+for instance.
+
+## Comparisons
+
+### JSON Schema / OpenAPI
+
+JSON Schema and OpenAPI are purely data-driven data definition standards.
+OpenAPI originates from Swagger.
+As of version 3, OpenAPI is more or less a subset of JSON Schema.
+OpenAPI is used to define Kubernetes Custom Resource Definitions.
+As of Kubernetes 1.15, this requires a variant of OpenAPI called Structural OpenAPI.
+We will collectively refer to these as OpenAPI henceforth.
+
+OpenAPI does not have any expressions or references to other values.
+It does have powerful logical operators, though,
+which make it remarkably expressive.
+
+{{< info >}}
+### On logical not
+OpenAPI defines a `not` operator.
+Its meaning gets fuzzy when it is applied to structs, which OpenAPI allows.
+CUE doesn't have such a construct, partly to avoid its logical pitfalls.
+However, it can get a good approximation by interpreting `¬P` as `P→⊥`.
+{{< /info >}}
+
+An advantage of OpenAPI is that it is purely defined in terms of data (JSON).
+This allows sending it over the wire.
+It is defined such that implementing an interpreter is fairly straightforward.
+
+One disadvantage is that it is very verbose.
+Compare the following two equivalent schema definitions:
+
+{{{with code "en" "openapi-comparison"}}}
+exec true
+-- native.cue --
+// Definitions.
+
+// Info describes...
+Info: {
+	// Name of the adapter.
+	name: string
+
+	// Templates.
+	templates?: [...string]
+
+	// Max is the limit.
+	max?: uint & <100
+}
+-- openapi.json --
+{
+  "openapi": "3.0.0",
+  "info": {
+    "title": "Definitions.",
+    "version": "v1beta1"
+  },
+  "components": {
+    "schemas": {
+      "Info": {
+        "description": "Info describes...",
+        "type": "object",
+        "required": [
+          "name"
+        ],
+        "properties": {
+          "name": {
+            "description": "Name of the adapter.",
+            "type": "string",
+            "format": "string"
+          },
+          "templates": {
+            "description": "Templates.",
+            "type": "array",
+            "items": {
+              "type": "string",
+              "format": "string"
+            }
+          },
+          "max": {
+            "description": "Max is the limit.",
+            "type": "integer",
+            "minimum": 0,
+            "maximum": 100,
+            "exclusiveMaximum": true
+          }
+        }
+      }
+    }
+  }
+}
+{{{end}}}
+
+The difference gets more extreme as more constraints and logical
+combinators are used.
+
+OpenAPI and CUE both have their roles.
+The JSON format of OpenAPI makes it a good interchange standard.
+CUE, on the other hand, can serve as an engine to generate and interpret
+OpenAPI constraints.
+Note that CUE is generally more expressive, so many CUE constraints
+cannot be encoded in OpenAPI.
+
+### OPA / Rego
+
+Although not designed as a data definition language, Rego, the language
+used for Open Policy Agent (OPA), also addresses the problem of combining
+constraints from multiple sources.
+
+Rego, like CUE, has its roots in logic programming.
+It is based on Datalog, a restricted form of Prolog, whereas CUE is based on
+typed feature structure, or graph, unification.
+Typed feature structures were designed to deal with the shortcomings
+of Prolog for encoding human languages.
+
+Using a Datalog variant for what is essentially a constraint
+validation task is somewhat curious.
+Datalog makes an excellent query language.
+For constraint enforcement, however, it is somewhat cumbersome, as one
+effectively first needs to query for the values to which the constraints apply.
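+
+For contrast, a minimal CUE sketch (the field names are hypothetical, loosely
+modeled on a Kubernetes pod spec) places each constraint at the path of the
+data it applies to, rather than first querying for that data:
+
+```cue
+// Every container must have a name, and any ports it exposes
+// must lie in the valid TCP port range.
+spec: containers: [...{
+	name: string
+	ports?: [...{
+		containerPort: int & >0 & <65536
+	}]
+}]
+```
+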
+CUE collates the constraints with the location of the data to which they apply.
+As a result, CUE constraints look a lot like the data they constrain,
+unlike Rego constraints, which are more reminiscent of a Datalog program.
+
+More importantly, CUE's approach is more amenable to finding normalized
+and simplified representations of constraints, which makes it more suitable
+for creating OpenAPI from them.
+
+
+### CDDL
+
+The Concise Data Definition Language (CDDL) is used to define
+the structure of CBOR or JSON data.
+CDDL shares many constructs with CUE, including
+disjunctions, embedding, optional fields, and definitions.
+
+CDDL, however, has no value lattice and does not define mathematical
+properties of its data.
+There are several other aspects of CDDL that contradict the use of a value lattice
+or make it harder to do so.
+Overall, this restricts the expressiveness of CDDL compared to CUE
+while complicating the ability to combine constraints on types
+from multiple sources.
+
+Unlike OpenAPI, CDDL is a domain-specific language (DSL).
+It needs a specific interpreter.
+It also has some non-trivial aspects to its evaluation, making it much harder
+than OpenAPI to implement.
+ diff --git a/content/docs/concept/schema-definition-use-case/gen_cache.cue b/content/docs/concept/schema-definition-use-case/gen_cache.cue new file mode 100644 index 0000000000..18e08143f6 --- /dev/null +++ b/content/docs/concept/schema-definition-use-case/gen_cache.cue @@ -0,0 +1,20 @@ +package site +{ + content: { + docs: { + concept: { + "schema-definition-use-case": { + page: { + cache: { + code: { + "api-cue": "EaxLFdnMbGd3BHjH9ZghDaSqSuGmNwuRaqm5CPyEL1c=" + "api-go": "EEibYnSR5uYcZhmNEHu9CSW2LTc2HpnQubZwlPRN6X4=" + "openapi-comparison": "NYcksiI/gwxoslm/S5XiSe1TTyM/eAioCg+1b41sXdA=" + } + } + } + } + } + } + } +} diff --git a/content/docs/concept/schema-definition-use-case/page.cue b/content/docs/concept/schema-definition-use-case/page.cue new file mode 100644 index 0000000000..36f8c769bd --- /dev/null +++ b/content/docs/concept/schema-definition-use-case/page.cue @@ -0,0 +1,3 @@ +package site + +content: docs: concept: "schema-definition-use-case": {} diff --git a/content/docs/concept/scripting-use-case/en.md b/content/docs/concept/scripting-use-case/en.md new file mode 100644 index 0000000000..1923afeb9e --- /dev/null +++ b/content/docs/concept/scripting-use-case/en.md @@ -0,0 +1,20 @@ +--- +title: "Scripting use case" +description: "Make static data come to life" +toc_hide: true +--- + +CUE has a powerful scripting layer. + +More on this later. + +For now, we refer to the documentation included in the CUE tool itself: +``` +$ cue help cmd +``` +or the "Define Commands" section of the +[Kubernetes Tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md). + +{{< warning >}} +User-defined command line flags are not yet supported. 
+{{< /warning >}}
diff --git a/content/docs/concept/scripting-use-case/page.cue b/content/docs/concept/scripting-use-case/page.cue
new file mode 100644
index 0000000000..485a8ccbaf
--- /dev/null
+++ b/content/docs/concept/scripting-use-case/page.cue
@@ -0,0 +1,3 @@
+package site
+
+content: docs: concept: "scripting-use-case": {}
diff --git a/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md b/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md
new file mode 100644
index 0000000000..890c8f62e7
--- /dev/null
+++ b/hugo/content/en/docs/concept/code-generation-and-extraction-use-case/index.md
@@ -0,0 +1,115 @@
+---
+title: "Code Generation and Extraction use case"
+description: "Converting CUE constraints to and from definitions in other languages"
+toc_hide: true
+---
+
+Code generation and extraction is a broad topic that overlaps, for instance,
+with the topics discussed in
+[Schema Definition]({{< relref "/docs/concept/schema-definition-use-case" >}}) and
+[Go](/docs/integrations/go).
+
+
+In this section we emphasize the role of CUE in a code-generation pipeline,
+that is, using CUE as an interlingua for extracting definitions from, and
+generating definitions for, multiple sources.
+
+
+## Core issues addressed by CUE
+
+### Extract data definition from existing sources
+
+When one identifies the need to define an interchangeable data schema,
+one usually already has an existing code base to deal with.
+
+CUE can currently extract definitions from:
+
+
+- [Go code](/docs/integrations/go#extract-cue-from-go)
+- Protobuf definitions
+
+Moreover, CUE can combine and reduce the constraints from various sources
+and report if there are any inconsistencies.
+
+
+### Enhance existing standards
+
+CUE also allows annotating existing sources with CUE expressions.
+This lets one keep using existing sources, or make a smoother
+transition to a CUE-centric approach.
+For instance, a project might be quite reliant on Protobuf definitions
+as the source of truth for at least one aspect of schema definition.
+For this particular case, CUE allows annotating Protobuf field declarations
+with CUE expressions using field options.
+
+```proto
+message Server {
+  int32 port = 1 [(cue.val) = ">5000 & <10_000"];
+}
+```
+
+A similar approach is supported for Go, using field tags:
+
+```go
+type Sum struct {
+	A int `cue:"c-b" json:"a,omitempty"`
+	B int `cue:"c-a" json:"b,omitempty"`
+	C int `cue:"a+b" json:"c,omitempty"`
+}
+```
+
+In both cases, the constraints will be included in the extraction to CUE.
+In the case of Go, the constraints specified in the field tags can also
+be used to validate Go structs directly.
+
+
+### Convert CUE to other standards
+
+Currently, CUE supports converting CUE to OpenAPI and Go, although the
+approach is not limited to these targets.
+
+
+## Comparisons
+
+### CEL
+
+The [Common Expression Language](https://github.com/google/cel-spec),
+or CEL, defines a simple expression language that can be used as a
+standard way to express constraints.
+It focuses on simplicity, speed, termination guarantees, and
+the ability to run embedded in applications.
+
+Unification of basic typed-feature structures has pseudo-linear run
+time complexity.
+The addition of comprehensions makes the operation polynomial.
+Allowing recursion would make CUE Turing complete.
+The addition of sum types in CUE makes certain operations NP-complete.
+The NP-completeness manifests itself only when reasoning over incomplete types.
+Trying to optimize a CEL expression would generally suffer from the same issue.
+The same problem does not exist when applying CUE to concrete values.
+
+That said, CUE is currently not optimized for embedded use.
+Currently, generated Go stubs embed a CUE interpreter into the code.
+These stubs are compatible with a mode where CUE generates native code,
+which would give it characteristics similar to CEL's.
+
+CEL allows embedding applications to add arbitrary functions.
+CUE does not.
+CUE keeps tight control over the purity, or hermeticity, of evaluation
+to ensure that the properties of the value lattice are not broken.
+It would be possible, however, to allow adding custom functions
+that are restricted to concrete values.
+
+
+### Protoc-gen-validate (PGV)
+
+PGV also allows annotating Protobuf fields with validation code,
+with implementations for Go and Java and an experimental version for C++
+as of this writing.
+
+The `Server` example from above, expressed with PGV:
+
+```proto
+message Server {
+  int32 port = 1 [(validate.rules).int32 = { gt: 5000, lt: 10000 }];
+}
+```
diff --git a/hugo/content/en/docs/concept/configuration-use-case/index.md b/hugo/content/en/docs/concept/configuration-use-case/index.md
new file mode 100644
index 0000000000..352215c426
--- /dev/null
+++ b/hugo/content/en/docs/concept/configuration-use-case/index.md
@@ -0,0 +1,268 @@
+---
+title: "Configuration use case"
+description: "Managing text-based files to define a desired state of a system"
+toc_hide: true
+---
+
+Arguably, validation should be the foremost task of any configuration language.
+Most configuration languages, however, focus on boilerplate removal.
+CUE is different in that it takes a validation-first stance.
+But CUE's constraints are also effective at reducing boilerplate,
+although the approach it takes is quite different from conventional
+data templating languages.
+
+CUE's basic operation merges configurations in such a way that the outcome is
+always the same regardless of the order in which it is carried out
+(it is associative, commutative, and idempotent).
+This property is the foundation for many other favorable properties, as discussed below.
+
+
+## Core issues addressed by CUE
+
+### Type checking
+
+  For large code bases, no one will question a requirement to
+  have a compiled/typed language.
+  Why should one not require the same kind of rigor for data?
+
+Many configuration languages, including GCL and its offspring, focus on
+reducing boilerplate as the primary task of configuration.
+Support for typing, however, is minimal or non-existent.
+
+Some languages do add typing support, but it is usually
+limited to validating basic types, as is common with programming languages.
+For data, however, this is insufficient.
+Evidence of this is the rise of standards like CDDL and OpenAPI that
+go beyond basic typing.
+
+In CUE, types and values are a unified concept, which gives it very
+expressive, yet intuitive and compact, typing capabilities.
+
+```text
+#Spec: {
+    kind: string
+
+    name: {
+        first: !="" // must be specified and non-empty
+        middle?: !="" // optional, but must be non-empty when specified
+        last: !=""
+    }
+
+    // The minimum must be strictly smaller than the maximum and vice versa.
+    minimum?: int & <maximum
+    maximum?: int & >minimum
+}
+
+// A spec is of type #Spec
+spec: #Spec
+spec: {
+    knid: "Homo Sapiens" // error, misspelled field
+
+    name: first: "Jane"
+    name: last: "Doe"
+}
+```
+
+### Simplicity at Scale
+
+When using a configuration language to reduce boilerplate,
+one should consider whether the reduced verbosity is worth the
+increased complexity.
+Most configuration languages use an override model to reduce boilerplate:
+an existing configuration is used as a base and modified to result in
+a new configuration.
+This is often in the form of inheritance.
+
+For small-scale projects,
+using inheritance can be too complex, and the simplicity of
+spelling everything out is often a superior approach.
+For large-scale projects, however, using inheritance often leads to deep
+layerings of modifications, making it very hard to see where values come from.
+In the end, it is again questionable whether the added complexity is worth it.
+
+As with other configuration languages, CUE can add complexity if values
+are organized to come from multiple places.
+However, as CUE disallows overrides, deep layerings are naturally prevented.
+More importantly, CUE can also enhance readability.
+A definition in one file may apply to values in many other files.
+Where one would usually have to open all these files to verify validity,
+with CUE one can see it at a glance.
+
+CUE's approach has been battle-tested in computational linguistics, where it
+has been used for decades to describe human languages,
+which are effectively very large, complex, and irregular configurations.
+
+
+### Abstractions versus Direct Access
+
+A common debate for configuration languages is whether a language should
+provide an abstraction layer for APIs.
+On the one hand, abstraction layers protect the user against misuse.
+On the other hand, they need to keep up with API changes and are
+inevitably prone to drift.
+So it goes.
+
+CUE addresses both issues.
+On the one hand, its fine-grained typing allows layering detailed constraints
+on top of native APIs, without the need for an abstraction layer.
+New API features can be used even before existing definitions cover them.
+
+On the other hand, CUE's order independence allows abstraction layers
+to inject arbitrary raw API values in a controlled manner,
+providing a general escape hatch to support new or uncovered features.
+See the Manual section of the
+[Kubernetes tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md)
+for an example.
+
+### Tooling
+
+A configuration language usually transforms its configurations to
+a lower-level representation, like JSON, YAML, or Protobuf, so that
+it can be consumed by tools taking in these languages.
+Piping such output to the needed tools works initially,
+but sooner or later one will want to automate this,
+usually in the form of some kind of tool.
+
+And so it goes.
+The rise of systems requiring advanced configuration has been paired
+with a rise of even more specialized command line tools.
+The core structure of all these tools is more or less the same.
+More annoyingly, many have overlapping functionality yet are hardly extensible
+or interoperable.
+As a result, one may see the need to layer on yet another set of tools.
+
+Having tools like `kubectl` or `etcdctl` that directly control
+core infrastructure makes sense, but at higher levels of
+abstraction, one needs a more open approach.
+
+CUE attempts to address this by providing an open,
+declarative scripting layer on top of the configuration layer.
+Aside from the above-mentioned case, it is designed to address various
+other issues:
+
+- inject environmental data into configuration, something not allowed
+  in CUE itself (CUE is pure, that is, hermetic and side-effect free)
+- inject computed data into configurations as part of a pipeline
+- allow composability of tool integration
+
+Again, the ability to deterministically merge data from different sources
+makes this a shoo-in task for CUE.
+
+
+## Comparisons
+
+### Inheritance-based configuration languages
+
+Inheritance is not commutative or idempotent in the general case.
+In other words, order matters.
+This makes it hard to track where values are coming from,
+not only for humans, but also for machines.
+It makes it very complicated, if not impossible, to do any kind of
+automation.
+The complexity of inheritance is even bigger if values
+can enter an object from one of several directions (super, overlay, etc.).
+
+The basic operation of CUE is commutative, associative, and idempotent.
+This order independence helps both
+humans and machines. The resulting model is much less complex.
+
+{{< info >}}
+### Inheritance in CUE
+Although CUE does not have inheritance in the override sense, it does have
+the notion of one value being an instance of another.
+In fact, this is a core principle.
+
+Let's use a real-world example to make this distinction clear:
+In the override model of inheritance, one can take an existing template,
+say a dog, and modify it to become a cat.
+Trim the ears, dry off the nose, and what have you.
+
+In CUE, it is a matter of classification.
+Cats and dogs are both instances of animals, but once an entity is defined
+to be a cat, it can never become a dog.
+To most humans (aka computer scientists who have not become accustomed
+to inheritance), this makes total sense.
+{{< /info >}}
+
+Although one can create instances of values (remember, types are values),
+one cannot alter any of the values of a parent.
+A template acts as a type.
+Just as in statically typed languages, where one cannot assign an integer to
+a string variable, one cannot violate the properties of a type in CUE.
+
+These restrictions reduce flexibility, but also enhance clarity.
+To ensure that a configuration holds a certain property, just declare it
+in any file included in the project to make it so.
+There is no need to look at other files.
+As we saw, the imposed restrictions can also improve, rather than hurt,
+the ability to remove boilerplate compared to inheritance-based languages.
+
+The complexity of inheritance-based models also hampers automation.
+The introduction of GCL was paired with the promise of advanced tooling.
+The mantra of declarative languages was even repeated with some of
+its offspring.
+The tooling never materialized, though, as the model made it intractable.
+
+CUE already provides power tools like `cue trim`, and its API provides
+unify and subsumption operations for incomplete configurations, the building
+blocks for powerful analysis.
+
+
+### Jsonnet / GCL
+
+Like Jsonnet, CUE is a superset of JSON.
+Both are also influenced by GCL.
+CUE, in turn, is influenced by Jsonnet.
+This may give the appearance that the languages are very similar.
+At the core, though, they are very different.
+
+CUE's focus is data validation, whereas Jsonnet focuses on data templating
+(boilerplate removal).
+Jsonnet was not designed with validation in mind.
+
+Jsonnet and GCL can be quite powerful at reducing boilerplate.
+The goal of CUE is not to be better at boilerplate removal than Jsonnet or GCL.
+CUE was designed to be an answer to two major shortcomings of these approaches:
+complexity and lack of typing.
+Jsonnet reduces some of the complexities of GCL, but largely falls into the
+same category.
+For CUE, the tradeoff was to add typing and reduce complexity
+(for humans and machines), at the expense of giving up flexibility.
+
+
+### HCL
+
+HCL has some striking similarities with GCL.
+But whether by coincidence or by design, HCL removes the core
+source of complexity of GCL: inheritance.
+
+It does introduce a poor man's version of inheritance: file overlays.
+Fields may be defined in multiple files, which override each other in an
+order determined by the file names.
+Although not nearly as complex as GCL's inheritance, this has some of the same issues.
+
+Also, whether the removal of inheritance was a coincidence or a great insight,
+HCL provides no replacement construct of the kind one might need for larger-scale
+configuration management.
+This means the use of HCL may hit a ceiling for medium to larger setups.
+
+So what CUE has to offer to users of HCL is typing, better prospects for growing
+to larger-scale operations, and the elimination of the peculiarities of file overlays.
+
+CUE does borrow one construct from HCL: the folding of single-field objects
+onto a single line was directly inspired by HCL's very similar approach.
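+
+The contrast between file overlays and CUE's order-independent merging can
+be sketched in Go. This is a hypothetical toy model restricted to flat,
+concrete values: the `Config` type and the `overlay` and `unify` helpers are
+illustrative names, not HCL or CUE APIs. `overlay` lets later files win, as
+with file overlays, whereas `unify` accepts a field set in several files
+only if all values agree.
+
+```go
+package main
+
+import "fmt"
+
+// Config is a flat set of concrete field values defined in one file.
+type Config map[string]string
+
+// overlay merges files in the given order; a later file overrides fields
+// set by earlier ones, so the result depends on the order of the files.
+func overlay(files ...Config) Config {
+	out := Config{}
+	for _, f := range files {
+		for k, v := range f {
+			out[k] = v
+		}
+	}
+	return out
+}
+
+// unify merges files without overrides: a field may be set in several
+// files only if all of them agree on its value; otherwise it is an error.
+// For concrete values, the merged result, or whether a conflict is
+// reported, does not depend on the order of the files.
+func unify(files ...Config) (Config, error) {
+	out := Config{}
+	for _, f := range files {
+		for k, v := range f {
+			if prev, ok := out[k]; ok && prev != v {
+				return nil, fmt.Errorf("conflicting values for %q: %q and %q", k, prev, v)
+			}
+			out[k] = v
+		}
+	}
+	return out, nil
+}
+
+func main() {
+	base := Config{"region": "us-east1", "replicas": "3"}
+	extra := Config{"replicas": "5"}
+
+	// With overlays, swapping the file order changes the result.
+	fmt.Println(overlay(base, extra)["replicas"], overlay(extra, base)["replicas"]) // 5 3
+
+	// With unification, the conflict is reported in either order.
+	_, err1 := unify(base, extra)
+	_, err2 := unify(extra, base)
+	fmt.Println(err1 != nil, err2 != nil) // true true
+}
+```
+
+Real CUE unification also combines constraints such as bounds and types,
+not just equal concrete values, but the order independence shown here is
+the same property.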
+
+
+
diff --git a/hugo/content/en/docs/concept/data-validation-use-case/index.md b/hugo/content/en/docs/concept/data-validation-use-case/index.md
new file mode 100644
index 0000000000..71886d0d43
--- /dev/null
+++ b/hugo/content/en/docs/concept/data-validation-use-case/index.md
@@ -0,0 +1,136 @@
+---
+title: "Data Validation use case"
+description: "Validate text-based or programmatic data"
+toc_hide: true
+---
+
+By far the most straightforward approach to specifying data is to use plain
+JSON or YAML files.
+Every value can be found right where it is defined.
+But even at small scales, one will soon have to deal with
+consistency issues.
+
+Data validation tools allow verifying the consistency of such data
+based on a schema.
+
+
+## Core issues addressed by CUE
+
+### Client-side validation
+
+There are few handy tools to verify plain data files.
+Often, validation is left to the server side.
+If it is done client side, it relies either on rather verbose schema
+definitions or on custom tools that verify schemas for a specific domain.
+
+The `cue` command line tool provides a fairly straightforward way to
+define a schema and verify it against a collection of data files.
+
+Given these two files, the `cue vet` command can verify that the values in
+`ranges.yaml` are correct when both files are passed on the command line
+(`cue vet check.cue ranges.yaml`).
+
+{{< code-tabs >}}
+{{< code-tab name="check.cue" language="text" area="top-left" >}}
+min?: *0 | number // 0 if undefined
+max?: number & >min // must be strictly greater than min if defined.
+{{< /code-tab >}}
+{{< code-tab name="ranges.yaml" language="yaml" area="top-right" >}}
+min: 5
+max: 10
+---
+min: 10
+max: 5
+{{< /code-tab >}}
+{{< code-tab name="output" language="" area="bottom" >}}
+max: invalid value 5 (out of bound >10):
+    ./check.cue:2:16
+    ./ranges.yaml:5:7
+{{< /code-tab >}}
+{{< /code-tabs >}}
+
+
+### Validating document-oriented databases
+
+Document-oriented databases, like Mongo and many others, are characterized
+by flexible schemas.
+Some of them, like Mongo, optionally allow schema definitions, often in the
+form of JSON Schema.
+
+CUE constraints can be used to verify document-oriented databases.
+CUE's default mechanism and expression syntax allow filling in missing
+values for data written against an older version of a schema.
+More importantly, CUE's order independence allows
+"patch" specifications to be separated from the main schema definition.
+CUE can take care of merging these and reporting any inconsistencies
+in the definitions, even before they are applied to a concrete case.
+
+CUE can be applied directly to the data in code using its API,
+but it can also be used to compute JSON schemas from CUE definitions.
+(See [cuelang.org/go/encoding/openapi](https://pkg.go.dev/cuelang.org/go/encoding/openapi).)
+If a document-oriented database natively supports JSON Schema, it is likely
+beneficial to use that support.
+Using CUE to generate the schema has several advantages over writing it directly:
+
+- CUE is far less verbose.
+- CUE can extract base definitions from other sources, like Go and Protobuf.
+- CUE allows annotating validation code in these other sources
+  (e.g. field tags for Go, options for Protobuf).
+- CUE's ability to merge, validate, and normalize configurations
+  allows separating concerns, for instance between the main schema and
+  patches for older versions.
+- CUE can morph definitions into several forms, such as the structural OpenAPI
+  needed for Kubernetes' CRDs as of version 1.15.
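+
+The merging of a main schema with "patch" specifications described above
+works because combining constraints does not depend on their order.
+As a hedged illustration only, the Go sketch below is a toy model, not CUE's
+API: it assumes each source constrains a single numeric field to a closed
+interval (the `Bound` type and `unifyAll` helper are hypothetical names),
+combines sources by intersection, and reports an empty intersection as an
+inconsistency.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"math"
+)
+
+// Bound is a closed numeric range [Lo, Hi], standing in for a constraint
+// such as >=18 & <=120 on a single field.
+type Bound struct{ Lo, Hi float64 }
+
+// errConflict reports that constraints from different sources cannot all hold.
+var errConflict = errors.New("inconsistent constraints")
+
+// unifyAll combines constraints from several sources by intersecting them.
+// Intersection is commutative, associative and idempotent, so the order in
+// which the sources are listed does not affect the result.
+func unifyAll(sources ...Bound) (Bound, error) {
+	r := Bound{math.Inf(-1), math.Inf(1)} // no constraint yet
+	for _, s := range sources {
+		r = Bound{math.Max(r.Lo, s.Lo), math.Min(r.Hi, s.Hi)}
+		if r.Lo > r.Hi {
+			return r, errConflict
+		}
+	}
+	return r, nil
+}
+
+func main() {
+	base := Bound{0, 150}              // main schema
+	patch := Bound{18, math.Inf(1)}    // patch for an older version
+	policy := Bound{math.Inf(-1), 120} // policy from another team
+
+	a, _ := unifyAll(base, patch, policy)
+	b, _ := unifyAll(policy, base, patch)
+	fmt.Println(a, b, a == b) // {18 120} {18 120} true
+
+	_, err := unifyAll(base, Bound{200, 300})
+	fmt.Println(err) // inconsistent constraints
+}
+```
+
+Listing the sources in a different order yields the same range, and a patch
+that contradicts the main schema is reported as an error rather than silently
+overriding it. CUE's unification applies the same principle to arbitrary
+values, not just numeric ranges.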
+
+
+### Migration path
+
+As discussed in
+["Be useful at all scales"](/docs/about#be-useful-at-all-scales),
+there is a high cost to changing languages when one reaches the limits
+of a certain approach.
+
+CUE adds the benefit of type checking to plain data files.
+Once CUE is in use, the same familiar tooling allows moving
+to something more structured
+as the plain-data approach reaches its limits.
+CUE provides automated rewrite tools, such as `cue import` and `cue trim`,
+to aid in such a migration.
+
+
+## Comparisons
+
+### JSON Schema
+
+The closest approach to validating JSON and YAML with a schema is the use
+of JSON Schema and accompanying tools.
+
+Compared to CUE, JSON Schema does not have a unified type and value model.
+As a result, JSON Schema is of little use for boilerplate reduction.
+As it is specified in JSON itself (it is not a DSL), it can be quite verbose.
+
+Overall, CUE is a more concise, yet more powerful, schema language.
+For instance, in CUE one can specify that two fields need to be identical to
+one another:
+
+```text
+point: {
+    x: number
+    y: number
+}
+
+diagonal: point & {
+    x: y
+    y: x
+}
+```
+
+Such a thing is not possible in JSON Schema (or in most configuration languages,
+for that matter).
+
+More on JSON Schema and its subset, OpenAPI, can be found
+in [Schema Definition]({{< relref "/docs/concept/schema-definition-use-case#json-schema--openapi" >}}).
diff --git a/hugo/content/en/docs/concept/querying-use-case/index.md b/hugo/content/en/docs/concept/querying-use-case/index.md
new file mode 100644
index 0000000000..506a3febf5
--- /dev/null
+++ b/hugo/content/en/docs/concept/querying-use-case/index.md
@@ -0,0 +1,31 @@
+---
+title: "Querying use case"
+description: "Find data matching certain criteria"
+toc_hide: true
+---
+
+CUE orders all values in a value lattice.
+A value higher up in this hierarchy is what programming languages would
+refer to as a type.
+Concrete values or constraints on such a "type" are all instances of that type.
+
+In other words, CUE constraints can be used to find patterns in data.
+`cue vet` is a simple instance of this.
+
+But more elaborate querying in the form of a `find` or `query` subcommand
+is certainly possible.
+We would love to hear about your envisioned use cases to plan out
+such a subcommand.
+
+## Programmatic Querying
+
+In the meantime, you can query data programmatically using the CUE API.
+To do so:
+
+- Load data and constraints using
+  `cuelang.org/go/cue.Runtime` or
+  `cuelang.org/go/cue/load.Instances`.
+- Walk over data using `cuelang.org/go/cue.Value`'s `Walk` method,
+  or look up specific values.
+- Call `pattern.Subsumes(value)`, where `pattern` and `value` are
+  `cue.Value`s, to see if `value` is an instance of `pattern`.
diff --git a/hugo/content/en/docs/concept/schema-definition-use-case/index.md b/hugo/content/en/docs/concept/schema-definition-use-case/index.md
new file mode 100644
index 0000000000..4f1e4b4438
--- /dev/null
+++ b/hugo/content/en/docs/concept/schema-definition-use-case/index.md
@@ -0,0 +1,281 @@
+---
+title: "Schema Definition use case"
+description: "Defining schema to communicate an API or standard"
+toc_hide: true
+---
+
+A data definition language describes the structure of data.
+The structure defined by such a language can, in turn, be used
+to verify implementations, validate inputs, or generate code.
+
+Most modern dedicated data definition languages or standards allow
+more than just describing whether a field is an integer or a string.
+Standards like OpenAPI and CDDL allow defining things like default
+values, ranges, and various other constraints.
+OpenAPI even allows for complex logical combinators.
+
+A key difference, however, is that these standards do not unify schema
+and values, which is what makes CUE so powerful.
+There is no value lattice.
+This limits these standards in various ways.
+
+
+## Core issues addressed by CUE
+
+### Validating backwards compatibility
+
+CUE's model makes it easy to verify that newer versions of a schema
+are backwards compatible with older versions.
+
+Consider the following versions of the same API:
+
+```text
+// Release notes:
+// - You can now specify your age and your hobby!
+#V1: {
+    age: >=0 & <=100
+    hobby: string
+}
+
+// Release notes:
+// - People get to be older than 100, so we relaxed it.
+// - It seems not many people have a hobby, so we made it optional.
+#V2: {
+    age: >=0 & <=150 // people get older now
+    hobby?: string // some people don't have a hobby
+}
+
+// Release notes:
+// - Actually no one seems to have a hobby anymore, so we dropped the field.
+#V3: {
+    age: >=0 & <=150
+}
+```
+
+Declarations with a name starting with `#` are definitions.
+Definitions are not emitted when converting to data, for instance when
+exporting to JSON, and thus do not need to be concrete in such cases.
+Definitions define closed structs, which means a user may
+only use fields that are explicitly defined.
+
+In CUE, an API is backwards compatible if it subsumes the older one, that is,
+if the old one is an instance of the new one.
+
+This can be computed using the Go API:
+
+```go
+package main
+
+import (
+	"fmt"
+
+	"cuelang.org/go/cue"
+	"cuelang.org/go/cue/cuecontext"
+)
+
+const apis = `` // text of the above API
+
+func main() {
+	v := cuecontext.New().CompileString(apis)
+	if err := v.Err(); err != nil {
+		// handle error
+	}
+	v1 := v.LookupPath(cue.ParsePath("#V1"))
+	v2 := v.LookupPath(cue.ParsePath("#V2"))
+	v3 := v.LookupPath(cue.ParsePath("#V3"))
+
+	// Check if V2 is backwards compatible with V1.
+	fmt.Println(v2.Subsume(v1) == nil) // true
+
+	// Check if V3 is backwards compatible with V2.
+	fmt.Println(v3.Subsume(v2) == nil) // false
+}
+```
+
+It is as simple as that.
+This is made possible
+by ordering all values in a lattice, as CUE does.
+For CUE, checking whether one API is an instance of another is like checking
+whether 3 is less than 4.
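+
+To see why this check is cheap, the ordering can be modeled without the CUE API.
+The Go sketch below is a hypothetical toy model, not CUE's implementation:
+it assumes closed schemas whose fields are either plain strings or numeric
+intervals (the `Interval`, `Field`, and `Schema` types and the `subsumes`
+helper are illustrative names), and reports whether every value accepted by
+the old schema is also accepted by the new one.
+
+```go
+package main
+
+import "fmt"
+
+// Interval stands in for a numeric bound such as >=0 & <=100.
+type Interval struct{ Lo, Hi float64 }
+
+// Field is one field of a toy closed schema.
+// A nil Num means the field is a string; otherwise it is a number in Num.
+type Field struct {
+	Optional bool
+	Num      *Interval
+}
+
+// Schema is a toy closed struct: fields that are not listed are not allowed.
+type Schema map[string]Field
+
+// typeSubsumes reports whether every value of field type o also fits n.
+func typeSubsumes(n, o Field) bool {
+	if (n.Num == nil) != (o.Num == nil) {
+		return false // one is a string, the other a number
+	}
+	if n.Num == nil {
+		return true // both are plain strings
+	}
+	return n.Num.Lo <= o.Num.Lo && o.Num.Hi <= n.Num.Hi
+}
+
+// subsumes reports whether every value accepted by old is also accepted
+// by newer, i.e. whether newer is backwards compatible with old.
+func subsumes(newer, old Schema) bool {
+	for name, of := range old {
+		nf, ok := newer[name]
+		if !ok || !typeSubsumes(nf, of) {
+			return false // old data using this field would be rejected
+		}
+	}
+	for name, nf := range newer {
+		if nf.Optional {
+			continue
+		}
+		if of, ok := old[name]; !ok || of.Optional {
+			return false // old data may lack a field that newer requires
+		}
+	}
+	return true
+}
+
+func main() {
+	v1 := Schema{"age": {Num: &Interval{0, 100}}, "hobby": {}}
+	v2 := Schema{"age": {Num: &Interval{0, 150}}, "hobby": {Optional: true}}
+	v3 := Schema{"age": {Num: &Interval{0, 150}}}
+
+	fmt.Println(subsumes(v2, v1)) // true
+	fmt.Println(subsumes(v3, v2)) // false
+}
+```
+
+For the `age` bounds, the check reduces to comparing interval endpoints,
+much like checking whether 3 is less than 4.
+CUE's actual subsumption also handles disjunctions, defaults, references,
+and nested structs, all of which this sketch ignores.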
+
+Note that `V2` strictly relaxed the API relative to `V1`.
+It allowed specifying a wider age range and made the `hobby` field optional.
+In `V3` the `hobby` field is no longer allowed.
+This is not backwards compatible, as it breaks previously valid data that did
+contain a `hobby` field.
+
+The current API only reports a _yea_ or _nay_.
+The plan is to give full actionable reports.
+Feedback welcome!
+
+
+### Combining constraints from different sources
+
+Most data definition languages are not
+explicitly designed to be commutative.
+For instance, CDDL, although much less expressive than CUE, introduces operators
+that break commutativity.
+
+The additive property obtained by commutativity is of great value for
+data definition.
+Constraints often come from many sources.
+For instance, one can have constraints from a base template, from code,
+from policies provided by different departments, and from policies provided by
+a client.
+
+The additive nature of CUE's constraints allows piling up constraints,
+in any order, to obtain a new definition.
+This leads us to the next topic.
+
+### Normalization of data definitions
+
+Adding constraints from many sources can result in a lot of redundancy.
+Even worse, constraints can be specified in different logical forms,
+making their additive form verbose and unwieldy.
+This is fine if a system only uses these constraints to validate data.
+But it is problematic if the combined constraints are to form the basis for,
+say, human consumption.
+
+CUE's logical inference engine automatically reduces constraints.
+Its API makes it possible to compute and select between
+various normal forms to optimize for a certain representation.
+This is used in [CUE's OpenAPI generator](/docs/integrations/openapi),
+for instance.
+
+
+## Comparisons
+
+### JSON Schema / OpenAPI
+
+JSON Schema and OpenAPI are purely data-driven data definition standards.
+OpenAPI originates from Swagger.
+As of version 3, OpenAPI is more or less a subset of JSON Schema.
+OpenAPI is used to define Kubernetes Custom Resource Definitions.
+As of Kubernetes 1.15, this requires a variant of OpenAPI called Structural OpenAPI.
+We will collectively refer to these as OpenAPI henceforth.
+
+OpenAPI does not have any expressions or references.
+It does have powerful logical operators, though,
+which make it remarkably expressive.
+
+{{< info >}}
+### On logical not
+OpenAPI defines a `not` operator.
+Its meaning gets fuzzy when it is applied to structs, which OpenAPI allows.
+CUE doesn't have such a construct, partly to avoid its logical pitfalls.
+However, CUE can approximate it by interpreting `¬P` as `P→⊥`.
+{{< /info >}}
+
+An advantage of OpenAPI is that it is purely defined in terms of data (JSON).
+This allows sending it over the wire.
+It is defined such that implementing an interpreter is fairly straightforward.
+
+One disadvantage is that it is very verbose.
+Compare the following two equivalent schema definitions:
+
+{{< code-tabs >}}
+{{< code-tab name="native.cue" language="text" area="top-left" >}}
+// Definitions.
+
+// Info describes...
+Info: {
+    // Name of the adapter.
+    name: string
+
+    // Templates.
+    templates?: [...string]
+
+    // Max is the limit.
+    max?: uint & <100
+}
+{{< /code-tab >}}
+{{< code-tab name="openapi.json" language="json" area="top-right" >}}
+{
+  "openapi": "3.0.0",
+  "info": {
+    "title": "Definitions.",
+    "version": "v1beta1"
+  },
+  "components": {
+    "schemas": {
+      "Info": {
+        "description": "Info describes...",
+        "type": "object",
+        "required": [
+          "name"
+        ],
+        "properties": {
+          "name": {
+            "description": "Name of the adapter.",
+            "type": "string",
+            "format": "string"
+          },
+          "templates": {
+            "description": "Templates.",
+            "type": "array",
+            "items": {
+              "type": "string",
+              "format": "string"
+            }
+          },
+          "max": {
+            "description": "Max is the limit",
+            "type": "integer",
+            "minimum": 0,
+            "exclusiveMaximum": 100
+          }
+        }
+      }
+    }
+  }
+}
+{{< /code-tab >}}
+{{< /code-tabs >}}
+
+The difference gets more extreme as more constraints and logical
+combinators are used.
+
+OpenAPI and CUE both have their roles.
+The JSON format of OpenAPI makes it a good interchange standard.
+CUE, on the other hand, can serve as an engine to generate and interpret
+OpenAPI constraints.
+Note that CUE is generally more expressive, and many CUE constraints will
+not be encodable in OpenAPI.
+
+
+### OPA / Rego
+
+Although not designed as a data definition language, Rego, the language
+used for Open Policy Agent (OPA), also addresses the need to
+add constraints from multiple sources.
+
+Rego, like CUE, has its roots in logic programming.
+It is based on Datalog, a restricted form of Prolog, whereas CUE is based on
+typed-feature structure or graph unification.
+Typed-feature structures were designed to deal with the shortcomings
+of Prolog for applications in encoding human languages.
+
+Using a Datalog variant for what is essentially a constraint
+validation task is somewhat curious.
+Datalog makes an excellent query language.
+But for constraint enforcement, it is a bit cumbersome, as one effectively
+first needs to query values to which to apply the constraints.
+CUE collates the constraints with the location of the data to which they apply.
+As a result, CUE constraints look a lot like the data they constrain,
+unlike Rego constraints, which are more reminiscent of a Datalog program.
+
+More importantly, CUE's approach is more amenable to finding normalized
+and simplified representations of constraints, which makes it more suitable
+for creating OpenAPI from them.
+
+
+### CDDL
+
+The Concise Data Definition Language (CDDL) is used to define
+the structure of CBOR or JSON data.
+CDDL shares many constructs with CUE, including
+disjunctions, embedding, optional fields, and definitions.
+
+CDDL, however, has no value lattice and does not define mathematical
+properties of its data.
+There are several other aspects of CDDL that contradict the use of a value lattice
+or make it harder to do so.
+Overall, this restricts the expressiveness of CDDL compared to CUE
+while complicating the ability to combine constraints on types
+from multiple sources.
+
+Unlike OpenAPI, CDDL is a domain-specific language (DSL).
+It needs a specific interpreter.
+It also has some non-trivial aspects to its evaluation, making it much harder
+than OpenAPI to implement.
+
diff --git a/hugo/content/en/docs/concept/scripting-use-case/index.md b/hugo/content/en/docs/concept/scripting-use-case/index.md
new file mode 100644
index 0000000000..1923afeb9e
--- /dev/null
+++ b/hugo/content/en/docs/concept/scripting-use-case/index.md
@@ -0,0 +1,20 @@
+---
+title: "Scripting use case"
+description: "Make static data come to life"
+toc_hide: true
+---
+
+CUE has a powerful scripting layer.
+
+More on this later.
+
+For now, we refer to the documentation included in the CUE tool itself:
+```
+$ cue help cmd
+```
+or the "Define Commands" section of the
+[Kubernetes Tutorial](https://github.com/cue-labs/cue-by-example/blob/main/003_kubernetes_tutorial/README.md).
+
+{{< warning >}}
+User-defined command line flags are not yet supported.
+{{< /warning >}}