Commit

docs/introduction: import welcome from cuelang.org
Imported from master branch at commit c06bde0 file:

    /content/en/docs/about/_index.md

with only those markup changes required to satisfy the current alpha
processing chain.

https://cuelang.org/issue/2604 captures making this page
preprocessor-aware.

The diff is:

    $ diff -wu <(git show c06bde0:content/en/docs/about/_index.md) content/docs/introduction/welcome/en.md
    --- /dev/fd/63  2023-09-19 20:33:16.302252821 +0100
    +++ content/docs/introduction/welcome/en.md     2023-09-19 20:33:12.410262509 +0100
    @@ -1,8 +1,8 @@
    -+++
    -title = "About"
    -weight = 100
    -description = "How did CUE come about and what are its principles."
    -+++
    +---
    +title: "Welcome"
    +weight: 10
    +description: "How did CUE come about and what are its principles."
    +---

     ## Intro

    @@ -107,40 +107,34 @@
     The middle column shows a possible schema for any municipality.
     On the right one sees a mix between data and schema as is exemplary of CUE.

    -{{< blocks/sidebyside >}}
    -<div class="col">
    +{{< columns >}}
     Data
    -{{< highlight go >}}
    +```cue
     moscow: {
       name:    "Moscow"
       pop:     11.92M
       capital: true
     }
    -{{< /highlight >}}
    -</div>
    -
    -<div class="col">
    +```
    +{{< columns-separator >}}
     Schema
    -{{< highlight go >}}
    +```cue
     municipality: {
       name:    string
       pop:     int
       capital: bool
     }
    -{{< /highlight >}}
    -</div>
    -
    -<div class="col">
    +```
    +{{< columns-separator >}}
     CUE
    -{{< highlight go >}}
    +```cue
     largeCapital: {
       name:    string
       pop:     >5M
       capital: true
     }
    -{{< /highlight >}}
    -</div>
    -{{< /blocks/sidebyside >}}
    +```
    +{{< /columns >}}

     In general, in CUE one starts with a broad definition of a type, describing
     all possible instances.

For cue-lang/docs-and-content#81.

Preview-Path: /docs/introduction/welcome/
Signed-off-by: Jonathan Matthews <[email protected]>
Change-Id: Ief0da7495fa46325d1d26e2629272738019f5335
Dispatch-Trailer: {"type":"trybot","CL":1169105,"patchset":3,"ref":"refs/changes/05/1169105/3","targetBranch":"alpha"}
jpluscplusm authored and cueckoo committed Sep 19, 2023
1 parent 962841f commit 1f59f37
Showing 2 changed files with 614 additions and 8 deletions.
311 changes: 307 additions & 4 deletions content/docs/introduction/welcome/en.md
@@ -1,11 +1,314 @@
---
title: Welcome
title: "Welcome"
weight: 10
description: "How did CUE come about and what are its principles."
---

## Heading 2 example
## Intro

This page will include a general welcome to CUE the open source project.
CUE is an open-source data validation language and inference engine
with its roots in logic programming.
Although the language is not a general-purpose programming language,
it has many applications, such as
data validation, data templating, configuration, querying,
code generation and even scripting.
The inference engine can be used to validate
data in code or to include it as part of a code generation pipeline.

It will likely not contain any sub pages.
A key thing that sets CUE apart from its peer languages
is that it merges types and values into a single concept.
Whereas in most languages types and values are strictly distinct,
CUE orders them in a single hierarchy (a lattice, to be precise).
This is a very powerful concept that allows CUE to do
many fancy things.
It also simplifies matters.
For instance, there is no need for generics, and enums, sum types,
and null coalescing are all the same thing.


## Applications

CUE's design ensures that combining CUE values in any
order always gives the same result
(it is associative, commutative and idempotent).
This makes CUE particularly well-suited for cases where CUE
constraints are combined from different sources:

- Data validation: different departments or groups can each
define their own constraints to apply to the same set of data.

- Code extraction and generation: extract CUE definitions from
multiple sources (Go code, Protobuf), combine them into a single
definition, and use that to generate definitions in another
format (e.g. OpenAPI).

- Configuration: values can be combined from different sources
without one having to import the other.
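
As a minimal sketch of the first case, assume two hypothetical teams each contribute constraints on the same `service` value. The field names and limits below are illustrative only.

```cue
// Contributed by a hypothetical security team.
service: port: >1024

// Contributed by a hypothetical platform team.
service: {
  port:     <=65535
  replicas: >=2
}

// The concrete data must satisfy both sets of constraints,
// regardless of the order in which they are combined.
service: {
  port:     8080
  replicas: 3
}
```

Neither team's declarations need to import or even know about the other's.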

The ordering of values also allows set containment analysis of entire
configurations.
Where most validation systems are limited to checking whether a concrete
value matches a schema, CUE can validate whether any instance of
one schema is also an instance of another (is it backwards compatible?),
or compute a new schema that represents all instances that match
two other schemas.
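
The last point can be sketched with two made-up schemas, `#Public` and `#Unprivileged` (both names are assumptions for illustration). Unifying them with `&` yields a schema that accepts only values accepted by both.

```cue
// Two hypothetical schemas, written independently.
#Public: {
  port: int & >0
}
#Unprivileged: {
  port: int & >1024 & <65536
}

// The unification accepts exactly the values accepted by both schemas.
#Both: #Public & #Unprivileged

// A value that satisfies the combined schema.
ok: #Both & {port: 8080}
```

Checking whether one schema contains another is a separate operation and is not shown here.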


## History

Although it is a very different language, the roots of CUE lie in GCL,
the dominant configuration language in use at Google as of this writing.
It was originally designed to configure Borg, the predecessor of Kubernetes.
In fact, the original idea was to use graph unification as used in CUE for GCL.
One of the authors of GCL had extensive experience with such systems and
experienced the benefit of being able to compute and reason with types for the
creation of powerful tooling.

The graph unification model CUE is based on
was in common use in computational linguistics at that time and was
successfully used to manage grammars and lexicons of over 100k lines of
declarative definitions.
These were effectively very large
configurations of something as irregular and complex as a human language.
A property of these systems was that the types, or constraints, one
defines validate the data while simultaneously reducing boilerplate.
Overall, this approach seemed to be extremely well-suited
for cloud configuration.

However, the early design of GCL went for something simpler that coincidentally
was also incompatible with the notion of graph unification.
This simpler approach proved insufficient, but it was already too late to
move to the earlier foreseen approach.
Instead, an inheritance-based override model was adopted.
Its complexity made the earlier foreseen tooling intractable
and it never materialized.
The same holds for the GCL offspring that copied its model.

CUE goes back to the original idea of using a constraint-based approach and
also makes an effort to incorporate lessons learned from 15 years of GCL usage.
This also includes lessons learned from offspring and different approaches to
configuration altogether.


## Philosophy and principles

### Types are Values

CUE does not distinguish between values and types.
This is a powerful notion that allows CUE to define ultra-detailed
constraints, but it also simplifies things considerably:
there is no separate schema or data definition language to learn
and related language constructs such as sum types, enums,
and even null coalescing collapse onto a single construct.

Below is a demonstration of this concept.
On the left one can see a JSON object (in CUE syntax) with some properties
about the city of Moscow.
The middle column shows a possible schema for any municipality.
On the right one sees a mix between data and schema as is exemplary of CUE.

{{< columns >}}
Data
```cue
moscow: {
  name:    "Moscow"
  pop:     11.92M
  capital: true
}
```
{{< columns-separator >}}
Schema
```cue
municipality: {
  name:    string
  pop:     int
  capital: bool
}
```
{{< columns-separator >}}
CUE
```cue
largeCapital: {
  name:    string
  pop:     >5M
  capital: true
}
```
{{< /columns >}}

In general, in CUE one starts with a broad definition of a type, describing
all possible instances.
One then narrows down these definitions, possibly by combining constraints
from different sources (departments, users), until a concrete data instance
remains.
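
A minimal sketch of this narrowing, reusing the three values from the columns above. Chaining them with `&` is an illustration added here, not part of the original example.

```cue
// Broad definition: any municipality.
municipality: {
  name:    string
  pop:     int
  capital: bool
}

// Narrow the schema: only capitals with more than 5M inhabitants.
largeCapital: municipality & {
  pop:     >5M
  capital: true
}

// Narrow further until only concrete data remains.
moscow: largeCapital & {
  name: "Moscow"
  pop:  11.92M
}
```

Here `moscow` does not restate `capital: true`; it follows from `largeCapital`.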


### Push, not pull, constraints

CUE's constraints act as data validators, but also double as
a mechanism to reduce boilerplate.
This is a powerful approach, but requires some different thinking.
With traditional inheritance approaches one specifies the templates that
are to be inherited from at each point they should be used.
In CUE, instead, one selects a set of nodes in the configuration to which
to apply a template.
This selection can be at a different point in the configuration altogether.

Another way to view this is that a JSON configuration, say, can be
defined as a sequence of path-leaf values.
For instance,
```json
{
  "a": 3,
  "b": {
    "c": "foo"
  }
}
```

could be represented as
```
"a": 3
"b": "c": "foo"
```
All the information of the original JSON file is retained in this
representation.

CUE generalizes this notion to the following pattern:
```
<set of nodes>: <constraints>
```
Each field declaration in CUE defines a set of nodes to which to apply
a specific constraint.
Because order doesn't matter, multiple constraints can be applied to the
same nodes, all of which need to apply simultaneously.
Such constraints may even be in different files.
But they may never contradict each other:
if one declaration says a field is `5`, another may not override it to be `6`.
Declaring a field to be both `>5` and `<10` is valid, though.
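
A small sketch of both cases, using the made-up field names `a` and `b`:

```cue
// Two declarations of the same field, possibly in different files.
a: >5
a: <10

// A concrete value must satisfy both bounds.
a: 7

b: 5
// Uncommenting the next line makes evaluation fail,
// because 5 and 6 cannot both hold.
// b: 6
```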

This approach is more restricted than full-blown inheritance;
it may not be possible to reuse existing configurations.
On the other hand, it is also a more powerful boilerplate remover.
For instance, suppose each job in a set needs to use a specific
template.
Instead of having to spell this out at each point,
one can declare this separately in one blanket statement.

So instead of
```
jobs: {
  foo: acmeMonitoring & { /* ... */ }
  bar: acmeMonitoring & { /* ... */ }
  baz: acmeMonitoring & { /* ... */ }
}
```
one can write

```
jobs: [string]: acmeMonitoring
jobs: {
  foo: { /* ... */ }
  bar: { /* ... */ }
  baz: { /* ... */ }
}
```
There is no need to repeat the reference to the monitoring template for
each job, as the first declaration already states that all jobs _must_ use `acmeMonitoring`.
Such requirements can be specified across files.

This approach not only reduces the boilerplate contained in `acmeMonitoring`
but also removes the repetitiveness of having to specify
this template for each job in `jobs`.
At the same time, this statement acts as type enforcement.
This dual function is a key aspect of CUE and
typed feature structure languages in general.

This approach breaks down, of course, if the restrictions in
`acmeMonitoring` are too stringent and jobs need to override them.
To this end, CUE provides mechanisms to allow defaults, opt-out, and
soft constraints.
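
As a sketch of the defaults mechanism only, assume `acmeMonitoring` has an `interval` field (a hypothetical name). A default is marked with `*` inside a disjunction, and a job can pick any other allowed value.

```cue
// Hypothetical template: "30s" is the default, any string is allowed.
acmeMonitoring: {
  interval: *"30s" | string
}

jobs: [string]: acmeMonitoring
jobs: {
  // Says nothing about interval, so the default applies on export.
  foo: {}
  // Chooses another allowed value instead of the default.
  bar: {interval: "5s"}
}
```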


### Separate configuration from computation

There comes a time that one (seemingly) will need to do complex
computations to generate some configuration data.
But the simplicity of a configuration language can be paramount when one quickly
needs to make changes.
These are obviously conflicting interests.

CUE takes the stance that computation and configuration should
be separated.
And CUE actually makes this easy.
The data that needs to be computed can be generated outside of CUE
and put in a file that is to be mixed in.
The data can even be generated in CUE's scripting layer and automatically
injected in a configuration pipeline.
Both approaches rely on CUE's property that the order in which this data gets
added is irrelevant.
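
A minimal sketch of the first approach, assuming two files in the same package. The file names, the package name `config`, and the fields are all hypothetical. The hand-written file leaves the computed value open:

```cue
// config.cue (hand-written)
package config

deploy: {
  region:   "eu-west"
  replicas: int & >0
}
```

A program outside of CUE then writes the computed value into a second file, which is unified with the first:

```cue
// generated.cue (hypothetical output of an external program)
package config

deploy: replicas: 12
```

Because unification is order independent, it does not matter which file is loaded first; the generated value is still checked against `int & >0`.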



### Be useful at all scales

The usefulness of a language may depend on the scale of the project.
Having too many different languages can put a cognitive strain on
developers, though, and migrating from one language to another as
scaling requirements change can be very costly.
CUE aims to minimize these costs
by covering a myriad of data- and configuration-related tasks at all scales.

**Small scale**
At small scales, reducing boilerplate in configurations is not necessarily
the best thing to do.
Even at a small scale, however, repetition can be error prone.
For such cases, CUE can define schemas to validate otherwise
typeless data files.

**Medium scale**
As soon as the desire arises to reduce boilerplate, the `cue` tool can
help to automatically rewrite configurations.
See the Quick and Dirty section of the
[Kubernetes tutorial](/docs/tutorials/kubernetes)
for an example using the `import` and `trim` tools.
Thousands of lines can be obliterated automatically using this approach.

**Large scale**
CUE's underlying formalism was developed for large-scale configuration.
Its import model incorporates best practices for large-scale engineering
and it is optimized for automation.
A key to this is advanced tooling.
The mathematical model underlying CUE's operations allows for
automation that is intractable for most other approaches.
CUE's `trim` command is an example of this.


### Tooling

Automation is key.
Nowadays, a good chunk of code gets generated, analyzed, reformatted,
and so on by machines.
The CUE language, APIs, and tooling have been designed to allow for
machine manipulation.
Aspects of this are:

- make the language easy to scan and parse,
- restrict imports,
- allow any piece of data to be split across files and generated
from different sources,
- define packages at the directory level,
- and of course its value and type model.

The order independence also plays a key role in this.
It allows combining constraints from various sources without having
to define any order in which they are to be applied to get
predictable results.


<!-- something about this?
Not turing complete.
Run in contexts where cost is hard to attribute.
Easier to make claims about termination (smart contracts).
-->