This document contains a detailed specification of the zserio schema language. It is targeted at developers who write zserio schema definitions.
This document does not describe the details of how Zserio encodes data to the binary stream. Encoding details are described in the Zserio Encoding Guide.
Zserio is a serialization schema language for modeling binary data types, bitstreams or file formats. Based on the zserio language it is possible to automatically generate encoders and decoders for a given schema in various target languages (e.g. Java, C++, Python).
Zserio is similar to other serialization mechanisms such as Google's Protocol Buffers, but it does not use what is called a "wire-format". Zserio therefore gives full control to the developers and adds no serialization overhead for identifying fields in the stream. It is a WYSIWYG serialization mechanism.
Zserio also features an extension for SQLite databases. With that extension it is possible to use SQLite as a backend store for data defined with the zserio language. SQLite tables, columns and BLOBs can be all described in zserio, giving the developer overall control of the data schema used in SQLite databases.
The zserio syntax for literal values is similar to the Java syntax. There are no character literals, only string literals with the usual escape syntax. Integer literals can use decimal, hexadecimal, octal or binary notation.
Type | Value |
---|---|
Boolean | true , false |
Decimal | 100 , 4711 , 255 , -3 , +2 |
Hexadecimal | 0xCAFEBABE , 0Xff , -0xEF |
Octal | 044 , 0377 , -010 |
Binary | 111b , 110b , 001B , -1010b |
Float16 | 3.14f , 31.4e-1f , 314e-2f |
Float32 | 3.14f , 31.4e-1f , 314e-2f |
Float64 | 3.14 , 0.314e+1 , 0.0314e2 |
String | "You" |
Hexadecimal digits, the x prefix, and the b, e and f suffixes are case-insensitive.
Literals can be signed with a - or + prefix. Signs are not applicable to string literals.
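The integer notations in the table above can be illustrated with a small parser. This is a minimal Python sketch, not part of zserio: the helper name `parse_int_literal` is hypothetical, and it assumes that a leading `0` followed by more digits means octal and that a trailing `b` (without a `0x` prefix) means binary, as the table suggests.

```python
def parse_int_literal(text: str) -> int:
    """Parse a zserio-style integer literal (hypothetical helper for illustration)."""
    sign = 1
    if text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    lowered = text.lower()  # prefixes, suffixes and hex digits are case-insensitive
    if lowered.startswith("0x"):
        value = int(lowered[2:], 16)   # hexadecimal, e.g. 0Xff
    elif lowered.endswith("b"):
        value = int(lowered[:-1], 2)   # binary, e.g. 001B
    elif len(lowered) > 1 and lowered.startswith("0"):
        value = int(lowered, 8)        # octal, e.g. 0377
    else:
        value = int(lowered, 10)       # decimal, e.g. 4711
    return sign * value


for literal in ("0xCAFEBABE", "0Xff", "044", "-010", "111b", "-1010b", "+2"):
    print(literal, "=", parse_int_literal(literal))
```

The hexadecimal check comes first on purpose, because a hexadecimal literal such as `0xab` also ends with the letter `b`.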
Zserio supports the following integer built-in types:
Sign | Types |
---|---|
unsigned | uint8 , uint16 , uint32 , uint64 |
signed | int8 , int16 , int32 , int64 |
These types correspond to unsigned or signed integers represented as sequences of 8, 16, 32 or 64 bits, respectively.
An unsigned bit field type is denoted by bit:1, bit:2, ...
The colon must be followed by a positive integer literal, which indicates the length of the type in bits.
The length should not exceed 64 bits. An unsigned bit field type corresponds to an unsigned integer of the given length. Thus, bit:16 and uint16 are equivalent. The value range of bit:n is 0..2^n-1.
Unsigned bitfield types of variable length (dynamic unsigned bitfield types) can be specified as bit<expr>, where expr is an expression of integer type which is evaluated at run-time and shall be in the range [1, 64] bits.
A signed bit field type is denoted by int:1, int:2, ...
The colon must be followed by a positive integer literal, which indicates the length of the type in bits.
The length should not exceed 64 bits. A signed bit field type corresponds to a signed integer of the given length. Thus, int:16 and int16 are equivalent. The value range of int:n is -2^(n-1)..2^(n-1)-1.
Signed bitfield types of variable length (dynamic signed bitfield types) can be specified as int<expr>, where expr is an expression of integer type which is evaluated at run-time and shall be in the range [1, 64] bits.
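The value ranges of the bit field types follow directly from the bit length. The following Python sketch only evaluates the two range formulas stated above; the function names are made up for this illustration.

```python
def unsigned_bitfield_range(num_bits: int) -> tuple:
    """Value range of bit:n, i.e. 0..2^n-1."""
    if not 1 <= num_bits <= 64:
        raise ValueError("bit field length must be in the range [1, 64]")
    return 0, (1 << num_bits) - 1


def signed_bitfield_range(num_bits: int) -> tuple:
    """Value range of int:n, i.e. -2^(n-1)..2^(n-1)-1."""
    if not 1 <= num_bits <= 64:
        raise ValueError("bit field length must be in the range [1, 64]")
    return -(1 << (num_bits - 1)), (1 << (num_bits - 1)) - 1


print(unsigned_bitfield_range(16))  # same range as uint16
print(signed_bitfield_range(16))    # same range as int16
```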
Floating point types are modeled after the IEEE 754 specification. The following types are supported:
- float16 - Half-precision floating-point format stored in 16 bits using 1 bit for the sign, 5 bits for the exponent and 10 bits for the significand.
- float32 - Single-precision floating-point format stored in 32 bits using 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand.
- float64 - Double-precision floating-point format stored in 64 bits using 1 bit for the sign, 11 bits for the exponent and 52 bits for the significand.
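To make the 1/5/10 bit split of the half-precision format concrete, the following Python sketch takes a value apart using the standard `struct` module, whose `e` format code is the IEEE 754 binary16 format. It illustrates the IEEE 754 layout only. It says nothing about how zserio places these bits in the stream, which is covered by the encoding guide. The function name is hypothetical.

```python
import struct


def split_float16(value: float) -> tuple:
    """Return (sign, exponent, significand) of value in IEEE 754 half precision."""
    (raw,) = struct.unpack(">H", struct.pack(">e", value))
    sign = raw >> 15               # 1 bit
    exponent = (raw >> 10) & 0x1F  # 5 bits, biased by 15
    significand = raw & 0x3FF      # 10 bits
    return sign, exponent, significand


print(split_float16(1.0))
print(split_float16(-2.0))
```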
Variable integer types store integer values, but the number of bytes used depends on the actual value stored. The supported types are varint16, varint32, varint64 and varint for signed values and varuint16, varuint32, varuint64, varuint and varsize for unsigned values. These are special integer types that use only the bytes needed to store the value.
The value ranges of the variable integer types are:
Data Type | Value Range | Max Value | Max Bytes |
---|---|---|---|
varint16 | -16383 to 16383 | 2^14-1 | 2 |
varint32 | -268435455 to 268435455 | 2^28-1 | 4 |
varint64 | -72057594037927935 to 72057594037927935 | 2^56-1 | 8 |
varint | -9223372036854775808 to 9223372036854775807 | 2^63-1 | 9 |
varuint16 | 0 to 32767 | 2^15-1 | 2 |
varuint32 | 0 to 536870911 | 2^29-1 | 4 |
varuint64 | 0 to 144115188075855871 | 2^57-1 | 8 |
varuint | 0 to 18446744073709551615 | 2^64-1 | 9 |
varsize | 0 to 2147483647 | 2^31-1 | 5 |
Note that varint and varuint can handle all int64 and uint64 values respectively.
Note that varsize is designed as an array size type to allow compatibility across languages. Because of that, its range is limited to the biggest value of a 32-bit signed integer and does not cover all available bits.
Note that varsize is available since 2.0.0.
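When choosing a variable integer type for a field, the ranges from the table can be checked programmatically. The following Python sketch simply restates the table as data. The helper names are hypothetical and nothing here reflects the actual byte encoding, which is described in the encoding guide.

```python
# (min, max) per type, copied from the value range table above.
VAR_RANGES = {
    "varint16": (-(2**14 - 1), 2**14 - 1),
    "varint32": (-(2**28 - 1), 2**28 - 1),
    "varint64": (-(2**56 - 1), 2**56 - 1),
    "varint": (-(2**63), 2**63 - 1),
    "varuint16": (0, 2**15 - 1),
    "varuint32": (0, 2**29 - 1),
    "varuint64": (0, 2**57 - 1),
    "varuint": (0, 2**64 - 1),
    "varsize": (0, 2**31 - 1),
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value lies within the range of the given type."""
    low, high = VAR_RANGES[type_name]
    return low <= value <= high


def smallest_signed_type(value: int) -> str:
    """Pick the first signed variable integer type whose range contains value."""
    for name in ("varint16", "varint32", "varint64", "varint"):
        if fits(name, value):
            return name
    raise ValueError("value does not fit into any signed variable integer type")


print(smallest_signed_type(16383))  # still fits varint16
print(smallest_signed_type(16384))  # needs varint32
```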
In zserio, booleans are denoted by bool. Both true and false are available as built-in keywords.
Example
struct TestStructure
{
bool hasValue;
int16 value if hasValue == true;
};
A string type is denoted by string. It is represented by a length field followed by a sequence of bytes in UTF-8 encoding. The string type allows a reader to skip the byte sequence since its length is known upfront.
Example
struct TestStructure
{
string textField;
};
The external type is a zserio built-in type whose format is not known to zserio. It is handled as an arbitrary bit sequence which is passed to the application for further processing. It is denoted by the zserio keyword extern. It is represented by the number of bits followed by the bit sequence itself.
Example
struct StructureWithExternalField
{
bit:3 numberA;
extern blob;
bit:7 numberC;
};
Since 1.4.0
The bytes type is a zserio built-in type for raw byte sequences. It is denoted by the zserio keyword bytes. It is represented by the number of bytes followed by the byte sequence itself.
Example
struct StructureWithBytesField
{
string text;
bytes data;
};
Since 2.9.0
A constant is an immutable named value. The syntax and behavior are similar to C or C++. The syntax is as follows:
const built-in-type NAME = literal;
Example
const bit:1 FALSE = 0;
const bit:1 TRUE = 1;
const int16 i = 1234;
const int32 j = -5678;
An enumeration type has a base type which is an integer type or a bit field type. The items of an enumeration have a name and a value which may be assigned explicitly or implicitly. An item that does not have an initializer gets assigned the value of its predecessor incremented by 1, or the value 0 if it is the first item.
Example
enum bit:3 Color
{
NONE = 000b,
RED = 010b,
BLUE,
BLACK = 111b
};
In the example above, BLUE has the value 3. When decoding an item of type Color, the decoder will read 3 bits from the stream and report an error when the integer value of these 3 bits is not one of 0, 2, 3 or 7.
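The implicit value assignment and the decoder check can be mimicked in a few lines of Python. This is only a sketch of the rule stated above (predecessor plus 1, or 0 for the first item). The helper names are hypothetical and generated zserio code may look quite different.

```python
def assign_enum_values(items):
    """items is a list of (name, explicit_value_or_None) in declaration order."""
    values = {}
    previous = None
    for name, explicit in items:
        if explicit is not None:
            value = explicit
        elif previous is None:
            value = 0             # first item without an initializer
        else:
            value = previous + 1  # predecessor incremented by 1
        values[name] = value
        previous = value
    return values


COLOR = assign_enum_values(
    [("NONE", 0b000), ("RED", 0b010), ("BLUE", None), ("BLACK", 0b111)]
)


def decode_color(raw: int) -> str:
    """Map the 3 bits read from the stream to an item name or report an error."""
    for name, value in COLOR.items():
        if value == raw:
            return name
    raise ValueError(f"{raw} is not a valid Color value")


print(COLOR)
print(decode_color(3))
```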
An enumeration type provides its own lexical scope, similar to Java and dissimilar to C++. The item names must be unique within each enumeration type, but may be reused in other contexts with different meanings. Referring to the example, any other enumeration type Foo may also contain an item named NONE.
In expressions outside of the defining type, enumeration items must always be prefixed by the type name and a dot, e.g. Color.NONE.
The integer value of an enumeration item can be obtained with valueof(enumeration), see valueof Operator.
Enumeration items can be @deprecated or @removed.
A bitmask type has a base type which is an unsigned integer or unsigned bit field type. The values of a bitmask have a name and a value which may be assigned explicitly or implicitly. A value that does not have an initializer gets assigned the value calculated from its predecessor by finding the first unused bit. If the value without an initializer is the first one, it is assigned the value 1.
Example
bitmask uint8 Permission
{
EXECUTABLE,
READABLE = 0x02,
WRITABLE
};
In the example above, EXECUTABLE is auto-assigned to 1, READABLE is manually assigned to 2 and WRITABLE is assigned to value 4 by finding the first unused bit.
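The automatic assignment can be sketched in Python. Note the assumption: "the first unused bit" is read here as the first bit above the highest bit set in the predecessor. This interpretation reproduces the example above, but it is not confirmed by this document, and the helper name is hypothetical.

```python
def assign_bitmask_values(items):
    """items is a list of (name, explicit_value_or_None) in declaration order."""
    values = {}
    previous = None
    for name, explicit in items:
        if explicit is not None:
            value = explicit
        elif previous is None:
            value = 1                             # first value without an initializer
        else:
            value = 1 << previous.bit_length()    # assumed: bit above the predecessor
        values[name] = value
        previous = value
    return values


permission = assign_bitmask_values(
    [("EXECUTABLE", None), ("READABLE", 0x02), ("WRITABLE", None)]
)
print(permission)
```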
A bitmask type provides its own lexical scope. The value names must be unique within each bitmask type.
In expressions outside of the defining type, bitmask values must always be prefixed by the type name and a dot, e.g. Permission.WRITABLE. The isset operator is the exception to this rule, where the type name with a dot can be omitted.
Bitmasks support all basic bit operations: & (bitwise and), | (bitwise or), ^ (bitwise xor), ~ (bitwise complement).
Example
bitmask bit:2 Availability
{
VERSION_NUMBER,
VERSION_STRING
};
struct Version(Availability availability)
{
uint32 versionNumber if (availability & Availability.VERSION_NUMBER) == Availability.VERSION_NUMBER;
string versionString if (availability & Availability.VERSION_STRING) == Availability.VERSION_STRING;
};
In the example above, the availability parameter defines the available version formats. versionNumber and versionString are present only if availability contains the VERSION_NUMBER and VERSION_STRING masks respectively.
Note that a bitmask can contain a value which is manually assigned to 0 and which represents an empty bitmask (e.g. NONE = 0). Such a NONE mask can be useful in expressions.
The integer value of a bitmask value can be obtained with valueof(Permission.EXECUTABLE), see valueof Operator.
Since 2.0.0
A structure type is the concatenation of its members. There is no padding or alignment between members.
Example
struct MyStructure
{
bit:4 a;
uint8 b;
bit:4 c;
};
A choice type depends on a selector expression following the on keyword. Each branch of the choice type is preceded by one or more case labels with a literal value. After evaluating the selector expression, the decoder will directly select the branch labeled with a literal value equal to the selector value.
Example
choice VarCoordXY(uint8 width) on width
{
case 8: bit:8 coord8;
case 16: bit:16 coord16;
case 24: bit:24 coord24;
case 32: bit:32 coord32;
};
In the example above, the selector expression refers to a parameter width of a uint8 type (see Parameterized Types).
A given branch of a choice may have more than one case label. In this case, the branch is selected when the selector value is equal to any of the case label values. A choice type may have a default branch which is selected when no case label matches the selector value. The decoder will throw an exception when there is no default branch and the selector does not match any case label. Any branch, including the default branch, may be empty, with a terminating semicolon directly following the label. It is good practice to insert a comment in this case. When the selector expression has an enumeration type, the enumeration type prefix may be omitted from the case label literals.
Example
choice AreaAttributes(AreaType type) on type
{
case AreaType.COUNTRY: // prefix "AreaType." is optional
case STATE:
case CITY:
RegionAttributes regionAttr;
case MAP:
/* empty */ ;
case ROAD:
RoadAttributes roadAttr;
default:
DefaultAttributes defaultAttr;
};
A union type corresponds to exactly one of its members, which are also called branches. A union type is an automatic choice type, which automatically handles the selector according to the last used branch (i.e. last called setter). The selector is stored in the bitstream automatically before the union branch data to allow selection of the proper branch during parsing.
When a specific handling of selector is needed (e.g. when the selector is already known in a parent), it might be better to use choice types instead of unions.
Example
union SimpleUnion
{
uint8 value8;
uint16 value16;
};
In this example, the union SimpleUnion has two branches, value8 and value16. The syntax of a member definition is the same as in structure types.
Any top-level structure type can have extended members. Extended members can appear only at the end of the structure and allow the schema to be extended while preserving the ability to read data created by older schemas (backward compatibility).
Example
struct TopLevelBlob
{
uint32 data[];
extend ExtendedData extendedData;
};
struct ExtendedData
{
varint additionalData[];
};
An extended member is defined by the extend keyword, which must precede the field definition. An extended member does not have any additional footprint in the bitstream. The only difference is that generators know that the parsed bitstream can terminate before any of the extended fields.
Multiple extended fields are allowed, but once one is used, all subsequent fields must also be marked as extended. It is the user's responsibility to ensure that extended fields are not placed before fields which were extended in previous versions.
Example
struct TopLevelBlob
{
uint32 data[]; // last field in version 1
extend ExtendedData extendedData; // added in version 2
extend bool additionalFlag; // added in version 3
};
struct ExtendedData
{
varint additionalData[];
};
Since 2.12.0
A constraint may be specified for any member of a compound type. After decoding a member with a constraint, the decoder checks the constraint and reports an error if the constraint is not satisfied.
Example
struct GraphicControlExtension
{
uint8 byteCount : byteCount == 4;
uint8 blockTerminator : blockTerminator == 0;
};
choice ChoiceConstraints(bool selector) on selector
{
case true:
uint8 value8 : value8 != 0;
case false:
uint16 value16 : value16 > 255;
};
Because constraint is a boolean expression, the following example is valid:
Example
struct TestStructure
{
bool isValueValid;
uint16 value : isValueValid;
};
A structure type may have optional members.
Example
struct ItemCount
{
uint8 count8;
uint16 count16 if count8 == 0xFF;
};
An optional member has an if clause with a boolean expression. The member will be decoded only if the expression evaluates to true at run-time.
Because an optional member has an if clause with a boolean expression, the following example is valid:
Example
struct TestStructure
{
bool hasValue;
uint16 value if hasValue;
};
An optional member can be defined without an if clause. In this case, the keyword optional must be used before the field definition:
Example
struct Container
{
int32 nonOptionalInt;
optional int32 autoOptionalInt;
};
An optional member defined by the keyword optional will be decoded only if the member has been set.
A compound type may contain functions:
Example
struct ItemCount
{
uint8 count8;
uint16 count16 if count8 == 0xFF;
function uint16 getValue()
{
return (count8 == 0xFF) ? count16 : count8;
}
};
The return type of a function has to be a standard integer or compound type, and the function parameter list must be empty. The function body may contain nothing but a return statement with an expression matching the return type.
Functions are intended to provide no more than simple expression semantics. There are no plans to add more advanced type conversion or even procedural logic to zserio.
A structure type may contain default values for fields which are not arrays or compound types:
Example
enum uint8 BasicColor
{
BLACK,
WHITE,
RED
};
struct StructureDefaultValues
{
bool boolValue = true;
bit:4 bit4Value = 0x0F if boolValue == true;
int16 int16Value = 0x0BEE;
float16 float16Value = 1.23f;
float32 float32Value = 1.234f;
float64 float64Value = 1.2345;
string stringValue = "string";
BasicColor enumValue = BasicColor.BLACK;
};
The default values are used by the encoder if the value of corresponding field has not been set. So, there will be always a value written to the stream. This is in contrast to other serialization mechanisms where the decoders would generate the default value. Reason for this is the missing "wire format" in zserio which would add additional information in the stream for identifying fields and whether they are set or not.
An array type is like a sequence of members of the same type. The element type may be any other type, except an array type. (Two dimensional arrays can be emulated by wrapping the element type in a structure type.)
The length of an array is the number of elements, which may be fixed (i.e. set at compile time) or variable (set at runtime). The elements of an array have indices ranging from 0 to n-1, where n is the array length.
The notation for array types and elements is similar to C:
Example
struct ArrayExample
{
uint8 header[256];
int16 numItems;
Element list[numItems];
};
Field header is a fixed-length array of 256 bytes. Field list is an array with n elements, where n is the value of numItems. Individual array elements may be referenced in expressions with the usual index notation, e.g. list[2] is the third element of the list array.
An array type may have an implicit length indicated by the implicit keyword and an empty pair of brackets. An implicit array must be at the end of the BLOB. It may also be the complete BLOB.
For encoding details, see the encoding guide.
Example
struct ImplicitArray
{
implicit Element list[];
};
The length of the list array can be referenced as lengthof(list), see lengthof Operator.
Deprecated since 2.4.0
An array type may have an automatic length indicated by an empty pair of brackets. In this case, the encoder will automatically store the array length into the bit stream.
Example
struct AutoArray
{
Element list[];
};
The length of the list array can be referenced as lengthof(list), see lengthof Operator.
Auto length arrays might be particularly useful if the variable array length expression is a single field:
Example
struct AutoArrayCandidate
{
uint32 numElements;
Element list[numElements];
};
An array type can be packed, which is indicated by the packed keyword. The only supported packing algorithm is delta compression, which is applied to packed arrays of:
- all integer types (uint32, varuint, ...)
- enumeration types
- bitmask types
- compound types which contain integer types, enumeration types, bitmask types or inner packable arrays
- all union types (even if they do not contain any packable fields, because the selector is always a packable integer type)
Example
struct PackedArrays
{
packed uint8 header[256];
packed int16 items[];
uint16 numItems;
packed Element elements[numItems];
};
Note: Packed implicit length arrays and packed arrays which are used as offsets are forbidden.
Note: Fields which are used as offsets will never be packed because it would break the offsets feature.
Since 2.4.0
The align(n) modifier can be used to force the decoder to skip 0..n-1 bits so that the bit offset from the beginning of the stream is divisible by n. n may be any integer literal.
Alignment modifiers may be used in any structure type:
Example
struct AlignmentExample
{
bit:11 a;
align(32):
uint32 b;
};
The size of the AlignmentExample type is 64 bits. Without the alignment modifier, the size would be 43 bits.
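The two sizes can be verified with simple arithmetic. The following Python sketch assumes that AlignmentExample starts at bit offset 0 of the stream. The function names are made up for this illustration.

```python
def align_up(bit_position: int, n: int) -> int:
    """Round bit_position up to the next multiple of n (skips 0..n-1 bits)."""
    return (bit_position + n - 1) // n * n


def alignment_example_size(with_alignment: bool = True) -> int:
    """Bit size of AlignmentExample, assuming it starts at bit offset 0."""
    position = 0
    position += 11                         # bit:11 a
    if with_alignment:
        position = align_up(position, 32)  # align(32):
    position += 32                         # uint32 b
    return position


print(alignment_example_size(True))   # size with the alignment modifier
print(alignment_example_size(False))  # size without it
```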
If a member with alignment is optional, alignment is optional as well:
Example
struct Example
{
bool hasOptional;
align(32):
int32 myOptionalField if hasOptional == true;
int32 myField;
};
In the code example above, if the optional member is not present (hasOptional == false), then no align(32) is executed and the size is 33 bits.
The name of a member of integral type may be used as an offset on another member to indicate its byte offset from the beginning of the stream:
Example
struct Tile
{
TileHeader header;
uint32 stringOffset;
uint16 numFeatures;
stringOffset:
StringTable stringTable;
};
In this example, the offset label indicates that the value of stringOffset contains the byte offset of member stringTable from the beginning of the stream.
Since offsets always refer to byte offsets, a given member within a structure type cannot have an offset if it is not guaranteed to be byte-aligned. To overcome this restriction, a byte alignment is inserted automatically:
Example
struct Tile
{
TileHeader header;
uint32 stringOffset;
uint16 numBits;
bit:1 bits[numBits];
stringOffset: // also implies an align(8)
StringTable stringTable;
};
If the member with offset is optional, the offset is optional as well:
Example
struct Example
{
uint32 byteOffset;
bool hasOptional;
byteOffset:
int32 myOptionalField if hasOptional == true;
int32 myField;
};
In the code example above, if the optional member is not present (hasOptional == false), then no offset is checked and the size is 65 bits.
Fields which are used as offsets cannot be used in expressions.
Each field can be used only once as an offset.
When all elements in an array should have offsets, a special notation can be used:
Example
struct IndexedInt32Array
{
uint32 offsets[10];
bit:1 spacer;
offsets[@index]:
int32 data[10];
};
In this example, @index denotes the current index of the data array. The use of this expression in the offsets array indicates that the i-th element of the offsets array contains the byte offset of the i-th element of member data, calculated from the beginning of the stream.
Since offsets can refer only to byte offsets, each element of an array with indexed offset is automatically byte-aligned:
Example
struct IndexedBit5Array
{
uint32 offsets[2];
bit:1 spacer;
offsets[@index]: // implies align(8) before each data[i]
bit:5 data[2];
};
Fields which are used as indexed offsets cannot be used in expressions.
Each field can be used only once as an indexed offset.
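The byte offsets implied by the two indexed offset examples above can be computed by hand. The following Python sketch assumes that each structure is the top-level object starting at bit 0 of the stream. The helper names are hypothetical.

```python
def align_up(bit_position: int, n: int) -> int:
    """Round bit_position up to the next multiple of n."""
    return (bit_position + n - 1) // n * n


def indexed_byte_offsets(start_bit: int, element_bits: int, count: int) -> list:
    """Byte offsets of array elements when each element is byte-aligned first."""
    offsets = []
    position = start_bit
    for _ in range(count):
        position = align_up(position, 8)  # implied align(8) before each element
        offsets.append(position // 8)
        position += element_bits
    return offsets


# IndexedInt32Array: uint32 offsets[10] (320 bits) + bit:1 spacer, then int32 data[10]
print(indexed_byte_offsets(10 * 32 + 1, 32, 10))

# IndexedBit5Array: uint32 offsets[2] (64 bits) + bit:1 spacer, then bit:5 data[2]
print(indexed_byte_offsets(2 * 32 + 1, 5, 2))
```

In the second case every 5-bit element is followed by padding, because the next element has to start on a byte boundary again.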
The semantics of expressions and the precedence rules for operators are the same as in Java, except where stated otherwise. Zserio has a number of special operators which will be explained in detail below.
The following Java operators have no counterpart in zserio: ++, --, >>> and instanceof.
The negation operator ! is defined for boolean expressions.
For integer expressions, there are + (unary plus) and - (unary minus).
The bitwise complement ~ is defined for integer and bitmask expressions.
The lengthof operator may be applied to array members, bytes types or strings and returns the actual length (i.e. the number of elements of an array). Thus, given int32 a[5], the expression lengthof(a) evaluates to 5.
This is not particularly useful for fixed or variable length arrays, but it is the only way to refer to the length of an auto length array.
The lengthof operator returns the number of bytes in a string, encoded using UTF-8. For example, the length of the string "€" is 3, because it is encoded as three bytes in UTF-8.
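The difference between the number of characters and the number of UTF-8 bytes is easy to check in Python. The function below is a hypothetical stand-in for what lengthof reports for a string.

```python
def lengthof_string(text: str) -> int:
    """Number of bytes of text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(len("€"), lengthof_string("€"))      # one character, three bytes
print(len("You"), lengthof_string("You"))  # ASCII: characters and bytes agree
```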
Example
struct LengthOfOperator
{
uint8 autoArray[];
function uint32 getLengthOfAutoArray()
{
return lengthof(autoArray);
}
};
The valueof operator may be applied to an expression of enumeration or bitmask type and returns its actual value as an integer.
Example
struct ValueOfOperator
{
Color color;
function uint8 getValueOfColor()
{
return valueof(color);
}
};
enum uint8 Color
{
WHITE = 1,
BLACK = 2
};
The numbits(numValues) operator is defined for unsigned integers as the minimum number of bits required to encode numValues different values. The returned number is of type uint8. The numbits operator returns 0 if applied to the value 0.
The following table shows the results of the numbits operator applied to some common values:
numbits(0) = 0
numbits(1) = 1
numbits(2) = 1
numbits(3) = 2
numbits(4) = 2
numbits(8) = 3
numbits(16) = 4
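The listed results can be reproduced with a short Python function. This is a sketch that matches the table above, including the special cases for 0 and 1. It is not taken from the zserio sources.

```python
def numbits(num_values: int) -> int:
    """Number of bits needed to encode num_values different values."""
    if num_values < 0:
        raise ValueError("numbits is defined for unsigned integers only")
    if num_values == 0:
        return 0
    if num_values == 1:
        return 1  # special case taken from the table above
    # The largest of num_values different values is num_values - 1.
    return (num_values - 1).bit_length()


for n in (0, 1, 2, 3, 4, 8, 16):
    print(f"numbits({n}) = {numbits(n)}")
```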
Example
struct NumBitsOperator
{
uint8 value8;
function uint8 getNumBits8()
{
return numbits(value8);
}
};
The integer arithmetic operations include + (addition), - (subtraction), * (multiplication), / (division), % (modulo). In addition, zserio also supports the shift operators << and >>.
The following relational operators for integer expressions are supported: == (equal to), != (not equal to), < (less than), <= (less than or equal), > (greater than), >= (greater than or equal).
The equality operators == and != may be applied to any type.
The boolean operators && (and) and || (or) may be applied to boolean expressions.
The bit operators & (bitwise and), | (bitwise or), ^ (bitwise exclusive or) may be applied to integer types.
The postfix operators include [] (array index), () (instantiation with argument list or function call) and . (member access).
Operator isset accepts a bitmask and a bitmask value. It returns true if the bitmask contains the given bitmask value, i.e. if the bitwise and (&) of the bitmask and the bitmask value is equal to the bitmask value.
Example
bitmask uint8 TestBitmask
{
INT,
STRING
};
struct IsSetOperator
{
TestBitmask testBitmask;
function bool hasInt()
{
return isset(testBitmask, INT);
}
function bool hasString()
{
return isset(testBitmask, TestBitmask.STRING);
}
function bool hasBoth()
{
return isset(testBitmask, INT | STRING);
}
};
The isset operator in the previous example can be rewritten using the bitwise and operator with exactly the same behavior:
struct IsSetOperator
{
TestBitmask testBitmask;
function bool hasInt()
{
return ((testBitmask & TestBitmask.INT) == TestBitmask.INT);
}
function bool hasString()
{
return ((testBitmask & TestBitmask.STRING) == TestBitmask.STRING);
}
function bool hasBoth()
{
return ((testBitmask & (TestBitmask.INT | TestBitmask.STRING)) ==
(TestBitmask.INT | TestBitmask.STRING));
}
};
Note that the bitmask type used in the isset operator can be deduced, so the type name and the dot may be omitted, e.g. isset(testBitmask, INT) instead of isset(testBitmask, TestBitmask.INT).
Since 2.8.0
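The equivalence between isset and the bitwise and comparison can be tried out in Python with the standard `enum.IntFlag` class. This is a sketch for illustration only: the Python `isset` function below is hypothetical, and the values 1 and 2 assume the automatic bitmask value assignment described earlier.

```python
from enum import IntFlag


class TestBitmask(IntFlag):
    INT = 1     # auto-assigned in the schema example
    STRING = 2  # auto-assigned in the schema example


def isset(bitmask: TestBitmask, value: TestBitmask) -> bool:
    """True if bitmask contains all bits of value."""
    return (bitmask & value) == value


both = TestBitmask.INT | TestBitmask.STRING
print(isset(TestBitmask.INT, TestBitmask.INT))  # the mask contains INT
print(isset(TestBitmask.INT, both))             # STRING is missing
print(isset(both, both))                        # both bits are present
```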
A conditional expression booleanExpr ? expr1 : expr2 has the value of expr1 when booleanExpr is true. Otherwise, it has the value of expr2.
In the following list, operators are grouped by precedence in descending order. Operators on the first line have the highest precedence and are evaluated first. All operators on the same line have the same precedence and are evaluated left to right, except the ternary operator, which is evaluated right to left.
- () [] .
- lengthof valueof numbits
- unary + - ~ !
- * / %
- + -
- << >>
- < > <= >=
- == !=
- &
- ^
- |
- &&
- ||
- ? :
The dot operator can be used to access a member of a compound type.
The expression f.m is valid if
- f is a field of a compound type C
- the type T of f is a compound type
- T has a member named m
The value of the expression f.m can be evaluated at runtime only if the member f has been evaluated before.
Example
struct Header
{
uint16 version;
uint16 numSentences;
};
struct Message
{
Header header;
string sentences[header.numSentences];
};
Within the scope of the Message type, header refers to the field of type Header, and header.numSentences is a member of that type.
The definition of a compound type may be augmented with a parameter list, similar to a parameter list in a Java method declaration. Each item of the parameter list has a type and a name. Within the body of the compound type definition, parameter names may be used as expressions of the corresponding type.
To use a parameterized type as a field type in another compound type, the parameterized type must be instantiated with an argument list matching the types of the parameter list.
Example
struct Header
{
uint32 version;
uint16 numItems;
};
struct Message
{
Header header;
Item(header) items[header.numItems];
};
struct Item(Header header)
{
uint16 param;
uint32 ExtraParam if header.version >= 10;
};
When the element type of an array is parameterized, a special notation can be used to pass different arguments to each element of the array:
Example
struct Database
{
uint16 numBlocks;
BlockHeader headers[numBlocks];
Block(headers[@index]) blocks[numBlocks];
};
struct BlockHeader
{
uint16 numItems;
uint32 offset;
};
struct Block(BlockHeader header)
{
header.offset:
int64 items[header.numItems];
};
The @index denotes the current index of the blocks array. The use of this expression in the argument list for the Block reference indicates that the i-th element of the blocks array is of type Block instantiated with the i-th header headers[i].
A subtype definition defines a new name for a given type. This is rather like a typedef declaration in C:
Example
subtype uint16 BlockIndex;
struct Block
{
BlockIndex blockIndex;
uint32 data;
};
There are two types of comments in zserio language:
- Standard Comments
- Documentation Comments
Standard comments are intended only as the comments of zserio schema and are filtered out during compilation. Thus, generators have no access to these comments.
Documentation comments are not filtered out during compilation and they can be used by extensions (e.g. for generation of zserio schema documentation or for documentation of generated code). These comments may be associated with the subsequent type or field definition.
Documentation comments are further divided into two types:
- Classic Comments
- Markdown Comments
Zserio supports the standard comment syntax of Java or C++. Single line comments start with // and extend to the end of the line. A comment starting with /* is terminated by the next occurrence of */, which may or may not be on the same line.
Example
// This is a single-line comment.
/* This is an example
of a multi-line comment
spanning three lines. */
To support generation of documentation, multi-line comments starting with /** are treated as special classic documentation comments. The idea and syntax are borrowed from Javadoc. A classic documentation comment is associated with the following type or field definition. The classic documentation comment and the corresponding definition may only be separated by whitespace.
Example
/**
* Traffic flow on roads.
*/
enum bit:2 Direction
{
/** No traffic flow allowed. */
NONE,
/** Traffic allowed from start to end node. */
POSITIVE,
/** Traffic allowed from end to start node. */
NEGATIVE,
/** Traffic allowed in both directions. */
BOTH
};
Classic documentation comments can contain special tags, as shown by the following example:
/**
* The tile contains a number of different elements
* grouped by feature classes, e.g. intersections, roads and
* so on...
*
* The presence of these members is indicated by the content
* mask in the header (please have a look at the
* following @see "documentation" headerDefinition page).
*
* @see headerDefinition
*
* @param level level number
* @param width width for the current tile
*
* @todo Update this comment.
*
* @deprecated
*/
The content of a classic documentation comment, excluding its delimiters, is parsed line by line. Each line is stripped of leading whitespace, a sequence of asterisks (*), and more whitespace, if present. After stripping, a comment is composed of one or more paragraphs, followed by zero or more tag blocks. Paragraphs are separated by blank lines. The text in paragraphs can contain HTML formatting tags like <ul>, <li> or </br> directly.
A line starting with whitespace and a keyword preceded by an at-sign (@) is the beginning of a tag.
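The stripping and splitting rules above can be sketched in Python. This is a simplified illustration and not the parser used by the zserio compiler. The function name is hypothetical, embedded @see tags are left untouched inside paragraph text, and continuation lines of a tag are assumed to simply extend the tag's description.

```python
import re


def parse_classic_comment(comment: str):
    """Split a /** ... */ comment into paragraphs and (tag, text) pairs."""
    body = comment.strip()
    if body.startswith("/**"):
        body = body[3:]
    if body.endswith("*/"):
        body = body[:-2]

    paragraphs = []  # finished paragraphs
    tags = []        # [tag_name, text] pairs
    current = []     # lines of the paragraph being collected

    for raw_line in body.splitlines():
        # Strip leading whitespace, a run of asterisks, and more whitespace.
        line = re.sub(r"^\s*\**\s*", "", raw_line).rstrip()
        if line.startswith("@"):
            if current:
                paragraphs.append(" ".join(current))
                current = []
            name, _, text = line[1:].partition(" ")
            tags.append([name, text.strip()])
        elif not line:
            if current:  # a blank line ends the current paragraph
                paragraphs.append(" ".join(current))
                current = []
        elif tags:
            tags[-1][1] = (tags[-1][1] + " " + line).strip()
        else:
            current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs, [tuple(tag) for tag in tags]


example = """/**
 * First paragraph
 * continues here.
 *
 * Second paragraph.
 *
 * @param level level number
 * @todo Update this
 *       comment.
 * @deprecated
 */"""
paragraphs, tags = parse_classic_comment(example)
print(paragraphs)
print(tags)
```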
The following sections describe all supported tags in the classic documentation comments in detail.
The see tag defines a link in the generated documentation and has the following format:
@see "TEXT_ALIAS" TYPE.FIELD
The TEXT_ALIAS is text which will be shown in the generated documentation instead of the reference TYPE.FIELD. This alias text is optional and can be omitted.
The TYPE.FIELD must be a valid reference to a field of a zserio type. The FIELD part is optional and can be omitted.
The see tag is the only tag which does not have to be defined at the beginning of the line and can be embedded directly in the comment text.
Example
/**
* Please see @see "black color" ColorEnumerationType.BLACK definition for more description.
*/
The param tag is used for documenting the arguments of a parameterized type. This tag has the following format:
@param PARAM_NAME PARAM_DESCRIPTION
The PARAM_NAME defines the parameter name.
The PARAM_DESCRIPTION contains the parameter description. This description can span multiple lines.
Example
/**
* This type takes two arguments.
*
* @param arg1 The first argument.
* @param arg2 The second argument.
*/
struct ParamType(Foo arg1, Bar arg2)
{
...
};
The todo tag is used for documenting an action which should be done in the future. This tag has the following format:
@todo ACTION_DESCRIPTION
The ACTION_DESCRIPTION contains arbitrary text. This text can span multiple lines.
Example
/**
* The text of the comment.
*
* @todo Don't forget to update this comment!
*/
The deprecated tag marks the documented zserio type as deprecated, which means that this type is going to become invalid in future versions of the schema. It has the following format:
@deprecated
To support generation of documentation, multi-line comments starting with /*! are treated as special
markdown documentation comments. It is recommended to terminate markdown comments with !*/; Zserio
issues a warning when the comment is terminated differently.
Example
/*!
# Title for this section
The tile contains a number of different elements grouped by feature classes, e.g. intersections,
roads and so on...
This module defined the following structure:
Name | Description
------------| ----------------------------------------------
Structure1 | This describes Structure1 in detail.
Structure2 | This describes Structure2 in detail.
!*/
If a whole markdown documentation comment is indented to match the opening /*!, which stands alone on
a separate line, the comment is rendered without that indentation.
Example
    /*!
    # Title for this section
    The tile contains a number of different elements grouped by feature classes, e.g. intersections,
    roads and so on...
    This module defined the following structure:
    Name | Description
    ------------| ----------------------------------------------
    Structure1 | This describes Structure1 in detail.
    Structure2 | This describes Structure2 in detail.
    !*/
Since
2.1.0
Zserio also provides a way to organize semantic restrictions for the schema which cannot be easily defined
using constraints. Rules are defined within rule groups and are described using
documentation comments. Each rule has its own identifier, which is a standard Zserio identifier
extended by the -, : and . characters. The rule ID must be globally unique (using case-insensitive
comparison) across all Zserio packages.
Example
rule_group SchemaRules
{
/*!
Description of the semantic rule.
![Rule 01](../assets/rule-01-picture.png)
!*/
rule "rule-01";
/*!
Another rule.
!*/
rule "rule-02";
};
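The uniqueness requirement for rule IDs can be sketched as a small Python check. The function name and the exact identifier pattern are assumptions made for illustration (in particular, that an ID starts with a letter or underscore); only the extra characters and the case-insensitive comparison come from the description above.

```python
import re

# Assumption: an ID starts like a standard identifier and may additionally
# contain '-', ':' and '.' characters.
RULE_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_\-:.]*")


def find_duplicate_rule_ids(rule_ids):
    """Return (first, duplicate) pairs of rule IDs that clash when
    compared case-insensitively across all packages."""
    seen = {}
    duplicates = []
    for rule_id in rule_ids:
        if not RULE_ID_PATTERN.fullmatch(rule_id):
            raise ValueError(f"invalid rule ID: {rule_id!r}")
        key = rule_id.lower()
        if key in seen:
            duplicates.append((seen[key], rule_id))
        else:
            seen[key] = rule_id
    return duplicates
```

With the IDs "rule-01", "rule-02" and "Rule-01" collected from all packages, the check reports "rule-01" and "Rule-01" as a clash.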
Since
2.4.0
Complex zserio specifications should be split into multiple packages stored in separate source files. Every user-defined type belongs to a unique package. For backward compatibility, there is an unnamed default package used for files without an explicit package declaration. It is strongly recommended to use a package declaration in each zserio source file.
A package provides a lexical scope for types. Type names must be unique within a package, but a given type name
may be defined in more than one package. If a type named Coordinate
is defined in package com.acme.foo
, the
type can be globally identified by its fully qualified name com.acme.foo.Coordinate
, which is obtained by
prefixing the type name with the name of the defining package, joined by a dot. Another package
com.acme.bar
may also define a type named Coordinate
, having the fully qualified name
com.acme.bar.Coordinate
.
By default, types from other packages are not visible in the current package, unless they are imported explicitly. The package and import syntax and semantics follow the Java example.
Example
package map;
import common.geometry.*;
import common.featuretypes.*;
Import declarations only have any effect when there is a reference to a type or symbol not defined in the
current package. If package map defines its own Coordinate
type, any reference to that within package map
will be resolved to the local type map.Coordinate
, even when one or more of the imported packages also define
a type named Coordinate
.
On the other hand, if package map
references a Coordinate
type but does not define it, the import
declarations are used to resolve that type in one of the imported packages. In that case, the referenced type
must be matched by exactly one of the imported packages. It is obviously a semantic error if the type name is
defined in none of the packages. It is also an error if the type name is defined in two or more of the imported
packages. The order of the import declarations does not matter.
It is always possible to use the fully qualified name of a type, e.g. com.acme.bar.Coordinate
. This makes it
possible to import a type with the same name (e.g. Coordinate
) from more than one package or to import a type
with the same name as a type defined locally.
Individual types or symbols can be imported using their fully qualified name:
import common.geometry.Geometry;
This single import has precedence over any wildcard import. It prevents an ambiguity with
common.featuretypes.Geometry
. It is possible to import the same type from different packages but then
each usage of such type must be fully qualified. Using the unqualified name in this situation results in
a compilation error as the type is ambiguous.
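The resolution order described above (local types first, then single imports, then wildcard imports) can be sketched in Python. This is one reading of the rules, not the algorithm of the zserio compiler; the function resolve, the exception ResolutionError and the dictionary-based package model are hypothetical.

```python
class ResolutionError(Exception):
    """Raised when an unqualified type name cannot be resolved uniquely."""


def resolve(name, current_package, packages, single_imports=(), wildcard_imports=()):
    """Return the fully qualified name for an unqualified type reference.

    packages maps a package name to the set of type names it defines.
    single_imports holds fully qualified type names, wildcard_imports
    holds package names imported with '.*'.
    """
    # 1. A type defined in the current package always wins.
    if name in packages[current_package]:
        return f"{current_package}.{name}"
    # 2. Single imports take precedence over wildcard imports.
    singles = sorted({q for q in single_imports if q.rsplit(".", 1)[-1] == name})
    if len(singles) == 1:
        return singles[0]
    if len(singles) > 1:
        raise ResolutionError(f"'{name}' is ambiguous: {', '.join(singles)}")
    # 3. Exactly one wildcard-imported package must define the type.
    owners = sorted({p for p in wildcard_imports if name in packages.get(p, ())})
    if len(owners) == 1:
        return f"{owners[0]}.{name}"
    if not owners:
        raise ResolutionError(f"'{name}' is not defined in any imported package")
    raise ResolutionError(f"'{name}' is ambiguous: defined in {', '.join(owners)}")
```

The order of the import lists does not influence the result, which matches the statement that the order of import declarations does not matter.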
Package and file names are closely related. Each package must be located in a separate file. The above example
declares a package map
stored in a source file map.zs
. The import declarations direct the parser to locate
and parse source files common/geometry.zs
and common/featuretypes.zs
.
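The relation between package names and file names can be written down as a one-line mapping. The helper name is hypothetical, and the result is assumed to be relative to whatever directory the parser treats as the source root.

```python
def package_to_source_path(package_name):
    """Map a package name such as 'common.geometry' to the relative
    path of its source file, 'common/geometry.zs'."""
    return package_name.replace(".", "/") + ".zs"
```

For the example above, the package map corresponds to map.zs and the import of common.geometry.* leads to common/geometry.zs.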
Imported files may again contain import declarations. Cyclic import relations between packages are supported but should be avoided. The zserio parser takes care to parse each source file just once.
Although the basic zserio language features provide a strong tool for binary data modeling, there are situations where reducing duplication by means of generic programming brings significant benefits for the users.
The cornerstone of generic programming in zserio are templates, which allow zserio compound types (structure types, choice types, union types) to operate on generic types. Such generic types are called template parameters; they are specified later, during template instantiation, as template arguments.
Because zserio must check the correctness of all template instantiations and because zserio should support generators for almost any kind of programming language (even languages which do not support templates), the template instantiations must be resolved during zserio compilation.
Zserio templates were inspired by the C++ class template concept and borrow their syntax from Java generics.
A compound type template is defined by a normal compound type declaration together with template parameters
enclosed in angle brackets <>:
Example
struct Field<T>
{
T value;
};
This example defines a structure type template with one template parameter called T. The template
parameter can be used anywhere as a generic type within the lexical scope of the structure. This generic type will
be specified during template instantiation as follows:
Example
struct StructTemplatedField
{
Field<uint32> uint32Field;
};
This example instantiates the template Field
using argument uint32
. From the zserio language point of view,
the previous structure type template example is actually the same as the following "no-generics-programming"
example:
Example
struct Field_uint32
{
uint32 value;
};
struct StructTemplatedField
{
Field_uint32 uint32Field;
};
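The equivalence between a template instantiation and a plain structure can be illustrated by a small substitution sketch in Python. The helpers instantiate and render_struct are hypothetical, and the naming scheme (template name and arguments joined by an underscore) is only extrapolated from the Field_uint32 example; the names a real generator emits may differ.

```python
def instantiate(template_name, params, fields, args):
    """Substitute template arguments for template parameters.

    fields is a list of (type name, field name) pairs of the template.
    Returns the instantiated type name and its list of fields.
    """
    if len(params) != len(args):
        raise ValueError("wrong number of template arguments")
    binding = dict(zip(params, args))
    # Assumed naming scheme, following the Field_uint32 example.
    instance_name = "_".join([template_name, *args])
    instance_fields = [
        (binding.get(type_name, type_name), field_name)
        for type_name, field_name in fields
    ]
    return instance_name, instance_fields


def render_struct(name, fields):
    """Render a structure definition as schema text."""
    lines = [f"struct {name}", "{"]
    lines += [f"    {type_name} {field_name};" for type_name, field_name in fields]
    lines.append("};")
    return "\n".join(lines)
```

Instantiating the template Field with the parameter list ["T"], the field ("T", "value") and the argument "uint32" yields the name Field_uint32 with a single uint32 field named value.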
Templates of parameterized types are supported with a very similar syntax. The parameters of a parameterized type can be template instantiations as well. The following example shows the syntax for parameterized type templates:
Example
struct ParamHolder<T>
{
T param;
};
struct Parameterized<T>(ParamHolder<T> paramHolder)
{
string description;
paramHolder.param:
uint32 id;
};
struct StructTemplatedTypeArgument
{
ParamHolder<uint32> paramHolder;
Parameterized<uint32>(paramHolder) parameterized;
};
Templated template arguments are supported as well. In other words, template arguments can themselves be template instantiations. The following example shows the syntax for templated template arguments:
Example
struct Field<T>
{
T value;
};
struct Compound<T>
{
T value;
};
struct StructTemplatedTemplateArgument
{
Field<Compound<uint32>> compoundField;
};
Subtypes can be used for aliasing template types similarly to normal zserio types. The following example shows the usage:
Example
struct TestStructure<T>
{
T value;
};
subtype TestStructure<uint32> TestStructureSubtype;
The zserio template instantiations are generated by default in the package where the corresponding template is declared. This ensures that only one template instantiation is generated if it is used multiple times from different packages. This default behavior is necessary because template instantiations can be used as parameters across different packages.
However, there are situations where forcing the package of a template instantiation is beneficial.
For this purpose, the keyword instantiate has been added to the zserio language.
Example
package template_declaration;
struct TestStructure<T>
{
T value;
};
package template_instantiation;
import template_declaration.*;
instantiate TestStructure<uint32> TestStructure32;
The instantiate command from the previous example means
- that the TestStructure<uint32> instantiation will be generated in the package template_instantiation instead of the package template_declaration, and
- that the TestStructure<uint32> instantiation will be named TestStructure32, and
- that a new zserio type TestStructure32 will be defined in the package template_instantiation.
Instantiated types can be imported like any other type. Thus, all imported instantiated types are visible in the importing package.
It is forbidden to have more than one instantiate command (specified or imported) for the same template
instantiation in the same package.
However, this is allowed in different packages (which are not imported). In this case, the same
template will be instantiated as a different type in the other package. Because of that, the instantiate
command should be considered very carefully and used only if it is really necessary.
Since
1.4.0
Binary data streams defined by zserio are also good candidates to be used in RPC systems. Zserio introduces generic services directly in the language. A service type contains definitions of service methods.
Example
struct UserId
{
uint32 id;
};
struct User
{
uint32 id;
string name;
string surname;
string phoneNumber;
};
service Users
{
User getUser(UserId);
};
A service method must have a single response and single request type. When no response or request type is needed, an empty structure can be used. Currently only simple unary calls are supported by zserio.
The request and response types must be non-parameterized compound types or bytes. Parameterized types are not
allowed since the parameters are not stored in the bit stream. However, parameterized types can still be used
within the subtree of the request or response type. The bytes type can be used when a raw data format is
needed, e.g. when the response is an image.
Since
2.0.0
Besides services, zserio also supports the publish-subscribe messaging pattern. The pubsub type defines a Pub/Sub client.
Example
pubsub WeatherProvider
{
publish topic("weather/warnings") WeatherWarning warnings;
};
pubsub WeatherClient
{
subscribe topic("weather/warnings") WeatherWarning weatherWarnings;
};
struct WeatherWarning
{
string warningMessage;
};
The pubsub defines messages which can be either published or subscribed (or both). The example above defines a
WeatherProvider pubsub type with a single message warnings, which is published under the topic
named "weather/warnings"; the type of the message is the WeatherWarning structure. It then defines
a WeatherClient pubsub type which has a single subscription weatherWarnings for messages published under the
topic "weather/warnings". The client expects that messages arriving at the weatherWarnings
subscription (i.e. published under the defined topic) are of the type WeatherWarning.
A pubsub type can define a message in three ways:
- topic("topic/definition") Type message to both publish and subscribe to a message,
- publish topic("topic/definition") Type message to publish a message,
- subscribe topic("topic/definition") Type message to subscribe to a message.
In the Pub/Sub pattern, it is common to use wildcards for topic definitions in subscriptions. The wildcard format depends on the particular implementation. Zserio only provides a generic definition of Pub/Sub clients and does not manipulate the topic definition string. It therefore depends on the particular Pub/Sub backend whether and how wildcards are supported. See the MQTT standard as an example of a concrete Pub/Sub pattern specification.
The message type must be a non-parameterized compound type or bytes. Parameterized types are not allowed since
the parameters are not stored in the bit stream. However, parameterized types can still be used within the
subtree of the message type. The bytes type can be used when a raw data format is needed, e.g. when the
message is an image.
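How a provider and a client meet through a topic can be sketched with a tiny in-memory broker in Python. Everything here is a hypothetical stand-in: the class InMemoryBroker is not part of zserio, topics are matched exactly (no wildcards), and the payload is an arbitrary byte string where a real application would pass the serialized WeatherWarning.

```python
class InMemoryBroker:
    """Hypothetical Pub/Sub backend with exact topic matching only."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        # Register a callback for messages published under the topic.
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Deliver the raw payload to every subscriber of the topic.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


broker = InMemoryBroker()
received = []

# Role of WeatherClient: subscribe to "weather/warnings".
broker.subscribe("weather/warnings", lambda topic, data: received.append(data))

# Role of WeatherProvider: publish under the same topic.
delivered = broker.publish("weather/warnings", b"storm")
ignored = broker.publish("weather/forecast", b"sunny")
```

Only the message published under "weather/warnings" reaches the subscriber; the message under the other topic has no receiver.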
Since
2.0.0
With its basic language features presented in the previous sections, zserio provides a rich language for modeling binary data streams, which are intended to be parsed sequentially. Direct access to members in the stream is usually not possible, except for specifying the offset of a given member. Navigation between semantically related members at different positions in the stream cannot be expressed at the stream level. Member insertions or updates are not supported.
All in all, the stream model is not an adequate approach for updatable databases in the gigabyte size range with lots of internal cross-references where fast access to individual members is required. In a desktop or server environment, it would be a natural approach to model such a database as a relational database using SQL. However, in an embedded environment with limited storage space and processing resources, a full-fledged relational schema is too heavy-weight. To have the best of both worlds, i.e. compact storage on the one hand and direct member access including updates on the other hand, one can adopt a hybrid data model: In this hybrid model, the high-level access structures are strictly relational, but most of the low-level data are stored in binary large objects (BLOBs), where the internal structure of each BLOB is modeled with zserio.
For example, we can model a digital map database as a collection of tiles resulting from a rectangular grid where the tiles are numbered row-wise. The database has a rather trivial schema:
CREATE TABLE europe (tileNum INTEGER PRIMARY KEY, tile BLOB NOT NULL);
Accessing or updating any given tile can simply be delegated to the relational DBMS, in case of this zserio extension, SQLite.
Assuming that the tile BLOBs have a reasonable size, each tile can be decoded on the fly to access the individual members within the tile. For seamless modelling of this hybrid approach, we decided to add relational extensions for SQLite to zserio. Some SQL concepts have been translated to zserio, others are transparent to zserio and can be embedded as literal strings to be passed to the SQLite engine.
An SQL table type is a special case of a compound type, where the members of the type correspond to the columns of a relational table.
In zserio it is possible to express the above example as follows:
Example
sql_table GeoMap
{
int32 tileId sql "PRIMARY KEY NOT NULL";
Tile tile sql "NOT NULL";
};
GeoMap europe;
GeoMap america;
It is important to note that the GeoMap
is a table type and not a table. A table is defined by the instance
europe
of type GeoMap
. Table types have no direct equivalent in SQL. They can be used to create tables
with identical structure and column names. Each instance of an sql_table
type in zserio translates to an
SQLite SQL table where the table name in the SQL schema is equal to the instance name in zserio. A member
definition may include an SQL constraint introduced by the keyword sql
, followed by a literal string which is
then passed to the SQLite engine.
Thus, the zserio instance america results in the following SQL table:
CREATE TABLE america (tileId INTEGER PRIMARY KEY NOT NULL, tile BLOB NOT NULL);
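The relation between one table type and several table instances can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption about how the statements could be assembled, not the code zserio generates: the helper create_table_sql and the column list are hypothetical, with the column types taken from the mapping of int32 to INTEGER and of compound types to BLOB.

```python
import sqlite3

# Columns of the GeoMap table type: (name, SQLite type, constraint string).
GEO_MAP_COLUMNS = [
    ("tileId", "INTEGER", "PRIMARY KEY NOT NULL"),
    ("tile", "BLOB", "NOT NULL"),
]


def create_table_sql(instance_name, columns):
    """Build a CREATE TABLE statement; the table name is the instance name."""
    column_defs = ", ".join(" ".join(part for part in column if part) for column in columns)
    return f"CREATE TABLE {instance_name} ({column_defs});"


connection = sqlite3.connect(":memory:")
# One SQL table per instance of the table type.
for instance in ("europe", "america"):
    connection.execute(create_table_sql(instance, GEO_MAP_COLUMNS))

# A tile is stored as an opaque BLOB; the bytes here are placeholder data.
connection.execute("INSERT INTO america VALUES (?, ?)", (1, b"\x01\x02"))
table_names = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Both tables share the same structure and differ only in their names, which is what the table type expresses.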
It is also possible to use the zserio keyword sql
directly inside the table definition. The main use for this
syntax is to define a primary key spanning multiple fields.
Example
sql_table BusinessLocationTable
{
BusinessId businessId sql "NOT NULL";
CategoryId catId sql "NOT NULL";
Position position sql "UNIQUE";
int8 hasIcon;
sql "PRIMARY KEY(businessId, catId)";
};
SQL table types can be templated using the same syntax as other zserio compound types, see templates.
For the mapping of zserio types to SQL types, refer to SQL Types Mapping.
Virtual tables in zserio are an extension to the sql_table
. The following paragraph gives a definition
of a virtual table from the SQLite website:
A virtual table is an interface to an external storage or computation engine that appears to be a table but does not actually store information in the database file. In general, you can do anything with a virtual table that can be done with an ordinary table, except that you cannot create indices or triggers on a virtual table. Some virtual table implementations might impose additional restrictions. For example, many virtual tables are read-only.
The syntax is modeled as an extension to the sql_table
, where an optional module can be specified:
sql_table <tablename> using <modulename>
The following example creates a virtual table using SQLite's FTS5 module:
sql_table Pages using fts5
{
string title;
string body;
};
The following example creates a virtual table using the RTREE module:
sql_table TestTable using rtree
{
int32 id;
int32 minX;
int32 maxX;
int32 minY;
int32 maxY;
};
When virtual tables are used in zserio, some columns are automatically defined by the module named after the
using keyword. The keyword sql_virtual allows such generated columns to be added to the (HTML) documentation,
so that it is possible to find out which type a column has or which features it references. No code is
generated for these columns, and the table creation code does not contain them.
This keyword is not limited to virtual tables, but it makes the most sense to use it there.
The syntax is modeled as an extension to the sql_table
column definition:
sql_virtual <type name> <column name>;
The following example creates a virtual table using the SQLite FTS5 module. The content column is omitted when the table is created, but it can still be read and written.
Example
sql_table Pages using fts5
{
sql_virtual string content;
};
When a sql_table member is an instance of a parameterized type, the application may want to derive
the parameter values from the context (e.g. other table columns), which is not available to the zserio decoder.
In this case, the type arguments shall be marked with the keyword explicit to indicate that these values will
be set explicitly by the application. Otherwise, the decoder would complain about not being able to evaluate
the type arguments.
Example
struct Tile(uint8 level, uint8 width)
{
...
};
sql_table TileTable
{
uint32 tileId;
uint32 version;
Tile(explicit level, explicit width) tile;
};
To support the WITHOUT ROWID
optimization in SQLite, the sql_without_rowid
keyword is used in zserio.
A sql_without_rowid
keyword is always a part of the sql_table
type:
Example
sql_table WithoutRowIdTable
{
string word sql "PRIMARY KEY NOT NULL";
uint32 count;
sql_without_rowid;
};
A sql_without_rowid
keyword must be defined after all possible fields and SQL constraints inside the SQL
table.
A sql_without_rowid keyword specified in a sql_table type causes the corresponding SQLite table to be created
without the special "rowid" column. This may bring space and performance advantages.
Creating a SQL table using the WITHOUT ROWID
optimization without specifying the primary key is considered
a compilation error.
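SQLite itself enforces the same requirement, which can be observed with Python's built-in sqlite3 module. The sketch only demonstrates the SQLite behaviour, not the check performed by the zserio compiler; the helper try_create and the second table name are hypothetical, and the SQL types follow the mapping of string to TEXT and uint32 to INTEGER.

```python
import sqlite3


def try_create(statement):
    """Return True if SQLite accepts the CREATE TABLE statement."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(statement)
        return True
    except sqlite3.OperationalError:
        # SQLite rejects a WITHOUT ROWID table that has no primary key.
        return False
    finally:
        connection.close()


with_key = try_create(
    "CREATE TABLE WithoutRowIdTable "
    "(word TEXT PRIMARY KEY NOT NULL, count INTEGER) WITHOUT ROWID")
without_key = try_create(
    "CREATE TABLE NoKeyTable (word TEXT, count INTEGER) WITHOUT ROWID")
```

The first statement is accepted, while the second one is rejected because the table has no primary key.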
Since an SQL table is always contained in an SQL database, we introduce a sql_database
type in zserio to
model databases. sql_table
instances may only be created as members of an sql_database
.
Example
sql_table GeoMap
{
// see above
};
sql_database TheWorld
{
GeoMap europe;
GeoMap america;
...
..
};
Zserio type | SQLite type |
---|---|
uint8, uint16, uint32, uint64 | INTEGER |
int8, int16, int32, int64 | INTEGER |
bit:n (n < 64) | INTEGER |
int:n (n <= 64) | INTEGER |
float16, float32, float64 | REAL |
varuint16, varuint32, varuint64, varuint | INTEGER |
varint16, varint32, varint64, varint | INTEGER |
bool | INTEGER |
string | TEXT |
enum | INTEGER |
bitmask | INTEGER |
struct | BLOB |
choice | BLOB |
union | BLOB |
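The mapping table can be expressed as a small lookup function in Python. The function name and the user_types argument (which maps the name of a user-defined type to its kind) are assumptions made for this sketch; widths not listed in the table are rejected here rather than guessed.

```python
import re

_INTEGER_TYPES = {
    "uint8", "uint16", "uint32", "uint64",
    "int8", "int16", "int32", "int64",
    "varuint16", "varuint32", "varuint64", "varuint",
    "varint16", "varint32", "varint64", "varint",
    "bool",
}
_REAL_TYPES = {"float16", "float32", "float64"}
_USER_KINDS = {
    "enum": "INTEGER", "bitmask": "INTEGER",
    "struct": "BLOB", "choice": "BLOB", "union": "BLOB",
}


def sqlite_column_type(type_name, user_types=None):
    """Return the SQLite column type for a zserio type name."""
    if type_name in _INTEGER_TYPES:
        return "INTEGER"
    if type_name in _REAL_TYPES:
        return "REAL"
    if type_name == "string":
        return "TEXT"
    match = re.fullmatch(r"(bit|int):(\d+)", type_name)
    if match:
        # The table lists bit:n for n < 64 and int:n for n <= 64.
        limit = 63 if match.group(1) == "bit" else 64
        if 1 <= int(match.group(2)) <= limit:
            return "INTEGER"
        raise ValueError(f"width not covered by the mapping table: {type_name}")
    kind = (user_types or {}).get(type_name)
    if kind in _USER_KINDS:
        return _USER_KINDS[kind]
    raise ValueError(f"no SQLite mapping known for {type_name}")
```

For a table type such as GeoMap, the int32 column maps to INTEGER and a column of the structure type Tile maps to BLOB.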
Zserio annotations are syntactic metadata that can be added to Zserio sources. Annotations usually have no
direct effect but can be used by generators to modify generated code appropriately. Annotations start with @
and are placed before a language element.
Annotation @deprecated denotes that the language element is being deprecated and should not be used anymore.
Generators are allowed to generate code which issues a warning when the corresponding generated element is
used by an application.
enum uint8 Traffic
{
NONE = 0,
@deprecated HEAVY,
LIGHT,
MID
};
Since
2.12.0
(supported only by enumeration items).
Annotation @removed denotes that the language element was removed from the schema and is left there only for
the sake of backward compatibility. Generators are allowed to omit (or hide) the corresponding generated element
in such a way that it is still possible to read old BLOBs, while it is forbidden to use such elements
during serialization.
enum uint8 Traffic
{
NONE = 0,
@removed HEAVY,
LIGHT,
MID
};
Since
2.12.0
(supported only by enumeration items).
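One way a generator could honor @removed is sketched below in Python: reading a removed item still works, while writing it is refused. The functions decode_traffic and encode_traffic are hypothetical and do not show the actual generated API. The numeric values assume that items without an explicit value continue from the previous item plus one.

```python
from enum import Enum


class Traffic(Enum):
    NONE = 0
    HEAVY = 1  # marked @removed in the schema
    LIGHT = 2
    MID = 3


# Items which may still be read from old data but must not be written.
REMOVED_ITEMS = frozenset({Traffic.HEAVY})


def decode_traffic(raw_value):
    """Reading stays possible for every item, including removed ones."""
    return Traffic(raw_value)


def encode_traffic(item):
    """Writing a removed item is refused."""
    if item in REMOVED_ITEMS:
        raise ValueError(f"{item.name} was removed and cannot be serialized")
    return item.value
```

Old data containing the value 1 still decodes to Traffic.HEAVY, whereas an attempt to encode Traffic.HEAVY raises an error.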
Although zserio tries to stay compatible with older versions, it may sometimes be necessary to
break schema compatibility in order to improve the language or some feature. For these cases zserio provides
a special language directive zserio_compatibility_version which takes the requested version string as an
argument. The directive must be the first command (except comments) in the source file.
zserio_compatibility_version("2.4.2");
package compatibility;
When the zserio_compatibility_version directive is specified, zserio checks that all used features are
still available and reports an error if the current version of the language is no longer compatible.
Furthermore, the extensions can apply additional compatibility checks (e.g. that the generated code still produces binary compatible BLOBs).
Since
2.5.0
Zserio is based on the work of Godmar Back and was called DataScript at that time.
His work was taken up by the members of the Navigation Data Standard Association (an industry consortium of companies from the digital maps business) and had been developed internally until 2018.
While Back's reference implementation provided a great start, some language extensions were added to better suit the requirements of the NDS members.
As a major addition to the DataScript language, a relational extension was introduced, which permits the definition of hybrid data models, where the high-level access structures are implemented by relational tables and indices, whereas the bulk data are stored in single columns as BLOBs with a format defined in DataScript, hence the name Relational DataScript. Since Relational DataScript was used on top of a SQLite database, some SQLite-specific language elements were also added during that time.
In 2018 the NDS consortium decided to open source the work done since they forked off from Godmar Back's reference implementation.
Since the name DataScript was already used by other projects for different purposes, and since it had never really been a script language anyhow, a new name needed to be found: zserio, an acronym for zero serialization overhead, pronounced with a silent "s".
The original reference implementation from which we derived zserio was using the BSD 3-clause license. This is the reason why all of the work described above is also released under the BSD 3-clause license (see the LICENSE.md file in the root directory of this repo).
Copyright remains with Godmar Back and Navigation Data Standard e.V.