This document contains a detailed specification of the zserio schema language. It is targeted at developers who write zserio schema definitions.
Zserio is a serialization schema language for modeling binary datatypes, bitstreams or file formats. Based on the zserio language it is possible to automatically generate encoders and decoders for a given schema in various target languages (e.g. Java, C++, Python).
Zserio is similar to other serialization mechanisms such as Google's Protocol Buffers, but it does not use what is called a "wire-format". Zserio therefore gives full control to the developers and comes with no serialization overhead. It is a WYSIWYG serialization mechanism.
Zserio also features an extension for SQLite databases. With that extension it is possible to use SQLite as a backend store for data defined with the zserio language. SQLite tables, columns and BLOBs can all be described in zserio, giving the developer overall control of the data schema used in SQLite databases.
The zserio syntax for literal values is similar to the Java syntax. There are no character literals, only string literals with the usual escape syntax. Integer literals can use decimal, hexadecimal, octal or binary notation.
Type | Value |
---|---|
Boolean | true , false |
Decimal | 100 , 4711 , 255 , -3 , +2 |
Hexadecimal | 0xCAFEBABE , 0Xff , -0xEF |
Octal | 044 , 0377 , -010 |
Binary | 111b , 110b , 001B , -1010b |
Float16 | 3.14f , 31.4e-1f , 314e-2f |
Float32 | 3.14f , 31.4e-1f , 314e-2f |
Float64 | 3.14 , 0.314e+1 , 0.0314e2 |
String | "You" |
Hexadecimal digits and the `x` prefix as well as the `b`, `e` and `f` suffixes are case-insensitive.
Literals can be signed with a `-` or `+` prefix. Signs are not applicable to string literals.
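The integer literal notations above can be illustrated with a small Python sketch. The function `parse_int_literal` is a hypothetical helper written for this overview, not part of the zserio compiler, and it assumes the input is already a well-formed integer literal.

```python
def parse_int_literal(text: str) -> int:
    """Convert a zserio-style integer literal into a Python int (illustrative only)."""
    body = text.strip()
    sign = 1
    # An optional sign prefix applies to the whole literal.
    if body.startswith(("+", "-")):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    lower = body.lower()  # prefixes, suffixes and hex digits are case-insensitive
    if lower.startswith("0x"):
        magnitude = int(lower[2:], 16)   # hexadecimal, e.g. 0xCAFEBABE
    elif lower.endswith("b"):
        magnitude = int(lower[:-1], 2)   # binary, e.g. 111b
    elif len(lower) > 1 and lower.startswith("0"):
        magnitude = int(lower, 8)        # octal, e.g. 044
    else:
        magnitude = int(lower, 10)       # decimal, e.g. 4711
    return sign * magnitude


for literal in ["100", "-3", "0Xff", "-0xEF", "044", "-010", "001B", "-1010b"]:
    print(literal, "=", parse_int_literal(literal))
```

The hexadecimal prefix is checked before the binary suffix so that a literal such as `0xb` is read as hexadecimal.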
Zserio supports the following built-in integer types:
Sign | Types |
---|---|
unsigned | uint8 , uint16 , uint32 , uint64 |
signed | int8 , int16 , int32 , int64 |
These types correspond to unsigned or signed integers represented as sequences of 8, 16, 32 or 64 bits, respectively. Negative values are represented in two's complement, i.e. the hex byte FF is 255 as `uint8` or -1 as `int8`.
The default byte order is big endian. Thus, for multi-byte integers, the most significant byte comes first. Within each byte, the most significant bit comes first.
Example
The byte stream `02 01` (hex) interpreted as `int16` has the decimal value 513. As a bit stream, this looks like `0000 0010 0000 0001`. Bit 0 is `0`, bit 15 is `1`.
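The byte order and two's complement rules can be checked with Python's built-in `int.from_bytes`. This is only a sketch to reproduce the numbers from the text; the variable names are chosen for illustration.

```python
# The two bytes from the example, interpreted as a big-endian signed 16-bit integer.
raw = bytes([0x02, 0x01])
value = int.from_bytes(raw, byteorder="big", signed=True)

# The same bytes as a 16-character bit string; index 0 is the first bit in the stream.
bit_string = format(int.from_bytes(raw, byteorder="big"), "016b")

# The hex byte FF read as an unsigned and as a signed (two's complement) 8-bit integer.
as_uint8 = int.from_bytes(b"\xff", byteorder="big", signed=False)
as_int8 = int.from_bytes(b"\xff", byteorder="big", signed=True)

print(value, bit_string, as_uint8, as_int8)
```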
An unsigned bit field type is denoted by `bit:1`, `bit:2`, ...
The colon must be followed by a positive integer literal, which indicates the length of the type in bits. The length should not exceed 63 bits. An unsigned bit field type corresponds to an unsigned integer of the given length. Thus, `bit:16` and `uint16` are equivalent. The value range of `bit:n` is 0 to 2^n - 1.
Unsigned bit field types of variable length (dynamic unsigned bit field types) can be specified as `bit<expr>`, where `expr` is an expression of integer type that is evaluated at run-time and whose value should not exceed 64 bits.
A signed bit field type is denoted by `int:1`, `int:2`, ...
The colon must be followed by a positive integer literal, which indicates the length of the type in bits. The length should not exceed 64 bits. A signed bit field type corresponds to a signed integer of the given length. Thus, `int:16` and `int16` are equivalent. The value range of `int:n` is -2^(n-1) to 2^(n-1) - 1.
Signed bit field types of variable length (dynamic signed bit field types) can be specified as `int<expr>`, where `expr` is an expression of integer type that is evaluated at run-time and whose value should not exceed 64 bits.
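The value ranges of bit field types follow directly from their length. The Python sketch below computes them; the helper names `unsigned_bitfield_range` and `signed_bitfield_range` are hypothetical and exist only for this illustration.

```python
def unsigned_bitfield_range(num_bits: int) -> tuple:
    """Value range of bit:n, i.e. 0 to 2^n - 1."""
    return 0, (1 << num_bits) - 1


def signed_bitfield_range(num_bits: int) -> tuple:
    """Value range of int:n in two's complement, i.e. -2^(n-1) to 2^(n-1) - 1."""
    half = 1 << (num_bits - 1)
    return -half, half - 1


print(unsigned_bitfield_range(16))  # same range as uint16
print(signed_bitfield_range(16))    # same range as int16
print(signed_bitfield_range(1))     # a one-bit signed field holds only -1 and 0
```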
Floating point types are modeled after the IEEE 754 specification. The following types are supported:
- `float16`: Half-precision floating-point format stored in 16 bits using 1 bit for the sign, 5 bits for the exponent and 10 bits for the significand.
- `float32`: Single-precision floating-point format stored in 32 bits using 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand.
- `float64`: Double-precision floating-point format stored in 64 bits using 1 bit for the sign, 11 bits for the exponent and 52 bits for the significand.
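To make the `float16` bit split concrete, the following Python sketch uses the standard `struct` module (format character `e` is IEEE 754 half precision) to take a value apart into its sign, exponent and significand fields. The function name `float16_fields` is hypothetical, and big-endian packing is assumed here to match the default byte order described above.

```python
import struct


def float16_fields(value: float) -> tuple:
    """Return (sign, exponent, significand) of value encoded as IEEE 754 half precision."""
    (raw,) = struct.unpack(">H", struct.pack(">e", value))
    sign = raw >> 15                # 1 bit
    exponent = (raw >> 10) & 0x1F   # 5 bits, biased by 15
    significand = raw & 0x3FF       # 10 bits
    return sign, exponent, significand


print(float16_fields(1.0))
print(float16_fields(-2.0))
print(float16_fields(1.5))
```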
Variable integer types store integer values, but the number of bytes used depends on the actual value stored. The supported types are `varint16`, `varint32`, `varint64` and `varint` for signed values and `varuint16`, `varuint32`, `varuint64`, `varuint` and `varsize` for unsigned values. These are special integer types that use only the bytes needed to store the value.
The value ranges of the variable integer types are:
Data Type | Value Range | Max Bytes |
---|---|---|
varint16 | -16383 to 16383 | 2 |
varint32 | -268435455 to 268435455 | 4 |
varint64 | -72057594037927935 to 72057594037927935 | 8 |
varint | -9223372036854775808 to 9223372036854775807 | 9 |
varuint16 | 0 to 32767 | 2 |
varuint32 | 0 to 536870911 | 4 |
varuint64 | 0 to 144115188075855871 | 8 |
varuint | 0 to 18446744073709551615 | 9 |
varsize | 0 to 2147483647 | 5 |
Note that `varint` and `varuint` can handle all `int64` and `uint64` values respectively.
The internal layout of the variable integer types is:
Data Type | Byte Layout |
---|---|
varint16 | [byte 1]: 1 bit sign, 1 bit has next byte, 6 bits value<br>[byte 2]: 8 bits value |
varuint16 | [byte 1]: 1 bit has next byte, 7 bits value<br>[byte 2]: 8 bits value |
varint32 | [byte 1]: 1 bit sign, 1 bit has next byte, 6 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 8 bits value |
varuint32 | [byte 1]: 1 bit has next byte, 7 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 8 bits value |
varint64 | [byte 1]: 1 bit sign, 1 bit has next byte, 6 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 1 bit has next byte, 7 bits value<br>[byte 5]: 1 bit has next byte, 7 bits value<br>[byte 6]: 1 bit has next byte, 7 bits value<br>[byte 7]: 1 bit has next byte, 7 bits value<br>[byte 8]: 8 bits value |
varuint64 | [byte 1]: 1 bit has next byte, 7 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 1 bit has next byte, 7 bits value<br>[byte 5]: 1 bit has next byte, 7 bits value<br>[byte 6]: 1 bit has next byte, 7 bits value<br>[byte 7]: 1 bit has next byte, 7 bits value<br>[byte 8]: 8 bits value |
varint | [byte 1]: 1 bit sign, 1 bit has next byte, 6 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 1 bit has next byte, 7 bits value<br>[byte 5]: 1 bit has next byte, 7 bits value<br>[byte 6]: 1 bit has next byte, 7 bits value<br>[byte 7]: 1 bit has next byte, 7 bits value<br>[byte 8]: 1 bit has next byte, 7 bits value<br>[byte 9]: 8 bits value |
varuint | [byte 1]: 1 bit has next byte, 7 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 1 bit has next byte, 7 bits value<br>[byte 5]: 1 bit has next byte, 7 bits value<br>[byte 6]: 1 bit has next byte, 7 bits value<br>[byte 7]: 1 bit has next byte, 7 bits value<br>[byte 8]: 1 bit has next byte, 7 bits value<br>[byte 9]: 8 bits value |
varsize | [byte 1]: 1 bit has next byte, 7 bits value<br>[byte 2]: 1 bit has next byte, 7 bits value<br>[byte 3]: 1 bit has next byte, 7 bits value<br>[byte 4]: 1 bit has next byte, 7 bits value<br>[byte 5]: 8 bits value |
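The unsigned layouts can be sketched in Python as follows. This is an illustration under stated assumptions, not the zserio runtime: the function names `encode_varuint` and `decode_varuint` are hypothetical, and the sketch assumes that the most significant value bits are written first and that the shortest possible byte count is used. The `max_bytes` argument is 2 for `varuint16`, 4 for `varuint32`, 8 for `varuint64` and 9 for `varuint`.

```python
def encode_varuint(value: int, max_bytes: int) -> bytes:
    """Encode an unsigned value: 7 value bits per byte, 8 bits in the final allowed byte."""
    capacity_bits = 7 * (max_bytes - 1) + 8
    if not 0 <= value < (1 << capacity_bits):
        raise ValueError("value out of range for this type")
    # Find the smallest number of bytes whose 7-bit groups can hold the value.
    num_bytes = 1
    while num_bytes < max_bytes and value >= (1 << (7 * num_bytes)):
        num_bytes += 1
    uses_all_bytes = num_bytes == max_bytes
    remaining = capacity_bits if uses_all_bytes else 7 * num_bytes
    out = bytearray()
    for index in range(num_bytes):
        is_last = index == num_bytes - 1
        # Only the last byte of a maximum-length encoding carries 8 value bits.
        width = 8 if (is_last and uses_all_bytes) else 7
        remaining -= width
        chunk = (value >> remaining) & ((1 << width) - 1)
        if not is_last:
            chunk |= 0x80  # "has next byte" flag
        out.append(chunk)
    return bytes(out)


def decode_varuint(data: bytes, max_bytes: int) -> int:
    """Inverse of encode_varuint for the same max_bytes."""
    value = 0
    for index in range(max_bytes):
        byte = data[index]
        if index == max_bytes - 1:
            return (value << 8) | byte      # final byte: 8 value bits, no flag
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                 # flag cleared: this was the last byte
            return value
    return value


print(encode_varuint(300, 4).hex())
print(decode_varuint(encode_varuint(32767, 2), 2))
```

The signed types add a sign bit in the first byte, and `varsize` additionally limits the value to 2147483647 even though its five bytes could hold more; neither case is covered by this sketch.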
In zserio, booleans are denoted by `bool`. A boolean is stored in a single bit. Both `true` and `false` are available as built-in keywords that are stored as a 1 or 0, respectively.
Example
struct TestStructure
{
bool hasValue;
int16 value if hasValue == true;
};
A string type is denoted by `string`. It is represented by a length field (stored as a `varuint64`) followed by a sequence of bytes (8 bits each) in UTF-8 encoding. The string type allows a reader to skip the byte sequence since its length is known upfront.
Example
struct TestStructure
{
string textField;
};
An external type is a zserio built-in type whose format is not known to zserio. It is handled as an arbitrary bit sequence which is passed to the application for further processing. It is denoted by the zserio keyword `extern`. It is represented by a number of bits (stored as a `varuint64`) followed by the bit sequence.
Example
struct StructureWithExternalField
{
bit:3 numberA;
extern blob;
bit:7 numberC;
};
A constant is an immutable named value. The syntax and behavior are similar to C or C++. The syntax is as follows:
const built-in-type NAME = literal;
Example
const bit:1 FALSE = 0;
const bit:1 TRUE = 1;
const int16 i = 1234;
const int32 j = -5678;
An enumeration type has a base type which is an integer type or a bit field type. The members of an enumeration have a name and a value which may be assigned explicitly or implicitly. A member that does not have an initializer gets assigned the value of its predecessor incremented by 1, or the value 0 if it is the first member.
Example
enum bit:3 Color
{
NONE = 000b,
RED = 010b,
BLUE,
BLACK = 111b
};
In the example above, `BLUE` has the value 3. When decoding a member of type `Color`, the decoder will read 3 bits from the stream and report an error when the integer value of these 3 bits is not one of 0, 2, 3 or 7.
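The implicit value assignment rule for enumerations can be sketched in Python. The function `assign_enum_values` is a hypothetical helper for illustration; it takes members as `(name, explicit_value_or_None)` pairs and applies the "predecessor plus one, or 0 for the first member" rule.

```python
def assign_enum_values(members):
    """Return a dict of member name to value, filling in missing initializers."""
    values = {}
    previous = None
    for name, explicit in members:
        if explicit is not None:
            value = explicit        # explicit initializer wins
        elif previous is None:
            value = 0               # first member without initializer
        else:
            value = previous + 1    # predecessor incremented by 1
        values[name] = value
        previous = value
    return values


# The Color enumeration from the example above.
color = assign_enum_values(
    [("NONE", 0b000), ("RED", 0b010), ("BLUE", None), ("BLACK", 0b111)]
)
print(color)
print(sorted(color.values()))  # the only values a decoder accepts for Color
```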
An enumeration type provides its own lexical scope, similar to Java and dissimilar to C++. The member names must be unique within each enumeration type, but may be reused in other contexts with different meanings. Referring to the example, any other enumeration type `Foo` may also contain a member named `NONE`.
In expressions outside of the defining type, enumeration members must always be prefixed by the type name and a dot, e.g. `Color.NONE`.
The integer value of an enumeration can be referenced as `valueof(enumeration)`, see valueof Operator.
A bitmask type has a base type which is an unsigned integer or unsigned bit field type. The members of a bitmask have a name and a value which may be assigned explicitly or implicitly. A member that does not have an initializer gets assigned the value calculated from its predecessor by finding the first unused bit. If the unspecified member is the first one, it will be assigned to 1.
Example
bitmask uint8 Permission
{
EXECUTABLE,
READABLE = 0x02,
WRITABLE
};
In the example above, `EXECUTABLE` is auto-assigned to 1, `READABLE` is manually assigned to 2 and `WRITABLE` is assigned the value 4 by finding the first unused bit.
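One possible reading of the auto-assignment rule for bitmasks is sketched below in Python. The helper `assign_bitmask_values` is hypothetical, and it interprets "first unused bit" as the lowest bit above the highest bit set in the predecessor; the actual compiler may apply a different rule in corner cases, so this should be treated as an assumption.

```python
def assign_bitmask_values(members):
    """Return a dict of member name to value for (name, explicit_value_or_None) pairs."""
    values = {}
    previous = None
    for name, explicit in members:
        if explicit is not None:
            value = explicit
        elif previous is None:
            value = 1                            # first member defaults to 1
        else:
            value = 1 << previous.bit_length()   # first bit above the predecessor
        values[name] = value
        previous = value
    return values


# The Permission bitmask from the example above.
permission = assign_bitmask_values(
    [("EXECUTABLE", None), ("READABLE", 0x02), ("WRITABLE", None)]
)
print(permission)
```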
A bitmask type provides its own lexical scope. The member names must be unique within each bitmask type.
In expressions outside of the defining type, bitmask members must always be prefixed by the type name and a dot, e.g. `Permission.WRITABLE`.
Bitmasks support all basic bit operations:
- `&` bitwise and
- `|` bitwise or
- `^` bitwise xor
- `~` bitwise complement
Example
bitmask bit:2 Availability
{
VERSION_NUMBER,
VERSION_STRING
};
struct Version(Availability availability)
{
uint32 versionNumber if (availability & Availability.VERSION_NUMBER) == Availability.VERSION_NUMBER;
string versionString if (availability & Availability.VERSION_STRING) == Availability.VERSION_STRING;
};
In the example above, the `availability` parameter defines available version formats. `versionNumber` and `versionString` are present only if `availability` contains the `VERSION_NUMBER` and `VERSION_STRING` masks respectively.
Note that a bitmask can contain a member which is manually assigned to 0 and which represents an empty bitmask (e.g. `NONE = 0`). Such a `NONE` mask can be useful in expressions.
The integer value of a bitmask can be referenced as `valueof(Permission.EXECUTABLE)`, see valueof Operator.
A structure type is the concatenation of its members. There is no padding or alignment between members.
Example
struct MyStructure
{
bit:4 a;
uint8 b;
bit:4 c;
};
This type has a total length of 16 bits or 2 bytes. As a bit stream, bits 0-3 correspond to member `a`, bits 4-11 represent an unsigned integer `b`, followed by member `c` in bits 12-15. Note that member `b` crosses a byte boundary when the entire type is byte-aligned. But `MyStructure` may also be embedded into another type where it may not be byte-aligned.
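As a rough illustration of this layout, the following Python sketch packs the three members of `MyStructure` into two bytes by hand. The function `pack_my_structure` is hypothetical and not generated by zserio; it assumes the structure starts at a byte boundary.

```python
def pack_my_structure(a: int, b: int, c: int) -> bytes:
    """Concatenate bit:4 a, uint8 b and bit:4 c without padding (16 bits total)."""
    if not (0 <= a < 16 and 0 <= b < 256 and 0 <= c < 16):
        raise ValueError("member value out of range")
    # a occupies bits 0-3, b bits 4-11 and c bits 12-15 of the bit stream.
    bits = (a << 12) | (b << 4) | c
    return bits.to_bytes(2, byteorder="big")


packed = pack_my_structure(0xA, 0xBC, 0xD)
print(packed.hex())
```

With these inputs the two halves of `b` end up in different bytes, which is the byte-boundary crossing mentioned above.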
A choice type depends on a selector expression following the `on` keyword. Each branch of the choice type is preceded by one or more case labels with a literal value. After evaluating the selector expression, the decoder will directly select the branch labeled with a literal value equal to the selector value.
Example
choice VarCoordXY(uint8 width) on width
{
case 8: CoordXY8 coord8;
case 16: CoordXY16 coord16;
case 24: CoordXY24 coord24;
case 32: CoordXY32 coord32;
};
In the example above, the selector expression refers to a parameter `width` of type `uint8` (see Parameterized Types).
A given branch of a choice may have more than one case label. In this case, the branch is selected when the selector value is equal to any of the case label values. A choice type may have a default branch which is selected when no case label matches the selector value. The decoder will throw an exception when there is no default branch and the selector does not match any case label. Any branch, including the default branch, may be empty, with a terminating semicolon directly following the label. It is good practice to insert a comment in this case. When the selector expression has an enumeration type, the enumeration type prefix may be omitted from the case label literals.
Example
choice AreaAttributes(AreaType type) on type
{
case AreaType.COUNTRY: // prefix "AreaType." is optional
case STATE:
case CITY:
RegionAttributes regionAttr;
case MAP:
/* empty */ ;
case ROAD:
RoadAttributes roadAttr;
default:
DefaultAttributes defaultAttr;
};
A union type corresponds to exactly one of its members, which are also called branches. A union type is an automatic choice type, which handles the selector automatically according to the last used branch (i.e. the last called setter). The selector is stored in the bitstream automatically before the union branch data to allow selection of the proper branch during parsing. The position of the selector in the bitstream is implementation-defined.
When a specific handling of selector is needed (e.g. when the selector is already known in a parent), it might be better to use choice types instead of unions.
Example
union SimpleUnion
{
uint8 value8;
uint16 value16;
};
In this example, the union `SimpleUnion` has two branches `value8` and `value16`. The syntax of a member definition is the same as in structure types.
A constraint may be specified for any member of a compound type. After decoding a member with a constraint, the decoder checks the constraint and reports an error if the constraint is not satisfied.
Example
struct GraphicControlExtension
{
uint8 byteCount : byteCount == 4;
uint8 blockTerminator : blockTerminator == 0;
};
choice ChoiceConstraints(bool selector) on selector
{
case true:
uint8 value8 : value8 != 0;
case false:
uint16 value16 : value16 > 255;
};
Because a constraint is a boolean expression, the following example is valid:
Example
struct TestStructure
{
bool isValueValid;
uint16 value : isValueValid;
};
A structure type may have optional members.
Example
struct ItemCount
{
uint8 count8;
uint16 count16 if count8 == 0xFF;
};
An optional member has an `if` clause with a boolean expression. The member will be decoded only if the expression evaluates to true at run-time.
Because an optional member has an `if` clause with a boolean expression, the following example is valid:
Example
struct TestStructure
{
bool hasValue;
uint16 value if hasValue;
};
An optional member can also be defined without an `if` clause. In this case, the keyword `optional` must be used before the field definition:
Example
struct Container
{
int32 nonOptionalInt;
optional int32 autoOptionalInt;
};
An optional member defined by the keyword `optional` will be decoded only if the member has been set.
A compound type may contain functions:
Example
struct ItemCount
{
uint8 count8;
uint16 count16 if count8 == 0xFF;
function uint16 getValue()
{
return (count8 == 0xFF) ? count16 : count8;
}
};
The return type of a function has to be a standard integer or compound type, and the function parameter list must be empty. The function body may contain nothing but a return statement with an expression matching the return type.
Functions are intended to provide no more than simple expression semantics. There are no plans to add more advanced type conversion or even procedural logic to zserio.
A structure type may contain default values for fields which are not arrays or compound types:
Example
enum uint8 BasicColor
{
BLACK,
WHITE,
RED
};
struct StructureDefaultValues
{
bool boolValue = true;
bit:4 bit4Value = 0x0F if boolValue == true;
int16 int16Value = 0x0BEE;
float16 float16Value = 1.23f;
float32 float32Value = 1.234f;
float64 float64Value = 1.2345;
string stringValue = "string";
BasicColor enumValue = BasicColor.BLACK;
};
The default values are used by the encoder if the value of the corresponding field has not been set. So, there will always be a value written to the stream. This is in contrast to other serialization mechanisms where the decoders would generate the default value. The reason for this is the missing "wire format" in zserio, which would add additional information to the stream for identifying fields and whether they are set or not.
An array type is like a sequence of members of the same type. The element type may be any other type, except an array type. (Two dimensional arrays can be emulated by wrapping the element type in a structure type.)
The length of an array is the number of elements, which may be fixed (i.e. set at compile time) or variable (set at runtime). The elements of an array have indices ranging from 0 to n-1, where n is the array length.
The notation for array types and elements is similar to C:
Example
struct ArrayExample
{
uint8 header[256];
int16 numItems;
Element list[numItems];
};
Field `header` is a fixed-length array of 256 bytes. Field `list` is an array with n elements, where n is the value of `numItems`. Individual array elements may be referenced in expressions with the usual index notation, e.g. `list[2]` is the third element of the `list` array.
An array type may have an implicit length indicated by the `implicit` keyword and an empty pair of brackets. In this case, the decoder will continue matching instances of the element type until the end of the stream is reached. An implicit array must be at the end of the BLOB. It may also be the complete BLOB.
Example
struct ImplicitArray
{
implicit Element list[];
};
The length of the `list` array can be referenced as `lengthof(list)`, see lengthof Operator.
An array type may have an automatic length indicated by an empty pair of brackets. In this case, the encoder will automatically store the array length into the bit stream.
Example
struct AutoArray
{
Element list[];
};
The length of the `list` array can be referenced as `lengthof(list)`, see lengthof Operator.
Auto length arrays might be particularly useful if the length expression of a variable-length array is a single field, as in the following candidate:
Example
struct AutoArrayCandidate
{
uint32 numElements;
Element list[numElements];
};
The `align(n)` modifier can be used to force the decoder to skip 0..n-1 bits so that the bit offset from the beginning of the stream is divisible by `n`. `n` may be any integer literal.
Alignment modifiers may be used in any structure type:
Example
struct AlignmentExample
{
bit:11 a;
align(32):
uint32 b;
};
The size of the `AlignmentExample` type is 64 bits. Without the alignment modifier, the size would be 43 bits.
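The size arithmetic for `AlignmentExample` can be reproduced with a short Python sketch. The helpers `align_padding` and `alignment_example_size` are hypothetical names used only for this illustration, and the structure is assumed to start at bit offset 0.

```python
def align_padding(bit_offset: int, n: int) -> int:
    """Number of bits (0..n-1) to skip so that bit_offset becomes divisible by n."""
    return (-bit_offset) % n


def alignment_example_size(with_align: bool) -> int:
    """Size in bits of AlignmentExample, with or without the align(32) modifier."""
    bit_offset = 11                                 # bit:11 a
    if with_align:
        bit_offset += align_padding(bit_offset, 32)
    bit_offset += 32                                # uint32 b
    return bit_offset


print(align_padding(11, 32))
print(alignment_example_size(with_align=True))
print(alignment_example_size(with_align=False))
```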
If a member with alignment is optional, alignment is optional as well:
Example
struct Example
{
bool hasOptional;
align(32):
int32 myOptionalField if hasOptional == true;
int32 myField;
};
In the above code example, if the optional member is not present (`hasOptional == false`), then no `align(32)` is executed and the size is 33 bits.
The name of a member of integral type may be used as an offset on another member to indicate its byte offset from the beginning of the stream:
Example
struct Tile
{
TileHeader header;
uint32 stringOffset;
uint16 numFeatures;
stringOffset:
StringTable stringTable;
};
In this example, the offset label indicates that the value of `stringOffset` contains the byte offset of member `stringTable` from the beginning of the stream.
Offsets are checked by decoders and encoders automatically. Offsets can be set before encoding automatically.
Since offsets always refer to byte offsets, a given member within a structure type cannot have an offset if it is not guaranteed to be byte-aligned. To overcome this restriction, a byte alignment is inserted automatically:
Example
struct Tile
{
TileHeader header;
uint32 stringOffset;
uint16 numBits;
bit:1 bits[numBits];
stringOffset: // also implies an align(8)
StringTable stringTable;
};
If the member with offset is optional, the offset is optional as well:
Example
struct Example
{
uint32 byteOffset;
bool hasOptional;
byteOffset:
int32 myOptionalField if hasOptional == true;
int32 myField;
};
In the code example above, if the optional member is not present (`hasOptional == false`), then no offset is checked and the size is 65 bits.
When all elements in an array should have offsets, a special notation can be used:
Example
struct IndexedInt32Array
{
uint32 offsets[10];
bit:1 spacer;
offsets[@index]:
int32 data[10];
};
In this example, `@index` denotes the current index of the `data` array. The use of this expression in the `offsets` array indicates that the i-th element of the `offsets` array contains the byte offset of the i-th element of member `data`, calculated from the beginning of the stream.
Since offsets can refer only to byte offsets, each element of an array with indexed offset is automatically byte-aligned:
Example
struct IndexedBit5Array
{
uint32 offsets[2];
bit:1 spacer;
offsets[@index]: // implies align(8) before each data[i]
bit:5 data[2];
};
The size of the `IndexedBit5Array` type will be 64+1+7+5+3+5=85 bits. The size of the `data` array with indexed offsets will be 5+3+5=13 bits.
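The layout of `IndexedBit5Array` can be traced step by step with the Python sketch below. The function `layout_indexed_bit5_array` is hypothetical; it assumes the structure starts at bit offset 0 and returns the byte offsets that would be stored in `offsets` together with the total size in bits.

```python
def layout_indexed_bit5_array():
    """Walk through IndexedBit5Array and return (byte offsets of data[i], size in bits)."""
    bit_pos = 2 * 32          # uint32 offsets[2]
    bit_pos += 1              # bit:1 spacer
    offsets = []
    for _ in range(2):        # bit:5 data[2]
        bit_pos += (-bit_pos) % 8      # implicit align(8) before each element
        offsets.append(bit_pos // 8)   # byte offset stored in offsets[i]
        bit_pos += 5
    return offsets, bit_pos


offsets, total_bits = layout_indexed_bit5_array()
print(offsets, total_bits)
```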
The semantics of expressions and the precedence rules for operators are the same as in Java, except where stated otherwise. Zserio has a number of special operators which will be explained in detail below.
The following Java operators have no counterpart in zserio: `++`, `--`, `>>>` and `instanceof`.
The negation operator `!` is defined for boolean expressions.
For integer expressions, there are `+` (unary plus) and `-` (unary minus).
The bitwise complement `~` is defined for integer and bitmask expressions.
The `lengthof` operator may be applied to an array member and returns the actual length (i.e. the number of elements) of the array. Thus, given `int32 a[5]`, the expression `lengthof(a)` evaluates to 5. This is not particularly useful for fixed or variable length arrays, but it is the only way to refer to the length of an implicit length array.
Example
struct LengthOfOperator
{
implicit uint8 implicitArray[];
function uint32 getLengthOfImplicitArray()
{
return lengthof(implicitArray);
}
};
The `valueof` operator may be applied to an enumeration or bitmask type and returns its actual value as an integer value.
Example
struct ValueOfOperator
{
Color color;
function uint8 getValueOfColor()
{
return valueof(color);
}
};
enum uint8 Color
{
WHITE = 1,
BLACK = 2
};
The `numbits(numValues)` operator is defined for unsigned integers as the minimum number of bits required to encode `numValues` different values. The returned number is of type `uint8`. The `numbits` operator returns 0 if applied to the value 0.
The following table shows the results of the `numbits` operator applied to some common values:
numbits(0) = 0
numbits(1) = 1
numbits(2) = 1
numbits(3) = 2
numbits(4) = 2
numbits(8) = 3
numbits(16) = 4
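The values in the table can be reproduced with a few lines of Python. The function below is a sketch that matches the listed results, not the implementation used by the zserio runtime; it relies on `int.bit_length` and treats 0 and 1 as the special cases shown in the table.

```python
def numbits(num_values: int) -> int:
    """Minimum number of bits needed to encode num_values different values."""
    if num_values < 0:
        raise ValueError("numbits is defined for unsigned integers only")
    if num_values == 0:
        return 0
    if num_values == 1:
        return 1
    # The largest value to encode is num_values - 1.
    return (num_values - 1).bit_length()


for n in (0, 1, 2, 3, 4, 8, 16):
    print(f"numbits({n}) = {numbits(n)}")
```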
Example
struct NumBitsOperator
{
uint8 value8;
function uint8 getNumBits8()
{
return numbits(value8);
}
};
The integer arithmetic operations include `+` (addition), `-` (subtraction), `*` (multiplication), `/` (division), `%` (modulo). In addition, zserio also supports shift operators `<<` and `>>`.
The following relational operators for integer expressions are supported: `==` (equal to), `!=` (not equal to), `<` (less than), `<=` (less than or equal), `>` (greater than), `>=` (greater than or equal).
The equality operators `==` and `!=` may be applied to any type.
The boolean operators `&&` (and) and `||` (or) may be applied to boolean expressions.
The bit operators `&` (bitwise and), `|` (bitwise or), `^` (bitwise exclusive or) may be applied to integer types.
The postfix operators include `[]` (array index), `()` (instantiation with argument list or function call) and `.` (member access).
A conditional expression `booleanExpr ? expr1 : expr2` has the value of `expr1` when `booleanExpr` is true. Otherwise, it has the value of `expr2`.
In the following list, operators are grouped by precedence in descending order. Operators on the first line have the highest precedence and are evaluated first. All operators on the same line have the same precedence and are evaluated left to right, except the ternary operator, which is evaluated right to left.
- `()`, `[]`, `.`
- `lengthof` `valueof` `numbits`
- unary `+` `-` `~` `!`
- `*` `/` `%`
- `+` `-`
- `<<` `>>`
- `<` `>` `<=` `>=`
- `==` `!=`
- `&`
- `^`
- `|`
- `&&`
- `||`
- `?` `:`
The dot operator can be used to access a member of a compound type.
The expression `f.m` is valid if
- `f` is a field of a compound type `C`
- The type `T` of `f` is a compound type
- `T` has a member named `m`
The value of the expression `f.m` can be evaluated at runtime only if the member `f` has been evaluated before.
Example
struct Header
{
uint16 version;
uint16 numSentences;
};
struct Message
{
Header header;
string sentences[header.numSentences];
};
Within the scope of the `Message` type, `header` refers to the field of type `Header`, and `header.numSentences` is a member of that type.
The definition of a compound type may be augmented with a parameter list, similar to a parameter list in a Java method declaration. Each item of the parameter list has a type and a name. Within the body of the compound type definition, parameter names may be used as expressions of the corresponding type.
To use a parameterized type as a field type in another compound type, the parameterized type must be instantiated with an argument list matching the types of the parameter list.
Example
struct Header
{
uint32 version;
uint16 numItems;
};
struct Message
{
Header header;
Item(header) items[header.numItems];
};
struct Item(Header header)
{
uint16 param;
uint32 ExtraParam if header.version >= 10;
};
When the element type of an array is parameterized, a special notation can be used to pass different arguments to each element of the array:
Example
struct Database
{
uint16 numBlocks;
BlockHeader headers[numBlocks];
Block(headers[@index]) blocks[numBlocks];
};
struct BlockHeader
{
uint16 numItems;
uint32 offset;
};
struct Block(BlockHeader header)
{
header.offset:
int64 items[header.numItems];
};
The `@index` denotes the current index of the `blocks` array. The use of this expression in the argument list for the `Block` reference indicates that the i-th element of the `blocks` array is of type `Block` instantiated with the i-th header `headers[i]`.
A subtype definition defines a new name for a given type. This is rather like a `typedef` in C:
Example
subtype uint16 BlockIndex;
struct Block
{
BlockIndex blockIndex;
uint32 data;
};
Zserio supports the standard comment syntax of Java or C++. Single line comments start with `//` and extend to the end of the line. A comment starting with `/*` is terminated by the next occurrence of `*/`, which may or may not be on the same line.
Example
// This is a single-line comment.
/* This is an example
of a multi-line comment
spanning three lines. */
To support inline documentation within a zserio module, multi-line comments starting with `/**` are treated as special documentation comments. The idea and syntax are borrowed from Java(doc). A documentation comment is associated with the following type or field definition. The documentation comment and the corresponding definition may only be separated by whitespace.
Example
/**
* Traffic flow on roads.
*/
enum bit:2 Direction
{
/** No traffic flow allowed. */
NONE,
/** Traffic allowed from start to end node. */
POSITIVE,
/** Traffic allowed from end to start node. */
NEGATIVE,
/** Traffic allowed in both directions. */
BOTH
};
The documentation comments can contain special tags, as shown by the following example:
/**
* The tile contains a number of different elements
* grouped by feature classes, e.g. intersections, roads and
* so on...
*
* The presence of these members is indicated by the content
* mask in the header (please have a look at the
* following @see "documentation" headerDefinition page).
*
* @see headerDefinition
*
* @param level level number
* @param width width for the current tile
*
* @todo Update this comment.
*
* @deprecated
*/
The content of a documentation comment, excluding its delimiters, is parsed line by line. Each line is stripped of leading whitespace, a sequence of asterisks (`*`), and more whitespace, if present. After stripping, a comment is composed of one or more paragraphs, followed by zero or more tag blocks. Paragraphs are separated by blank lines. The text in paragraphs can contain HTML formatting tags like `<ul>`, `<li>` or `</br>` directly.
A line starting with whitespace and a keyword preceded by an at-sign (`@`) is the beginning of a tag.
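The parsing steps described above can be sketched in Python. The function `parse_doc_comment` is a hypothetical, simplified illustration and not the parser used by the zserio compiler; in particular it treats every non-tag line after the first tag as a continuation of that tag, and it leaves embedded `@see` references in paragraph text untouched.

```python
import re

# A tag block starts with an at-sign and a keyword at the beginning of a stripped line.
_TAG = re.compile(r"@(\w+)\s*(.*)")


def parse_doc_comment(comment: str):
    """Split a documentation comment into (paragraphs, tags)."""
    body = comment.strip()
    if body.startswith("/**"):
        body = body[3:]
    if body.endswith("*/"):
        body = body[:-2]
    paragraphs, tags, current = [], [], []
    for raw_line in body.splitlines():
        # Strip leading whitespace, a run of asterisks, and more whitespace.
        line = re.sub(r"^\s*\**\s*", "", raw_line).rstrip()
        match = _TAG.match(line)
        if match or not line:
            if current:  # a tag or a blank line closes the open paragraph
                paragraphs.append(" ".join(current))
                current = []
            if match:
                tags.append([match.group(1), match.group(2)])
        elif tags:
            tags[-1][1] = (tags[-1][1] + " " + line).strip()  # tag continuation
        else:
            current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs, [tuple(tag) for tag in tags]


example = """/**
 * The text of the comment.
 *
 * A second paragraph
 * spanning two lines.
 *
 * @param level level number
 * @todo Update this comment.
 * @deprecated
 */"""
paragraphs, tags = parse_doc_comment(example)
print(paragraphs)
print(tags)
```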
The following sections describe all supported tags in the documentation comments in detail.
The `see` tag defines a link in the generated documentation and has the following format:
@see "TEXT_ALIAS" TYPE.FIELD
`TEXT_ALIAS` is the text which will be shown in the generated documentation instead of the reference `TYPE.FIELD`. This alias text is optional and can be omitted.
`TYPE.FIELD` must be a valid reference to a field of a zserio type. The `FIELD` part is optional and can be omitted.
The `see` tag is the only tag which does not have to be defined at the beginning of the line and can be embedded directly in the comment text.
Example
/**
* Please see @see "black color" ColorEnumerationType.BLACK definition for more description.
*/
The `param` tag is used for documenting the arguments of a parameterized type. This tag has the following format:
@param PARAM_NAME PARAM_DESCRIPTION
`PARAM_NAME` defines the parameter name.
`PARAM_DESCRIPTION` contains the parameter description. This description can be defined on multiple lines.
Example
/**
* This type takes two arguments.
*
* @param arg1 The first argument.
* @param arg2 The second argument.
*/
struct ParamType(Foo arg1, Bar arg2)
{
...
};
The `todo` tag is used for documenting an action which should be done in the future. This tag has the following format:
@todo ACTION_DESCRIPTION
`ACTION_DESCRIPTION` contains arbitrary text. This text can be defined on multiple lines.
Example
/**
* The text of the comment.
*
* @todo Don't forget to update this comment!
*/
The `deprecated` tag marks the documented zserio type as deprecated, which means that this type is going to be invalid in future versions of the schema. It has the following format:
@deprecated
Complex zserio specifications should be split into multiple packages stored in separate source files. Every user-defined type belongs to a unique package. For backward compatibility, there is an unnamed default package used for files without an explicit package declaration. It is strongly recommended to use a package declaration in each zserio source file.
A package provides a lexical scope for types. Type names must be unique within a package, but a given type name may be defined in more than one package. If a type named `Coordinate` is defined in package `com.acme.foo`, the type can be globally identified by its fully qualified name `com.acme.foo.Coordinate`, which is obtained by prefixing the type name with the name of the defining package, joined by a dot. Another package `com.acme.bar` may also define a type named `Coordinate`, having the fully qualified name `com.acme.bar.Coordinate`.
By default, types from other packages are not visible in the current package, unless they are imported explicitly. The package and import syntax and semantics follow the Java example.
Example
package map;
import common.geometry.*;
import common.featuretypes.*;
Import declarations only take effect when there is a reference to a type or symbol not defined in the current package. If package map defines its own Coordinate type, any reference to that name within package map is resolved to the local type map.Coordinate, even when one or more of the imported packages also define a type named Coordinate.
On the other hand, if package map
references a Coordinate
type but does not define it, the import
declarations are used to resolve that type in one of the imported packages. In that case, the referenced type
must be matched by exactly one of the imported packages. It is obviously a semantic error if the type name is
defined in none of the packages. It is also an error if the type name is defined in two or more of the imported
packages. The order of the import declarations does not matter.
It is always possible to use the fully qualified name of a type, e.g. com.acme.bar.Coordinate
. This makes it
possible to import a type with the same name (e.g. Coordinate
) from more than one package or to import a type
with the same name as a type defined locally.
Individual types or symbols can be imported using their fully qualified name:
import common.geometry.Geometry;
This single import has precedence over any wildcard import. It prevents an ambiguity with
common.featuretypes.Geometry
. It is possible to import the same type from different packages but then
each usage of such type must be fully qualified. Using the unqualified name in this situation results in
a compilation error as the type is ambiguous.
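The resolution rules above can be summarized in a short sketch. The following Python model is only an illustration and not the zserio compiler's implementation; the function name resolve_type, its arguments and the dictionary layout are assumptions made for this example.

```python
def resolve_type(name, current_pkg, packages, wildcard_imports=(), single_imports=()):
    """Return the fully qualified name that an unqualified type name refers to.

    packages:         maps a package name to the set of type names it defines
    wildcard_imports: packages imported as 'import pkg.*;'
    single_imports:   fully qualified names imported as 'import pkg.Type;'
    """
    # 1. A type defined in the current package always wins.
    if name in packages.get(current_pkg, set()):
        return f"{current_pkg}.{name}"

    # 2. Single imports take precedence over wildcard imports.
    singles = sorted({fq for fq in single_imports if fq.rsplit(".", 1)[1] == name})
    if len(singles) == 1:
        return singles[0]
    if len(singles) > 1:
        raise LookupError(f"'{name}' is ambiguous: {singles}")

    # 3. Wildcard imports: exactly one imported package must define the type.
    matches = sorted({f"{pkg}.{name}" for pkg in wildcard_imports
                      if name in packages.get(pkg, set())})
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"'{name}' is not defined in any visible package")
    raise LookupError(f"'{name}' is ambiguous: {matches}")


# Hypothetical package contents, loosely following the 'map' example.
packages = {
    "map": {"Route"},
    "common.geometry": {"Geometry", "Coordinate"},
    "common.featuretypes": {"Geometry"},
}
wildcards = ["common.geometry", "common.featuretypes"]

print(resolve_type("Geometry", "map", packages, wildcards,
                   single_imports=["common.geometry.Geometry"]))
# prints: common.geometry.Geometry
```

Without the single import, the same lookup of Geometry raises an error in this model, because both wildcard-imported packages define that name.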
Package and file names are closely related. Each package must be located in a separate file. The above example
declares a package map
stored in a source file map.zs
. The import declarations direct the parser to locate
and parse source files common/geometry.zs
and common/featuretypes.zs
.
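The relation between package names and file names can be sketched as a simple string transformation. The helper names below are hypothetical, and the sketch assumes that the last component of an import is either the wildcard * or a type name, so that the remaining components name the package.

```python
def package_to_source_file(package_name, extension=".zs"):
    """Map a package name such as 'common.geometry' to 'common/geometry.zs'."""
    return "/".join(package_name.split(".")) + extension


def import_to_source_file(import_target, extension=".zs"):
    """Map an import such as 'common.geometry.*' to the source file to parse."""
    # Drop the last component ('*' or a type name) to obtain the package name.
    package_name, _, _ = import_target.rpartition(".")
    if not package_name:
        raise ValueError(f"'{import_target}' does not name a package")
    return package_to_source_file(package_name, extension)


print(package_to_source_file("map"))                  # prints: map.zs
print(import_to_source_file("common.geometry.*"))     # prints: common/geometry.zs
print(import_to_source_file("common.featuretypes.*")) # prints: common/featuretypes.zs
```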
Imported files may again contain import declarations. Cyclic import relations between packages are supported but should be avoided. The zserio parser takes care to parse each source file just once.
Although the basic zserio language features provide a strong tool for binary data modeling, there are situations where generic programming can substantially reduce duplication in a schema.
The cornerstone of generic programming in zserio is the template, which allows zserio compound types (structure types, choice types, union types) to operate on generic types. Such generic types are called template parameters; they are specified later, during template instantiation, as template arguments.
Because zserio must check the correctness of all template instantiations, and because zserio should support generators for almost any programming language (even languages that do not support templates), template instantiations must be resolved during zserio compilation.
Zserio templates were inspired by the C++ class template concept and borrow their syntax from Java generics.
A compound type template is defined by a normal compound type declaration together with template parameters enclosed in angle brackets <>:
Example
struct Field<T>
{
T value;
};
This example defines a structure type template with one template parameter called T. The template parameter can be used as a generic type anywhere within the lexical scope of the structure. This generic type is specified during template instantiation as follows:
Example
struct StructTemplatedField
{
Field<uint32> uint32Field;
};
This example instantiates the template Field
using argument uint32
. From the zserio language point of view,
the previous structure type template example is actually the same as the following "no-generics-programming"
example:
Example
struct Field_uint32
{
uint32 value;
};
struct StructTemplatedField
{
Field_uint32 uint32Field;
};
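The equivalence between a template instantiation and a plain structure can be illustrated by a small textual expansion. This is only a sketch, not how the zserio compiler works internally; the function instantiate and its data layout are assumptions, and the naming scheme (template name and arguments joined by underscores) is generalized from the Field_uint32 example above.

```python
def instantiate(template_name, params, fields, args):
    """Expand a structure template into a plain structure definition.

    params: template parameter names, e.g. ["T"]
    fields: list of (type_name, field_name) pairs of the template
    args:   template arguments, e.g. ["uint32"]
    """
    if len(params) != len(args):
        raise ValueError("number of template arguments does not match the parameters")

    # Bind every template parameter to its argument.
    binding = dict(zip(params, args))
    name = "_".join([template_name, *args])

    lines = [f"struct {name}", "{"]
    for type_name, field_name in fields:
        # Replace template parameters; leave all other types untouched.
        lines.append(f"    {binding.get(type_name, type_name)} {field_name};")
    lines.append("};")
    return name, "\n".join(lines)


name, definition = instantiate("Field", ["T"], [("T", "value")], ["uint32"])
print(name)  # prints: Field_uint32
print(definition)
```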
Templates of parameterized types are supported with a very similar syntax. The parameters of a parameterized type can be template instantiations as well. The following example shows the syntax for parameterized type templates:
Example
struct ParamHolder<T>
{
T param;
};
struct Parameterized<T>(ParamHolder<T> paramHolder)
{
string description;
paramHolder.param:
uint32 id;
};
struct StructTemplatedTypeArgument
{
ParamHolder<uint32> paramHolder;
Parameterized<uint32>(paramHolder) parameterized;
};
Templated template arguments are supported as well. In other words, template arguments can themselves be template instantiations. The following example shows the syntax for templated template arguments:
Example
struct Field<T>
{
T value;
};
struct Compound<T>
{
T value;
};
struct StructTemplatedTemplateArgument
{
Field<Compound<uint32>> compoundField;
};
Subtypes can be used for aliasing template types similarly to normal zserio types. The following example shows the usage:
Example
struct TestStructure<T>
{
T value;
};
subtype TestStructure<uint32> TestStructureSubtype;
The zserio template instantiations are generated by default in the package where the corresponding template is declared. This ensures that only one template instantiation is generated if it is used multiple times from different packages. This default behavior is necessary because template instantiations can be used as parameters across different packages.
However, there might be situations where forcing the package of a template instantiation would be beneficial. For that purpose, the keyword instantiate has been added to the zserio language.
Example
package template_declaration;
struct TestStructure<T>
{
T value;
};
package template_instantiation;
import template_declaration.*;
instantiate TestStructure<uint32> TestStructure32;
The instantiate command from the previous example means that:
- the TestStructure<uint32> instantiation will be generated in the package template_instantiation instead of the template_declaration package,
- the TestStructure<uint32> instantiation will be named TestStructure32, and
- a new zserio type TestStructure32 will be defined in the package template_instantiation.
Instantiate types can be imported like any other types. Thus, all imported instantiate types are visible in the package.
It is forbidden to have more than one instantiate command (specified or imported) for the same template instantiation in the same package.
However, it is allowed to do so in different packages (which are not imported). In this case, the same template will be instantiated as a different type in the other package. Because of that, the instantiate command should be considered very carefully and used only if it is really necessary.
Binary data streams defined by zserio are also good candidates to be used in RPC systems. Zserio introduces generic services directly in the language. A service type contains definitions of service methods.
Example
struct UserId
{
uint32 id;
};
struct User
{
uint32 id;
string name;
string surname;
string phoneNumber;
};
service Users
{
User getUser(UserId);
};
A service method must have a single response and single request type. When no response or request type is needed, an empty structure can be used. Currently only simple unary calls are supported by zserio.
The types must be non-parameterized compound types. Parameterized types are not allowed since the parameters are not stored in the bit stream. However, parameterized types can still be used in the subtree of the response or request types.
Aside from services, zserio also supports the publish-subscribe messaging pattern. The pubsub type defines a Pub/Sub client.
Example
pubsub WeatherProvider
{
publish topic("weather/warnings") WeatherWarning warnings;
};
pubsub WeatherClient
{
subscribe topic("weather/warnings") WeatherWarning weatherWarnings;
};
struct WeatherWarning
{
string warningMessage;
};
A pubsub type defines messages which can be published, subscribed, or both. The example above defines a WeatherProvider pubsub type with a single message warnings, which is published under the topic named "weather/warnings" and whose type is the WeatherWarning structure. It then defines a WeatherClient pubsub type with a single subscription weatherWarnings for messages published under the topic "weather/warnings". The client expects that messages arriving at the weatherWarnings subscription (i.e. published under the defined topic) are of the type WeatherWarning.
A pubsub type can define a message in three ways:
- topic("topic/definition") Type message to both publish and subscribe to a message,
- publish topic("topic/definition") Type message to publish a message,
- subscribe topic("topic/definition") Type message to subscribe to a message.
In the Pub/Sub pattern, it is common to use wildcards for topic definitions in subscriptions. The wildcard format depends on the particular implementation. Zserio only provides a generic definition of Pub/Sub clients and does not manipulate the topic definition string. It therefore depends on the particular Pub/Sub backend whether and how wildcards are supported. See the MQTT standard as an example of a concrete Pub/Sub pattern specification.
The message type must be a non-parameterized compound type. Parameterized types are not allowed since the parameters are not stored in the bit stream. However, parameterized types can still be used in the type's subtree.
With its basic language features presented in the previous sections, zserio provides a rich language for modeling binary data streams, which are intended to be parsed sequentially. Direct access to members in the stream is usually not possible, except for specifying the offset of a given member. Navigation between semantically related members at different positions in the stream cannot be expressed at the stream level. Member insertions or updates are not supported.
All in all, the stream model is not an adequate approach for updatable databases in the gigabyte size range with lots of internal cross-references where fast access to individual members is required. In a desktop or server environment, it would be a natural approach to model such a database as a relational database using SQL. However, in an embedded environment with limited storage space and processing resources, a full-fledged relational schema is too heavy-weight. To have the best of both worlds, i.e. compact storage on the one hand and direct member access including updates on the other hand, one can adopt a hybrid data model: In this hybrid model, the high-level access structures are strictly relational, but most of the low-level data are stored in binary large objects (BLOBs), where the internal structure of each BLOB is modeled with zserio.
For example, we can model a digital map database as a collection of tiles resulting from a rectangular grid where the tiles are numbered row-wise. The database has a rather trivial schema:
CREATE TABLE europe (tileNum INTEGER PRIMARY KEY, tile BLOB NOT NULL);
Accessing or updating any given tile can simply be delegated to the relational DBMS, in case of this zserio extension, SQLite.
Assuming that the tile BLOBs have a reasonable size, each tile can be decoded on the fly to access the individual members within the tile. For seamless modelling of this hybrid approach, we decided to add relational extensions for SQLite to zserio. Some SQL concepts have been translated to zserio, others are transparent to zserio and can be embedded as literal strings to be passed to the SQLite engine.
An SQL table type is a special case of a compound type, where the members of the type correspond to the columns of a relational table.
In zserio it is possible to express the above example as follows:
Example
sql_table GeoMap
{
int32 tileId sql "PRIMARY KEY NOT NULL";
Tile tile sql "NOT NULL";
};
GeoMap europe;
GeoMap america;
It is important to note that the GeoMap
is a table type and not a table. A table is defined by the instance
europe
of type GeoMap
. Table types have no direct equivalent in SQL. They can be used to create tables
with identical structure and column names. Each instance of an sql_table
type in zserio translates to an
SQLite SQL table where the table name in the SQL schema is equal to the instance name in zserio. A member
definition may include an SQL constraint introduced by the keyword sql
, followed by a literal string which is
then passed to the SQLite engine.
Thus, the zserio instance america
results in the following SQL table:
CREATE TABLE america (tileId INTEGER PRIMARY KEY NOT NULL, tile BLOB NOT NULL);
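The idea of a table type with several instances can be tried out directly with Python's built-in sqlite3 module. This is a hand-written sketch, not code generated by zserio: the helper create_geo_map_table is hypothetical, and the column names tileId and tile follow the zserio definition of GeoMap with the types taken from the SQL Types Mapping.

```python
import sqlite3

# Column definitions shared by every instance of the GeoMap table type.
GEO_MAP_COLUMNS = "tileId INTEGER PRIMARY KEY NOT NULL, tile BLOB NOT NULL"


def create_geo_map_table(conn, instance_name):
    """Create one SQLite table per instance of the table type."""
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not instance_name.isidentifier():
        raise ValueError(f"invalid table name: {instance_name!r}")
    conn.execute(f"CREATE TABLE {instance_name} ({GEO_MAP_COLUMNS})")


conn = sqlite3.connect(":memory:")
for instance in ("europe", "america"):
    create_geo_map_table(conn, instance)

# The tile column holds the serialized bytes; these bytes are placeholders.
conn.execute("INSERT INTO europe (tileId, tile) VALUES (?, ?)", (1, b"\x01\x02"))

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # prints: ['america', 'europe']
```

Both tables share the same structure and column names and differ only in the table name, which mirrors the relation between a zserio table type and its instances.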
It is also possible to use the zserio keyword sql
directly inside the table definition. The main use for this
syntax is to define a primary key spanning multiple fields.
Example
sql_table BusinessLocationTable
{
BusinessId businessId sql "NOT NULL";
CategoryId catId sql "NOT NULL";
Position position sql "UNIQUE";
int8 hasIcon;
sql "PRIMARY KEY(businessId, catId)";
};
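The effect of a table-level primary key spanning multiple fields can be observed in SQLite itself. The sketch below uses Python's sqlite3 module with several assumptions that are not part of the example above: the instance name businessLocations, and the SQL column types (BusinessId and CategoryId are taken to be integer types, Position a compound type stored as a BLOB).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE businessLocations (
        businessId INTEGER NOT NULL,
        catId      INTEGER NOT NULL,
        position   BLOB UNIQUE,
        hasIcon    INTEGER,
        PRIMARY KEY(businessId, catId)
    )
""")


def try_insert(business_id, cat_id):
    """Insert a row and report whether SQLite accepted it."""
    try:
        conn.execute(
            "INSERT INTO businessLocations (businessId, catId) VALUES (?, ?)",
            (business_id, cat_id))
        return True
    except sqlite3.IntegrityError:
        # Raised when the (businessId, catId) pair already exists.
        return False


results = [try_insert(1, 1), try_insert(1, 2), try_insert(1, 1)]
print(results)  # prints: [True, True, False]
```

Only the combination of businessId and catId has to be unique; either column on its own may repeat.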
SQL table types can be templated using the same syntax as other zserio compound types, see templates.
For the mapping of zserio types to SQL types, refer to SQL Types Mapping.
Virtual tables in zserio are an extension to the sql_table
. The following paragraph gives a definition
of a virtual table from the SQLite website:
A virtual table is an interface to an external storage or computation engine that appears to be a table but does not actually store information in the database file. In general, you can do anything with a virtual table that can be done with an ordinary table, except that you cannot create indices or triggers on a virtual table. Some virtual table implementations might impose additional restrictions. For example, many virtual tables are read-only.
The syntax is modeled as an extension to the sql_table
, where an optional module can be specified:
sql_table <tablename> using <modulename>
The following example creates a virtual table using SQLite's FTS5 module:
sql_table Pages using fts5
{
string title;
string body;
};
The following example creates a virtual table using the RTREE module:
sql_table TestTable using rtree
{
int32 id;
int32 minX;
int32 maxX;
int32 minY;
int32 maxY;
};
When virtual tables are used in zserio, some columns are defined automatically by the module specified after the using keyword. The keyword sql_virtual allows such generated columns to be added to the generated HTML documentation, so that it is possible to look up the type of a column or the features it references. There will be no code generated for these columns.
The table generation code will not contain these columns. This keyword is not limited to virtual tables, but it makes the most sense to use it there.
The syntax is modeled as an extension to the sql_table
column definition:
sql_virtual <type name> <column name>;
The following example creates a virtual table using the SQLite FTS5 module. The content column is omitted from the table creation statement, but it can still be read and written.
Example
sql_table Pages using fts5
{
sql_virtual string content;
};
When an sql_table member is an instance of a parameterized type, the application may want to derive the parameter values from the context (e.g. other table columns), which is not available to the zserio decoder. In this case, the type arguments shall be marked with the keyword explicit to indicate that these values will be set explicitly by the application. Otherwise, the decoder would complain about not being able to evaluate the type arguments.
Example
struct Tile(uint8 level, uint8 width)
{
...
};
sql_table TileTable
{
uint32 tileId;
uint32 version;
Tile(explicit level, explicit width) tile;
};
To support the WITHOUT ROWID
optimization in SQLite, the sql_without_rowid
keyword is used in zserio.
A sql_without_rowid
keyword is always a part of the sql_table
type:
Example
sql_table WithoutRowIdTable
{
string word sql "PRIMARY KEY NOT NULL";
uint32 count;
sql_without_rowid;
};
The sql_without_rowid keyword must be defined after all fields and SQL constraints inside the SQL table.
When the sql_without_rowid keyword is specified in an sql_table type, the corresponding SQLite table is created without the special "rowid" column. This may bring space and performance advantages.
Creating a SQL table using the WITHOUT ROWID
optimization without specifying the primary key is considered
a compilation error.
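The primary key requirement mirrors the behavior of SQLite itself, which can be checked with Python's sqlite3 module. This is a sketch of the underlying SQLite behavior, not of the zserio compiler; the table names withoutRowIdTable and broken and the helper fails are chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE withoutRowIdTable '
    '(word TEXT PRIMARY KEY NOT NULL, "count" INTEGER) WITHOUT ROWID')


def fails(sql):
    """Return True if SQLite rejects the given statement."""
    try:
        conn.execute(sql)
    except sqlite3.OperationalError:
        return True
    return False


# The special rowid column does not exist in a WITHOUT ROWID table.
no_rowid = fails("SELECT rowid FROM withoutRowIdTable")

# SQLite refuses to create a WITHOUT ROWID table that has no primary key.
missing_pk = fails('CREATE TABLE broken (word TEXT, "count" INTEGER) WITHOUT ROWID')

print(no_rowid, missing_pk)  # prints: True True
```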
Since an SQL table is always contained in an SQL database, we introduce a sql_database
type in zserio to
model databases. sql_table
instances may only be created as members of an sql_database
.
Example
sql_table GeoMap
{
// see above
};
sql_database TheWorld
{
GeoMap europe;
GeoMap america;
...
..
};
Zserio type | SQLite type |
---|---|
uint8, uint16, uint32, uint64 | INTEGER |
int8, int16, int32, int64 | INTEGER |
bit:n (n < 64) | INTEGER |
int:n (n <= 64) | INTEGER |
float16, float32, float64 | REAL |
varuint16, varuint32, varuint64, varuint | INTEGER |
varint16, varint32, varint64, varint | INTEGER |
bool | INTEGER |
string | TEXT |
enum | INTEGER |
bitmask | INTEGER |
struct | BLOB |
choice | BLOB |
union | BLOB |
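The mapping table can be expressed as a small lookup function. The sketch below is an illustration only: the function sqlite_type is hypothetical, and user-defined types are passed by their kind (enum, bitmask, struct, choice, union) because the table maps kinds rather than individual type names.

```python
import re

INTEGER_TYPES = {
    "uint8", "uint16", "uint32", "uint64",
    "int8", "int16", "int32", "int64",
    "varuint16", "varuint32", "varuint64", "varuint",
    "varint16", "varint32", "varint64", "varint",
    "bool", "enum", "bitmask",
}
REAL_TYPES = {"float16", "float32", "float64"}
BLOB_TYPES = {"struct", "choice", "union"}


def sqlite_type(zserio_type):
    """Return the SQLite column type for a zserio type according to the table."""
    if zserio_type in INTEGER_TYPES:
        return "INTEGER"
    if zserio_type in REAL_TYPES:
        return "REAL"
    if zserio_type == "string":
        return "TEXT"
    if zserio_type in BLOB_TYPES:
        return "BLOB"

    # Bit field types: bit:n is listed for n < 64, int:n for n <= 64.
    match = re.fullmatch(r"(bit|int):(\d+)", zserio_type)
    if match:
        limit = 63 if match.group(1) == "bit" else 64
        if 1 <= int(match.group(2)) <= limit:
            return "INTEGER"

    raise ValueError(f"no SQLite mapping listed for {zserio_type!r}")


print(sqlite_type("int32"), sqlite_type("string"), sqlite_type("struct"))
# prints: INTEGER TEXT BLOB
```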
Zserio is based on the work of Godmar Back and was called DataScript at that time.
His work was taken up by the members of the Navigation Data Standard Association (an industry consortium of companies from the digital maps business) and had been developed internally until 2018.
While Back's reference implementation provided a great start, some language extensions were added to better suit the requirements of the NDS members.
As a major addition to the DataScript language, a relational extension was introduced, which permits the definition of hybrid data models, where the high-level access structures are implemented by relational tables and indices, whereas the bulk data are stored in single columns as BLOBs with a format defined in DataScript; the result was hence named Relational DataScript. Since Relational DataScript was used on top of an SQLite database, some SQLite-specific language elements were also added during that time.
By 2018 the NDS consortium decided to open source the work done since they forked off from Godmar Back's reference implementation.
Since the name DataScript was already used by other projects for different purposes, and since it has never really been a script language anyway, a new name needed to be found: zserio, an acronym for zero serialization overhead, pronounced with a silent "s".
The original reference implementation from which we derived zserio was using the BSD 3-clause license. This is the reason why all of the work described above is also released under BSD-3 license (see LICENSE.md file in root directory of this repo).
Copyright remains with Godmar Back and Navigation Data Standard e.V.