In the current key-value pair IR format, there is only one type of key-value pair. As we plan to extend the format, we have decided to split input key-value pairs into two categories:
Auto-generated key-value pairs: added by logging libraries and serving as metadata of the log event, e.g., the timestamp of the log event.
User-generated key-value pairs: user data specified in the logging statement.
This requires the underlying serialization/deserialization to maintain two key namespaces to differentiate auto-generated keys from user-generated keys. The reason is that the same key may exist in both categories. For example, both can have a key named "timestamp". These two namespaces will be implemented as two individual schema trees inside the serializer/deserializer.
To fully support this feature, we also need to update the serialization/deserialization APIs to receive/return user-generated kv pairs and auto-generated kv pairs as different msgpack objects.
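A minimal sketch of the two-namespace idea, not the actual serializer: each namespace gets its own schema tree, so the same key can be registered in both without colliding. The names `SchemaTree`, `Serializer`, and `register_keys` are hypothetical, and the tree is simplified to flat keys under a root with ID 0 (node types and nesting are omitted).

```python
class SchemaTree:
    """Simplified, hypothetical schema tree: maps (parent_id, key) to a node ID."""

    ROOT_ID = 0  # the root always has ID 0

    def __init__(self):
        self._nodes = {}
        self._next_id = 1

    def get_or_insert(self, parent_id, key):
        """Return the node ID for `key` under `parent_id`, creating it if needed."""
        node = (parent_id, key)
        if node not in self._nodes:
            self._nodes[node] = self._next_id
            self._next_id += 1
        return self._nodes[node]


class Serializer:
    """Hypothetical serializer holding one schema tree per key namespace."""

    def __init__(self):
        self.auto_gen_tree = SchemaTree()
        self.user_gen_tree = SchemaTree()

    def register_keys(self, auto_gen_kv_pairs, user_gen_kv_pairs):
        """Register top-level keys of both inputs; return key -> node ID per namespace."""
        auto_ids = {
            key: self.auto_gen_tree.get_or_insert(SchemaTree.ROOT_ID, key)
            for key in auto_gen_kv_pairs
        }
        user_ids = {
            key: self.user_gen_tree.get_or_insert(SchemaTree.ROOT_ID, key)
            for key in user_gen_kv_pairs
        }
        return auto_ids, user_ids
```

In this sketch, registering "timestamp" in both inputs yields the same numeric node ID in each tree, which is why a serialized node ID alone cannot say which tree it refers to.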
Possible implementation
The tricky part is how we serialize schema tree node IDs. The stream maintains two schema trees: one for the auto-generated keys, and one for the user-generated keys. When encoding a schema tree node ID, we don’t want to create two sets of header bytes for two trees because:
We want to reuse serialization/deserialization logic as much as possible to reduce code duplication;
The implementation of the two trees is the same; we just need a way to tell which tree a node ID refers to.
Therefore, we use signed encoded node IDs to differentiate the two schema trees. The convention is the following:
If the encoded ID i has a non-negative value (>= 0), it belongs to the user-generated key schema tree, and the actual node ID in the tree is i.
If the encoded ID i has a negative value (< 0), it belongs to the auto-generated key schema tree, and the actual node ID in the tree is ~i, where ~ is the bitwise complement operator (i.e., the one's complement of i).
We do not take the absolute value |i| of the negative encoded value because we might need to refer to the root, which has numerical ID 0 and therefore no distinct negative counterpart. One's complement maps 0 to -1, so the root of the auto-generated key schema tree can be referred to by an encoded ID whose bits are all set (for example, 0xFFFF in a 16-bit encoding).
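The convention above can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: the function names `encode_node_id` and `decode_node_id` are hypothetical, and the 16-bit mask at the end is only used to show the bit pattern.

```python
from typing import Tuple


def encode_node_id(node_id: int, is_auto_generated: bool) -> int:
    """Encode a non-negative schema tree node ID as a signed value.

    User-generated IDs are kept as-is; auto-generated IDs are stored
    as their one's complement, which is always negative.
    """
    if node_id < 0:
        raise ValueError("node IDs must be non-negative")
    return ~node_id if is_auto_generated else node_id


def decode_node_id(encoded_id: int) -> Tuple[int, bool]:
    """Return (node_id, is_auto_generated) for a signed encoded ID."""
    if encoded_id < 0:
        return ~encoded_id, True
    return encoded_id, False


# The root (ID 0) of the auto-generated tree encodes to -1, whose
# 16-bit two's complement bit pattern is 0xFFFF.
root_encoded = encode_node_id(0, is_auto_generated=True)
root_bits_16 = root_encoded & 0xFFFF
```

Because `~i` equals `-i - 1`, every non-negative node ID, including 0, maps to a distinct negative encoded value, and applying `~` again recovers the original ID.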