ffi: Add support for auto-generated keys in key-value pair IR format. #556

LinZhihao-723 · 2024-10-13T07:42:09Z

Request

In the current key-value pair IR format, we only have one type of key-value pairs. As we planned to extend the current format, we decided to split input key-value pairs into two categories:

Auto-generated key-value pairs: added by logging libraries, served as metadata of the log event, i.e., the timestamp of the log event.
User-generated key-value pairs: user data specified in their logging statement.

This requires the underlying serialization/deserialization to maintain two key namespaces to differentiate auto-generated keys from user-generated keys. The reason is that the same key may exist in both pairs. For example, they both can have a key named “timestamp.” These two namespaces will be implemented as two individual schema trees inside the serializer/deserializer.

To fully support this feature, we also need to update the serialization/deserialization APIs to receive/return user-generated kv pairs and auto-generated kv pairs as different msgpack objects.

Possible implementation

The tricky part is how we serialize schema tree node IDs. The stream maintains two schema trees: one for the auto-generated keys, and one for the user-generated keys. When encoding a schema tree node ID, we don’t want to create two sets of header bytes for two trees because:

We want to reuse serialization/deserialization logic as much as possible to reduce code duplication;
The implementation of two trees is the same, we just need a way to differentiate which tree the node ID refers to.

Therefore, we used signed encoded node IDs to differentiate two schema trees. The convention we use is the following:

If the encoded ID i has a non-negative value (>= 0), it belongs to the user-generated key schema tree, and the actual node ID in the tree is i.
If the encoded ID i has a negative value (< 0), this ID belongs to the auto-generated-key schema tree, and the actual node ID in the tree is ~i, where ~ is the complement operator. This is essentially called one's complement
- We do not take the absolute value |i| of the negative encoded value because we might need to refer to the root, which has a numerical ID 0, before encoding. One's complement allows us to refer to 0 using hex value 0xFFFF

The text was updated successfully, but these errors were encountered:

LinZhihao-723 added the enhancement New feature or request label Oct 13, 2024

LinZhihao-723 self-assigned this Nov 6, 2024

LinZhihao-723 mentioned this issue Nov 30, 2024

feat: Add ClpKeyValuePairStreamHandler to support logging dictionary type log events into CLP key-value pair IR format. y-scope/clp-loglib-py#46

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ffi: Add support for auto-generated keys in key-value pair IR format. #556

ffi: Add support for auto-generated keys in key-value pair IR format. #556

LinZhihao-723 commented Oct 13, 2024

ffi: Add support for auto-generated keys in key-value pair IR format. #556

ffi: Add support for auto-generated keys in key-value pair IR format. #556

Comments

LinZhihao-723 commented Oct 13, 2024

Request

Possible implementation