Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi: Add support for auto-generated keys in key-value pair IR format. #556

Open
LinZhihao-723 opened this issue Oct 13, 2024 · 0 comments
Open
Assignees
Labels
enhancement New feature or request

Comments

@LinZhihao-723
Copy link
Member

Request

In the current key-value pair IR format, we only have one type of key-value pairs. As we planned to extend the current format, we decided to split input key-value pairs into two categories:

  • Auto-generated key-value pairs: added by logging libraries, served as metadata of the log event, i.e., the timestamp of the log event.
  • User-generated key-value pairs: user data specified in their logging statement.

This requires the underlying serialization/deserialization to maintain two key namespaces to differentiate auto-generated keys from user-generated keys. The reason is that the same key may exist in both pairs. For example, they both can have a key named “timestamp.” These two namespaces will be implemented as two individual schema trees inside the serializer/deserializer.

To fully support this feature, we also need to update the serialization/deserialization APIs to receive/return user-generated kv pairs and auto-generated kv pairs as different msgpack objects.

Possible implementation

The tricky part is how we serialize schema tree node IDs. The stream maintains two schema trees: one for the auto-generated keys, and one for the user-generated keys. When encoding a schema tree node ID, we don’t want to create two sets of header bytes for two trees because:

  • We want to reuse serialization/deserialization logic as much as possible to reduce code duplication;
  • The implementation of two trees is the same, we just need a way to differentiate which tree the node ID refers to.

Therefore, we used signed encoded node IDs to differentiate two schema trees. The convention we use is the following:

  • If the encoded ID i has a non-negative value (>= 0), it belongs to the user-generated key schema tree, and the actual node ID in the tree is i.
  • If the encoded ID i has a negative value (< 0), this ID belongs to the auto-generated-key schema tree, and the actual node ID in the tree is ~i, where ~ is the complement operator. This is essentially called one's complement
    • We do not take the absolute value |i| of the negative encoded value because we might need to refer to the root, which has a numerical ID 0, before encoding. One's complement allows us to refer to 0 using hex value 0xFFFF
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant