As we approach adding support for lists in SIER, we need to decide how to prefix lists. The two primary options are byte-length or item-length.

Byte Length

The list would start with the total number of bytes taken up by all the list items. This would let a parser skip over a list without parsing it when the contents are not of interest.

Item Length

The list would start with the number of items in the list. This would help if we were interested in the number of items in a list without actually parsing it.

Fixed-length items

If we had the constraint that all items in a list needed to be the same length, we could prefix the list with the number of items and infer the number of bytes from the constraint. This would require variable-length items to be pointers to the following data and may not easily allow skipping data anyway. This is similar to what Cap'n Proto does.

Both

We could additionally have both byte-length and item-length as the prefix. This would add extra byte overhead and parse time, but allow for both features. It's unclear how we would handle the two values disagreeing.

Conclusion

After laying these out, I think the best option is byte-length prefixing. It's more likely we'll want to skip over a list when parsing than to know how many items are in the list without actually parsing the items.
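To make the byte-length option concrete, here is a rough Python sketch of how skipping could work. Everything specific in it is an assumption for illustration, not SIER's actual wire format: the 4-byte little-endian prefix, the idea that each item is itself a length-prefixed byte string, and the helper names `encode_list`, `skip_list`, and `decode_list` are all hypothetical.

```python
import struct

# Assumption: a 4-byte little-endian unsigned prefix. The real prefix
# width and encoding for SIER are not decided in this discussion.
PREFIX = struct.Struct("<I")


def encode_item(item: bytes) -> bytes:
    """Hypothetical item encoding: a length prefix followed by raw bytes."""
    return PREFIX.pack(len(item)) + item


def encode_list(items) -> bytes:
    """Prefix the list with the total byte length of its encoded items."""
    body = b"".join(encode_item(item) for item in items)
    return PREFIX.pack(len(body)) + body


def skip_list(buf: bytes, offset: int = 0) -> int:
    """Return the offset just past the list without touching any item."""
    (nbytes,) = PREFIX.unpack_from(buf, offset)
    end = offset + PREFIX.size + nbytes
    if end > len(buf):
        raise ValueError("list length runs past the end of the buffer")
    return end


def decode_list(buf: bytes, offset: int = 0):
    """Parse every item; the item count is only known once parsing is done."""
    (nbytes,) = PREFIX.unpack_from(buf, offset)
    pos = offset + PREFIX.size
    end = pos + nbytes
    if end > len(buf):
        raise ValueError("list length runs past the end of the buffer")
    items = []
    while pos < end:
        (item_len,) = PREFIX.unpack_from(buf, pos)
        pos += PREFIX.size
        if pos + item_len > end:
            raise ValueError("item runs past the end of the list")
        items.append(buf[pos:pos + item_len])
        pos += item_len
    return items, end
```

In this sketch `skip_list` reads one prefix and jumps, regardless of how many items the list holds, while getting the item count requires the full walk in `decode_list`. That is the trade-off the conclusion above accepts.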
Replies: 1 comment 1 reply
I spent some time thinking about this, particularly what we discussed about:
How significant would this be? One thing that comes to mind is to ship with this extra overhead and later optimise to one of those. I'm just trying to avoid any premature optimisation until we understand what the most common use case is.
If you feel confident enough that byte-length suits our needs, I'm behind it.