Should lists in SIER be byte-length prefixed or item-length prefixed? #15

shelbyd · 2021-07-13T20:37:42Z

shelbyd
Jul 13, 2021

As we approach adding support for lists in SIER, we need to decide how to prefix lists. The two primary options are byte-length or item-length.

Byte Length

A list would start with the total number of bytes occupied by all of its items. This would let a parser skip over a list without parsing it when the contents are not of interest.
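To make the skipping benefit concrete, here is a minimal sketch. The wire layout is an assumption for illustration only, not SIER's actual format: a 4-byte little-endian byte count, followed by items that each carry a 1-byte length. The function names are hypothetical.

```python
import struct
from typing import List


def encode_byte_prefixed(items: List[bytes]) -> bytes:
    # Hypothetical layout: u32 LE payload size, then each item as
    # a 1-byte length followed by the item's bytes.
    payload = b"".join(bytes([len(item)]) + item for item in items)
    return struct.pack("<I", len(payload)) + payload


def skip_list(buf: bytes, offset: int = 0) -> int:
    # Read only the prefix and jump past the payload without
    # looking at any individual item.
    (nbytes,) = struct.unpack_from("<I", buf, offset)
    return offset + 4 + nbytes


# A list followed by one unrelated trailing byte.
buf = encode_byte_prefixed([b"ab", b"cde"]) + b"\x2a"
after = skip_list(buf)
```

Skipping costs one prefix read regardless of how many items the list holds, which is the property this option is chosen for.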

Item Length

The list would begin with the number of items it contains. This would help if we wanted to know how many items are in a list without actually parsing them.
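A sketch of the trade-off under the same kind of assumed layout (a 4-byte little-endian item count, then items with a 1-byte length each; neither is taken from SIER itself, and the function names are hypothetical). Reading the count is a single prefix read, but finding the end of the list means walking every item.

```python
import struct
from typing import List


def encode_item_prefixed(items: List[bytes]) -> bytes:
    # Hypothetical layout: u32 LE item count, then each item as
    # a 1-byte length followed by the item's bytes.
    body = b"".join(bytes([len(item)]) + item for item in items)
    return struct.pack("<I", len(items)) + body


def item_count(buf: bytes, offset: int = 0) -> int:
    # The count is available from the prefix alone.
    (count,) = struct.unpack_from("<I", buf, offset)
    return count


def end_of_list(buf: bytes, offset: int = 0) -> int:
    # Skipping requires visiting each item's length in turn.
    pos = offset + 4
    for _ in range(item_count(buf, offset)):
        pos += 1 + buf[pos]
    return pos


encoded = encode_item_prefixed([b"ab", b"cde"])
```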

Fixed length items

If we had the constraint that all items in a list must be the same length, we could prefix the list with the number of items and infer the number of bytes from that constraint. This would require variable-length items to be stored as pointers to data that follows the list, so it may not easily allow skipping that data anyway. This is similar to what Cap'n Proto does.
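The inference is simple arithmetic: byte length = item count × item size. A minimal sketch, assuming a hypothetical list of 2-byte unsigned integers with a 4-byte little-endian count prefix. It does not model Cap'n Proto's real encoding or the pointer scheme for variable-length items.

```python
import struct
from typing import List

ITEM_SIZE = 2  # assumed fixed width: every item is a u16


def encode_fixed(values: List[int]) -> bytes:
    # Hypothetical layout: u32 LE item count, then fixed-width items.
    body = b"".join(struct.pack("<H", v) for v in values)
    return struct.pack("<I", len(values)) + body


def byte_length(buf: bytes, offset: int = 0) -> int:
    # The byte length is inferred from the count, not stored.
    (count,) = struct.unpack_from("<I", buf, offset)
    return count * ITEM_SIZE


def read_item(buf: bytes, index: int, offset: int = 0) -> int:
    # Fixed widths also allow jumping straight to the i-th item.
    (value,) = struct.unpack_from("<H", buf, offset + 4 + index * ITEM_SIZE)
    return value


fixed = encode_fixed([10, 20, 30])
```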

Both

We could additionally have both byte-length and item-length as the prefix. This would add extra byte overhead and parse time, but allow for both features. It's unclear how we would handle the two values disagreeing.
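A sketch of how the disagreement can show up, assuming a hypothetical layout of two 4-byte little-endian prefixes (byte length, then item count) and items with a 1-byte length each. Rejecting the input is only one possible policy, picked here for illustration.

```python
import struct
from typing import List


def encode_both(items: List[bytes]) -> bytes:
    # Hypothetical layout: u32 LE byte length, u32 LE item count, items.
    payload = b"".join(bytes([len(item)]) + item for item in items)
    return struct.pack("<II", len(payload), len(items)) + payload


def decode_both(buf: bytes, offset: int = 0) -> List[bytes]:
    nbytes, count = struct.unpack_from("<II", buf, offset)
    pos = offset + 8
    end = pos + nbytes
    items = []
    # Walk the payload using the byte length as the boundary.
    while pos < end:
        n = buf[pos]
        items.append(buf[pos + 1 : pos + 1 + n])
        pos += 1 + n
    # The two prefixes are redundant, so they can contradict each other.
    if pos != end or len(items) != count:
        raise ValueError("byte-length and item-length prefixes disagree")
    return items


good = encode_both([b"ab", b"cde"])
bad = bytearray(good)
bad[4] = 3  # claim three items while the payload still holds two
```

Calling `decode_both(bytes(bad))` raises `ValueError`, while `decode_both(good)` returns the original items.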

Conclusion

After laying these out, I think the best option is byte-length prefixing. It's more likely we'll want to skip over a list when parsing than to know how many items are in the list without actually parsing the items.

Answered by frm

frm · 2021-07-13T20:44:01Z

frm
Jul 13, 2021

I spent some time thinking about this, particularly the point we discussed:

This would add extra byte overhead and parse time

How significant would this be? One option that comes to mind is to ship with this extra overhead and later optimise down to one of the two prefixes. I'm just trying to avoid any premature optimisation until we understand what the most common use case is.

If you feel confident enough that byte-length suits our needs, I'm behind it.

1 reply

shelbyd Jul 16, 2021
Author

The additional byte overhead is very minor, so it is not a strong reason against having both. I'm more concerned about the case where byte-length and item-length disagree. I don't think there's a good way to handle it other than failing to parse.

I'm going with byte-length prefixing.
