Add message streaming support #518

Merged (9 commits) on Aug 29, 2023


JoshuaLeivers
Contributor

Adds support for serializing/deserializing messages and their components to/from streams. Where possible, existing methods now use this functionality internally, minimising code size.
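
A minimal sketch of the stream-first pattern described above. It uses a hypothetical one-field message (`Fixed32Message`, field 1 encoded as fixed32) rather than betterproto's generated classes: `dump`/`load` work on streams, and `__bytes__`/`parse` are thin wrappers over them. The method names mirror the PR, but the code is illustrative only and is not betterproto's implementation.

```python
import io
import struct
from typing import BinaryIO


class Fixed32Message:
    """Hypothetical one-field message: field 1 encoded as fixed32 (wire type 5).

    A sketch of the stream-first pattern only; not betterproto's implementation.
    """

    TAG = (1 << 3) | 5  # field number 1, wire type 5 -> 0x0D

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def dump(self, stream: BinaryIO) -> None:
        # Write directly into the caller's stream; proto3 omits the default value 0.
        if self.value:
            stream.write(bytes([self.TAG]))
            stream.write(struct.pack("<I", self.value))

    def __bytes__(self) -> bytes:
        # The bytes API is a thin wrapper: dump into an in-memory stream.
        stream = io.BytesIO()
        self.dump(stream)
        return stream.getvalue()

    def load(self, stream: BinaryIO) -> "Fixed32Message":
        # A bare protobuf message has no end marker, so read fields until EOF.
        while True:
            tag = stream.read(1)
            if not tag:
                return self
            if tag[0] != self.TAG:
                raise ValueError(f"unexpected tag {tag[0]:#04x}")
            payload = stream.read(4)
            if len(payload) != 4:
                raise EOFError("stream ended inside a fixed32 field")
            (self.value,) = struct.unpack("<I", payload)

    def parse(self, data: bytes) -> "Fixed32Message":
        # The bytes API wraps the buffer in a stream and delegates to load().
        return self.load(io.BytesIO(data))


# Usage: the same message can go to a bytes object or to any binary stream.
msg = Fixed32Message(300)
print(bytes(msg).hex())  # 0d2c010000
print(Fixed32Message().parse(bytes(msg)).value)  # 300
```

Keeping the serialization logic in the stream methods means there is only one code path to maintain; the bytes-based methods add nothing but an `io.BytesIO` wrapper.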

This is useful as a standalone feature, and is also a step towards the functionality requested in #482.

As part of this, the following functions have been created/changed:

  • `betterproto.dump_varint` - encodes a value as a varint and writes it to the provided stream. It does mostly what the existing `betterproto.encode_varint` did, but operates on streams and adds some extra error checking.

  • `betterproto.encode_varint` - this existing function has the same effect as before, but now uses `betterproto.dump_varint` internally, keeping code size and complexity down.

  • `betterproto.size_varint` - calculates the size of the varint for a given value without actually serializing it. It may be useful to callers directly, and similar functions exist in the official C++ implementation and others. It is also used internally by the new functionality, saving the time and memory that serializing a varint just to take its `len(...)` would cost.

  • `betterproto._len_preprocessed_single` - calculates the size of the value that `_preprocess_single` would return, without fully serializing it. Used internally by the new functionality to save the time and memory that serializing the value and then measuring it would cost.

  • `betterproto._len_single` - the same as above, but for `_serialize_single`.

  • `betterproto.load_varint` - reads a varint from a stream and decodes its value. It does mostly what `decode_varint` already did, but operates on streams.

  • `betterproto.decode_varint` - existing function with the same behaviour as before, but now uses `load_varint` internally to keep code size and complexity down.

  • `betterproto.load_fields` - does the same as `parse_fields`, but reads the fields from the provided stream rather than from a `bytes` object. Used internally by `Message.load`.

  • `betterproto.Message.dump` - does what `Message.__bytes__` already did, but writes the result to a stream rather than returning a `bytes` object.

  • `betterproto.Message.__bytes__` - behaves as before, but now uses `Message.dump` internally to reduce code size and complexity.

  • `betterproto.Message.__len__` - returns the size of the encoded message, i.e. the same value as `len(bytes(message))`, but without fully serializing the message, reducing time and memory usage.

  • `betterproto.Message.load` - reads and parses a binary-encoded message from a stream. Similar to `Message.parse`, but reads the data from a stream rather than from a `bytes` object.

  • `betterproto.Message.parse` - behaves as before, but now uses `Message.load` internally to reduce code size and complexity.
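
The varint helpers in the list above can be sketched as follows. These are hypothetical stand-ins named after the PR's functions: the argument order, return values and the exact error checks (here, rejecting values outside the 64-bit range and encoding negatives as 64-bit two's complement, as protobuf does for `int64`) are assumptions and may not match betterproto's actual signatures.

```python
import io
from typing import BinaryIO, Tuple

# Hypothetical stand-ins named after the PR's helpers; the signatures are
# assumptions and may not match betterproto's actual ones.


def dump_varint(value: int, stream: BinaryIO) -> int:
    """Write `value` as a base-128 varint to `stream`; return bytes written."""
    if not -(1 << 63) <= value < (1 << 64):
        raise ValueError(f"{value} does not fit in a 64-bit varint")
    if value < 0:
        value += 1 << 64  # negative ints are sent as 64-bit two's complement
    written = 0
    while value > 0x7F:
        stream.write(bytes([(value & 0x7F) | 0x80]))  # low 7 bits + continuation bit
        value >>= 7
        written += 1
    stream.write(bytes([value]))  # final byte, continuation bit clear
    return written + 1


def encode_varint(value: int) -> bytes:
    # The bytes-based helper is layered on the stream-based one.
    stream = io.BytesIO()
    dump_varint(value, stream)
    return stream.getvalue()


def size_varint(value: int) -> int:
    """Bytes dump_varint would write for `value`, computed without encoding."""
    if value < 0:
        return 10  # any negative value becomes a full 64-bit varint
    return max(1, (value.bit_length() + 6) // 7)  # 7 payload bits per byte


def load_varint(stream: BinaryIO) -> int:
    """Read one varint from `stream` and return its unsigned value."""
    result = shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")


def decode_varint(buffer: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode the varint at `pos`; return (value, position just after it)."""
    stream = io.BytesIO(buffer)
    stream.seek(pos)
    value = load_varint(stream)
    return value, stream.tell()
```

For example, `encode_varint(300)` yields `b"\xac\x02"` and `size_varint(300)` returns `2` without writing anything. This sketch copies the buffer into an `io.BytesIO` on every `decode_varint` call, which keeps it short but is not tuned for speed.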

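
The idea behind `Message.__len__` (summing field sizes instead of serializing) can be sketched for a hypothetical two-field message, `User` with `uint64 user_id = 1; string name = 2;`. This is not betterproto's code; the helpers `size_varint` and `write_varint` here are simplified to non-negative values only.

```python
import io
from dataclasses import dataclass


def size_varint(value: int) -> int:
    # Non-negative values only in this sketch: 7 payload bits per byte.
    return max(1, (value.bit_length() + 6) // 7)


def write_varint(value: int, out: io.BytesIO) -> None:
    # Low 7 bits first, continuation bit set on every byte but the last.
    while value > 0x7F:
        out.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    out.write(bytes([value]))


@dataclass
class User:
    """Hypothetical message: `uint64 user_id = 1; string name = 2;`"""

    user_id: int = 0
    name: str = ""

    def dump(self, out: io.BytesIO) -> None:
        if self.user_id:
            write_varint((1 << 3) | 0, out)  # tag: field 1, wire type 0 (varint)
            write_varint(self.user_id, out)
        if self.name:
            data = self.name.encode("utf-8")
            write_varint((2 << 3) | 2, out)  # tag: field 2, wire type 2 (length-delimited)
            write_varint(len(data), out)
            out.write(data)

    def __bytes__(self) -> bytes:
        out = io.BytesIO()
        self.dump(out)
        return out.getvalue()

    def __len__(self) -> int:
        # Mirrors dump() field by field but only adds up sizes; nothing is written.
        size = 0
        if self.user_id:
            size += size_varint((1 << 3) | 0) + size_varint(self.user_id)
        if self.name:
            n = len(self.name.encode("utf-8"))  # UTF-8 byte length, not character count
            size += size_varint((2 << 3) | 2) + size_varint(n) + n
        return size
```

Here `len(User(150, "hi"))` is 7, matching `len(bytes(User(150, "hi")))`. One side effect worth knowing: because Python falls back to `__len__` for truthiness when `__bool__` is not defined, an all-default `User()` in this sketch is falsy.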
Also adds return type hint.
JoshuaLeivers and others added 2 commits August 21, 2023 11:12
This should improve performance while not significantly impacting readability.

Co-authored-by: James Hilton-Balfe
The change from using Generator to using Iterator for return
type hints was in error due to a misreading of the Python docs.
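
The commit does not say which functions were affected. As general background on the distinction: the `typing` documentation allows a generator that only yields to be annotated as `Iterator[YieldType]`, while `Generator[YieldType, SendType, ReturnType]` is needed to describe the send and return types. The two hypothetical functions below show one case where each form fits.

```python
from typing import Generator, Iterator, Optional


def count_up(limit: int) -> Iterator[int]:
    # Only yields values, so Iterator[int] fully describes it.
    n = 0
    while n < limit:
        yield n
        n += 1


def running_total() -> Generator[int, Optional[int], str]:
    # Accepts values via send() and returns a final value, which only the full
    # Generator[YieldType, SendType, ReturnType] form can express.
    total = 0
    while True:
        received = yield total
        if received is None:  # a plain next() ends the run
            return f"total={total}"
        total += received
```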
`Message.parse(...)` will now accept a `_typeshed.ReadableBuffer`
rather than only a `bytes` object.
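
A sketch of why a `ReadableBuffer` hint fits a parse-via-load design: `io.BytesIO` accepts any object supporting the buffer protocol, so wrapping the input in a stream lets `bytes`, `bytearray` and `memoryview` all work. The `parse` and `load` functions here are hypothetical stand-ins, not betterproto's methods; `_typeshed` is a stub-only module, so it is imported under `TYPE_CHECKING` and referenced as a string annotation.

```python
import io
from typing import TYPE_CHECKING, BinaryIO

if TYPE_CHECKING:
    # _typeshed only exists for type checkers; it cannot be imported at runtime.
    from _typeshed import ReadableBuffer


def load(stream: BinaryIO) -> bytes:
    # Hypothetical stand-in for a stream-based loader: returns the raw payload.
    return stream.read()


def parse(data: "ReadableBuffer") -> bytes:
    # io.BytesIO accepts any object supporting the buffer protocol,
    # so bytes, bytearray, memoryview and array.array all work here.
    return load(io.BytesIO(data))
```

For instance, `parse(memoryview(b"xabc")[1:])` returns `b"abc"`.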
@Gobot1234
Collaborator

Thanks for all the work on this

@JoshuaLeivers
Contributor Author

Hi, was there anything left over on this PR for me to do? Just wanted to check up on how close it is to being merged, or if there's anything preventing it 🙂

Gobot1234 merged commit 8659c51 into danielgtaylor:master on Aug 29, 2023
17 checks passed

2 participants