add new trait to encode values to bytes #274

antonilol · 2024-08-17T14:06:35Z

allow choosing return type of BytesEncode

allow any type that holds bytes instead of just Cow<'_, [u8]>
allow any type that holds an error instead of just BoxedError
add hint to use writer instead of returning bytes (zero_copy method),
retuns BoxedError for now
hint to using writers for serde serialization types

Pull Request

Related issues

What does this PR do?

Introduces a new trait that allows more efficient implementations of encoding values to bytes by allowing users to return any AsRef<[u8]> + Into<Vec<u8>> type, possibly borrowed.

TODOs

fix tests
fix remaining deprecation warnings (first I only fixed the errors so the code compiles again)
fix cookbook
FixedSizeBytes: use references or not? values are expected to be small so no reference seems like the best
statically dispatchable error values are still converted to dyn std::error::Error (Make the error returned by the BytesDecode/BytesEncode trait customizable #166 uses a generic Error type, seems like the best solution)
optimize methods like Database::range where byte types are converted to Vec<u8>, possibly unnecessary because the type maybe already is owned (does not depend on 'a), like [u8; N]?
actually use it, in methods like Database::put (see "Example Usage" in Reduce allocations in BytesEncode #257 for an example)

Kerollmops · 2024-08-17T15:00:12Z

Hey @antonilol 👋

Thank you again for your help and PR. I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

Would it be possible to:

Implement a version of the trait from Reduce allocations in BytesEncode #257
Directly remove the previous trait (not deprecate it)

antonilol · 2024-08-17T15:12:09Z

I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

This is certainly possible, what I have in mind right now is a method that by default forwards to to_bytes, but can be optionally implemented if desired.

Would it be possible to:

Implement a version of the trait from Reduce allocations in BytesEncode #257

With ZERO_COPY associated const and size hint function?

Directly remove the previous trait (not deprecate it)

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

Kerollmops · 2024-08-17T17:07:12Z

With ZERO_COPY associated const and size hint function?

No, with the zero_copy method. Not the const one.

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

Yup, would be perfect.

antonilol · 2024-08-17T18:29:44Z

No, with the zero_copy method. Not the const one.

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write ~~(for now because of error handling)~~. I see no zero_copy method in that linked issue.

antonilol · 2024-08-18T18:06:28Z

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

Kerollmops · 2024-08-19T11:24:56Z

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write ~~(for now because of error handling)~~. I see no zero_copy method in that linked issue.

I was talking about the zero_copy method I am talking about in my other comment.

antonilol · 2024-08-20T16:38:29Z

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

I was talking about the zero_copy method I am talking about in my other comment.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

Kerollmops · 2024-08-20T18:25:29Z

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

Can you Box the error for now? We will figure that out later when we have a good trait.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

antonilol · 2024-08-20T21:11:14Z

Can you Box the error for now? We will figure that out later when we have a good trait.

Sure, done

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

Now I get it, thanks for the patience in explaining it :), added the method to the trait

- allow any type that holds bytes instead of just `Cow<'_, [u8]>` - allow any type that holds an error instead of just `BoxedError` - add hint to use writer instead of returning bytes (zero_copy method), retuns BoxedError for now - hint to using writers for serde serialization types

antonilol · 2025-07-14T15:46:00Z

BytesEncode::size_hint should also be needed as lmdb wants to know the size before giving the reserved space (right?).
I assume then that the writer method will only be used when zero_copy outputs false, and size_hint outputs Some(_). The same can be achieved by combining the 2 into one method, that outputs None when bytes_encode does not copies or a size hint can't be computed, and Some(size_hint) when bytes_encode does copies and a size hint can be computed.
This can save some double work needed for certain types that need the same computation for zero_copy and size_hint.

Should I add this, and if yes, how should that function be called? Thinking about writer_size_hint, can also be just size_hint but with clear docs on returning None for cheap-to-encode types, etc.
Instead of Option<usize> some custom type can also be used that is more clear what it does, like

pub enum EncodingHint {
    PreferWriter { size_hint: usize },
    PreferBytesEncode,
}

Let me know what you think about this!

antonilol force-pushed the minimal-alloc-encoding branch from 4c13edb to 98c1031 Compare August 17, 2024 14:07

antonilol force-pushed the minimal-alloc-encoding branch 3 times, most recently from bee195f to a5c96f1 Compare August 18, 2024 17:57

antonilol mentioned this pull request Aug 20, 2024

Make the error returned by the BytesDecode/BytesEncode trait customizable #166

Draft

antonilol force-pushed the minimal-alloc-encoding branch from 4536a2f to 3519c51 Compare August 20, 2024 21:23

antonilol force-pushed the minimal-alloc-encoding branch 3 times, most recently from 216f028 to 4bb0dfe Compare July 13, 2025 18:09

antonilol added 2 commits July 13, 2025 20:12

ignore recently added clippy lint for now

3bdb9ac

antonilol force-pushed the minimal-alloc-encoding branch from 4bb0dfe to 3bdb9ac Compare July 13, 2025 18:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

add new trait to encode values to bytes #274

add new trait to encode values to bytes #274

Uh oh!

antonilol commented Aug 17, 2024 •

edited

Loading

Uh oh!

Kerollmops commented Aug 17, 2024

Uh oh!

antonilol commented Aug 17, 2024

Uh oh!

Kerollmops commented Aug 17, 2024

Uh oh!

antonilol commented Aug 17, 2024 •

edited

Loading

Uh oh!

antonilol commented Aug 18, 2024

Uh oh!

Kerollmops commented Aug 19, 2024

Uh oh!

antonilol commented Aug 20, 2024

Uh oh!

Kerollmops commented Aug 20, 2024

Uh oh!

antonilol commented Aug 20, 2024

Uh oh!

antonilol commented Jul 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

add new trait to encode values to bytes #274

Are you sure you want to change the base?

add new trait to encode values to bytes #274

Uh oh!

Conversation

antonilol commented Aug 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Related issues

What does this PR do?

TODOs

Uh oh!

Kerollmops commented Aug 17, 2024

Uh oh!

antonilol commented Aug 17, 2024

Uh oh!

Kerollmops commented Aug 17, 2024

Uh oh!

antonilol commented Aug 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

antonilol commented Aug 18, 2024

Uh oh!

Kerollmops commented Aug 19, 2024

Uh oh!

antonilol commented Aug 20, 2024

Uh oh!

Kerollmops commented Aug 20, 2024

Uh oh!

antonilol commented Aug 20, 2024

Uh oh!

antonilol commented Jul 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

antonilol commented Aug 17, 2024 •

edited

Loading

antonilol commented Aug 17, 2024 •

edited

Loading