Skip to content

Conversation

@antonilol
Copy link
Contributor

@antonilol antonilol commented Aug 17, 2024

allow choosing return type of BytesEncode

  • allow any type that holds bytes instead of just Cow<'_, [u8]>
  • allow any type that holds an error instead of just BoxedError
  • add hint to use writer instead of returning bytes (zero_copy method),
    retuns BoxedError for now
  • hint to using writers for serde serialization types

Pull Request

Related issues

#165, #257

What does this PR do?

Introduces a new trait that allows more efficient implementations of encoding values to bytes by allowing users to return any AsRef<[u8]> + Into<Vec<u8>> type, possibly borrowed.

TODOs

  • fix tests
  • fix remaining deprecation warnings (first I only fixed the errors so the code compiles again)
  • fix cookbook
  • FixedSizeBytes: use references or not? values are expected to be small so no reference seems like the best
  • statically dispatchable error values are still converted to dyn std::error::Error (Make the error returned by the BytesDecode/BytesEncode trait customizable #166 uses a generic Error type, seems like the best solution)
  • optimize methods like Database::range where byte types are converted to Vec<u8>, possibly unnecessary because the type maybe already is owned (does not depend on 'a), like [u8; N]?
  • actually use it, in methods like Database::put (see "Example Usage" in Reduce allocations in BytesEncode #257 for an example)

@antonilol antonilol force-pushed the minimal-alloc-encoding branch from 4c13edb to 98c1031 Compare August 17, 2024 14:07
@Kerollmops
Copy link
Member

Hey @antonilol 👋

Thank you again for your help and PR. I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

Would it be possible to:

@antonilol
Copy link
Contributor Author

I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

This is certainly possible, what I have in mind right now is a method that by default forwards to to_bytes, but can be optionally implemented if desired.

Would it be possible to:

With ZERO_COPY associated const and size hint function?

  • Directly remove the previous trait (not deprecate it)

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

@Kerollmops
Copy link
Member

With ZERO_COPY associated const and size hint function?

No, with the zero_copy method. Not the const one.

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

Yup, would be perfect.

@antonilol
Copy link
Contributor Author

antonilol commented Aug 17, 2024

No, with the zero_copy method. Not the const one.

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write (for now because of error handling). I see no zero_copy method in that linked issue.

@antonilol antonilol force-pushed the minimal-alloc-encoding branch 3 times, most recently from bee195f to a5c96f1 Compare August 18, 2024 17:57
@antonilol
Copy link
Contributor Author

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

@Kerollmops
Copy link
Member

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write (for now because of error handling). I see no zero_copy method in that linked issue.

I was talking about the zero_copy method I am talking about in my other comment.

@antonilol
Copy link
Contributor Author

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

I was talking about the zero_copy method I am talking about in my other comment.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

@Kerollmops
Copy link
Member

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

Can you Box the error for now? We will figure that out later when we have a good trait.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

@antonilol
Copy link
Contributor Author

Can you Box the error for now? We will figure that out later when we have a good trait.

Sure, done

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

Now I get it, thanks for the patience in explaining it :), added the method to the trait

@antonilol antonilol force-pushed the minimal-alloc-encoding branch from 4536a2f to 3519c51 Compare August 20, 2024 21:23
@antonilol antonilol force-pushed the minimal-alloc-encoding branch 3 times, most recently from 216f028 to 4bb0dfe Compare July 13, 2025 18:09
- allow any type that holds bytes instead of just `Cow<'_, [u8]>`
- allow any type that holds an error instead of just `BoxedError`
- add hint to use writer instead of returning bytes (zero_copy method),
  retuns BoxedError for now
- hint to using writers for serde serialization types
@antonilol antonilol force-pushed the minimal-alloc-encoding branch from 4bb0dfe to 3bdb9ac Compare July 13, 2025 18:16
@antonilol
Copy link
Contributor Author

BytesEncode::size_hint should also be needed as lmdb wants to know the size before giving the reserved space (right?).
I assume then that the writer method will only be used when zero_copy outputs false, and size_hint outputs Some(_). The same can be achieved by combining the 2 into one method, that outputs None when bytes_encode does not copies or a size hint can't be computed, and Some(size_hint) when bytes_encode does copies and a size hint can be computed.
This can save some double work needed for certain types that need the same computation for zero_copy and size_hint.

Should I add this, and if yes, how should that function be called? Thinking about writer_size_hint, can also be just size_hint but with clear docs on returning None for cheap-to-encode types, etc.
Instead of Option<usize> some custom type can also be used that is more clear what it does, like

pub enum EncodingHint {
    PreferWriter { size_hint: usize },
    PreferBytesEncode,
}

Let me know what you think about this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants