Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Serialization of custom types in query params #613

Open
zolstein opened this issue Dec 14, 2024 · 3 comments
Open

[Feature Request] Serialization of custom types in query params #613

zolstein opened this issue Dec 14, 2024 · 3 comments

Comments

@zolstein
Copy link

zolstein commented Dec 14, 2024

Currently, the driver only supports query parameters of the following types:

  • Primitives
  • A handful of struct types which map to specific DB types
  • Pointers to a supported parameter type
  • Slices with elements of a supported parameter type
  • Maps with string keys and values of a supported parameter type
  • Interfaces, so long as the concrete values are a supported parameter type

It would be useful to be able to define a way to serialize a custom type, then pass values of that type in as part of the parameters set and have the serialization happen directly. (Likely, this would most-commonly be serializing custom struct types as maps, though other cases could exist.) Currently, the only alternative is to convert values of the custom types into supported types, then construct a parameter set using only these values. However, this process is relatively slow, memory inefficient, and causes a large number of allocations, especially when doing batch data insert operations.

I propose making the following changes to the API:

  • Move packstream.Packer, or an interface with an appropriate subset of its methods, into the public API.
  • Do one of the following:
    • Define an interface with a single method to serialize a type - if a value conforming to this interface is found during parameter serialization, serialize it using this method.
      • Ex:
      type Neo4jSerializer interface {
          SerializeNeo4j(*packstream.Packer) error
      }
      
    • Provide a way to register a function to serialize values of a particular type - during serialization, if a value is encountered for which a function has been registered, serialize it using that function.
      • Ex:
      type SerializeFunc[T any] func(*packstream.Packer, T)
      func RegisterSerializer[T any](SerializeFunc[T])
      
      • It may make sense for the registration to be global, or to be defined on the driver.

As far as I can tell, neither option should have backward compatibility concerns. The latter would not break anything because unless custom functions are registered it would behave the same. The former would only change the behavior if a conflicting method was already defined on a type in client code, but because the function uses a type that's not currently in the public API this is not possible.

The former option is simpler but more limiting in several ways, whereas the latter has a slightly more complicated API (and likely a slightly more complicated implementation) but is more flexible in the way that serialization could be set up. Either should be implementable with relatively low overhead on top of the existing serialization code - I'm not sure if one would be faster than the other.

One issue that can't be fixed without at least technically making a backward-incompatible change is the fact that the API always expects query parameters to be passed in as a map[string]any. This is awkward, since it means the outermost map cannot be created through custom serialization. However, it's probably not worth trying to change this, at least until the next breaking API version. At least from a performance standpoint the impact is minimal - in most cases, most of the impact will come from lists of sub-maps rather than large outermost maps, and the problem can be worked around relatively easily. Additionally, leaving it as-is sidesteps the problem of dealing with a root parameter type that doesn't serialize to a map.

A tangential consideration is that it could be useful to allow clients to define serialization mappings for structs using tags. However, it would be hard to write efficient code to handle the reflective mapping, and this would only support specific ways of mapping structs, so I would argue that defining fully custom logic is useful regardless.

@zolstein
Copy link
Author

I posted a draft PR with a potential implementation (using the interface option) here.

@robsdedude
Copy link
Member

robsdedude commented Dec 17, 2024

Hi and thank you very much for reaching out and even taking the time to draft a PR. I'm currently quite busy, so I'll leave you with just bullet points of my thoughts. If some of this is too vague or shortened please ask.

  • generally nice idea that has been floated before (in other drivers and within the team)
  • making it abstract enough to also work with HTTP and other protocols we might want to introduce: not sure
  • needs to be able to handle addition of new types in the future (without breaking changes)
  • needs to be able to handle different bolt versions encoding types differently
  • packstream is very low-level and making the API safe and easy to use will be very difficult. Example: user announces dict header size 5, but only sends 4 key-value pairs, or non-string keys, or 9 values (the next value would go into the map as value, even though it wasn't meant to). Similarly with structs and lists.

The last 2 points are my biggest concerns. Making this API public in a way that it's flexible enough for us to change the bolt protocol in ways that can't be foreseen (other drivers don't expose such low-level API and can thus abstract away many things in non-breaking ways when bolt changes) while not making this API the biggest footgun seems very difficult.

@zolstein
Copy link
Author

These are fair points. You make a compelling case that the right API for this should probably be somewhat higher-level than the Packer. (Although, I would hope not too much higher level - ideally it wouldn't introduce any abstraction that would cause too much extra overhead.)

Re: "needs to be able to handle different bolt versions encoding types differently"

I ran into this issue in the prospective PR with time.Time. It was easy enough to solve by moving the "useUtc" config into the packer, though in general that solution might not scale well to too many variations and it does add complexity to the Packer. I can see how it might make sense to keep the Packer simple and build configureable logic into a higher layer. Another possibility is that different types could provide different implementations of the Packer interface (probably with a different name) for different bolt versions - it seemed like a lot of extra overhead for just useUtc, but in general it could be a more robust solution.

Re: "packstream is very low-level and making the API safe and easy to use will be very difficult."

This is also fair. One idea that jumped to mind is the possibility of a packstream-like API, with some of the following additions:

  • The option of passing in an any value and falling back to reflective processing.
  • Using a state-machine to track the status of open maps / lists, so that the serializer can check that the structure is well-formed at the end of processing.

There is a tradeoff, though, between trying to make the API safe and making it as efficient as possible. On some level, it makes sense to expose an API for the fastest possible serialization, even if it is possible to misuse, so long as people aren't expected to use it.

Perhaps another option that could sidestep a lot of this design would be to allow the client to wholesale replace the serialization code using a very general interface, then leave it to them to provide the implementation with the best tradeoff of safety and performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants