Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate generating models as regular dart classes and not extension types #136

Open
jakemac53 opened this issue Nov 7, 2024 · 7 comments

Comments

@jakemac53
Copy link
Contributor

With the new query invalidation approach, the vast majority of objects are actually never sent over the wire. Yet, we are paying a large cost during creation to make them "free" to serialize and deserialize. All accesses to all fields are also relatively slow compared to just accessing regular fields on dart objects.

I suspect that we actually should abandon the extension types and just generate real classes, so that we can generate hash codes much more cheaply. We can still use the binary JSON format though.

A side benefit of this is it would likely make it easier for us to control a centralized implementation of the query and filtering logic. We can make these types implementable via lazy getters, and only read the fields that were asked for in the query during serialization.

cc @davidmorgan

@davidmorgan
Copy link
Contributor

Hmmm I've actually been thinking the reverse, that extension types might be a win for performance in other places too.

This is mostly due to pkgs/dart_model/benchmark which appears to show that it's faster to use a JsonBufferBuilder than SDK maps even if you are converting to text in the end

       SdkMapsJsonWireBenchmark: 1622ms, 7177227 bytes
   BuilderMapsJsonWireBenchmark: 1345ms, 7177227 bytes

and that it's faster to iterate over the JsonBufferBuilder maps than SDK maps, too

       ProcessSdkMapsJsonWireBenchmark:  417ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark:  339ms, hash 23186292

Worth noting though that in both cases it's faster still to use the data directly without storing it anywhere in between

           LazyMapsJsonWireBenchmark: 1014ms, 7177227 bytes
  ProcessLazyMapsBufferWireBenchmark:  274ms, hash 23186292

--the equivalent for computing hash codes would be that we have a mode for building a Model which doesn't store anything, it just hashes. We could then rerun and actually keep the model data only if the hash changed.

An entirely different avenue for exploration is the overlap between models, it may be that we can arrange that duplicate pieces are only built and hashed once: something like, we build one host-side model with hashes and pull parts out of it to answer queries. If we do that we can easily just compute the hash first before going back to also grab the data if needed.

I could be wrong about any/all this, good thing it's measurable :)

@jakemac53
Copy link
Contributor Author

jakemac53 commented Nov 8, 2024

faster to use a JsonBufferBuilder than SDK maps even if you are converting to text in the end

I am not suggesting using SDK maps. I am suggesting using real Dart classes with fields, and separate logic to serialize/deserialize them to whatever format. Essentially, what the current macro implementation does.

@davidmorgan
Copy link
Contributor

Yes, it's not directly comparable.

Real classes might be worth trying.

But neither extension types nor real classes can be as fast as computing the hash code without allocating anything, or re-using hash codes calculated earlier. So it might not be the first thing to try, depending on how much work they all are :)

@jakemac53
Copy link
Contributor Author

jakemac53 commented Nov 8, 2024

So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?

@jakemac53
Copy link
Contributor Author

jakemac53 commented Nov 8, 2024

and that it's faster to iterate over the JsonBufferBuilder maps than SDK maps, too

Fwiw I noticed this is using entries to iterate which is cheating a bit so it might be misleading. For example if we merge two models together, that ends up doing a lookup by key for all entries in both maps.

@davidmorgan
Copy link
Contributor

So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?

Something like that, yes.

We could also generate an alternative version of all the model that stores nothing, and have a copy+paste of all the processing code that imports that instead :)

Or we could structure the model creation code in a pluggable way as discussed previously w.r.t. ways to create the data with only properties that have been queried for.

Fwiw I noticed this is using entries to iterate which is cheating a bit so it might be misleading. For example if we merge two models together, that ends up doing a lookup by key for all entries in both maps.

Yes, it's a "best case" benchmark, not necessarily realistic :)

@jakemac53
Copy link
Contributor Author

jakemac53 commented Nov 8, 2024

Yes, it's a "best case" benchmark, not necessarily realistic :)

Note that entries for SDK maps is also the slowest way to iterate them. Just for fun I edited the benchmark to iterate the keys and look them up, which shows just how big the cliff can be here:

ProcessSdkMapsJsonWireBenchmark:  355ms, hash 23186292
ProcessLazyMapsBufferWireBenchmark: 36195ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark: 57194ms, hash 23186292

I believe this benchmark is also an extreme worst case, with many entries, but still 🤣 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants