Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve TableProperties parsing performance #535

Open
zachschuermann opened this issue Nov 25, 2024 · 0 comments
Open

improve TableProperties parsing performance #535

zachschuermann opened this issue Nov 25, 2024 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@zachschuermann
Copy link
Collaborator

Some context moved from the original PR regarding the match statement to parse TableProperties:

I think you're correct and the rust compiler doesn't (at least in 1.73 stable) implement anything fancy here - according to quick chatGPT inquiry

after doing the simple thing, we could optimize by:

  • leverage custom Deserialize from serde for constant-time lookup (/linear-time loop).
    • pros: well-known efficient solution, elegant annotations on the struct for e.g. renaming the field for deserialization
    • cons: Extra overhead from serde considering we have such a constrained case of only ever deserializing from a HashMap<String, String>, code complexity of custom Deserialize
  • leverage HashMap-based lookup instead of the match for constant-time. could even use a crate like phf to get compile-time generation of perfect hash functions for string matching.
    • pros: relatively well-known efficient solution, straightforward/understandable
    • cons: increasing code size/complexity increase (similar to serde approach above)

ref: #453 (comment)

@zachschuermann zachschuermann added enhancement New feature or request good first issue Good for newcomers labels Nov 25, 2024
zachschuermann added a commit that referenced this issue Nov 25, 2024
## What changes are proposed in this pull request?
New `TableProperties` struct parses the `Metadata` action's
`configuration` into a strongly-typed configuration struct. Instead of
previously manually querying specific keys (e.g. checking ColumnMapping
key) we can leverage the configuration to appropriately handle DVs,
column mapping, etc. in a structured manner.

Pragmatically, the changes are:
1. New `TableProperties` struct along with new
`Snapshot::get_table_properties()` API
a. Note that every field is optional and additionally there is an
`unknown_properties` field which includes a hashmap of all fields that
did not parse to a known table property.
2. `table_properties` module including deserialization submodule to
implement functions to deserialize the fields of `TableProperties`

### Future work
- optimize/test/improve interval parsing:
#507
- introduce a facility to query protocol + table properties to unify
checks on e.g. column mapping mode/dv enablement/etc.
#508
- docs #519 
- #535 

## This PR affects the following public APIs
- New `TableProperties` and `Snapshot::get_table_properties` API

## How was this change tested?
new ut

Credit: @roeap for the original PR which this is based on #222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

1 participant