Enforce schema of ProductionRuns DataFrame. #35
The `ProductionRun` dataclass contains some fields that are themselves dataclasses. When such fields are turned into DataFrame columns, they are flattened so that every field of the sub-object becomes a column in the final DataFrame. Because some of these fields are optional, the field name itself also turns into a column, which in practice is always `None`. This PR removes these useless columns from the DataFrame.
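To make the flattening behaviour described above concrete, here is a minimal sketch in plain Python. The `Run` and `Product` classes and the `flatten` and `columns` helpers are hypothetical stand-ins, not the SDK's actual code, and the naive flattening strategy is an assumption about how the columns come about.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, List, Optional


@dataclass
class Product:  # hypothetical nested dataclass
    name: str
    quantity: int


@dataclass
class Run:  # hypothetical stand-in for ProductionRun
    id: int
    product: Optional[Product] = None


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Naively flatten nested dicts into dotted column names."""
    row: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            # A missing optional sub-object (None) ends up here, so the
            # parent field name itself becomes a column.
            row[name] = value
    return row


def columns(runs: List[Run]) -> List[str]:
    """Union of flattened keys, in order of first appearance."""
    seen: Dict[str, None] = {}
    for run in runs:
        seen.update(dict.fromkeys(flatten(asdict(run))))
    return list(seen)


mixed = columns([Run(1, Product("a", 3)), Run(2)])
empty = columns([Run(2)])
print(mixed)  # includes both the sub-fields and a bare "product" column
print(empty)  # only "id" and "product"; the sub-field columns are missing
```

In this sketch the set of columns depends on the data: the bare `product` column appears as soon as one run lacks the sub-object, and the `product.*` columns vanish when every run lacks it.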
CI is failing, but I'm also wondering whether this is an improvement. With this patch, we are introducing polymorphic responses for a single appliance, depending on the timeframe being queried. If you wanted to build an application on top of the SDK, the client / application would need to validate the response shape on every response. We should rather keep the response schema, and thus the DataFrame schema, static.
For reference: we have found this issue, which makes clear that we need a stable DataFrame schema. I'll turn the PR into draft mode and will tackle that as well.
Coverage results
Update on 2024-01-31 14:40:51.938011184 +0000
This is the coverage report for commit dd1d066
@denizs Please have a look at a03b686. I haven't updated the tests or polished it; this is just a PoC of how we can enforce the schema of our data frames. It's based on
Looks good! Just a nit: When I renamed
Makes sense, will do. I only wanted to get early validation for the idea because it's quite a bit of code / complexity that we have to add. I'll polish everything and request a review from you once it's done. Thanks :)
Looks good! Thanks for picking it up so quickly.
When I renamed `level` to `path` in my head, the code got easier to follow.
Coming from the pandas side of things, `level` was actually more intuitive for me than `path`, but both work 👍
LGTM 🚀
The `ProductionRun` dataclass contains some fields that are themselves dataclasses. When such fields are turned into DataFrame columns, they are flattened so that every field of the sub-object becomes a column in the final DataFrame. Because some of these fields are optional, the field name itself also turns into a column, which in practice is always `None`. This PR removes these useless columns from the DataFrame.
Moreover, if any of the optional dataclass fields is empty for all production runs, its sub-fields do not become part of the DataFrame. This can be annoying when building services on top of the SDK, because you cannot assume that a given column is part of the DataFrame, which requires checks everywhere. Therefore, we now enforce the schema of the returned DataFrame. This means that, irrespective of the underlying data, all nested fields will always be present as flattened columns in the DataFrame.
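One way to picture schema enforcement is to derive the column names from the dataclass definition instead of from the data. The sketch below is illustrative only: `Run`, `Product`, `unwrap_optional`, `schema` and `to_row` are hypothetical names, and this is not necessarily how the PR implements it.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, Optional, Union
from typing import get_args, get_origin, get_type_hints


@dataclass
class Product:  # hypothetical nested dataclass
    name: str
    quantity: int


@dataclass
class Run:  # hypothetical stand-in for ProductionRun
    id: int
    product: Optional[Product] = None


def unwrap_optional(tp: Any) -> Any:
    """Return T for Optional[T]; return any other type unchanged."""
    if get_origin(tp) is Union:
        args = [a for a in get_args(tp) if a is not type(None)]
        if len(args) == 1:
            return args[0]
    return tp


def schema(cls: type, prefix: str = "") -> List[str]:
    """Flattened column names, derived from type hints only."""
    hints = get_type_hints(cls)
    cols: List[str] = []
    for f in fields(cls):
        tp = unwrap_optional(hints[f.name])
        name = f"{prefix}{f.name}"
        if is_dataclass(tp):
            cols.extend(schema(tp, prefix=f"{name}."))
        else:
            cols.append(name)
    return cols


def to_row(obj: Any, cls: type, prefix: str = "") -> Dict[str, Any]:
    """Flatten obj into a row that always has every schema column."""
    hints = get_type_hints(cls)
    row: Dict[str, Any] = {}
    for f in fields(cls):
        tp = unwrap_optional(hints[f.name])
        name = f"{prefix}{f.name}"
        value = getattr(obj, f.name) if obj is not None else None
        if is_dataclass(tp):
            # Recurse even when the sub-object is None, so its
            # sub-fields still appear as (empty) columns.
            row.update(to_row(value, tp, prefix=f"{name}."))
        else:
            row[name] = value
    return row


rows = [to_row(r, Run) for r in [Run(1, Product("a", 3)), Run(2)]]
print(schema(Run))
print(rows)
```

Rows built this way could be passed to `pandas.DataFrame(rows, columns=schema(Run))`, so the column set would no longer depend on which optional fields happen to be populated.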