Enforce schema of ProductionRuns DataFrame. #35

daniel-k · 2024-01-24T14:17:06Z

The `ProductionRun` dataclass contains some fields that are themselves dataclasses. When such fields are turned into DataFrame columns, they are flattened so that every field of the sub-object becomes a column in the final DataFrame. Because some of these fields are optional, the field name itself also becomes a column, which in practice is always `None`. This PR removes these useless columns from the DataFrame.

Moreover, if any of the optional dataclass fields is empty for all production runs, its sub-fields do not become part of the DataFrame. This is annoying when building services on top of the SDK, because you cannot assume that a column is present and therefore need checks everywhere. Therefore, we now enforce the schema of the returned DataFrame: irrespective of the underlying data, all nested fields are always present as flattened columns.
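To make the two problems concrete, here is a minimal standard-library sketch. It is not the SDK's code: `Quantity`, `Run`, `flatten`, `observed_columns`, `enforce` and `SCHEMA` are hypothetical names, and plain dicts stand in for DataFrame rows. It shows how data-driven flattening yields a column set that depends on the data, and how projecting every row onto a fixed column list avoids that.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Quantity:  # hypothetical nested dataclass
    unit: str
    value: float


@dataclass
class Run:  # hypothetical stand-in for ProductionRun
    uuid: str
    quantity_total: Optional[Quantity] = None


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys; a None sub-object stays one key."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def observed_columns(runs: list[Run]) -> list[str]:
    """Union of flattened keys in first-seen order (data-driven columns)."""
    seen: dict[str, None] = {}
    for run in runs:
        for key in flatten(asdict(run)):
            seen.setdefault(key)
    return list(seen)


# Fixed column list, written by hand here for illustration.
SCHEMA = ["uuid", "quantity_total.unit", "quantity_total.value"]


def enforce(runs: list[Run], schema: list[str]) -> list[dict[str, Any]]:
    """Project each flattened row onto the schema, filling gaps with None."""
    rows = [flatten(asdict(run)) for run in runs]
    return [{column: row.get(column) for column in schema} for row in rows]


only_empty = [Run("a")]
mixed = [Run("a"), Run("b", Quantity("kg", 1.0))]

print(observed_columns(only_empty))  # sub-fields are missing entirely
print(observed_columns(mixed))       # includes the bare "quantity_total" key
print(enforce(only_empty, SCHEMA))   # same columns regardless of the data
```

With the fixed schema, the bare `quantity_total` key disappears and the sub-field columns exist even when no run carries a quantity.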

denizs · 2024-01-25T08:50:03Z

CI is failing, but I'm wondering whether this is an improvement. With this patch, we are introducing polymorphic responses for one appliance depending on the timeframe being queried. If you wanted to build an application on top of the SDK, the client / application would need to perform response shape validation on every response. We should rather keep the response schema and thus the data frame schema static.

daniel-k · 2024-01-25T13:09:28Z

For reference: we have found this issue:

https://github.com/enlyze/platform-issues/issues/150

which makes clear that we need a stable DataFrame schema. I'll turn the PR into draft mode and will tackle it as well.

github-actions · 2024-01-25T13:12:17Z

Coverage results

Update on 2024-01-31 14:40:51.938011184 +0000

This is the coverage report for commit dd1d066

Name                                                                                Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------------------------------------
.tox/py/lib/python3.12/site-packages/enlyze/__init__.py                                 4      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/api_clients/base.py                        65      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/api_clients/production_runs/client.py      20      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/api_clients/production_runs/models.py      51      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/api_clients/timeseries/client.py           19      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/api_clients/timeseries/models.py           34      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/auth.py                                    13      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/client.py                                  86      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/constants.py                                6      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/errors.py                                   3      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/models.py                                 110      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/schema.py                                  26      0   100%
.tox/py/lib/python3.12/site-packages/enlyze/validators.py                              38      0   100%
-----------------------------------------------------------------------------------------------------------------
TOTAL                                                                                 475      0   100%

4 empty files skipped.

daniel-k · 2024-01-29T15:20:40Z

@denizs Please have a look at a03b686. I haven't updated the tests or polished it; this is just a PoC of how we can enforce the schema of our data frames. It's based on `typing.get_type_hints(MyDataClass)` and assembles a flat schema similar to `pandas.json_normalize()`, but since it uses type information, it doesn't depend on every field permutation being present in the data.
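The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from a03b686: `Quantity`, `Run`, `unwrap_optional` and `flat_schema` are hypothetical names, and the sketch handles only plain fields and (optional) nested dataclasses.

```python
import dataclasses
import types
import typing
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Quantity:  # hypothetical nested dataclass
    unit: str
    value: float


@dataclass
class Run:  # hypothetical stand-in for ProductionRun
    uuid: str
    quantity_total: Optional[Quantity] = None


def unwrap_optional(tp: object) -> object:
    """Return X for Optional[X] (or X | None); return tp unchanged otherwise."""
    origin = typing.get_origin(tp)
    # types.UnionType (the X | None syntax) only exists on Python 3.10+.
    if origin is Union or origin is getattr(types, "UnionType", Union):
        args = [arg for arg in typing.get_args(tp) if arg is not type(None)]
        if len(args) == 1:
            return args[0]
    return tp


def flat_schema(cls: type, path: tuple = ()) -> list:
    """Derive dotted column names from type hints alone, without any data."""
    columns = []
    for name, tp in typing.get_type_hints(cls).items():
        tp = unwrap_optional(tp)
        if dataclasses.is_dataclass(tp):
            # Recurse into nested dataclasses, extending the dotted path.
            columns.extend(flat_schema(tp, path + (name,)))
        else:
            columns.append(".".join(path + (name,)))
    return columns


print(flat_schema(Run))
```

Because the column names come from the type hints, the result is the same whether or not any instance actually carries a `quantity_total`.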

denizs · 2024-01-30T09:05:38Z

> @denizs Please have a look at a03b686. I haven't updated the tests or polished it; this is just a PoC of how we can enforce the schema of our data frames. It's based on `typing.get_type_hints(MyDataClass)` and assembles a flat schema similar to `pandas.json_normalize()`, but since it uses type information, it doesn't depend on every field permutation being present in the data.

Looks good! Just a nit: When I renamed level to path in my head, the code got easier to follow.

daniel-k · 2024-01-30T09:28:58Z

> When I renamed level to path in my head, the code got easier to follow.

Makes sense, will do. I only wanted to get early validation for the idea because it's quite some code / complexity we have to add. I'll polish everything and request review from you once it's done. Thanks :)

clehensen

Looks good! Thanks for picking it up so quickly.

> When I renamed level to path in my head, the code got easier to follow.

Coming from the pandas side of things, level was actually more intuitive for me than path, but both work 👍

src/enlyze/models.py

clehensen

LGTM 🚀

daniel-k mentioned this pull request Jan 24, 2024

Fix PR-comment GitHub Action by using our forked version. #36

Merged

Merge remote-tracking branch 'origin/master' into fix/empty-column-in-df

532b76b

work around stupid windows limitation

5097944

daniel-k marked this pull request as draft January 25, 2024 13:09

PoC stable dataframe schema

a03b686

daniel-k added 5 commits January 31, 2024 11:07

wip old schema code

4628255

simplify and ensure schema

ea1a90a

make flake8 happy

aa18e69

make mypy happy as well

5476537

fix docs

fc46439

daniel-k changed the title ~~Exclude always-NA columns in ProductionRuns DataFrame.~~ Enfore schema of ProductionRuns DataFrame. Jan 31, 2024

daniel-k changed the title ~~Enfore schema of ProductionRuns DataFrame.~~ Enforce schema of ProductionRuns DataFrame. Jan 31, 2024

daniel-k added 2 commits January 31, 2024 14:05

Merge remote-tracking branch 'origin/master' into fix/empty-column-in-df

3191316

remove development leftovers

dd1d066

daniel-k marked this pull request as ready for review January 31, 2024 14:54

daniel-k requested review from denizs and clehensen January 31, 2024 14:54

clehensen reviewed Feb 1, 2024

src/enlyze/models.py (resolved)

daniel-k requested a review from clehensen February 1, 2024 09:28

clehensen approved these changes Feb 1, 2024

daniel-k merged commit 617ba2d into master Feb 1, 2024
9 checks passed

daniel-k deleted the fix/empty-column-in-df branch February 1, 2024 14:01
