-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ensure all the corrections get tags and add the begining of a rate base asset #3214
Conversation
the correction corrects the calculations of the parent (operating expense) and its child subcomponents. if we were calculating the expense i would want to include the correction but i don't want it if we are just grabbing the reported value
src/pudl/output/ferc1.py
Outdated
# get the factoid name to grab the right part of the table | ||
xbrl_factoid_name = pudl.transform.ferc1.FERC1_TFR_CLASSES[ | ||
"core_ferc1__yearly_operating_expenses_sched320" | ||
]().params.xbrl_factoid_name | ||
# First grab the cash on hand out of the operating expense table. | ||
# then prep it for concating. Calculate cash on hand & add tags | ||
cash_working_capital = ( | ||
core_ferc1__yearly_operating_expenses_sched320[ | ||
core_ferc1__yearly_operating_expenses_sched320[xbrl_factoid_name] | ||
== "operations_and_maintenance_expenses_electric" | ||
] | ||
.assign( | ||
dollar_value=lambda x: x.dollar_value / 8, | ||
xbrl_factoid="cash_on_hand", # newly definied (do we need to add it anywhere?) | ||
tags_rate_base_category="net_working_capital", | ||
tags_aggregatable_utility_type="electric", | ||
table_name="core_ferc1__yearly_operating_expenses_sched320", | ||
) | ||
.drop(columns=[xbrl_factoid_name]) | ||
# the assets/liabilites both use ending_balance for its main $$ column | ||
.rename(columns={"dollar_value": "ending_balance"}) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrea-rmi this is the little process to grab the operations_and_maintenance_expenses_electric
line from the operating expense table and convert it into working capital/cash on hand. Before adding it to the in_rate_base
exploded data. Does this look right to you generally? Do the tags look okay & sufficient? I figured this should be in_service
as plant status or have a plant function. but lmk it's obviously easy to add in there as an additional column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this looks right to me generally!
I'd go with xbrl_factoid as "cash_working_capital" rather than "cash_on_hand".
I don't know if it needs plant status or plant function; my initial thought is those don't need to be added. I thought in_service etc. only applied to utility plant assets, and plant function only applied to plants in service. But I could see an expanded definition of those if you think there should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay great! I'll change it to "cash_working_capital". And I think your right on the status and function - or at least that sounds plausible to me. I just wanted to make sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is in draft form still so you obviously still plan on changing it. Overall this makes sense to me, I think. I might be totally off base w my understanding, so please correct me!
Also! We should write tests for this behavior 🙂
src/pudl/output/ferc1.py
Outdated
@@ -1191,7 +1191,22 @@ def _out_ferc1__explosion_tags(table_dimensions_ferc1) -> pd.DataFrame: | |||
.reset_index() | |||
.drop(columns=["notes"]) | |||
) | |||
return tags_all | |||
# Add the correction records to the tags with the same tags as the parent | |||
idx = list(NodeId._fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we're re-using this list(NodeId._fields)
in a bunch of places, including line 1187, we might as well give it a meaningful name (set_index(idx)
doesn't really tell you more than set_index(adfadfadfa)
).
This could be node_id_fields = list(NodeId._fields)
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes you are so right idx
is a completely meaningless name. this list(NodeId._fields)
is used 13 (!) times throughout this module. In this context it's really just in index columns or primary key columns of the tags. so maybe tag_idx
? in other contexts its used as the pk/idx of a calculation component or to identify a node in the calculation forest. but in the tag space we aren't really working with networks and so caling them node_id's feels a little wrong.
src/pudl/output/ferc1.py
Outdated
return tags_all | ||
# Add the correction records to the tags with the same tags as the parent | ||
idx = list(NodeId._fields) | ||
correction_index = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is "all un-corrected factoids, indexed by the NodeId fields," right? And we want to generate a bunch of tags for the corrected factoids by:
- look at all the tags we have
- inner merge "tags we have" with "all factoids we know about", using the NodeId fields as a join key + dropping all the random
table_dimensions
columns - add "_correction" to each factoid name & concat it onto all of our tags - tada!
Some questions:
- Do we expect there to be tags for factoids that aren't in the table_dimensions? If not, can we get away with taking every tag + making a
_correction
version of it? - Are there factoids we expect to not have a
_correction
partner? - If we still need to filter out the tags we have that don't correspond to something in
table_dimensions
, should we use something more liketable_dimensions[node_id_fields].join(tags_all)
to make the intent of the code clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm these are all excellent questions and i think your suggestion in your first question is correct. IF what i was trying to do here was actually the right thing to do be doing. I think i could replace all of this stuff in here with corrections = tags_all.assign(xbrl_factoid=lambda x: x.xbrl_factoid + "_correction")
BUT I think what we really need to be doing is saying for a given calculated record, do all of the child component records contain those same tags? if so give the correction record for the parent fact those tags.
I think i was starting to attempt that using the table dims but i don't think that's the right method. hm... i could probably do it with the calculation component table but it might be simpler actually using the tree/network methods.
src/pudl/output/ferc1.py
Outdated
== "operations_and_maintenance_expenses_electric" | ||
] | ||
.assign( | ||
dollar_value=lambda x: x.dollar_value / 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could use the vectorized division operator, should we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean just use x.dollar_value.divide(8)
? or do you mean taking this out of the assign altogether and doing cash_working_capital.loc[:,"dollar_value"] = cash_working_capital.dollar_value.divide(8)
?
…lc componets have same tags
Just see if we can get an annotated forest at all right now. TODO: test for tag propagation behavior. Co-authored-by: Christina Gosnell <[email protected]>
hey @jrea-rmi ! do you know if we need to apply any sign changes to any of these input tables before squishing them together in this rate base table? I am mostly asking because the annual sum of both the assets and liabilities are positive. total rate base(i would add a legend but there are almost 200 utilities and so the legend makes the graphs un-readable) Components of Rate base by source tablecode to make the above plotsimport matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from dagster import AssetKey
from pudl.etl import defs if you have a rate base table materialized: out_ferc1__yearly_rate_base = defs.load_asset_value(AssetKey("out_ferc1__yearly_rate_base")) otherwise you can grab this pickled table: annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index='report_year', columns='utility_id_ferc1', values='ending_balance')
non_white_colors = [color for color in mcolors.CSS4_COLORS.keys() if "white" not in color]
new_colors = non_white_colors * 4
new_colors = new_colors[0:len(annual_rb_by_util_wide.columns)]
annual_rb_by_util_wide.plot(kind='bar', stacked=True, color=new_colors)
plt.legend([])
plt.title("Annual Sum of Rate Base by Utility")
plt.show() annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index=["table_name",'report_year'], columns='utility_id_ferc1', values='ending_balance')
for table in annual_rb_by_util.table_name.unique():
annual_rb_by_util_wide.loc[table].plot(kind='bar', stacked=True, color=new_colors)
plt.legend([])
plt.title(f"Annual Sum of Rate Base by Utility from {table}")
plt.show() notes to investigate
|
yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it. I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side. I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base. And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown. |
src/pudl/output/ferc1.py
Outdated
def _get_tag(annotated_forest, node, tag_name): | ||
return annotated_forest.nodes.get(node, {}).get("tags", {}).get(tag_name, pd.NA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was just a lil helper function to get the tag or a null because omigosh as you can see it is a lil complicated because of the layered-ness and option for the node to not exist or the tag to not exist etc. I suppose it could also be:
annotated_forest.nodes.get(node, {"tags", {tag_name: pd.NA}})["tags"][tag_name]
return annotated_forest | ||
|
||
|
||
def check_tag_propagation_compared_to_compiled_tags( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added two check_*
functions here. they are both being called rn in out_ferc1__yearly_rate_base
. really these should be validation tests i believe for both the rate base table and the two exploded tables detailed in the args. but rn these aren't sql tables and word on the street is that we can't pull dagster assets into validation tests so the migration of these checks as validation tests should be post #3310
i could put it into the exploded_table_asset_factory
and just skip it for the income table. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling them in out_ferc1__yearly_rate_base seems fine, though you could try stuffing them in an asset check in the factory. I think either is fine.
@jrea-rmi thanks for these insights!! this is great info.
Okay, grand i'll change that sign. Is that the only input that needs a sign change?
yea i was also a little surprised by that. I will try to investigate to see if there anything weird there and will report back.
ditto on the will investigate and report back!
the operating expense table values are just the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good! Spent a bit of time messing with a non-recursive solution to rootward propagation, which I think is easier to reason about / maintain going into the future. Let me know what you think!
Also, I have one major question just about what our expected behavior is, plus some typo cleanup.
return annotated_forest | ||
|
||
|
||
def check_tag_propagation_compared_to_compiled_tags( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling them in out_ferc1__yearly_rate_base seems fine, though you could try stuffing them in an asset check in the factory. I think either is fine.
Co-authored-by: Dazhong Xia <[email protected]>
I believe so yes, the only sign change when combining tables into rate base is to flip sign of liabilities side of balance sheet. I'll stay tuned for investigation of the liabilities and plant in service values. And okay, that makes sense on inclusion of operating expenses table for cash working capital component of rate base! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! You had mentioned wanting to coerce a tuple to a NodeId
, though the code works now without coercion - I think it's probably worth changing, but either way works for me.
@cmgosnell @jdangerx If this table isn't ready to be published in the DB, maybe we should give it a |
Overview
Closes #3203.
What problem does this address?
we didn't have a single table that included every component of each utility's rate base. Now we do 😎 . in the process i learned that the correction records were not all getting the same tags as the tags for the calculated values. so i made sure all corrections had the same tags of their parent calc record.
What did you change?
oos
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list