Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure all the corrections get tags and add the begining of a rate base asset #3214

Merged
merged 27 commits into from
Feb 2, 2024

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jan 4, 2024

Overview

Closes #3203.

What problem does this address?
we didn't have a single table that included every component of each utility's rate base. Now we do 😎 . in the process i learned that the correction records were not all getting the same tags as the tags for the calculated values. so i made sure all corrections had the same tags of their parent calc record.

What did you change?

  • ensured the correction records had tags
  • added a new rate base table

oos

  • make this new table a db output (going to wait on this until there is some more validation done)

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Base automatically changed from dev to main January 5, 2024 04:14
the correction corrects the calculations of the parent
(operating expense) and its child subcomponents. if we were calculating
the expense i would want to include the correction but i don't want it if
we are just grabbing the reported value
Comment on lines 2821 to 2842
# get the factoid name to grab the right part of the table
xbrl_factoid_name = pudl.transform.ferc1.FERC1_TFR_CLASSES[
"core_ferc1__yearly_operating_expenses_sched320"
]().params.xbrl_factoid_name
# First grab the cash on hand out of the operating expense table.
# then prep it for concating. Calculate cash on hand & add tags
cash_working_capital = (
core_ferc1__yearly_operating_expenses_sched320[
core_ferc1__yearly_operating_expenses_sched320[xbrl_factoid_name]
== "operations_and_maintenance_expenses_electric"
]
.assign(
dollar_value=lambda x: x.dollar_value / 8,
xbrl_factoid="cash_on_hand", # newly definied (do we need to add it anywhere?)
tags_rate_base_category="net_working_capital",
tags_aggregatable_utility_type="electric",
table_name="core_ferc1__yearly_operating_expenses_sched320",
)
.drop(columns=[xbrl_factoid_name])
# the assets/liabilites both use ending_balance for its main $$ column
.rename(columns={"dollar_value": "ending_balance"})
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrea-rmi this is the little process to grab the operations_and_maintenance_expenses_electric line from the operating expense table and convert it into working capital/cash on hand. Before adding it to the in_rate_base exploded data. Does this look right to you generally? Do the tags look okay & sufficient? I figured this should be in_service as plant status or have a plant function. but lmk it's obviously easy to add in there as an additional column.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this looks right to me generally!

I'd go with xbrl_factoid as "cash_working_capital" rather than "cash_on_hand".

I don't know if it needs plant status or plant function; my initial thought is those don't need to be added. I thought in_service etc. only applied to utility plant assets, and plant function only applied to plants in service. But I could see an expanded definition of those if you think there should be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay great! I'll change it to "cash_working_capital". And I think your right on the status and function - or at least that sounds plausible to me. I just wanted to make sure!

Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is in draft form still so you obviously still plan on changing it. Overall this makes sense to me, I think. I might be totally off base w my understanding, so please correct me!

Also! We should write tests for this behavior 🙂

@@ -1191,7 +1191,22 @@ def _out_ferc1__explosion_tags(table_dimensions_ferc1) -> pd.DataFrame:
.reset_index()
.drop(columns=["notes"])
)
return tags_all
# Add the correction records to the tags with the same tags as the parent
idx = list(NodeId._fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're re-using this list(NodeId._fields) in a bunch of places, including line 1187, we might as well give it a meaningful name (set_index(idx) doesn't really tell you more than set_index(adfadfadfa)).

This could be node_id_fields = list(NodeId._fields) maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes you are so right idx is a completely meaningless name. this list(NodeId._fields) is used 13 (!) times throughout this module. In this context it's really just in index columns or primary key columns of the tags. so maybe tag_idx? in other contexts its used as the pk/idx of a calculation component or to identify a node in the calculation forest. but in the tag space we aren't really working with networks and so caling them node_id's feels a little wrong.

return tags_all
# Add the correction records to the tags with the same tags as the parent
idx = list(NodeId._fields)
correction_index = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "all un-corrected factoids, indexed by the NodeId fields," right? And we want to generate a bunch of tags for the corrected factoids by:

  1. look at all the tags we have
  2. inner merge "tags we have" with "all factoids we know about", using the NodeId fields as a join key + dropping all the random table_dimensions columns
  3. add "_correction" to each factoid name & concat it onto all of our tags - tada!

Some questions:

  1. Do we expect there to be tags for factoids that aren't in the table_dimensions? If not, can we get away with taking every tag + making a _correction version of it?
  2. Are there factoids we expect to not have a _correction partner?
  3. If we still need to filter out the tags we have that don't correspond to something in table_dimensions, should we use something more like table_dimensions[node_id_fields].join(tags_all) to make the intent of the code clearer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm these are all excellent questions and i think your suggestion in your first question is correct. IF what i was trying to do here was actually the right thing to do be doing. I think i could replace all of this stuff in here with corrections = tags_all.assign(xbrl_factoid=lambda x: x.xbrl_factoid + "_correction")

BUT I think what we really need to be doing is saying for a given calculated record, do all of the child component records contain those same tags? if so give the correction record for the parent fact those tags.

I think i was starting to attempt that using the table dims but i don't think that's the right method. hm... i could probably do it with the calculation component table but it might be simpler actually using the tree/network methods.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
== "operations_and_maintenance_expenses_electric"
]
.assign(
dollar_value=lambda x: x.dollar_value / 8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could use the vectorized division operator, should we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean just use x.dollar_value.divide(8)? or do you mean taking this out of the assign altogether and doing cash_working_capital.loc[:,"dollar_value"] = cash_working_capital.dollar_value.divide(8)?

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@cmgosnell
Copy link
Member Author

cmgosnell commented Jan 26, 2024

hey @jrea-rmi ! do you know if we need to apply any sign changes to any of these input tables before squishing them together in this rate base table? I am mostly asking because the annual sum of both the assets and liabilities are positive.

total rate base

(i would add a legend but there are almost 200 utilities and so the legend makes the graphs un-readable)
image

Components of Rate base by source table

image
image
image
image
image
image

code to make the above plots

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from dagster import AssetKey
from pudl.etl import defs

if you have a rate base table materialized:

out_ferc1__yearly_rate_base = defs.load_asset_value(AssetKey("out_ferc1__yearly_rate_base"))

otherwise you can grab this pickled table:
out_ferc1__yearly_rate_base.pkl.zip

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index='report_year', columns='utility_id_ferc1', values='ending_balance')

non_white_colors = [color for color in mcolors.CSS4_COLORS.keys() if "white" not in color]
new_colors = non_white_colors * 4
new_colors = new_colors[0:len(annual_rb_by_util_wide.columns)]


annual_rb_by_util_wide.plot(kind='bar', stacked=True, color=new_colors)
plt.legend([])
plt.title("Annual Sum of Rate Base by Utility")
plt.show()
annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index=["table_name",'report_year'], columns='utility_id_ferc1', values='ending_balance')
for table in annual_rb_by_util.table_name.unique():
    annual_rb_by_util_wide.loc[table].plot(kind='bar', stacked=True, color=new_colors)
    plt.legend([])
    plt.title(f"Annual Sum of Rate Base by Utility from {table}")
    plt.show()

notes to investigate

- [x] bunch of nothing in assets and liabilities from 2005-2020 #3300
- [ ] why basically nothing in `core_ferc1__yearly_depreciation_by_function_sched219`in 2021??
- [ ] why is 2004 so weird looking?

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@jrea-rmi
Copy link
Collaborator

hey @jrea-rmi ! do you know if we need to apply any sign changes to any of these input tables before squishing them together in this rate base table? I am mostly asking because the annual sum of both the assets and liabilities are positive.

total rate base

(i would add a legend but there are almost 200 utilities and so the legend makes the graphs un-readable) image

Components of Rate base by source table

image image image image image image

code to make the above plots

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from dagster import AssetKey
from pudl.etl import defs

if you have a rate base table materialized:

out_ferc1__yearly_rate_base = defs.load_asset_value(AssetKey("out_ferc1__yearly_rate_base"))

otherwise you can grab this pickled table: out_ferc1__yearly_rate_base.pkl.zip

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1"],as_index=False)[["ending_balance"]].sum()

non_white_colors = [color for color in mcolors.CSS4_COLORS.keys() if "white" not in color]
new_colors = non_white_colors * 4
new_colors = new_colors[0:len(annual_rb_by_util_wide.columns)]

annual_rb_by_util_wide = annual_rb_by_util.pivot(index='report_year', columns='utility_id_ferc1', values='ending_balance')
annual_rb_by_util_wide.plot(kind='bar', stacked=True, color=new_colors)
plt.legend([])
plt.title("Annual Sum of Rate Base by Utility")
plt.show()
annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index=["table_name",'report_year'], columns='utility_id_ferc1', values='ending_balance')
for table in annual_rb_by_util.table_name.unique():
    annual_rb_by_util_wide.loc[table].plot(kind='bar', stacked=True, color=new_colors)
    plt.legend([])
    plt.title(f"Annual Sum of Rate Base by Utility from {table}")
    plt.show()

notes to investigate

- [x] bunch of nothing in assets and liabilities from 2005-2020 #3300
- [ ] why basically nothing in `core_ferc1__yearly_depreciation_by_function_sched219`in 2021??
- [ ] why is 2004 so weird looking?

yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it.

I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side.

I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base.

And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
Comment on lines 2814 to 2815
def _get_tag(annotated_forest, node, tag_name):
return annotated_forest.nodes.get(node, {}).get("tags", {}).get(tag_name, pd.NA)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just a lil helper function to get the tag or a null because omigosh as you can see it is a lil complicated because of the layered-ness and option for the node to not exist or the tag to not exist etc. I suppose it could also be:

annotated_forest.nodes.get(node, {"tags", {tag_name: pd.NA}})["tags"][tag_name]

return annotated_forest


def check_tag_propagation_compared_to_compiled_tags(
Copy link
Member Author

@cmgosnell cmgosnell Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added two check_* functions here. they are both being called rn in out_ferc1__yearly_rate_base. really these should be validation tests i believe for both the rate base table and the two exploded tables detailed in the args. but rn these aren't sql tables and word on the street is that we can't pull dagster assets into validation tests so the migration of these checks as validation tests should be post #3310

i could put it into the exploded_table_asset_factory and just skip it for the income table. thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling them in out_ferc1__yearly_rate_base seems fine, though you could try stuffing them in an asset check in the factory. I think either is fine.

@cmgosnell
Copy link
Member Author

@jrea-rmi thanks for these insights!! this is great info.

yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it.

Okay, grand i'll change that sign. Is that the only input that needs a sign change?

I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side.

yea i was also a little surprised by that. I will try to investigate to see if there anything weird there and will report back.

I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base.

ditto on the will investigate and report back!

And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown.

the operating expense table values are just the operations_and_maintenance_expenses_electric / 8 values that got added after the fact for cash_working_capital.

Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good! Spent a bit of time messing with a non-recursive solution to rootward propagation, which I think is easier to reason about / maintain going into the future. Let me know what you think!

Also, I have one major question just about what our expected behavior is, plus some typo cleanup.

test/unit/output/ferc1_test.py Outdated Show resolved Hide resolved
test/unit/output/ferc1_test.py Show resolved Hide resolved
test/unit/output/ferc1_test.py Outdated Show resolved Hide resolved
test/unit/output/ferc1_test.py Outdated Show resolved Hide resolved
test/unit/output/ferc1_test.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
return annotated_forest


def check_tag_propagation_compared_to_compiled_tags(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling them in out_ferc1__yearly_rate_base seems fine, though you could try stuffing them in an asset check in the factory. I think either is fine.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@jrea-rmi
Copy link
Collaborator

@jrea-rmi thanks for these insights!! this is great info.

yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it.

Okay, grand i'll change that sign. Is that the only input that needs a sign change?

I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side.

yea i was also a little surprised by that. I will try to investigate to see if there anything weird there and will report back.

I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base.

ditto on the will investigate and report back!

And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown.

the operating expense table values are just the operations_and_maintenance_expenses_electric / 8 values that got added after the fact for cash_working_capital.

I believe so yes, the only sign change when combining tables into rate base is to flip sign of liabilities side of balance sheet.

I'll stay tuned for investigation of the liabilities and plant in service values.

And okay, that makes sense on inclusion of operating expenses table for cash working capital component of rate base!

Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! You had mentioned wanting to coerce a tuple to a NodeId, though the code works now without coercion - I think it's probably worth changing, but either way works for me.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
test/unit/output/ferc1_test.py Show resolved Hide resolved
@cmgosnell cmgosnell marked this pull request as ready for review February 2, 2024 19:18
@cmgosnell cmgosnell added this pull request to the merge queue Feb 2, 2024
Merged via the queue into main with commit 7d2b312 Feb 2, 2024
13 checks passed
@cmgosnell cmgosnell deleted the explode-rate-base branch February 2, 2024 21:18
@zaneselvans
Copy link
Member

zaneselvans commented Feb 3, 2024

@cmgosnell @jdangerx If this table isn't ready to be published in the DB, maybe we should give it a _out prefix, so that people aren't confused when they see it in the data dictionary and can't find it in the database? (Like me. I was the confused person.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Make a utility rate base table using exploded assets
4 participants