ensure all the corrections get tags and add the begining of a rate base asset #3214

cmgosnell · 2024-01-04T18:27:32Z

Overview

Closes #3203.

What problem does this address?
We didn't have a single table that included every component of each utility's rate base. Now we do 😎. In the process I learned that the correction records were not all getting the same tags as the calculated values they correct, so I made sure every correction record gets the same tags as its parent calculated record.

What did you change?

ensured the correction records had tags
added a new rate base table

Out of scope

make this new table a db output (going to wait on this until there is some more validation done)

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Review the PR yourself and call out any questions or issues you have
Make sure full ETL runs & make pytest-integration-full passes locally
For major data coverage & analysis changes, run data validation tests
If updating analyses or data processing functions: make sure to update or write data validation tests
Update the release notes: reference the PR and related issues.
cmgosnell · 2024-01-10T21:50:23Z

src/pudl/output/ferc1.py

+    # get the factoid name to grab the right part of the table
+    xbrl_factoid_name = pudl.transform.ferc1.FERC1_TFR_CLASSES[
+        "core_ferc1__yearly_operating_expenses_sched320"
+    ]().params.xbrl_factoid_name
+    # First grab the operations & maintenance expense out of the operating
+    # expense table, then prep it for concatenating: calculate cash on hand & add tags
+    cash_working_capital = (
+        core_ferc1__yearly_operating_expenses_sched320[
+            core_ferc1__yearly_operating_expenses_sched320[xbrl_factoid_name]
+            == "operations_and_maintenance_expenses_electric"
+        ]
+        .assign(
+            dollar_value=lambda x: x.dollar_value / 8,
+            xbrl_factoid="cash_on_hand",  # newly definied (do we need to add it anywhere?)
+            tags_rate_base_category="net_working_capital",
+            tags_aggregatable_utility_type="electric",
+            table_name="core_ferc1__yearly_operating_expenses_sched320",
+        )
+        .drop(columns=[xbrl_factoid_name])
+        # the assets/liabilities tables both use ending_balance as their main $$ column
+        .rename(columns={"dollar_value": "ending_balance"})
+    )


@jrea-rmi this is the little process to grab the operations_and_maintenance_expenses_electric line from the operating expense table and convert it into working capital/cash on hand before adding it to the in_rate_base exploded data. Does this look right to you generally? Do the tags look okay & sufficient? I figured this should be in_service as plant status or have a plant function, but lmk. It's obviously easy to add in there as an additional column.

jrea-rmi · 2024-01-11

Yes this looks right to me generally!

I'd go with xbrl_factoid as "cash_working_capital" rather than "cash_on_hand".

I don't know if it needs plant status or plant function; my initial thought is those don't need to be added. I thought in_service etc. only applied to utility plant assets, and plant function only applied to plants in service. But I could see an expanded definition of those if you think there should be.

cmgosnell · 2024-01-11

Okay great! I'll change it to "cash_working_capital". And I think you're right on the status and function, or at least that sounds plausible to me. I just wanted to make sure!

jdangerx

I know this is still in draft form, so you obviously still plan on changing it. Overall this makes sense to me, I think. I might be totally off base with my understanding, so please correct me!

Also! We should write tests for this behavior 🙂

jdangerx · 2024-01-12T20:15:20Z

src/pudl/output/ferc1.py

@@ -1191,7 +1191,22 @@ def _out_ferc1__explosion_tags(table_dimensions_ferc1) -> pd.DataFrame:
        .reset_index()
        .drop(columns=["notes"])
    )
-    return tags_all
+    # Add the correction records to the tags with the same tags as the parent
+    idx = list(NodeId._fields)


I think if we're re-using this list(NodeId._fields) in a bunch of places, including line 1187, we might as well give it a meaningful name (set_index(idx) doesn't really tell you more than set_index(adfadfadfa)).

This could be node_id_fields = list(NodeId._fields) maybe?

cmgosnell · 2024-01-15

Ah yes, you are so right: idx is a completely meaningless name. This list(NodeId._fields) is used 13 (!) times throughout this module. In this context it's really just the index columns or primary key columns of the tags, so maybe tag_idx? In other contexts it's used as the pk/idx of a calculation component or to identify a node in the calculation forest. But in the tag space we aren't really working with networks, so calling them node IDs feels a little wrong.
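To make the naming suggestion concrete, here is a minimal sketch of giving the repeated list(NodeId._fields) expression one descriptive name. The NodeId class below is a hypothetical stand-in defined only for this example (its field names are an assumption, not copied from the module), and tag_idx is just the name floated in the comment above.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    """Hypothetical stand-in for the module's NodeId; field names are assumed."""

    table_name: str
    xbrl_factoid: str
    utility_type: str
    plant_status: str
    plant_function: str


# One descriptively named constant instead of repeating list(NodeId._fields).
tag_idx = list(NodeId._fields)

# A toy tag record: the key columns plus one tag column.
records = [
    {
        "table_name": "t1",
        "xbrl_factoid": "a",
        "utility_type": "electric",
        "plant_status": "in_service",
        "plant_function": "steam",
        "in_rate_base": "yes",
    },
]

# Build the primary key of each tag record from the named key columns.
keys = [tuple(rec[col] for col in tag_idx) for rec in records]
```

With a pandas DataFrame the same constant could be passed to set_index(tag_idx), which reads more clearly than set_index(idx).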

jdangerx · 2024-01-12T20:28:55Z

src/pudl/output/ferc1.py

-    return tags_all
+    # Add the correction records to the tags with the same tags as the parent
+    idx = list(NodeId._fields)
+    correction_index = (


This is "all un-corrected factoids, indexed by the NodeId fields," right? And we want to generate a bunch of tags for the corrected factoids by:

- look at all the tags we have
- inner merge "tags we have" with "all factoids we know about", using the NodeId fields as a join key + dropping all the random table_dimensions columns
- add "_correction" to each factoid name & concat it onto all of our tags - tada!

Some questions:

- Do we expect there to be tags for factoids that aren't in the table_dimensions? If not, can we get away with taking every tag + making a _correction version of it?
- Are there factoids we expect to not have a _correction partner?
- If we still need to filter out the tags we have that don't correspond to something in table_dimensions, should we use something more like table_dimensions[node_id_fields].join(tags_all) to make the intent of the code clearer?

cmgosnell · 2024-01-15

Hmm, these are all excellent questions, and I think the suggestion in your first question is correct, IF what I was trying to do here was actually the right thing to be doing. I think I could replace all of this stuff in here with corrections = tags_all.assign(xbrl_factoid=lambda x: x.xbrl_factoid + "_correction")
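A minimal sketch of that one-liner on toy data, assuming tags_all is a DataFrame with an xbrl_factoid column plus tag columns. The rows and the final concat step are illustrative, not taken from the PR.

```python
import pandas as pd

# Toy stand-in for the compiled tags; the real table has more key columns.
tags_all = pd.DataFrame(
    {
        "table_name": ["t", "t"],
        "xbrl_factoid": ["utility_plant", "accumulated_depreciation"],
        "in_rate_base": ["yes", "yes"],
    }
)

# Copy every tag row and rename the factoid to its _correction partner.
corrections = tags_all.assign(
    xbrl_factoid=lambda x: x.xbrl_factoid + "_correction"
)

# Stack the originals and their correction copies into one tag table.
tags_with_corrections = pd.concat([tags_all, corrections], ignore_index=True)
```

This gives every correction record exactly the tags of the factoid it corrects, regardless of what the child components look like.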

BUT I think what we really need to do is ask, for a given calculated record, whether all of the child component records contain the same tags. If so, give the correction record for the parent fact those tags.

I think I was starting to attempt that using the table dims, but I don't think that's the right method. Hm... I could probably do it with the calculation component table, but it might actually be simpler using the tree/network methods.
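The rule described here (tag the parent's correction record only with tags that every child shares) can be sketched in plain Python. The function name and the dict-of-tags representation are hypothetical and chosen for illustration; the PR itself ends up doing this on the calculation forest.

```python
def shared_child_tags(children_tags: list[dict[str, str]]) -> dict[str, str]:
    """Return the tags that every child carries with an identical value."""
    if not children_tags:
        return {}
    shared = dict(children_tags[0])
    for tags in children_tags[1:]:
        # Keep a tag only if this child has it with the same value.
        shared = {
            name: value for name, value in shared.items() if tags.get(name) == value
        }
    return shared


# Two children agree on in_rate_base but disagree on utility_type.
children_of_parent = [
    {"in_rate_base": "yes", "utility_type": "electric"},
    {"in_rate_base": "yes", "utility_type": "gas"},
]

# Only the unanimous tag would be given to the parent's correction record.
correction_tags = shared_child_tags(children_of_parent)
```

A child that lacks a tag entirely blocks that tag from reaching the correction record, which is the conservative reading of "all of the child component records contain the same tags".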

jdangerx · 2024-01-12T20:35:38Z

src/pudl/output/ferc1.py

+            == "operations_and_maintenance_expenses_electric"
+        ]
+        .assign(
+            dollar_value=lambda x: x.dollar_value / 8,


This seems like it could use the vectorized division operator, should we?

cmgosnell · 2024-01-15

Do you mean just use x.dollar_value.divide(8)? Or do you mean taking this out of the assign altogether and doing cash_working_capital.loc[:, "dollar_value"] = cash_working_capital.dollar_value.divide(8)?
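For reference, a small sketch on toy values showing that the / operator on a pandas Series is already vectorized and gives the same result as the .divide method inside assign. The column values are made up.

```python
import pandas as pd

df = pd.DataFrame({"dollar_value": [800.0, 1600.0]})

# Operator form, as in the diff.
with_operator = df.assign(dollar_value=lambda x: x.dollar_value / 8)

# Method form, as floated in the reply.
with_method = df.assign(dollar_value=lambda x: x.dollar_value.divide(8))
```

Both forms operate element-wise on the whole column, so the choice between them is stylistic.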

cmgosnell · 2024-01-26T18:01:58Z

hey @jrea-rmi ! do you know if we need to apply any sign changes to any of these input tables before squishing them together in this rate base table? I am mostly asking because the annual sum of both the assets and liabilities are positive.

total rate base

(i would add a legend but there are almost 200 utilities and so the legend makes the graphs un-readable)

Components of Rate base by source table

code to make the above plots

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from dagster import AssetKey
from pudl.etl import defs

# if you have a rate base table materialized:
out_ferc1__yearly_rate_base = defs.load_asset_value(AssetKey("out_ferc1__yearly_rate_base"))
# otherwise you can grab this pickled table: out_ferc1__yearly_rate_base.pkl.zip

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index='report_year', columns='utility_id_ferc1', values='ending_balance')

non_white_colors = [color for color in mcolors.CSS4_COLORS.keys() if "white" not in color]
new_colors = non_white_colors * 4
new_colors = new_colors[0:len(annual_rb_by_util_wide.columns)]


annual_rb_by_util_wide.plot(kind='bar', stacked=True, color=new_colors)
plt.legend([])
plt.title("Annual Sum of Rate Base by Utility")
plt.show()

annual_rb_by_util = out_ferc1__yearly_rate_base.groupby(["report_year", "utility_id_ferc1","table_name"],as_index=False)[["ending_balance"]].sum()
annual_rb_by_util_wide = annual_rb_by_util.pivot(index=["table_name",'report_year'], columns='utility_id_ferc1', values='ending_balance')
for table in annual_rb_by_util.table_name.unique():
    annual_rb_by_util_wide.loc[table].plot(kind='bar', stacked=True, color=new_colors)
    plt.legend([])
    plt.title(f"Annual Sum of Rate Base by Utility from {table}")
    plt.show()

notes to investigate

- [x] bunch of nothing in assets and liabilities from 2005-2020 #3300
- [ ] why basically nothing in `core_ferc1__yearly_depreciation_by_function_sched219` in 2021??
- [ ] why is 2004 so weird looking?

jrea-rmi · 2024-01-29T18:43:02Z

> hey @jrea-rmi ! do you know if we need to apply any sign changes to any of these input tables before squishing them together in this rate base table? I am mostly asking because the annual sum of both the assets and liabilities are positive.

yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it.

I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side.

I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base.

And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown.
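A minimal sketch of the sign change described above, on toy numbers. The table names in the constants are assumptions made for illustration; the point is only that rows coming from the liabilities side are negated before rate base is summed.

```python
import pandas as pd

# Assumed table names, used here only to label the toy rows.
ASSETS_TABLE = "core_ferc1__yearly_balance_sheet_assets_sched110"
LIABILITIES_TABLE = "core_ferc1__yearly_balance_sheet_liabilities_sched110"

rate_base = pd.DataFrame(
    {
        "table_name": [ASSETS_TABLE, LIABILITIES_TABLE],
        "ending_balance": [1000.0, 250.0],
    }
)

# Liabilities tagged as in rate base offset it, so flip their sign.
is_liability = rate_base.table_name == LIABILITIES_TABLE
rate_base_signed = rate_base.assign(
    ending_balance=rate_base.ending_balance.where(
        ~is_liability, -rate_base.ending_balance
    )
)

total_rate_base = rate_base_signed.ending_balance.sum()
```

Series.where keeps the original value where the condition is True (non-liability rows) and substitutes the negated value elsewhere.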

cmgosnell · 2024-01-30T16:24:55Z

src/pudl/output/ferc1.py

+    def _get_tag(annotated_forest, node, tag_name):
+        return annotated_forest.nodes.get(node, {}).get("tags", {}).get(tag_name, pd.NA)


This was just a lil helper function to get the tag or a null, because omigosh as you can see it is a lil complicated given the layering and the option for the node to not exist, or the tag to not exist, etc. I suppose it could also be the following, though that only supplies the default when the node itself is missing:

annotated_forest.nodes.get(node, {"tags": {tag_name: pd.NA}})["tags"][tag_name]
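To show why the chained .get version is the more forgiving of the two, here is a sketch using a plain dict in place of annotated_forest.nodes and None in place of pd.NA. Both substitutions are assumptions made so the example runs without any dependencies, and the node names are invented.

```python
MISSING = None  # stand-in for pd.NA in this sketch

# Plain-dict stand-in for annotated_forest.nodes.
nodes = {
    "tagged_node": {"tags": {"in_rate_base": "yes"}},
    "untagged_node": {},  # node exists but has no "tags" entry
}


def get_tag(nodes, node, tag_name):
    """Chained lookups: every missing layer falls through to MISSING."""
    return nodes.get(node, {}).get("tags", {}).get(tag_name, MISSING)


def get_tag_outer_default(nodes, node, tag_name):
    """Default only on the outer lookup: covers a missing node, nothing else."""
    return nodes.get(node, {"tags": {tag_name: MISSING}})["tags"][tag_name]


present = get_tag(nodes, "tagged_node", "in_rate_base")
no_tags = get_tag(nodes, "untagged_node", "in_rate_base")
no_node = get_tag(nodes, "absent_node", "in_rate_base")
```

The outer-default variant returns the placeholder for a node that is absent, but raises KeyError for a node that exists without a "tags" entry or without the requested tag.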

cmgosnell · 2024-01-30T16:27:55Z

src/pudl/output/ferc1.py

+    return annotated_forest
+
+
+def check_tag_propagation_compared_to_compiled_tags(


I added two check_* functions here. They are both currently being called in out_ferc1__yearly_rate_base. I believe these should really be validation tests, for both the rate base table and the two exploded tables detailed in the args. But right now these aren't SQL tables, and word on the street is that we can't pull dagster assets into validation tests, so migrating these checks into validation tests should happen after #3310.

I could put it into the exploded_table_asset_factory and just skip it for the income table. Thoughts?

cmgosnell · 2024-01-30T16:33:41Z

@jrea-rmi thanks for these insights!! this is great info.

> yes, this needs sign changes applied to the liabilities side of the balance sheet, everything from that table that's labeled as in rate base is an offset to rate base rather than a positive contribution to it.

Okay, grand, I'll change that sign. Is that the only input that needs a sign change?

> I am confused by the magnitude of what's in rate base from the liabilities side of the balance sheet - I expect it to be much lower than the assets side.

Yeah, I was also a little surprised by that. I will investigate to see if there is anything weird there and will report back.

> I am also confused by the plant in service table values being so small - I expect those to be larger than accumulated depreciation and the largest component of rate base.

Ditto on the will investigate and report back!

> And finally the operating expenses table shouldn't have anything in rate base - these expenses would show up in revenue requirement but not in capital cost breakdown.

The operating expense table values are just the operations_and_maintenance_expenses_electric / 8 values that got added after the fact for cash_working_capital.

jdangerx

Mostly looks good! Spent a bit of time messing with a non-recursive solution to rootward propagation, which I think is easier to reason about / maintain going into the future. Let me know what you think!

Also, I have one major question just about what our expected behavior is, plus some typo cleanup.

jdangerx · 2024-01-31T01:19:25Z

src/pudl/output/ferc1.py

+    return annotated_forest
+
+
+def check_tag_propagation_compared_to_compiled_tags(


Calling them in out_ferc1__yearly_rate_base seems fine, though you could try stuffing them in an asset check in the factory. I think either is fine.

jrea-rmi · 2024-01-31T17:30:43Z

I believe so yes, the only sign change when combining tables into rate base is to flip sign of liabilities side of balance sheet.

I'll stay tuned for investigation of the liabilities and plant in service values.

And okay, that makes sense on inclusion of operating expenses table for cash working capital component of rate base!

jdangerx

Looks good to me! You had mentioned wanting to coerce a tuple to a NodeId, though the code works now without coercion - I think it's probably worth changing, but either way works for me.
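A short sketch of the coercion mentioned here: unpacking a plain tuple into a NodeId with NodeId(*n). The NodeId class is a hypothetical stand-in with assumed field names, and the values are invented.

```python
from typing import NamedTuple, Optional


class NodeId(NamedTuple):
    """Hypothetical stand-in for the module's NodeId; field names are assumed."""

    table_name: str
    xbrl_factoid: str
    utility_type: Optional[str]
    plant_status: Optional[str]
    plant_function: Optional[str]


# A node that arrives as a bare tuple, e.g. from iterating over graph nodes.
n = ("some_table", "some_factoid", "electric", None, None)

# Unpack the tuple positionally into the named fields.
node = NodeId(*n)
```

A NamedTuple compares equal to a plain tuple with the same values, which is consistent with the code working without the coercion. Coercing still buys attribute access such as node.xbrl_factoid and a more readable repr.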

zaneselvans · 2024-02-03T02:17:05Z

@cmgosnell @jdangerx If this table isn't ready to be published in the DB, maybe we should give it a _out prefix, so that people aren't confused when they see it in the data dictionary and can't find it in the database? (Like me. I was the confused person.)

cmgosnell added 2 commits January 4, 2024 13:25

ensure all the corrections get tags and add the begining of a rate base asset

3a9548c

Merge branch 'dev' into explode-rate-base

86f8f63

Base automatically changed from dev to main January 5, 2024 04:14

cmgosnell added 2 commits January 10, 2024 09:44

Merge branch 'main' into explode-rate-base

1169f4f

Add in cash on hand as an additional factoid into rate base table

f33aa82

cmgosnell requested a review from jdangerx January 10, 2024 20:48

cmgosnell added 2 commits January 10, 2024 16:35

add documentation for rate base table

ea8301e

remove _correction record from the expense.

24cf1cf

the correction corrects the calculations of the parent (operating expense) and its child subcomponents. if we were calculating the expense i would want to include the correction but i don't want it if we are just grabbing the reported value

cmgosnell commented Jan 10, 2024

Merge branch 'main' into explode-rate-base

9a93f63

jdangerx requested changes Jan 12, 2024

attempt to associate tags with _correction factoids when all child calc componets have same tags

6d41c5c

cmgosnell commented Jan 16, 2024

jdangerx and others added 8 commits January 17, 2024 16:21

Add a simple XbrlCalculationForest test.

80ccf5d

Just see if we can get an annotated forest at all right now. TODO: test for tag propagation behavior. Co-authored-by: Christina Gosnell <[email protected]>

WIP: write down some to-dos for test cases.

50615cb

Get leafward propagation working

1ee6c7c

Merge branch 'main' into explode-rate-base

8bc4a96

first pass of adding leafward tags one layer and an attempt at a recursive method

9b19f8b

integrate the recursive tag propagation method

d1347c1

Merge branch 'main' into explode-rate-base

a23d87a

remove old correction tagging and standardize unit tests a bit

d1a42b4

cmgosnell added 2 commits January 26, 2024 17:13

remove metadata from forest builder and cleanup unit tests

829757a

Merge branch 'main' into explode-rate-base

d5c2b69

cmgosnell requested a review from jdangerx January 26, 2024 22:35

cmgosnell commented Jan 26, 2024

Merge branch 'main' into explode-rate-base

e975331

cmgosnell commented Jan 29, 2024

cmgosnell commented Jan 29, 2024

cmgosnell commented Jan 29, 2024

cmgosnell mentioned this pull request Jan 29, 2024

why is basically nothing in the rate base table from core_ferc1__yearly_depreciation_by_function_sched219in 2021? #3309

Closed

cmgosnell added 2 commits January 30, 2024 10:54

add "validation" checks and standardize null tag behavior`

33fa1ef

Merge branch 'main' into explode-rate-base

8341299

cmgosnell commented Jan 30, 2024

cmgosnell commented Jan 30, 2024

cmgosnell commented Jan 30, 2024

light cleaning

0f3b654

jdangerx requested changes Jan 31, 2024

cmgosnell and others added 3 commits January 31, 2024 07:39

root boose docs!

3e5c2cd

Co-authored-by: Dazhong Xia <[email protected]>

respond to dazhong's comments

b8758dd

Merge branch 'main' into explode-rate-base

d93d46c

jdangerx approved these changes Feb 1, 2024

cmgosnell added 2 commits February 2, 2024 09:02

Merge branch 'main' into explode-rate-base

17a5fe4

add a test about pruned nodes and add the NodeId(*n) into the orphans

da8df11

cmgosnell marked this pull request as ready for review February 2, 2024 19:18

cmgosnell added this pull request to the merge queue Feb 2, 2024

Merged via the queue into main with commit 7d2b312 Feb 2, 2024
13 checks passed

cmgosnell deleted the explode-rate-base branch February 2, 2024 21:18


