Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHMSA transmission part P #3279

Merged
merged 4 commits into from
Jan 24, 2024
Merged

PHMSA transmission part P #3279

merged 4 commits into from
Jan 24, 2024

Conversation

jdangerx
Copy link
Member

@jdangerx jdangerx commented Jan 22, 2024

Overview

Closes #3278.

This maps part P of the PHMSA transmission/gathering pipeline data.

It has state-by-state information about pipeline materials, from 2010 onwards.

I added a column map for yearly_miles_of_pipe_in_state_by_material and added that to the various "where to find this data" CSVs.

I also made the release notes have a nested list of the small PRs so that we don't have a merge conflict on that same line over and over.

Testing

How did you make sure this worked? How can a reviewer verify this?

I ran the debugging notebook and also materialized the PHMSA assets.

To-do list

@e-belfer e-belfer added new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration labels Jan 23, 2024
@jdangerx jdangerx force-pushed the 3278-phmsa-transmission-part-p branch from 7b0cf8d to 8541273 Compare January 23, 2024 15:43
Copy link
Member Author

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question that I need a bit of help with.

Comment on lines 101 to 102
other_materials,,,,,,,,,,,,,,,,,,,,,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials,partpothermaterials
gathering_other_materials,,,,,,,,,,,,,,,,,,,,,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials,partpgothermaterials
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for most of the field names, we have part<letter><t or g or nothing><name> - so I inferred that these were the "total other materials" and "gathering other materials" columns... but I think it sort of makes sense for these to be the "transmission other materials" & "gathering other materials" columns - to show what those other materials might be. I guess partpgothermaterials is pretty unambiguously "gathering", so my question is - do we think we should call this total_other_materials or transmission_other_materials?

Copy link
Member

@cmgosnell cmgosnell Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh that's tough. I think I would go with total_other_materials because it is not specified.... even though with more context we might learn that this is transmission not total.

It looks like these two are outside of the core table in the PDF version of the table. The description is "Composite pipe requires a PHMSA Special Permit or waiver from a State2 specify Other material(s):"
I would definitely add something about composite and special_permit or something to the names of both of these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the PDF as saying two things:

  1. Composite pipe requires a special permit / waiver
  2. please specify "other materials" - we'll store this as partpothermaterials and partpgothermaterials

In which case we shouldn't have any composite/special_permit in the name.

Part of why I read it that way is that the rest of the form distinguishes between composite & other materials, and another is that there's a little superscript (1) by the Composite column header which seems like it would correspond to the first point.

image image

What do you think?

Copy link
Member

@e-belfer e-belfer Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up that when there's a "specify" field I've been calling it "x_y_description" as a column name. (E.g., "gathering_other_materials_description"). But I think Dazhong's reading of the form makes sense to me. When you look at the actual entries in the raw data what kinds of info are you seeing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes you are totally right here. good eye. no composite or special_permit!

hm okay well that changes my answer to you were originally asking because now we can check if partpototal shows up when there is any other pipe or just transmission and it looks like there are only values in this partpototal when there is other transmission miles! so i would go with transmission_other_materials

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, thanks for all the discussion! Going with transmission/gathering_other_materials_description :)

@jdangerx jdangerx marked this pull request as ready for review January 23, 2024 15:58
Copy link

codecov bot commented Jan 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b565366) 92.7% compared to head (aa31bcb) 92.7%.
Report is 6 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3279   +/-   ##
=====================================
  Coverage   92.7%   92.7%           
=====================================
  Files        145     145           
  Lines      13100   13100           
=====================================
  Hits       12143   12143           
  Misses       957     957           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one request to add _miles suffixes and obvis we should settle on the name of the weird not in the table two you brought up but otherwise this looks grand

oh plus i think you need to add the pr into the release notes

@jdangerx jdangerx changed the title Initial pass at column mapping Part P. PHMSA transmission part P Jan 24, 2024
@jdangerx jdangerx force-pushed the 3278-phmsa-transmission-part-p branch from 8541273 to 56e709f Compare January 24, 2024 18:17
Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@jdangerx jdangerx enabled auto-merge (squash) January 24, 2024 19:26
@jdangerx jdangerx force-pushed the 3278-phmsa-transmission-part-p branch from 172f234 to aa31bcb Compare January 24, 2024 19:26
@jdangerx jdangerx force-pushed the 3278-phmsa-transmission-part-p branch from aa31bcb to 41b632d Compare January 24, 2024 21:19
@jdangerx jdangerx merged commit 351b7e8 into main Jan 24, 2024
12 checks passed
@jdangerx jdangerx deleted the 3278-phmsa-transmission-part-p branch January 24, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Map columns for PHMSA transmission part P
3 participants