From 1cd95aabf7f78163a02ffb4d1f8fe1ba298bedd3 Mon Sep 17 00:00:00 2001 From: Dazhong Xia Date: Mon, 27 Nov 2023 17:46:05 -0500 Subject: [PATCH] Rewrite CONTRIBUTING.rst * call out specific common cases * pull dev-specific CONTRIBUTING.rst into top-level * add 'new dataset' issue template * hopefully make PR template less overwhelming --- .github/ISSUE_TEMPLATE/new_dataset.md | 21 +++ .github/pull_request_template.md | 53 ++---- CONTRIBUTING.rst | 79 +++++++++ docs/CONTRIBUTING.rst | 153 ++++++------------ .../resources/ferc1_eia_record_linkage.py | 4 +- 5 files changed, 168 insertions(+), 142 deletions(-) create mode 100644 .github/ISSUE_TEMPLATE/new_dataset.md create mode 100644 CONTRIBUTING.rst diff --git a/.github/ISSUE_TEMPLATE/new_dataset.md b/.github/ISSUE_TEMPLATE/new_dataset.md new file mode 100644 index 0000000000..267c23b2bb --- /dev/null +++ b/.github/ISSUE_TEMPLATE/new_dataset.md @@ -0,0 +1,21 @@ +--- +name: New dataset +about: Provide information about a new dataset you'd like to see in PUDL +title: '' +labels: new-dataset +assignees: '' +--- + +### Overview + +What is this dataset? Why do you want it in PUDL? Is it already partially in +PUDL, or do we need to start from scratch? + +### Where is it? + +Is this dataset publicly available? Where can one download the actual data? + +### What do you know about it so far? + +What have you done with this dataset so far? Have you run into any problems with +it yet? diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 325f1bcb8a..788386f562 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,49 +1,26 @@ +# Overview -# PR Overview +Closes #XXXX. - +How did you make sure this worked? How can a reviewer verify this? -# PR Checklist - -- [ ] Merge the most recent version of the branch you are merging into (probably `dev`). -- [ ] All CI checks are passing.
[Run tests locally to debug failures](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#running-tests-with-tox) -- [ ] Make sure you've included good docstrings. +```[tasklist] +# Remaining work +- [ ] Make sure full ETL runs & `make pytest-integration-full` passes locally - [ ] For major data coverage & analysis changes, [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation) -- [ ] Include unit tests for new functions and classes. -- [ ] Defensive data quality/sanity checks in analyses & data processing functions. -- [ ] Update the [release notes](https://catalystcoop-pudl.readthedocs.io/en/latest/release_notes.html) and reference reference the PR and related issues. -- [ ] Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively. +- [ ] If updating analyses or data processing functions: write data quality checks +- [ ] Update the [release notes](../docs/release_notes.rst): reference the PR and related issues. +- [ ] Review the PR yourself and call out any questions or issues you have +``` + diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst new file mode 100644 index 0000000000..991e977fcd --- /dev/null +++ b/CONTRIBUTING.rst @@ -0,0 +1,79 @@ +-------------------- +Contributing to PUDL +-------------------- + +Welcome! We're so glad you're interested in contributing to PUDL! We would love +some help making PUDL data as complete as possible. + +.. _after-intro: + +.. IMPORTANT:: Already have a dataset in mind? + + If you **need data that's not in PUDL**, + `open an issue `__. + + If you've **already written some code to wrangle a dataset**, find us at + `office hours `__ and we + can talk through next steps for how to get that into PUDL. + + .. + If you **want to use PUDL tools to explore a dataset we don't have yet**, + try using our example Kaggle notebook!
+ + +Your first contribution +----------------------- + +**Setup** + +You'll need to fork this repository and get the +`dev environment set up `__. + +**Pick an issue** + +* Look for issues with the `good first issue + `__ + tag. These are issues that don't require a ton of PUDL-specific context, and + are relatively tightly scoped to boot. + +* Comment on the issue and tag ``@com-dev`` (our Community Development Team) to + let us know you're working on it. Feel free to ask any questions you might + have! + +* Once you have an idea of how you want to tackle this issue, write out your + plan so we can guide you around obstacles in your way. + +**Work on it!** + +* Make a branch on your fork and open a draft PR early so we can discuss + concrete code! Please don't wait until it's all polished up - it's much easier + for us to help you when we can see the code evolve over time. + +* Please make sure to write tests and documentation for your code - if you run + into trouble with writing tests, let us know in the comments and we can help! + +* Please try to keep your changes relatively small: stuff happens, and one's + bandwidth for volunteer work can fluctuate frequently. If you make a bunch of + small changes, it's much easier to pause on a project without losing a ton of + context. + +**Get it merged in!** + +* Turn the draft PR into a normal PR and ping ``@com-dev``. We'll try to get + back to you within a few days. + +Next contributions +------------------ + +Hooray! You made your first contribution! To find another issue to tackle, check +out the `Community Kanban board +`__ where +we've picked out some issues that are + +* useful to work on + +* unlikely to become super time-sensitive + +* well-documented, with some context, success criteria, and next steps. + +Pick one of these and follow the contribution flow above!
diff --git a/docs/CONTRIBUTING.rst b/docs/CONTRIBUTING.rst index 616e3d0149..b41316c5bd 100644 --- a/docs/CONTRIBUTING.rst +++ b/docs/CONTRIBUTING.rst @@ -2,125 +2,74 @@ Contributing to PUDL =============================================================================== + Welcome! We're excited that you're interested in contributing to the Public Utility -Data Liberation effort! The work is currently being coordinated by the members of the -`Catalyst Cooperative `__. PUDL is meant to serve a wide -variety of public interests including academic research, climate advocacy, data -journalism, and public policy making. This open source project has been supported by -a combination of volunteer contributions, grant funding from the `Alfred P. Sloan -Foundation `__, and reinvestment of net income from the -cooperative's client projects. +Data Liberation effort! + +If you're interested in contributing directly to the PUDL database, see +:ref:`direct-contribs`. + +It can also be very helpful to provide :ref:`user feedback <user-feedback>`, or +help us :ref:`connect with other organizations <connect-orgs>`. + +--------------- +Code of Conduct +--------------- Please make sure you review our :doc:`code of conduct `, which is based on the `Contributor Covenant `__. We want to make the PUDL project welcoming to contributors with different levels of experience and diverse personal backgrounds. -------------------------------------------------------------------------------- -How to Get Involved -------------------------------------------------------------------------------- - -There are several areas in which we would welcome your help! Many of these -require a GitHub account, since that is where we manage the project. `Signing -up for a GitHub account `__ (even if you don't intend -to write code) will allow you to participate in online discussions and track -projects that you're interested in.
- -First is *user feedback* - if you use PUDL, we would love to talk to you and -understand what your use cases and problems are. This helps us steer the -project towards greater usefulness! Here are some avenues to get in touch: - -* If you need help, someone else might need it too - ask for help in `Github - Discussions - `__ - and maybe the ensuing discussion will be useful to other people too! -* Suggest new features, dataset integrations, structural changes, or just give - us feedback on overall usability using `GitHub Discussions - `__. -* If something went wrong, `file a bug report - `__ - on Github. - -* Help us plan the future of PUDL by telling us what you're using it for! - hello@catalyst.coop works great to get in touch. - -Second is *networking/growth* - for PUDL to be a go-to source of public -information about the US energy system, and help advocates with the clean -energy transition, we need to grow our community and business. Here's how you -can help: +.. _direct-contribs: -* Cite PUDL using `DOIs from Zenodo - `__ if you use the - software or data in your own published work. -* Point us toward appropriate grant funding opportunities and meetings where - we might present our work. -* Point us at interesting publications related to open energy data, open source - energy system modeling, how energy policy can be affected by better data, or - open source tools we should check out. -* Share your Jupyter notebooks and other analyses that use PUDL. -* `Hire Catalyst `__ to do analysis for - your organization using the PUDL data -- contract work helps us self-fund - ongoing open source development. -* And of course... we also appreciate `financial contributions - `__. +-------------------- +Direct contributions +-------------------- -Third is *direct contributions to the technical system* - code and -documentation! This is the most hands-on, in-the-weeds way to contribute, and -obviously helps us make the whole system more capable! +.. 
include:: ../CONTRIBUTING.rst + :start-after: after-intro: -* Check out the `Code contribution process`_ section below for a process - overview. -* See the :doc:`developer setup ` for technical details -* We also welcome documentation updates, which follow the general code - contribution process! +.. _user-feedback: +------------- +User feedback +------------- -------------------------------------------------------------------------------- -Code contribution process -------------------------------------------------------------------------------- +PUDL's goal is to help people use data to make change in the US energy landscape. -Our goals for you are: +As such, it's critical that we understand our users' needs! -* contribute to something important -* not accidentally end up on the critical path for a time-sensitive task and - end up working a second shift to finish something -* not flounder in a sea of high-context tasks +We'd love to hear about: -To support this, we've set up a `GitHub Projects view -`__ which we -update on a rolling basis. It includes a handful of tasks that are: +* what data you're looking for that we don't have +* what you're trying to do with PUDL data +* what issues you're running into with data access or interpretation +* any problems you find in our data +* anything you find confusing in our documentation -* important and non-urgent -* clearly scoped -* owned by a Catalyst employee who can be your buddy +`GitHub Discussions `__ +is a great place to do this, but `emailing us `__ +works too! -If you have an idea for some work you'd like to do that's not on the board, you -should absolutely find/create a new issue or post a Github Discussion - then we -can talk about how Catalyst can support that work! +.. _connect-orgs: -We envision a flow like this: +----------------------------------- +Connect us with other organizations +----------------------------------- -1. 
You go to the GitHub Projects "community" view and poke around at the - backlog until you find something you find interesting. -2. You ask some questions about the scope and we attempt to clarify what needs - doing. -3. If you still want to take the task on, assign the issue to yourself and - you're off to the races! We'll probably bother you for updates occasionally. -4. You put up an early draft PR for feedback. -5. Eventually, you convert the draft to a standard PR, we do a thorough review, - and it gets merged! Go back to #1. +For PUDL to make a bigger impact, we need to find more people who need the data. +Here's how you can help: -Some guidelines: - -* small PRs: we understand that stuff happens, and one's bandwidth for - volunteer work can fluctuate frequently. One way to make that feel a little - better for both the contributor and the project is to ship many small - changes, so there's never a ton of dangling work. - -* early drafts: our system has evolved over several years and can be quite - confusing. Pushing up an early draft PR will help Catalyst members guide you - gently away from pitfalls. - -* write tests and documentation: this is critical for expressing what - the software "should" do, which is helpful both in development and in - maintenance. If you haven't done much of this before, we can help! +* Cite PUDL using `DOIs from Zenodo + `__ if you use the + software or data in your own published work. +* Point us toward appropriate grant funding opportunities and meetings where + we might present our work. +* Point us at interesting publications related to open energy data, open source + energy system modeling, how energy policy can be affected by better data, or + open source tools we should check out. +* Share your Jupyter notebooks and other analyses that use PUDL. +* `Hire Catalyst `__ to do analysis for + your organization using the PUDL data -- contract work helps us self-fund + ongoing open source development. 
diff --git a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py index e1a5f89032..c60ecedf3f 100644 --- a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py +++ b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py @@ -23,8 +23,8 @@ Because generators are often owned by multiple utilities, another dimension of this plant part table involves generating two records for each owner: one for the portion of the plant part they own and one for the plant part as a whole. The -portion records are labeled in the "ownership_record_type" column as "owned" -and the total records are labeled as "total". +portion records are labeled in the ``ownership_record_type`` column as ``owned`` +and the total records are labeled as ``total``. This table includes A LOT of duplicative information about EIA plants. It is primarily meant for use as an input into the record linkage between FERC1 plants and EIA.""",