Rewrite CONTRIBUTING.rst
* call out specific common cases
* pull dev-specific CONTRIBUTING.rst into top-level
* add 'new dataset' issue template
* hopefully make PR template less overwhelming
jdangerx committed Nov 30, 2023
1 parent b2222a0 commit 1cd95aa
Showing 5 changed files with 168 additions and 142 deletions.
21 changes: 21 additions & 0 deletions .github/ISSUE_TEMPLATE/new_dataset.md
@@ -0,0 +1,21 @@
---
name: New dataset
about: Provide information about a new dataset you'd like to see in PUDL
title: ''
labels: new-dataset
assignees: ''
---

### Overview

What is this dataset? Why do you want it in PUDL? Is it already partially in
PUDL, or do we need to start from scratch?

### Where is it?

Is this dataset publicly available? Where can one download the actual data?

### What do you know about it so far?

What have you done with this dataset so far? Have you run into any problems with
it yet?
53 changes: 15 additions & 38 deletions .github/pull_request_template.md
@@ -1,49 +1,26 @@
<!--
Making a PUDL Pull Request
Before making a PR you may want to check out our:
Resources:
* contributing guidelines: https://catalystcoop-pudl.readthedocs.io/en/latest/CONTRIBUTING.html
* code of conduct: https://catalystcoop-pudl.readthedocs.io/en/latest/code_of_conduct.html
* development process: https://catalystcoop-pudl.readthedocs.io/en/latest/dev/index.html
## PR Process Overview
* PRs have to get an approving review before merging into their development branch.
* Most PRs should be made against the `dev` branch, unless they are part of some larger ongoing refactoring, in which case there will be a persistent development branch for that work.
* It is much easier to do timely code reviews on smaller chunks of code. We try to keep PRs under 500 lines of code.
* Draft PRs are a good way to get early feedback on designs or several incremental commits that will add up to larger changes. If you want a review of a draft PR, make sure you contact the reviewer directly or mention their username in the PR comment, so they get a notification.
* How quickly we can review a PR will depend on how large and complex it is, and how busy we are, but ideally we strive to get an initial review done within a week. If there are going to be delays, we should at least comment on the PR to let you know the situation.
* If you believe you've addressed a reviewer's comments, respond with a brief note and mark the comment resolved. If further discussion is required respond and do not resolve the comment.
* Before a PR is merged all reviewer comments should be resolved. If a reviewer doesn't feel that their comment has been sufficiently addressed, they may unresolve a comment.
* Be careful not to accidentally "start a review" when responding to comments! If this does happen, don't forget to submit the review you've started so the other PR participants can see your comments (they are invisible to others if marked "Pending").
* In the period after an initial review when there is significant back-and-forth with the reviewer deciding what changes should actually be made, there should probably be daily interaction. If significant changes are required, it's usually best to request another review after those changes have been made.
Feel free to delete the commented-out parts of the template before submitting the PR.
-->
# Overview

# PR Overview
Closes #XXXX.

<!--
What problem does this address?

Include a short narrative summary of what's going on in the PR. This can be a bulleted list. You might want to include:
What did you change?

* What are you changing and why?
* Are there any known unsolved problems remaining in the PR?
* Is there anything that you want a reviewer to pay particular attention to?
* What kind of feedback are you looking for on the PR?
# Testing

-->
How did you make sure this worked? How can a reviewer verify this?

# PR Checklist

- [ ] Merge the most recent version of the branch you are merging into (probably `dev`).
- [ ] All CI checks are passing. [Run tests locally to debug failures](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#running-tests-with-tox)
- [ ] Make sure you've included good docstrings.
```[tasklist]
# Remaining work
- [ ] Make sure full ETL runs & `make pytest-integration-full` passes locally
- [ ] For major data coverage & analysis changes, [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation)
- [ ] Include unit tests for new functions and classes.
- [ ] Defensive data quality/sanity checks in analyses & data processing functions.
- [ ] Update the [release notes](https://catalystcoop-pudl.readthedocs.io/en/latest/release_notes.html) and reference the PR and related issues.
- [ ] Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.
- [ ] If updating analyses or data processing functions: write data quality checks
- [ ] Update the [release notes](../docs/release_notes.rst): reference the PR and related issues.
- [ ] Review the PR yourself and call out any questions or issues you have
```

79 changes: 79 additions & 0 deletions CONTRIBUTING.rst
@@ -0,0 +1,79 @@
--------------------
Contributing to PUDL
--------------------

Welcome! We're so glad you're interested in contributing to PUDL! We would love
some help making PUDL data as complete as possible.

.. _after-intro:

.. IMPORTANT:: Already have a dataset in mind?

   If you **need data that's not in PUDL**,
   `open an issue <https://github.com/catalyst-cooperative/pudl/issues/new/choose>`__.

   If you've **already written some code to wrangle a dataset**, find us at
   `office hours <https://calend.ly/catalyst-cooperative/pudl-office-hours>`__ and we
   can talk through next steps for how to get that into PUDL.

..
   If you **want to use PUDL tools to explore a dataset we don't have yet**,
   try using our example Kaggle notebook!

Your first contribution
-----------------------

**Setup**

You'll need to fork this repository and get the
`dev environment set up <https://catalystcoop-pudl.readthedocs.io/en/latest/dev/dev_setup.html>`__.

**Pick an issue**

* Look for issues with the `good first issue
  <https://github.com/catalyst-cooperative/pudl/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-issue>`__
  tag. These are issues that don't require a ton of PUDL-specific context, and
  are relatively tightly scoped to boot.

* Comment on the issue and tag ``@com-dev`` (our Community Development Team) to
  let us know you're working on it. Feel free to ask any questions you might
  have!

* Once you have an idea of how you want to tackle this issue, write out your
  plan so we can guide you around obstacles in your way.
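The ``good first issue`` link above is an ordinary GitHub issue search whose query is percent-encoded into the URL. The minimal Python sketch below, using only the standard library, shows how that query string is built and decoded, which may help if you want to tweak the filters. The helper name ``issue_search_url`` is hypothetical and not part of PUDL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ISSUES_URL = "https://github.com/catalyst-cooperative/pudl/issues"


def issue_search_url(*filters: str) -> str:
    """Build a GitHub issue-search URL from qualifiers such as 'label:good-first-issue'.

    Hypothetical helper for illustration only.
    """
    # GitHub reads all qualifiers from the space-separated ``q`` parameter;
    # urlencode() percent-encodes ':' as %3A and turns spaces into '+'.
    return f"{ISSUES_URL}?{urlencode({'q': ' '.join(filters)})}"


url = issue_search_url("is:issue", "is:open", "label:good-first-issue")
print(url)

# Decoding goes the other way and recovers the human-readable query.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Swapping ``label:good-first-issue`` for another qualifier yields a different filtered view in the same format.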

**Work on it!**

* Make a branch on your fork and open a draft PR early so we can discuss
  concrete code! Please don't wait until it's all polished up - it's much easier
  for us to help you when we can see the code evolve over time.

* Please make sure to write tests and documentation for your code - if you run
  into trouble with writing tests, let us know in the comments and we can help!

* Please try to keep your changes relatively small: stuff happens, and one's
  bandwidth for volunteer work can fluctuate frequently. If you make a bunch of
  small changes, it's much easier to pause on a project without losing a ton of
  context.

**Get it merged in!**

* Turn the draft PR into a normal PR and ping ``@com-dev``. We'll try to get
  back to you within a few days.

Next contributions
------------------

Hooray! You made your first contribution! To find another issue to tackle, check
out the `Community Kanban board
<https://github.com/orgs/catalyst-cooperative/projects/9/views/19>`__ where
we've picked out some issues that:

* are useful to work on

* are unlikely to become super time-sensitive

* have some context, success criteria, and next steps information.

Pick one of these and follow the contribution flow above!
153 changes: 51 additions & 102 deletions docs/CONTRIBUTING.rst
@@ -2,125 +2,74 @@
Contributing to PUDL
===============================================================================


Welcome! We're excited that you're interested in contributing to the Public Utility
Data Liberation effort! The work is currently being coordinated by the members of the
`Catalyst Cooperative <https://catalyst.coop>`__. PUDL is meant to serve a wide
variety of public interests including academic research, climate advocacy, data
journalism, and public policy making. This open source project has been supported by
a combination of volunteer contributions, grant funding from the `Alfred P. Sloan
Foundation <https://sloan.org>`__, and reinvestment of net income from the
cooperative's client projects.
Data Liberation effort!

If you're interested in contributing directly to the PUDL database, see
:ref:`direct-contribs`.

It can also be very helpful to provide :ref:`user-feedback`, or to
:ref:`connect us with other organizations <connect-orgs>` that we can work with.

---------------
Code of Conduct
---------------

Please make sure you review our :doc:`code of conduct <code_of_conduct>`, which is
based on the `Contributor Covenant <https://www.contributor-covenant.org/>`__. We
want to make the PUDL project welcoming to contributors with different levels of
experience and diverse personal backgrounds.

-------------------------------------------------------------------------------
How to Get Involved
-------------------------------------------------------------------------------

There are several areas in which we would welcome your help! Many of these
require a GitHub account, since that is where we manage the project. `Signing
up for a GitHub account <https://github.com/join>`__ (even if you don't intend
to write code) will allow you to participate in online discussions and track
projects that you're interested in.

First is *user feedback* - if you use PUDL, we would love to talk to you and
understand what your use cases and problems are. This helps us steer the
project towards greater usefulness! Here are some avenues to get in touch:

* If you need help, someone else might need it too - ask for help in `GitHub
Discussions
<https://github.com/orgs/catalyst-cooperative/discussions/categories/help-me>`__
and maybe the ensuing discussion will be useful to other people too!
* Suggest new features, dataset integrations, structural changes, or just give
us feedback on overall usability using `GitHub Discussions
<https://github.com/orgs/catalyst-cooperative/discussions/categories/ideas>`__.
* If something went wrong, `file a bug report
<https://github.com/catalyst-cooperative/pudl/issues/new?template=bug_report.md>`__
on GitHub.

* Help us plan the future of PUDL by telling us what you're using it for!
[email protected] works great to get in touch.

Second is *networking/growth* - for PUDL to be a go-to source of public
information about the US energy system, and help advocates with the clean
energy transition, we need to grow our community and business. Here's how you
can help:
.. _direct-contribs:

* Cite PUDL using `DOIs from Zenodo
<https://zenodo.org/communities/catalyst-cooperative/>`__ if you use the
software or data in your own published work.
* Point us toward appropriate grant funding opportunities and meetings where
we might present our work.
* Point us at interesting publications related to open energy data, open source
energy system modeling, how energy policy can be affected by better data, or
open source tools we should check out.
* Share your Jupyter notebooks and other analyses that use PUDL.
* `Hire Catalyst <https://catalyst.coop/hire-catalyst/>`__ to do analysis for
your organization using the PUDL data -- contract work helps us self-fund
ongoing open source development.
* And of course... we also appreciate `financial contributions
<https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=PZBZDFNKBJW5E&source=url>`__.
--------------------
Direct contributions
--------------------

Third is *direct contributions to the technical system* - code and
documentation! This is the most hands-on, in-the-weeds way to contribute, and
obviously helps us make the whole system more capable!
.. include:: ../CONTRIBUTING.rst
   :start-after: after-intro:
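The ``:start-after:`` option is what lets this page reuse the top-level ``CONTRIBUTING.rst`` without repeating its title and welcome text: docutils includes only the content after the first occurrence of the given text, here the tail of the ``.. _after-intro:`` label line (and reports an error if the text is not found). The minimal Python sketch below mimics that slicing on a shortened copy of the file; the ``include_after`` helper is hypothetical and is not the docutils implementation.

```python
def include_after(text: str, marker: str) -> str:
    """Mimic the ``start-after`` option of the reST ``include`` directive.

    Hypothetical helper: returns only the text after the first occurrence of
    ``marker``. docutils reports an error when the marker is missing; this
    sketch raises ValueError instead.
    """
    index = text.find(marker)
    if index == -1:
        raise ValueError(f"marker {marker!r} not found")
    return text[index + len(marker):]


# Shortened stand-in for the top-level CONTRIBUTING.rst.
top_level = """\
Contributing to PUDL
--------------------

Welcome! We're so glad you're interested in contributing to PUDL!

.. _after-intro:

.. IMPORTANT:: Already have a dataset in mind?
"""

# The title, welcome text, and the label line itself are all skipped.
included = include_after(top_level, "after-intro:")
print(repr(included))
```

Because the match ends inside the label line, the ``after-intro`` target is not pulled into the docs page a second time.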

* Check out the `Code contribution process`_ section below for a process
overview.
* See the :doc:`developer setup <dev/dev_setup>` for technical details
* We also welcome documentation updates, which follow the general code
contribution process!
.. _user-feedback:

-------------
User feedback
-------------

-------------------------------------------------------------------------------
Code contribution process
-------------------------------------------------------------------------------
PUDL's goal is to help people use data to make change in the US energy landscape.

Our goals for you are:
As such, it's critical that we understand our users' needs!

* contribute to something important
* not accidentally end up on the critical path for a time-sensitive task and
end up working a second shift to finish something
* not flounder in a sea of high-context tasks
We'd love to hear about:

To support this, we've set up a `GitHub Projects view
<https://github.com/orgs/catalyst-cooperative/projects/9/views/19>`__ which we
update on a rolling basis. It includes a handful of tasks that are:
* what data you're looking for that we don't have
* what you're trying to do with PUDL data
* what issues you're running into with data access or interpretation
* any problems you find in our data
* anything you find confusing in our documentation

* important and non-urgent
* clearly scoped
* owned by a Catalyst employee who can be your buddy
`GitHub Discussions <https://github.com/orgs/catalyst-cooperative/discussions>`__
is a great place to do this, but `emailing us <mailto:[email protected]>`__
works too!

If you have an idea for some work you'd like to do that's not on the board, you
should absolutely find/create a new issue or post a GitHub Discussion - then we
can talk about how Catalyst can support that work!
.. _connect-orgs:

We envision a flow like this:
-----------------------------------
Connect us with other organizations
-----------------------------------

1. You go to the GitHub Projects "community" view and poke around at the
backlog until you find something that interests you.
2. You ask some questions about the scope and we attempt to clarify what needs
doing.
3. If you still want to take the task on, assign the issue to yourself and
you're off to the races! We'll probably bother you for updates occasionally.
4. You put up an early draft PR for feedback.
5. Eventually, you convert the draft to a standard PR, we do a thorough review,
and it gets merged! Go back to #1.
For PUDL to make a bigger impact, we need to find more people who need the data.
Here's how you can help:

Some guidelines:

* small PRs: we understand that stuff happens, and one's bandwidth for
volunteer work can fluctuate frequently. One way to make that feel a little
better for both the contributor and the project is to ship many small
changes, so there's never a ton of dangling work.

* early drafts: our system has evolved over several years and can be quite
confusing. Pushing up an early draft PR will help Catalyst members guide you
gently away from pitfalls.

* write tests and documentation: this is critical for expressing what
the software "should" do, which is helpful both in development and in
maintenance. If you haven't done much of this before, we can help!
* Cite PUDL using `DOIs from Zenodo
<https://zenodo.org/communities/catalyst-cooperative/>`__ if you use the
software or data in your own published work.
* Point us toward appropriate grant funding opportunities and meetings where
we might present our work.
* Point us at interesting publications related to open energy data, open source
energy system modeling, how energy policy can be affected by better data, or
open source tools we should check out.
* Share your Jupyter notebooks and other analyses that use PUDL.
* `Hire Catalyst <https://catalyst.coop/hire-catalyst/>`__ to do analysis for
your organization using the PUDL data -- contract work helps us self-fund
ongoing open source development.
4 changes: 2 additions & 2 deletions src/pudl/metadata/resources/ferc1_eia_record_linkage.py
@@ -23,8 +23,8 @@
Because generators are often owned by multiple utilities, another dimension of
this plant part table involves generating two records for each owner: one for the
portion of the plant part they own and one for the plant part as a whole. The
portion records are labeled in the "ownership_record_type" column as "owned"
and the total records are labeled as "total".
portion records are labeled in the ``ownership_record_type`` column as ``owned``
and the total records are labeled as ``total``.
This table includes A LOT of duplicative information about EIA plants. It is primarily
meant for use as an input into the record linkage between FERC1 plants and EIA.""",
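To make the ``owned`` versus ``total`` distinction concrete, here is a minimal Python sketch on hypothetical rows for one generator with two owners, each of which gets one portion record and one whole-plant-part record as described above. Only the ``ownership_record_type`` column and its ``owned``/``total`` values come from the description; ``plant_part_id``, ``utility``, ``capacity_mw`` and the ``select_records`` helper are illustrative assumptions, not PUDL's actual schema or API.

```python
# Hypothetical rows: one generator ("gen_1") co-owned 60/40 by utilities A and B.
rows = [
    {"plant_part_id": "gen_1", "utility": "A", "ownership_record_type": "owned", "capacity_mw": 60.0},
    {"plant_part_id": "gen_1", "utility": "B", "ownership_record_type": "owned", "capacity_mw": 40.0},
    {"plant_part_id": "gen_1", "utility": "A", "ownership_record_type": "total", "capacity_mw": 100.0},
    {"plant_part_id": "gen_1", "utility": "B", "ownership_record_type": "total", "capacity_mw": 100.0},
]


def select_records(records, record_type):
    """Keep only rows whose ownership_record_type matches ``record_type`` (hypothetical helper)."""
    if record_type not in {"owned", "total"}:
        raise ValueError(f"unexpected ownership_record_type: {record_type!r}")
    return [r for r in records if r["ownership_record_type"] == record_type]


owned = select_records(rows, "owned")
total = select_records(rows, "total")

# Summing the "owned" portions counts the generator's capacity once;
# summing the "total" rows counts the whole generator once per owner.
print(sum(r["capacity_mw"] for r in owned))
print(sum(r["capacity_mw"] for r in total))
```

Filtering on ``owned`` before aggregating across owners is one way to avoid double counting the duplicative ``total`` rows in this sketch.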
