harmonize-wq #157

jbousquin · 2024-02-08T19:43:03Z

Submitting Author: Justin Bousquin (@jbousquin)
All current maintainers: (@jbousquin)
Package Name: harmonize-wq
One-Line Description of Package: Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats
Repository Link: https://github.com/USEPA/harmonize-wq
Version submitted: 0.4.0
EiC: @isabelizimm
Editor: @Batalex
Reviewer 1: @rcaneill
Reviewer 2: @Jacqui-123
Archive: https://doi.org/10.5281/zenodo.13356847
JOSS DOI:
Version accepted: 0.5.0
Date accepted (month/day/year): 08/10/2024

Code of Conduct & Commitment to Maintain Package

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package after should it be accepted.
I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.

Description

Include a brief paragraph describing what your package does:
The US EPA's Water Quality Portal (WQP) is a data warehouse that facilitates access to data stored in large water quality databases in a common format. There are tools to facilitate both publishing data to and retrieving data from WQP; harmonize-wq focuses on retrieved data: (1) cleaning it to ensure it meets the required quality standards, and (2) wrangling it into a more analytic-ready format. Although there are many examples where this has been done, standardized tools to perform this task could make it less time-intensive, more standardized, and more reproducible.

Scope

Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization¹
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

Pangeo
- My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook

For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?
  Water quality domain experts trying to synthesize available data in a stream, bay, estuary, etc. More standardized data cleansing and wrangling allows outputs to be integrated into other tools in the water quality data pipeline, e.g., for integration into dashboards for visualization (Beck et al., 2021) or decision support tools (Booth et al., 2011).
- Are there other Python packages that accomplish the same thing? If so, how does yours differ?
  No Python packages to my knowledge. There is one in R: USEPA/TADA
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted: Presubmission: harmonize-wq #132

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
uses an OSI approved license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a tutorial with examples of its essential functions and uses.
has a test suite.
has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

I have read the author guide.
I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

Last but not least please fill out our pre-review survey. This helps us track
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

¹ Please fill out a pre-submission inquiry before submitting a data visualization package.


isabelizimm · 2024-02-08T19:50:29Z

Hello there @jbousquin, thank you for submitting this issue--welcome to the pyOpenSci community! Just wanted to let you know we've seen your issue. The next step is for us to run some initial checks; we will give that first feedback soon.

In the meantime, if you have any questions you can ask here or in our discourse.

isabelizimm · 2024-02-13T23:15:07Z

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

Initial onboarding survey was filled out
We appreciate each maintainer of the package filling out this survey individually. 🙌
Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌

Editor comments

As a Floridian, I do appreciate your tutorial locations 🐊

A few quick fixes:

For the CODE_OF_CONDUCT file, it is optimal to have it at the root of the repository. Right now, it looks like yours is in docs/source/Code of Conduct.rst. I'd recommend moving that file, since that is the typical place people look for a CoC. Also, if it is in the root, it will show up as a "tab" next to your README, sort of how the MIT License is shown here 🎉

Second, pending some sort of tool that requires it, you shouldn't need a separate [metadata] section in your pyproject.toml.

In the meantime, I'll start hunting for an editor to facilitate a review for you!

jbousquin · 2024-02-14T19:45:08Z

Thanks @isabelizimm - made those suggested changes on pyOpenSci-review branch. Let me know if there is anything else while we wait.

isabelizimm · 2024-02-23T23:18:16Z

No other tasks yet! That should be good to start. I think I've got an editor just about figured out, I will let you know for sure mid-next week.

isabelizimm · 2024-02-29T23:57:24Z

Update: @Batalex will be the editor for harmonize-wq, guiding you through the review process. He will be the point of contact for things from here on out (although I am still happy to answer any questions if you need me!), and I've updated the Editor field in the initial comment on this issue.

Batalex · 2024-03-03T14:25:11Z

Hey @jbousquin,
I am Alex, and I am delighted to be the editor for harmonize-wq!
During the coming week(s), I'll be looking into harmonize-wq's codebase and reaching out to potential reviewers. Meanwhile, feel free to send me any questions you might have.

jbousquin · 2024-03-04T14:08:05Z

Thanks @Batalex. No questions so far, let me know if anything comes up.

Batalex · 2024-03-16T06:55:31Z

👋 Hi @rcaneill and @Jacqui-123! Thank you for volunteering to review for pyOpenSci!

Please don't hesitate to introduce yourselves. @jbousquin, I am pleased to announce that we found our A-team to proceed with the review.

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

@rcaneill survey completed.
@Jacqui-123 survey completed.

The following resources will help you complete your review:

Here is the reviewers guide. This guide contains all the steps and information needed to complete your review.
Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: April 8th

Reviewers: @rcaneill, @Jacqui-123
Due date: 2024/04/08

rcaneill · 2024-03-18T08:46:49Z

@rcaneill survey completed.

I just filled the survey

rcaneill · 2024-03-18T08:48:21Z

Hi @jbousquin I am happy to review this package and will start soon :)

jbousquin · 2024-03-19T14:02:19Z

Thanks @rcaneill! Let me know as things come up :)

rcaneill · 2024-03-22T12:54:52Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
Vignette(s) demonstrating major functionality that runs successfully locally.
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
- The package name is located after the badges, I guess that it is not an issue
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A repostatus.org badge,
- Python versions supported,
- Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
The package is easy to install

Functionality

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8-10

Review Comments

Missing some instructions for the devs: Review RC [doc]: environment instructions for dev USEPA/harmonize-wq#63
request: automatic versions locking: Review RC [feature]: use automated tool for requirements USEPA/harmonize-wq#65
missing link to doc: Review RC [doc]: link the different example from README to doc / Add link to documentation USEPA/harmonize-wq#68
missing citation: Review RC [doc]: add citation USEPA/harmonize-wq#69
add badges: RC review: Update README.md with badges USEPA/harmonize-wq#70
RC review: Lock test file with commit instead of master USEPA/harmonize-wq#72
RC review [doc]: example not complete USEPA/harmonize-wq#73

Batalex · 2024-03-31T11:41:22Z

Please find below a list of comments, with my own format (editor's privilege 🐈‍⬛ )
I tried to rank them so that you can prioritize your work. I'll complete this list as I revisit the package.

Praises

praise (general): The code and the docs are extra clean.
praise (general): Whenever I see pint, I'm happy!

Typos

typo (readme.md): l7 on package name
typo (readme.md, contributing.rst): double spaces

Nitpicks

nitpick (general): I recommend adding a new line at each full stop in a markdown or rst paragraph. This way, we keep the lines short in git (easier to spot diffs in PR, easier to pinpoint a line with an issue). No worries, a single new line is not rendered.
nitpick (domain.py): there is no need for a raw string for TADA_DATA_URL

Discussions

discussion (convert.py): About the TODO - both points of view (regrouping constants in a single place, or having them defined near their place of use to avoid jumping around the code base) are valid. I am usually in favor of the former.

Suggestions

suggestion (domain.py): In harmonize_TADA_dict, we could use a groupby operation to avoid looping through the dataframe using python. TOCHECK
suggestion (domain.py): We could replace the following pattern for x in list(set(pandas_series)) by using the .unique method
suggestion (domain.py, basic.py): out_col_lookup does not need to be a function. Same for all other functions returning a dict. If we make those simple module-level dicts, we can still list the sources in the module docstring.
suggestion (convert.py): We could add "references" sections in the docstrings so that the sources are present in the website and not only in the source code.
suggestion (basis.py, general): By using pandas' methods, we could streamline some operations a little. The choice is ultimately yours; I prefer using existing methods over rolling my own implementations, even if that means that other folks need to go to the documentation website to understand what is going on.
For instance, here is my proposition for set_basis

import numpy as np

def set_basis(df, mask, basis, basis_col):
    return df.assign(**{basis_col: np.where(mask, basis, np.nan)})

I find this implementation easier to read (though I understand that this is debatable), and it is also more efficient. I have noticed that you use this pattern quite a few times throughout the code base, so I figured this might interest you.
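To make the two domain.py suggestions above concrete, here is a minimal sketch. The table and its column names are hypothetical stand-ins, not the package's actual data or the body of harmonize_TADA_dict; it only illustrates the pandas calls being suggested.

```python
import pandas as pd

# Hypothetical stand-in for a domain table; the column names are illustrative only.
table = pd.DataFrame(
    {
        "characteristic": ["pH", "pH", "Temperature", "Temperature"],
        "unit": ["None", "std units", "deg C", "deg F"],
    }
)

# Series.unique() returns the distinct values in order of first appearance,
# whereas list(set(series)) returns them in arbitrary order.
characteristics = table["characteristic"].unique().tolist()

# A single groupby call collects the units per characteristic,
# replacing a Python-level loop over the rows.
units_by_characteristic = (
    table.groupby("characteristic")["unit"].agg(list).to_dict()
)
```

Whether the groupby form fits depends on what the loop in harmonize_TADA_dict actually builds, hence the TOCHECK.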

Todos

todo (pyproject.toml): We should remove the metadata section.
todo (__init__.py): importlib.metadata was added in python 3.8, which is the minimal version supported by the package according to its pyproject.toml. The try .. except block should not be needed, even more so considering that importlib_metadata is not listed in the project requirements.
todo (basis.py): We could regroup the conditions branches in update_result_basis
todo (contributing.rst): To lower the cost of entry for potential contributors, let's make sure that we provide all the information they need. Consider adding a section describing how to set up their development environment (e.g. installing the test and docs dependencies).

Issues

issue (general): code quality (see below)
issue (domain.py): requests should be listed in the project's dependencies. The rationale is as follows: we should not import in our code any transitive dependency, because we have no guarantee that the primary dependency will not drop the former in a future update. As far as we know, dataretrieval could replace requests by httpx without notice in a patch release, which would break new harmonize-wq installations. The same can be said about pandas, though I agree it is unlikely that geopandas will change its backend dataframe lib.
issue (domain.py): We should specify what kind of exception we are expecting in re_case. Making a try except block too wide can lead to hard-to-debug issues.
issue (general): It seems that there are circular dependencies: harmonize -> visualize -> wrangle -> harmonize or clean -> wrangle -> clean as well. They do not raise an exception for now, but they will if any imported object is used at the module level. I strongly advise that we rework the project structure so that the files get imported in an acyclic fashion. It is also way easier to get familiar with the code base as a new contributor if the structure is predictable and linear.
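On the re_case point, a minimal sketch of what a narrow except clause could look like. The function match_case below is a hypothetical stand-in and not the package's re_case; it assumes the only expected failure is a non-string input such as a float NaN.

```python
def match_case(word, domain_list):
    """Return the entry of domain_list that matches word case-insensitively, else None."""
    try:
        target = word.upper()
    except AttributeError:
        # Only non-string input (e.g. a float NaN from a missing value) is
        # expected here; any other exception still propagates to the caller.
        return None
    for entry in domain_list:
        if entry.upper() == target:
            return entry
    return None
```

With a bare except, a typo inside the try block or an unrelated failure would be silently swallowed in the same way, which is what makes wide blocks hard to debug.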

General recommendations

Code quality is important in a public package.
It is obvious that a great amount of care went in making harmonize_wq, but what I mean by code quality is having tools enforcing conventions across the code base.
Such conventions usually cover code format, and catching simple anti patterns.

To do so, I would advise you to use both a linter and a formatter.
I usually recommend:

black for formatting the code
ruff to validate that the code follows good practices, and do quick fixes.

This is up for debate of course, some people might prefer one tool over another, but the point is that a project using such tools:

is more welcoming to external contributors
needs less time dedicated to low-value maintenance.

If you are ok with everything I said so far, I'd be happy to propose a PR to help you setup everything.

jbousquin · 2024-04-01T16:04:50Z

I'll start addressing these on a pyOpenSciReview branch (I'll try to be better about merging to main so other reviewers aren't running into the same things). Will generate an issue task list w/ any that are more involved. Let me know if there is anything else that I should be doing for review/edit tracking.

Would love a PR for black & ruff setup - have been running a linter and code analysis locally and definitely see the value for contributors/maintenance. Only concern is being able to easily ignore certain conventions when appropriate.

jbousquin · 2024-04-02T13:59:49Z

@Batalex fixing issue (general): circular dependencies - will be a breaking change. To resolve I moved functions from harmonize, df_checks()/add_qa_flag() to clean, convert_unit_series() to convert and units_dimension() to wq_data (to become a method). These seemed as logical a place to find them as harmonize. Now importing specific functions from other modules where practical. This breaks docs - before addressing that I wanted to confirm this is what you had in mind?

Batalex · 2024-04-03T19:24:31Z

@jbousquin Based on a quick look through the PR, yes that's exactly what I had in mind

Jacqui-123 · 2024-04-17T03:03:03Z

Great package! I hope these comments are helpful. This was my first package review so please let me know if there is anything I missed or if I was misguided with any of my comments.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
Vignette(s) demonstrating major functionality that runs successfully locally.
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A repostatus.org badge,
- Python versions supported,
- Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Any additional setup required to use the package (authentication tokens, etc.)
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
The package is easy to install

Functionality (Skipped this)

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

approximately 8

Review Comments

Harmonize_Pensacola.Rmd:
-Small language changes suggested to make the installation process more user-friendly and clear:
-make it clear when something is an option to run and when it's step-by-step instruction, as it switches back and forth in this demo. For example, could add "# Install the harmonize-wq package... [#option 1] package install... [#option 2] development version..."
-Clearer separation of code chunks by task, so each code chunk focuses on a specific task. This makes debugging/error message interpretation easier. Ie a new code chunk after options(reticulate.conda_binary = "..."), new code chunks after conda_install() section (lines 72, 81). (For good examples see the .ipynb demo files for this package).
-I think use_condaenv("wq_harmonize") should be use_condaenv("wq-reticulate") (line 90)
Comments for Harmonize_CapeCod_Simple.ipynb
-easy to follow and clearly documented
-attribute errors for harmonize_all(df, errors='ignore'): AttributeError: 'float' object has no attribute 'upper' (these attribute errors happened a few times in the other demos, too.)
usability:
-"All functions have documentation and associated examples for use" -> I wasn't completely clear on exactly what each function did, particularly some of the cleaning/tidying ones and how they changed the resulting dataframe. For example, what are all the flag options in the QA_flag column and what does each of them mean? The overall package was really clear though in terms of what it was doing and how, but some of the nuances were less clear to me.
I am curious to know if the package looks at or flags the different method detection limits (mdl) that different analytical laboratories often use, or if that is an issue with this dataset? I tend to run into this issue in my work but I don't typically work with EPA datasets.

jbousquin · 2024-07-19T20:13:39Z

@Batalex - weird, I commented with responses to your points a couple of weeks ago, but I just came back to make sure I hadn't missed anything from you and don't see that comment here... I'll try to re-create it, mainly by copying over the month-old status from the repo (there is also a follow-up on your draft PR that I wrote afterwards, in case you didn't see it here)

jbousquin · 2024-07-22T19:06:44Z

@Batalex If you would like additional links/line numbers just let me know:

Typos
should be resolved as suggested

Nitpicks
nitpick (general)
should be resolved as suggested

nitpick (domain.py): there is no need for a raw string for TADA_DATA_URL
This url is only used once at the moment, but is currently a raw string (1) to allow it to be easily integrated into feature adds (i.e., intend to use it more places, especially w/ WQX 2->3), and (2) for easier maintenance given the repo is still under development (e.g., like when the url recently changed).

Discussions
Kept it in convert module because fewer module references made ensuring no circular references easier. Already importing registry_adds_list from domains so there isn't a strong reason not to move it there if the need arises in the future.

Suggestions
suggestion (domain.py): In harmonize_TADA_dict, we could use a groupby operation to avoid looping through the dataframe using python. TOCHECK
should be resolved as suggested, was there more to the TOCHECK?

suggestion (domain.py): We could replace the following pattern for x in list(set(pandas_series)) by using the .unique method
should be resolved as suggested

suggestion (domain.py, basic.py): out_col_lookup does not need to be a function. Same for all other functions returning a dict. If we make those simple module-level dicts, we can still list the sources in the module docstring.
These have been updated to be module-level dicts, but I'm not sure on how you are proposing the docstrings could be included. Hate to lose all the examples etc. on these, have you seen this in documentation for other projects you could point me to?

suggestion (convert.py): We could add "references" sections in the docstrings so that the sources are present in the website and not only in the source code.
When a conversion function has equation or method references, the documentation has a reference section for that (e.g., conductivity_to_PSU). However, if the information is for code/checks then it goes in as a comment in the code (e.g., the url in DO_concentration points to a converter written in JS). In those cases is it adequate/suggested to add contextual comments, e.g., # To check compare against:

suggestion (basis.py, general): By using pandas' methods, we could streamline some operations a little. The choice is ultimately yours; I prefer using existing methods over rolling my own implementations, even if that means that other folks need to go to the documentation website to understand what is going on.

I agree on using existing methods, I really tried to implement this suggestion but ran into issues. In the provided example if there are existing values in columns those need to be preserved. That can be done with an if/else. Additionally, numpy.where will coerce the other values (y) to the dtype which is problematic for nan. Do-able, but more complex than the current solution.
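For reference, a minimal sketch of one way both concerns could be handled with a pandas method instead of numpy.where. The name set_basis_keep is hypothetical and this is not the package's current implementation; it assumes mask is a boolean Series aligned with df and that values outside the mask must be left untouched.

```python
import numpy as np
import pandas as pd


def set_basis_keep(df, mask, basis, basis_col):
    """Return a copy of df with basis written to basis_col where mask is True.

    Values already present outside the mask are preserved.
    """
    if basis_col in df.columns:
        current = df[basis_col]
    else:
        # Start from an all-missing object column so text values fit without coercion.
        current = pd.Series(np.nan, index=df.index, dtype="object")
    # Series.mask replaces values where the condition is True and keeps the rest as-is.
    return df.assign(**{basis_col: current.mask(mask, basis)})
```

Because Series.mask works on the existing column, the NaN entries stay real missing values rather than being cast along with the text values.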

Todos
pyproject.toml & init
should be resolved as suggested
basis.py: regroup conditions in update_result_basis
Admittedly these additional basis columns haven't received much attention yet (not frequently leveraged by those entering data), and it was coded this way to make it easy to come back to and write additional specific handling. For now we combined weight/time, left particuleSize as is with added notes specific to its handling.

contributing.rst
Added dev section

Issues
domain.py: dependencies
Added the suggested dependencies (stopped short of pandas but did include numpy). pyproject.toml should now populate dependencies from requirements, decreasing maintenance/risk of differences.
domain.py: specify exception expected by re_case
Resolved as suggested
Circular dependencies
should be resolved as suggested

General recommendations

To summarize, working on implementing black. All the code changes are sitting on the pyOpenSci-review branch. It runs locally as suggested in your PR. I'm trying to get my head around pre-commits so that contributors will have style/format checks without having to run it locally.

jbousquin · 2024-07-22T19:10:06Z

@rcaneill - Really appreciate your doing issues/PRs over on the repo (saves steps!). I think we resolved everything over there (leaving the citation issue open so it gets resolved after), but let me know if I missed anything from your review here.

Batalex · 2024-07-29T19:05:22Z

@jbousquin, here is some quick feedback.

nitpick (domain.py): there is no need for a raw string for TADA_DATA_URL
This url is only used once at the moment, but is currently a raw string (1) to allow it to be easily integrated into feature adds (i.e., intend to use it more places, especially w/ WQX 2->3), and (2) for easier maintenance given the repo is still under development (e.g., like when the url recently changed).

I am not sure how using a raw string is relevant to the reasons you mentioned. Maybe we are not talking about the same thing: I am speaking about the r prefix in r"http://url.com". Raw strings are usually used in regular expressions.
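A short sketch of the difference, using the placeholder URL from above rather than the real TADA_DATA_URL value:

```python
import re

# The r prefix only changes how backslashes are read, so for a URL
# without backslashes both literals produce the same string.
plain_url = "http://url.com"
raw_url = r"http://url.com"
same = plain_url == raw_url

# Where the prefix matters: "\n" is one newline character,
# while r"\n" is a backslash followed by the letter n.
newline_len = len("\n")
raw_len = len(r"\n")

# Typical use: regular expressions, where backslashes must reach the re module intact.
digits = re.compile(r"\d+")
```

So dropping the prefix changes nothing about where or how often the constant can be reused.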

suggestion (domain.py, basic.py): out_col_lookup does not need to be a function. Same for all other functions returning a dict. If we make those simple module-level dicts, we can still list the sources in the module docstring.
These have been updated to be module-level dicts, but I'm not sure how you are proposing the docstrings could be included. Hate to lose all the examples etc. on these; have you seen this in documentation for other projects you could point me to?

The idea would be to add the sources and any relevant information in the module docstring:

constant.py

"""
Constants submodule.


References
-----------

Planck:
The NIST Reference on Constants, Units, and Uncertainty. [NIST](https://en.wikipedia.org/wiki/National_Institute_of_Standards_and_Technology). 20 May 2019.
"""

planck = 6.62607015e-34

Then you can access the source using help on the submodule, just like you would on a function: python -c "import constant; help(constant)"

Help on module constant:

NAME
    constant - Constants submodule.

DESCRIPTION

    References
    -----------

    Planck:
    The NIST Reference on Constants, Units, and Uncertainty. NIST. 20 May 2019.

DATA
    planck = 6.62607015e-34

As for the rest of my original points, I am okay with the changes / reasons not to change. Nice job!

Batalex · 2024-07-29T19:08:54Z

@Jacqui-123, @rcaneill Were your concerns addressed?

jbousquin · 2024-07-29T20:46:58Z

@Batalex RE quick feedback:

Ah! You really did mean it being a raw string, not it being a constant. Resolved on the branch (passing; will merge with the linting).

Docstrings for dict constants: what I was stuck on was how to document them at the module level (an "Attributes" section for Sphinx). I'm not sure how to add child sections for an attribute, e.g., Examples, but I'll play around with it. For a docstring placed at the variable, I wasn't sure how to associate it with the variable (still not entirely sure, but the Sphinx docs helped me understand it needs to come after the assignment). Documented that way, the child sections work, but it doesn't appear in the module-level help, and I'm not sure how you would get help() to retrieve the variable-level docstring (I'll look into that if the module-level approach doesn't work out).
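A small sketch of the two options discussed here, under stated assumptions: the module name "lookups" and the dict contents are placeholders, and only the name out_col_lookup comes from this thread. It suggests that a module-level docstring ends up in `__doc__` (which help() displays), while a string literal placed after an assignment is not attached to the object at runtime, even though Sphinx autodoc can pick it up from the source.

```python
import types

# Hypothetical module source; the dict contents are placeholders.
SOURCE = '''"""Lookup tables.

Attributes
----------
out_col_lookup : dict
    Placeholder description of the lookup.
"""

out_col_lookup = {"example_key": "example_value"}
"""String literal after the assignment: read by Sphinx, ignored at runtime."""
'''

# Execute the source as if it were an imported module.
module = types.ModuleType("lookups")
exec(SOURCE, module.__dict__)

# The module-level docstring is what help(module) shows under DESCRIPTION.
has_attributes_section = "Attributes" in module.__doc__

# The dict only carries the generic docstring of the dict type.
dict_doc_is_generic = module.out_col_lookup.__doc__ == dict.__doc__

print(has_attributes_section, dict_doc_is_generic)  # True True
```

This is why keeping the sources and examples in the module docstring makes them visible to both help() and the rendered docs, whereas variable-level strings are visible only to tools that parse the source.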

Jacqui-123 · 2024-07-30T00:34:05Z

@jbousquin Thanks so much for the detailed response to my review/comments. The changes look great, and I appreciate your explanations. @Batalex I don't have anything further to add but let me know if you need anything else.

Batalex · 2024-07-30T09:17:41Z

@jbousquin Thanks so much for the detailed response to my review/comments. The changes look great, and I appreciate your explanations. @Batalex I don't have anything further to add but let me know if you need anything else.

Perfect, I just need you to check the approval box in your review above. Thank you so much for contributing to this review!

jbousquin · 2024-07-31T14:35:36Z

@Batalex RE:RE quick feedback: module level doc-strings are passing for both help() and docs.

Pre-commit hooks are very close to working; I just need ruff to pick up the settings in pyproject.toml as it does when run locally. Tried a few things based on pre-commit issues but haven't solved it yet. I'm close to just writing the settings out in the pre-commit config, but I'm reluctant since that duplicates what is in the toml (more maintenance to make sure they always match).

rcaneill · 2024-07-31T19:29:20Z

@Batalex I am happy with the changes made / the answers when the authors disagreed with me

* Implementing suggested ruff rules
* isort
* Fix whitespace (many of these were copied from docs example execution - need to confirm it passes docs tests)
* Run test.yml on push to this branch
* Whitespace
* F401 (redundant alias)
* Missed whitespace
* First attempt w/ pre-commit
* Fix indent
* indent/drop name
* Rename .pre-commit-config.yml to .pre-commit-config.yaml: yAml
* Update .pre-commit-config.yaml: fix file structure
* Reduce .pre-commit-config.yaml: Reduce what files it is run on
* Update domains.py: Doesn't need to be raw string (see Batalex pyOpenSci/software-submission#157 (comment))
* Dict doc strings as module level attributes
* Update to main (#88)
* Update domains.py: 'Field' -> 'Field***'
* 62 r test ci (#86): Update test_r.yaml to install conda outside r, specifically miniforge, then run on env from setup with current package (vs pip installing main)
* Update .pre-commit-config.yaml: From issue: pass_filenames: false in the pre-commit config so that the file discovery is done by Ruff taking into account the includes and excludes configured by the user in their pyproject.toml
* Update .pre-commit-config.yaml: Try updating to patch version and specify config in args.
* Update pyproject.toml: try without 'docstring-code-format = true' as this may override other settings.
* Update pyproject.toml: Try to get pre-commit to see config
* Update pyproject.toml: Warning message, so it is getting these settings from the toml?
* Update conf.py: E501
* Update basis.py: E501
* Update basis.py: Moved constant doc-string to module level
* Update clean.py: E501
* Update convert.py: E501
* Update conf.py: lint/format edits
* Update pyproject.toml: Without single checking if double is default
* Update pyproject.toml: Will move to one or the other (likely default double for ease), but trying to postpone to work through diff
* lint/formatting
* linted
* W293
* black format/lint
* W605 - try pulling r str out of test doc-string and instead as a comment. Comment shouldn't cause problems but this one has in the past.
* I001 (all whitespace except test_harmonize_WQP.py)
* lint conf file
* lint
* Add white space between module doc-string and imports
* Format: add whitespace after mod doc-string
* Add assert for actual2 - where the characteristics specific function is used instead of the generic.
* Resolved some E501
* Check if new line fails doctest
* Revert to get doc-test passing
* Spread out example df entry
* Spread out dict read out to reduce line length. White space is already normalized for doc-test so this may pass.
* Revert
* Spread out building df for wq_dat.WQCharData example.
* spread out example df for we_date.measure_mask()
* Shorten len of dict for wq_data.replace_unit_str() & wq_data.apply_conversion() examples
* Attempt to skip E501 on this line
* skip rule on line
* Last attempt to ignore line too long in docstrings (3)
* Update pyproject.toml: Drop single quote for lint
* '' -> ""
* Update test.yml: Revert back to testing on main only

jbousquin · 2024-08-03T00:33:24Z

@Batalex - resolved ruff checks with pre-commits on PR 89, please let me know if there is anything unresolved from your review. Really happy to have lint/formatting as part of this workflow, and thank you: the edits to the pyproject.toml in your draft PR helped immensely!

Batalex · 2024-08-10T13:45:51Z

jbousquin · 2024-08-21T20:34:34Z

Author Wrap Up Tasks

Will update as tasks to wrap up this submission are completed:

Activate Zenodo watching the repo if you haven't already done so.
Tag and create a release to create a Zenodo version and DOI.
Add the badge for pyOpenSci peer-review to the README.md of . The badge should be .
Please fill out the post-review survey. All maintainers and reviewers should fill this out.

It looks like you would like to submit this package to JOSS. Here are the next steps:

Login to the JOSS website and fill out the JOSS submission form using your Zenodo DOI. When you fill out the form, be sure to mention and link to the approved pyOpenSci review. JOSS will tag your package for expedited review if it is already pyOpenSci approved.
Wait for a JOSS editor to approve the presubmission (which includes a scope check).
Once the package is approved by JOSS, you will be given instructions by JOSS about updating the citation information in your README file.
When the JOSS review is complete, add a comment to your review in the pyOpenSci software-review repo here that it has been approved by JOSS. An editor will then add the JOSS-approved label to this issue.

lwasser · 2024-11-22T22:00:23Z

hey team. has this package been accepted by JOSS? it looks like we might be able to update the labels, fill out the header and close it if that is the case. please let me know however.

lwasser · 2024-11-22T22:01:48Z

It looks to me like we just need the archive link, the version accepted and a label change!

jbousquin · 2024-11-22T22:03:18Z

Yes - it cleared JOSS and the citation info on the repo is updated. Let me know if you need any info/action from me. Thanks!

lwasser · 2025-02-06T17:12:02Z

fantastic. I am going to close this. I'm sorry for the long delay in response. @jbousquin if you or anyone involved in this review (other maintainers, reviewers etc) would like to join our slack you are welcome to do so. please email me at leah at pyopensci.org and i'll invite you! or share an email here.

if you'd like to share your package with the community, we also welcome you to create a blog post. please just let me know. in the meantime i'll close this issue but we can still chat here if you have questions or need anything else!!

thank you everyone!

jbousquin added 0/pre-review-checks New Submission! labels Feb 8, 2024

github-project-automation bot added this to peer-review-status Feb 8, 2024

isabelizimm removed the New Submission! label Feb 23, 2024

isabelizimm added 1/editor-assigned and removed 0/pre-review-checks labels Feb 29, 2024

Batalex added 3/reviewers-assigned and removed 1/editor-assigned labels Mar 19, 2024

lwasser moved this to under-review in peer-review-status Mar 20, 2024

lwasser assigned Batalex Mar 27, 2024

jbousquin mentioned this issue Apr 1, 2024

Py open sci review USEPA/harmonize-wq#55

Merged

jbousquin mentioned this issue Apr 1, 2024

Py open sci review USEPA/harmonize-wq#56

Merged

Batalex mentioned this issue Apr 3, 2024

Setup for automated code quality USEPA/harmonize-wq#58

Closed

jbousquin added a commit to USEPA/harmonize-wq that referenced this issue Jul 29, 2024

Update domains.py

414ca16

Doesn't need to be raw string (see Batalex pyOpenSci/software-submission#157 (comment))

Batalex added 6/pyOS-approved and removed 4/reviews-in-awaiting-changes on-hold labels Aug 10, 2024

lwasser moved this from under-review to pyos-accepted in peer-review-status Aug 10, 2024

kthyng mentioned this issue Aug 22, 2024

[PRE REVIEW]: harmonize-wq: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats openjournals/joss-reviews#7135

Closed

kthyng mentioned this issue Sep 30, 2024

[REVIEW]: harmonize-wq: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats openjournals/joss-reviews#7305

Closed

lwasser added the 7/under-joss-review label Oct 18, 2024

Batalex added 9/joss-approved and removed 7/under-joss-review labels Nov 25, 2024

lwasser moved this from pyos-accepted to joss-accepted in peer-review-status Nov 25, 2024

lwasser closed this as completed Feb 6, 2025
