Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GemGIS - Spatial Data Processing for Geomodeling #128

Open
17 of 30 tasks
AlexanderJuestel opened this issue Aug 8, 2023 · 30 comments
Open
17 of 30 tasks

GemGIS - Spatial Data Processing for Geomodeling #128

AlexanderJuestel opened this issue Aug 8, 2023 · 30 comments
Assignees
Labels
4/reviews-in-awaiting-changes on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review ⌛ pending-maintainer-response

Comments

@AlexanderJuestel
Copy link

AlexanderJuestel commented Aug 8, 2023

Submitting Author: Name (@AlexanderJuestel)
All current maintainers: (@AlexanderJuestel)
Package Name: GemGIS
One-Line Description of Package: Spatial Data Processing for Geomodeling
Repository Link: https://github.com/cgre-aachen/gemgis
Version submitted: 1.0.11 (soon 1.1 with bug fixes for some methods and the API Reference once it is working, no major functionality changes)
Documentation: https://gemgis.readthedocs.io/en/latest/
JOSS Publication: https://joss.theoj.org/papers/10.21105/joss.03709
JOSE Publication: https://jose.theoj.org/papers/10.21105/jose.00185
Editor: @yeelauren
Reviewer 1: @aleksandraradecka1
Reviewer 2: @martinfleis
Reviewer 3: @SimonMolinsky
Archive: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

GemGIS is a Python-based, open-source geographic information processing library. It is capable of preprocessing spatial data such as vector data (shape files, geojson files, geopackages,…), raster data (tif, png,…), data obtained from online services (WCS, WMS, WFS) or XML/KML files (soon). Preprocessed data can be stored in a dedicated Data Class to be passed to the geomodeling package GemPy in order to accelerate the model building process. Postprocessing of model results will allow export from GemPy to geoinformation systems such as QGIS and ArcGIS or to Google Earth for further use.

GemGIS uses and combines the full functionality of GeoPandas, rasterio, OWSLib, Pandas, Shapely, PyVista and NumPy to simplify, accelerate and automate the workflows used to preprocess spatial data for geomodeling.

From https://gemgis.readthedocs.io/en/latest/

In addition, almost 70 tutorials illustrate the different functionalities of GemGIS.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization1
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [x] Geospatial
- [x] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

Data extraction/processing: GemGIS uses the functionality of packages like GeoPandas, Shapely, or Rasterio to extract information from vector and raster data and to put it in a form that it can be used by the GemPy.

Data visualization: GemGIS uses the functionality of packages like matplotlib or PyVista to create static and dynamic plots of data and meshes. This includes digital elevation models or meshes of subsurfaces layers, boreholes geological cross sections or even seismic data.

Workflow automation: The entire purpose of GemGIS is to provide methods to accelerate the preparation of input data for GemPy. Over time, the package has also gained additional functionality to work with a variety of datasets utilized for subsurface applications, see Tutorials.

  • Who is the target audience and what are scientific applications of this package?

The target audience is the open-source Geosciences community, researchers, students but also industry. GemGIS provides functionality to accelerate the preparation of input data for the structural geological modeling package GemPy which has been used in numerous publications. For applications at universities, we are in the final stages of getting a JOSE publication approved with more than 20 structural geological models that are used at RWTH Aachen University for teaching purposes and where GemGIS and GemPy will be included in future courses.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

GemGIS does not reinvent the wheel but rather combines the functionality of already existing packages mentioned in the description above. The packages utilized the most in GemGIS are the well-known packages like GeoPandas, Shapely, Rasterio, Pandas, NumPy, PyVista, matplotlib, etc. We also decided against i.e. wrapping GeoPandas GeoDataFrames in our own class or creating many new classes so that users can still use the full functionality of the underlying packages. This is one big advantage in comparison to GemPy where i.e. the meshes of the resulting structural geological models cannot be extracted (GemGIS is capable of extracting them though). Another example is that raster data opened with GemGIS will be stored as PyVista PolyData datasets or as grids so that users can harvest the functionality of this amazing package.

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Pre-submission inquiry: #126, @NickleDave

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version. --> was added in cgre-aachen/gemgis@b5f71e3
  • includes documentation with examples for all functions. --> will be checked once it is working again
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@AlexanderJuestel
Copy link
Author

Hello,
There are two bullets that I still need to check, I am further open to make updates to the package to meet minimal criteria before review (Section 5 of the Maintainer Guide) if there is still anything missing.

@NickleDave
Copy link
Contributor

NickleDave commented Aug 15, 2023

Thank you @AlexanderJuestel for opening this and for linking back to #126.

I have gone ahead and started looking for an editor for this review.

Please do let us know when you get the API docs up, and then I will proceed with the initial checks we do before starting review.

@AlexanderJuestel
Copy link
Author

@NickleDave, I have created a question on Stackoverflow (https://stackoverflow.com/questions/76883029/api-reference-building-locally-but-not-on-rtd) and I am trying one last thing and hope to make it work!

@NickleDave
Copy link
Contributor

Not sure if I spotted the issue but I replied there: https://stackoverflow.com/a/76908429/4906855

You can also feel free to ask for help in our Discourse in the future (https://pyopensci.discourse.group/) (or now, if I haven't helped 😅)

@AlexanderJuestel
Copy link
Author

@NickleDave I am trying your solution right now!

@AlexanderJuestel
Copy link
Author

AlexanderJuestel commented Aug 16, 2023

@NickleDave You solution did not work, unfortunately. I am only using two methods now to get a mini API Reference running but it also fails because a module, that is not even needed is missing. The only place this module appears is the __init__.py file. So maybe there is a link missing to this file? I do not know anymore...

I also opened a new topic on your Discourse! I hope someone has an idea!

@NickleDave
Copy link
Contributor

NickleDave commented Aug 29, 2023

Hi @AlexanderJuestel just checking back. I think we do have an editor that can run this review.

I know we were helping you with the API docs build but I'm a little out of the loop on where we left off. Could you give me a rough estimate of when you might have a chance to get to that? I don't mean to rush you--I'm sure you have other obligations as well; I just don't want to lose track of this review.

My guess is that, if possible, switching to a pyproject.toml that includes your dependencies, and then using pure Python virtual environments on readthedocs, might make the build easier. But it's possible I'm missing the real issue.

@AlexanderJuestel
Copy link
Author

@NickleDave, thanks for checking in. I have to get back to @lwasser on Discourse still. It has been a little hectic here. I try to get back to her in the next few days!

@NickleDave
Copy link
Contributor

Very understood, thank you @AlexanderJuestel.
I do not mean to rush you.
I think Leah is out of town right now but should be back on-line next week.
Early next week I will see if I can troubleshoot for you a little.

@AlexanderJuestel
Copy link
Author

@NickleDave Just to pick up the thread here again. The API Reference is working and the project was transferred to use a pyproject.toml file.

I may also have some capacity to work on some other issues prior to the start of the review if necessary to ensure that the review itself is smoothly :)

@lwasser
Copy link
Member

lwasser commented Jan 5, 2024

hey there @AlexanderJuestel just checking in on this review. we will keep this on hold until you are ready. so please just ping us. here (and also on slack) when you are ready to pick back up on the review. i think we might have an editor for this but it will depend on timing!

@AlexanderJuestel
Copy link
Author

@lwasser, sorry for being so unresponsive lately, I am getting into the final phase of my PhD. I have not made any progress on the tests yet. Would this be a reason to put the review on hold or can it be continued for now and I am completing the tests later?

Cheers
Alex

@NickleDave
Copy link
Contributor

NickleDave commented Jan 9, 2024

Hi @AlexanderJuestel, could you clarify a little?

I have not made any progress on the tests yet.

I see you have a set of tests but it looks like those tests might not be passing. Is that what you mean? I can't find where we discussed tests previously.

I am getting into the final phase of my PhD

If you are about to very busy with finishing up your PhD--believe me, I sympathize--then it might be better to put the review on hold. We don't want you to feel overwhelmed!

We also need to make an effort to adhere to the timelines that we specify in our review process, so that we're not putting too much demands on our editors and reviewers. If you don't think you could respond to reviews in anywhere near the two week timeline, then now may not be a good time to start a review. I will admit we probably are taking longer than two weeks for many reviews, but still it's not fair to editors or reviewers to have to keep a review on their to-do list for months, so we should try to minimize that.

Please say a little more about what's left to do with the tests, and whether you'll be able to meet the review timelines (roughly) given other things you have going on.

edit: all that said, we do think we have an editor as @lwasser mentioned, so if you do feel like we could get through a review here, we do really want to support you with that after all the work you've done already! I know you've worked really hard to get GemGIS where it is now

@AlexanderJuestel
Copy link
Author

Hello @NickleDave,

we mentioned the tests briefly on Slack (see below). I do not have 100% coverage with the tests right now. There are quite a few missing. If that would not be a show-stopper for now, can also proceed with the review as you suggested in your last paragraph. If not, we should put the review on hold for now and continue later.

image

@NickleDave
Copy link
Contributor

Thank you @AlexanderJuestel for clarifying, that helps.

For tests, the criterion to start review is just that you have a test suite and it runs in CI. You have met that criterion.

If I were a reviewer, I would definitely ask you to move towards full coverage as much as possible though.

Let's go ahead and start the review. You've worked hard to get GemGIS ready, and I sense that you want to take this over the finish line. I'll reach out to the editor.

@AlexanderJuestel
Copy link
Author

@NickleDave sorry for getting back to so late. Yes, let us start the review!

@yeelauren
Copy link

Hey @AlexanderJuestel !
I'll be the editor for this review :)
I'm currently searching for reviewers and taking a spin through the documentation.

@NickleDave
Copy link
Contributor

Sorry I missed your reply @AlexanderJuestel! Thank you @yeelauren for taking over 🙏

@yeelauren
Copy link

Editor response to review:

👋 @AlexanderJuestel hurrah! We've found reviewers. I like the look of the number of tutorials and examples included thus far, looking forward to more feedback from our reviewers :)

Editor comments

👋 Hi @aleksandraradecka1 and @martinfleis! Thank you for volunteering to review
for pyOpenSci!

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

  • reviewer 1 survey completed.
  • reviewer 2 survey completed.
  • reviewer 3 (if applicable)

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: ~March 29th

Reviewers: @aleksandraradecka1 @martinfleis
Due date: March 29th

@aleksandraradecka1
Copy link

Hi Everyone,

as I already spoko with @yeelauren, my review will be short. I hope it will be helpful anyway.

First of all, congrats to @AlexanderJuestel and the Team on creating such an interesting and extensive library. What I find especially good is:

Additonally, I think the library would benefit from improving the following elements:

  • 'Overview' section on GitHub provides a much clearer explanation of the library's aim than the ''Welcome page' in the Documentation. I would consider standardizing this.
  • Both in the 'Installation' and 'Tutorials and Basic Usage', part of the instructions provided place a lot of attention on things outside the scope of GemGIS functionality (e.g. Anaconda). I think such a part could be summarized in one or two sentences supplied with links to the respective resources.

Good luck with the further development of the library!
Aleksandra

@martinfleis
Copy link

martinfleis commented Mar 22, 2024

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

Hello,

congratulations on a great work @AlexanderJuestel! I am surely not target audience but I am sure there is enough users that require the additional degree of hand-holding GemGIS offers (compared to using underlying libraries directly).

I do have some comments that need to be resolved before I can tick all the boxed above.

  • Contributing.md and Contributing in docs have different contents. Is there a reason? Which one should contributors read?

  • I would recommend pinning minimal versions of dependencies

  • The package name in Readme should not be only an image without alt-text. That is not machine-readable and limits users with accessibility issues. Heading of level 1 would be optimal.

  • Badges for Continuous integration and test coverage are missing

  • Badge for supported Python versions is outdated (3.12 missing)

  • Citation information is not present in Readme

  • It seems that DevelopersGuide.md is heavily outdated. It should also be part of Contributing.

  • Tutorial notebooks need to be re-generated as some outputs look differently now (e.g. 48 shows colours)

  • I wasn't able to make PyVista plots work. They all look like this. Btw, you have some of these in documentation as well.
    Screenshot 2024-03-25 at 13 24 54

  • CI on GHA does not run on schedule and was failing last time it ran

  • Locally, tests also fail in a fresh environment on Apple Silicon

=================================== short test summary info ===================================
FAILED tests/test_utils.py::test_transform_location_coordinate - AssertionError: assert (32294411.334...009.357074925) == (32294411.334...009.357074926)
FAILED tests/test_vector.py::test_create_buffer_point - assert 78.4137122636485 == 78.41371226364849
FAILED tests/test_vector.py::test_create_buffer_linestring - assert 106.69798351111041 == 106.6979835111104
FAILED tests/test_vector.py::test_calculate_dipping_angle_linestring - assert 63.43494882292202 == 63.43494882292201
FAILED tests/test_vector.py::test_calculate_dipping_angles_linestrings - AssertionError: assert [45.0, 63.434...5051177077994] == [45, 63.43494...6505117707799]
FAILED tests/test_web.py::test_load_as_file - OSError: [Errno 45] Operation not supported: '/home/runner'
FAILED tests/test_web.py::test_load_as_files - LookupError: Directory not found. Pass create_directory=True to create a new directory
============= 7 failed, 262 passed, 6 skipped, 488 warnings in 168.77s (0:02:48) ==============
  • Linting with ruff check fails with 91 errors. See the report in Details below.
The report:
gemgis/__init__.py:33:1: F403 `from gemgis.gemgis import *` used; unable to detect undefined names
gemgis/__init__.py:34:25: F401 [*] `gemgis.vector` imported but unused
gemgis/__init__.py:35:25: F401 [*] `gemgis.raster` imported but unused
gemgis/__init__.py:36:24: F401 [*] `gemgis.utils` imported but unused
gemgis/__init__.py:37:32: F401 [*] `gemgis.visualization` imported but unused
gemgis/__init__.py:38:22: F401 [*] `gemgis.web` imported but unused
gemgis/__init__.py:39:33: F401 [*] `gemgis.postprocessing` imported but unused
gemgis/__init__.py:40:23: F401 [*] `gemgis.misc` imported but unused
gemgis/__init__.py:41:1: F403 `from gemgis.download_gemgis_data import *` used; unable to detect undefined names
gemgis/download_gemgis_data.py:85:16: F401 `pooch` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/gemgis.py:25:25: F401 [*] `pandas.core.frame` imported but unused
gemgis/postprocessing.py:68:25: F401 `gempy` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/postprocessing.py:255:72: E712 Comparison to `True` should be `cond is True` or `if cond:`
gemgis/postprocessing.py:506:25: F401 `gempy` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/postprocessing.py:720:5: F841 Local variable `option1_1` is assigned to but never used
gemgis/postprocessing.py:726:5: F841 Local variable `option1_2` is assigned to but never used
gemgis/postprocessing.py:730:5: F841 Local variable `option1_3` is assigned to but never used
gemgis/postprocessing.py:747:5: F841 Local variable `option2_1` is assigned to but never used
gemgis/postprocessing.py:753:5: F841 Local variable `option2_2` is assigned to but never used
gemgis/postprocessing.py:759:5: F841 Local variable `option2_3` is assigned to but never used
gemgis/postprocessing.py:765:5: F841 Local variable `option2_4` is assigned to but never used
gemgis/postprocessing.py:771:5: F841 Local variable `option2_5` is assigned to but never used
gemgis/postprocessing.py:777:5: F841 Local variable `option2_6` is assigned to but never used
gemgis/postprocessing.py:783:5: F841 Local variable `option2_7` is assigned to but never used
gemgis/postprocessing.py:789:5: F841 Local variable `option2_8` is assigned to but never used
gemgis/postprocessing.py:795:5: F841 Local variable `option2_9` is assigned to but never used
gemgis/postprocessing.py:801:5: F841 Local variable `option2_10` is assigned to but never used
gemgis/postprocessing.py:807:5: F841 Local variable `option2_11` is assigned to but never used
gemgis/postprocessing.py:820:5: F841 Local variable `option3_1` is assigned to but never used
gemgis/postprocessing.py:824:5: F841 Local variable `option3_2` is assigned to but never used
gemgis/postprocessing.py:828:5: F841 Local variable `option3_3` is assigned to but never used
gemgis/postprocessing.py:883:12: E713 [*] Test for membership should be `not in`
gemgis/postprocessing.py:895:12: E713 [*] Test for membership should be `not in`
gemgis/postprocessing.py:903:12: E713 [*] Test for membership should be `not in`
gemgis/postprocessing.py:972:5: F841 Local variable `roation` is assigned to but never used
gemgis/postprocessing.py:976:5: F841 Local variable `sizescale` is assigned to but never used
gemgis/postprocessing.py:1062:25: F401 `gempy` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/postprocessing.py:1073:16: E713 [*] Test for membership should be `not in`
gemgis/postprocessing.py:1073:94: E712 Comparison to `True` should be `cond is True` or `if cond:`
gemgis/postprocessing.py:1090:16: E713 [*] Test for membership should be `not in`
gemgis/raster.py:31:27: F401 [*] `rasterio.mask.mask` imported but unused
gemgis/raster.py:32:30: F401 [*] `shapely.geometry.box` imported but unused
gemgis/utils.py:545:25: F401 `gempy` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/utils.py:2206:48: E714 [*] Test for object identity should be `is not`
gemgis/utils.py:2408:5: F841 Local variable `bbox` is assigned to but never used
gemgis/utils.py:2639:47: F401 `segysak.segy.segy_writer` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/utils.py:2723:47: F401 `segysak.segy.segy_writer` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/vector.py:8140:56: E713 [*] Test for membership should be `not in`
gemgis/visualization.py:111:26: E712 Comparison to `False` should be `cond is False` or `if not cond:`
gemgis/visualization.py:1670:60: E712 Comparison to `True` should be `cond is True` or `if cond:`
gemgis/visualization.py:1670:114: E712 Comparison to `True` should be `cond is True` or `if cond:`
gemgis/visualization.py:2941:16: F401 `mplstereonet` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:73:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:184:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:186:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:187:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:188:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:189:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:379:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:381:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:382:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:383:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:384:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:527:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:528:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:529:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:530:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:602:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:604:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:605:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:606:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:607:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:701:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:702:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:703:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:704:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:780:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:782:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:783:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:784:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:785:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:874:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:876:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:877:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:878:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:879:9: F841 Local variable `__all__` is assigned to but never used
gemgis/web.py:988:16: F401 `owslib` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:990:32: F401 `owslib.wms.WebMapService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:991:32: F401 `owslib.wfs.WebFeatureService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:992:32: F401 `owslib.wcs.WebCoverageService` imported but unused; consider using `importlib.util.find_spec` to test for availability
gemgis/web.py:993:9: F841 Local variable `__all__` is assigned to but never used
Found 91 errors.
[*] 17 fixable with the `--fix` option (33 hidden fixes can be enabled with the `--unsafe-fixes` option).

@martinfleis
Copy link

@AlexanderJuestel @yeelauren I have edited the post above to include comments from the first round of the review.

@AlexanderJuestel
Copy link
Author

@martinfleis @aleksandraradecka1 @yeelauren Thank you for your reviews! If you don't mind, I would like to tackle these in two weeks after I have submitted my PhD thesis and can focus on programming a little bit :)

@yeelauren
Copy link

Hey Everyone, I'll be ooo for this week and next. But wanted to introduce a second reviewer : @SimonMolinsky 👋🤝

@SimonMolinsky
Copy link
Collaborator

SimonMolinsky commented Apr 28, 2024

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8


Review Comments

@SimonMolinsky
Copy link
Collaborator

@AlexanderJuestel Thank you for your work. It was a pleasure testing GemGIS! I've checked all tutorials and the package itself, and here is the wrap-up of my notes (and I won't repeat the same problems stated by @martinfleis ). Not all points have the same weight, but for me as a reviewer, points 1, 2, 3, 5, 8, 9, 10, 11 are critical.

  1. Required dependencies. I've seen that you assume that some high-level packages (e.g., geopandas) will install other packages (e.g., fiona), so there is no need to list the lower-level packages as required dependencies. I recommend listing all imported packages as the core dependencies, even if those are supposedly installed by listed packages.
  2. Optional vs. required dependencies. Some dependencies enlisted as optional should be required if their absence means the process fails. Why? For example, I can install GemGIS and its core dependencies in the environment with Python 3.12, but some optional dependencies won't work in these settings (I couldn't install GemPy in an environment with Py==3.18).
  3. Optional dependencies - installation. Please include a paragraph on how to install the full version of the package with all dependencies. It might be helpful for people who are not Python devs :)
  4. Dependencies imports in tutorials: In some tutorials, dependencies are imported multiple times throughout code cells. I recommend doing all imports in the first cell; this way, a user will quickly find something that needs to be added. I've seen that in the later tutorials, you put all imports in the first cell. (Tutorial 1, 3)
  5. Contribution: information about the installation of all dev dependencies.
  6. (Docs) What interpolation methods are used? - missing description
  7. gg.vector.extract_xy()
    drop_id: Isn't dropping columns without the user's consent too risky? Dataframe cleaning should be the user's responsibility. The method name focuses on extracting XY coordinates and removing columns might be an unwanted surprise. Comparing the function to gg.vector.extract_xy_linestring() - this one does not remove columns. (When I tried more tutorials, I understood the reasoning behind this functionality; it's up to you what you want to do with this issue).
  8. gg.raster.calculate_difference() | gg.raster.resize_raster() | gg.raster.resize_by_array() | gg.raster.extract_contour_lines_from_raster()

When input arrays have different sizes, I get Import Error: ModuleNotFoundError: Scikit Image package is not installed. Use pip install scikit-image to install the latest version

The case when optional dependency should be listed as required.
9. Tutorials & API - imports. Here, we have multiple issues, mainly related to imports. I recommend writing one paragraph on installing additional dependencies with links to these packages:

  • Tutorial 14: rioxarray is required for gg.visualization.read_raster().
  • Tutorial 17: Info about mplstereonet installation should be included.
  • Tutorial 18: GemPy installation.
  • Tutorial 19: owslib package.
  • Tutorial 21: gg.web.load_as_files() - ModuleNotFoundError: tqdm package is not installed. Use pip install tqdm to install the latest version
  • Tutorial 22: gempy import
  • Tutorial 23: a/a
  • Tutorial 26: gg.misc.load_pdf(path=file_path + 'test_data.pdf') - ModuleNotFoundError: PyPDF package is not installed. Use pip install pypdf to install the latest version
  • Tutorial 28: gg.utils.load_surface_colors() - ModuleNotFoundError: xmltodict package is not installed. Use pip install xmltodict to install the latest version
  • Tutorial 31: gg.utils.get_location_coordinate() - ModuleNotFoundError: GeoPy package is not installed. Use pip install geopy to install the latest version
  • Tutorial 35: gg.visualization.add_row_to_boreholes() - ModuleNotFoundError: tqdm package is not installed. Use pip install tqdm to install the latest version
  • Tutorial 38: bokeh is required
  • Tutorial 52: gg.visualization.read_raster() requires rioxarray.
  • Tutorial 54: gempy import
  • Tutorial 55: a/a
  • Tutorial 56: segysak import
  • Tutorial 57: gempy import
  • Tutorial 61: gempy import
  • Tutorial 62: gg.raster.extract_contour_lines_from_raster() error (`ModuleNotFoundError: No module named 'skimage")
  • Tutorial 65: segysak import
  • Tutorial 67: gempy import & pv.set_jupyter_backend('client') results in ImportError: Please install trameandipywidgets to use this feature.
  • Tutorial 68: gempy import
  • Tutorial 69: missing gemgis import
  • Tutorial 70: segysak import
  1. Tutorials - missing data:
  • Tutorial 15: data file_path + '11_NI_sm.ts' seems to be not downloaded.
  • Tutorial 66: The tutorial's data is not available (it's not possible to download it)
  • Tutorial 67: The tutorial's data does not exist.
  • Tutorial 68: The tutorial's data does not exist.
  • Tutorial 69: The tutorial's data does not exist.
  • Tutorial 70: The tutorial's data does not exist.
  • Tutorial 71: The tutorial's data is not provided.
  1. Tutorials: misc - important
  • Tutorial 5: Number of points, default all or 100? Why does a small number of points change the extent?
  • Tutorial 10: Got this error in cell 18:

"`python
p = pv.Plotter()

p.add_mesh(mesh=grid, scalars=grid["Elevation"], cmap='gist_earth')

p.show_grid(color='black')
p.set_background(color='white')
p.show()


"`text
KeyError: 'Data array (Elevation) not present in this dataset.'

because my version of PyVista returned a key with the name Elevation [m] (not Elevation). I have PyVista==0.43.5

The final view was inverted in my notebook:

tutorial10-final-result

  • Tutorial 13: There is a section (Georeference Cross Section) which states:
The first step is to georeference the cross section. A tutorial on how to georeference maps is provided on this documentation page.

Shouldn't there be a link from the sentence this documentation page? I see screens taken from QGIS, but your tutorial (no 13) does not describe how to create cross-sections in QGIS.

  • Tutorial 32: cell 11: `AttributeError: 'MultiBlock' object has no attribute 'active_scalars_name"
  • Tutorial 41: cell 8 (or cell 3 if we preserve ordering):
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 4
      1 import geopandas as gpd
      2 import fiona
----> 4 gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'
      6 layer_list = fiona.listlayers(file_path + 'KML_Samples.kml')
      7 layer_list

AttributeError: 'NoneType' object has no attribute 'drvsupport'
  • Tutorial 42: don't know if this view from pyvista is what you were looking for
  • Tutorial 49: cell 21:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[28], line 1
----> 1 spline = pv.Spline(np.asarray(linestring),15)
      2 spline

File ~/miniforge3/envs/gemgis/lib/python3.12/site-packages/pyvista/core/utilities/parametric_objects.py:56, in Spline(points, n_points)
     18"  ""Create a spline from points.
     19 
     20 Parameters
   (...)
     53 
     54 """
     55 spline_function = _vtk.vtkParametricSpline()
---> 56 spline_function.SetPoints(pyvista.vtk_points(points, False))
     58 # get interpolation density
     59 u_res = n_points

File ~/miniforge3/envs/gemgis/lib/python3.12/site-packages/pyvista/core/utilities/points.py:48, in vtk_points(points, deep, force_float)
     46 # verify is numeric
     47 if not np.issubdtype(points.dtype, np.number):
---> 48     raise TypeError('Points must be a numeric type')
     50 if force_float:
     51     if not np.issubdtype(points.dtype, np.floating):

TypeError: Points must be a numeric type
  • Tutorial 63: gemgis is not imported at the beginning (It is required in the 8th cell.) / Section Combined Function - function show_well_log_along_well was not defined.
  • Tutorial 71: block [4] is:

"`python
gemgis.raster.read_raster_gdb(path=file_path + 'OpenFileGDB.gdb',
crs='EPSG:25832',
path_out=file_path)


But it should be:

"`python
gg.raster.read_raster_gdb(path=file_path + 'OpenFileGDB.gdb',
                                          crs='EPSG:25832',
                                          path_out=file_path)
  1. Tutorials - misc - less important & optional

Tutorial 4: Maybe cropping by a polygon of irregular shape? This has nothing to do with the package's functionality or the tutorial's quality.

  • Tutorial 11: Multiple empty cells.
  • Tutorial 12: You have an error in the last cell of the tutorial. (But it works!)
  • Tutorial 24: cell 12 - better if the output is hidden :)
  • Tutorial 52: The first block with an example in the tutorial is commented on.
  1. Linitng with PyCharm linter shows me multiple non-critical problems with your code. I recommended using a linter and checking if there are no hidden bugs in the code. (You might use ruff as Martin has done).

@yeelauren
Copy link

@AlexanderJuestel 👋 Checking in on this since it's been a over a month since the last review!
If you need more time to address the comments and write the ol' dissertation, I'm happy to put this on hold.

@AlexanderJuestel
Copy link
Author

Yes, let's put this on hold for now. I am defending next Monday and will then slowly work on the suggestions and improvements :)

@yeelauren yeelauren added on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review 4/reviews-in-awaiting-changes and removed 3/reviewers-assigned 1/editor-assigned labels Jun 21, 2024
@AlexanderJuestel
Copy link
Author

Hi Everyone,

[...]

First of all, congrats to @AlexanderJuestel and the Team on creating such an interesting and extensive library. What I find especially good is:

[....]

The first point about renaming was addressed in cgre-aachen/gemgis@cbc9f31 (mentioned in cgre-aachen/gemgis#337)

@AlexanderJuestel
Copy link
Author

@ aleksandraradecka1, could you please elaborate on what you meant by this comment? :)

This was referenced Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4/reviews-in-awaiting-changes on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review ⌛ pending-maintainer-response
Projects
Status: on-hold-or-maintainer-unresponsive
Development

No branches or pull requests

9 participants