Adding a few rof notebooks #126

Open
wants to merge 77 commits into
base: main

nmizukami
Member

@nmizukami nmizukami commented Aug 23, 2024

Initial commits for ROF notebooks.

The final set of notebooks is intended to mimic the old ROF diagnostic plots.

This PR is just starting with a few notebooks.

All Submissions:

  • Have you followed the guidelines in our Contributor's Guide (including the pre-commit check)?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you linted your code locally prior to submission?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully tested your changes locally?

@TeaganKing
Collaborator

Hey @nmizukami ! Thanks so much for adding these notebooks! I made those changes we discussed to cupid-run and the config file to include rof in running cupid and in the jupyter book table of contents.

One note: I think we'll need to provide math in cupid-analysis if you prefer to use that for sqrt rather than numpy in metrics.py? I don't think this should be a problem, but just let me know if you'd like me to do that?
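For reference, `math` ships with the Python standard library, so a scalar square root should work without an extra dependency. Below is a minimal sketch of an RMSE helper written that way; `rmse` and its arguments are hypothetical names for illustration and are not taken from metrics.py.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("sim and obs must be non-empty and the same length")
    # Mean of squared differences, then a scalar square root from the stdlib.
    mean_sq = sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(sim)
    return math.sqrt(mean_sq)
```

For array inputs, `numpy.sqrt` remains the natural choice because it operates element-wise.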

Would you be able to pull these changes in locally, test running your notebook with cupid-run and make sure that things look as you expect?

@TeaganKing TeaganKing self-requested a review August 23, 2024 20:11
@TeaganKing TeaganKing added the lnd and enhancement labels Aug 23, 2024
@nmizukami
Member Author

Hi Teagan (@TeaganKing), the notebook almost ran with cupid-run -rof. The one error came from reading a GeoPackage file (GIS vector data) via geopandas.

You can see /glade/work/mizukami/CUPiD/examples/coupled_model/month_annual_flow.ipynb.

cupid prints this on screen:

ERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.

I have seen this before. I don't fully understand the error, but it comes from the pyogrio package that is installed with geopandas; pyogrio is used internally by geopandas.

When I ran the notebook outside cupid, it ran fine, but I had activated the Python [conda-env:cupid-analysis] environment in JupyterHub. I see another one called cupid-analysis, which I believe is what cupid actually uses. I saw a similar error when I used cupid-analysis. What is the difference between Python [conda-env:cupid-analysis] and cupid-analysis?

Hopefully I am simply setting something (e.g., the environment) incorrectly...

@TeaganKing
Collaborator

Hi @nmizukami , Sorry I let this slip! In the environment in which this was working, did you have a particular version of geopandas or pyogrio pinned? I could also add that to the environment yaml specification. Or when you previously ran into this error, did you have another solution?

This error may be because PROJ is already installed-- I'm not sure where at this point, but can look into that.

@nmizukami
Member Author

nmizukami commented Aug 27, 2024

Hi @TeaganKing, one hint is that I can run it outside cupid-run: I can run the notebook manually on JupyterHub with the [conda-env:cupid-analysis] env on, but NOT with cupid-analysis on (I get a similar PROJ error). You can see the two similar envs in Jupyter in the image below. I believe the package versions should be OK. I can think about this more... I don't know what the difference is between [conda-env:cupid-analysis] and cupid-analysis.

[Screenshot, 2024-08-27: Jupyter kernel list showing the two similarly named environments]

@TeaganKing
Collaborator

TeaganKing commented Aug 27, 2024

It sounds like there may be some issue related to the ipykernel installation. I think one of these might be the installation from ipykernel (a soft linked conda environment) and the other may be a conda environment found elsewhere (possibly an outdated cupid-analysis that doesn't include geopandas?). Mike mentioned that the ipykernel installation basically creates a softlink to an environment, which made me think that could be an inconsistency.

I had updated a test environment but not my actual cupid-analysis environment; I'm doing that now and will test your notebook out. This is probably not the most efficient workflow, but I wonder if it might also be worth removing your cupid-analysis environment, see if it's still listed as an option in JupyterHub, make sure that both versions are removed, and then re-install a clean version?

@nmizukami
Member Author

I took the following steps in the terminal to remove the cupid-analysis env and reinstall it.

mamba remove --name cupid-analysis --all
mamba env create -f environments/cupid-analysis.yml

It did not fix it. After removing cupid-analysis, jupyterhub still showed cupid-analysis, though [conda-env:cupid-analysis] was gone.
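One possible explanation (an assumption, not confirmed in this thread) is that the plain cupid-analysis entry is a kernelspec registered separately from the conda environment, so it survives when the environment is removed. The sketch below reads `kernel.json` files to show which interpreter each registered kernel launches; `kernel_interpreters` is a hypothetical helper, and the per-user directory is only the usual Linux default (`jupyter kernelspec list` reports the actual locations).

```python
import json
from pathlib import Path


def kernel_interpreters(kernel_dirs):
    """Map each kernel name found under kernel_dirs to the command that launches it."""
    found = {}
    for base in kernel_dirs:
        base = Path(base)
        if not base.is_dir():
            continue  # skip locations that do not exist on this machine
        for spec in sorted(base.glob("*/kernel.json")):
            argv = json.loads(spec.read_text()).get("argv", [])
            found[spec.parent.name] = argv[0] if argv else None
    return found


# Usual per-user kernelspec location on Linux (an assumption).
user_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
for name, exe in kernel_interpreters([user_dir]).items():
    print(f"{name}: {exe}")
```

If the interpreter path printed for a kernel points at an environment other than the freshly created one, that kernel is likely the stale entry.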

@nmizukami
Member Author

Hi @TeaganKing, I am trying to run conda list to see which packages are in the cupid-analysis env when running cupid-run. Unfortunately, including conda list in the notebook causes an error in cupid-run:

SyntaxError: An error happened when checking the source code. 
:25:7: invalid syntax

conda list
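`conda list` is a shell command rather than Python, which is probably why the source check rejects it. As a hedged alternative, the active interpreter and the versions of specific packages can be inspected from pure Python with the standard library; `installed_versions` is a hypothetical helper and the package names are just the ones discussed in this thread.

```python
import sys
from importlib import metadata


def installed_versions(names):
    """Return {name: version string, or None if the distribution is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Shows which environment the kernel is really running in.
print("interpreter:", sys.executable)
print(installed_versions(["geopandas", "pyogrio", "pyproj"]))
```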

@nmizukami
Member Author

casper-login1:/glade/work/mizukami/CUPiD/examples/coupled_model (main_adding_rof)> cupid-run -rof

/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py:455: UserWarning: 
========================================================================================= DAG render with warnings =========================================================================================
----------------------------------------------------------------- NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb') ------------------------------------------------------------------
----------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/infrastructure/index.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
============================================================================================ Summary (2 tasks) =============================================================================================
NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb')
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
========================================================================================= DAG render with warnings =========================================================================================

  warnings.warn(str(warnings_))
Executing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.03cell/s]
Building task 'month_annual_flow':  50%|███████████████████████████████████████████████████████████████████                                                                   | 1/2 [00:02<00:02,  2.92s/itERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.
                                                                                                                                                                                                           /glade/u/apps/opt/conda/condabin/conda                                                                                                                                      | 5/69 [00:20<03:39,  3.44s/cell]
Executing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍               | 62/69 [03:53<00:26,  3.76s/cell]
Building task 'month_annual_flow': 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:56<00:00, 118.06s/it]
Traceback (most recent call last):
  File "/glade/work/mizukami/conda-envs/cupid-dev/bin/cupid-run", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/CUPiD/cupid/run.py", line 290, in run
    dag.build()
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 557, in build
    report = callable_()
             ^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 662, in _build
    raise build_exception
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 591, in _build
    task_reports = self._executor(dag=self, show_progress=show_progress)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/executors/serial.py", line 203, in __call__
    raise DAGBuildError(str(exceptions_all))
ploomber.exceptions.DAGBuildError: 
============================================================================================= DAG build failed =============================================================================================
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [24]":
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 2
      1 column_stat = []
----> 2 gauge_shp_all_case = gauge_shp.copy(deep=True)
      3 for case, grid_name in cases.items():
      4     gauge_shp_all_case = gauge_shp_all_case.merge(
      5         gauge_shp1[case][["id", f"{error_metric}_{grid_name}"]],
      6         left_on="id",
      7         right_on="id",
      8     )

NameError: name 'gauge_shp' is not defined

ploomber.exceptions.TaskBuildError: Error when executing task 'month_annual_flow'. Partially executed notebook available at /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/rof/month_annual_flow.ipynb
ploomber.exceptions.TaskBuildError: Error building task "month_annual_flow"
============================================================================================= Summary (1 task) =============================================================================================
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
============================================================================================= DAG build failed =============================================================================================

@nmizukami
Member Author

nmizukami commented Aug 29, 2024

Hi @TeaganKing,
The small piece of good news is that I got it to run without the geopandas error. The trick is to add this
os.environ['PROJ_LIB']='/glade/work/mizukami/conda-envs/cupid-analysis/share/proj'
before loading geopandas.
However, I don't think this is a permanent solution. I will still try to consult with CISL.
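A slightly more portable variant of this workaround, sketched under the assumption that the active environment follows the usual conda layout (`<prefix>/share/proj/proj.db`), derives the path from `sys.prefix` instead of hardcoding a user directory. `point_proj_at_env` is a hypothetical helper; newer PROJ releases also read `PROJ_DATA`, so this may need adjusting for other versions.

```python
import os
import sys
from pathlib import Path


def point_proj_at_env(prefix=sys.prefix):
    """Set PROJ_LIB to <prefix>/share/proj if that directory holds proj.db.

    Returns the directory used, or None if no proj.db was found there.
    """
    proj_dir = Path(prefix) / "share" / "proj"
    if (proj_dir / "proj.db").is_file():
        os.environ["PROJ_LIB"] = str(proj_dir)
        return proj_dir
    return None


# Call this before geopandas (and therefore pyogrio) is imported.
point_proj_at_env()
```

This only sidesteps the conflict with the system PROJ installation; it does not explain why the wrong `proj.db` is found in the first place.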

I was able to create /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/_build/html/index.html
How do you usually open this on the HPC systems? I tried opening Firefox on derecho/casper, but it is very slow. I wonder if there is another way to view it.

@TeaganKing
Collaborator

Hi @nmizukami , I'm glad that is temporarily working (but of course we need this to work for any user's environment). Yes, I think this would be a good conversation to have with CISL.

Regarding looking at output, see the second section on this page for recommendations on NCAR machines.

@TeaganKing TeaganKing mentioned this pull request Sep 10, 2024
6 tasks
@TeaganKing
Collaborator

TeaganKing commented Sep 10, 2024

Hey @nmizukami , I added a PR to bring rof into run.py. And then I realized these changes are already in this PR... so apologies-- feel free to ignore that!

@TeaganKing
Collaborator

TeaganKing commented Sep 10, 2024

To-do:

  • update readme to include 'rof' on line 104: -rof, --river-runoff Run river runoff component diagnostics

@nmizukami
Member Author

In month_annual_flow.ipynb, I also have a few additional comments (but the file was too large to render/comment on particular lines):
All sections: Can you please remove empty cells? I think the ‘go back to top’ sections could be removed if this is being run all at once?

Removed the empty cells and the ‘go back to top’ links.

Section 1. Are the lists of cases intended to be regularly used options? Is the yaml file that’s loaded going to be consistent, or is it intended to be updated by user?

Yes, each case is a CESM case set up by a CESM modeler, and it has to be provided by the user. I added some comments on what needs to be provided here. Multiple cases are allowed, so that results from different cases can be compared.

2.1 – there’s a typo ‘Mmonthly’

Fixed.

2.6 - can you add a docstring to define what’s happening in this function?

This function has been removed.

3.1 – I think this cell may have an empty slice error.

I think I know how this happens: the observation record length is inconsistent across sites (some sites provide data for only a short period), so depending on the analysis (or simulation) period, no observed flow data may be available. Can I fix this in another PR?
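One way such a statistic could be guarded is sketched below with the standard library only. The notebook presumably uses numpy or xarray reductions, so this is an illustration of the idea rather than the actual fix; `mean_ignoring_missing` is a hypothetical name.

```python
import math
from statistics import fmean


def mean_ignoring_missing(values):
    """Mean of the non-NaN values, or NaN when there is nothing left to average."""
    valid = [v for v in values if not math.isnan(v)]
    # An empty selection returns NaN explicitly instead of failing or warning.
    return fmean(valid) if valid else math.nan
```

A site with no observations in the analysis period then yields NaN, which downstream code can skip or flag.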

@TeaganKing
Collaborator

Thanks for addressing some of these comments!

I made issue ticket #140 for the empty slice error-- that's fine by me to address in another PR as long as the notebook otherwise runs smoothly (which it does) and users are informed (by the issue ticket) of the error that needs to be fixed.

Ok, I wanted to make sure that we are using the cases values in the config.yml file and that users don't actually need to update anything in the notebook once the cell is tagged as a parameters cell.

@mnlevy1981
Collaborator

I started to look at this, and I have a lot of questions and suggestions -- @nmizukami and @TeaganKing could we try to find a time early next week to meet? I'd like to discuss a few things that might be tough to squeeze into an in-line review on this PR. Some initial comments, though (and maybe these changes will make it a little easier to go back through line-by-line asynchronously):

  1. It looks like this is failing the pre-commit style checks
  2. I noticed in month_annual_flow.ipynb you have a logical flag parallel that enables using PBSCluster when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb you'll see how CUPiD already passes a serial flag and uses a LocalCluster when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of code
  3. The notebooks in examples/nblibrary/rof should not have any output in them
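The pattern described in point 2 might look roughly like the sketch below. The actual cell in ocean_surface.ipynb is not reproduced in this thread, so this is only an approximation: `serial` and `lc_kwargs` are the parameter names that appear in the cupid-run log above, while `start_client` is a hypothetical helper.

```python
def start_client(serial, lc_kwargs=None):
    """Return a dask Client backed by a LocalCluster, or None when running serially."""
    if serial:
        return None
    # Imported lazily so that serial runs do not need dask.distributed at all.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**(lc_kwargs or {}))
    return Client(cluster)


# With serial=True no cluster is created and the notebook runs in-process.
client = start_client(serial=True)
```

Because `LocalCluster` runs on whatever node the notebook is executing on, nothing in this block is specific to casper or derecho.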

@nmizukami
Member Author

Hi Mike (@mnlevy1981) and Teagan (@TeaganKing), yes, I can meet on Monday or Tuesday (my calendar is up to date). I was also wondering about PBSCluster (the 2nd point).

@nmizukami
Member Author

nmizukami commented Oct 7, 2024

To-do

  • replaced PBSCluster with LocalCluster, following the example in the ocn notebook (using the serial logical flag)
  • activated cupid-dev and then installed pre-commit under CUPiD/.git/hooks.
  • clear outputs from all the cells.
  • add key_metrics/config.yml and modify case:grid input.

@nmizukami
Member Author

Updated key_metrics/config.yml and coupled_model/config.yml for rof, and modified two notebooks based on the config changes so that they now run.

A review is needed, and some science questions came up (e.g., what should we do when plotting a time period for which no observations are available? Are the other notebooks comparing model outputs with observations?)

@TeaganKing
Collaborator

Hey @nmizukami , thanks for these updates.

Not all notebooks are comparing with observations, but you can see an example of an observational comparison in the glacier notebook & corresponding config.yml details.

I think that if you are plotting for a time period where observations are not available, perhaps a warning statement that the obs are unavailable would be useful?
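A minimal sketch of such a check, assuming the simulation and observation records are each described by a (first year, last year) pair; `overlap_years` and the tuple layout are hypothetical and not part of the notebooks.

```python
import warnings


def overlap_years(sim_years, obs_years):
    """Return the (first, last) year covered by both records, or None if they do not overlap."""
    first = max(sim_years[0], obs_years[0])
    last = min(sim_years[1], obs_years[1])
    if first > last:
        warnings.warn(
            f"No observations available for {sim_years[0]:04d}-{sim_years[1]:04d}; "
            "plotting simulation only.",
            stacklevel=2,
        )
        return None
    return first, last
```

The caller can then add the observed series to a plot only when the return value is not None.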

@TeaganKing
Collaborator

And I'll review after our discussion on Thursday.

@nmizukami
Member Author

Right now I am pointing to case /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing/b.e23_alpha16b.BLT1850.ne30_t232.054

The time period for this case is years 0001-0102, for which there are certainly no observations for any component. So I thought this config was meant to compare the simulation with some base simulation (a model-to-model comparison), not to validate the model component against observations.

I just wanted to understand the context of this setup. With the current config, the rof notebooks look less interesting, but technically the notebooks work now (I believe).

If the config points to any CESM case that covers the 20th-21st centuries, the rof notebook automatically adds the observed streamflow to the plots and compares the simulations with observations.

@TeaganKing
Collaborator

The key setup here that's different from the coupled-model example is in the 'global params' section of the config file, where we have both a case name for the case you're looking at, as well as the base_case_name for a comparison case. The observations are defined separately in each individual notebook config section at this point.

@TeaganKing
Collaborator

It sounds good that plots are generated without obs when obs do not exist.

@TeaganKing
Collaborator

@nmizukami is planning to do the following:

  • implement comparison with base_case in addition to obs
  • remove years that overwrite config file start/end years
  • include analysis-period configuration parameter to specify e.g. 10 years so that users don't need to run 100 years unless they really want to do so.
  • test cupid-run from key_metrics directory
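The analysis-period item could be sketched as below. Whether the window should cover the first or the last years of the run is not stated here, so taking the last `n_years` is an assumption, and `analysis_window` with its arguments is a hypothetical helper rather than an existing CUPiD parameter.

```python
def analysis_window(start_year, end_year, n_years=None):
    """Return (first, last) years to analyze: the last n_years of the run, or the whole run."""
    if n_years is not None and n_years < 1:
        raise ValueError("n_years must be at least 1")
    run_length = end_year - start_year + 1
    if n_years is None or n_years >= run_length:
        # No limit requested, or the run is shorter than the requested window.
        return start_year, end_year
    return end_year - n_years + 1, end_year
```

For a run spanning years 1 to 102, a 10-year window would cover years 93 to 102, while omitting `n_years` keeps the full period from the config file.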

Once these items are done, @TeaganKing can review.

@nmizukami
Member Author

nmizukami commented Oct 18, 2024

@nmizukami is planning to do the following:

  • implement comparison with base_case in addition to obs
  • remove years that overwrite config file start/end years
  • include analysis-period configuration parameter to specify e.g. 10 years so that users don't need to run 100 years unless they really want to do so.
  • test cupid-run from key_metrics directory

Once these items are done, @TeaganKing can review.

Hi @TeaganKing, all are done!

Labels: enhancement, lnd
3 participants