Adding a few rof notebooks #126

nmizukami · 2024-08-23T19:26:51Z

Initial commits for ROF notebooks.

The eventual set of notebooks is intended to mimic the old ROF diagnostic plots.

This PR is just starting with a few notebooks.

All Submissions:

Have you followed the guidelines in our Contributor's Guide (including the pre-commit check)?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you linted your code locally prior to submission?

Changes to Core Features:

Have you added an explanation of what your changes do and why you'd like us to include them?
Have you successfully tested your changes locally?


TeaganKing · 2024-08-23T19:46:22Z

Hey @nmizukami ! Thanks so much for adding these notebooks! I made the changes we discussed to cupid-run and the config file so that rof is included when running CUPiD and in the Jupyter Book table of contents.

One note: if you prefer to use math rather than numpy for sqrt in metrics.py, I think we'll need to provide math in cupid-analysis. I don't think this should be a problem; just let me know if you'd like me to do that.
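For context on the math vs numpy choice (math is part of the Python standard library, so it is always importable): math.sqrt accepts a single number, while np.sqrt works element-wise on whole arrays, which is usually what metric code needs. A minimal sketch; rmse here is a hypothetical helper, not the actual function in metrics.py:

```python
import math

import numpy as np


def rmse(sim, obs):
    """Root-mean-square error between two equal-length series (hypothetical helper)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # np.sqrt/np.mean operate on arrays, so no Python-level loop is needed
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


# math.sqrt takes one scalar...
print(math.sqrt(4.0))  # 2.0
# ...while np.sqrt is applied element-wise to an array
print(np.sqrt(np.array([1.0, 4.0, 9.0])))  # [1. 2. 3.]
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```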

Would you be able to pull these changes in locally, test running your notebook with cupid-run and make sure that things look as you expect?

nmizukami · 2024-08-23T21:08:31Z

Hi Teagan (@TeaganKing), the notebook almost ran with cupid-run -rof. The one error came from reading a GeoPackage file (GIS vector data) via geopandas.

You can see /glade/work/mizukami/CUPiD/examples/coupled_model/month_annual_flow.ipynb.

cupid prints this on screen:

ERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.

I have seen this before. I don't fully understand the error, but it comes from the pyogrio package that is installed with geopandas; geopandas uses pyogrio internally.

When I ran the notebook outside cupid, it ran fine, but there I had activated the Python [conda-env:cupid-analysis] environment in JupyterHub. I also see another one called cupid-analysis, which I believe is the one cupid actually uses, and I saw a similar error when I used it. I'm wondering: what is the difference between Python [conda-env:cupid-analysis] and cupid-analysis?

Hopefully I am simply setting something (e.g., the environment) incorrectly...

TeaganKing · 2024-08-27T17:56:46Z

Hi @nmizukami , Sorry I let this slip! In the environment in which this was working, did you have a particular version of geopandas or pyogrio pinned? I could also add that to the environment yaml specification. Or when you previously ran into this error, did you have another solution?

This error may be because PROJ is already installed elsewhere; I'm not sure where at this point, but I can look into that.
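The error text itself names a proj.db under the system spack PROJ 8.2.1 tree rather than the conda env, which fits the "another PROJ installation" reading. A quick, hedged way to see what the running process is likely to pick up is to print the PROJ-related environment variables and check whether the active env ships its own proj.db. The helper name is hypothetical, and it assumes a conda-style layout with proj.db under <prefix>/share/proj (the same layout as the path in the workaround later in this thread):

```python
import os
import sys
from pathlib import Path

# Variables PROJ (and GDAL, which pyogrio wraps) consult for their data files.
# PROJ_DATA is the name used by recent PROJ releases; PROJ_LIB is the older name.
PROJ_ENV_VARS = ("PROJ_DATA", "PROJ_LIB", "GDAL_DATA")


def proj_diagnostics(prefix: str = sys.prefix) -> dict:
    """Report PROJ-related env vars and whether `prefix` contains its own proj.db."""
    env_proj_db = Path(prefix) / "share" / "proj" / "proj.db"
    return {
        "python": sys.executable,
        "env_vars": {name: os.environ.get(name) for name in PROJ_ENV_VARS},
        "env_proj_db": str(env_proj_db),
        "env_proj_db_exists": env_proj_db.is_file(),
    }


if __name__ == "__main__":
    for key, value in proj_diagnostics().items():
        print(f"{key}: {value}")
```

Running this both from the JupyterHub kernel and inside a cupid-run notebook should show whether the two see different variables or interpreters.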

nmizukami · 2024-08-27T18:11:21Z

Hi @TeaganKing, one hint is that I can run outside cupid-run: I can run the notebook manually on JupyterHub with the [conda-env:cupid-analysis] env, but NOT with cupid-analysis (I get a similar PROJ error). You can see the two similar envs in Jupyter in the image below. I believe the package versions should be OK. I'll think about this more... I don't know what the difference between [conda-env:cupid-analysis] and cupid-analysis is.

TeaganKing · 2024-08-27T20:45:30Z

It sounds like there may be some issue related to the ipykernel installation. I think one of these might be the installation from ipykernel (a soft linked conda environment) and the other may be a conda environment found elsewhere (possibly an outdated cupid-analysis that doesn't include geopandas?). Mike mentioned that the ipykernel installation basically creates a softlink to an environment, which made me think that could be an inconsistency.

I had updated a test environment but not my actual cupid-analysis environment; I'm doing that now and will test your notebook out. This is probably not the most efficient workflow, but I wonder if it might also be worth removing your cupid-analysis environment, see if it's still listed as an option in JupyterHub, make sure that both versions are removed, and then re-install a clean version?

nmizukami · 2024-08-28T15:58:04Z

I ran the following in a terminal to remove the cupid-analysis env and reinstall it:

mamba remove --name cupid-analysis --all
mamba env create -f environments/cupid-analysis.yml

That did not fix it. After removing cupid-analysis, JupyterHub still showed cupid-analysis, though [conda-env:cupid-analysis] was gone.
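This is consistent with the stale-kernel idea above. A kernel registered with ipykernel (e.g. python -m ipykernel install --user --name cupid-analysis) is just a kernel.json file whose argv starts with that env's Python, and removing the conda env with mamba does not delete that file, so the kernel can stay listed. Entries named like [conda-env:...] are, we assume, discovered dynamically by an extension such as nb_conda_kernels, which would explain why that one disappeared. A sketch that lists each registered kernel and whether its interpreter still exists (the kernels directory below is the usual per-user location on Linux, an assumption; jupyter kernelspec list shows the real locations):

```python
import json
from pathlib import Path


def interpreter_from_spec(spec_text: str):
    """Return (display_name, interpreter path) from the text of a kernel.json file."""
    data = json.loads(spec_text)
    argv = data.get("argv", [])
    # ipykernel-style specs launch as [<python>, "-m", "ipykernel_launcher", ...]
    return data.get("display_name"), (argv[0] if argv else None)


def kernel_interpreters(kernels_dir) -> dict:
    """Map each kernelspec in kernels_dir to the interpreter it launches."""
    result = {}
    for spec in sorted(Path(kernels_dir).glob("*/kernel.json")):
        name, python = interpreter_from_spec(spec.read_text())
        result[name or spec.parent.name] = python
    return result


if __name__ == "__main__":
    user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    for name, python in kernel_interpreters(user_kernels).items():
        status = "exists" if python and Path(python).exists() else "MISSING"
        print(f"{name}: {python} ({status})")
```

If a kernel shows MISSING (or points at an unexpected env), jupyter kernelspec remove <name> deletes that spec so it can be re-registered cleanly.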

nmizukami · 2024-08-28T17:29:38Z

Hi @TeaganKing , I'm trying to run conda list to see which packages are in the cupid-analysis env when running cupid-run. Unfortunately, including conda list in the notebook causes an error in cupid-run:

SyntaxError: An error happened when checking the source code. 
:25:7: invalid syntax

conda list
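A bare conda list line is a shell command, not Python, which is why the source check reports invalid syntax. IPython forms such as !conda list may or may not pass that check (we haven't verified it), so a pure-Python alternative is to print the interpreter path and query installed package versions with the standard library. A sketch; installed_packages is a hypothetical helper, and it only sees Python distributions, so non-Python conda packages such as the PROJ C library itself won't appear:

```python
import sys
from importlib import metadata


def installed_packages(names=None) -> dict:
    """Return {distribution name: version}, optionally limited to `names` (lowercase)."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and (names is None or name.lower() in names):
            versions[name] = dist.version
    return versions


# Which interpreter (and therefore which env) is this kernel actually running?
print("Interpreter:", sys.executable)
for name, version in sorted(installed_packages({"geopandas", "pyogrio", "pyproj"}).items()):
    print(f"{name}=={version}")
```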

nmizukami · 2024-08-28T20:23:38Z

casper-login1:/glade/work/mizukami/CUPiD/examples/coupled_model (main_adding_rof)> cupid-run -rof

/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py:455: UserWarning: 
========================================================================================= DAG render with warnings =========================================================================================
----------------------------------------------------------------- NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb') ------------------------------------------------------------------
----------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/infrastructure/index.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
============================================================================================ Summary (2 tasks) =============================================================================================
NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb')
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
========================================================================================= DAG render with warnings =========================================================================================

  warnings.warn(str(warnings_))
Executing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.03cell/s]
Building task 'month_annual_flow':  50%|███████████████████████████████████████████████████████████████████                                                                   | 1/2 [00:02<00:02,  2.92s/itERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.
                                                                                                                                                                                                           /glade/u/apps/opt/conda/condabin/conda                                                                                                                                      | 5/69 [00:20<03:39,  3.44s/cell]
Executing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍               | 62/69 [03:53<00:26,  3.76s/cell]
Building task 'month_annual_flow': 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:56<00:00, 118.06s/it]
Traceback (most recent call last):
  File "/glade/work/mizukami/conda-envs/cupid-dev/bin/cupid-run", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/CUPiD/cupid/run.py", line 290, in run
    dag.build()
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 557, in build
    report = callable_()
             ^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 662, in _build
    raise build_exception
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 591, in _build
    task_reports = self._executor(dag=self, show_progress=show_progress)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/executors/serial.py", line 203, in __call__
    raise DAGBuildError(str(exceptions_all))
ploomber.exceptions.DAGBuildError: 
============================================================================================= DAG build failed =============================================================================================
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [24]":
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 2
      1 column_stat = []
----> 2 gauge_shp_all_case = gauge_shp.copy(deep=True)
      3 for case, grid_name in cases.items():
      4     gauge_shp_all_case = gauge_shp_all_case.merge(
      5         gauge_shp1[case][["id", f"{error_metric}_{grid_name}"]],
      6         left_on="id",
      7         right_on="id",
      8     )

NameError: name 'gauge_shp' is not defined

ploomber.exceptions.TaskBuildError: Error when executing task 'month_annual_flow'. Partially executed notebook available at /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/rof/month_annual_flow.ipynb
ploomber.exceptions.TaskBuildError: Error building task "month_annual_flow"
============================================================================================= Summary (1 task) =============================================================================================
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
============================================================================================= DAG build failed =============================================================================================

nmizukami · 2024-08-29T13:14:15Z

Hi @TeaganKing,
A small bit of good news: I got it to run without the geopandas error. The trick is to add this
import os
os.environ["PROJ_LIB"] = "/glade/work/mizukami/conda-envs/cupid-analysis/share/proj"
before importing geopandas.
However, I don't think this is a permanent solution. I am still trying to consult with CISL.
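A slightly less user-specific version of the same workaround derives the path from whichever env the kernel is running in (sys.prefix) instead of hard-coding one user's directory. This is still a sketch of a workaround, not a fix for the underlying conflict; point_proj_at_env is a hypothetical name, and setting PROJ_DATA alongside PROJ_LIB is an assumption to cover newer PROJ releases:

```python
import os
import sys
from pathlib import Path


def point_proj_at_env(prefix: str = sys.prefix):
    """Point PROJ at the proj.db shipped in `prefix` (the active env), if it exists.

    Must run before geopandas/pyogrio/pyproj are imported. Returns the directory
    used, or None (leaving the environment untouched) if there is no proj.db.
    """
    proj_dir = Path(prefix) / "share" / "proj"
    if not (proj_dir / "proj.db").is_file():
        return None
    # PROJ_DATA is read by newer PROJ releases, PROJ_LIB by older ones.
    os.environ["PROJ_DATA"] = str(proj_dir)
    os.environ["PROJ_LIB"] = str(proj_dir)
    return str(proj_dir)


point_proj_at_env()
# import geopandas as gpd  # import only after the variables are set
```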

I was able to create /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/_build/html/index.html
How do you usually open it on HPC? I tried opening Firefox on derecho/casper, but it is very slow. I wonder if there are any other ways to view it.

TeaganKing · 2024-08-29T13:56:08Z

Hi @nmizukami , I'm glad that is temporarily working (but of course we need this to work for any user's environment). Yes, I think this would be a good conversation to have with CISL.

Regarding looking at output, see the second section on this page for recommendations on NCAR machines.
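One lightweight option (not necessarily what that page recommends) is to serve the built HTML with Python's standard-library http.server and view it through an SSH tunnel, assuming the site permits port forwarding: e.g. ssh -L 8000:localhost:8000 <user>@<login-node> from your laptop, then open http://localhost:8000/. The one-line equivalent is python -m http.server --directory <dir> --bind 127.0.0.1 8000. A sketch; make_server is a hypothetical helper name:

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on localhost only (port 0 picks a free port)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Build output mentioned above, relative to examples/coupled_model
    server = make_server("computed_notebooks/quick-run/_build/html")
    host, port = server.server_address[:2]
    print(f"Serving on http://{host}:{port}/ (Ctrl-C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Binding to 127.0.0.1 keeps the pages visible only through the tunnel rather than to everyone on the login node.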

TeaganKing · 2024-09-10T15:53:27Z

Hey @nmizukami , I added a PR to bring rof into run.py. And then I realized these changes are already in this PR... so apologies-- feel free to ignore that!

TeaganKing · 2024-09-10T16:20:54Z

To-do:

update readme to include 'rof' on line 104: -rof, --river-runoff Run river runoff component diagnostics


nmizukami · 2024-10-01T22:47:55Z

In month_annual_flow.ipynb, I also have a few additional comments (but the file was too large to render/comment on particular lines):
All sections: Can you please remove empty cells? I think the ‘go back to top’ sections could be removed if this is being run all at once?

Removed empty cells and removed ‘go back to top’ link.

Section 1. Are the lists of cases intended to be regularly used options? Is the yaml file that’s loaded going to be consistent, or is it intended to be updated by user?

Yes, the case is a CESM case specified by the CESM modeler, so it has to be provided by the user. I added some comments on what needs to be provided here. Multiple cases are allowed so that results from different cases can be compared.

2.1 – there’s a typo ‘Mmonthly’

Fixed.

2.6 - can you add a docstring to define what’s happening in this function?

This function is removed.

3.1 – I think this cell may have an empty slice error.

I think I know how this happens: the length of the observation record is inconsistent across sites (some sites provide data for only a short period), so depending on the analysis (or simulation) period, no observed flow data may be available. Can I fix this in another PR?
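The "empty slice" message is what NumPy emits (as a RuntimeWarning) when np.nanmean or np.mean is applied to an all-NaN or empty array, which matches a site with no observations in the analysis period. A minimal guard, assuming the observed flow for one site arrives as an array-like with NaN for missing values; mean_observed_flow and site_id are hypothetical names:

```python
import warnings

import numpy as np


def mean_observed_flow(obs, site_id="unknown"):
    """Mean of an observed-flow series, or NaN with an explicit warning if no valid data.

    Checking for finite values first avoids NumPy's 'Mean of empty slice'
    RuntimeWarning and replaces it with a message naming the site.
    """
    values = np.asarray(obs, dtype=float)
    if not np.isfinite(values).any():
        warnings.warn(f"No observed flow for site {site_id} in the analysis period.")
        return np.nan
    return float(np.nanmean(values))
```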

TeaganKing · 2024-10-02T15:42:22Z

Thanks for addressing some of these comments!

I made issue ticket #140 for the empty slice error-- that's fine by me to address in another PR as long as the notebook otherwise runs smoothly (which it does) and users are informed (by the issue ticket) of the error that needs to be fixed.

Ok, I wanted to make sure that we are using the cases values in the config.yml file and that users don't actually need to update anything in the notebook once the cell is tagged as a parameters cell.

mnlevy1981 · 2024-10-03T15:32:31Z

I started to look at this, and I have a lot of questions and suggestions -- @nmizukami and @TeaganKing could we try to find a time early next week to meet? I'd like to discuss a few things that might be tough to squeeze into an in-line review on this PR. Some initial comments, though (and maybe these changes will make it a little easier to go back through line-by-line asynchronously):

It looks like this is failing the pre-commit style checks
I noticed in month_annual_flow.ipynb you have a logical flag parallel that enables using PBSCluster when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb you'll see how CUPiD already passes a serial flag and uses a LocalCluster when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of code
The notebooks in examples/nblibrary/rof should not have any output in them
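A minimal sketch of the serial-flag pattern described above, assuming a boolean serial parameter is passed in from the CUPiD config; start_dask_client is a hypothetical name, n_workers=4 an arbitrary default, and this is not copied from ocean_surface.ipynb:

```python
def start_dask_client(serial: bool, n_workers: int = 4):
    """Return a dask Client on a LocalCluster, or None when running serially.

    LocalCluster runs on whatever machine the notebook is on, so no casper- or
    derecho-specific (PBSCluster) setup is needed.
    """
    if serial:
        return None
    # Imported lazily so serial runs don't need dask.distributed at all
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=n_workers)
    return Client(cluster)


client = start_dask_client(serial=True)  # `serial` would come from the config
# ... analysis ...
if client is not None:
    cluster = client.cluster
    client.close()
    cluster.close()
```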

nmizukami · 2024-10-03T15:38:35Z

Hi Mike (@mnlevy1981) and Teagan (@TeaganKing), yes, I can meet on Monday or Tuesday (my calendar is up to date). I had been wondering about PBSCluster (your 2nd point).

nmizukami · 2024-10-07T21:49:23Z

To-do

replace PBSCluster with LocalCluster, following the example in the ocn notebook (use the serial logical flag)
activate cupid-dev and install pre-commit under CUPiD/.git/hooks
clear outputs from all the cells
add key_metrics/config.yml and modify the case:grid input
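For the "clear outputs" item, jupyter nbconvert --clear-output --inplace <notebook> does this from the command line. The same operation, sketched with only the standard library (function names are hypothetical), empties outputs and execution counts of every code cell; the JSON formatting approximates how nbformat writes notebooks:

```python
import json
from pathlib import Path


def strip_outputs(nb: dict) -> dict:
    """Remove outputs and execution counts from every code cell of a parsed notebook."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook_file(path) -> None:
    """Rewrite an .ipynb file in place with its outputs cleared."""
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    text = json.dumps(strip_outputs(nb), indent=1, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    for notebook in sorted(Path("examples/nblibrary/rof").glob("*.ipynb")):
        strip_notebook_file(notebook)
```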

nmizukami · 2024-10-13T03:00:50Z

Updated key_metrics/config.yml and coupled_model/config.yml for rof.
Modified two notebooks based on the config changes so that they now run.

Review is needed, and some science questions came up (e.g., what should we do when plotting a time period for which no observations are available? Are the other notebooks comparing the model outputs with observations?)

TeaganKing · 2024-10-14T15:34:44Z

Hey @nmizukami , thanks for these updates.

Not all notebooks are comparing with observations, but you can see an example of an observational comparison in the glacier notebook & corresponding config.yml details.

I think that if you are plotting for a time period where observations are not available, perhaps a warning statement that the obs are unavailable would be useful?

TeaganKing · 2024-10-14T15:36:19Z

And I'll review after our discussion on Thursday.

nmizukami · 2024-10-14T16:18:25Z

Right now I am pointing to case /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing/b.e23_alpha16b.BLT1850.ne30_t232.054

The time period for this case is years 0001-0102, for which there are certainly no observations for any component. So I thought this config was meant to compare the simulation against a base simulation, not to validate the model components against observations.

I just wanted to understand the context of this setup. With the current config the rof notebooks look less interesting, but technically the notebooks work now (I believe).

If the config points to any CESM case covering the 20th-21st century, the rof notebook automatically adds the observed streamflow to the plots and compares the simulations with observations.
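The behavior described here (include observations only when the simulated and observed periods overlap, otherwise plot simulations alone, with the warning suggested earlier in the thread) reduces to an interval intersection. A sketch; observation_overlap is a hypothetical helper, and the observation years in the usage are placeholders, not the actual gauge record:

```python
import warnings


def observation_overlap(sim_years, obs_years):
    """Return the (first, last) years covered by both simulation and observations, or None.

    sim_years and obs_years are inclusive (first_year, last_year) tuples.
    """
    first = max(sim_years[0], obs_years[0])
    last = min(sim_years[1], obs_years[1])
    if first > last:
        warnings.warn(
            f"No observations overlap simulation years {sim_years[0]:04d}-{sim_years[1]:04d}; "
            "plotting simulations only."
        )
        return None
    return (first, last)


print(observation_overlap((1980, 2014), (1950, 2020)))  # (1980, 2014)
print(observation_overlap((1, 102), (1950, 2020)))  # None, with a warning
```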

TeaganKing · 2024-10-14T16:34:44Z

The key setup here that's different from the coupled-model example is in the 'global params' section of the config file, where we have both a case name for the case you're looking at, as well as the base_case_name for a comparison case. The observations are defined separately in each individual notebook config section at this point.
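To illustrate the split described above (cases in global params, observations per notebook), here is a sketch over a config already loaded into a dict. Only global_params, case_name and base_case_name come from this thread; the "rof_notebook" section, "obs_dir" key, values, and the cases_to_compare helper are hypothetical placeholders, not CUPiD's actual schema:

```python
# Hypothetical config layout; see the lead-in for which keys are placeholders.
config = {
    "global_params": {"case_name": "my_case", "base_case_name": "my_base_case"},
    "rof_notebook": {"obs_dir": "/path/to/obs"},
}


def cases_to_compare(global_params: dict) -> list:
    """The case being examined first, then the comparison case if one is configured."""
    case_list = [global_params["case_name"]]
    base = global_params.get("base_case_name")
    if base:
        case_list.append(base)
    return case_list


case_list = cases_to_compare(config["global_params"])
obs_dir = config["rof_notebook"].get("obs_dir")  # observation settings live per notebook
```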

TeaganKing · 2024-10-14T16:35:19Z

That sounds good that plots are generated without obs if obs do not exist.

TeaganKing · 2024-10-17T16:22:55Z

@nmizukami is planning to do the following:

implement comparison with base_case in addition to obs
remove years that overwrite config file start/end years
include analysis-period configuration parameter to specify e.g. 10 years so that users don't need to run 100 years unless they really want to do so.
test cupid-run from key_metrics directory

Once these items are done, @TeaganKing can review.
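For the analysis-period item, one possible sketch: given the config's start/end years and an optional length, analyze only the final N years of the run. The parameter name analysis_length and the choice of the last (rather than first) N years are assumptions for illustration:

```python
def analysis_window(start_year: int, end_year: int, analysis_length=None):
    """Return the inclusive (first, last) years to analyze.

    With analysis_length set, use only the final analysis_length years of
    [start_year, end_year]; otherwise (or if the run is shorter) use the whole run.
    """
    if analysis_length is None:
        return (start_year, end_year)
    if analysis_length < 1:
        raise ValueError("analysis_length must be at least 1 year")
    first = max(start_year, end_year - analysis_length + 1)
    return (first, end_year)


print(analysis_window(1, 102, 10))  # (93, 102)
```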

nmizukami · 2024-10-18T23:48:42Z

Hi @TeaganKing, all are done!

nmizukami and others added 7 commits August 22, 2024 15:08

added rof notebook config example

c4e8a29

added rof notebooks with ancillary data and scripts

9402085

reformatting with pre-commit

aa80717

change one of key names in setup and notebook changes associated with this, and change from pre-commit reformatting

c784061

Update config.yml to include rof in timeseries and book_toc

a652359

Update run.py to include rof

f9505d0

updates for pre-commit

9acbf78

replace math sqrt with numpy sqrt

b45413a

TeaganKing self-requested a review August 23, 2024 20:11

TeaganKing assigned nmizukami Aug 23, 2024

TeaganKing added lnd enhancement New feature or request labels Aug 23, 2024

forgot uncommenting other component configs

c5a52ae

TeaganKing mentioned this pull request Sep 10, 2024

Include rof in run.py #132

Closed

6 tasks

nmizukami and others added 4 commits September 12, 2024 10:41

adding new script to compute various flow metrics

bda0fa8

remove unused packages, rename setup variables

1f7c8aa

added one new rof notebook

5cc7d25

update readme and config for clarification purposes

3beca30

nmizukami added 4 commits October 1, 2024 08:19

added ocean_discharge notebook in chapter. added a comment on case input.

2f51d2d

Merge remote-tracking branch 'nmizukami/main_adding_rof' into main_adding_rof

de81559

remove case_dir input b/c it is not used

2ac3101

Cleanup based on review comments e.g., remove empty cells, etcs

2bd5c36

nmizukami added 9 commits October 11, 2024 10:03

change dask setupt

35df64d

rof updates

04f9c53

rof sepcific setup update

bb894a3

automated reformat

b57be4f

just temporal comment out

1aa1654

auto-formatting

7ca263a

uncomment ocean_discharge notebook config

eb22c6b

key values updates and remove some items

26fed7c

automated format fix on ocean-discharge notebook

413239a

nmizukami added 3 commits October 18, 2024 17:36

rof specific setup changes

d4efeeb

a few rof parameters added in config.yml

1e26410

auto reformatting

4b2a3b2

auto formatting

e9174f1
