Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create global time series Viewers #616

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

forsyth2
Copy link
Collaborator

Allow single variable global time series plots. Resolves #601.

@forsyth2 forsyth2 added the semver: new feature New feature (will increment minor version) label Jul 23, 2024
@forsyth2 forsyth2 self-assigned this Jul 23, 2024
@forsyth2
Copy link
Collaborator Author

@chengzhuzhang @thorntonpe @BunnyVon Here's my initial implementation for #601. Thoughts/suggestions?
Example output
Screenshot 2024-07-23 at 1 15 51 PM

The HTML is sparse, but effective. If the HTML needs to look more like the E3SM Diagnostics viewer, that would involve more significant, time-intense front-end work.

The PDFs are multi-page, with one variable per page. The PNGs are one variable per PNG.

The implementation method is for the user to set nrows and ncols. If both are set to 1, then zppy will handle things a bit differently -- doing individual plots rather than multiple plots per page.

@BunnyVon
Copy link

the visualization looks great to me and fits the purpose. Could you clarify a few questions I have below?

  1. does this implementation allow the users to select any 2D land variables and 2D atm variables?
  2. LAISHA is the sum of LAI and SHA in EAM. In ELM, there is an equivalent variable called TLAI. V3 uses active BGC for production runs. my question is, are they truly the same in E3SM, and when circumstances like that happen, would you be able to add recommendations or notes to users in the variable tables?

@forsyth2
Copy link
Collaborator Author

does this implementation allow the users to select any 2D land variables and 2D atm variables

In theory, yes. I haven't tested with all of them. I know at our last meeting @thorntonpe mentioned he could provide a full list of 400+ 2D variables to work with.

are they truly the same in E3SM

I'm not sure. @chengzhuzhang do you know more about this?

and when circumstances like that happen, would you be able to add recommendations or notes to users in the variable tables?

That might be beyond the scope of zppy. I guess we could maybe try to provide some parenthetical information in the subplot titles? Or warnings in the output?

@BunnyVon
Copy link

yeah, I think you are right. the user should know what they want to plot :) please ignore that question. @forsyth2 @chengzhuzhang

@forsyth2 forsyth2 marked this pull request as ready for review July 26, 2024 23:27
@forsyth2
Copy link
Collaborator Author

@BunnyVon I was reviewing my notes from our June 14 meeting. It looks like the highest priority item was just setting up a basic HTML page with links to single plots. That is accomplished by this pull request.

The second highest priority item was to categorize those plots (@thorntonpe suggested Energy, Water, Carbon). But it looks like we never found a good way to automatically determine a plot's category (one suggestion was to try basing it on the variables' units). @chengzhuzhang recently suggested at least categorizing the plots so that the global, Northern hemisphere, and Southern hemisphere plots are grouped together per-variable.

So, at this point, the question is what level of categorization we want for this pull request:

  1. Just merge this, it's useful enough.
  2. At least categorize/sub-bullet the plots so all the variables of a single variable are grouped together. This is reasonable, but will require some more front-end work. I'm going to try to implement this.
  3. Categorize the variables by Energy, Water, Carbon (or some other categorization scheme). This would require more front-end work and would also require us to have some scheme of categorizing variables (either an automatic method or a list of Land variables and their respective categories).

As for the front-end code, I'm going to try using the output_viewer library, which is what the e3sm_diags viewer uses. We'll see how closely I can emulate Diags, but it may end up being a basic bullet/sub-bullet list of plots.

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Jul 30, 2024

@forsyth2 I think the code change at this point effectively generated one file per each variable. It is the first step to create the html page. As I mentioned in EZ meeting yesterday, we can adopt output_viewer which is also used in e3sm_diags, it can be used to produce familiar looking webpages. There is instruction to generate an html page here: https://docs.e3sm.org/e3sm_diags/_build/html/main/dev_guide/using-cdp-output-viewer.html

For now, each variable has 3 plots, global mean, NH mean, and SH mean. Each component can be a group, for each group, the row number is the number of variables, and results time-series plots can have 3 columns: Global Average, NH Average, and SH Average, each links to a corresponding plot. I'm thinking something like following:


Atmospheric Variables        Description                                                 Time-series Plots
atm_var1                              atm_var1.standard_name/ long_name       Global, NH, SH
atm_var2
atm_var3

Land  Variables                   Description                                                   Time-series plots
land_var1                             land_var1.standard_name/ long_name       Global, NH, SH

Let me know if this makes sense, and happy to clarify.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang Yes, your notes here will help with implementing option 2 in my comment above. Thanks, I will get started on that.

I think to must fully implement the categories (option 3) though, we'll need the extra input I described from the Land team.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Jul 30, 2024

@chengzhuzhang @tomvothecoder Does output_viewer re-introduce CDAT dependency to zppy? Both cdp and output_viewer are listed as CDAT packages in my CDAT-finding script (E3SM-Project/e3sm_diags#803).

What was the solution for the e3sm_diags CDAT migration -- I thought you ended up incorporating the output_viewer code into e3sm_diags, no?

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Jul 30, 2024

I'm not able to glean much info from E3SM-Project/e3sm_diags#628. Is the output_viewer dependency independent of CDAT?

EDIT: It's definitely a CDAT dependency: https://anaconda.org/cdat/output_viewer shows https://github.com/ESGF/output_viewer listed as a CDAT package.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Jul 30, 2024

I'd also like to note this is another reason global_time_series should really be its own plotting package (#398) -- we're adding dependencies to zppy which are really outside its scope. zppy coordinates packages in the post-processing workflow. That is its function, not plotting, and yet we're now adding a dependency on a plotting-display/visualization package.

I realize for tech debt purposes this is unlikely to change, but I think we should still note how that tech debt negatively impacts the modularity of our codebase.

@tomvothecoder
Copy link
Collaborator

@chengzhuzhang @tomvothecoder Does output_viewer re-introduce CDAT dependency to zppy? Both cdp and output_viewer are listed as CDAT packages in my CDAT-finding script (E3SM-Project/e3sm_diags#803).

What was the solution for the e3sm_diags CDAT migration -- I thought you ended up incorporating the output_viewer code into e3sm_diags, no?

I'm not able to glean much info from E3SM-Project/e3sm_diags#628. Is the output_viewer dependency independent of CDAT?

EDIT: It's definitely a CDAT dependency: anaconda.org/cdat/output_viewer shows ESGF/output_viewer listed as a CDAT package.

e3sm_diags does not use the CDAT version of the output_viewer anymore (E3SM-Project/e3sm_diags#628)

Before:
e3sm_diags -> CDAT cdp output_viewer -> ESGF output_viewer

Now:
e3sm_diags -> ESGF output_viewer

@tomvothecoder
Copy link
Collaborator

The ESGF output_viewer is the baseline output viewer package: https://anaconda.org/conda-forge/output_viewer

@BunnyVon
Copy link

@forsyth2 , thank you for working on this and having it completed.
regarding the category, I think you can start with ilamb category shown in the scorecard I recently made https://www.ilamb.org/ELM/ilamb_e3sm_cbgcv2/e3sm/ . however, it has much fewer variables than @thorntonpe 's list.

@forsyth2
Copy link
Collaborator Author

@tomvothecoder Ok, so even though anaconda.org/cdat/output_viewer shows ESGF/output_viewer listed as a CDAT package, it does NOT introduce a CDAT dependency by including it?

@forsyth2
Copy link
Collaborator Author

@BunnyVon Ah good idea to match up with the ILAMB categories. I'll try incorporating that. Thanks!

@forsyth2
Copy link
Collaborator Author

@tomvothecoder Looking at E3SM-Project/e3sm_diags#628 > fixed by E3SM-Project/e3sm_diags#773 > e3sm_diags/viewer/core_viewer.py, it appears you've basically just rewritten the CDAT code from scratch in e3sm_diags. If zppy now needs that too, wouldn't it make sense to break that out into its own package that both e3sm_diags and zppy call? Or since it's only one Python file, is it better to take the low overhead route and just duplicate e3sm_diags/viewer/core_viewer.py in zppy?

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Jul 31, 2024

@tomvothecoder Ok, so even though anaconda.org/cdat/output_viewer shows ESGF/output_viewer listed as a CDAT package, it does NOT introduce a CDAT dependency by including it?

anaconda.org/cdat is the Anaconda channel for CDAT. This does not necessarily mean all of the packages in that channel are "CDAT packages". The ESGF output_viewer is a standalone package that is a dependency of CDAT (cdp) and not the other way around, which probably explains why it is hosted in that channel (conda-forge wasn't being used back then).

Note, the ESGF output_viewer package is also hosted on conda-forge (comment above).

@tomvothecoder Looking at E3SM-Project/e3sm_diags#628 > fixed by E3SM-Project/e3sm_diags#773 > e3sm_diags/viewer/core_viewer.py, it appears you've basically just rewritten the CDAT code from scratch in e3sm_diags. If zppy now needs that too, wouldn't it make sense to break that out into its own package that both e3sm_diags and zppy call? Or since it's only one Python file, is it better to take the low overhead route and just duplicate e3sm_diags/viewer/core_viewer.py in zppy?

It makes more sense to write an OutputViewer class specific to the needs of zppy. e3sm_diags' core_viewer.py consists of just single small class called OutputViewer, which uses modules from the ESGF output_viewer package.

Alternatively, zppy can import the OutputViewer class directly from e3sm_diags (e.g., from e3sm_diags.viewer.core_viewer import OutputViewer).

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 1, 2024

Note to self, mostly: using the output_viewer to display these plots nicely is in fact non-trivial. I'm able to generate css and js directories, but not the viewer itself. Thus, in an effort to better understand everything involved with producing the viewer, I'll now try to follow the call-path for the viewer in e3sm_diags on the cdat-migration-fy24 branch. (The numbering scheme attempts to capture the function call hierarchy in a sort-of depth-first-search fashion, where number is the level and letter is the node on that level).

1a. e3sm_diags/e3sm_diags_driver.py main has the following block:

            path = os.path.join(parameters_results[0].results_dir, "viewer")
            if not os.path.exists(path):
                os.makedirs(path)

            index_path = create_viewer(path, parameters_results)
            logger.info("Viewer HTML generated at {}".format(index_path))

In particular:

index_path = create_viewer(path, parameters_results)

1a-2a. e3sm_diags/viewer/main.py create_viewer calls:

viewer_function = SET_TO_VIEWER[set_name]

If we use "lat_lon": default_viewer.create_viewer,, the number of functions used is quite high. So, first let's try a simpler viewer to emulate and see if that's all we need. Let's use "enso_diags": enso_diags_viewer.create_viewer,. That brings us to:

1a-2a-3a. e3sm_diags/viewer/enso_diags_viewer.py create_viewer calls:

viewer = OutputViewer(path=root_dir)

1a-2a-3a-4a. e3sm_diags/viewer/core_viewer.py OutputViewer.__init__. The class is essentially a wrapper around the ESGF output_viewer, as mentioned in comments above. Let's go back up a level.

1a-2a-3a. e3sm_diags/viewer/enso_diags_viewer.py create_viewer calls:

meta=create_metadata(param),

1a-2a-3a-4b. e3sm_diags/viewer/default_viewer.py create_metadata -- this function is perhaps not necessary for us to worry about.

1a-2a-3a. e3sm_diags/viewer/enso_diags_viewer.py create_viewer calls:

add_header(root_dir, os.path.join(root_dir, url), parameters)

1a-2a-3a-4c. e3sm_diags/viewer/utils.py add_header -- this function is perhaps not necessary for us to worry about.

1a-2a-3a. e3sm_diags/viewer/enso_diags_viewer.py create_viewer calls:

h1_to_h3(os.path.join(root_dir, url))

1a-2a-3a-4d. e3sm_diags/viewer/utils.py h1_to_h3 -- this function is perhaps not necessary for us to worry about.

1a-2a. e3sm_diags/viewer/main.py create_viewer calls:

index_url = create_index(root_dir, title_and_url_list)

1a-2a-3b. e3sm_diags/viewer/main.py create_index (by way of enclosed function insert_data_in_row) calls:

path = os.path.join(e3sm_diags.INSTALL_PATH, "viewer", "index_template.html")

So, we'd need some different path because clearly e3sm_diags.INSTALL_PATH wouldn't be what we want.

1a-2a-3b. e3sm_diags/viewer/main.py create_index calls:

utils.add_header(root_dir, index_url, parameters)

which was described previously (1a-2a-3a-4c).

I'll try out copying some of the e3sm_diags viewer code and adjusting it now that I understand the process better.

@forsyth2 forsyth2 force-pushed the issue-601-glb-single-plots branch 2 times, most recently from 27dd400 to 9249d1b Compare August 1, 2024 18:31
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 1, 2024

Visuals

Alright, the second commit (9249d1b) has produced a very basic viewer (with UNIQUE_ID set as test-601-output-viewer).

Screenshot 2024-08-01 at 11 32 38 AM

Accomplished:

  • A table is produced with variables listed on the left and regions listed on the top.
  • Plot previews appear when hovering over links.

Need to add/fix:

  • Web address should have /viewer/ not /table/
  • Rows should have header rows for Atmosphere and Land at the very least. Ideally Land would have sub-headers to match the ILAMB categorization scheme.
  • All the links just say title -- they should actually have useful titles.

Directory structure

The produced directory structure:

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_single_plots_output/test-601-output-viewer/v3.LR.historical_0051/post/scripts/global_time_series_1985-1995_results
$ ls 
table                                     v3.LR.historical_0051_glb_lnd.pdf       v3.LR.historical_0051_n_lnd_LAISHA.png  v3.LR.historical_0051_s_lnd_FSH.png
v3.LR.historical_0051_glb_atm.pdf         v3.LR.historical_0051_glb_lnd_RH2M.png  v3.LR.historical_0051_n_lnd.pdf         v3.LR.historical_0051_s_lnd_LAISHA.png
v3.LR.historical_0051_glb_atm_TREFHT.png  v3.LR.historical_0051_n_atm.pdf         v3.LR.historical_0051_n_lnd_RH2M.png    v3.LR.historical_0051_s_lnd.pdf
v3.LR.historical_0051_glb_lnd_FSH.png     v3.LR.historical_0051_n_atm_TREFHT.png  v3.LR.historical_0051_s_atm.pdf         v3.LR.historical_0051_s_lnd_RH2M.png
v3.LR.historical_0051_glb_lnd_LAISHA.png  v3.LR.historical_0051_n_lnd_FSH.png     v3.LR.historical_0051_s_atm_TREFHT.png  viewer

$ ls table
index.html  summary-table

$ ls viewer
css  js

Compare that with example output from e3sm_diags:

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/issue-346-diags-20240731/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/viewer
$ ls
cmip6  cmip6-comparison-data  diurnal_cycle  index.html  index.json  lat_lon  table  table-data  taylor  taylor-diagram-data  viewer

$ ls table
index.html  summary-table

$ ls viewer
css  e3sm_logo.png  js

The directory structures actually appear to match up ok, however there is no index.html in this commit's version of the viewer.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 1, 2024

Ok with the third commit (cd5b100), I am able to categorize the variables. I dislike the hard-coded variable categorization, but I suppose it will have to do if we have no other method.

Results

Screenshot 2024-08-01 at 2 51 49 PM

@BunnyVon How does this look? It doesn't have all the bells and whistles of E3SM Diags, but it is the same general display. I tried to categorize variables the best I could using ilamb.cfg, but I don't think I did it totally correctly.

@BunnyVon
Copy link

BunnyVon commented Aug 2, 2024

I think it looks great. what do you think, @thorntonpe?

@forsyth2 , in terms of the variables, I am finalizing my preprocessing script with Nate Collier, which has the info for the conversion between ilamb variables and E3SM variables for the ilamb scorecard. It's soon to be completed, and I can share afterward.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 2, 2024

which has the info for the conversion between ilamb variables and E3SM variables for the ilamb scorecard

Awesome, thanks @BunnyVon!

@thorntonpe
Copy link

@forsyth2 and @BunnyVon - After a hectic few weeks I am now able to give this thread the attention it deserves. Thanks for your patience. The overall timeseries plotting capability looks excellent - the plots themselves have good graphic quality and having them all accessible through a single index page is exactly what we need. Thank you!
In our land group call today we will review and provide further feedback. I will also be following up soon with the full list of variables, and the categorization we would like to see in the master list. Even before the group discussion, I know there are a few changes that we would like to see. I'll list them below, and then will follow up with further thoughts from the group.

  1. Each variable has a long name on the history file. It will be helpful to display that below the variable name in the title block for each plot. That should help to address questions like the one Sha raised above about LAISHA.
  2. Some flux variables (such as GPP) are best presented as annual global (or N/S regional) totals. That means doing the multiplication by time and the totaling by (gridcell area * landfraction). Those typically also need to be scaled, for example from grams carbon to petagrams carbon. Other flux variables (such as QVEGT) are best displayed as global mean values. When we provide the full list of variables we can also provide in a format that includes columns of information for each variable about how to present it in the plots. It would be good to get together with you to discuss the best way to transmit that information.
  3. We would eventually like to have the ability to plot two simulations together, perhaps distinguished by red vs blue lines. Does this seem feasible?
    Thanks again for all your work on this.

@forsyth2
Copy link
Collaborator Author

Thanks @thorntonpe!

To your points:

  1. I know E3SM Diags has short and long name functionality. I can see if something similar is simple to extract/implement here.
  2. Yes, a meeting would be good to discuss precise needs/methods.
  3. I don't think this is impossible, but I do foresee it being somewhat involved. E3SM Diags has model-vs-model functionality but otherwise most of zppy is focused on processing one simulation's output, and adding in the Diags mvm functionality has been involved (e.g., a lot of duplication of parameters). I'd need to do some prototyping to see for sure, but I imagine that could be a pretty big overhaul/refactoring of zppy/the global-time-series task specifically to be able to handle that (especially if you wanted more than two simulations' outputs)

@forsyth2
Copy link
Collaborator Author

Next steps with this pull request:

  • Land team will provide CSV of the form variable name, total or area, scale factor, new units, category label
  • That CSV will be used as a metadata look-up table for Python to construct a dictionary to organize the viewer.
  • There should be a separate viewer for each component (e.g., atm, lnd).
  • Goal is for this to be merged by the next zppy / E3SM Unified release.

After this is merged: with the viewers looking how we want for one simulation, we'll follow an approach similar to Diags model-vs-model (add equivalent parameters for ref) to get a second line plotted on the graphs. The start years may have to be adjusted. Two options 1) make each "years since start" or 2) put one x axis on top and another on the bottom

@forsyth2
Copy link
Collaborator Author

@czender We had some questions come up about how NCO calculates the global averages. The Land team would like to process some variables as "total" and others as "average". We're currently only getting the global averages.

  1. To get the "total", can we just multiply the weighted average by the global area? Or would we need an NCO update to give the global integral? Is NCO using landfrac of the grid cell or the entire grid cell?
  2. Does NCO take care of months being different lengths in that weighting?

@czender
Copy link

czender commented Aug 22, 2024

@forsyth2 The averages that NCO produces from EAM simulations are true global averages. The "global" averages that NCO produces from ELM simulations are by definition global land-only averages. They are computed by summing each field (e.g., FSDS) over every valid ELM gridcell, multiplied by the gridcell area, multiplied by landfrac. This "global" sum is then normalized by the valid land area, i.e., by the sum of area*landfrac.

Responses to your questions:

  1. Sort of. To convert ELM averages from a mean to a total, multiply them by the global land area, i.e., the sum of area*landfrac. To convert EAM averages from a mean to a total, multiply them by the true global area. Global totals for both EAM and ELM variables could be obtained from the current climos without first updating NCO. The true global land area (for EAM) and the land-only area (for ELM) are time-invariant and can be computed once offline and used by any subsequent processing to convert averages to totals for select variables.
    ncclimo could be modified to produce timeseries of all totals or all means with the addition of a new switch/option. However, it sounds like ELM users might want some variables as global totals (e.g., CO2 fluxes) and some as global means (e.g., temperature). Modifying ncclimo to input which fields get which treatment would be a significant project. It might be better accomplished in E3SM diags since the treatment will be intimately connected to how the variable is plotted. LMK if you'd like me to perform either of the two potential ncclimo modifications that have been mentioned.
  2. Yes, NCO properly weights months based on the model (365-day) calendar.

@czender
Copy link

czender commented Aug 27, 2024

FYI the latest NCO snapshot includes a new option to control the global/regional statistics that ncclimo computes for timeseries: --rgn_stt=avg|sum. When invoked with --rgn_stt=avg the behavior is identical to the current output timeseries which are averages. When invoked with --rgn_stt=sum the averaged field is multiplied by the sum of area variable. While this would work fine if the area were given in m2 and the field was given in m-2, that is rarely the case. ELM uses km2, so a scale factor of one million is needed to correct the sum for many variables. EAM uses sr-1 for area so it needs a different scale factor. The ncclimo accepts a second new option --sum_scl=sum_scl. This scale factor will multiply the integrated field value. The whole procedure is model and variable-specific so it's hard to have a generic solution on the front end, and I thought this generic feature of a simple scale factor might be useful. Example usage is:

ls v3.LR.piControl.elm.*.nc | ncclimo -P elm --split --rgn_stt=sum --sum_scl=1.0e6 -v GPP -s 460 -e 461 --drc_out=${DATA}/ne30/rgn

Feedback welcome.

@chengzhuzhang
Copy link
Collaborator

@czender thank you for having this implemented so quickly. I'm tagging land model developers @thorntonpe @BunnyVon and @darincomeau about this new nco development and for feedbacks.

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have the Viewer categorized by the groupings in the csv:
Screenshot 2024-10-01 at 7 34 04 AM

Full results at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_single_plots_www/test-616-20241001/v3.LR.historical_0051/global_time_series/global_time_series_1985-1995_results/viewer/

Remaining To Do items are listed in my comments here.

@thorntonpe also noted that once a few “A” and “T” variables are working, the Land team would like to do some checks against outputs from their prior workflow.

Comment on lines 1151 to 1165
# TODO: make viewer home page to point to multiple viewers
url = create_viewer(figstr, regions, "lnd", plots_lnd)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Right now, the viewer only exists for land variables. We'll need to create a viewer for each component (e.g., atm, ice, ocn).

(Again, a note on design and tech debt: this is very much outside the scope of zppy as a post-processing workflow coordinator. At this point, Global Time Series is for all intents-and-purposes a plotting package fully contained within zppy due to legacy reasons).

# the name of the ELM variable on the monthly h0 history file
self.variable_name = csv_row[0]
# “A” or “T” for global average over land area or global total, respectively
self.metric = csv_row[1]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Actually use A or T.

Going by @czender's comments #616 (comment) and #616 (comment), it seems like the logic change would belong more in ts.py than coupled_global.py. That would mean ts.py would also have to parse the csv.

In any case, at some point, a decision on what to plot (A or T) must be made based on this csv. Currently, the csv parsing is done very late, after the plots themselves have been generated. This parsing will need to be moved much earlier in order to propagate the A/T information before it is needed.

# “A” or “T” for global average over land area or global total, respectively
self.metric = csv_row[1]
# the factor that should convert from original units to final units, after standard processing with nco
self.scale_factor = csv_row[2]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: design decision -- should this be done at the ts task, or at the global_time_series task? I.e., is it only relevant to plotting?

# test string for the units as given on the history file (included here for possible testing)
self.original_units = csv_row[3]
# the units that should be reported in time series plots, based on A/T and Scale Factor
self.final_units = csv_row[4]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Again, the csv parsing will need to be moved earlier on to before the plots are generated (rather than right before they're organized into a viewer)

# a name used to cluster variables together, to be separated in groups within the output web pages
self.group = csv_row[5]
# Descriptive text to add to the plot page to help users identify the variable
self.long_name = csv_row[6]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Again, the csv parsing will need to be moved earlier on to before the plots are generated (rather than right before they're organized into a viewer)

@forsyth2
Copy link
Collaborator Author

Rebased off main, after the modularizing refactor of #628. Luckily, #628 touched the <task>.py files whereas this pull request is more focused on the .py files called from global_time_series.bash, so changes were minimal: only zppy/global_time_series.py had a minor merge conflict.

@forsyth2 forsyth2 changed the title Allow single variable global time series plots Create global time series Viewers Oct 21, 2024
@forsyth2 forsyth2 force-pushed the issue-601-glb-single-plots branch 3 times, most recently from 75c9757 to 8ef71d5 Compare October 24, 2024 00:16
Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest status update

Comment on lines +26 to +29
# Run this test suite in the environment the global_time_series task runs in.
# I.e., whatever `environment_commands` is set to for `[global_time_series]`
# NOT the zppy dev environment.
# Run: python -u -m unittest tests/global_time_series/test_coupled_global.py
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unit tests are specific to coupled_global.py. I tried to add as many unit tests as I could, but the many functions that rely on IO (notably reading data or plotting data) can't be well tested outside of integration testing/the "min-case" tests.

Importantly, these unit tests can't be run like the other unit tests. They need to be run from the environment that zppy is running the global_time_series task. That is, these tests likely need to be run from E3SM Unified rather than the zppy dev environment. To facilitate testing I also just moved the contents of readTS.py into coupled_global.py.

(Note: this testing setup again highlights how we need to be wary of scope creep in zppy. Here we're really testing the de facto Global Time Series package, not so much zppy itself. See #398)

@@ -0,0 +1,46 @@
[default]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the min-case cfg template I'm using for testing this PR.

@@ -49,13 +50,18 @@ def global_time_series(config, script_dir, existing_bundles, job_ids_file):
c["global_time_series_dir"] = os.path.join(script_dir, f"{prefix}_dir")
if not os.path.exists(c["global_time_series_dir"]):
os.mkdir(c["global_time_series_dir"])
scripts = ["coupled_global.py", "readTS.py", "ocean_month.py"]
scripts = ["coupled_global.py", "ocean_month.py"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contents of readTS.py are now in coupled_global.py, as mentioned in an earlier comment.

Comment on lines +303 to +310
if (metric == Metric.AVERAGE) or (metric == Metric.TOTAL):
annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
self.f.temporal.group_average(var, "year")
)
data_array = annual_average_dataset_for_var.data_vars[var]
# elif metric == Metric.TOTAL:
# # TODO: Implement this!
# raise NotImplementedError()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder @chengzhuzhang I will look into this further, but in case one of you has an immediate answer:

We currently use group_average from XCDAT to calculate the global annual average for variables. The Land team however needs to compute averages of some variables and totals for others. It looks like we don't need to modify the ts task then; we just need to modify the calculation here. What I'm not sure of is what calculation to use. Multiply each year's average by a constant (e.g., landfrac * surface area)? Use a different xCDAT function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 I believe the formula to compute global sum from global average is to multiply global average by a scalar: the sum of area*landfrac. Maybe you could check if area and landfrac are available from the .nc file generated by ts global task.

Comment on lines +1309 to +1311
# new_url = f"viewer_{component}"
# # shutil.move("viewer", new_url)
# distutils.dir_util.copy_tree("viewer", new_url)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time I try to move/copy the land viewer over to a different dir (e.g., viewer_lnd), the formatting goes bad. I'm assuming some formatting file it's referencing isn't also getting copied over.

What this means though is that I can't seem to properly make different viewer directories for different components.

url = create_viewer(parameters, vars, component)
print(url)
title_and_url_list.append((component, url))
# index_url: str = create_viewer_index(parameters.case_dir, title_and_url_list)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang I will work on this more, but I'm having some difficulties with setting up multiple viewers (see earlier comment) and then getting them to show up on an index page together. I think I just need to match up file paths and URLs correctly.

The problem is viewer seems to be used everywhere, so it's difficult to differentiate between viewers. For instance, in https://github.com/E3SM-Project/e3sm_diags/blob/7b054251e54f223e44ec1e15908548e8e1797744/e3sm_diags/viewer/index_template.html, the code block

  <link href="viewer/css/bootstrap.min.css" rel="stylesheet" type="text/css"/>
  <link href="viewer/css/viewer.css" rel="stylesheet" type="text/css"/>
  <script src="viewer/js/jquery-2.2.3.min.js" type="text/javascript">
  </script>
  <script src="viewer/js/bootstrap.min.js" type="text/javascript">
  </script>
  <script src="viewer/js/viewer.js" type="text/javascript">

implies the existence of a viewer directory above all the other viewers.

Again, I'll look into this more myself, but I am wondering if we could get the multi-viewer index working quite quickly if we just pair programmed it together.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about issues here. Is the idea to create a main viewer and have each component viewers link to it? As a first step, does it work to create each component viewer under each directory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm aiming for:

Something like e3sm_diags where there's an index page linking to multiple viewers:
Screenshot 2024-10-25 at 9 57 29 AM

I want a viewer for each component.

What I currently have:

My latest results are just the land plots:
Screenshot 2024-10-25 at 10 03 09 AM

The parent directory has a subdir viewer/.

If I try to create multiple viewers by having two subdirs viewer_lnd and viewer_atm, then the formatting goes off, as described in #616 (comment). This can be seen here:
Screenshot 2024-10-25 at 10 02 33 AM
and here:
Screenshot 2024-10-25 at 10 04 43 AM
Actually, atm is even worse; the plots don't seem to get linked from the atm viewer.

for rgn in parameters.regions:
run(parameters, requested_variables, rgn)
plots_per_page = parameters.nrows * parameters.ncols
# TODO: Is this how we want to determine when to make a viewer or should we have a `make_viewer` parameter in the cfg?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design decision question: how do we want the user to specify they want a Viewer?

  • Always make a viewer
  • Make a viewer if they say they want 1 row, 1 column summary pages (i.e., single plot pages) and otherwise construct the nrows x ncols summary PDFs as before?
  • Have an explicit parameter in the cfg: make_viewer = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver: new feature New feature (will increment minor version)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Create HTML page for global time series
6 participants