Meeting Notes 2018 Software
- Erik: Upcoming tags. Need to update surface datasets on release branch. As well as update datasets on master (new urban dataset, new PFT datasets with PFT changes and new GULUC fields from Peter). Just want to check on how to go about doing this. Peter has to update the two afforestation future scenarios (SSP1 and one of the SSP4 scenarios).
- Erik: And with new datasets, the process for dataset creation needs to be examined. Because of the wetlands issue, Bill Lipscomb proposed we examine our process. The proposal we have is to compare one of the previous files to the new one and make sure the differences are as expected. This would possibly catch inadvertent changes, but wouldn't catch long-standing problems.
- Erik: What about PR ESCOMP/ctsm#250 snow parameter update?
- Erik: cime update for f19 files?
- Erik: Note, putting release version of RTM and MOSART on master. At some point we should move the changes on the release branch to master. But, since this doesn't matter much, having the release version as the one we use seems to be fine.
- Erik: Carbon isotope time-series bug. Fixing this means changing to streams. This would also allow the present-day controls 2000_control and 2010_control to work. There are other issues with the hard-coded implementation that are problematic; see the last comment in ESCOMP/ctsm#182. This is also a good example of why we shouldn't use custom code to bring in datasets: the streams mechanism is in general better, more robust and more flexible.
- Erik: Note some forcing datasets will be removed at end of year: princeton, NLDAS, Qian, WATCHFDEI, ERAI.
- Erik: Variable name on Sam's branch: n_dom_soil_patches or n_dom_veg_patches or something else? It does include bare soil.
- Erik: Just to make Dave aware of the Software Contributor Code of Conduct. https://www2.fin.ucar.edu/ethics/contributor-code-conduct
Need to update surface datasets on the release branch, in order to get glacier-wetland fix and a new field.
On master, also need to update surface dataset because Keith has a new raw dataset, and Peter has GULUC fields, and has new raw data sets with small PFT changes.
Is it okay to change master to use these new datasets? We're trying not to do big answer changes relative to the cmip6 control runs, e.g., so isotope runs using cesm2.2 can be compared against cmip6. Peter's changes are definitely bigger than roundoff, but Dave expects they're close enough that it's okay. Erik points out that the isotope runs could point to old surface datasets as long as we make sure code remains compatible with these old files (which we should therefore strive to do).
Current plan: compare standard resolution (1 deg) to previous, and make sure changes look as expected.
We can't think of anything more to do at this point.
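The comparison step described above could be sketched roughly as follows. This is only an illustration in Python, not the project's tooling: each dataset is represented as a plain dictionary mapping field names to flat lists of values (a stand-in for variables read from the old and new files), and the function name `compare_datasets` and the example field values are hypothetical.

```python
def compare_datasets(old, new):
    """Compare two datasets field by field.

    `old` and `new` map field names to flat lists of floats.
    Returns which fields were added, removed, or changed; for changed
    fields the value is the largest absolute difference.
    """
    report = {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {},
    }
    for name in sorted(set(old) & set(new)):
        a, b = old[name], new[name]
        if len(a) != len(b):
            report["changed"][name] = "size differs"
            continue
        # Largest absolute difference over all points in this field
        max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if max_diff > 0.0:
            report["changed"][name] = max_diff
    return report


# Hypothetical values, for illustration only
old = {"PCT_WETLAND": [0.0, 5.0, 10.0], "PCT_GLACIER": [1.0, 0.0, 0.0]}
new = {"PCT_WETLAND": [0.0, 0.0, 10.0], "PCT_GLACIER": [1.0, 0.0, 0.0],
       "NEW_FIELD": [0.0, 0.0, 0.0]}
report = compare_datasets(old, new)
print(report)
```

Fields listed under "changed" are the ones that would then be checked visually against what the dataset change was expected to do.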
For spinup runs, we should NOT have use_c13_timeseries = .true. or use_c14_bombspike = .true. in user_nl_clm.
Dave: Side-note: We should stop using Qian forcing... update our testing to use one of the newer datasets.
Dave: Feels this should only apply to natural PFTs: we should always run with all of the prognostic crops. (One example of a reason for this: we don't want to cut out the irrigated crop.)
- What about if we're just doing generic crop? Ideally, this would be treated along with PFTs. But if he hasn't already implemented this, then this part isn't critical.
So then the name can be n_dom_pfts.
- Erik: Future scenarios. I got mksurfdata set up for SSP5-8.5, but could extend it to the others. What is the priority on this?
- Erik: Some things that should get done: conus_30_x8 grid, mosart 8th degree, high-res PFT. Priority for these?
- Bill: Why do we have 6 "critical" issues that haven't been updated in months or years?
- Erik: I'd like to lobby for getting fates update in ASAP
- Erik: FYI: Added a couple tasks to the upcoming tags page (SDYN tag?, LGM surfdata, ciso future data)
- Bill: Timing of Sam's tag vs. my upcoming tags (initial tracer update things, Sean's irrigation [possibly 2 tags])
Need to bring in Fang's population data
Erik asks about: conus_30_x8 grid, mosart 8th degree, high-res PFT
The CONUS grid is the refined grid that Colin wants. The files exist, but are not yet available by default.
Dave feels these things aren't super high priority.
More generally: how are we going to deal with all the grids that people want to run moving forward?
For this particular case (conus refined grid), it depends on whether this is a priority for the AMWG, and whether they're going to make this possible out-of-the-box.
Beyond that, more generally, we need to discuss this issue of possible explosion of grids... tied in with ease-of-use of creating datasets for new grids.
Good to get high-res pft and then 8th degree mosart in before too long.
Naoki presented some numbers getting at the relative cost of different levels, for the Mississippi basin.
Only about 1/50 of the work is done on the main stem, suggesting that it could be okay to do the main stem on a single processor.
However, the most expensive 2nd level stream is responsible for about 20% of the total cost of the Mississippi basin, so we still may not want to put an entire 2nd level on a single processor.
So we may want to consider something like: do a decomposition at a higher level, and compute that all in parallel. Then redistribute the inputs to a different decomposition at a lower level. This adds some complexity in the mpi algorithm, but something like this should be doable.
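The numbers above can be turned into rough speedup ceilings with Amdahl's law. This is a back-of-the-envelope sketch in Python, assuming (as a simplification not stated in the notes) that everything outside the serial part scales perfectly; the function name is hypothetical.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Speedup when `serial_fraction` of the work runs on one processor
    and the rest is spread evenly over `n_procs` processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


MAIN_STEM_FRACTION = 1.0 / 50.0   # main stem share of the work, from the notes
LARGEST_LEVEL2_FRACTION = 0.20    # most expensive 2nd-level stream, from the notes

for n_procs in (8, 36, 128):
    speedup = amdahl_speedup(MAIN_STEM_FRACTION, n_procs)
    print(f"{n_procs:4d} procs, main stem serial: {speedup:5.1f}x")

# Ceilings as the processor count grows without bound
print("main stem on one processor:", 1.0 / MAIN_STEM_FRACTION)
print("largest 2nd-level stream on one processor:", 1.0 / LARGEST_LEVEL2_FRACTION)
```

Under these assumptions, keeping the main stem on one processor caps the speedup at 50x, while keeping the most expensive 2nd-level stream on one processor would cap it at 5x, which is the reason for the hesitation above.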
(There was quite a bit of discussion that Bill didn't completely follow.)
Bill: Rather than having completely independent decompositions: intuition is that it may be better to think about having a given set of processors fully responsible for a given sub-basin.
Question: do we focus first on getting MizuRoute into CESM or parallelizing MizuRoute? We'll do the latter (especially as long as the NUOPC coupling is still being developed).
- Erik: Current milestones are: clm5, cmip6, cesm2.1.0, and future. clm5 is already done. future probably isn't useful. What is useful is to put in the requirement wishlist for future releases; those we need to manage. So I'm making cesm2.1.0 the ones we HAVE to get in. But some things should be done fairly quickly, just after the cesm2.1.0 timeframe, so I need a label for those. cmip6 is now too broad a milestone, because some things are needed at some point along the cmip6 process, but not for cesm2.1.0.
- Bill: Where should we record checklists for things like:
- When making new surface datasets: should do a careful comparison of at least one global dataset against the previous version, comparing all fields. For fields that differ, visually check
- Erik: Think the answer changes in rtm/mosart is due to output being a hybrid of single and double precision. Still working on...
Bill will start a checklist in README.developers. For now just have the above item.
Erik and Bill talked about a few things:
- Doing at least a little more design, and design reviews
- Pairing; some possible times:
- For programming
- For the process of making a tag
- For debugging
- Requirements
- Examples:
- Chemistry group requires f05
- WACCM-X requirements
- One answer here is having at least some more things written down
- In documents in the doc directory of CTSM
- And/or in well-placed comments in the code
- For WACCM-X, Erik added a system test to test their three requirements
- Erik: Some things (like f05) should be written such that the group
responsible for it can change it
- For resolution in particular, maybe we can have this on the CESM wiki?
- Examples:
- Keeping up with some reading
- Journals
- Considering a lunch-time "study group"
- Martyn: Discuss workflow for MizuRoute grid aggregation. (Bill wants to think about how this would fit into the CESM/CIME workflow, with respect to when you define grids and set up mapping files: How much of this aggregation is done ahead of time (allowing you to prestage the necessary mapping files) vs. at runtime?)
Naoki has basically finished within-basin parallelization. Still some work to be done to load balance main-stem parallelization.
We had some discussion about the relative benefits of decomposing within vs. between basins, and related: mpi vs openmp parallelization.
- Based on Naoki's numbers for Colorado basin: It looks like we can get something like a 3x - 4x speedup for OpenMP with 36 threads.
- Thinking about the CONUS: about 1/3 of the reaches are within the Mississippi basin. So if you had only between-basin parallelization, the best you can do is improve timing by a factor of 3.
- Mariana suggests: could you move one step back, and parallelize everything but the mainstems, allowing more decomposition. We had some discussion of this, and generally feel this might not be too hard if we allow the mainstem to be done on a single processor (so gather inputs to that single processor) - which may be okay if the mainstem is a relatively small amount of the total time.
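The factor-of-3 limit for between-basin parallelization can be illustrated with a small load-balancing sketch. This is Python with made-up basin sizes (chosen so that the largest basin holds 1/3 of the reaches, as for the Mississippi within CONUS); whole basins are assigned greedily, largest first, to the least-loaded processor. The function name and the greedy strategy are assumptions for illustration, not what MizuRoute does.

```python
import heapq


def assign_basins(basin_sizes, n_procs):
    """Assign whole basins to processors, largest basin first, always to
    the currently least-loaded processor. Returns sorted per-processor loads."""
    loads = [(0, p) for p in range(n_procs)]
    heapq.heapify(loads)
    for size in sorted(basin_sizes, reverse=True):
        load, p = heapq.heappop(loads)
        heapq.heappush(loads, (load + size, p))
    return sorted(load for load, _ in loads)


# Hypothetical reach counts per basin; total 900, largest basin = 1/3 of total
basins = [300, 150, 120, 100, 90, 80, 60]

for n_procs in (2, 8):
    loads = assign_basins(basins, n_procs)
    print(n_procs, "procs: speedup", sum(basins) / max(loads))

speedup_2 = sum(basins) / max(assign_basins(basins, 2))
speedup_8 = sum(basins) / max(assign_basins(basins, 8))
```

However many processors are added, the processor holding the largest basin sets the pace, so the speedup cannot exceed the total divided by the largest basin.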
The basic question is related to: You can subdivide your domain (e.g., CONUS) into higher or lower-resolution reaches / grid cells. This can be done dynamically for MizuRoute.
With the current coupling, you need to have separate mapping files defined ahead of time. But with NUOPC's online regridding, which (according to Mariana) may be ready by next summer, this could become much easier.
Should we skip trying to do a MCT-based cap for MizuRoute and go straight to a NUOPC cap? Maybe.
- Bill: Organization of usermods: note how I'm doing nhtfrq, etc.
- Show user_nl_clm for a case with output_sp_highfreq
- Show generated lnd_in
- Show
- Bill: Fixing non-prognostic-crop transient bug in release
- Erik: are you able to do a careful review of the latest change and shepherd this tag through?
- Is this needed on the cesm2.0 release branch, or just the clm5 release branch?
- Bill: How to notify people of this bug?
- Bill: How can we prevent bugs like https://github.com/escomp/ctsm/issues/538 in the future?
- This doesn't seem like something that could have easily been prevented through more care in coding (either in the initial code related to this or in the changes that were made)
- One idea is: For risky answer-changing tags, doing some group brainstorming on things that should be manually checked? (e.g., in this case, manually checking crop areas in a transient run). This could be done ahead of time (coming up with a checklist) or as a review of what the primary SE checked (e.g., if I was the primary SE, I would present what I have manually checked, and at least one other person would review that and say if they felt it's sufficient).
- (erik) Along with the above, figure out the ramifications of a change and what options it will affect. Right now it's obvious to me why this doesn't work like we expected, but we didn't think of it at the time. Had we thought through the process more, we should have realized it then.
- Smaller tags would help here (helping to focus a reviewer on one thing), especially when there are answer changes
- More tests like https://github.com/ESCOMP/ctsm/issues/542
- (erik) do we know if the right behavior happens when create_crop_landunit=F?
- Erik - Jim's changes to rtm and mosart, surprisingly change answers for five tests. So I'm going to put those updates in the answer changing tag. How much time should I spend on figuring out why answers change?
- Erik - ne30/conus grid F case issues? What should we do?
Dave & Erik are happy with the format that lists individual items separately.
We want to fix this on master and for the CESM2.1 release (so on the clm5 release branch). We'll think later about whether to fix this for a CESM2.0.2.
Notifying people: Dave's inclination is to notify LMWG people along with the 2.1 release.
Bill: Not a silver bullet, but: For risky answer-changing tags, doing some group brainstorming on things that should be manually checked? (e.g., in this case, manually checking crop areas in a transient run). This could be done ahead of time (coming up with a checklist) or as a review of what the primary SE checked (e.g., if I was the primary SE, I would present what I have manually checked, and at least one other person would review that and say if they felt it's sufficient).
Others agree.
Also, longer-term: Bill has some ideas for system tests we can do to catch this exact problem.
- Erik points out that we probably need a corresponding cime change
Mariana feels it's important to get these changes in, and to understand the reason for the answer changes.
NaNs getting sent from land to cpl after some number of years, starting in a single point. It's happening in cold regions.
Dave's suspicion is: there is some unusual situation coming from CAM, which causes CLM to get unhappy. e.g., it may be due to super-cold temperatures.
In the NUOPC coupler framework, it assumes that all components send meshes (which include connectivity), to allow online regridding. So for CLM and other components, will have another input file that describes the mesh; for simple grids, there's a tool that creates this.
So we'll have 3 files that need to be consistent: surface dataset, domain file and mesh file. May want to think about how we want to combine these - or at least ensure they stay consistent.
One motivation for this network-based routing scheme: allows inclusion of reservoirs.
Mariana suggests, as a first step, putting in place an mct-based cap.
Stepping back: we are proposing bringing in a 3rd river model in CESM alongside RTM / MOSART.
Big pieces here:
- Infrastructure code that can / should be shared with MOSART / RTM / MizuRoute
- MCT cap
- Remapping (using ESMF)
Note that MizuRoute is currently not parallelized, but they're working on that.
Currently, MizuRoute is doing its own remapping (in serial). But we want the coupler to do this.
As a side-note: We would typically run CTSM at comparable resolution to the river - quite possibly even on a grid defined by basins / catchments.
The cap translates between MizuRoute data structures and MCT data structures.
At initialization, cap tells coupler what grid cells are, what areas are, and domain decomposition.
Note that you don't send the corners in MCT, so it's fine to have complex polygons. (You still need to define the corners offline for the purpose of ESMF map generation.)
Want to maintain the same flexibility as we currently have, that allows coupling to happen less frequently than the CTSM time step.
It's important to get parallelization in place: for the high-res CONUS grid, it takes 3 hours per model year.
Martyn asks if we can have nested decomposition. In his figure: grey needs to be done before the orange, which needs to be done before the red, which needs to be done before the blue.
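One way to read the ordering in Martyn's figure is as dependency levels in the river network: a reach can only be routed once everything draining into it is done, and reaches in the same level are independent of each other. The following Python sketch computes such levels from a reach-to-downstream-reach map. The network, the reach names, and the function name are hypothetical, and the figure's colors may be defined differently.

```python
from collections import defaultdict, deque


def routing_levels(downstream):
    """Group reaches into levels so that every reach is in a later level
    than all reaches that drain into it.

    `downstream` maps each reach id to the reach it flows into,
    or None for an outlet.
    """
    upstream_count = defaultdict(int)
    for reach, down in downstream.items():
        upstream_count.setdefault(reach, 0)
        if down is not None:
            upstream_count[down] += 1

    # Headwater reaches have nothing upstream and form level 0
    ready = deque(r for r, n in upstream_count.items() if n == 0)
    level = {r: 0 for r in ready}
    while ready:
        r = ready.popleft()
        down = downstream.get(r)
        if down is None:
            continue
        level[down] = max(level.get(down, 0), level[r] + 1)
        upstream_count[down] -= 1
        if upstream_count[down] == 0:
            ready.append(down)

    groups = defaultdict(list)
    for r, lev in level.items():
        groups[lev].append(r)
    return [sorted(groups[k]) for k in sorted(groups)]


# Hypothetical network: a and b join at c; c and d join at outlet e
network = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
levels = routing_levels(network)
print(levels)
```

Each inner list could be processed in parallel, with a synchronization point between levels.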
MCT doesn't care about the details of how the decomposition is done: it just cares about how grid cells / elements are distributed amongst processors.
Coming up with a general decomposition strategy within MizuRoute could be challenging....
Joe points out: We could consider using MPI just at a very coarse level (just for main rivers), and using OpenMP threading for everything else. People like this idea.
- In CESM, if you run a component on the same processors as a different component (sequentially), then all of those components need to use the same number of threads. However, it seems like we may want to run the river model concurrently, on its own set of processors, in which case it has full control over the number of threads per processor.
We might be able to share the decomposition code that's used for RTM / MOSART.
Note that the coupling will happen with polygons that fill the domain. These polygons are defined as the catchments of the smallest-level streams.
Mariana suggests trying to bring in MizuRoute first as a single-processor model, to get all of the coupling infrastructure sorted out. Or, as Martyn suggests, could put in place OpenMP parallelization at first.
Bill asks: to what extent do we want to have this tied in with cime initially, for the build and setting up namelists? Mariana thinks we should at least hook in the build, but initially we could have a dead-simple build-namelist script that just puts in place a pre-staged namelist file.
- Follow up on discussion about friction velocity, with Mariana
- Bill - For Dave: Confirming it's okay that the CESM2.1 code base will need use_init_interp to continue from existing historical simulations
- Bill - Dave: Want any help putting together slides for tomorrow's co-chairs meeting re: things still wanted for cesm2.1?
We have 2 weeks. That should be plenty of time for the needed three tags.
If we did anything else, it could be a data reduction tag - turning a bunch of things to inactive by default. One thing to look at is how carbon isotope fields are treated by default.
Mariana: We're not going to have cylc for the community, so we should use usermods to get the correct output.
We'll add hist_fexcl1 for isotope in the carbon isotope user mods. (Note that, currently the excluded monthly vars are also included as annual vars. That may not be needed for all the C isotopes.)
We should split the cmip6_output directory into two: one that just has the low-frequency output, and one that adds high-frequency. Then we can have a top-level cmip6 as well as cmip6_high_frequency. And do the same thing at the CESM level. The main one will exclude the daily and 3-hourly output.
Actually: We should have different output directories:
- output_sp
- output_bgc (includes output_sp)
- output_crop (includes output_bgc)
- output_sp_highfreq (builds on output_sp)
- output_bgc_highfreq
- output_crop_highfreq
We'll also have cmip6 (which includes output_crop) and cmip6_highfreq (which includes output_crop_highfreq, which should be identical to what is now cmip6_output).
Mariana isn't sure if there is actually duplication between what's in FrictionVelocity and what's done in other components. Would need to talk to Bill Large, Dave Bailey and Marika Holland about this.
Dave L: we could ask Thomas to spear-head this and look into the similarities / differences between different components.
Confirming it's okay that the CESM2.1 code base will need use_init_interp to continue from existing historical simulations: Dave is okay with this.
Ideally we'd set use_init_interp by default when it's needed... but that's hard to determine.
- Bill - Issue #509: irrigate is true for non-crop 1850 runs
- Confirm how we want to fix this
- Should this be fixed for CESM2.1?
- Who should fix this? (Erik understands the subtleties of this better than I do, but if his plate is overloaded, I can take a stab at it)
- Erik: Note, I'm pretty sure this will require code changes; see the issue (the branch Sam is working on fixes it)
- Erik - CESM2.1 coordination. Talked about this at CSEG meeting. CAM needs 3 more tags. POP needs 1. We need a cime update and cesm2.1 release tag. There will be a separate branch for cime for cesm2.1 than cesm2.0. For CLM we need a tag that updates cime and ndep. The CLM tag needed for cesm2.1 will go on the clm5.0 branch. We normally put the changes on master first, but could put them on the branch first and then migrate to master.
- Erik - See cesm2.1 testdb: https://csegweb.cgd.ucar.edu/testdb/cgi-bin/viewPlannedTag.cgi?tag_id=466
- Erik - Should we separate the cime update for presaero from the ndep update?
- Erik - Should prescribed aerosols and ndep be same as CMIP5 1850? Currently not, so 1850 will be different than historical
- Erik - Talked to Peter about creating rawpft datafiles that are identical to the old ones, but have the GULUC fields on. One way to do it is with nco. I told him to keep that in mind while he is working on his code: if doing it with nco would be easier, we should do it that way. We agreed on shooting for Thursday to get this done.
- Erik - FYI. Peter was frustrated with git. Wanted to know how to see that his push to his fork worked.
- Erik - Sheri brought up reducing output. Want to reduce default output for cesm2.1 to only that needed for diagnostic package as a proposal. She will talk about this at co-chairs tomorrow.
- Erik - My tag. Made changes over weekend, need to rerun testing and tag tomorrow.
- Bill - timing of performance changes
- Dave - From Thomas Toniazzo: Consistent treatment of all material and energy fluxes between different components in the coupled systems so that both matter and energy are conserved. (As you know at present matter is not considered to be carrying any energy in CESM). In order to achieve this, one bit of progress that I would think important would be to centralise the computations which are carried out in different components for the properties of the atmospheric flux layer. For example, in components/clm/src/FrictionVelocityMod.F90 there is one such calculation, using in particular specific stability functions that are defined locally. So I wonder whether it would be possible to move some of these calculations to the cime code repository, e.g. to cime/src/share/utils/, where the ocean-atmosphere flux calculations are also done. I am thinking specifically of the scientific models that pertain to the atmospheric surface layer and not to other specific models, so in particular the stability functions, and also some of the in-lined computations following e.g. line 472 in FrictionVelocityMod.F90. I see some comments in that routine (line 503) that point in this direction, so perhaps it may be possible to coordinate this with my planned work on harmonising the model's energy formulation.
We updated this issue.
We'll come back to this after the cesm2.1 crunch and after the changes Sam Levis is working on.
Dave and Keith have looked through this a few times, and don't feel that they can get the output volume much lower.
A bigger concern for Dave is: A lot of the savings come from excludes in the cmip6 output user mods. We might want to make those inactive by default.
How can we make it clear to users that they should include these user mods?
Bill: A kind of crazy idea is: We could make the default user_nl_clm that's copied to your case dir actually have some of this output stuff. But then we'd need to remove some error-checking code at runtime, so that it lets you add fields that aren't actually present in this run.
For now, some things we want to do are:
- Split the user mods directory into one that just has monthly fields, and one that builds on that, adding the high-frequency output
- Mention this user mods dir in the user's guide
We could consider moving this to a function in cime. The simplest thing would be to put a single-point routine in cime, but that might have performance issues. A better solution could be to have a cime routine that operates on arrays as input and output. We'd then have some pre code that packs data into arrays (using only points within the filter), and then some post code that unpacks the outputs. (This packing and unpacking would be a handy little general-purpose subroutine to have.)
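The pack / compute / unpack pattern described above might look like the following. This is a language-neutral sketch written in Python rather than Fortran; `pack`, `unpack` and `shared_routine` are hypothetical names, and the doubling inside `shared_routine` is just a placeholder for the real array-based calculation.

```python
def pack(values, filter_indices):
    """Pre code: gather only the points within the filter into a dense array."""
    return [values[i] for i in filter_indices]


def unpack(packed, filter_indices, target):
    """Post code: scatter dense results back to their original points."""
    for value, i in zip(packed, filter_indices):
        target[i] = value


def shared_routine(x):
    """Stand-in for a shared routine that operates on whole arrays."""
    return [2.0 * v for v in x]


field = [270.0, 280.0, 290.0, 300.0]
result = [0.0] * len(field)
filt = [1, 3]   # points on which the calculation should run

unpack(shared_routine(pack(field, filt)), filt, result)
print(result)
```

Points outside the filter are left untouched, and the shared routine never needs to know about filters or subgrid indexing.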
A first step in the right direction would be to remove some apparently duplicated code that depends on whether landunit_index is present. The approach could be the same as above: having pre code that packs data into arrays and post code that unpacks the data.
- Bill - Discuss issue #511: How should we ensure that the source state for each water tracer is set correctly?
Bill laid out four options:
- Do nothing about this
- Put in place a system test that catches problems, if we can think of one that can do this
- Organize the code so that the tracer updates happen alongside their respective state updates
- Set pointers that point from a flux variable to its source state, which can be used both in the state update code and in this tracer code.
People agreed that this could truly be an issue.
Dave thinks that the best option could be coming up with a system test for this, if we can think of one.
Dave also thought that organizing the code so that tracer updates happen alongside their respective state updates might be a good option, though also agrees that it could muddy the code for someone trying to understand just the state updates and not caring about tracer stuff.
We thought of a 5th option:
- Renaming fluxes to be explicit about their source (and maybe destination), as is done for the biogeochem code.
This wouldn't guarantee that things are done right, but it should make it more obvious when things are done wrong, and it should prevent someone from changing the state/flux structure, but reusing a flux variable to have a different source state. People felt that this might actually be the best solution. Nobody could think of a case where it fundamentally would NOT work to have a single source for a given flux - though there may be cases like two-way fluxes (e.g., for Richards equation) where we need to split what's currently one two-way flux into two one-way fluxes (see below). And Dave thought this could help clarify the code, and so might be good for other reasons anyway.
Martyn: Long-term, we might want something like a data structure like:
    eqn%var(ixLookState1)%state
    eqn%var(ixLookState1)%flux(:)
Then you can loop over this to do the state updates.
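A rough sketch of this idea, in Python rather than Fortran: each flux records its source (and destination) state explicitly, and one loop updates both the bulk water and a tracer. The class and function names, the state names, and the rule that a tracer flux carries the tracer-to-bulk ratio of its source state are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flux:
    name: str
    source: str   # state the water leaves
    dest: str     # state the water enters
    rate: float   # bulk flux per unit time


def apply_fluxes(bulk, tracer, fluxes, dt):
    """Update bulk and tracer states from the same list of fluxes.

    The tracer part of each flux is assumed to have the tracer-to-bulk
    ratio of the flux's source state, evaluated before any update.
    """
    ratio = {k: (tracer[k] / bulk[k] if bulk[k] > 0.0 else 0.0) for k in bulk}
    for f in fluxes:
        amount = f.rate * dt
        bulk[f.source] -= amount
        bulk[f.dest] += amount
        tracer[f.source] -= amount * ratio[f.source]
        tracer[f.dest] += amount * ratio[f.source]


bulk = {"canopy": 2.0, "soil": 10.0}
tracer = {"canopy": 1.0, "soil": 2.0}
apply_fluxes(bulk, tracer, [Flux("drip", "canopy", "soil", 0.5)], dt=1.0)
print(bulk, tracer)
```

Because the source is part of the flux definition, the tracer code cannot silently use a different source state than the bulk update does.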
Bill: will consider moving incrementally toward this; for now this would involve something like option (4) above - having some extra pointers. The upside is that it would force code to remain self-consistent; the downside is that it could make the code harder to understand.
What to do about a flux that could be either direction, e.g., for Richards equation? We're not sure how this would work with the vision for tracer updates. We could split a two-way flux (positive or negative) into two, positive one-way fluxes (one of which will always be 0). Martyn points out that this could be problematic for computing derivatives, but it's possible that this could be done after the fact (after solving for all of the fluxes). We'll check if David Noone has ideas for how this could work.
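The split itself is simple arithmetic. A minimal Python sketch, assuming a sign convention where positive means downward (the convention and the function name are hypothetical):

```python
def split_two_way(q):
    """Split a signed flux into two non-negative one-way fluxes.

    Returns (down, up); at most one of them is non-zero, and
    down - up recovers the original signed flux.
    """
    return max(q, 0.0), max(-q, 0.0)


down, up = split_two_way(-0.2)
print(down, up)
```

Each one-way flux then has a single, fixed source state, which is what the tracer update needs. Applying this after all fluxes have been solved would avoid differentiating through the `max`, in line with the suggestion above.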
- Erik/Sean -- How important are grids on -180-180? I've added an immediate fix for local noon. But, if they are important we should add tests and show that we get the same answers with negative longitude. Other cesm grids are sometimes on this (notably cism), and it does help if you want to do regions along 0 degrees. As far as I can tell cime works with these grids.
- Isotope project
- Isotope work update
- Bill's upcoming priorities
Mike: WRF and the National Water Model work with -180 to 180 by default, but this can be changed in preprocessing. If it's going to take a lot of time to fix this, then it may not be worth doing.
Erik has fixed some things, but there are apparently still some other problems. e.g., Sean has found some problems with reflected solar.
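For reference, the two longitude conventions and their effect on a local-noon calculation can be sketched as follows. This is a Python illustration using mean solar time (15 degrees per hour, ignoring the equation of time); the function names are hypothetical and this is not the CTSM implementation.

```python
def to_0_360(lon):
    """Convert a longitude in degrees to the [0, 360) convention."""
    return lon % 360.0


def to_pm180(lon):
    """Convert a longitude in degrees to the [-180, 180) convention."""
    return (lon + 180.0) % 360.0 - 180.0


def local_solar_hour(utc_hour, lon):
    """Mean local solar time in hours; valid for either convention."""
    return (utc_hour + lon / 15.0) % 24.0


print(to_0_360(-90.0), to_pm180(270.0))
print(local_solar_hour(18.0, -90.0), local_solar_hour(18.0, 270.0))
```

A test along these lines, run once per convention, is the kind of check that would show whether answers are the same with negative longitudes.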
Probably separate this into at least three routines:
- Snow cover fraction
- Snow initialization
- Canopy Hydrology (though it does more than that)
For now, we'll keep state updates where they are. Later, we might merge some state updates (which would change answers).
We'll keep in place the original fsca formulation (Niu & Yang 2007), probably giving it a better name than just oldfflag. Note that this is the form used in Noah-MP.
Note that, the way things are currently structured, it does the new frac_sno calculation, then possibly overwrites it with the old.
And note that there are order differences for the new vs. old: New needs to compute frac_sno, then update snow_depth based on that; old needs to first update snow_depth, then compute frac_sno based on that. So we should have a routine that updates both frac_sno and snow_depth, and let individual parameterizations do that in whatever order they want.
Then this will be lumped with the FSCA block above it:
    ! for subgrid fluxes
    if (subgridflag == 1 .and. .not. lun%urbpoi(l)) then
       if (frac_sno(c) > 0._r8) then
          snow_depth(c) = snow_depth(c) + newsnow(c)/(bifall(c) * frac_sno(c))
       else
          snow_depth(c) = 0._r8
       end if
whereas this will be lumped with the "original fsca formulation" below it:
    else
       ! for uniform snow cover
       snow_depth(c) = snow_depth(c) + newsnow(c)/bifall(c)
    endif
i.e., it is currently assumed that subgridflag==1 is combined with oldfflag==0, and vice versa.
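The proposed structure, where each parameterization updates both frac_sno and snow_depth in its own order, could be sketched like this. The sketch is in Python for brevity; the snow_depth formulas follow the two Fortran excerpts above, while the frac_sno calculation is passed in as a placeholder (`placeholder_frac_sno` is invented for the example and is not either real formulation).

```python
def update_new(snow_depth, newsnow, bifall, compute_frac_sno):
    """Newer ordering: compute frac_sno first, then update snow_depth from it."""
    frac_sno = compute_frac_sno(snow_depth)
    if frac_sno > 0.0:
        snow_depth = snow_depth + newsnow / (bifall * frac_sno)
    else:
        snow_depth = 0.0
    return frac_sno, snow_depth


def update_old(snow_depth, newsnow, bifall, compute_frac_sno):
    """Original ordering: uniform-cover snow_depth update first, then frac_sno."""
    snow_depth = snow_depth + newsnow / bifall
    frac_sno = compute_frac_sno(snow_depth)
    return frac_sno, snow_depth


def placeholder_frac_sno(snow_depth):
    """Hypothetical stand-in: cover grows linearly with depth up to 0.1 m."""
    return min(1.0, snow_depth / 0.1)


frac_new, depth_new = update_new(0.05, 1.0, 100.0, placeholder_frac_sno)
frac_old, depth_old = update_old(0.05, 1.0, 100.0, placeholder_frac_sno)
print(frac_new, depth_new, frac_old, depth_old)
```

The caller only selects which update routine to use, so the ordering difference stays inside each parameterization.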
We'll aim for biweekly meetings. But we'll keep it on the calendar for every Thursday and decide week-by-week if we want to keep it.
- Erik -- PGI bug #442 what should we do? Worked with Jim last week to have PGI compiler guys work on it. Had to prove an issue. I finally showed that it worked for PGI16.5, but failed for all other compiler versions we have available.
- Erik -- Set options bug #431? Dataset issue Keith found #478?
- Erik -- FYI: added "-fast" option to mkmapdata.sh and it was able to run on normal 1-proc queues
- Erik -- r8th grid for mosart?
- Erik -- Data from Peter L.?
- Erik -- Discuss location of FATES/ctsm monthly meetings
- Erik -- CESM2.1 updates that are needed: CO2 (waiting on Doug K.), ndep/presaero for transient, FATES, transient speedup?, WACCM-X nstep startup issue and testing, improve user friendliness of build-namelist, snowmip, littfall
- Erik -- Ethical use for publication addition to Code of Conduct.
- Erik -- Branch needed for Steve G. that changes pause-resume behavior.
As far as Erik knows, the PGI guys are working on it now. So we'll wait for them.
Priority of #431 (History fields incorrect when set_xxx=0 but the xxx landunit is also set in initCold)
Feeling is: let's fix the known issues, but not high priority to look through carefully for other issues.
With the new -fast option, it can run on a single processor, though it still takes a few hours.
Sean did some work on this and got it to work.
Should this become an option in cime? Dave feels probably yes, with the main hesitation being that we'll eventually have a new network-based routing scheme, which could replace mosart for high-resolution. But in the near-term, 1/8 degree mosart could be nice. Maybe aim for this for CESM2.2.
After a lot of back and forth, decision has been to just use globally-averaged rather than latitudinally varying, for coupled runs.
The dataset is pretty much ready, but they need to resolve Gregorian vs. noleap.
For datm, Dave feels it would be good to have the option to go back and forth: in a lot of cases, we'd like latitudinally varying, but for comparison with cmip6, might want to use globally-averaged. Erik might do that by having a separate field in the dataset that is spatially and temporally averaged, so you could point to that other field.
Dave: if possible, we should try to use the same file that CAM is using, rather than creating our own file. This may not be possible, though, because we need a streams-format file, whereas we think CAM is reading a slightly different format.
Priorities in order:
- Needed: co2, ndep/presaero for transient
- The transient speedup would be really good to have
- The nstep fix would be also really good to have
- Dave -- Status of CLM5/WACCM-X bugs, removal of CLM4
- Erik -- Bring ndep update to master?
- Erik -- Tool chain for NWP (option to skip 1km_hydro maps and use constant topo_std=371?) (The memory requirements for mkmapdata.sh are enormous; this would bypass that)
- Bill -- Default options for NWP (https://github.com/ESCOMP/ctsm/issues/456)
- Bill -- Water isotope update
When WRF runs with Noah-MP, lakes and urban run outside of Noah-MP, as separate land surface models called by WRF.
What do we want to do with CTSM?
- For lakes, it probably makes sense to have CTSM handle these.
- For urban: For single-layer, it could make sense to have CTSM handle it. Multi-layer urban (sticking into the atmosphere) is trickier....
Getting CTSM to work with the WRF workflow will take some work.
Mike has created scripts that take the WRF geogrid file and create a CTSM domain. But we might need to do stuff later like remove lakes from the geogrid file (or have a WRF namelist flag to tell it not to do lakes itself).
Initialization procedure: typically, initial conditions in Noah-MP come from some other model and/or HRLDAS.
- We may need the capability to blend a CTSM restart file with initial conditions from some other source.
- Mike notes that, in some cases, they take something like skin temperature, and initialize a bunch of other temperatures from that
- Mike notes that there is currently no way to initialize the crop model in the middle of a season
For the Noah-MP raw input data, they store individual tiles. We might want to think about doing that, or doing an initial step of subsetting the high-res raw data.
Mike notes the big difference that, for regional applications, everyone has their own domain/grid, so the need to create something like mapping files from raw data to the surface datasets becomes something that all users need to do.
The highest priority thing right now is to bypass the current 1-km input file. We either want to have a lower-resolution version of this file, or have the option to bypass this, using a different subgrid snow cover fraction parameterization.
We'd like the capability to have NLDAS out-of-the-box. This is a 12-km grid. This is a better one to use than the 30-km grid.
We'd like to have NLDAS datm forcing in addition to the NLDAS CTSM grid.
Erik thinks it makes sense to start with having this done with CLM USRDAT. A following step would involve having it as a supported grid in cime. But actually, after some more discussion, we decided to just go straight into cime. Among other things that would let us choose a PE layout for that grid. (We do have precedent for having single-point and regional grids in CIME; there's something that detects regional grids and turns off ROF; look at how that's done.)
Erik has made some progress on the ESMF library bug, but hasn't figured it out. But this just affects crop, which probably isn't a standard configuration for WACCM-X.
For the balance check issue: For now we're going to change the hard-coded 2 time steps to 1 hour. (Later we could make this namelist-settable if needed.)
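A minimal sketch of the one-hour rule, assuming the hard-coded value is the number of initial time steps over which the balance check is skipped (the notes do not spell this out). The function name and the `skip_seconds` argument are hypothetical; keeping the duration as an argument mirrors the idea of making it namelist-settable later.

```python
import math


def balance_check_skip_steps(dtime_seconds, skip_seconds=3600):
    """Number of initial time steps needed to cover at least skip_seconds."""
    if dtime_seconds <= 0:
        raise ValueError("time step must be positive")
    # Round up so that the skipped window is never shorter than skip_seconds.
    return max(1, math.ceil(skip_seconds / dtime_seconds))
```

With a 30-minute time step this reproduces the old value of 2 steps, while shorter time steps skip proportionally more steps.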
Dave will send an email to get final confirmation, but at this point we can probably go ahead with CLM4 removal.
If possible, we'd like to compare:
- Noah-MP
- CTSM in NWP configuration
- CTSM in its standard configuration, with the same forcing data as the other runs
We could also show at the meeting what it takes to get the NWP configuration working within CTSM now (hopefully it will be pretty out-of-the-box by then).
For initial conditions: `use_init_interp` from our existing 2000 initial conditions file.
- Bill - Confirming that we want to store users guide and tech note source in the source tree: I think this will make it easier to keep these in sync with code changes, but for the tech note in particular, we'll need to be vigilant about preventing large images from entering the repository. So I'm wondering if we should at least store the tech note source in a different repository so we don't need to worry about file size....
- Bill - how do we want to manage defaults for different versions moving forward?
- Mike/Erik -- Creating regional grids for NWP for CTSM. Currently have a tool chain for this. Requires SCRIP grid file. Removing ocean points requires some extra work (you can leave them in and run as wetland). Standard resolution PFT rawdata is quarter degree, Peter is creating a 0.05 version for just present day. Other raw datasets are various resolutions.
- Erik -- What compilers do we need to support? Currently having trouble with PGI.
- Report
- Upcoming plans
Erik: Maybe more reason to have the user's guide in the source tree?
Mike: It could be helpful to have the tech note in the source tree if we want people to update it frequently. Would it make sense to have images in a separate repository? Erik: We could do this with manage_externals.
Bill: like that idea. Will check with Keith, then tentatively plan to do this, as long as it isn't too hard in terms of the build.
Partly related to the tech note, Sean raises the question of whether we envision maintaining a bunch of options for each parameterization long-term (which would thus need to be documented in the tech note), or whether we envision pulling out less-well-performing parameterizations after initial development.
Martyn: There's some value in keeping in a parameterization just because it was used in some important paper.
Mike: Also, with regional modeling, there are cases where different parameterizations work better for different regions.
Dave: Some balance here: We don't want every possible parameterization in the model, but want to keep fundamentally different ones, when they're useful. Need to think about how to organize the tech note; might just document the default configurations, possibly including alternatives in an appendix.
For example, we currently have `clm4_5` and `clm5_0`. Do we want to just keep incrementing that number (going to `clm5_5` or `clm5_1`), or move to something completely different for the climate configuration?
One convenient note is that `clm` could be thought of as short for `climate`. Then we could keep going with our `clm` numbering, thinking of it as the numbering of the climate version within ctsm. We would also have physics suites like `nwp` and `hyd` (or `wat`). Though Mike points out that the `nwp` configuration will need different names for the different systems in which it's used.
Dave wants to bring this to SSG for approval. Then it can be the responsibility of the different groups (climate, nwp, etc.) for how they do their sub-naming convention.
Mike points out that we want some nwp configuration that basically mimics the latest version of Noah-MP.
Mike: Coming up with the first cut at the NWP configuration is fairly easy in terms of the code. (Side-note: It would be great long-term to have a more flexible way to specify soil layers by specifying a vector of nodes, but for now he has hard-coded a layout for 4 layers.) But the bigger issue is how to create the surface datasets.
Mike points out that, for NWP, each user basically has their own grid, so surface dataset creation needs to be easy and relatively fast. This makes it more realistic to have options in mksurfdata_map rather than at runtime.
Some specific issues are:
1. A lot of time is spent creating fields that may not be needed (hydro1k, glacier)
2. How to specify use of only (say) the dominant PFT
We feel that (1) may be fairly easy - we could skip setting some fields (or put some hard-coded constant fields) for some options.
For (2), we could do this at runtime - having a namelist option to only take the dominant N PFTs. This could fit into the work Sam is doing to collapse crops down to non-crop runs. Alternatively, this could be done at mksurfdata_map time... though the benefits of that would be greatest if we changed the format of the file to only list the present pfts, and we're not sure if we want to go there.
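The runtime option for keeping only the dominant N PFTs could look roughly like the Python sketch below: keep the N largest weights in a grid cell and renormalize them so they still sum to one. The function name and the flat list-of-weights layout are assumptions for illustration, not the CTSM implementation.

```python
def collapse_to_dominant(weights, n_dominant):
    """Keep the n_dominant largest weights, zero the rest, renormalize to sum to 1."""
    if n_dominant < 1:
        raise ValueError("n_dominant must be at least 1")
    # Indices ordered from largest to smallest weight (ties keep file order).
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(ranked[:n_dominant])
    kept_total = sum(weights[i] for i in keep)
    if kept_total <= 0:
        raise ValueError("no positive weight among the dominant entries")
    return [weights[i] / kept_total if i in keep else 0.0
            for i in range(len(weights))]
```

Because the output keeps the original length, this form would not need a change to the surface dataset format; only listing the PFTs that are present would.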
We want to set up a default NWP configuration soon. Erik thinks that doing this correctly in build-namelist could take a fair amount of work, but we might be able to come up with something quick and dirty initially, with a plan to make it more robust when we convert to python.
Is PGI important for NWP work? Mike will ask WRF and MPAS folks?
- Bill - In the upcoming tags project, can we just archive cards that are done rather than keeping them forever in a "Done" column?
- Bill - Allowing namelist defaults for new dev code that differ from CLM5 (clm5.5???)
- Erik -- Need PFT rawdata from Peter for CESM2.1. Also there is PFT data for TRENDY that I need from Peter. And would like high res PFT data.
- Erik -- What's the latest on getting CMIP6 presaero files in cime/datm?
- Erik -- Science workflow, cime bug so `MODEL_VERSION` isn't being updated in build. Need to educate people about it. One workflow would be to add an optional setting that requires a specific `MODEL_VERSION` for cases to run (i.e. `REQUIRED_MODEL_VERSION`=ctsm1.0.dev004-4-blah, it would die on the build if `MODEL_VERSION` is something else)
- Erik -- FYI on Glade changes. Enabled softlink that points to `p_old`, will update to current standard location under /glade/cgd/tss/ just before `p_old` data goes away. By the way, I couldn't get old cases working; this might bite us for people that have existing simulations that they want to extend. The best way is probably to do a branch from the existing one; you could set the flag to use the existing name.
- Erik -- Go through upcoming tags. Plans for when release tags happen. Think about Fates updates. What are the priorities?
- Bill - Things to sort out for tag that fixes neg ice runoff SH issue: (1) Is it okay to do this in a way that changes answers even for the clm5 configuration?; (2) Need to wait until CAM is ready to update refcases for cesm2.2
- Erik -- There is a separate discussion on the SMB work, but I need to be putting significant time into it. If I make it optional, I could also bring that work to the trunk. Perhaps Bill and I should meet regarding it as well.
- Erik -- Is Sam waiting on me, or does he have work to do? There are some svn branches I was going to do the svn part, and he would do the git part.
Erik and Dave find some value in having the "completed tags" column, so we'll keep it.
We'll talk about this at broader CTSM software and CLM science meetings.
We may want to think about syntax for having a namelist default that has one value prior to version X, and a different default with version X and later. e.g., having a ">" or "gt" in the quotes for the phys attribute in the xml file.
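A rough Python sketch of how such a version-qualified default could be resolved. The `gt`/`ge` prefix syntax, the function names, and the candidate values are all hypothetical; the notes only float the idea of a ">" or "gt" inside the phys attribute, and `ge` is added here because "version X and later" includes X itself.

```python
import operator
import re

# Hypothetical comparison keywords allowed in front of a physics version.
_OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def parse_phys(name):
    """Turn a physics version such as 'clm4_5' into a comparable tuple (4, 5)."""
    match = re.fullmatch(r"clm(\d+)_(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized physics version: {name!r}")
    return int(match.group(1)), int(match.group(2))


def phys_matches(attr, phys):
    """attr is either an exact version ('clm4_5') or 'OP version' ('ge clm5_0')."""
    parts = attr.split()
    if len(parts) == 1:
        return parse_phys(parts[0]) == parse_phys(phys)
    if len(parts) == 2 and parts[0] in _OPS:
        return _OPS[parts[0]](parse_phys(phys), parse_phys(parts[1]))
    raise ValueError(f"cannot interpret phys attribute: {attr!r}")


def pick_default(candidates, phys):
    """Return the value of the first (attr, value) pair that matches phys."""
    for attr, value in candidates:
        if phys_matches(attr, phys):
            return value
    return None
```

For example, `pick_default([("ge clm5_0", "new_value"), ("clm4_5", "old_value")], "clm5_5")` selects `"new_value"` without needing a separate entry for every later version.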
Will W has a branch with 3 commits and wanted to be able to run cases out of the same sandbox with these different commits - and possibly even a single case that he rebuilds / reruns as he makes more commits.
There seems to be a bug with saving updated provenance information when you rebuild... Erik will look into this.
Erik's point is that we should document / let people know about this reasonable workflow.
Erik points out that a gotcha with this workflow is that you might accidentally have the wrong version checked out. To catch this, he suggests an optional `REQUIRED_MODEL_VERSION` that is checked, probably at build and run time.
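The proposed check is simple enough to sketch in a few lines of Python. This is only an illustration of the idea: `REQUIRED_MODEL_VERSION` is a proposal in these notes, and the function below is hypothetical rather than part of cime.

```python
def check_model_version(model_version, required_model_version=None):
    """Abort if an optional required version is set and does not match."""
    if not required_model_version:
        return  # the setting is optional; nothing to enforce
    if model_version != required_model_version:
        raise RuntimeError(
            f"MODEL_VERSION is {model_version!r} but the case requires "
            f"{required_model_version!r}; check which commit is checked out"
        )
```

Calling the same function at both build time and run time would catch a sandbox that was switched to a different commit between the two steps.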
Erik: It works to run a new case from an old ctsm code base, but it doesn't work to extend an existing case.
Dave feels we don't need to worry about this too much.
- Bill - my open PRs
- Bill - closing wontfix issues?
- Bill - should we do a scientific test of the ctsm dev001 tag to ensure same climate, or wait until other changes are included here, too?
- User's guide
- Tasks for 2.1
Is there a reason to keep wontfix issues open? No, Erik is closing them.
Dave suggests having Keith run the automatic validation test
"Special cases": Rename to something like "Notes on specific cases" or "use cases". This can include crops, glaciers, isotopes.
Suggestion of having some links to the auto-generated namelist definition / defaults / etc. in a very obvious place.
- Bill - Go over contributing guidelines, PR template and issue template
- Matt/Bill - Water isotope update
- Martyn - what's in the CTSM presentation next week
Rosie suggests having a link to information on how to run testing. Also, to avoid scaring people off, maybe say, "Testing performed, if any", to make it clear that it's okay to submit a PR without doing any testing.
Sean: Should we distinguish between bug fixes and science enhancements, because the review process might be different for those? Feeling is that we might need to evolve towards that if we start getting a lot of PRs. If we start to have a lot more open PRs, we might need a system, like using labels and/or having a keyword at the beginning (with options like: bug fix, science enhancement, documentation only, etc.).
Mike: could be worth having a space to add names of all people involved. Dave agrees. We could remove that line if it doesn't apply.
Rosie: It would be useful to explicitly mention that you can use issues for science discussions. It might be an empty template, but we should call out "Science discussion" as an option for issues.
Erik suggests that we point people to the forums for this, partly because that's how CESM does it.
Some of us feel that it's nice to have things all in one place, and we don't really like the UI of the forums. But if it becomes unmanageable, we can revisit this.
Dave: support request template should point people to the manual.
Should we distinguish between users and developers? maybe if needed, not for now.
Add to support request: Have you made any code modifications?
Make it clear that this is support needed for model use. "Support needed for model use."
Matt is getting close to completing the separation of WaterStateType, following the design shown here: https://github.com/ESCOMP/ctsm/pull/395
Next step will be separating WaterFluxType into pieces depending on whether a given flux is needed for isotopes.
- Bill asks if fluxes need to be broken down into finer-grained categories. Martyn says that, eventually, we may want to distinguish summary fluxes from fluxes that are actually used to update state variables. But we can do that later.
Present: Erik Kluzek, Dave Lawrence, Bill Sacks
- Bill - Issue template
- Erik - ESMF not working on DAV cluster, and Use of cesm mapping tools. Use CIME configure for tools makefiles?
- Erik - Do we want smaller tools working on hobart (mksurfdata)? mkmapdata can only fit on large memory machine like cheyenne (> 450GB of memory).
- Erik - Fix CTSM buildnml for python 3?
- Erik - Not ready for CLM spinup yet, but should start making it.
- Erik - Updated README files. Since we need to update README files and UG, I'd like UG to utilize README files for some of it. Doing this would mean adding manage_externals to UG and including the needed files from a CTSM checkout.
- Erik - CLM is all over in the code and miscellaneous files.
- Erik - I need to start doing work on the SMB project
- Erik - next tag will be done today.
- Erik - when/how often do we do these meetings?
- Erik - Code of conduct needs to go around CESM/CGD. Need to talk with Mariana about this as well.
Feeling is that support for mapping on DAV cluster is low priority, since Erik has it working on cheyenne.
Tools working on hobart?: Low priority
Using cime scripts for mapping: Good to have, though not super-high priority.
After dev013, we'll let master and the release branch diverge.
For now, we'll just have master and this one release branch, which is for cesm2.0 / cesm2.1, and so can't change answers. We don't see a need for a second release branch at this point.
Bill: Added an "awaiting triage" column, where issues / PRs appear automatically if you add them to this project (instead of being added to the "to do" column). The motivation was: When they were added to the "to do" column, they appeared at the top, messing with our carefully laid out ordering. This way it's obvious what still needs prioritization within the "to do" column.
- (Want Mariana's input) Pulling more parameters out of the code
  - Bill will give a brief overview:
    - How we deal with parameters now, via namelist: show example: `soilwater_movement_method`
      - `SoilWaterMovementMod.F90`
      - `namelist_definition_clm4_5.xml`
      - `namelist_defaults_clm4_5.xml`
      - `CLMBuildNamelist.pm` (`add_default` call)
      - Demo case:
        - Original `lnd_in`
        - Change via `user_nl_clm`
        - Rerun `preview_namelists`
    - Also: pft params in netcdf file (current solution isn't scalable - e.g., merging changes from branches)
    - What we last discussed doing
  - We could simply follow status quo... but that requires writing more ugly and error-prone namelist-reading code, which we might eventually just ditch
    - One way to make this a little less ugly and error-prone would be Ben's suggestion from a few years ago: Read the file on the master task into a buffer, broadcast that to everyone, and then let all procs read the namelist values from that buffer. So the interface to namelist reads would include a string buffer containing the namelist file contents, rather than a name of the namelist file. (This avoids the need for each module to have namelist-reading code, avoids the problem of needing to remember to broadcast each individual value, and also is better suited for unit testing.) I feel this would be worth doing even if we have just a small number of parameters left on the namelist file.
  - Or should we do some infrastructure work so that things don't need to be redone later?
    - Translating build-namelist to python - so that we don't write more perl that then needs to be translated to python
    - In the past, we talked about reading everything from a netcdf file rather than from namelist files. Should we get that infrastructure set up now, so that we don't do extra work that just needs to be redone later?
      - Introduces cime dependency on at least numpy (we could snapshot in scipy.io.netcdf). Is that okay?
      - Ideally, we would translate the perl to python before doing this, but we might be able to come up with a way to call the necessary python from perl short-term, to allow moving forward with the netcdf-based solution before translating build-namelist to python.
- (Want Mariana's input) Removal of CLM4: can we do this on master now?
  - One reason this could be good: allows cleaning up build-namelist
- Discuss separation of WaterStateType for water isotopes.
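The buffer-based namelist read in the agenda above can be illustrated with a small Python sketch. It is a simplified stand-in, not the Fortran design: the file is assumed to have been read once and its text handed to every task (the broadcast itself is omitted), the parser only handles one `name = value` pair per line, and the group and variable names in the usage note are hypothetical.

```python
def read_group(buffer, group):
    """Return {name: value_string} for one '&group ... /' block in a namelist buffer.

    Simplifications: one assignment per line, '!' always starts a comment,
    and values are returned as strings with surrounding quotes removed.
    """
    values = {}
    inside = False
    for raw in buffer.splitlines():
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.lower() == "&" + group.lower():
            inside = True
        elif line == "/":
            if inside:
                return values  # end of the requested group
        elif inside and "=" in line:
            name, value = line.split("=", 1)
            values[name.strip()] = value.strip().rstrip(",").strip("'\"")
    if inside:
        raise ValueError(f"namelist group {group!r} is not terminated by '/'")
    raise KeyError(group)
```

A unit test can then pass a literal string such as `"&demo_group\n demo_method = 1\n/\n"` and call `read_group(text, "demo_group")` with no file on disk, which is the testing benefit the notes mention.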
Early 2016: Ben Sanderson's request is that there be one way of doing things... which could mean going back to how things were, in that pft-specific parameters were in a netcdf file, and scalars in the namelist. What he does NOT want is having some scalars in namelist and some in netcdf.
FATES plan (developed late 2016) was to use xml backend, netcdf frontend. You could modify parameters via the `user_nl` mechanism, or you could point to your own netcdf file with all parameters. See also https://docs.google.com/document/d/105p3L6981KJxcddVBN4J330S-6ECZODrha7uQW0OpfU/edit and https://docs.google.com/document/d/1SvwrrSo9ZKymY6nhasi0hqMkidmQyl3yFHrMZpTwhhs/edit#heading=h.110phbn1yf25 and https://docs.google.com/document/d/1332XErcAB3-2TwrwGEhFTUxFKZdhVZlH3FS_F4VpxNw/edit#heading=h.i5i5dlq3axx9
- We talked about having real-valued parameters on a netcdf file and logical / integer / string-based options on the namelist file. Is that still what we want?
  - This will mean that each science module potentially needs to read both from the namelist file and the netcdf file; this doesn't seem ideal.
  - We probably can't be 100% consistent in applying this rule: for example, there may be some discrete pft-specific options on the netcdf file.
  - Would it be better to just have netcdf all the way (with just a single namelist parameter pointing to the netcdf file - which would allow pointing to your own if you want)? Ben mentioned FATES was planning to go this way March 6, 2017.
- I don't like working with namelists, because of the need to hard-code the variable names in the `namelist` statement. If we stick with text file-based input for some things, would it be worth considering a different format like .cfg? (Note that CISM has a cfg file reader.)
Rosie: What happens in FATES: Everything other than logicals is on a netcdf file. They store a cdl file in the repo, which is then turned into a netcdf file. They don't have a mechanism for having different defaults for different configurations (but could do different cdl files). They don't yet have things like different options for a parameterization.
Bill: Question on the table: Do we want the Fortran code to read parameters from netcdf, namelist, some other format like .cfg, or some mix of them?
Rosie: In FATES, there was an overwhelming desire to have all parameters on netcdf.
Sean: It's easier to read a text file than a netcdf file. Dave points out that we can put something like a ncdump of the file in your lnd_in.
- But the ncdump still doesn't look as nice as just the lnd_in file. One issue is that it's not separated by category. Might want to have some attribute or just use a naming convention so that variables are grouped together.
Katie asks what the user interface would be for setting pft-specific parameters. We talked about a few ideas; we'll come back to this.
Acceptability of numpy? People generally feel that's okay. One issue could be NOAA operational centers - but it's probably okay for them to use an already-built netcdf file.
- People agree that we should use netcdf for numerical parameters
- And, if we're going to have numerical parameters there, then feeling is that we should have all parameters there, including integer / logical switches.
- We want to do the rewrite of build-namelist into python first - maybe in the next 3 months-ish?
Mariana doesn't see any problem, but suggests contacting Dan Marsh.
It would be nice to cleanly separate state variables, diagnostic variables, fluxes, parameters etc. Based on an initial scan the separation that you have is not that clean.
What is the path forward: (1) Do you intend to cleanly separate state variables, diagnostic variables, fluxes, parameters etc. into different types? (2) Do you intend to have an isotope type? If you are this specific, do you plan to define isotopes as a separate type or by extending more general types?
I agree with Martyn to some extent. Ideally, I'd like a clean separation of the fundamental state variables. Long-term: For other variables, I see a lot of value in having fluxes and other auxiliary variables live in the module responsible for computing those variables, to the extent possible. In the case of summary variables (e.g., sum over the column, or variables just in the top layer), I could see introducing a new type like WaterStateSummaryType.
I also agree that this initial split doesn't feel as clean as it could be. I started with the hope that a clean separation would align with the delineation between variables for which we need an isotope version and variables for which we do not. I figured that, pragmatically, this is what we need right now, and we could take additional steps later. But if Martyn or others would like, I'd be happy to take some time to sit down with you and try to separate variables out along the two axes of (a) what needs to be replicated for isotopes and other tracers vs. what does not, and (b) states vs. diagnostics, etc. We could use that to try to come up with a set of types that feels right.
I'm not sure I understand (2), so will get more clarification on that question.
Mariana suggests that we have a high-level WaterType, which just contains some other types. Point is: pull related types together to make the code more understandable. Bill agrees.
Martyn suggests delineating states vs. diagnostic variables. Bill agrees. So there will be states that apply to all tracers, diagnostics that apply to all tracers, states that don't and diagnostics that don't.
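The four-way split (states vs. diagnostics, replicated per tracer vs. not) under one containing type can be sketched with Python dataclasses. This is only an illustration of the grouping discussed here, not the Fortran design in the PR; every class and field name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TracerStates:
    """States replicated for bulk water and for every tracer."""
    liquid_by_layer: List[float]


@dataclass
class TracerDiagnostics:
    """Diagnostics replicated for bulk water and for every tracer."""
    column_total: float = 0.0


@dataclass
class BulkOnlyStates:
    """States that exist once, with no per-tracer copy."""
    snow_depth: float = 0.0


@dataclass
class BulkOnlyDiagnostics:
    """Diagnostics that exist once, with no per-tracer copy."""
    snow_cover_fraction: float = 0.0


@dataclass
class WaterInstance:
    """One full set of replicated variables (bulk water or a single tracer)."""
    states: TracerStates
    diagnostics: TracerDiagnostics = field(default_factory=TracerDiagnostics)


@dataclass
class Water:
    """High-level container that only pulls the related types together."""
    bulk: WaterInstance
    bulk_only_states: BulkOnlyStates = field(default_factory=BulkOnlyStates)
    bulk_only_diagnostics: BulkOnlyDiagnostics = field(default_factory=BulkOnlyDiagnostics)
    tracers: Dict[str, WaterInstance] = field(default_factory=dict)


def make_water(n_layers, tracer_names):
    """Build a container with independent storage for bulk water and each tracer."""
    def new_instance():
        return WaterInstance(TracerStates([0.0] * n_layers))
    return Water(bulk=new_instance(),
                 tracers={name: new_instance() for name in tracer_names})


def update_column_totals(water):
    """The same diagnostic code runs unchanged for bulk water and every tracer."""
    for instance in [water.bulk, *water.tracers.values()]:
        instance.diagnostics.column_total = sum(instance.states.liquid_by_layer)
```

The point of the layout is that code written against `WaterInstance` works for any tracer, while variables with no isotope counterpart live only in the bulk-only types.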
- Bill - We may want to have a release branch for cesm2.0, since we'll need to support that code long-term. -- Erik -- I'd like to start with just release-clm5.0 if there's a need to diverge we can do that.
- Bill - With Matt R, have started on the work needed for water isotopes
- Bill - Plans for master diverging from release branch?
  - Soon I'd like to bring `unified_land_model` to master.
  - At the point when they diverge, will we change the tag naming to ctsm tags? What should the version number start with?
- Bill - meeting times moving forward?
- Erik -- ChangeLog for release branch? Part of master ChangeLog? New file?
- Erik -- need a cleanup tag that updates to externals from beta10 and cleans up any small issues.
- Erik -- time-stamps on files as per Dave in CLM science meeting?
- Erik -- Can I have an "I told you so moment"? ;-) The issue with orbit is really that we haven't been testing the variable year option. I brought this up as an issue, but we couldn't spend time on it. Fulfilling the maxim "if you don't test it -- it's broken" we ran into it as an issue (See #19, #260, and cime issues https://github.com/ESMCI/cime/issues/2044, https://github.com/ESMCI/cime/issues/2082)
- Erik -- VIC memory issue? VIC isn't running at f09, I think it needs more memory (#384).
- Erik -- #371? #316? #312? #276? #268? #262? #162? #13?
Erik will do the one cleanup tag, and then Bill will bring `unified_land_model` to master. At that point we'll start using 'ctsm' tags: `ctsm1.0.dev001`.
Want to add something in the ChangeLog at that point like:
Does this significantly change the science of:
- CLM4.5? No
- CLM5.0? No
Note that the isotope-enabled version needs to have similar science to the cesm2.0 version. Feeling is: For now, we'll bring the isotope-necessary stuff to master. If we find that we want to make a big science change on master, we can consider what to do at that point: either have diverging branches or maintain the old science via an option.
Erik points out that this would be helpful, so that phys options become runtime rather than build-time. Let's talk to Mariana about this.
For the next few months, let's keep separate meetings for the basic logistics (Monday PM meetings) vs. bigger-picture software meetings (Thurs AM meetings). We could reduce the Monday meetings to something like biweekly.
- Bill - Planning to move to isotopes, even though soil hydrology is only partially completed.
- Dave - Repository for planning documents and meeting presentations?
- Bill - New initial conditions files for release? (https://github.com/ESCOMP/ctsm/issues/312)
Can we use git large file storage (LFS)? Martyn tried it about a year ago and found it wasn't as good as it could be.
Feeling is: Let's put these on google drive with links from the wiki.
We should separate software vs. science meetings. In the science meetings, put links to presentations that are available.
We'll also have wiki pages for reports.
The previous version of mpt had an issue where it would sometimes change a bit.
However, the new version (16) can abort before the run starts. Jim just put in a workaround that restarts the run if it detects that that's what happened.
- EBK -- Some FATES tests are failing for the dev007 update. I think we could use updated finidat files for the FATES science update.
- EBK -- Cheyenne strangeness. Is there anything we can do about it?
- EBK -- PE layouts do we want to optimize for SP vs BGC-Crop+Ciso?
- EBK -- What configurations (compset+res) will be scientifically supported for CESM2.0? Meaning will have simulations that go along with it? We did simulations before -- do they need to be updated again?
- EBK -- Plan to update manage_externals.
- EBK -- Need to have a telecon with FATES developers about use of manage_externals
- Bill - tag status / ordering
Dave: cost is probably more important now than speed.
Would be useful to see plots of speed vs. cost for different PE layouts.
Bill suggests:
- Hypothesis: Having LND run be slightly less than ATM run is "optimal". (He doesn't necessarily believe that hypothesis, but that's the starting point.)
- Test that hypothesis. e.g., one possibility is that optimality is actually achieved with LND run being half the time of ATM run, due to the uneven nature of atm run time.
Idea: Get an optimal ratio of "LND run" to "ATM run", figured out for one resolution and configuration. Hope is that we can then do a quick cut at other resolutions and configurations by trying to get about the same ratio.
Dave: Should really try to get in the changes to speed up transient runs, since that will be a big bang for the buck.
Dave: feels we don't need to redo the simulations, but call the same things as before scientifically supported.
Want to add science_support flag for the other configurations in Keith's matrix. Also want to say that transient are scientifically supported.
Dave feels this science_support flag isn't as relevant for land-only cases, though - it really is more important for fully-coupled cases where things could be out of balance, etc.
What does scientifically supported mean? It means that you've run a simulation and it's giving the science you'd expect - i.e., confirmed that it's not doing something crazy.
Feeling is that we should list support for f09 and f19, but not coarser resolution. Since the land model is resolution independent, we expect other resolutions to be fine, too, but we'll just list f09 and f19 to be safe because that's what we've looked at. Erik notes that we did the simulations at g16, but to avoid confusion we'll list g17.
- Bill - https://github.com/ESCOMP/ctsm/issues/340
- Any objections to my making this change?
- Confirm logical to use to determine first time step, but not trigger on a branch or continue run
- Bill - Meaning of "known bugs introduced in this tag" in ChangeLog. I interpret this to mean, "bugs that were newly-introduced in this tag", whereas Erik seems to use this to list any bugs that have been opened recently.
- EBK - cime and cism update change answers for clm tags (cime changes because of orbit)
- EBK - Order of upcoming tags?
- EBK - Failing test for radtemp branch, floating point exception. I think it's due to something being set to NaN that shouldn't be.
- EBK - NOTE: cesm requirement of `model_doi_url` being added to history files, creating ctsm specific `env_archive.xml` in our `cime_config`.
- EBK - Should we make a release tag based on current point of master? Or wait? Should the release branch be what we give cesm2.0? Or separate branch for it? I'd prefer it to be the same.
- EBK - FYI. meeting after this to go over Wednesday git tutorial
- EBK - cesm recognizes that we will have an update for initial conditions. I think that might be done as a change to cesm/cime_config as a ref case? If so CLM's default would be out of sync with it.
Erik will edit the ChangeLog template to separate this out into two things:
1. Bugs that were truly just introduced in this tag
2. Important bugs that were newly discovered since the last tag - even if the bugs pre-date the given tag. (To find this: look for issues that have been opened since last tag.)
Erik's plan: Once we have beta10, we'll move the release branch pointer to point to the version in beta10 and make a new release branch tag.
Question: when and who do we notify of release updates like this?
Bill: This discussion makes me think that we should direct people to checkout a specific tag rather than the head of the release branch. This way we won't continuously run into the problem of, "we've changed the release branch; who do we need to notify?". Instead, people will have to be deliberate about what tag to check out; this requires more work on their part, but ensures that they'll get the same version of the code with each clone they make, if that's what they want. (If we direct them to checkout the release branch, then the risk is that they'll get different versions each time they clone, when what they really want is the same version each time.)
- Dave and Erik tentatively agree, but feel we should ruminate on this a bit.
- No recording?
- Lots of remote participants
  - Use ReadyTalk chat?
- EBK -- Talk about FATES workflow...
- EBK -- Dave Hart and the new storage paradigm. Plan for talking with CLM folks, CESM, CGD?
- EBK -- I just got a status update on svn tags...
- EBK -- I did the manage externals update for CTSM master, Ben any other tricks we need to know about manage_externals?
- EBK -- the testing for dev004 is done, a few things to finish up with it.
- Bill - okay to bring my init_interp fixes to master now, given that they'll change answers (slightly) in cases that use `init_interp`?
- Bill - Plans for https://github.com/ESCOMP/ctsm/pull/331
  - Need new landuse_timeseries files
  - Need CLM code
  - Should I do this? At what priority?
Key point: Pull in a branch with the actual commit itself that you want - do NOT pull in the annotated tag.
Erik will take the lead on this. Bill will do the necessary updates to CTSM in a follow-on tag.
- EBK -- FATES process
- EBK -- Status of cheyenne and our tags?
- EBK -- Anything from yesterday's meeting to discuss here?
- EBK -- Was able to add ucar addresses to ctsm-dev, I have trouble with non-ucar addresses. Contacted CISL, they can add, suggested I use a different browser and/or the direct add option.
- Bill - status
- ben - how to incorporate ctsm into cam
- Bill - manage_externals
For people who want to do coupled runs, which do we recommend?
- (1) Adding CAM and CICE to the externals file in a CTSM checkout
- Advantage: easier for development
- Disadvantage: may run into external dependency problems (difficulty finding a working set)
- Disadvantage: need to get your paths just right
- (2) Start with a CESM or CAM checkout, replacing clm/ctsm with your branch
We lean towards (2) - probably easier on balance
Ben has some patches. Up to components to pull them in when you want them.
Erik points out that we can pull this into CTSM master at any point via a PR. Ben will issue a PR once 1.0.2 is ready.
Tentative plan: Have a long-lived branch called fates-next-api. Changes to the api can go into there, and that branch would periodically be merged into master.
Point of having a long-lived branch is because there are multiple developers and having a stably-named branch can help people know where to look.
Ben points out that there would be some advantages to their having this in their own fork, so they'd have more control (over who has write access, etc.). But he's okay with the tentative plan if others want to go that way.
Bill doesn't want a proliferation of long-lived branches in the main repo, but is okay with the tentative plan since FATES is a large enough project.
Rosie raised the question of how we can ensure that setting up single-point simulations remains working and robust.
We have PTCLM, but some problems with it are:
- Feels overly complex
- Periodically breaks
Many people have their own scripts to do this. We want to get some people together to determine the use cases and requirements, and then evaluate how to proceed. Some possibilities are:
- Improve PTCLM and make sure it really keeps working moving forward
- Ditch PTCLM. Start with our favorite of someone else's scripts and make sure that's tested and maintained moving forward.
Yesterday, Ned raised a somewhat related point: desire for more robust SCAM capabilities. e.g., can't do restarts... probably not a big fundamental problem, but need to prioritize getting it working.
- Bill - More debriefing of LIS discussion (Mariana wants to be present for this)
- Mariana's questions:
- What exactly is LIS?
- Would getting CTSM working in WRF address LIS's requirements for atmosphere-CTSM coupling?
- NOTE: We did not address Mariana's questions at today's meeting
- Martyn - Reporting requirements
- Bill - Protocol for reviewing PRs
Can add one or more reviewers to the PR.
If you are a reviewer, you can:
- Add a review. In the end, you should either "approve" or "request changes" on the PR. Then you can see if a PR is fully approved by ensuring that there's a green check by all reviewers' names.
- Remove yourself as a reviewer, if you're happy to defer to the other reviewers.
Mike asks if there is a review board that vets things that come in. Dave answers that CLM hasn't had something formal, though we've done this informally. Authors need to be invested enough to stick with us through the process of getting their developments working globally, meeting our software standards, and getting all tests passing.
- wjs - Brief update (stuck on some baseline failures)
Sean changed the number of canopy iterations from 40 to 2. This gave close to a factor of 2 difference in the timing of canflux / can_iter. Note that most of the time spent there is in photosynthesis.
There was some unexplained variability here. These runs were on hobart. One hypothesis is that the runs executed on different nodes which might have different hardware. Or there could be day-to-day variability. Or Martyn suggests that a science change in one place could make other code take longer. One idea to deal with this is to run a few ensemble members for each experiment. Another idea is to ensure that the two runs use the same hobart nodes.
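One way the ensemble idea could be put into practice is to summarize the same timer across several runs of each experiment and compare the spread within an experiment to the difference between experiments. The sketch below assumes one timing value per line on standard input; `timing_stats` is a hypothetical helper name, not an existing tool.

```shell
# timing_stats: read one timing value per line from stdin and print the
# mean and the sample standard deviation. Exits non-zero if fewer than
# two values are supplied, since the spread is undefined in that case.
timing_stats() {
    awk '{ s += $1; ss += $1 * $1; n++ }
         END {
             if (n < 2) exit 1
             m = s / n
             v = (ss - n * m * m) / (n - 1)
             if (v < 0) v = 0            # guard against rounding error
             printf "%.3f %.3f\n", m, sqrt(v)
         }'
}
```

If the difference between two experiments is comparable to the standard deviation within one of them, node-to-node or day-to-day variability is a plausible explanation for it.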
It might be worth trying the timing without PHS.
Sean is doing some work on heat capacity that should help with this.
It could be worth looking at maps of how many iterations it takes in various places, at various times of day.
He also tried undoing the deeply nested calls in setting up some matrices for SoilTemperature(?). This improved the timing of that particular part of the code by about 1/3.
Maybe we can revert this back to something intermediate between what was there before and the current code - to maintain some modularity, but not the over-modularity of the current code.
A lot of the pieces of code that loop over all soil layers have similarly large costs. This might be worth investigating further to determine if we can identify some common things that can be done to improve performance of these loops over soil layers.
One big decision for bringing this into CESM: Do we want to (1) improve the modularization of the current MOSART code so that we can pull the new routing model into this code base as an alternative (reusing existing code for decomposition and other infrastructure), or (2) bring in the new model separately - which may involve copying some of the infrastructure code from MOSART/RTM (which currently have a lot of copied code between the two of them).
At a high-level, three steps for integration with CESM:
- Get ESMF tools working with the polygon-based schemes (Joe Hamman is working on this)
- Better understand similarities and differences between MOSART/RTM and the network-based models
- Determine what can/should be done by the MOSART code, and what should be done by the new code. May involve some refactoring of the MOSART code to allow slotting in an alternative model.
In parallel, there will be work to generalize the current US-applicable models to global.
Once those are done, we can work on the lake / reservoir piece, and how the coupling happens between the river model and the rest of the system in that respect.
- wjs - Inputdata currently requires authentication (so requires registering for CESM release access). Are we okay with this, or would we like to request that this be changed to allow anonymous read access? (Mark confirmed he'd be okay with the latter if we want it.)
- ebk -- what do we need to do about cmip6? We had some discussion of this at this morning's CSEG meeting. Mariana was going to explain something to us.
- wjs - Erik: status of testing / baseline generation for 003 tag? If not complete, what should I do when the machine comes back up?
- wjs - Tutorial:
- We're tentatively structuring it as git basics - not assuming much prior knowledge. If we want to cover testing, too, that won't let us get much beyond git basics. Is this a problem, given that many people may have already figured out the basics by then? Alternatives would be to cut testing, or assume more background knowledge so that we could skip or breeze through the basics, covering some slightly more advanced git things.
- Brief agenda for tutorial
- What we'll ask people to do ahead of time
- Have a github account
- Do basic git setup on their machine (username & email, at the very least) - see recommended git setup document
- wjs - testing on my reduce-allocation branch is mostly good, except I'm getting some failures due to running init_interp. An LII test that I think should be passing is failing.
- ebk -- Next week's meeting? I'll be in Arizona, but probably could call in.
- ebk -- clm-dev email list. New world for spam, and openness. Still email clm tags to it?
- potential bug in quadratic
Dave feels that we're going to want to keep most / all of the current default variable list.
Sheri will send out what her script is producing for historical and preindustrial simulations - i.e., which output variables are needed, and the implications for data volume.
Keith has put together a user_nl with what we think we need for all daily, hourly, etc.
Bill will redo testing for 003 once cheyenne is back up (hobart testing is done). Expect tests to pass, baseline comparisons to fail relative to 002.
Dave: Don't dwell on the basics. But Erik thinks there's still a fair amount of confusion - especially around the distributed nature. Maybe going through some of this at a conceptual level (pictures?) would help.
Going through an exercise that involves managing remotes would be helpful. Also, an example where two people need to collaborate on a branch. This is something a lot of people need and don't know how to do.
- Include how to do this in github (including adding collaborators)
Comparison / contrast with subversion could actually be useful
Let's just let ourselves stretch to 3 hours.
Also include recommended workflow for doing science with this - moving away from using SourceMods. Ben, Erik and Bill should meet with Brian to discuss this.
Include conflicts - at least "don't panic" (and maybe how to get yourself out of this).
Still try to keep some time on testing workflow - e.g., clm_short.
Mariana feels we can make this totally open for read.
It would help for CTSM to have inputdata totally open.
Bill will clear this with Gokhan.
Will make it so that only a few people can post, to avoid spam.
Dave: it would be nice if emails to LMWG also went to clm-dev. Is there a way to set this up? We wonder if it's possible to add clm-dev as a member of the LMWG mailing list. Otherwise, may need to add this as a recipient whenever emails go out to LMWG.
We should rename clm-dev to ctsm-dev.
- DML - user_mods CMIP6 behavior by default.
- bja - discussion of Erik's email regarding remotes, naming, manage_externals, etc.
- EBK - User's Guide. Working on some global changes. Some sections can be removed now since the cime documentation is better. For the root of the CLM directory plan to use an env variable `$CLM_ROOT_DIR` to illustrate paths, since there are two allowed directories now (. and components/clm). So I'll have a section at the top talking about the two locations for `$CLM_ROOT_DIR`. We still need the tools and single-point chapters. And special cases (such as spinup) and examples are important. I think talking about the namelist xml files is important as well. As a user the most important sections are the quick-start, what's new since last version and the examples. For a new user talking about creating cases is important, but I think this section can be trimmed because the cime documentation is better. But, it's still good to have specific CLM examples. I think I can remove most or all of the troubleshooting section and point to cime's chapter. Since CO2 is now a namelist item, its example can be removed. In terms of things that people do that need documentation to point to, setting up single-point cases with tower data, and how to spinup are the two most important that they need help with. And if we don't document it, we have to email them what to do rather than just point to a web-link.
- EBK -- cheyenne shepherd problem still killing jobs and even in the middle of simulations. What should we do about it? I do have a ticket in with cisl for the problem I've been seeing with the FATES spinup. Apparently, Cecile says using mpt2.15 (rather than mpt2.16) fixes it for her. CLM is using mpt2.15f right now, I haven't tried vanilla mpt2.15.
- wjs - Considering interpolating all out-of-the-box initial conditions files. Won't do this if there are plans to redo all initial conditions soon anyway.
- wjs - Should my performance changes come onto the clm5.0 release branch? Note that this will force everyone to interpolate their initial conditions. If so: Is the process right now to bring it to master, then at some point we'll bring things from the master branch to the release branch? (I forget: is the intention that the clm5.0 release branch will be used for cesm2, or that we'll be at a new clm5.1 release branch by then?)
- wjs - cmip6 usermods discussion at cseg meeting
From discussion at cseg meeting: We'll move ahead with usermods directories. Not clear whether we'll (1) define a few usermods that are used for all cmip6 runs, or (2) have user_nl auto-generated separately for every experiment. CSEG needs to work this out more.
Dave: it seems like we may need different sets of output for different periods of runs - e.g., need high-freq output for the last few decades of the historical run.
Probably go ahead and regenerate initial conditions files. Should be able to use the file name to determine the configuration to run to interpolate each file. (But see email from Keith.)
What to do about FATES tests that use initial conditions? For now, maybe just set `run_zero_weight_urban` in the testmods directories? Actually: check with Ryan to see what's needed to regenerate initial conditions that could be used for testing.
Some other things we have: bug fixes for energy balance, N bug fix, new N dep... need to determine which of those will come to the release branch.
For now, I'll bring my change to master.
What should we recommend remotes be named? Bill and Ben both prefer naming remotes explicitly, like "escomp", "billsacks", etc. - rather than generic "origin", "upstream".
Erik: part of the motivation here is consistency with the FATES workflow.
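A sketch of the explicit-naming convention, run in a throwaway repository so that it has no side effects. The remote names `escomp` and `billsacks` come from the discussion above; the fork URL is an assumption made for illustration.

```shell
# Create a scratch repository so the example is self-contained.
repo=$(mktemp -d)
git init --quiet "$repo"

# Name each remote after the GitHub account it points to, instead of the
# generic "origin" / "upstream".
git -C "$repo" remote add escomp https://github.com/ESCOMP/ctsm.git
# Assumed URL for a personal fork; substitute your own account name.
git -C "$repo" remote add billsacks https://github.com/billsacks/ctsm.git

# List the configured remote names.
remotes=$(git -C "$repo" remote)
printf '%s\n' "$remotes"
```

With this layout, commands such as `git fetch escomp` or `git push billsacks my-branch` state where the data is going, which is the point of preferring explicit names when several forks are involved.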
Rosie: How should we include FATES workflow pieces? Bill suggests having a chapter 3 in https://github.com/ESCOMP/ctsm/wiki/Getting-started-with-CTSM-in-git that is "Working with FATES in CTSM". Erik points out that the same workflow applies to any external (MOSART, etc.), but FATES has the largest user community.
Present: Martyn Clark, Mike Barlage, Sean Swenson, Bill Sacks
30-km CONUS
72 proc (might try scaling back to 36, because Noah-MP might saturate beyond 36).
What number should we focus on for total? Probably clm_run. This includes history output, which eventually we could probably reduce substantially - so we'll keep that in mind when looking at the results.
Note that the control run has special landunits, whereas others do not. Sean suggests an intermediate run that uses all pfts but no special landunits.
Just grass + bare compared with control: 55% reduction in time.
Also memory reduction: further 60% reduction.
Also going to 4 soil layers: further 22% reduction. About a 70% reduction in `hydro_without_drainage` and soiltemperature.
Combining all these: 86% reduction compared with control. So now we're maybe at about 2x Noah-MP cost - but Mike wants to redo the Noah-MP timings.
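As a consistency check on these figures, assume each "further" reduction applies multiplicatively to the time remaining after the previous step (an interpretation, not something the notes state explicitly). The fraction of the control run time that remains is then:

```latex
\[
(1 - 0.55)\,(1 - 0.60)\,(1 - 0.22) = 0.45 \times 0.40 \times 0.78 \approx 0.14,
\qquad 1 - 0.14 = 0.86
\]
```

This agrees with the quoted 86% combined reduction.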
Changing number of CanopyFlux iterations from 40 to 20 (consistent with Noah-MP), and also changing another number of iterations from 40 to 3 (consistent with Noah-MP): little effect - though maybe would get more effect if you had trees rather than just grasses.
With all of these combined, other than the iteration changes: big culprit is canflux (nearly half of the time), second most bgflux (bare ground). Together those two account for nearly 75% of the time.
Note: To get more detailed timing information, probably need to change `TIMER_DETAIL` and maybe `TIMER_LEVEL` `env_run.xml` variables.
- EBK - CPLHIST cases. I added a test case, but it can only run on cheyenne. Do we want to be able to run it elsewhere?
- EBK -- When moving over from svn do we rebase or merge?
- EBK - Process for mosart and rtm tags? I just created two for each. Did a PR, ran testing. Didn't ask for review since changes small enough. I think we should in general ask for review. Also, my first few tags have been lightweight tags, should I redo as annotated? Is it possible to make annotated the default in git config?
- EBK -- Surfdata for new conus and physgrid options for CAM? Will meet with Colin Z. in a few weeks.
- wjs - git tutorial?
- bja - Keith's documentation question for mosart
Currently, the path to the data is hard-coded in the test xmlchange command, so we can only run this test on cheyenne.
Currently this is just a problem for testing. But Erik wonders if this is a problem for users.
If we want to support this out-of-the-box, want to put it in inputdata, at which point we can use `$DIN_LOC_ROOT`. For now, we won't worry about this.
Should we do a rebase or merge when bringing things over from svn and then bringing things up to date with latest master?
Ben: If you're just bringing over the latest (r272), doesn't matter. If you're pulling over the whole history, do a merge.
Should we always get a PR review? Feeling is: get one if you feel it's warranted, but not required.
Tagging: Use annotated tags moving forward, but don't go back and change previous tags
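For reference, a sketch of the difference between the two kinds of tag, run in a throwaway repository. The tag names and identity values are placeholders chosen for illustration.

```shell
# Scratch repository with a single empty commit to tag.
repo=$(mktemp -d)
git init --quiet "$repo"
g() { git -C "$repo" -c user.name="Example" -c user.email="example@example.com" "$@"; }
g commit --quiet --allow-empty -m "Initial commit"

# Lightweight tag: just a name pointing directly at the commit.
g tag lightweight-example

# Annotated tag (-a): a separate tag object carrying tagger, date and message.
g tag -a annotated-example -m "Example annotated tag"

# `git cat-file -t` reports the type of object each name refers to.
light_type=$(g cat-file -t lightweight-example)
annot_type=$(g cat-file -t annotated-example)
printf '%s %s\n' "$light_type" "$annot_type"
```

The lightweight tag resolves to a commit object, while the annotated tag resolves to a tag object that in turn points at the commit.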
Let's delete the MOSART and RTM repos in NCAR, and for MOSART just put in links pointing to the appropriate chapter in the CTSM documentation.
For RTM, could point to the CLM45 tech note.
Let's include some workflow and best practice-related things like:
- Whether to do cleanup before submitting PR
- Create branches rather than SourceMods
We do need some basics (like how to create a branch).
Would like this to be hands-on: actually create a branch, commit it back, etc.
We might want to split this as:
- Brian gives a hands-on general git tutorial (but targeted to CTSM-specific workflows)
- We give some CTSM-specific workflow things
Remote participants? We should probably allow it, but we won't be able to provide support very easily.
Probably reserve a 3-hour block, but maybe aim for 2-ish hours.
Could also be helpful to include a bit on the testing workflow, too: e.g., create an SMS_D test and verify that it passes. Point is: this is really part of the process.
- Creating a failing test could be nice, too.
We can probably assume that people know basics of CESM, though.
- bja - push converted branches to escomp/ctsm or personal repo? If ctsm, are we deleting when pulled by devs?
- DML - Inevitable question at LMWG about CLM-WRF. Discuss timeline for LILAC? - bja - discussed with DML and MV 'end of year'.
Ben will put them in his fork, then have people move it to their own fork
- wjs - Next steps for performance work? https://github.com/ESCOMP/ctsm/projects/4
- Follow-ups from LMWG
- Any developments on the Noah-MP side?
In order to do a fairer comparison between Noah-MP and CTSM, Mike set up CTSM runs with just one PFT and one bare. Initial results were a 10x difference. By only allocating memory in CTSM where needed, we can get down to 4x. It looks like we have some more low-hanging fruit that could bring CTSM more in line with Noah-MP.
The easy first thing would be to make it so that we only allocate memory for the necessary PFTs in non-transient runs.
The National Water Model has two SEs tasked with modularizing it. They're looking at it top-down.
Mike and Bill talked about what approach we could take that might make it easier to transition from Noah-MP to CTSM in the future.
- EBK -- KnownBugs and KnownLimitation files, what should be done with them? How do people figure out which important issues affect a given tag they are using?
- EBK -- I think we should do a tag with an update to beta08/beta09 with a cime branch.
- EBK -- for doing my tag I pushed changes for the ChangeLog directly to ESCOMP/master. I'm not sure that's a good thing.
- EBK -- Current priorities for tags, README, UG, SSP timeseries etc.?
- EBK -- Issue #262 hirespft option?, #258 code of conduct, #249 data assim
- EBK -- Branches to move over to git?
- wjs - Do we still need to maintain a list of changed files in the ChangeLog?
- With git, it's easier to get a definitive version of this than it was with svn (using `git diff` and `git log --first-parent filename`).
- I'm fine keeping it if people really want it, but I wonder whether it will be useful enough moving forward to be worth the effort it takes to maintain it.
- ebk -- there were two parts to this that I found useful. The first part is that documenting the changes per file caused me to do my own code review, and that step has shown to be vital. git's PR mechanism does a better job of this and makes it public to everyone. The second part though is looking in the ChangeLog for when a specific file was changed. Having it in the ChangeLog allowed you to do it without network access. Since git is distributed we may not need that mechanism anymore.
- ebk -- in my latest tag I replaced that section with a listing of the PR's.
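A sketch of how the two commands mentioned in this item can recover a changed-files list, run in a throwaway repository. The file names and the tag names `old-tag` and `new-tag` are placeholders for illustration.

```shell
# Scratch repository with two tagged commits.
repo=$(mktemp -d)
git init --quiet "$repo"
g() { git -C "$repo" -c user.name="Example" -c user.email="example@example.com" "$@"; }

echo "v1" > "$repo/a.txt"
g add a.txt
g commit --quiet -m "Add a.txt"
g tag old-tag

echo "v2" > "$repo/a.txt"
echo "new" > "$repo/b.txt"
g add a.txt b.txt
g commit --quiet -m "Update a.txt, add b.txt"
g tag new-tag

# Files that differ between the two tags (what the ChangeLog used to list).
changed=$(g diff --name-only old-tag new-tag)

# History of one file, following only the first parent of each merge.
file_log=$(g log --first-parent --format=%s -- a.txt)

printf '%s\n' "$changed"
printf '%s\n' "$file_log"
```

Both queries work on the local clone without network access, which covers the offline use case raised for the ChangeLog.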
- EBK -- exact planning for CLM release sequence? Do we give tagnames to release versions -- even if identical to non-release? Since, tags are different in git than in svn, I'm not sure we really want to make a release tag for each tag on master.
- wjs - Erik, I see you started making all tags into github "pre-releases". What's the advantage of doing this? (I'm not saying we shouldn't do it; I just want to understand that it's serving some worthwhile purpose before we add that to our tag-making process.)
- ebk -- I wanted to document at the top the versions that actually work in git. And figured the "pre-release" tag was a good indicator that this is a tag not to use. I also added notes about the last tag that doesn't work in git, and called it a prerelease. I think a good reason for these notes is to document problems that you find out after a tag is made (a DO-NOT-USE note kind of thing). That's one thing I tried to put in the notes about the tag that required CIME_MODEL to be set for example. In terms of viewing things on github, these notes are useful in that they are what you see in the web-site. The main reason I did it now, is as I said to document the versions that people can't use or the pre-release versions before the version that has everything we want in place. In the long run, we may want to ONLY do this for the tags that go on the release branch.
Erik has tried to keep these up-to-date in releases.
Maintaining a document is problematic because it's a static document that's hard to keep up to date.
However, there isn't a good way from the issues page to see what issues apply to a given release version.
Dave feels it isn't worth maintaining this list. We can point people to the issues page and let them search as needed.
Feeling is it's okay to push things like ChangeLog updates directly to master.
How to track upcoming tags with github projects? In particular, how to include tags that include various issues?
(See also notes from a couple of weeks ago.)
Ben's idea: We can open a place-holder PR linking to the various issues that will be addressed.
Feeling is we don't need to list specific files changed. But let's include a list of PRs in each tag.
Make sure to have good commit messages.
The only thing that should go onto the release branch is merges
Every merge commit on the release branch should be tagged (often this may include a few changes together).
Feeling is that we should create a release tag tagging what's there right now: release-clm5.0.00.
We might have multiple tags on master before making the next release tag. e.g., we could get up to clm5.0.003 and then that would be equivalent to release-clm5.0.01.
Do we want some other way to distinguish between release and dev tags, to avoid confusion - since the numbers won't align?
We'll plan to have releases tagged like release-clm5.0.01, and dev tags like clm5.0.dev001.
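A sketch of how the two naming schemes could be told apart mechanically, for example in a script that lists tags. `classify_tag` is a hypothetical helper; the digit widths (two for releases, three for dev tags) are inferred from the two example names above and are an assumption.

```shell
# classify_tag NAME: print "release", "dev" or "other" according to the
# naming schemes release-clmX.Y.NN and clmX.Y.devNNN.
classify_tag() {
    if printf '%s\n' "$1" | grep -Eq '^release-clm[0-9]+\.[0-9]+\.[0-9]{2}$'; then
        echo release
    elif printf '%s\n' "$1" | grep -Eq '^clm[0-9]+\.[0-9]+\.dev[0-9]{3}$'; then
        echo dev
    else
        echo other
    fi
}
```

Under this scheme an older-style name such as clm5.0.003 falls into "other", which illustrates why the explicit `dev` marker removes the ambiguity between release and development numbering.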
Labeling tags on github: Plan is to label all tags on release branches as a release. We won't do anything for tags on master: leave them as just plain tags without release notes. (So we won't use pre-release for them moving forward.)
- EBK - can reseed_dead_plants, spinup_state, reset_snow, reset_snow_glc, reset_snow_glc_ela, be set or changed on a branch? Should these only be allowed for startup or hybrid?
- EBK Need to make a clm branch for `cesm2_0_beta08`. The problem is that `config_compsets.xml` is specifying a hybrid startup for B cases, but changes in CLM mean these files need to be interpolated rather than used without interpolation. So either `config_compsets` needs to change, or CLM has to figure out that these files need to be interpolated.
- EBK -- upcoming priority after clm branch tag? UG, new-IC files, bugs-list?
- EBK -- changes coming down from CLM teamwork page? What do we do? Move into ctsm github issues?
- EBK -- FATES IC files (for both clm45 and clm50?) Do you want IC files for users or should users spinup?
- DML -- Can we use the "LND_TUNING_MODE" feature in env_run.xml to handle the nitrogen deposition for coupled runs -- ebk, we didn't want to move to 1850 CMIP6 ndep until we have both 1850, and hist ndep versus offline runs
This could be a useful thing for some of our needs.
How to do tag planning? One idea is to have a project like "upcoming tags". Then could have notes for each upcoming tag.
How to plan which issues go in a tag? We could do that with projects, too: Have a project for each tag, with the issues that will be included in that tag. Then the above notes could point to the appropriate project as a link.
It causes issues to set `reseed_dead_plants` on a branch run - since it violates the rule that answers shouldn't change in a branch run if your namelist hasn't changed. Would someone ever want to do `reseed_dead_plants` in a branch run?
- Dave feels that this isn't really needed
- Erik will make it so this doesn't operate on a branch run, maybe throwing an error in that case.
- Mariana: You're not allowed to throw an error in this case: you need to be able to kick off a branch run with the same namelist as the initial run. So write out a warning that says we're ignoring this.
The `reset_snow` options already don't do anything on a branch.
What about `spinup_state`? Erik will talk to Keith about that. Dave is less inclined to stop allowing a branch run in that case.
Right now, the hybrid case pointed to for B1850 has incompatible initial conditions. The logic in CLM says to have `use_init_interp` false for hybrid runs.
Let's make a refcase with a new initial conditions file. Can take an initial conditions file from one of Cecile's recent cases (e.g. finidat used for `/glade/p/cesmdata/cseg/runs/cesm2_0/b.e20.B1850.f09_g17.pi_control.all.266`)
Fix for branch cases with `reseed_dead_plants` less critical.
Present: Dave Lawrence, Martyn Clark, Sean Swenson, Mike Barlage, Bill Sacks
- Martyn - Update on paper
- wjs - Project board for hillslope hydrology: https://github.com/ESCOMP/ctsm/projects/3
- I don't particularly like github projects for this purpose:
- Hard to see an overview, since all of the note is shown at once, rather than just the title
- Limit on number of characters in notes
- Seems better for tracking and organizing existing issues
- If we want projects organized around issues, there are various value-added services like ZenHub (free), maybe Zube (if pricing is low or free... it's possible this allows more than just github issues), and others
- Maybe just use individual issues for this purpose?: https://github.com/ESCOMP/ctsm/issues/222
- wjs - Update on hillslope integration
- Mike - timing tests
Martyn talked with Paul Dirmeyer about the possibility of having a commentary in JAMES. He thought that could be great.
Balance between not wanting to put something out too early vs. getting something out that people can reference.
This could be part of the CESM special issue
Should we include details on what will be included - e.g., which parameterizations, and from where? This could be a lot of details... maybe include in appendix? General feeling is to keep this higher-level.
Feeling is that, for now, let's try using individual issues for science enhancements. If that gets cumbersome, we can explore breaking things into separate issues (with projects) or some other solution
Martyn asks if we can reduce all the argument passing. e.g., inline some things, or introduce overall structures holding a bunch of the common arguments.
Bill sees pros & cons, but is open to this. If we want to package various things together, we could introduce a new locally-defined type for this, or just package them into the lateral_outflow ("this") object.
- Martyn asked if this can be done with 'associate'. Bill doesn't think so.
- Could copy data around, though there's performance overhead with that, and risk that you'd forget to do the copy. This risk is especially great for output arguments that you need to copy out (but we don't have many or any of those right now).
- Could have pointers to the appropriate data, though pointers can have performance issues
Bill likes Martyn's idea of having some packages that contain commonly-grouped data - e.g., combining bounds, col, grc, etc. into one higher-level structure; maybe making a higher-level hydrology structure; etc.
Present: Dave Lawrence, Erik Kluzek, Ben Andre, Bill Sacks
- WJS - final decision on branch names
- DML - We will need aerosol deposition fields from coupler history output from fully coupled transient runs (historical and future)
- EBK - We have the CLM team on NCAR/CLM, should we move them over
to a team on CTSM? How should we handle teams and collaborators for
CTSM?
- WJS: it looks like people at least need to be members of the ESCOMP
organization to be assigned to issues, so I think that anyone who is
a potential issue assignee should be on a team with at least read
permission. Based on
https://help.github.com/articles/repository-permission-levels-for-an-organization/
it looks like people need write permission to have access to some
conveniences around issues and PRs - though that's more a
convenience than a necessity. There's a trade-off between that
convenience and the risk that someone will accidentally mess
something up (e.g., delete or overwrite a tag). The key conveniences
that come with write access (in addition to being able to write to
the repo) are:
- Request PR review
- Close, reopen and assign issues
- Apply labels and milestones
- Create project boards
- WJS: I think we want at least two groups:
- Admin group
- Larger group of core CTSM developers, with either read or write permission
- WJS: if we expand write permissions (and maybe even if we don't),
I'm thinking we should ask Mark to expand the backup strategy to keep
a few years of backups - in case (for example) some old tag gets
overwritten and we don't notice it for a while
- I also was wondering if we should have some optional mechanism in manage_externals that allows you to list the SHA-1 along with the tag. (This would be used only when doing a checkout: after checking out the tag, it would check the checked-out SHA-1; if they don't match, it would abort with an error.) But I'm leaning towards thinking that this is overly paranoid of me....
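A sketch of the kind of check described in this item. It is not part of manage_externals; `verify_tag_sha` is a hypothetical helper showing how a recorded SHA-1 could be compared against the commit a tag currently points to.

```shell
# verify_tag_sha REPO TAG EXPECTED_SHA
# Resolve TAG to the commit it points at (the ^{commit} suffix peels an
# annotated tag down to its commit) and compare with the recorded SHA-1.
# Returns 0 on a match, 1 on a mismatch, 2 if the tag cannot be resolved.
verify_tag_sha() {
    actual=$(git -C "$1" rev-parse --verify --quiet "$2^{commit}") || {
        echo "error: tag '$2' not found in $1" >&2
        return 2
    }
    if [ "$actual" != "$3" ]; then
        echo "error: tag '$2' is at $actual, expected $3" >&2
        return 1
    fi
}
```

A mismatch would indicate that the tag was deleted and recreated or overwritten after the SHA-1 was recorded, which is the scenario the backup discussion above is concerned with.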
- EBK - what are the requirements for us to say "the release is done"? Testing, list of machines tested on, bugs fixed, cleanup tasks to do, UG, Tech note, list of things to do, who does each of these? What is the name going to be for the release? -- from last week there are only a few requirements as well as being before the LMWG meeting.
- EBK -- Are we still putting planning on trello? Should we archive meeting notes?
- bja - also need to update copyright date in license file for rtm, mosart, ptclm
- EBK -- SSP3-7 datasets for clm50? Peter made, priority for general ability to do this?
- bja - git branch conversion upon request....
- bja - still not clear to me what is desired for #215
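The optional SHA-1 check floated above for manage_externals (record the SHA-1 next to the tag, then abort if the checked-out commit differs) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, nothing here is part of manage_externals, and accepting abbreviated hashes of 7 or more characters is an assumption.

```python
import subprocess


def sha_matches(expected_sha: str, actual_sha: str) -> bool:
    """Return True if the checked-out commit matches the recorded SHA-1.

    Assumption: `expected_sha` may be an abbreviated hash (at least 7 hex
    characters) that is a prefix of the full hash in `actual_sha`.
    """
    expected = expected_sha.strip().lower()
    actual = actual_sha.strip().lower()
    if len(expected) < 7:
        return False
    return actual.startswith(expected)


def checked_out_sha(repo_dir: str) -> str:
    """Ask git for the commit currently checked out in `repo_dir`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def verify_checkout(repo_dir: str, tag: str, expected_sha: str) -> None:
    """Raise an error if the checkout of `tag` is not at `expected_sha`."""
    actual = checked_out_sha(repo_dir)
    if not sha_matches(expected_sha, actual):
        raise RuntimeError(
            f"tag {tag!r} resolved to {actual}, expected {expected_sha}"
        )
```

A check like this would catch a tag that was silently moved or overwritten, at the cost of one more field to maintain per external.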
Main argument for "master": it follows the standard git convention, so appears in git tutorials, documentation, etc.
Main argument for "develop": it makes it more explicit that this is for development.
Dave points out there are other reasons for "master": it's consistent with FATES; also, it avoids naming confusion with individuals' development branches.
Final decision: "master".
What about a long-lived production branch? If we have one, we'll call it "production". Update: we changed our mind on this: see below.
It's not clear if there's strong benefit to having a long-lived production branch (that outweighs the complexity). We'll come back to this if/when needed.
For now we can create a release branch: release-clm5.0
Then we can instruct people to do:
git clone -b release-clm5.0 https://github.com/ESCOMP/ctsm.git
cd ctsm
./manage_externals/checkout_externals
Dave points out: It could be confusing to have "production" and "release". Feeling is: let's use "release" for everything, not have anything labeled "production".
Summary: Development will happen on "master". We may or may not have a single, long-lived "release" branch, from which all releases are made (either directly on the "release" branch or from branches off of the "release" branch). If we don't have a single "release" branch, then we'd still have individual release branches for each release; they would just come off of "master" in this case.
Should we direct people to the bulletin board or the issues tracker?
Maybe for now (until CTSM is a thing independent of CESM) continue to use the bulletin board.
We should sit down for a couple of hours to determine what we can cut from the User's Guide.
Suggestion that Erik starts with a quick take at this, then we can go through it as a group.
Ben thinks it's possible to have a link to the pdf from the html documentation... it may be a two-step process.
Mariana: not sure if we want to commit the pdf to the repo. May want to host the pdf from somewhere else for now.
Feeling is: let's have 3 teams:
- CTSM-Admin
- CTSM-Write
- CTSM-Read
The admin team will be very small. Others will start on the Read team, and then can graduate to Write.
We'll think about an automated solution. For now send out email manually to clm-dev.
Bill: Would be good to prefix tags to make it clear the distinction between dev tags and release tags. e.g., maybe prefix dev tags with "dev"??
Branch: release-clm5.0
Tags: release-clm5.0.0, release-clm5.0.1, etc.
Feeling is: let's do what we can get done by next Friday, even if it means that we'll need a release update soon afterwards.
Present: Dave Lawrence, Ben Andre, Erik Kluzek, Bill Sacks, Keith Oleson, Rosie Fisher
- WJS - go through critical and high-priority bugs; determine what needs to be fixed for release
- WJS - develop vs. master plans from Jan 18 meeting
- Basic plan is to use gitflow: http://nvie.com/posts/a-successful-git-branching-model/
- However, in contrast to some descriptions where 'develop' is potentially unstable, we'll treat 'develop' like svn trunk: Branches can only come to develop after the full test suite passes. When we do releases on 'master' is more subjective and based on scientific readiness.
- What should the default branch be? Tension between what we'd want users to get ('master') and the default we'd want for PRs ('develop'). At yesterday's meeting, we said we'd use 'master' as the default. The risk there is that we might periodically mistakenly merge PRs into master, which would need to be reverted.
- EBK -- Do we do the part of the process for hotfix branches? And reserve those branch names?
- (Further reading, with discussions of when gitflow is too complex: http://scottchacon.com/2011/08/31/github-flow.html)
- EBK -- Do we both make tags and do PR's? Right now we mostly do not do code reviews, which is bad IMHO. I hope we do more reviews in git. When do we ask for reviews? Also should allow PR's that don't touch code, so they can be combined with other changes. What is the process for a develop tag to move over to release? How often do we do release tags on master? Still need to decide on naming convention.
- WJS - Worth having a simple_bfb tag for issues? Point is: could help us find issues that could be combined into a bfb tag (ebk likes this, but it will need someone responsible to make it happen)
- WJS - Ben: should we do anything more on manage_externals for v1, or make v1 with what we have? When we're ready, can you pull it into the andre-standalone branch (or show me how)?
Dave: Feels we should be a little more lenient about what we mean by release, and how much needs to be fixed for that. There's a lot of positive benefit to releasing before the meeting. We can always put out a release update via a new tag on master.
Bill & Ben: Feel that there are still some big issues that should be fixed before release.
Only machines that need to be supported in the release are cheyenne and hobart. We should explicitly mention the machines/compilers known to work.
Erik: For tools, we just test on cheyenne; feels we can expand that for the cesm2 release.
We'll probably have a release update that comes with the CESM2 release
At yesterday's ctsm meeting, we discussed what kind of github workflow / branching model we want in order to support development and releases. I'd like to discuss this further at today's clm-cmt meeting to make sure everyone is on board with this.
The key aspect of yesterday's proposal is that we'd have two main, long-lived branches:
- develop: This would take the place of our current trunk. Most branches branch off of develop and get merged back into develop. A key difference for us (compared with what's suggested in the blog post) would be that branches only get merged to develop after they are run through the full test suite – so from a pure software engineering standpoint, all commits on develop are "production-ready".
- master: This would be the branch along which releases are tagged. The only commits that go on master are merges from release branches, when we're ready to make a release.
This is the "gitflow" workflow, which is presented here:
http://nvie.com/posts/a-successful-git-branching-model/
The core elements are illustrated nicely in the figure at the top of that page. If you aren't familiar with gitflow, it would be helpful if you could at least look at that figure to familiarize yourself with it. The discussion in the rest of that post is also worthwhile to read at some point – I think we would adopt most of the workflow as described there.
I was initially hesitant to adopt this relatively complex branching model. (e.g., http://scottchacon.com/2011/08/31/github-flow.html argues for a simpler workflow in many cases.) However, the requirement to distinguish "blessed" versions (which may be released, say, every few months) makes me feel like gitflow is probably the right choice for us. The downsides are that this more complex workflow will involve a bit more overhead and carries a larger risk of mistakes.
People are generally happy with the gitflow workflow.
However, don't like the names develop vs. master.
Suggest using "master" for what used to be trunk, and something like "release" or "production" or "stable".
General consensus is for "production".
Feeling is that we want to merge latest master into a dev branch before doing final testing (similarly to our svn workflow), in order to catch integration issues. However, need to make sure to use '--no-ff' when doing the merge.
Feeling is that we shouldn't recommend rebasing to scientists. One reason is that, if you've done any science from your branch, the provenance is lost if you rebase.
General feeling is that we don't want to duplicate information that's elsewhere, because it's easier to maintain.
Rosie suggests hosting a webpage with the relevant information. The README.rst file in the repo will then just point to that. Ben suggests adding some other links in the README.rst, too - like pointing to cime user's guide.
Probably point to http://cesm-development.github.io/cime/doc/build/html/quickstart.html - though we aren't sure if it's going to stay there or move to esmci.github.io.
There's currently more documentation here http://esmci.github.io/cime/
e.g., this applies to naming for cfg files for manage_externals.
Let's keep things named CLM for now, until post-release.
https://escomp.github.io/mosart/doc/build/html/tech_note/MOSART/CLM50_Tech_Note_MOSART.html is incorrect, according to Ben.
Dave/Keith think we should just point to the CLM tech note for this.
Bill: We didn't discuss this, but based on Erik's feedback on the agenda, I have added a "simple bfb" tag for issues. The idea is: this tag can be used to find issues that can be relatively easily pulled in to a larger bfb tag, to combine multiple changes in one set of testing.
Present: Dave Lawrence, Mike Barlage, Sean Swenson, Martyn Clark, Bill Sacks
- Meetings: do we want to maintain separate meetings for CLM software and science vs. CTSM or combine those? Also, bigger CTSM meetings?
  - We are deferring this item
- wjs: status: I haven't been able to spend much time on the lateral flow refactor. The extraction of the power law routine is done-ish (need to figure out cause for answer changes). Need to put in the hillslope version and the clm4.5 version, then finish cleaning up the routine in SoilHydrologyMod. Hillslope branch moved to git.
Martyn asks: For branding, will we call it CLM or CTSM?
Do we want to do something like "CTSM-CLM5"? People think that could work.
Dave: For CESM, you're supposed to do something like "CESM2 (list of things that differ from default)" - e.g., "CESM2 (WACCM)"
Martyn feels it's important to include "CTSM". Dave thinks we need to keep the "CLM" brand, at least for now. So something like "CTSM-CLM" is probably best, at least for now.
For the short-term: Dave is comfortable with "CTSM-CLM5", but is a bit hesitant because we haven't published anything with CTSM yet. He's happy as long as it has CLM somewhere in the name.
Feeling that we'd like to get something out there sooner rather than later.
Martyn: One option would be to start with a vision paper, then follow up with a more detailed technical paper. Or should we start with the more detailed technical paper?
Dave: JAMES probably wants to see a working model. Maybe something short like a GEWEX or EOS article.
Martyn: May be able to do a commentary in JAMES. He'll look into this.
Dave: This could be a way to get some things out there like naming convention, version control, etc. - kind of boring, but stuff we should get out there.
Bill: For the most part, developments will happen in people's forks
Mike: Does it make sense to have some collaborative development via people's forks rather than the main fork?
Bill thinks that can definitely make sense, though prefers that things get integrated into the main repository more frequently so that things don't get too out-of-sync.
How do we want to handle releases - particularly, how to point people to the right version of the code they should be using?
Martyn asks if we want separate master vs. develop for that. Or Sean suggests having a "release" branch.
Feeling is: Let's have "master" and "develop".
- develop will take the place of our current trunk. Things need to pass all tests before coming to develop.
- master is where releases are made and tagged. This contains blessed versions. This will be the default that people get when they clone.
Bill: I think this is roughly equivalent to http://nvie.com/posts/a-successful-git-branching-model/
In the short-term, before the CLM5 release, we won't have a master branch. The first existence of the master branch (and the first tag along that) will be the CLM5 release.
Present: Dave Lawrence, Rosie Fisher, Ben Andre, Mariana Vertenstein, Bill Sacks, Erik Kluzek
- DML - Single point on Cheyenne
- ebk -- RTM/mosart issue #1, missing history file with gnu compiler
- ebk -- note ran `yellowstone_pgi` test list on both `cheyenne_intel`/`cheyenne_gnu`; worked fine outside of above issue with gnu for rtm/mosart.
- ebk -- moving to new testlist format (also allows wallclock time, and memleak fraction), and for mosart/rtm. Some of those tests weren't being run because they weren't changed from `aux_clm45` to `aux_clm`. Some machines are gone now: janus, hopper. There was a testlist for "`aux_clm_ys_pgi`", I don't think that applies anymore.
- dml - tag for diagnostic skin temperature
- dml - moving forcing data to cisl's data archives
- bja - branch migration for git
We can use github projects for this
Can you do a single-point run using slurm, going to geyser/caldera?
Currently, no way to do this from cheyenne: on cheyenne, you end up using a full node.
You could login to caldera and set up a yellowstone case
But we can set things up to use the share queue. Bill will do that.
Just with gnu
See https://github.com/ESCOMP/mosart/issues/3
Feeling is: let's determine if this is decStart, but then let's not worry about this for right now. Let's just move this to intel.
Dave suggested starting by moving cru/ncep forcing data over. If that seems to work well, we could move gswp3 over.
These are data that have been stored in inputdata.
This is not providing a way for others to download the data - it's just an alternative place to store the data locally so it's not filling up our quota.
Side-note: Mark Moore has been talking about setting up a gridFTP server to replace svn inputdata.
Need to make sym links from the old place to the new so old tags continue to work.
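As a rough sketch of the "move the data, leave a symlink behind" step, assuming the move is done with a small script rather than by hand (the function name and the example paths in the usage note are hypothetical):

```python
import os
import shutil


def relocate_with_symlink(old_path: str, new_path: str) -> None:
    """Move `old_path` to `new_path`, then symlink the old location to it.

    Existing references to `old_path` (for example from old tags) keep
    resolving, because the old location becomes a link to the new one.
    """
    if os.path.islink(old_path):
        return  # already relocated on an earlier run
    new_abs = os.path.abspath(new_path)
    os.makedirs(os.path.dirname(new_abs), exist_ok=True)
    shutil.move(old_path, new_abs)
    os.symlink(new_abs, old_path)
```

For example, `relocate_with_symlink("/old/inputdata/forcing", "/archive/forcing")` would leave `/old/inputdata/forcing` as a link into the archive location.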
Is it an issue that we won't have backups of these data? The GSWP3 and cru-ncep data are in the svn inputdata repo. The WATCH data won't be backed up, which is probably okay: we just want to make sure that the scripts are backed up.
CTSM development will be incremental development from the CLM5 release - won't break things like FATES integration.
How will we deal with big changes coming in - documenting / notifying people?
Feeling is that we'll use the tag/version numbers to denote these significant developments that users may be interested in.
Tag numbers will just refer to ctsm numbers - not clm version numbers.
We'll maintain the ability to revert to clm5.0 namelist options. We'll have ability to use clm5.0 or "current". As CLM evolves (clm5.1, clm5.2, etc.) you would not (e.g.) be able to go back to clm5.1 with a single setting, though you could go back to clm5.0.
How do we want to document bug fixes that affect the science of certain versions? No clear answer....
Dave requests that we have a 20-min presentation at the LMWG meeting walking people through the new workflow, where to get the new version of ctsm, etc.
This could be a live demo as opposed to a presentation.
Ben points out: could be useful to have a quick start slide with key commands.
-
- Git migration follow-up: see email from Bill before the holidays
- Agendas now stored in the notes
- Ditching origflag
- Status of lateral flow modularization
- Interest in CTSM: What's our strategy to introduce people?
https://github.com/escomp/ctsm has migrated from old repo - in `unified_land_model` branch. This will become master after CLM5 release is done.
People are comfortable with removing it as long as it's in the history. It will be in the CLM5 release.
We may want to re-introduce the parameterizations that were covered with origflag eventually - but can be ditched for now.
Eventually, we may want to treat `qflx_latflow_out` and `qflx_latflow_out_vol` in a more unified / consistent way. Will do that in a second step.
We can do this quickly after the CLM5 release.
We should have some sort of document letting people know what's happening: CTSM branching off of CLM5, with CLM remaining an instantiation of CTSM.
We want to maintain the CLM "brand", probably via a suite of physics options.
We may want to think about how to verify that changes we make moving forward maintain the science of CLM5. Could do something like the verification test that's used for atmosphere and ocean?
- Bill notes that you can't rely on roundoff-level changes remaining roundoff-level in most cases - things tend to blow up over some time, at least in some grid cells.
As we change things like the numerics, do we need to maintain the old numerics, too? Or can we allow changes that are greater than roundoff, which reproduce CLM5 science without necessarily reproducing CLM5 answers? Feeling is that we can be more flexible here: want to maintain CLM5 science capabilities, but not the exact answers.
But we still want a tool to tell us if a change leads to anything going off the rails - e.g., the roundoff-level hydrology reordering that led methane to go to 0. So instead of or in addition to a more formal verification tool, could have something that checks whether any variable went grossly off the rails.
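A very small sketch of such an "off the rails" check is given below. It assumes each run has already been reduced to one summary value per variable (for example a global mean); the function name, the 50% tolerance and the variable names in the usage note are illustrative assumptions, not an existing tool.

```python
import math


def gross_changes(baseline, candidate, rel_tol=0.5, floor=1e-12):
    """Return {variable: reason} for variables that changed grossly.

    `baseline` and `candidate` map variable names to one summary value.
    A variable is flagged when it is missing or non-finite in `candidate`,
    when it collapses to (near) zero although the baseline was non-zero,
    or when its relative change exceeds `rel_tol`.
    """
    flagged = {}
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None or not math.isfinite(new):
            flagged[name] = "missing or non-finite"
        elif abs(base) > floor and abs(new) <= floor:
            flagged[name] = "collapsed to zero"
        elif abs(new - base) > rel_tol * max(abs(base), floor):
            flagged[name] = "relative change exceeds tolerance"
    return flagged
```

With `baseline = {"FCH4": 2.0, "TSOI": 280.0}` and `candidate = {"FCH4": 0.0, "TSOI": 280.0000001}`, only `FCH4` is flagged, which is the kind of case (methane going to 0 after a roundoff-level change) described above. Roundoff-level differences in other variables pass through unflagged.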
Tag numbering: For CLM, we've come up with a scheme where we flag science changes. We'll probably drop this in CTSM (since it's multi-model), and just note this in the ChangeLog.
We could start with CTSM v. 0 being CLM5.
Currently we've been using tags like `clm4_5_18_r270`. `clm4_5` gives release / interim release versions. `18` tells you when the science has changed in the last release version. `r270` is bumped for every tag.
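To make the three parts of the existing scheme concrete, here is a minimal parsing sketch. The regular expression is inferred from the single example tag above and may not cover every historical tag; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ClmTag(NamedTuple):
    release: str   # release / interim release version, e.g. "clm4_5"
    science: int   # marks when the science last changed, e.g. 18
    revision: int  # counter bumped for every tag, e.g. 270


# Assumed pattern, based only on the example "clm4_5_18_r270".
_TAG_RE = re.compile(r"^(clm\d+_\d+)_(\d+)_r(\d+)$")


def parse_clm_tag(tag: str) -> ClmTag:
    """Split a tag such as 'clm4_5_18_r270' into its three components."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    release, science, revision = match.groups()
    return ClmTag(release, int(science), int(revision))
```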
For CTSM, maybe we can get away with just two numbers: a major release version and a tag number.
Updated feeling: let's use 3 numbers:
- 1st number: major release
- 2nd number: big change (e.g., introducing FATES)
- 3rd number: normal changes
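The three-number scheme above can be illustrated with a small helper that computes the next version. Resetting the lower numbers to zero when a higher one is bumped is an assumption made here for illustration; the notes do not say whether that is intended. The function name and the change labels are likewise hypothetical.

```python
def bump(version, change):
    """Return the next (major, big, normal) version tuple.

    `change` is one of "major", "big" (e.g. introducing FATES) or "normal".
    Assumption: bumping a number resets the numbers to its right to zero.
    """
    major, big, normal = version
    if change == "major":
        return (major + 1, 0, 0)
    if change == "big":
        return (major, big + 1, 0)
    if change == "normal":
        return (major, big, normal + 1)
    raise ValueError(f"unknown change type: {change!r}")
```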
Mariana suggests following the model of cime: The rst source for the tech note and user's guide lives in the main repo, so it can/should be updated with the source code. People like this idea.
How do we document what CTSM is? What documents do we point people to?
There's the original vision document, and other documents that we prepared in earlier meetings. Feeling is that we can make those documents available on the wiki.
Should we have a design document? We could have a high-level design document, based on Martyn's ideas (and maybe add something on LILAC). Other than that, the best thing right now could be to point people to some good code examples.
Do we want an email list?
Sometime in spring / summer we want to more broadly introduce CTSM to people.
Timing? When we have the hydrology of Noah-MP working within CTSM could be the time to talk broadly about this.
Want to make the point that LILAC will facilitate adoption.
Present: Dave Lawrence, Rosie Fisher, Ben Andre, Mariana Vertenstein, Bill Sacks, Erik Kluzek.
We don't want to switch over to git until we're ready to totally drop the svn trunk
The git transition will be in beta09. There are still some changes needed in beta08, in the next week or so.
Still more work is needed to get tools working, probably. Erik will work on that.
Let's create a stable development branch in git, so we can bring together whatever changes in git we want. We can start with that at the head of the branch in pr #189.
People feel that the svn to git migration is good to go: we won't need to redo this. So it's safe to base other changes off of what's there.
What is the workflow for tagging once we've moved to git? Do we want to make a tag every time a PR is merged to master, or make less frequent tags?
Initial feeling is that we don't need to tag everything.
We could distinguish between documentation-only changes (or things like .gitignore) vs. substantive changes (code, updating version of mosart, etc.) - allowing documentation-only changes to come to master without full testing and without being tagged.
Dave points out that our whole tag numbering will need to change with CTSM, since it is a multi-model system - so the science version is kind of meaningless.
You should be able to do `make latexpdf`. You need the right tools installed for that to work, though. One problem is that the tech note and User's Guide may be bundled into one pdf.