2025‐01‐07 (CESM Project Meeting)

Jan 7, 2025

Teagan and Mike led a discussion on how to organize observational datasets at a CESM Project Meeting. Lots of thoughts:

No opposition to the idea of keeping obs in a centralized location
- Some concern about datasets that we are not allowed to make available publicly.
  - Most data could be stored in a public place, but some would go in a private directory with a similar structure
  - Only public data in key_metrics, but other examples could rely on private data (which may only be available on NCAR machines)
Metadata is very important! Users need to know the origin of the data, any changes made, etc.
Lots of obs are already in the RDA
- Climate Data Guide, for example
- Keep data in its current location, but link to the data commons? (will this cause issues with the repository?)
Is inputdata a reasonable place to keep data?
- Some data is already in inputdata (the seaice SST dataset is used in the seaice notebooks, and is also the forcing for F-compsets)
Should we have a separate repository for processing scripts?
- This is a good idea, but comes with an additional cost (someone needs to ensure those scripts continue to work as Python evolves)
- OMWG has a repository of scripts used for the tx2_3v2 grid: datasets, validation, etc.; good example of the documentation needed for a script repo?
Is data volume a concern?
- Monthly 1° datasets are trivial compared to daily (or high-resolution spatial) data
- There's a benefit to interpolating datasets to model output, but that would be in addition to the native grid; CTSM uses mksurfdata for batch interpolation, can CUPiD use something similar?
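
For a rough sense of the data volume point above, here is a back-of-the-envelope sketch (not something shown at the meeting). It assumes a single 2D variable stored uncompressed as float32 on a regular lat-lon grid; the function name `field_bytes` and the 0.25° comparison grid are hypothetical choices for illustration only.

```python
# Rough, uncompressed size estimate for one 2D variable on a regular
# lat-lon grid. Every number here is an illustrative assumption.

BYTES_PER_VALUE = 4  # assumes float32 storage


def field_bytes(nlat: int, nlon: int, ntime: int) -> int:
    """Return the uncompressed size in bytes of an (ntime, nlat, nlon) array."""
    return nlat * nlon * ntime * BYTES_PER_VALUE


# One year on a 1-degree global grid (180 x 360 points)
monthly_1deg = field_bytes(180, 360, 12)
daily_1deg = field_bytes(180, 360, 365)

# Same year of daily data on a hypothetical 0.25-degree grid (720 x 1440 points)
daily_quarter_deg = field_bytes(720, 1440, 365)

if __name__ == "__main__":
    for label, size in [
        ("monthly 1 deg", monthly_1deg),
        ("daily 1 deg", daily_1deg),
        ("daily 0.25 deg", daily_quarter_deg),
    ]:
        print(f"{label}: {size / 1e6:.1f} MB per variable per year")
```

Under these assumptions, daily data is about 30 times larger than monthly data on the same grid, and moving from 1° to 0.25° multiplies the size by another factor of 16. Real files will differ with compression, vertical levels, and the number of variables.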