2025‐01‐07 (CESM Project Meeting)

Michael Levy edited this page Jan 17, 2025 · 1 revision

Jan 7, 2025

Teagan and Mike led a discussion on how to organize observational datasets at a CESM Project Meeting. Lots of thoughts:

  • No opposition to idea of keeping obs in a centralized location
    • Some concern about datasets that we are not allowed to make available publicly.
      • Most data could be stored in a public place, but some would go in a private directory with a similar structure
      • Only public data in key_metrics, but other examples could rely on private data (may only be available on NCAR machines)
  • Metadata is very important! Users need to know the origin of the data, any changes made, etc.
  • Lots of obs are already in the RDA
    • Climate Data Guide, for example
    • Keep data in current location, but link to data commons? (will this cause issues with repository?)
  • Is inputdata a reasonable place to keep data?
    • Some data is already in inputdata (seaice SST dataset is used in seaice notebooks, and is also forcing for F-compsets)
  • Should we have a separate repository for processing scripts?
    • This is a good idea, but it comes with an additional cost (someone needs to ensure those scripts continue to work as Python evolves)
    • OMWG has repository of scripts used for tx2_3v2 grid: datasets, validation, etc; good example of documentation needed for script repo?
  • Data volume a concern?
    • Monthly 1° datasets are trivial compared to daily (or high-resolution spatial) data
    • There's a benefit to interpolating datasets to model output, but that would be in addition to native grid; CTSM uses mksurfdata for batch interpolation, can CUPiD use something similar?
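
To put rough numbers on the data volume point, here is a back-of-the-envelope sketch. It assumes a single uncompressed 2-D global variable stored as 4-byte floats on a regular lat-lon grid, and it uses 0.25° as a stand-in for "high resolution"; the helper name `dataset_bytes` and all of these choices are illustrative assumptions, not anything decided at the meeting.

```python
def dataset_bytes(resolution_deg, steps_per_year, years=1, bytes_per_value=4):
    """Uncompressed size of one global 2-D variable on a regular lat-lon grid.

    Hypothetical helper for illustration; real files also carry coordinates,
    metadata, and usually compression.
    """
    nlat = round(180 / resolution_deg)
    nlon = round(360 / resolution_deg)
    return nlat * nlon * steps_per_year * years * bytes_per_value


# One year of a single variable under three assumed configurations
monthly_1deg = dataset_bytes(1.0, 12)
daily_1deg = dataset_bytes(1.0, 365)
daily_quarter_deg = dataset_bytes(0.25, 365)

for label, size in [
    ("monthly 1 deg", monthly_1deg),
    ("daily 1 deg", daily_1deg),
    ("daily 0.25 deg", daily_quarter_deg),
]:
    print(f"{label:>15}: {size / 1e6:8.1f} MB per variable-year")
```

Under these assumptions, a monthly 1° variable is about 3 MB per year, while the daily 0.25° case is roughly 1.5 GB per year, so keeping interpolated copies alongside native-grid data matters mostly for the daily and high-resolution datasets.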