ACCESS-CM3 Builder #168

Open
dougiesquire opened this issue May 2, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@dougiesquire
Collaborator

ACCESS-CM3 is now at the point of producing output. It could be useful to have an ACCESS-CM3 Builder to allow us to build datastores for ACCESS-CM3 data.

ping @MartinDix, @kieranricardo

@dougiesquire dougiesquire added the enhancement New feature or request label May 2, 2024
@anton-seaice
Collaborator

@sfiddes would also like a builder for UM regional nesting suite output. It's not clear to me whether this is the same builder or a different one. If someone is looking at this soon, she can provide some example output :)

@marc-white
Collaborator

I think I'm on the hook for new builders, @anton-seaice, so if you can provide some demo data I can start looking into it.

@marc-white marc-white self-assigned this Aug 21, 2024
@dougiesquire
Collaborator Author

@paolap set up a builder (in a fork?) for Aus2200 output that could do what we need. Paola, are you planning on contributing that into the main package?

@sfiddes

sfiddes commented Aug 21, 2024

I have put a couple of files here: /scratch/public/slf563/UM_RNS. It is fairly niche, sorry, but if there is something that works for AUS2200 that would be a good start. Also, I've just used iris to convert from pp files, so the naming might be a bit off. I am looking for a better way to do the data processing so that it follows CF conventions etc. So perhaps this is not the best dataset to start with; AUS2200 is probably better, bearing in mind that not everyone uses AUS2200 :)

@paolap

paolap commented Aug 21, 2024

> @paolap set up a builder (in a fork?) for Aus2200 output that could do what we need. Paola, are you planning on contributing that into the main package?

Yes, that was the idea. I was waiting until I had time to review what I did and make it more flexible, so that a user could control which extra fields appear in the catalogue attributes. I'm busy all this week, and I would need to adapt it to whatever changes Marc introduced. I can have another look at it next week and then open a pull request. I did put a link in another issue.

I also had a chat with Sonya previously about post-processing her data, as Mopper already has the capacity to do so, but currently starting from the raw model output rather than from the iris version. One issue with iris is that it seems to make up standard_names, which are a potential source of information for mopper, so this would confuse the tool. Again, I won't have any time to look at this until next week.
Finally, if there are examples of CM3 output, I would love to get a sample so we can provide a mapping for this version in our tool.

@marc-white
Collaborator

@paolap how did you go with your new builder?

@paolap

paolap commented Sep 13, 2024

I didn't do anything more, maybe next week I'll have some time to review it and pull your changes. It was working for me for the example I wanted to run. The catalogue I produced is the one Navid used in his atmospheric cookbook example.

@paolap

paolap commented Oct 23, 2024

Finally, I'm going to have some time next week to look into this again. So far I have just updated my fork and merged your updates from main. Let me know if there's more from other branches that I should be aware of, i.e. more changes to the builders in particular.

@marc-white
Collaborator

Hi all, is there some representative ACCESS-CM3 data I can use for this?

@paolap

paolap commented Nov 14, 2024

@marc-white, I finally had some time to look at this, way later than I would have liked! I updated the MopperBuilder class to follow the changes you implemented, including using the new class _AccessNCFileInfo.

As mopper is a wrapper to CMOR that writes one variable per file, it would be great if it were possible to avoid a multivariable file setup when it's not needed. Multivariable ESM catalogues have some limitations, and they are also potentially more confusing from a user's point of view.
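To make the one-variable-per-file point concrete, here is a minimal sketch of how a builder could check whether the `variable` column actually needs to hold lists. It assumes each parsed file is a dict whose `variable` entry is a list of variable names; the function names are hypothetical and are not part of access-nri-intake-catalog.

```python
from __future__ import annotations

from typing import Any


def all_single_variable(rows: list[dict[str, Any]]) -> bool:
    """Return True if every file in the catalogue holds exactly one variable."""
    return all(len(row["variable"]) == 1 for row in rows)


def normalise_variable_column(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Store `variable` as a plain string when no file is multivariable.

    If any file holds several variables, the rows are returned unchanged so
    that `variable` stays a list-valued (iterable) column.
    """
    if not all_single_variable(rows):
        return rows
    return [{**row, "variable": row["variable"][0]} for row in rows]


# Hypothetical MOPPeR-style output: one variable per file
rows = [
    {"path": "tas_example.nc", "variable": ["tas"]},
    {"path": "pr_example.nc", "variable": ["pr"]},
]
print(normalise_variable_column(rows))
```

In intake-esm, list-valued columns are declared through `columns_with_iterables` in the catalogue description, so a purely single-variable catalogue could leave `variable` out of that list.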

My changes are in my own fork in the aus2200 branch, the main branch should be up to date with the official repo:

https://github.com/paolap/access-nri-intake-catalog

@marc-white
Collaborator

Thanks @paolap, I'm going to pull the branch into the main repo to take a look and continue work.

@marc-white
Collaborator

@paolap what are you providing as the arguments fpattern and toselect to the MopperBuilder (as an example)?

@paolap

paolap commented Nov 20, 2024

This config file should show what they are for the dataset I used as a test:

https://github.com/paolap/access-nri-intake-catalog/blob/aus2200/config/access-mopper.yaml

Basically, "fpattern" is the pattern set in the mopper configuration file to build the directory structure of the post-processed output.
See "path_template" and "file_template" in this yaml file (around line 38):
https://github.com/ACCESS-Community-Hub/ACCESS-MOPPeR/blob/main/ACDD_conf.yaml

This follows stricter rules for CMIP6 data, as shown in the equivalent configuration file used when following CMIP6.
CMOR will use this template to create directories and build the filenames.
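As a rough illustration of how a path template and a file template turn into an output location, the sketch below fills `{name}` placeholders from a file's attributes. The placeholder syntax, the template strings and the attribute values are all hypothetical; the real templates are the ones in ACDD_conf.yaml, and CMOR appends the date range to the filename itself, so it is left out here.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def build_output_path(
    attrs: dict[str, str], path_template: str, file_template: str
) -> PurePosixPath:
    """Fill hypothetical {name}-style templates with a file's attributes.

    Returns <directory>/<filename stem>. The date range and the .nc
    extension are left out because CMOR appends them itself.
    Raises KeyError if a template field is missing from attrs.
    """
    directory = path_template.format_map(attrs)
    stem = file_template.format_map(attrs)
    return PurePosixPath(directory) / stem


# Hypothetical attributes and templates, for illustration only
example_attrs = {
    "product_version": "v1",
    "frequency": "1hr",
    "variable_id": "tas",
    "source_id": "AUS2200",
    "experiment_id": "exp1",
}
print(build_output_path(
    example_attrs,
    "{product_version}/{frequency}/{variable_id}",
    "{variable_id}_{source_id}_{experiment_id}_{frequency}",
))  # prints v1/1hr/tas/tas_AUS2200_exp1_1hr
```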

This is why it made sense to me to use the same setup for the builder. Then I added toselect to allow a user to choose which attributes they want in the catalogue.
NB: the fpattern won't be exactly the same; for example, the variable name is identified as variable_id in CMOR (this can't be changed) but as "variable" in your catalogue. But it's easy enough for a user to adapt the terms.
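A minimal sketch of the fpattern/toselect idea: treat a `{name}`-style fpattern as a regular expression, parse a filename, rename CMOR terms such as `variable_id` to catalogue column names, and keep only the fields listed in `toselect`. This is not MopperBuilder's actual code; the template syntax, the `CMOR_TO_CATALOGUE` mapping and the example filename are assumptions, and field values are assumed not to contain underscores.

```python
from __future__ import annotations

import re

# Hypothetical mapping from CMOR term names to catalogue column names
CMOR_TO_CATALOGUE = {"variable_id": "variable"}


def template_to_regex(fpattern: str) -> re.Pattern[str]:
    """Turn a {name}-style filename template into a regex with named groups."""
    # re.split with a capturing group alternates literal text and field names
    parts = re.split(r"\{(\w+)\}", fpattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal separator text
        else:
            regex += f"(?P<{part}>[^_/]+)"  # field value: no underscores or slashes
    # CMOR appends _<fromdate>-<todate> before the extension (optional here)
    regex += r"(?:_(?P<date_range>\d+-\d+))?\.nc$"
    return re.compile(regex)


def parse_filename(filename: str, fpattern: str, toselect: list[str]) -> dict[str, str] | None:
    """Parse a filename and return only the attributes named in toselect."""
    match = template_to_regex(fpattern).match(filename)
    if match is None:
        return None
    info = {
        CMOR_TO_CATALOGUE.get(key, key): value
        for key, value in match.groupdict().items()
        if value is not None
    }
    # Keep only the attributes the catalogue manager asked for
    return {key: value for key, value in info.items() if key in toselect}


print(parse_filename(
    "tas_AUS2200_exp1_1hr_202001010000-202001312300.nc",
    "{variable_id}_{source_id}_{experiment_id}_{frequency}",
    ["variable", "frequency", "date_range"],
))
```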

The date_range is determined automatically by CMOR from the frequency, as fromdate-todate, hence it doesn't appear in the template.
For example, for monthly data it will be something like YYYYMM-YYYYMM,
but for 3-hourly data it is YYYYMMDDhhmm-YYYYMMDDhhmm,
where YYYYMMDDhhmm represents the actual timestamp of the first and last timestep.
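The frequency-dependent date range described above can be sketched as a lookup from frequency to timestamp format. Only the two cases mentioned in the comment are covered, and the frequency labels `mon` and `3hr` are assumed CMIP-style names; CMOR builds this string internally, so this helper is purely illustrative.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical frequency -> timestamp format table (only the two cases discussed)
DATE_FORMATS = {
    "mon": "%Y%m",        # monthly: YYYYMM
    "3hr": "%Y%m%d%H%M",  # 3-hourly: YYYYMMDDhhmm
}


def format_date_range(first: datetime, last: datetime, frequency: str) -> str:
    """Build a fromdate-todate string from the first and last timesteps."""
    try:
        fmt = DATE_FORMATS[frequency]
    except KeyError:
        raise ValueError(f"no date format defined for frequency {frequency!r}") from None
    return f"{first.strftime(fmt)}-{last.strftime(fmt)}"


print(format_date_range(datetime(2020, 1, 1), datetime(2020, 12, 1), "mon"))  # 202001-202012
print(format_date_range(datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 21), "3hr"))  # 202001010000-202001012100
```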

@marc-white
Collaborator

Thanks @paolap, that's good information. Following on from that:

  • Is the structure of fpattern liable to change very much/very often? And if it does change, is the one recorded in access-mopper.yaml a sensible default to fall back on if people forget to define one?
  • Similarly, do you expect people to have strongly differing requests for the value of toselect?

@paolap

paolap commented Nov 20, 2024

It could change; it's really up to the user. In my experience most people would stick to what is suggested there. If it changes, it should be quite straightforward to get it from whoever processed the data, or to work it out from the data itself.
As for "toselect", I don't think the person who requests a dataset to be added would have strong views; it's more for whoever manages the catalogue, as it can add fields that might be useful for querying the data.
In that example I didn't use "realm" because it's an atmosphere-only run, but in other runs "realm" may also be present, and then you would want to select it.
In general, both should be more useful when you have something complex and more "CMIP" style.
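A minimal sketch of the defaulting behaviour being discussed: fall back to a default fpattern and toselect when none are given, and skip requested attributes (such as `realm` in an atmosphere-only run) that a file does not define. The default values and function names are hypothetical, not taken from the config in the fork.

```python
from __future__ import annotations

# Hypothetical defaults; in practice the fpattern would come from whoever
# processed the data, or be worked out from the file names themselves.
DEFAULT_FPATTERN = "{variable_id}_{source_id}_{experiment_id}_{frequency}"
DEFAULT_TOSELECT = ["frequency", "realm"]


def resolve_builder_options(
    fpattern: str | None = None, toselect: list[str] | None = None
) -> tuple[str, list[str]]:
    """Fall back to the defaults when fpattern/toselect are not supplied."""
    chosen_pattern = fpattern or DEFAULT_FPATTERN
    chosen_fields = list(toselect) if toselect is not None else list(DEFAULT_TOSELECT)
    return chosen_pattern, chosen_fields


def select_attributes(global_attrs: dict[str, str], toselect: list[str]) -> dict[str, str]:
    """Keep the requested attributes that the file actually defines.

    An atmosphere-only run may have no `realm` attribute, so missing keys
    are skipped rather than raising an error.
    """
    return {key: global_attrs[key] for key in toselect if key in global_attrs}


fpattern, toselect = resolve_builder_options()
print(select_attributes({"frequency": "1hr", "source_id": "AUS2200"}, toselect))
```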

And I forgot to say that the data I used as an example is being published, but it will then be in a different project, "bs94". Hopefully it will be available before I finish work in two weeks, and I can give you the right path. I will open another ticket.

Projects
Status: In Progress

5 participants