-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DRS path templates configurable per rootpath #1894
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1894 +/- ##
=======================================
Coverage 94.51% 94.52%
=======================================
Files 246 246
Lines 14024 14036 +12
=======================================
+ Hits 13255 13267 +12
Misses 769 769 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to say awesome, this is very cool! and it is, but I am starting to quiver at every addition to the configuration. This one in particular is cumbersome, and as awesome its functionality, as bad as the way it's done in the configuration; here "bad" in the sense of user friendliness, nothing to do with how it's implemented. A few points:
- I would add a check if the user actually inputs
drs: ...
and that differs from the one in/path/: drs
- I know you are mapping path to drs but the name ESGF to which the path is mapped to is confusing especially since the examples on DKRZ are not related to ESGF but, rather, to some bundled data in one rando directory
- overall, this is testing the limit of how more complicated can we make things for users, I am not at all at ease with this TBF, and I know I'm not helping by moaning but - there's gotta be an easier route. Just imagine you are a first time user of ESMValTool - how would you feel seeing such a configuration file? And not you, as a talented software engineer, but you as a PhD student just starting studying climate modelling
~/climate_data: ESGF | ||
|
||
Note that this notation combines the ``rootpath`` and ``drs`` settings, so it | ||
is not necessary to specify the directory structure in under ``drs`` in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not necessary to specify the directory structure in under ``drs`` in this | |
is not necessary to specify the directory structure under ``drs`` in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if they do, and either that (drs: ...
) or this one is wrong, or, if they differ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well OK, but one needs to toss a warning at least, say eg "You specified drs: DRS but the other one is used... so that one gets ignored etc"
The alternative would be to completely redesign how we configure finding data. There are some good ideas out there already, see e.g. #795 (comment), the issues linked therein, and #1165. However, none of these options would be backward compatible. That's why I opted for keeping the existing functionality and extending it with yet another option. |
yes! Redesign is the key here - at the end of the day, the user will still get their data, but going through much less a pain how to configure their environment. If we go on like this, even us, the devs, will start getting lost in configuration - I, for one, am starting to already, and only have about 6 years experience with the Tool 😁 |
# CORDEX: /work/ik1017/C3SCORDEX/data/c3s-cordex/output | ||
# CMIP6: | ||
# /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ: DKRZ | ||
# /work/bd0854/DATA/ESMValTool2/download: ESGF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's try work with this - leaving aside my rather negative comments above for now, can we at least make this a reverse key-val instance, so maybe it's a bit more user-firndly? eg:
CMIP6:
- DKRZ: /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ
- ESGF: /work/bd0854/DATA/ESMValTool2/download
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it more user-friendly to reverse the key and values? Do you mean to use a list or a dict? This example doesn't look like valid yaml.
Hi @bouweandela, sorry for being late in reviewing this. I will take a look once the comment's from @valeriupredoi are addressed, would that be ok? Please let us know what is needed to bring this PR to the finish line 🍺 |
How about adding a new item to config-user.yml that can be used to directly construct data_sources:
- type: esmvalcore.local.DataSource
project: CMIP6
rootpath: /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ
dirname_template: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
filename_template: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc' the This has some advantages:
There's also a few disadvantages:
@ESMValGroup/technical-lead-development-team Please comment if you have an opinion. |
I personally like this new feature very much (the way it is implemented right now)! Specifying DRS per rootpath will make my daily work much easier (I don't know how much time I wasted in the past with creating nested directories when I wanted to use data from two different sources). So I am definitely in favor of having this in v2.8. About the key-value order: since rootpaths are unique and DRS are not (each rootpath has a single DRS, but each DRS can be used for multiple rootpaths) the order that @bouweandela proposed (e.g., About the customized data sources: I also like this idea very much! Being able to define custom file and dir templates quickly in the config-user file without messing with a custom config-developer file is great! However, I agree that this might be more intimidating for new users. Maybe we could add some defaults in the config-developer file and the user has the possibility to add custom entries in the config-user file? This could mean that the config-developer and config-user files contain identical options with the config-developer as machine/default configuration, and the config-user as the user-defined configuration. The current distinction of these two files feels rather arbitrary to me. I think this has been proposed in the past, but I cannot find the corresponding issue. Maybe something for a v3.0? TLDR; I am in favor of merging this as is, and we should think about allowing custom data sources in a future version. |
So here's the thing - just like Eisenhower once said, I like all this, as long as it's not in the config-user file 😁 Are you guys not realizing that we're slowly beginning to become a tool with such a complicated configuration that one needs less time to learn to fly a Boeing 737 than to learn to correctly configure ESMValTool? I'll bark more on the call today 🍺 |
To be honest, I think the current version of specifying DRS+rootpath in two different places ( |
this is not decommissioning the other config option, it adds to it it as an extra option. At any rate, let's talk on the call, we can add here and in #1609 an executive summary of our discussion on the call, later on |
The @ESMValGroup/technical-lead-development-team decided to postpone this feature to the next release. |
Thanks @bouweandela for all your work on this already and agreeing to shift this to the next release. Let's make sure that we have a release planning meeting soon after v2.8 is released to discuss this PR. Or even a specific meeting to discuss how we could revamp the configuration of ESMValTool in order to increase the user-friendliness of the Tool |
I'd really like to revive this discussion here. The changes proposed in this PR, i.e., being able to do something like this download_dir: /work/bd0854/DATA/ESMValTool2/download
rootpath:
CMIP6:
/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ: DKRZ
/work/bd0854/DATA/ESMValTool2/download: ESGF would be really helpful, especially when dealing with native model output. I really hope we can include this in v2.9. |
Since there has been no movement on this and we'll have the discussion about how to configure the tool in the future at the SMHI workshop ESMValGroup/Community#98 planned right after the v2.9 release, I'm bumping this to the next release. |
At the workshop it was agreed that a new configuration file format is acceptable and I'll write a proposal. Dropping this from the v2.10 milestone because it seems unlikely that we can agree on the new format before that time. |
It's been 8 months since the workshop, and there was no progress regarding the new configuration. Is there any way we can merge this feature as it is proposed in this PR? I know that it would be much cleaner to refactor our configuration, but this just doesn't seem realistic at this point. On the other hand, this issue has been open since 2018 (!), and many people would benefit a lot from this. |
Sorry for being slow with the writeup for the new configuration, I'll try to get it done next week. About merging this as-is, maybe that would be something to discuss at the next @ESMValGroup/technical-lead-development-team meeting? |
Good idea, I just added it to the agenda. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judas has re-converted himself 😁
don't blame me for holding this up for too long and now we have a bazillion conflicts 😁 |
@bouweandela I would really like to have this in v2.12. Would it be fine for you if I solve the merge conflicts and finalize this PR? |
Yes, that would be great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this for a number of different cases and it works fine but only as long as you do not remove the DRS entry completely. If you do so while not explicitly specifying a DRS format (e.g. when reusing lines from an old config file) the tool shows some unexpected behavior. For example obs4mips is then expected to follow the ESGF DRS instead of 'default' as it was the case for old configuration files.
Maybe this behavior could be changed so the DRS section can be removed safely?
Thanks for testing @axel-lauer! 🚀 The reason for this behavior is the default entry for I agree that this behavior is confusing, but this was not introduced with this PR. With the current main branch you get the same result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks everyone, looks great! 🚀
Description
Add the possibility to specify the names of the directory and filename templates in config-user.yml per path instead of per project.
Example
Closes #129
Link to documentation: https://esmvaltool--1894.org.readthedocs.build/projects/ESMValCore/en/1894/quickstart/find_data.html#explaining-config-user-rootpath
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: