Extend ACCESS-OM3 filename regex #178
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
+ Coverage   96.98%   97.33%   +0.35%
==========================================
  Files           9        9
  Lines         631      902     +271
==========================================
+ Hits          612      878     +266
- Misses         19       24       +5

☔ View full report in Codecov by Sentry.
I think this won't work if the filename pattern changes partway through an experiment, e.g. if half the results use ...1900-XX-XX.nc and the other half use 1900-01-daily.nc?
tests/test_source_utils.py
Outdated
),
),
(
"access-om3.cice.h.1900-01-daily",
We should also add "access-om3.cice.h.daily.1900-01"?
I'm not sure what you mean by "won't work" here. They would get distinct |
I think we're now heading (back) towards using CMIP6-like vocab for the frequency E.g. So let's put a pin in the PR until the naming is resolved. |
Okay, I think we've finalised a format - see here. I'll update this PR.
My concern is that the archive script which turns "access-om3.cice.h.1900-01-01" into "access-om3.cice.h.1900-01-daily" might fail, and then get fixed. Meaning the catalog ends up with two fileids when ideally we want them to be considered the same dataset. |
Right, this sort of issue can't really be handled with the way dataset grouping is done at the moment. The fix for this currently would be to go back and concatenate the files that were missed. I think in your specific example that's actually the best solution anyway. But we will be revisiting how the grouping is done soon to make it more robust. |
I think with |
@marc-white I got halfway through this and then got distracted. The issue is that the regex description for ACCESS-OM3 is overly prescriptive and doesn't work with, for example, annual files. Would you be able to take this over? (It may be easier to just close this and start again, given your recent changes) This comment contains a summary of what ACCESS-OM3 output should look like in the not-too-distant future. @anton-seaice can also help answer any questions. |
Yep, happy to take over. |
Hi @marc-white, I wanted to check in regarding the status of this PR. I am currently having issues with the regex description for ACCESS-OM3, as it doesn’t seem to work with annual files. |
@minghangli-uni this got buried under other things, but I'll take a look now. |
So is this where the filename formats got to? The original issue is pretty convoluted... COSIMA/access-om3#190 (comment) |
@minghangli-uni could you give me some sample filenames that you're trying to work with? |
Sure, you can find some in /scratch/tm70/ml0072/access-om3/archive/longerexpt5_rr_mean_snap3-perturb-5873b4db/output000 |
@minghangli-uni I'm a bit confused, which are the annual files in there you're having issues with? I can only see what appear to be monthly files, with a smattering of others... |
I think the problem I am having is related to the annual filenames. See here:
PATTERNS = [
    rf"[^\.]*\.{PATTERNS_HELPERS['om3_components']}\..*({PATTERNS_HELPERS['ymds']}|{PATTERNS_HELPERS['ymd']}|{PATTERNS_HELPERS['ym']})$",  # ACCESS-OM3
]
It doesn't currently support something like ['y']?
@minghangli-uni ah yes, I see now. @dougiesquire has already made that update in this pull request, although I can see a couple of other things missing when I compare to your sample directory that I'll need to update. |
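To make the gap concrete, here is a minimal sketch of why a year-only filename falls through the pattern quoted above, and how adding a 'y' alternative changes that. The contents of PATTERNS_HELPERS are not shown in this thread, so the helper values below (including the component names other than cice), the build_pattern function, and the annual filename are all assumptions made for illustration only.

```python
import re

# Hypothetical stand-ins: the real PATTERNS_HELPERS values are not shown in
# this thread, and only "cice" is known from the filenames quoted here.
PATTERNS_HELPERS = {
    "om3_components": r"(?:cice|mom6|ww3)",
    "ymds": r"\d{4}[_-]\d{2}[_-]\d{2}[_-]\d{5}",
    "ymd": r"\d{4}[_-]\d{2}[_-]\d{2}",
    "ym": r"\d{4}[_-]\d{2}",
    "y": r"\d{4}",
}


def build_pattern(date_keys):
    """Assemble a pattern of the same shape as the one quoted in the thread."""
    dates = "|".join(PATTERNS_HELPERS[key] for key in date_keys)
    return rf"[^\.]*\.{PATTERNS_HELPERS['om3_components']}\..*({dates})$"


current = build_pattern(["ymds", "ymd", "ym"])
extended = build_pattern(["ymds", "ymd", "ym", "y"])

monthly = "access-om3.cice.h.1900-01"
annual = "access-om3.cice.h.1900"  # hypothetical annual filename

print(re.match(current, monthly) is not None)   # monthly file is recognised
print(re.match(current, annual) is not None)    # annual file is not
print(re.match(extended, annual) is not None)   # recognised once 'y' is added
```

With these assumed helpers, the annual name only matches once a year-only alternative is present in the date group.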
…ncludes *partial* cherry-picked test updates - more to follow, including test data updates)
@minghangli-uni @dougiesquire @anton-seaice is the |
All of our current configs use the names 'access-om3' but I wouldn't rely on that too much. It would be nice to be able to change that name without having to change any code in the catalog builders. |
@anton-seaice it's more that the current regex updates for ACCESS-OM3 have broken one of the |
Which pattern isn't working? It's probably fine to dispose of it, but it's also nice if users can modify filenames if they want without breaking things.
…asons (see #178 discussion)
I managed to work out the issue with the regular expressions, and it seems to now be in a state where it works with both the new During this, I ran into an interesting regex conundrum that sucked up a lot of my day, which I wondered if anyone had seen before... The current
PATTERNS = [
    rf"[^\.]*\.{PATTERNS_HELPERS['om3_components']}\..*({PATTERNS_HELPERS['ymds']}|{PATTERNS_HELPERS['ymd']}|{PATTERNS_HELPERS['ym']})$",  # ACCESS-OM3
]
(look at the group in parentheses that references the The re documentation says that However, if I added a Y-only pattern to the end of the
It's almost like the
Any thoughts from anyone (but particularly @dougiesquire )? |
I agree that's odd. Unfortunately nothing springs to mind sorry. I'm definitely not a regex expert. |
That's pretty odd. I guess you could test it by changing the order of patterns in |
Yep, tried that. It always seemed to lock on to the "Y" pattern, regardless of where I put it. |
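One possible explanation, which is not confirmed anywhere in this thread, is the greedy `.*` that precedes the date group. In Python's `re`, a greedy `.*` first consumes as much as it can and then backtracks one character at a time, so the group ends up starting at the rightmost position from which any alternative reaches `$`. The order of alternatives is only consulted once that starting position is fixed. The sketch below uses hypothetical helper patterns and a hypothetical filename with a five-digit seconds field to show a four-digit year pattern capturing the tail of the seconds, regardless of where it sits in the alternation.

```python
import re

# Hypothetical date helpers; the real PATTERNS_HELPERS values are not shown
# in this thread.
YMDS = r"\d{4}[_-]\d{2}[_-]\d{2}[_-]\d{5}"
YMD = r"\d{4}[_-]\d{2}[_-]\d{2}"
YM = r"\d{4}[_-]\d{2}"
Y = r"\d{4}"

PREFIX = r"[^\.]*\.(?:cice|mom6|ww3)\."  # assumed component names

# Same shape as the quoted pattern, with a year-only alternative added last.
greedy = rf"{PREFIX}.*({YMDS}|{YMD}|{YM}|{Y})$"
# Year-only alternative moved to the front instead.
greedy_reordered = rf"{PREFIX}.*({Y}|{YM}|{YMD}|{YMDS})$"
# Lazy quantifier: the group starts at the leftmost workable position.
lazy = rf"{PREFIX}.*?({YMDS}|{YMD}|{YM}|{Y})$"

fname = "access-om3.cice.h.1900-01-01-00000"  # hypothetical filename

for label, pattern in [
    ("greedy", greedy),
    ("greedy_reordered", greedy_reordered),
    ("lazy", lazy),
]:
    print(label, re.match(pattern, fname).group(1))
```

Under these assumptions, both greedy variants capture `0000` (the last four digits of the seconds field), while the lazy variant captures the full `1900-01-01-00000`. Whether this is what happened with the real helper patterns would need checking against their actual definitions.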
Closes #176. Decided just to add a quick fix for this. However, this whole process needs to be revisited soon as it clearly isn't scaling to the range of ACCESS outputs.