Merging attributes of different cubes before concatenation/merging #5050

schlunma · 2022-11-04T14:39:18Z

✨ Feature Request

Cube concatenation and merging fails if the cubes contain different attributes. A possible way out of this is iris.util.equalise_attributes, which simply deletes attributes that are not identical across cubes.

An alternative approach could be to "merge" attributes that differ, e.g., combine them in a single string with a given delimiter, e.g, " ".

Example

Merge the attributes exp: historical and exp: ssp585 to exp: historical ssp585.

I can image three possible locations for code like this:

A separate function in the util module, e.g, iris.util.merge_attributes.
As part of the concatenate/merge code.
As part of iris.common.resolve.

Is this something that might be interesting for Iris? If yes, I could try to tackle this.

Motivation

Using the existing function iris.util.equalise_attributes is not an optimal solution if one wants to keep attributes.

Additional context

A possible implementation is given in the ESMValTool repository here.

The text was updated successfully, but these errors were encountered:

pp-mo · 2022-11-09T11:37:25Z

From offline "peloton" discussions ...

@bjlittle
we could introduce a callback that is specific to merge or concatenate, which allows users to have some fine grained controls in the process

@pp-mo
I don't see the motivation for this... Regarding Merging attributes of different cubes before concatenation/merging #5050, I don't think a callback is a particularly sensible approach, since it's not about controlling merge/concat, just the handling of attributes, which AFAICT can happily be done in advance ?
So, I think we really need callbacks in loading, because there are no data objects before the load which you can inspect or modify, but we need to specify changes that affect the loading process itself. Whereas, for merge/concat, we can just pre-adjust the cubes going in.
Or am I missing something here ?

@bjlittle
The difference is that the callback on load only sees one cube. There is no sense of controlling how metadata is combined (no matter how odd what the user wants to do)... I was just floating the idea of opening up the box of what merge/concatenate does to allow for user specialisation

@bjlittle
(merge and concatenate are impenetrable black boxes of magic, as they stand)

@bsherratt
I could imagine a callback to merge more along the lines of
cubes.merge(some_attr=lambda val1, val2: f"{val1} {val2}")
ie just telling it exactly what you want for specific attributes
@bsherratt
and if there are still unresolvable differences then it would error/not merge as usual

@bjlittle
yup, something along those lines

@pp-mo re: "callback on load only sees one cube..."
Sorry, I was rather forgetting that the principal use of merge is on loading ...

bjlittle · 2022-11-09T11:40:41Z

@schlunma Thanks for raising this issue 👍

In general, merge and concatenate are black boxes of magic that the user can't influence. I've often wanted to relax that contract in a flexible and meaningful way so that the user can control some of the decisions that are currently hardwired.

Just to get people thinking about this in a general way... for example, what if we were to expose a callback-like capability to merge and concatenate which would allow the user to control certain behaviours, such has how pain-points like Cube.attributes are combined?

Clearly, needs a lot of thought, but to me, without thinking about this too deeply, seems attractive in principle.

The mechanics of how that's implemented is a separate concern, but at the moment I'm more interested to know whether such a capability would have genuine user benefit.

Just a thought...

pp-mo · 2022-11-09T11:53:56Z

Re: @bjlittle "The difference is that the callback on load only sees one cube"
and @bsherratt cubes.merge(some_attr=lambda val1, val2: f"{val1} {val2}")

That's nice, but the proposed form still only deals with cubes 2 at a time.
I'm not too sure what it looks like if we generalise that, but I guess we can provide a
"function : *cube_metadata --> result_metadata"
?

I certainly do agree that it makes sense to apply an operation to a specific attribute, as suggested -- ? or a set of them, somehow?. But anyway control it somehow : Applying it regardless to all attributes will be liable to mess up anything with controlled meaning (i.e. not CF terms, which Iris excludes from user attributes, but terms with meaning according to "some other conventions", like for a data repository)

One thing that also bothers me is that with multiple cubes (>=3), it's not too clear whether for general purposes we would always want to form a list of content options, or just make an entry for each distinct value (i.e. a "set" approach),
e.g. cube::calc_method = "standard; alt1; altx" or cube::calc_method = "alt1; alt1; alt1; standard; alt1; altx" .
NOTE: the linked implementation in ESMValTool is using the "set" approach.
This is obviously about specifics of this particular operation. But it may illuminate how a general facility needs to be designed.

schlunma · 2022-11-10T15:37:03Z

Thanks all for having a look at this!

3 points from our (ESMValTool) perspective:

From our perspective a simple function iris.util.merge_attributes would be enough, but of course a callback would also be great! Though I guess to keep it as general as possible the call signature should be something like callback(cubes) -> cubes.
We are actually using concatenate directly, not within a load call. Thus, for us, a callback within concatenate would be totally fine, but I guess it would also be a good idea to allow setting this callback in the load functions.
Regarding the set vs. list approach - we just opted to the set approach out of convenience, but would also be 100% happy with the list approach as long as all attributes for all cubes are equal.

larsbarring · 2022-11-10T16:14:38Z

I had -- and still have -- a use case very similar to the what @schlunma describes (and that is why equalize_attributes now at least return the deleted attributes instead of just deleting them). But the question is in what order concatenate and merge does their magic on the set of individual, see here for a use case example.

github-actions · 2024-12-01T00:23:27Z

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

pp-mo · 2024-12-02T10:01:12Z

Re-raising this is timely, as we're hoping to address some related questions under the banner of #5375, when we concentrate on some 'dragon tasks' over the next few weeks.

However I see that the needs here go beyond the use of 'equalise_attributes' or similar, in that it seems you may need an operation which will focus on some specific identified attributes.
I think that would probably still fit into the plans to provide a more general-purpose configurable "equalise" operations, as described here (and related comments)

What is the status of the outstanding need here @schlunma @larsbarring ?
Can you see whether/how it might fit into the proposed scheme ?

larsbarring · 2024-12-02T14:23:18Z

I believe our needs have been solved by several "pre-processor" tools that harmonise the data before going into the main analysis tools. A key part of the pre-processors is the take care of the output from the iris.util.equalise_attributes as part of the general handling of global attributes. For our purposes, where the data cube is spread across several time-slice files, the order of the individual files' attribute is straightforward in relation to the concatenation order. This is not necessarily the true in more general 2-d or even n-d cases.

But with respect coordinate attributes, that often contain floating point data, I still think that something else than exact equality is required.

ping @nCarolina

schlunma · 2024-12-03T10:59:35Z

We have a custom solution in ESMValCore that basically does what I described in this PR description. Since we are quite pleased with this, I am happy to close this issue here in favor of #5375. If there was a similar feature within Iris, we would use it, but this is really not critical for us at all.

schlunma added Feature: Merge/Concatenate Feature: ESMValTool labels Nov 4, 2022

schlunma mentioned this issue Nov 4, 2022

Made equalized attributes in concatenated cubes consistent across runs ESMValGroup/ESMValCore#1783

Merged

10 tasks

ESadek-MO assigned pp-mo Nov 9, 2022

trexfeathers moved this to 🆕 New in ESMValTool Feb 20, 2023

trexfeathers added this to ESMValTool Feb 20, 2023

trexfeathers added this to 🚴 Peloton Jun 23, 2023

trexfeathers unassigned pp-mo Jul 19, 2023

scitools-ci bot removed this from 🚴 Peloton Dec 15, 2023

scitools-ci bot added this to 🚴 Peloton Dec 15, 2023

trexfeathers removed the status in ESMValTool Feb 29, 2024

github-actions bot added the Stale A stale issue/pull-request label Dec 1, 2024

github-actions bot removed the Stale A stale issue/pull-request label Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merging attributes of different cubes before concatenation/merging #5050

Merging attributes of different cubes before concatenation/merging #5050

schlunma commented Nov 4, 2022

pp-mo commented Nov 9, 2022 •

edited

Loading

bjlittle commented Nov 9, 2022 •

edited

Loading

pp-mo commented Nov 9, 2022

schlunma commented Nov 10, 2022 •

edited

Loading

larsbarring commented Nov 10, 2022

github-actions bot commented Dec 1, 2024

pp-mo commented Dec 2, 2024 •

edited

Loading

larsbarring commented Dec 2, 2024 •

edited

Loading

schlunma commented Dec 3, 2024

Merging attributes of different cubes before concatenation/merging #5050

Merging attributes of different cubes before concatenation/merging #5050

Comments

schlunma commented Nov 4, 2022

✨ Feature Request

Example

Motivation

Additional context

pp-mo commented Nov 9, 2022 • edited Loading

bjlittle commented Nov 9, 2022 • edited Loading

pp-mo commented Nov 9, 2022

schlunma commented Nov 10, 2022 • edited Loading

larsbarring commented Nov 10, 2022

github-actions bot commented Dec 1, 2024

pp-mo commented Dec 2, 2024 • edited Loading

larsbarring commented Dec 2, 2024 • edited Loading

schlunma commented Dec 3, 2024

pp-mo commented Nov 9, 2022 •

edited

Loading

bjlittle commented Nov 9, 2022 •

edited

Loading

schlunma commented Nov 10, 2022 •

edited

Loading

pp-mo commented Dec 2, 2024 •

edited

Loading

larsbarring commented Dec 2, 2024 •

edited

Loading