mutate superseding transmute should allow ordering columns #6861

Open
epruesse opened this issue May 30, 2023 · 9 comments
Labels
columns ↔️ (Operations on columns: mutate(), select(), rename(), relocate()), feature (a feature request or enhancement)

Comments

@epruesse

I recently noticed that transmute() has been marked as superseded by mutate(.keep = "none"). However, it turns out that mutate() doesn't replicate transmute()'s column ordering; instead it does something odd:

> data.frame(a=1, b=2) %>% transmute(a, x=b*2, b)
  a x b
1 1 4 2
> data.frame(a=1, b=2) %>% mutate(a, x=b*2, b, .keep="none")
  a b x
1 1 2 4

With more complex examples, the ordering becomes pretty confusing and difficult to explain. I'm guessing this has to do with the .keep = "used" use case re-sorting things. For .keep = "none", keeping the column order as given (replicating or approximating transmute() behavior, e.g. by order of first or last LHS mention) would be much more useful.
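Until the ordering question is settled, one workaround is to restate the intended order explicitly after mutate(.keep = "none"). A minimal sketch, assuming dplyr >= 1.1.0 and the two-column data frame from the example above; the select() call simply repeats the desired order by hand:

```r
library(dplyr)

df <- data.frame(a = 1, b = 2)

# transmute() keeps columns in the order the expressions are written.
via_transmute <- df %>% transmute(a, x = b * 2, b)

# mutate(.keep = "none") keeps existing columns (a, b) in input order and
# appends the new column (x); select() restores the written order by hand.
via_mutate <- df %>%
  mutate(a, x = b * 2, b, .keep = "none") %>%
  select(a, x, b)

names(via_transmute)
names(via_mutate)
```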

@mgirlich

mgirlich commented Jun 6, 2023

This is one of the reasons why I still use transmute() over mutate(.keep = "none").

@hadley hadley added the bug (an unexpected problem or unintended behavior) and columns ↔️ (Operations on columns: mutate(), select(), rename(), relocate()) labels Jun 28, 2023
@DavisVaughan
Member

DavisVaughan commented Jul 17, 2023

Two very related PRs:

Extremely important paragraph:

The dev behavior of .keep = "none" is overall more consistent with the rest of the mutate() options, makes it easier to predict the output when combined with .before and .after, and simplifies the implementation because it means that .keep never affects the column ordering; it is mainly about which columns get dropped (#6035 goes into this in great detail).

So we should be extremely careful when considering if we want to make any changes here. #6035's big insight is that .keep should not affect the column ordering at all, and I don't think we should go back to that.

An important invariant that falls out of this is that .keep plays no role in the column ordering, and I think that is valuable. I think giving .keep = "none" special behavior in a few places that changed the column order is what made this so hard to get correct before.

I spent a lot of time thinking about those two PRs, and I still think the current implementation is theoretically solid, so I see this less as a bug than as a request to bring a separate idea from transmute() over into mutate().

@DavisVaughan DavisVaughan removed the bug (an unexpected problem or unintended behavior) label Jul 17, 2023
@DavisVaughan
Member

DavisVaughan commented Nov 3, 2023

I've refreshed myself on the logic in #6035 (comment), and I am confident that the current implementation of mutate(.keep = ) with its current 4 variants is correct.

A key principle is that only 1 argument should be able to affect the output ordering. As of right now, .keep does not affect the output ordering in any way. Only .before and .after affect the output ordering, and even they are mutually exclusive because anything else is ambiguous.
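A small sketch of that division of labour, assuming dplyr >= 1.1.0: .keep = "none" only decides which columns survive, while .before decides where the newly created column x lands. Per the mutate() documentation, .before and .after control where new columns appear, so they move only columns that did not exist in the input.

```r
library(dplyr)

df <- data.frame(a = 1, b = 2)

# .keep = "none" retains only the columns named in `...` (a, b, x);
# .before = b places the new column x in front of the existing column b.
out <- df %>% mutate(a, b, x = b * 2, .before = b, .keep = "none")

names(out)
```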

So, the only thing I think I can offer is a 5th variant of .keep, let's call it "transmute" for now for lack of a better name. If .keep = "transmute" is set, then .keep would work like "none" but would also now affect the output ordering, meaning that .before and .after would be disallowed in this one case (again, only 1 thing should be able to affect the output ordering).
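The proposed .keep = "transmute" does not exist in dplyr. The sketch below is a hypothetical user-level helper, mutate_in_order() (a name invented here), that approximates it by reordering the result of mutate(.keep = "none") to the order of first LHS mention. It assumes every argument is either named or a bare column name, so that rlang's auto-naming matches the column names mutate() produces.

```r
library(dplyr)
library(rlang)

# Hypothetical helper, not part of dplyr: approximates the proposed
# `.keep = "transmute"` by ordering columns by first LHS mention.
mutate_in_order <- function(.data, ...) {
  # Capture the arguments unevaluated and auto-name bare columns
  # (e.g. `a` becomes "a"), mirroring how mutate() names them.
  lhs <- names(enquos(..., .named = TRUE))
  out <- mutate(.data, ..., .keep = "none")
  # Grouping variables stay in front; unique() keeps the first mention of
  # a repeated name; any_of() skips columns that were removed with `= NULL`.
  select(out, any_of(unique(c(group_vars(.data), lhs))))
}

df <- data.frame(a = 1, b = 2)
names(mutate_in_order(df, a, x = b * 2, b))
```

Because .before/.after are not forwarded, this helper keeps the "only one thing controls ordering" rule by construction: here the argument order alone decides it.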

I feel like .keep = "transmute" is both a bad name and a great name. Bad because it isn't super descriptive on its own, but great because it evokes the legacy idea of "transmute". And also great because we already have "none", and this is basically "none" + transmute ordering, and I can't think of a different word for that idea.

@DavisVaughan DavisVaughan added the feature (a feature request or enhancement) label Nov 3, 2023
@mmuurr

mmuurr commented Nov 20, 2023

If .keep = "transmute" is simply going to replicate the previous transmute() behavior ... perhaps we can just un-supersede/deprecate transmute()?

As a second bit of reasoning, there's code clarity: I find (and I assume other readers of code are like me) that putting .keep = "none" at the bottom of an expression fundamentally makes reading data pipelines harder, and that seeing transmute() at the start of a step is a clear indicator that we'll be defining a 'new' table in this upcoming step.

I miss my friend transmute() :-)

@eutwt
Contributor

eutwt commented Dec 19, 2023

If reframe() had a .size argument similar to vctrs::vec_recycle_common(), I think reframe(..., .size = n()) would be transmute().
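For reference, reframe() (dplyr >= 1.1.0) already returns columns in the order the expressions are written; what it lacks is transmute()'s guarantee of one output row per input row. A sketch of where the two currently agree and differ (the .size argument itself is only a proposal and is not used here):

```r
library(dplyr)

df <- data.frame(a = 1:3, b = 4:6)

# Both return the columns in written order: a, x, b.
tm <- transmute(df, a, x = b * 2, b)
rf <- reframe(df, a, x = b * 2, b)

# They differ when no expression has length n(): transmute() recycles the
# constant to every row, while reframe() keeps the common size of the
# results (here 1).
nrow(transmute(df, x = 1))
nrow(reframe(df, x = 1))
```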

@wgrundlingh

How about adding another option, .order = c('original', 'update') (or whichever choice of words fits better here), with the default being 'original'? Maybe even c('default', 'new'). The default would maintain the order of the original frame (as is currently the case); the alternative would update it to the new order (the way transmute() does).

@orthospar

Just a quick comment to say that I second the idea of either adding an .order or a .keep option to preserve the ordering output as expected from transmute.

I am currently working on a script that converts the default data tables generated by specific hardware into human-readable tables suitable for publication in reports. As such, column order is important. I would have used transmute(), but for long-term stability I decided to go with mutate() and .keep = "none".

What is frustrating me is that columns that are carried across unchanged are staying in their original order, and any new columns (either renamed or calculated) are appended to the right in the order they are called. As a further example:

> old <- data.frame(var1 = 1:5, var2 = 6:10, var3 = 11:15, var4 = 16:20, var100 = 101:105)

> old %>% mutate(
  var1 = var1, 
  var2a = var2,
  var3 = var3, 
  var4 = var4,
  var5 = var4*2,
  .keep = "none"
)

Where I expect the order of "var1", "var2a", "var3", "var4", "var5", I instead get "var1", "var3", "var4", "var2a", "var5".
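Under the current rule (columns that exist in the input keep their input positions; new ones are appended), the renamed column var2a is the only one out of place here. A minimal workaround sketch, assuming dplyr >= 1.1.0, is to move it explicitly with relocate():

```r
library(dplyr)

old <- data.frame(var1 = 1:5, var2 = 6:10, var3 = 11:15, var4 = 16:20, var100 = 101:105)

new <- old %>%
  mutate(
    var1 = var1,
    var2a = var2,   # new name, so mutate() appends it after the existing columns
    var3 = var3,
    var4 = var4,
    var5 = var4 * 2,
    .keep = "none"
  ) %>%
  relocate(var2a, .after = var1)   # put it back where it was written

names(new)
```

Note that passing .after = var1 to mutate() itself would not help: it moves every new column (var2a and var5) after var1.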

@jxu

jxu commented Jun 28, 2024

Precisely. New columns are appended to the right, but since we explicitly name every column we want, we should be able to control the resulting column order exactly, as transmute() intuitively does.

@pitakakariki

I learned about transmute when I tried to use mutate semantics with select. That's how I usually think when I use transmute now - as a variant of select, rather than a variant of mutate.

If the goal is to reduce the number of verbs, an alternative could be to add mutate semantics to select and have that supersede transmute.
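select() can already rename and reorder; it just cannot compute. For the example earlier in this thread, a transmute()-like result can therefore be sketched by splitting the work: select() for renaming and ordering, then mutate() for derived columns, which it appends on the right. This assumes dplyr >= 1.1.0 and that the derived columns belong at the end.

```r
library(dplyr)

old <- data.frame(var1 = 1:5, var2 = 6:10, var3 = 11:15, var4 = 16:20, var100 = 101:105)

new <- old %>%
  # select() keeps, renames (var2 -> var2a) and orders existing columns;
  # var100 is dropped because it is not listed.
  select(var1, var2a = var2, var3, var4) %>%
  # mutate() appends the computed column on the right.
  mutate(var5 = var4 * 2)

names(new)
```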


10 participants