tar_group without dplyr dependency #395
Closed
robitalec started this conversation in Show and tell
Neat use case, thanks for sharing! Another workaround is to use |
Prework
Proposal
I'm not sure this is necessarily a "new feature", but I wasn't sure exactly where to file it.
I was just testing out the `targets::tar_group()` function and the related `tarchetypes::tar_group_by()`. If you're working with a data.table, the requirement to use `dplyr::group_by()` can be tricky: the data.table class is lost, and downstream targets can fail.
I adapted the example `_targets.R` created with `targets::tar_script()`, and it works fine, as expected.
The next step was to use the `group_by()` and `tar_group()` functions. Since the `summ` function expects a data.table, this (unsurprisingly) breaks.

So I had a look at the source for `tar_group()`. It looked similar to using data.table's `.GRP` special symbol to append a column holding a group index, in this case named `tar_group`. So I modified the function to do that instead (obviously we'd need to replace many of the assertions and checks, but this was just exploratory for now).
Here's that target updated with the quick function.
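A minimal sketch of such a helper, assuming data.table is installed. The name `tar_group_dt()` and its `cols` argument are hypothetical: this is not part of targets and not necessarily the exact function from this post. It only reproduces the core idea of writing a `.GRP` group index into a `tar_group` column instead of asking dplyr for group indices.

```r
library(data.table)

# Hypothetical data.table-aware stand-in for targets::tar_group().
# Instead of reading group indices from a dplyr grouped_df, it appends
# an integer "tar_group" column computed with data.table's .GRP.
tar_group_dt <- function(x, cols) {
  stopifnot(
    is.data.table(x),
    is.character(cols),
    all(cols %in% names(x))
  )
  # Work on a copy so the caller's table is not modified by reference
  out <- copy(x)
  # .GRP numbers the groups of `cols` 1, 2, ... in order of first appearance
  out[, tar_group := .GRP, by = cols]
  out[]
}

# Usage: one group per id/year combination
dt <- data.table(
  id = c("a", "a", "b", "a"),
  year = c(2019, 2020, 2019, 2019),
  value = 1:4
)
grouped <- tar_group_dt(dt, c("id", "year"))
```

The result is still a data.table, so downstream functions that expect one keep working; the rows of `dt` get group indices 1, 2, 3, 1.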
Finally, I realized that `tar_group` is really just a special column name for targets.
This is why I'm not sure this is really an issue describing a new feature; it may just be a neat workaround.
If we assign a `tar_group` column to the data ourselves and set `iteration = "group"`, we don't need a `tar_group()` or `dplyr::group_by()` call at all.

And that's it: a data.table-aware `tar_group` equivalent. I'm going to use it in my own target workflows. I'm not sure how it fits into targets, but I thought you might want to hear about it.

This small example doesn't really seem necessary given the power of data.table's `by`. But where we pass a grouped data.table to downstream targets that need chunks of a data.table for other kinds of processing (say, converting to a matrix), this kind of `tar_group` flexibility could be really useful. I ran into this when I was mapping over file paths but then wanted to run on arbitrary chunks by ID and year within those files, so I couldn't just set up a static map wrapper over, for example, a list of years.
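The column-assignment approach described above can be sketched as a `_targets.R` file, assuming targets and data.table are installed. `make_data()` and `summ()` are hypothetical stand-ins (the post's own `summ` is not shown here); the `grouped` target builds the `tar_group` column with `.GRP` and declares `iteration = "group"`, so `pattern = map(grouped)` should create one branch per id/year group, each receiving its slice of the data.table.

```r
# _targets.R (sketch)
library(targets)
library(data.table)

tar_option_set(packages = "data.table")

# Hypothetical input: two ids observed over two years
make_data <- function() {
  data.table(
    id = rep(c("a", "b"), each = 4),
    year = rep(c(2019, 2020), times = 4),
    value = seq_len(8)
  )
}

# Hypothetical stand-in for a summary function that requires a data.table
summ <- function(dt) {
  stopifnot(is.data.table(dt))
  dt[, .(mean_value = mean(value)), by = .(id, year)]
}

list(
  tar_target(raw, make_data()),
  # Assign the special tar_group column directly with .GRP;
  # copy() avoids modifying the upstream target's value by reference
  tar_target(
    grouped,
    copy(raw)[, tar_group := .GRP, by = .(id, year)][],
    iteration = "group"
  ),
  # One branch per value of tar_group
  tar_target(summaries, summ(grouped), pattern = map(grouped))
)
```

Running `tar_make()` on this file would build `summaries` from four branches, one per id/year combination, without any dplyr call.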
Thanks!