Replies: 2 comments
-
Hi @san-r,
intersect
union
setdiff
-
Thanks for the quick reply. Here are my observations:
intersect
We can use the inner-join method to get the result. All column names have to be listed with the -j parameter, which may be difficult when working with tables that have hundreds of columns. Also, I think "then unsparsify" is not needed here, since both tables have the same column names.
union
Using "uniq -a" on the listed CSV files does the job. Another way, which does the same thing in two steps, is:
setdiff
We can use the anti-join method to get the result. All column names have to be listed with the -j parameter, which may be difficult when working with tables that have hundreds of columns. Also, I think "then unsparsify" is not needed here, since both tables have the same column names.
So we know the workaround methods. Thanks.
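On the point about typing every column name after -j: one possible way around it is to build the field list from the CSV header. Below is a minimal Python sketch of that idea. The helper name `join_field_list` and the sample columns are made up for illustration, and the printed commands assume that `--np --ul` (suppress paired records, emit unpaired records from the left file given with `-f`) behaves as I understand it from the Miller `join` documentation, so please check them against your Miller version.

```python
import csv
import io

def join_field_list(csv_text):
    """Return the header fields of a CSV as a comma-separated string (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return ",".join(header)

# Made-up sample; in practice, read the first line of y.csv instead.
sample = "id,name,score\n1,ann,10\n2,bob,20\n"
fields = join_field_list(sample)

# Inner join on all columns (intersect-like).
print(f"mlr --csv join -j {fields} -f y.csv z.csv")
# Anti-join: rows of y.csv with no match in z.csv (setdiff-like), assuming --np --ul as described above.
print(f"mlr --csv join --np --ul -j {fields} -f y.csv z.csv")
```

The script only prints the commands, so they can be reviewed before being run.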
-
This is with reference to the set operations available in R, which are shown on the second page of https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf in the section "Combine Data Sets", sub-section "Set Operations":
Suppose we have two tables as follows:
y.csv
z.csv
Now the Set Operations:
intersect (Rows that appear in both y and z)
R command:
intersect(y, z)
Equivalent Miller command:
?
union (Rows that appear in either or both y and z)
R command:
union(y, z)
Equivalent Miller command:
?
setdiff (Rows that appear in y but not z)
R command:
setdiff(y, z)
Equivalent Miller command:
?
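To pin down the expected results, here is a minimal Python sketch of the three operations, treating each whole row as one set element. It is only a reference for the semantics, not Miller code, and the sample rows are made up to stand in for y.csv and z.csv.

```python
import csv
import io

def read_rows(csv_text):
    """Parse CSV text into a list of row tuples, skipping the header line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [tuple(r) for r in rows[1:]]

def dedupe(rows):
    """Drop repeated rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))

def intersect(y, z):
    """Rows that appear in both y and z."""
    zs = set(z)
    return dedupe(r for r in y if r in zs)

def union(y, z):
    """Rows that appear in either or both y and z."""
    return dedupe(y + z)

def setdiff(y, z):
    """Rows that appear in y but not z."""
    zs = set(z)
    return dedupe(r for r in y if r not in zs)

# Made-up sample data standing in for y.csv and z.csv.
y = read_rows("x1,x2\nA,1\nB,2\nC,3\n")
z = read_rows("x1,x2\nB,2\nC,3\nD,4\n")

print(intersect(y, z))
print(union(y, z))
print(setdiff(y, z))
```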
Can anyone please post the Miller equivalents of these operations?