I'm looking to do some very quick filtering of outliers in a number of different columns using different criteria. For example, I'd like to filter out all records with a z-score greater than 3 in one column. I can see how to do this as multiple chained steps: create a new column with the z-score, then filter against that column. Is it possible to do this without creating an intermediate column? Thanks.
Filter expressions can use window and aggregate functions, so try the following:

dt.filter(d => (d.x - op.mean(d.x)) / op.stdev(d.x) <= 3)

Just keep in mind that the mean and stdev values are computed by group, and so will be sensitive to any preceding groupby verb.
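To illustrate the point about grouping, here is a minimal plain-JavaScript sketch of the same z-score filter, written without Arquero so it runs on its own. The helper names (`mean`, `stdev`, `filterByZScore`) and the sample data are hypothetical, and the sketch assumes a sample standard deviation (dividing by n - 1); it is not Arquero's implementation, only a rough picture of why the result can change once rows are grouped.

```javascript
// Plain-JavaScript sketch of z-score filtering, optionally computed per group.
// All names here are hypothetical helpers, not part of Arquero.

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (divides by n - 1); assumed for this sketch.
function stdev(values) {
  const m = mean(values);
  const ss = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  return Math.sqrt(ss / (values.length - 1));
}

// Keep rows whose z-score in `column` is at most `threshold`.
// When `groupKey` is given, mean and stdev are computed within each group.
function filterByZScore(rows, column, threshold = 3, groupKey = null) {
  const keyOf = row => (groupKey === null ? '__all__' : row[groupKey]);

  // Collect the column values for each group.
  const groups = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row[column]);
  }

  // Compute the per-group statistics once.
  const stats = new Map();
  for (const [key, values] of groups) {
    stats.set(key, { mean: mean(values), stdev: stdev(values) });
  }

  return rows.filter(row => {
    const s = stats.get(keyOf(row));
    return (row[column] - s.mean) / s.stdev <= threshold;
  });
}

// Sample data: group "a" sits near 10 with one value of 50,
// group "b" sits near 1000.
const rows = [];
for (let i = 0; i < 19; i++) rows.push({ g: 'a', x: 10 });
rows.push({ g: 'a', x: 50 });
for (let i = 0; i < 20; i++) rows.push({ g: 'b', x: i % 2 === 0 ? 990 : 1010 });

const keptUngrouped = filterByZScore(rows, 'x');        // statistics over all rows
const keptGrouped = filterByZScore(rows, 'x', 3, 'g');  // statistics per group

console.log(keptUngrouped.length, keptGrouped.length);
```

With this data, the value 50 is unremarkable against the overall mean, so the ungrouped filter keeps every row, but it stands out within group "a" and is dropped once the statistics are computed per group. Note that, like the expression above, this only removes values more than 3 standard deviations above the mean; filtering both tails would need the absolute value of the z-score.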