How to Apply RAPIDS Separately on Train and Test Datasets? #232
-
Hi @AnasKhann22, thanks for using RAPIDS! You do not need to run RAPIDS separately on your train and test partitions, although you can. There are two options.

If you prefer to split your data into train and test sets before processing, you could install two separate copies of RAPIDS (one for train, one for test). Reuse the configuration file from your training copy to process your test set, changing only what must differ (e.g., the participants section). This ensures that both sets are processed with the same parameters.

Otherwise, you could process all of your data together in a single copy of RAPIDS and split the processed features into train and test sets afterward. In this case, however, consider applying any data cleaning steps separately to the train and test sets, especially steps that operate on features/columns, so that their parameters (e.g., thresholds or imputation values) are derived from the train set alone. This avoids data leakage. Let us know if you have any other questions.
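A minimal sketch of the second option, in Python with pandas, assuming the processed features sit in one table with a participant ID column. The column name `pid`, the feature names `f_steps`/`f_calls`, and the two cleaning rules (drop mostly-missing columns, impute train medians) are hypothetical and only for illustration; they are not RAPIDS' output schema or its built-in cleaning steps. The point it shows is the pattern: learn cleaning parameters on the train set, then reuse them unchanged on the test set.

```python
import pandas as pd


def split_by_participant(features: pd.DataFrame, test_pids: set, pid_col: str = "pid"):
    """Split a processed feature table into train and test sets by participant.

    All rows of a participant end up on the same side of the split.
    `pid_col` is a hypothetical column name; adjust it to your feature table.
    """
    is_test = features[pid_col].isin(test_pids)
    return features[~is_test].copy(), features[is_test].copy()


class TrainFittedCleaner:
    """Column-level cleaning whose parameters are learned from the train set only.

    The two rules used here (drop sparse columns, impute medians) are
    illustrative assumptions, not RAPIDS' built-in cleaning steps.
    """

    def __init__(self, max_missing_ratio: float = 0.3):
        self.max_missing_ratio = max_missing_ratio
        self.keep_cols_ = []
        self.medians_ = None

    def fit(self, train: pd.DataFrame, feature_cols: list) -> "TrainFittedCleaner":
        # Fraction of missing values per feature, measured on TRAIN rows only.
        missing = train[feature_cols].isna().mean()
        self.keep_cols_ = [c for c in feature_cols if missing[c] <= self.max_missing_ratio]
        # Imputation values also come from the train set; test rows never influence them.
        self.medians_ = train[self.keep_cols_].median()
        return self

    def transform(self, df: pd.DataFrame, id_cols: list) -> pd.DataFrame:
        # Apply the stored train-set decisions unchanged to any split.
        out = df[id_cols + self.keep_cols_].copy()
        out[self.keep_cols_] = out[self.keep_cols_].fillna(self.medians_)
        return out


# Hypothetical processed features; replace with the feature table RAPIDS produced.
features = pd.DataFrame({
    "pid": ["p01", "p01", "p02", "p02", "p03", "p03"],
    "f_steps": [100.0, None, 300.0, 500.0, 50.0, None],
    "f_calls": [None, None, None, 2.0, 1.0, 3.0],
})

train, test = split_by_participant(features, test_pids={"p03"})
cleaner = TrainFittedCleaner(max_missing_ratio=0.5).fit(train, ["f_steps", "f_calls"])
train_clean = cleaner.transform(train, id_cols=["pid"])
test_clean = cleaner.transform(test, id_cols=["pid"])
print(test_clean)
```

Here `f_calls` is dropped from both sets because it is mostly missing in the train set, even though it is complete in the test set, and the test set's missing `f_steps` value is filled with the train median (300.0). Splitting by participant is one common choice; a time-based split within each participant is another, depending on what you want to evaluate.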
-
I am using RAPIDS for feature extraction in my project and need guidance on how to apply transformations consistently across training and test datasets.
I start by splitting my raw sensor data into train and test sets and then apply preprocessing to the train set only. Do I need to run RAPIDS separately on the train and test sets, and how can I reuse the parameters learned from the train set when extracting RAPIDS features for the test set? Thanks!