How to Apply RAPIDS Separately on Train and Test Datasets? #232

AnasKhann22 · 2024-07-11T13:08:28Z

I am using RAPIDS for feature extraction in my project and need guidance on how to apply transformations consistently across training and test datasets.

I start by splitting my raw sensor data into train and test sets, and I apply preprocessing to the train set only. Do I need to run RAPIDS separately on the train and test sets? If so, how can I reuse the parameters from the train set when extracting features from the test set? Thanks!

jenniferfedor (Collaborator) · 2024-07-12T20:49:55Z

Hi @AnasKhann22, thanks for using RAPIDS! You do not necessarily need to run RAPIDS separately for your train and test partitions, but you certainly could.

If you would prefer to split your data into train and test sets before processing, you could install two separate copies of RAPIDS, one for train and one for test. To ensure that both sets are processed using a consistent set of parameters, reuse the configuration file from your training set to process your test set, first making any necessary changes (e.g., to the participants section).

Otherwise, you could process all of your data together in a single copy of RAPIDS and split those processed data into your train and test sets afterward. In this case, however, you may want to consider applying any desired data cleaning steps, especially those that apply to features/columns, separately to the train and test sets to avoid data leakage.
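As a rough illustration of that second option, here is a minimal Python sketch of the "fit on train, apply to test" pattern. It is not part of RAPIDS: the row layout, the `pid` key, the `feature_a` column, and all three helper functions are hypothetical stand-ins for however you load the processed feature files. The cleaning shown here (median imputation followed by standardization) is only an example of a column-wise step whose parameters should come from the train set alone.

```python
import random
import statistics


def split_by_participant(rows, test_fraction=0.25, seed=0):
    """Split rows so that each participant lands entirely in train or in test."""
    pids = sorted({row["pid"] for row in rows})
    random.Random(seed).shuffle(pids)
    n_test = max(1, round(len(pids) * test_fraction))
    test_pids = set(pids[:n_test])
    train = [row for row in rows if row["pid"] not in test_pids]
    test = [row for row in rows if row["pid"] in test_pids]
    return train, test


def fit_cleaning(train_rows, feature):
    """Learn imputation and scaling parameters from the train rows only."""
    observed = [row[feature] for row in train_rows if row[feature] is not None]
    median = statistics.median(observed)
    # Impute first, then compute mean/std on the imputed train values.
    imputed = [median if row[feature] is None else row[feature] for row in train_rows]
    mean = statistics.fmean(imputed)
    std = statistics.pstdev(imputed) or 1.0  # guard against a constant column
    return {"median": median, "mean": mean, "std": std}


def apply_cleaning(rows, feature, params):
    """Apply previously fitted parameters; nothing is re-estimated here."""
    cleaned = []
    for row in rows:
        value = params["median"] if row[feature] is None else row[feature]
        cleaned.append({**row, feature: (value - params["mean"]) / params["std"]})
    return cleaned


# Hypothetical processed features: one row per participant per day.
rows = [
    {"pid": "p01", "feature_a": 10.0},
    {"pid": "p01", "feature_a": None},
    {"pid": "p02", "feature_a": 14.0},
    {"pid": "p02", "feature_a": 16.0},
    {"pid": "p03", "feature_a": 9.0},
    {"pid": "p03", "feature_a": 11.0},
    {"pid": "p04", "feature_a": None},
    {"pid": "p04", "feature_a": 20.0},
]

train_rows, test_rows = split_by_participant(rows)
params = fit_cleaning(train_rows, "feature_a")            # train set only
train_clean = apply_cleaning(train_rows, "feature_a", params)
test_clean = apply_cleaning(test_rows, "feature_a", params)  # reuses train parameters
```

Splitting by participant rather than by row is an assumption of this sketch; it keeps one person's data from appearing on both sides of the split. If you use scikit-learn, the same idea is usually expressed by calling `fit` on the train set and `transform` on the test set.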

Let us know if you have any other questions.
