I noticed that README.md and train.php both make the same mistake: ZScaleStandardizer is applied BEFORE the train/test split rather than AFTER. This leaks information from the test set into training, because the means and standard deviations used for scaling are computed over samples that are later held out, and it calls into question all the metrics reported at the end.
The correct approach is to fit ZScaleStandardizer on the training set only, keep its fitted parameters, and apply that same transformation to the testing set before using the trained model to make predictions.
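To make the ordering concrete, here is a minimal, library-independent sketch of "split first, fit on train, reuse the parameters on test". It is written in plain Python and is not the Rubix ML API: the helper names (`fit_standardizer`, `apply_standardizer`, `split_then_standardize`), the 0.8 split ratio, and the fallback for constant columns are all assumptions made for illustration.

```python
import random
import statistics


def fit_standardizer(rows):
    """Compute per-column mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # to avoid division by zero (an assumption of this sketch).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Z-scale rows using previously fitted parameters."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def split_then_standardize(rows, train_ratio=0.8, seed=0):
    """Split first, then fit on the training rows only."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]

    # The parameters come from the training rows alone.
    means, stds = fit_standardizer(train)

    # The same parameters are reused for the held-out rows.
    train_scaled = apply_standardizer(train, means, stds)
    test_scaled = apply_standardizer(test, means, stds)
    return train_scaled, test_scaled, (means, stds)
```

The key point is that `fit_standardizer` never sees the held-out rows, so the test set's statistics cannot influence training or the scaling applied at evaluation time.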
As written, the examples could lead newer users who are learning machine learning into bad habits that they will later need to unlearn.