Replace spaces with underscores in column names also for the predict function #689

GoldenGoldy · 2024-08-02T16:27:49Z

PySR warns about spaces in column names when the data passed to the .fit function contains them. It replaces the spaces with underscores, prints a warning about the replacement, and fitting then proceeds as normal.
The .predict function, however, does not make the same replacement when it is called later.
So if we have a fitted model and pass data to .predict in the same format that we used for .fit, we run into the following issue:
The predict function (in sr.py) contains the line "X = X.reindex(columns=self.feature_names_in_)". When the column names contain spaces, this produces NaN values: reindex tries to match the column names (with spaces) against the model's feature names, in which the spaces were replaced by underscores, so no column matches.
We then get the somewhat confusing message "ValueError: Input X contains NaN.", which suggests that the data contains NaN values when it does not. They are introduced only by the reindex, which cannot match the column names.
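
The mismatch can be reproduced with pandas alone, without running PySR. The following is a minimal sketch: the column names and values are made up, and the list `feature_names_in_` is only a stand-in for the fitted model's attribute of the same name, built here by applying the space-to-underscore replacement described above.

```python
import pandas as pd

# Hypothetical input data whose column names contain spaces.
X = pd.DataFrame({"feature one": [1.0, 2.0], "feature two": [3.0, 4.0]})

# Stand-in for the fitted model's feature names: spaces were replaced
# with underscores during fit (assumption based on the description above).
feature_names_in_ = [name.replace(" ", "_") for name in X.columns]

# The same reindex call as the line quoted from sr.py.
X_reindexed = X.reindex(columns=feature_names_in_)

# None of the requested column names exist in X, so reindex creates
# them as new columns filled entirely with NaN.
print(X_reindexed)
has_nan = bool(X_reindexed.isna().all().all())
print("All values are NaN:", has_nan)
```

The original frame contains no NaN values at all; they only appear in the reindexed copy, which is what the later validation step then rejects.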

All of this can be avoided, of course, once you are aware of the problem and avoid spaces in the column names from the beginning. However, would it not be more consistent, and a better user experience, if the .predict function also replaced spaces in the column names with underscores?
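
Until .predict does this itself, one possible workaround is to apply the same renaming by hand before predicting. This is only a sketch: `underscore_columns` is a hypothetical helper, not part of PySR, and it assumes that replacing spaces is the only change .fit makes to the column names.

```python
import pandas as pd


def underscore_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with spaces in column names replaced by underscores."""
    return X.rename(columns=lambda name: str(name).replace(" ", "_"))


# Hypothetical prediction data in the same format that was passed to .fit.
X_new = pd.DataFrame({"feature one": [1.0], "feature two": [3.0]})
X_clean = underscore_columns(X_new)

print(list(X_clean.columns))
# With a fitted model, the call would then be: model.predict(X_clean)
```

Because rename returns a new DataFrame, the original data keeps its column names with spaces.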

MilesCranmer (Maintainer) · 2024-08-02T16:36:13Z

I think this is a bug. I have transferred it to #690.
