Design matrix X contains heterogeneous types #22

Open
GStechschulte opened this issue Nov 6, 2024 · 0 comments
Labels
question Further information is requested

GStechschulte commented Nov 6, 2024

In Rust, we are required to specify the element type of the ndarray passed by the Python user, e.g., PyReadonlyArray2<f64>, where f64 is the element type of the ndarray. What happens if the design matrix X contains features of different data types?

In the original PyMC-BART implementation, the Python user can pass the following types:

X : PyTensor Variable, Pandas/Polars DataFrame or numpy array
        The covariate matrix.

However, in the underlying Python code we convert X to a NumPy array and cast it to float, thereby changing the type of every column (feature) to float.

This has implications for the underlying Rust code. For example, instead of having to define enums for different split value thresholds, we know a priori that every split value threshold will be an f64 because of the type cast at the Python level.
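
To make the trade-off concrete, here is a minimal sketch of the two options. The names `SplitValue` and `goes_left_f64` are hypothetical and are not taken from the codebase; the comparison `x <= threshold` is also only an assumed convention.

```rust
/// Hypothetical threshold type that would be needed if X kept mixed dtypes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SplitValue {
    Float(f64),
    Int(i32),
}

impl SplitValue {
    /// Returns Some(true) when `x` belongs in the left child,
    /// or None when the threshold and the value have different types.
    fn goes_left(&self, x: SplitValue) -> Option<bool> {
        match (*self, x) {
            (SplitValue::Float(t), SplitValue::Float(v)) => Some(v <= t),
            (SplitValue::Int(t), SplitValue::Int(v)) => Some(v <= t),
            _ => None, // mismatched types have to be handled somewhere
        }
    }
}

/// With X cast to float on the Python side, a bare f64 is enough
/// and the mismatch case disappears.
fn goes_left_f64(threshold: f64, x: f64) -> bool {
    x <= threshold
}
```

The enum version forces every caller to deal with the mismatched-type case, which is the extra complexity the Python-level cast avoids.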

Furthermore, there are different SplitRules, such as ContinuousSplit and OneHotSplit, defined for different feature data types. If the type cast on X is performed at the Python level, then at the Rust level we will need to cast f64 to i32, apply the split rule, and then cast back to f64, which is not that big of a deal.
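
A rough sketch of what that could look like, assuming a hypothetical `SplitRule` trait with a single `goes_left` method (the actual trait in the codebase may be shaped differently) and assuming categorical features arrive as integer codes stored in f64:

```rust
/// Hypothetical trait; the real SplitRule API may differ.
trait SplitRule {
    /// Returns true when a sample with feature value `x` goes to the left child.
    fn goes_left(&self, x: f64, split_value: f64) -> bool;
}

struct ContinuousSplit;
struct OneHotSplit;

impl SplitRule for ContinuousSplit {
    fn goes_left(&self, x: f64, split_value: f64) -> bool {
        // Continuous features are compared directly as f64.
        x <= split_value
    }
}

impl SplitRule for OneHotSplit {
    fn goes_left(&self, x: f64, split_value: f64) -> bool {
        // Category codes arrive as f64; cast to i32 and test for equality.
        // `as` truncates toward zero, saturates at the i32 bounds,
        // and maps NaN to 0.
        (x as i32) == (split_value as i32)
    }
}
```

Casting an i32 back to f64 is lossless, so the round trip is safe as long as the category codes are whole numbers within the i32 range. Missing values encoded as NaN would need separate handling, since they would silently become category 0.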

GStechschulte added the question label (Further information is requested) on Nov 6, 2024