When the number of predictors is large, `model.matrix` quickly exhausts memory when the model is fit through the formula interface. For example, I get out-of-memory errors when fitting a model with 30k predictors, even on a machine with 100 GB of RAM.
A simple solution is to use the fastDummies R package to convert factor/character features to numeric dummy variables before fitting. This is much more memory efficient: I am now running the same model on a computer with 15 GB of RAM, and `gc()` reports that at most 2 GB of RAM was used.
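One way to take a peak-memory reading like the `gc()` figure above is to reset R's "max used" counters with `gc(reset = TRUE)`, run the step of interest, and then read the last column of the `gc()` table, which holds the "max used" values in Mb. This is a minimal sketch on `iris` only; `peak_mb()` is a hypothetical helper name, not part of base R or fastDummies, and summing the Ncells and Vcells rows gives a rough whole-session total rather than an exact per-step cost.

```r
suppressPackageStartupMessages(library(fastDummies))

# Hypothetical helper (not part of base R or fastDummies): rough peak memory
# of the whole R session, in Mb, reached while `expr` is evaluated. The value
# includes memory that was already in use before the call.
peak_mb <- function(expr) {
  invisible(gc(reset = TRUE))  # reset the "max used" counters
  force(expr)                  # evaluate the lazily passed expression now
  g <- gc()
  # The last column is "max used (Mb)" whether or not a "limit (Mb)" column
  # is present; add the Ncells and Vcells rows for a rough total.
  sum(g[, ncol(g)])
}

data(iris)
p <- peak_mb(as.matrix(fastDummies::dummy_columns(
  .data = iris,
  remove_first_dummy = TRUE,
  remove_selected_columns = TRUE
)))
print(p)
```

On a real 30k-predictor data set, wrapping the matrix construction (or the model fit) in the same helper gives the kind of "at most N GB" reading described above.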
Here's an example of how to use fastDummies to set up the x matrix:
suppressPackageStartupMessages(library(fastDummies))
data(iris)
x <- iris
head(x)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
x.matrix <- as.matrix(fastDummies::dummy_columns(
  .data = x,
  remove_first_dummy = TRUE,      # use K-1 dummy variables for a factor with K levels
  remove_selected_columns = TRUE  # drop the original factor columns (they are kept by default)
))
rownames(x.matrix) <- rownames(x)  # if patient IDs are stored as row names, re-add them here
head(x.matrix)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species_versicolor
#> 1          5.1         3.5          1.4         0.2                  0
#> 2          4.9         3.0          1.4         0.2                  0
#> 3          4.7         3.2          1.3         0.2                  0
#> 4          4.6         3.1          1.5         0.2                  0
#> 5          5.0         3.6          1.4         0.2                  0
#> 6          5.4         3.9          1.7         0.4                  0
#>   Species_virginica
#> 1                 0
#> 2                 0
#> 3                 0
#> 4                 0
#> 5                 0
#> 6                 0
Created on 2024-07-25 with reprex v2.0.2
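As a sanity check on data small enough for the formula interface, the fastDummies matrix should contain the same values as `model.matrix(~ ., data = ...)` once its intercept column is dropped, since both use K-1 indicator columns with the first level as the baseline. This is a minimal sketch assuming the default `options("contrasts")`, unordered factors only (ordered factors get polynomial contrasts in `model.matrix`), and no missing values; the object names `mm`, `fd`, and `same` are just for illustration.

```r
suppressPackageStartupMessages(library(fastDummies))

data(iris)

# Formula interface: treatment-coded design matrix; drop the intercept column.
mm <- model.matrix(~ ., data = iris)[, -1]

# fastDummies route, as in the example above.
fd <- as.matrix(fastDummies::dummy_columns(
  .data = iris,
  remove_first_dummy = TRUE,      # first level (setosa) is the baseline
  remove_selected_columns = TRUE  # drop the original Species column
))

# Column names differ ("Speciesversicolor" vs "Species_versicolor"),
# so compare values and dimensions only.
same <- isTRUE(all.equal(unname(mm), unname(fd)))
print(same)
```

Running a check like this on a small subset of the real data is a cheap way to confirm the dummy coding matches what the formula interface would have produced before switching the full model over.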