Plot - shapr #418
Comments
Hi. I agree this does not look good. I thought we had fixed things like this in #406, but maybe this is an edge case. Please provide a complete runnable example, and we will look into it.
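Hi @martinju! Thank you so much for this! I hope this is enough information; please let me know otherwise:
library(readr)  # read_csv()
library(dplyr)  # %>% and filter()
# Path to the attached file (assumed to be in the working directory)
synthetic_data <- "synthetic_data.csv"
Data_O <- read_csv(synthetic_data)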
# Remove rows with missing values
Data_O <- Data_O[complete.cases(Data_O),]
# Handle extremes of target
Data_O <- Data_O %>% filter(actief_in_inst_2022_SCH > 0.60)
Data_O$actief_in_inst_2022_SCH <- sqrt(Data_O$actief_in_inst_2022_SCH)
# Features
check <- as.data.frame(model.matrix(~., data = Data_O[, c(3, 32:36, 38, 55:68)]))
check[] <- lapply(check, as.numeric)
check <- as.matrix(check)
check <- check[, -1]
# Outcome variable
y <- as.numeric(Data_O$actief_in_inst_2022_SCH)
# Split dataset into training (70%) and test (30%) sets
samp <- sample(nrow(Data_O), 0.7 * nrow(Data_O))
Train1 <- check[samp, ]
Train1 <- as.data.frame(Train1)
Test1 <- check[-samp, ]
Test1 <- as.data.frame(Test1)
Y_train <- y[samp]
Y_test <- y[-samp]
# Train Random Forest model
rf.fit <- ranger::ranger(
  Y_train ~ .,
  data = Train1,
  mtry = 14,
  max.depth = 3,
  replace = FALSE,
  min.node.size = 40,
  sample.fraction = 0.8,
  respect.unordered.factors = "order",
  importance = "permutation"
)
# SHAPR
p <- mean(Y_train)
library(shapr)
explanation <- shapr::explain(
  rf.fit,
  Test1,
  Train1,
  approach = "gaussian",
  phi0 = p
)
library(ggplot2)
library(ggbeeswarm)
# Plot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(explanation, plot_type = "scatter")
  plot(explanation, plot_type = "beeswarm")
}
[synthetic_data.csv](https://github.com/user-attachments/files/17777841/synthetic_data.csv)
Thanks!
Hanneleer
Hi @martinju. I was wondering if you have had a chance to look into the issue. I’m still struggling to figure out what might be going wrong and whether there’s something I might be misunderstanding or doing incorrectly. I really appreciate any insights you can share whenever you have time!
Hi. Just confirming that I have started looking into this. I fixed the vertical issue by using corral for scaling in ggbeeswarm instead, only to then realize that the original issue might be the horizontal scaling. I am not sure what is going on there. I will look more into it tomorrow/Friday.
Hi again. #424 should fix the issue if you use plot(explanation, plot_type = "beeswarm", corral = "wrap"). Please confirm that it fixes the issue for you. You can now also further control the behavior of the beeswarm plot with the ... arguments passed to
Thanks a lot @martinju!! It works, really appreciate it!!
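For readers who want to see what the corral option does in isolation, here is a minimal sketch that calls ggbeeswarm directly on toy data. It assumes the corral argument mentioned above corresponds to the corral argument of ggbeeswarm::geom_beeswarm; the data frame toy, its columns, and the chosen corral.width and cex values are made up for illustration and are not taken from shapr or from the attached data.

```r
library(ggplot2)
library(ggbeeswarm)

# Toy stand-in for Shapley values (hypothetical data, not from the issue):
# many tightly clustered values per feature, which makes an uncorralled
# swarm spread far sideways.
set.seed(1)
toy <- data.frame(
  feature = rep(c("x1", "x2", "x3"), each = 500),
  phi = c(
    rnorm(500, mean = 0, sd = 0.01),
    rnorm(500, mean = 0.05, sd = 0.02),
    rnorm(500, mean = -0.03, sd = 0.005)
  )
)

# corral = "wrap" wraps points that would run outside the allotted band
# back into it; corral.width sets the width of that band.
p_wrap <- ggplot(toy, aes(x = feature, y = phi)) +
  geom_beeswarm(corral = "wrap", corral.width = 0.75, cex = 0.5) +
  labs(x = "Feature", y = "Toy Shapley value")
```

Printing p_wrap draws the plot. Replacing "wrap" with "none" (the ggbeeswarm default) shows the uncorralled layout for comparison.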
Dear all,
I attempted to plot the Shapley values using the shapr package, but I encountered an issue. Here is the plot I generated:
Has anyone else experienced a similar issue? I don’t think the plot is displaying correctly, especially with the strange vertical lines. Any advice would be greatly appreciated!
Thanks!