
fusion reading list #4

Open
tdhock opened this issue Feb 15, 2024 · 17 comments
tdhock (Owner) commented Feb 15, 2024

https://arxiv.org/pdf/1611.00953.pdf proposes L1 on the weights + squared L2 fusion between all pairs of groups; see also https://cloud.r-project.org/web/packages/fuser/vignettes/subgroup_fusion.html
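A minimal sketch of that approach, assuming the fusedLassoProximal() function and k-by-k sharing matrix G shown in the fuser subgroup_fusion vignette (the data here is simulated):

```r
# Sketch: L1 on weights + fusion between subgroups with the fuser package.
library(fuser)
set.seed(1)
k <- 4; n <- 60; p <- 10                   # groups, samples, features
X <- matrix(rnorm(n * p), n, p)
groups <- rep(1:k, length.out = n)
beta <- matrix(rnorm(p * k), p, k)         # one coefficient vector per group
y <- rowSums(X * t(beta)[groups, ]) + rnorm(n, sd = 0.1)
G <- matrix(1, k, k)                       # information sharing between groups
beta.hat <- fusedLassoProximal(X, y, groups, lambda = 0.01, gamma = 0.01, G = G)
```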

https://rdrr.io/cran/genlasso/man/fusedlasso.html makes it possible to implement L1 on the weights + L1 fusion between pairs of weights in different groups, if we create a large block-diagonal matrix X with lots of zeros (maybe tricky to code the graph correctly).
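A hedged sketch of the genlasso idea above, assuming genlasso's fusedlasso() accepts an igraph graph over coefficients and that gamma > 0 adds the extra L1 penalty on the weights themselves (shown here for k = 2 groups with simulated data):

```r
# Block-diagonal X: one copy of each coefficient per group (lots of zeros).
library(genlasso)
library(igraph)
library(Matrix)
set.seed(1)
k <- 2; p <- 3; n.per.group <- 20
X.list <- lapply(1:k, function(gi) matrix(rnorm(n.per.group * p), n.per.group, p))
X.big <- as.matrix(Matrix::bdiag(X.list))   # p*k columns
y.big <- rnorm(n.per.group * k)
# Edge j -- (p + j) fuses coefficient j of group 1 with coefficient j of group 2.
edges <- as.vector(rbind(1:p, p + 1:p))
gr <- igraph::graph(edges, n = p * k, directed = FALSE)
fit <- fusedlasso(y.big, X.big, graph = gr, gamma = 1)  # gamma > 0: L1 on weights
```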

?? Guillaume Obozinski, Ben Taskar, and Michael I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231-252, 2010. ??

try implementing a new learner in mlr3, following https://github.com/mlr-org/mlr3learners/blob/main/R/LearnerRegrKKNN.R
with auto_tuner: https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html#sec-autotuner
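A minimal skeleton of what such a learner could look like (names and parameters are illustrative, following the usual mlr3 custom-learner pattern; the .train/.predict bodies are left as stubs):

```r
library(mlr3)
library(paradox)
library(R6)

LearnerRegrFuser <- R6Class("LearnerRegrFuser",
  inherit = LearnerRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.fuser",
        param_set = ps(
          lambda = p_dbl(lower = 0, default = 0.01, tags = "train"),
          gamma  = p_dbl(lower = 0, default = 0.01, tags = "train")),
        feature_types = c("integer", "numeric"),
        packages = "fuser",
        predict_types = "response")
    }),
  private = list(
    .train = function(task) {
      pv <- self$param_set$get_values(tags = "train")
      # TODO: call fuser on task$data() here and return the fitted model
    },
    .predict = function(task) {
      # TODO: return list(response = ...) computed from self$model
    }))
```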

EngineerDanny self-assigned this Feb 16, 2024
tdhock commented Feb 16, 2024

please read my blog post about the mlr3 hyper-parameter auto_tuner: https://tdhock.github.io/blog/2024/hyper-parameter-tuning/

EngineerDanny (Collaborator) commented:

I have created a prototype fuser learner here: https://github.com/EngineerDanny/necromass/blob/main/LearnerRegrFuser.Rmd
There are some questions that need to be answered for it to be fully compatible with the framework.

EngineerDanny commented:

Other public microbiome datasets with different groups: https://github.com/twbattaglia/MicrobeDS

tdhock commented Feb 29, 2024

great start for LearnerRegrFuser. I would suggest opening an issue with the fuser package authors now, to tell them you are working on an mlr3 interface, and maybe to ask whether one is already implemented elsewhere.
Eventually it would be good to move that code from the Rmd to an R package, which you could submit to CRAN, maybe named mlr3fuser, similar to https://github.com/tdhock/mlr3resampling

EngineerDanny commented Mar 19, 2024

@tdhock I have opened the issue about fuser for mlr3 here.
I have been trying to fix the error below, but so far with no success. Maybe you can help with the auto-tuner?

Error: <LearnerRegrFuser:regr.fuser> cannot be trained with TuneToken 
present in hyperparameter: lambda

When I run this:

if (require(future)) plan("multisession")
bench.result <- mlr3::benchmark(bench.grid, store_models = TRUE)

This is the instance of the class after applying the tuner on lambda.

<LearnerRegrFuser:regr.fuser>: Fuser
* Model: -
* Parameters: lambda=<RangeTuneToken>, gamma=0.01, tol=9e-05, num.it=5000, intercept=TRUE, scaling=FALSE
* Packages: mlr3, mlr3learners, fuser
* Predict Types:  [response]
* Feature Types: logical, integer, numeric
* Properties: -

UPDATE: I have been able to fix it. The problem was in the learner list: to apply the auto-tuner, the list must contain the mlr3tuning::auto_tuner object, not the raw fuser learner.

EngineerDanny commented Mar 19, 2024

@tdhock I have this issue, could you help?

Error: Cannot combine stratification with grouping

This is the R code:

N <- 300
abs.x <- 20
set.seed(1)
x.mat <- matrix(runif(N * 3, -abs.x, abs.x), ncol = 3)  # Ensure X has more than two features
colnames(x.mat) <- paste0("feature", 1:3)

library(data.table)
(task.dt <- data.table(
  x = x.mat,
  y = sin(rowSums(x.mat)) + rnorm(N, sd = 0.5)
))

# Create a grouping variable
task.dt[, sample_group := rep(1:3, length.out = .N)]

# Check the distribution of groups
table(group.tab <- task.dt$sample_group)

# Create a regression task with the grouping variable
reg.task <- mlr3::TaskRegr$new("sin", task.dt, target = "y")
group.task <- reg.task$set_col_roles("sample_group", c("group", "stratum"))

same_other_cv <- mlr3resampling::ResamplingSameOtherCV$new()
same_other_cv$param_set$values$folds <- 2


fuser.learner = lrn("regr.fuser")
#fuser.learner$param_set$values$num.it <- paradox::to_tune(1, 100)
fuser.learner$param_set$values$lambda <- paradox::to_tune(0.001, 1, log=TRUE)
#fuser.learner$param_set$values$gamma <- paradox::to_tune(0.001, 1, log=TRUE)
subtrain.valid.cv <- mlr3::ResamplingCV$new()
subtrain.valid.cv$param_set$values$folds <- 2
grid.search.5 <- mlr3tuning::TunerGridSearch$new()
grid.search.5$param_set$values$resolution <- 5
fuser.learner.tuned = mlr3tuning::auto_tuner(
  tuner = grid.search.5,
  learner = fuser.learner,
  resampling = subtrain.valid.cv,
  measure = mlr3::msr("regr.mse"))
reg.learner.list <- list(
  mlr3::LearnerRegrFeatureless$new(), fuser.learner.tuned)


(same.other.grid <- mlr3::benchmark_grid(
  group.task,
  reg.learner.list,
  same_other_cv))

if (require(future)) plan("multisession")
bench.result <- mlr3::benchmark(same.other.grid, store_models = TRUE)

EngineerDanny commented:

@tdhock Another issue I faced while isolating just the lrn("regr.fuser") class: all and other work fine, but same does not, because fuser cannot train when there is only one group. The exact error was Error in G[i, j] : subscript out of bounds. In the library, G is a k by k (number of groups) matrix which controls the amount of information sharing between the groups.
Essentially I think only all is useful in the fuser package, because of the way it works.

My question is: how do I specify in mlr3resampling::ResamplingSameOtherCV$new() to run only, say, all and other?

tdhock commented Mar 26, 2024

The "cannot combine stratification with grouping" error comes from mlr3::ResamplingCV, which does not support both, even though your task defines both. To work around that, I forked that code and removed the error message, so please try the code in this branch: tdhock/mlr3resampling#8

EngineerDanny commented:

The package author has an open issue, FrankD/fuser#1: when there is only one group, so there is no information sharing, the method should default to the normal LASSO.
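Until that is fixed upstream, one possible work-around in the learner could look like this (fit_with_fallback is a hypothetical helper; it assumes fuser::fusedLassoProximal and glmnet are available):

```r
library(glmnet)
# With only one group, fuser's G matrix is degenerate and training errors out
# (G[i, j] : subscript out of bounds), so fall back to ordinary lasso.
fit_with_fallback <- function(X, y, groups, lambda, gamma, G) {
  k <- length(unique(groups))
  if (k < 2) {
    cv.glmnet(X, y, alpha = 1)  # cross-validated lasso, no fusion possible
  } else {
    fuser::fusedLassoProximal(X, y, groups, lambda = lambda,
                              gamma = gamma, G = G)
  }
}
```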

EngineerDanny commented:

I get the error below when I use mlr3tuning::auto_tuner.
It works fine when I use the plain LearnerRegrFuser class.

Error in benchmark_grid(self$task, self$learner, resampling, param_values = list(xss)) : 
  A Resampling is instantiated for a task with a different number of observations

tdhock commented Mar 28, 2024

remotes::install_github("tdhock/mlr3resampling@cv-ignore-group")

EngineerDanny commented:

This is the first run of fuser on the necromass data.
fuser does not perform better than featureless in some cases.
I think there is a bug in the code, maybe in the implementation of fuser?
I have not yet found the cause.

[figure: fuser_results]

EngineerDanny commented:

To address the consistent error issue above, I

  • fixed the hyper-parameter range (specifically, the issue was with tol);
  • made sure to use cv.glmnet as the fallback learner, instead of the plain glmnet I was using earlier.

necromass: [figure: fuser_necromass_results_1]

moving_pictures (publicly available data set): [figure: fuser_moving_pictures_results_1]

tdhock commented Apr 18, 2024

by the way, I updated mlr3resampling on CRAN; you may want to update and read https://cloud.r-project.org/web/packages/mlr3resampling/vignettes/ResamplingSameOtherSizesCV.html

  • especially the section "Use with auto_tuner on a task with stratification and grouping", which shows how you can use ResamplingSameOtherSizesCV to work around the "cannot combine stratification with grouping" error
  • this is a major update with breaking changes, so if you want to use it, you will have to change the column role "group" in old code to the new column role "subset". I realized that "group" has a different meaning in the other mlr3 classes (rows with the same group ID always stay together, and are never split into different train/test sets), so I created this new class that can handle train/test on different subsets, as well as groups of observations.
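A small sketch of that migration, assuming the new mlr3resampling API from the vignette (ResamplingSameOtherSizesCV and the "subset" column role); task.dt here stands in for the real data:

```r
library(data.table)
library(mlr3)
library(mlr3resampling)
task.dt <- data.table(x = rnorm(30), y = rnorm(30),
                      sample_group = rep(1:3, each = 10))
reg.task <- TaskRegr$new("demo", task.dt, target = "y")
# old code: reg.task$set_col_roles("sample_group", "group")
reg.task$set_col_roles("sample_group", "subset")  # new column role
same.other.sizes.cv <- ResamplingSameOtherSizesCV$new()
same.other.sizes.cv$param_set$values$folds <- 2
```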

EngineerDanny commented:

@tdhock I can't seem to find my way around this fuser algorithm. I have fixed the issue with the indexing, but it still doesn't do better than featureless in most cases.
Sometimes it does much better (second figure).
I used the auto-tuner with fuser, specifically with RandomSearch, because GridSearch was taking very long:

fuser.learner =  LearnerRegrFuser$new()
fuser.learner$param_set$values$lambda <- paradox::to_tune(0.001, 1, log=TRUE)
fuser.learner$param_set$values$gamma <- paradox::to_tune(0.001, 1, log=TRUE)
fuser.learner$param_set$values$tol <- paradox::to_tune(1e-10, 1e-2, log=TRUE)

These are the results on three public datasets. What do you think about this?

moving_pictures: [figure: tuned_moving_pictures_results_6]

hmpv13: [figure: hmpv13_results]

hmpv35: [figure: hmpv35_results]

Still investigating this issue; it could be that there is a problem with the actual fuser implementation.
Maybe I should use a larger range of hyper-parameters for the cross-validation.

tdhock commented May 1, 2024

you should use the default value for tol (not tuned).
The lambda and gamma ranges look reasonable, but you should check whether you are selecting the largest or smallest values in the range, and maybe compare to the lambda/penalty value that glmnet selects.
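A hedged sketch of that check, assuming a trained auto_tuner object named fuser.learner.tuned (as in the earlier code) and the original x.mat/task.dt data; this is a fragment, not runnable on its own:

```r
# Which lambda/gamma did the tuner pick? If they sit at the edge of the
# search range, the range should probably be widened.
fuser.learner.tuned$tuning_result            # selected hyper-parameter values
as.data.table(fuser.learner.tuned$archive)   # all evaluated configurations
# Compare with the penalty that cross-validated glmnet selects.
cv.fit <- glmnet::cv.glmnet(x.mat, task.dt$y)
cv.fit$lambda.min
```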

EngineerDanny commented:

This is the true-versus-predicted response graph for the Boston housing dataset.

Fuser: [figure: 000024]

CVGlmnet: [figure: 00001a]
