Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement mofa2 wrapper for MAE #144

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RiboRings
Copy link

The current method for MultiAssayExperiments requires some manual work which the end user may not be familiar with. As in this example, after transforming assays you also need to:

  1. remove unnecessary assays (like assay(mae[[1]], "counts") <- NULL)
  2. run create_mofa_from_MultiAssayExperiment
  3. Get default options and modify them if necessary (like model_opts$num_factors <- 5)
  4. run prepare_mofa

As an alternative to performing those 4 steps separately, I propose to include them in a wrapper function, whose result can be directly passed to run_mofa.

@RiboRings
Copy link
Author

The R script of interest is create_mofa_from_mae.R. I left run_mofa out of the wrapper in case the user would like to make additional changes prior to running, but run_mofa could also be included in the wrapper.

@antagomir
Copy link

Could you more clearly show here with an example how something can be done more efficiently with this new contribution? E.g. show parallel examples of current & new way. This will make it easier to see how this will be useful.

@RiboRings
Copy link
Author

RiboRings commented Jan 19, 2024

Sure! The following example is inspired by Chapter 15 of the OMA book.

Import and preprocess dataset:

library(mia)
data("HintikkaXOData", package = "mia")
mae <- HintikkaXOData

# For simplicity, classify all high-fat diets as high-fat, and all the low-fat 
# diets as low-fat diets
colData(mae)$Diet <- ifelse(colData(mae)$Diet == "High-fat" | 
                              colData(mae)$Diet == "High-fat + XOS", 
                            "High-fat", "Low-fat")

# Transforming microbiome data with rclr
mae[[1]] <- transformAssay(mae[[1]], method = "relabundance")
mae[[1]] <- transformAssay(mae[[1]], assay.type = "relabundance", method = "rclr")

# Transforming metabolomic data with log10
mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr",
                            MARGIN = "samples",
                            method = "log10")

# Transforming biomarker data with z-transform
mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals",
                           MARGIN = "features",
                           method = "z", pseudocount = 1)

Example with wrapper:

prep_model <- mofa2(mae,
                    assay.types = c("rclr", "log10", "z"),
                    groups = "Diet",
                    num_factors = 5,
                    extract_metadata = TRUE)

Same example without wrapper:

# Removing assays no longer needed
assay(mae[[1]], "counts") <- NULL
assay(mae[[1]], "log10") <- NULL
assay(mae[[2]], "nmr") <- NULL
assay(mae[[3]], "signals") <- NULL

# Building our mofa model
model <- create_mofa_from_MultiAssayExperiment(mae,
                                               groups = "Diet", 
                                               extract_metadata = TRUE)

# Specifying custom model options
model_opts <- get_default_model_options(model)
model_opts$num_factors <- 5

# Preparing model
prep_model <- prepare_mofa(
  object = model,
  model_options = model_opts
)

Then, the prep_model object can be passed to run_mofa for training:

trained_model <- run_mofa(prep_model, use_basilisk = TRUE)

@antagomir
Copy link

Hmm seems more straightfwd and easy to implement.

@antagomir
Copy link

Up.

@antagomir
Copy link

@rargelaguet any thoughts on when this PR could be considered..?

@artur-sannikov
Copy link

Hi, I'm also interested in this feature. Are you planning to merge?

@antagomir
Copy link

Hi @rargelaguet - any thoughts on when this PR could be considered..?

@rargelaguet
Copy link
Contributor

Hi @antagomir and @RiboRings, thanks a lot for the pull request and I am very sorry for the delay on my side.

I like the idea of having a general wrapper called [mofa2] that handles all input format and simplifies the process of creating the MOFA object. But the mofa2 function in the pull request only handles the MultiAssayExperiment format, which I think can be a bit misleading. I can extend the mofa2 function in your pull request to handle the other formats, and then proceed to merge. What do you think?

Thanks again for you feedback and contribution!

@RiboRings
Copy link
Author

Thank you for the reply!

It sounds good to me. Feel free to let me know if anything should be changed/improved in the method for MultiAssayExperiment.

@antagomir
Copy link

Sounds good.

It just came to my mind that SingleCellExperiment/TreeSummarizedExperiment objects have alternative experiments (altExp) that are also commonly used to store multiple data tables. If it is welcome, we could add support for that format. It is probably as important in practice as the MultiAssayExperiment.

@rargelaguet
Copy link
Contributor

Hi @antagomir and @RiboRings,

thanks again for your suggestions, very much appreciated.

I have been exploring the option to implement a one-line wrapper function that creates a MOFA2 object from multiple input formats and that it also parses the data/model/training options. This is now implemented a function called mofa2 in the test_improved_mae branch. Could you test if this function works well for you and then I will proceed to merge to the main branch

Note in this function I have incorporated an argument called assays that should solve the issue of removing unnecessary assays in a MultiAssayExperiment objects

@RiboRings
Copy link
Author

Hi @rargelaguet!

Thank you for getting back to this discussion.

For me, the mofa2 wrapper gives the expected output with MAE as we originally discussed. However, I find it might be a bit confusing to have both the arguments assays and assay for the case of a single cell experiment, especially because at most one assay can be selected without error. For example:

library(mia)
data("Tengeler2020", package = "mia")
tse <- Tengeler2020
tse <- transformAssay(tse, method = "relabundance")

# Using assay argument
create_mofa(tse, assay = "counts")
> Creating MOFA object from a SingleCellExperiment object...

# Using assays argument
create_mofa(tse, assays = "counts")
> Creating MOFA object from a SingleCellExperiment object...

# Giving two assays as input
create_mofa(tse, assays = c("counts", "relabundance"))
> Creating MOFA object from a SingleCellExperiment object...
> Error in value[[3L]](cond) :
> assay(<TreeSummarizedExperiment>, i="character", ...)' invalid subscript 'i'
> attempt to extract more than one element

Maybe one option would be to just keep only assays argument and clarify in the vignettes that it should be one assay for SCE.

@antagomir
Copy link

antagomir commented Jun 30, 2024

Great. I made a quick test following the example on the manpage.

library(mia)
data("HintikkaXOData", package = "mia")
mae <- HintikkaXOData
experiments(mae)
mofa_obj <- mofa2(mae, assays = c("counts", "nmr"))

This throws the error:

Error in names(assays) <- names(experiments(mae)) :
'names' attribute [3] must be the same length as the vector [2]

When I drop the third object from MAE it works:

experiments(mae)$biomarkers <- NULL
mofa_obj <- mofa2(mae, assays = c("counts", "nmr"))

Checking data options...
Checking training options...
Checking model options...
Warning messages:
1: In prepare_mofa(object = new("MOFA", data = list(microbiota = list( :
Some view(s) have less than 15 features, MOFA will have little power to to learn meaningful factors for these view(s)....
2: In prepare_mofa(object = new("MOFA", data = list(microbiota = list( :
The total number of samples is very small for learning 15 factors.
Try to reduce the number of factors to obtain meaningful results. It should not exceed ~10.

It might be less error-prone if the user was required to specify assay for all availble experiments, for instance like

experiments(mae)$biomarkers <- NULL
mofa_obj <- mofa2(mae, assays = c("counts", "nmr", NULL))

or

experiments(mae)$biomarkers <- NULL
mofa_obj <- mofa2(mae, assays = c("counts", NULL, "signals"))

using NULL to indicate experiments that should be dropped out from the analysis?

@antagomir
Copy link

antagomir commented Jun 30, 2024

If support for altExps could be added that would be great. The logic could be the same than for MAE (i.e. specifying the vector of assays, one for each altExp). Often altExp is the recommended choice when samples match one-to-one.

I noticed that mofa2 function does not yet have @export tag

@RiboRings
Copy link
Author

Hi @rargelaguet! Any news when this could be merged?

@antagomir
Copy link

antagomir commented Sep 9, 2024

I think we could also help to finalize this by making a PR against the development branch, if useful.

@RiboRings
Copy link
Author

RiboRings commented Oct 25, 2024

Hi @rargelaguet! Any update on this?

@artur-sannikov
Copy link

I second @RiboRings, any updates @rargelaguet?

Comment on lines +82 to +93
.select_assays <- function(mae, assay.types) {
# Give corresponding experiment names to assay.types
names(assay.types) <- names(experiments(mae))
# For every experiment in MAE
for ( exp in names(experiments(mae)) ){
# Keep only selected assay.type from a given experiment
assays(mae[[exp]]) <- list(assay(mae[[exp]], assay.types[[exp]]))
# Update assay names
assayNames(mae[[exp]]) <- assay.types[[exp]]
}
return(mae)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can lead to an errors that might be hard for user to solve:

  1. MAE can have also other datatypes than SEs. The object just has to have rows and columns.
  2. What if the length of assay.types does not match with length of experiments?

@TuomasBorman
Copy link

Something like this might be more explicit with the assay names and experiments

MOFA2 <- function(mae, assay.types, experiments = names(mae), ...){
    if( !(length(experiments) == length(assay.types) ) ) stop(".")
    if( !(all(experiments %in% names(mae) ) || all(is.numeric(experiments) & experiments > 0 & experiments <= length(experiments(mae)) )) ) stop(".")
    mae <- mae[, , experiments]
    # Loop through experiments
    for( i in seq_len(length(mae)) ){
        tse <- mae[[i]]
        assay.type <- assay.types[[i]]
        tse <- .subset_experiment(tse, assay.type, i, ...)
        mae[[i]] <- tse
    }
    return(mae)
}

.subset_experiment <- function(tse, assay.type, i, altexps = NULL, ...){
    if( !(is.null(altexps) || length(altexps) >= i) ) stop(".")
    altexp <- altexps[[i]]
    
    if( !is(tse, "SummarizedExperiment") ) stop(".")
    if( !is.null(altexp) && !is.na(altexp) && !is(tse, "SingleCellExperiment")) stop("Object must be SCE to use altExp.")
    if( !(is.null(altexp) || is.na(altexp) || altexp %in% altExpNames(tse) )  ) stop(".")
    if( !is.null(altexp) && !is.na(altexp) ){
        tse <- altExp(tse, altexp)
    }
    if( !assay.type %in% assayNames(tse) ) stop(".")
    assays(tse) <- assays(tse)[assay.type]
    return(tse)
}
library(mia)
data("HintikkaXOData")

# Prepare data
mae <- HintikkaXOData
altExp(mae[[1]], "asd") <- mae[[1]]

MOFA2(
    mae,
    experiments = c(1, 2),
    assay.types = c("counts", "nmr"),
    altexps = c("asd", NA)
)

Comment on lines +97 to +104
# For every option in a set (data, model, train, ...)
for ( opt in names(default) ){
# If that option is found among arguments
if ( opt %in% names(list(...)) ){
# Replace default with value specified in arguments
default[[opt]] <- list(...)[[opt]]
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came to my mind while resting, does this work to avoid loops?

user_options <- list(...)
set_options <- intersect(names(default), names(user_options))
default[set_options] <- user_options[set_options]

@artur-sannikov
Copy link

Any updates @rargelaguet?

Also pinging @RiboRings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants