Using intersectRows when different names are used for the same entity #228

llrs · 2017-12-04T17:42:30Z

I have one dataset of 16S sequencing of intestinal biopsies and another one from stools, which end up in different OTUs. I can find which taxon each OTU belongs to, and in phylogenetic analysis the two are usually merged into a single object (phyloseq, metagenomeSeq) that extends the rowData (I assume). The taxonomy could also be stored in rowData, because the names of the OTUs (I have OTU_1, OTU_2, ...) aren't really meaningful. What is meaningful is the taxonomy, which I have as a matrix inside those objects (phylo-class, MRexperiment-class).

See example output:

MR_i  ## And MR_s is a similar object
## MRexperiment (storageMode: environment)
## assayData: 499 features, 103 samples 
##   element names: counts 
## protocolData: none
## phenoData
##   sampleNames: 5.B009 4.B008 ... 103.B104 (103 total)
##   varLabels: Sample_Code Patient_ID ... ID (12 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: OTU_1 OTU_10 ... OTU_998 (499 total)
##   fvarLabels: Domain Phylum ... Species (7 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  

(MAE  <- MultiAssayExperiment(experiments = list("intestinal" = MR_i, "stools" = MR_s), colData = meta))
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] intestinal: MRexperiment with 499 rows and 103 columns 
##  [2] stools: MRexperiment with 535 rows and 103 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

When I build an MAE object with them and use intersectRows, I end up with the rows that share a name even though their taxonomic classification differs.

(b <- intersectRows(MAE))
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] intestinal: MRexperiment with 235 rows and 103 columns 
##  [2] stools: MRexperiment with 235 rows and 103 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices
c(head(rownames(b)[[1]]), tail(rownames(b)[[1]]))
## [1] "OTU_1"   "OTU_10"  "OTU_100" "OTU_101" "OTU_102" "OTU_103" "OTU_94"  "OTU_95"  "OTU_96"  "OTU_97"  "OTU_98"  "OTU_99"

By contrast, OTU_1073 from the intestinal assay and OTU_1037 from the stools assay are the same species.

Could intersectRows use the rowData (or fvarLabels) of each experiment if available to reorder(?) and select the rows of the experiment?

Also, if I have metagenomics and RNA-seq assays in the same object, I would like to tell intersectRows which experiments to subset by row. I could be interested in just one phylum and want to relate it to the other assays in the object.

The package looks great, thanks for the effort!


LiNk-NY · 2017-12-07T00:09:06Z

Hi Lluís, @llrs
Thank you for the report.
The assumption here is that all the objects in the ExperimentList support a rowData method.
It would be good to make use of this data; perhaps we could add a byRowData argument.
Regards,
Marcel

llrs · 2017-12-07T10:26:00Z

I tried building another object (SummarizedExperiment) with the same data:

(mae <- MultiAssayExperiment(list("intestinal" = SE_i, "stools" = SE_s)))
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] intestinal: SummarizedExperiment with 532 rows and 178 columns 
##  [2] stools: SummarizedExperiment with 568 rows and 152 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices
colData(mae)
## DataFrame with 330 rows and 0 columns

But then my problem is how to encode the colData; see this question on the support site.

It might be for another enhancement, but using each SummarizedExperiment's colData to create a common colData would simplify the creation of MAE objects. It would have many caveats, but maybe looking for common columns and creating a column for the row names of each sample in the SummarizedExperiment would work.
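To make the idea concrete, here is a minimal base R sketch of combining per-experiment sample annotations. It is not part of MultiAssayExperiment: the data frames `cd_i` and `cd_s`, their columns, and the output column names `assay` and `colname` are all made up for illustration, standing in for what colData() would return for each SummarizedExperiment.

```r
# Hypothetical per-experiment sample annotations (stand-ins for the colData
# of each SummarizedExperiment); sample names are the row names.
cd_i <- data.frame(Patient_ID = c("P1", "P2"), Age = c(34, 51),
                   row.names = c("i.01", "i.02"), stringsAsFactors = FALSE)
cd_s <- data.frame(Patient_ID = c("P2", "P3"), Age = c(51, 47),
                   Batch = c("a", "b"),
                   row.names = c("s.01", "s.02"), stringsAsFactors = FALSE)
col_data <- list(intestinal = cd_i, stools = cd_s)

# Keep only the columns present in every experiment.
common <- Reduce(intersect, lapply(col_data, colnames))

# Stack the common columns, recording the experiment name and the original
# sample (row) name of each record in two extra columns.
combined <- do.call(rbind, lapply(names(col_data), function(nm) {
  cd <- col_data[[nm]]
  data.frame(assay = nm, colname = rownames(cd), cd[, common, drop = FALSE],
             row.names = NULL, stringsAsFactors = FALSE)
}))
rownames(combined) <- NULL
combined
```

One of the caveats shows up immediately: the same patient (P2 here) appears once per experiment, so the stacked table would still need to be collapsed to one row per patient before it could serve as a common colData.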

lwaldron · 2017-12-08T17:15:34Z

@LiNk-NY I wonder if the enhancement should be more general than byRowData - how about function signatures for subsetByRow and subsetByColumn, where the function is something that will be applied to each list element? Something like:

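# Proposed signature: `y` is a function applied to each experiment in `x`
setMethod("subsetByRow", c("ExperimentList", "function"), function(x, y) {
   # one row index per experiment, as returned by `y`
   sublist <- lapply(x, y)
   # subset again using the per-experiment indices
   x <- subsetByRow(x, sublist)
   x
})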

This could be used for subsetting by rowData (although with more complicated user syntax than a more specific subsetByrowData), but also for filtering by row means, variance, etc.
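As a rough illustration of filtering by row means with a user-supplied function, the following base R sketch uses plain matrices in place of the experiments. The helper `subset_rows_by()` and the filter `above_median_mean()` are hypothetical names invented here, not functions from the package.

```r
# Toy stand-ins for the assays in an ExperimentList (features in rows).
experiments <- list(
  intestinal = matrix(1:20, nrow = 5,
                      dimnames = list(paste0("OTU_", 1:5), paste0("s", 1:4))),
  stools = matrix(1:24, nrow = 6,
                  dimnames = list(paste0("OTU_", 1:6), paste0("t", 1:4)))
)

# Hypothetical helper: apply `f` to each experiment to obtain a row index
# (logical, integer or character), then subset each experiment with it.
subset_rows_by <- function(x, f) {
  idx <- lapply(x, f)
  Map(function(m, i) m[i, , drop = FALSE], x, idx)
}

# Example filter: keep rows whose mean is above the median row mean
# of the same experiment.
above_median_mean <- function(m) rowMeans(m) > median(rowMeans(m))

filtered <- subset_rows_by(experiments, above_median_mean)
vapply(filtered, nrow, integer(1))
```

Because the function is evaluated separately on each experiment, the retained rows need not be the same across experiments, which is the flexibility (and the extra user syntax) mentioned above.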

LiNk-NY · 2018-04-18T23:17:01Z

I think Martin @mtmorgan would say, you want to define a method for a class rather than a function.
And the desired functionality should either conform to the MultiAssayExperiment API or
extend the class.

(Martin, feel free to chime in)

stale · 2019-01-02T16:33:32Z

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

llrs · 2019-01-02T16:57:54Z

It's been a while, but are there any updates?

I'm commenting to prevent the bot from closing the issue.

LiNk-NY · 2019-01-02T17:57:36Z

Hi Lluís, @llrs

What you describe seems to require a row map structure where subsets can be done
based on a third variable.
We don't have something like that planned in the immediate future although it is
an important problem to tackle. FWIW, we do have helper functions to homogenize rows
across experiments in TCGAutils (see symbolsToRanges and mirToRanges).
Perhaps you can write a function that will do this for you in terms of matching
and re-ordering OTU rows across experiments using a map. You could then use
a list or List or row names to subset.
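A user-side function of that kind might look like the following base R sketch. The taxonomy tables `tax_i` and `tax_s` and the species labels are invented for illustration; in practice they would come from the feature annotation of each experiment. Only OTU_1073 and OTU_1037 are taken from the report above.

```r
# Hypothetical taxonomy tables, one per experiment. The OTU identifiers
# differ between experiments although some species are shared.
tax_i <- data.frame(otu = c("OTU_1", "OTU_7", "OTU_1073"),
                    Species = c("sp_A", "sp_B", "sp_C"),
                    stringsAsFactors = FALSE)
tax_s <- data.frame(otu = c("OTU_1", "OTU_1037", "OTU_9"),
                    Species = c("sp_D", "sp_C", "sp_A"),
                    stringsAsFactors = FALSE)

# Row map: one line per species found in both experiments, with the OTU
# name it carries in each of them.
row_map <- merge(tax_i, tax_s, by = "Species",
                 suffixes = c("_intestinal", "_stools"))

# Per-experiment row names in matching order, one list element per experiment.
rows <- list(intestinal = row_map$otu_intestinal,
             stools = row_map$otu_stools)
rows
```

The sketch assumes each species maps to a single OTU per experiment; if several OTUs share a species, merge() returns every pairing and the map would need to be collapsed first. Note also that OTU_1, present in both tables under different species, is correctly not matched to itself.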

If you are working with a consistent number of samples ('colnames') and rows,
it may also be worthwhile to look into data structures that make use of a
row graph representation such as LoomExperiment.

Best regards,
Marcel

lwaldron · 2019-01-04T15:40:35Z

Just discussed this with @LiNk-NY. This should provide a workable solution with minimal change:

The subsetByRow() function should provide an i argument that allows you to specify which experiments will be subset, with the default being all.

Other helper functions, subsetByRowData() and intersectByRowData(), would also be useful. These would provide an additional argument for the column name of the rowData to use instead of row names. They would silently do nothing for any experiment that either 1) doesn't have rowData, or 2) doesn't have the specified column name in its rowData.
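The behaviour described for intersectByRowData() can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: experiments are modelled as plain lists holding an `assay` matrix and an optional `rowData` data frame, and `make_exp()` and `intersect_by_row_data()` are hypothetical names.

```r
# Toy experiments: each holds an assay matrix and, optionally, a rowData table.
make_exp <- function(ids, species = NULL) {
  assay <- matrix(seq_along(ids), nrow = length(ids), ncol = 2,
                  dimnames = list(ids, c("s1", "s2")))
  row_data <- if (is.null(species)) NULL else
    data.frame(Species = species, row.names = ids, stringsAsFactors = FALSE)
  list(assay = assay, rowData = row_data)
}
exps <- list(
  intestinal = make_exp(c("OTU_1", "OTU_1073"), c("sp_A", "sp_C")),
  stools     = make_exp(c("OTU_1037", "OTU_2"), c("sp_C", "sp_B")),
  rnaseq     = make_exp(c("geneA", "geneB"))  # no rowData at all
)

# Hypothetical helper: in the experiments selected by `i`, keep the rows whose
# rowData column `by` holds a value shared by all of them. Experiments without
# rowData, or without the column `by`, are left untouched.
intersect_by_row_data <- function(x, by, i = names(x)) {
  usable <- Filter(function(nm) by %in% colnames(x[[nm]]$rowData), i)
  if (length(usable) < 2L) return(x)
  shared <- Reduce(intersect, lapply(x[usable], function(e) e$rowData[[by]]))
  for (nm in usable) {
    # match() keeps the first row per shared value, in the order of `shared`
    keep <- match(shared, x[[nm]]$rowData[[by]])
    x[[nm]]$assay <- x[[nm]]$assay[keep, , drop = FALSE]
    x[[nm]]$rowData <- x[[nm]]$rowData[keep, , drop = FALSE]
  }
  x
}

res <- intersect_by_row_data(exps, by = "Species")
lapply(res, function(e) rownames(e$assay))
```

Here the rnaseq element passes through unchanged because it has no rowData, while the two OTU tables are reduced to the rows for the one species they share. Restricting `i` to a single experiment also leaves everything unchanged, since there is nothing to intersect with.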

LiNk-NY added the enhancement label Dec 4, 2017

LiNk-NY self-assigned this Dec 4, 2017

stale bot added the outdated label Jan 2, 2019

stale bot removed the outdated label Jan 2, 2019
