SIG: new data structures for Bioconductor #8

Open
lwaldron opened this issue Jun 2, 2018 · 24 comments

Comments

@lwaldron

lwaldron commented Jun 2, 2018

From @lwaldron on October 22, 2017 4:26

This SIG will discuss recent and needed Bioconductor data classes. Some recent or in-testing data classes to discuss are:

  • MultiAssayExperiment (for "gluing" different types of assays together)
  • RaggedExperiment (for copy number, mutations, or other data represented by different genomic ranges for each sample)
  • restfulSE::RESTfulSummarizedExperiment, restfulSE::BQSummarizedExperiment for remote storage + local interactive analysis of very large datasets

One presently identified need is a Bioconductor class for representing the drug sensitivity data from pharmacogenomics studies such as the Cancer Cell Line Encyclopedia (CCLE) and NCI-60. These studies perform standard -omics assays, but also dose-response experiments where cell lines are subjected to varying doses of each of numerous compounds. Responses are measured as cell viability, and the resulting dose-response curves are summarized using measures such as LC-50. The full dose-response data are a 3-D array (dose x time x cell line), which should be stored in addition to summary measure matrices (e.g. LC-50 concentration x cell line).

The PharmacoGx Bioconductor package from the @bhaibeka lab provides numerous curated pharmacogenomics datasets as rich PharmacoSet objects, but these lack the flexibility and novel data storage models that would be available using a SummarizedExperiment-derived object for sensitivity data, contained along with -omics assays within a MultiAssayExperiment. Therefore a desired outcome from this SIG is a draft class definition for cell line drug sensitivity data extending SummarizedExperiment. This would deliver both a needed new data class and, for participants, experience in extending existing core data structures to novel data types.
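To make the "sensitivity data alongside -omics assays" idea concrete, here is a minimal sketch, not a proposed design. The gene, compound and cell line names, the `tissue` column and all values are made up for illustration; it only assumes the standard SummarizedExperiment and MultiAssayExperiment constructors, with colData row names matching the column names of each experiment.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

set.seed(1)
cell_lines <- paste0("cellline", 1:4)   # hypothetical cell line names

# Stand-in for an -omics assay: genes x cell lines
expression <- matrix(rnorm(20), nrow = 5,
                     dimnames = list(paste0("gene", 1:5), cell_lines))

# Stand-in for drug sensitivity summaries: compounds x cell lines
sensitivity <- SummarizedExperiment(
  assays = list(LC50 = matrix(runif(12), nrow = 3,
                              dimnames = list(paste0("compound", 1:3),
                                              cell_lines)))
)

# colData row names are matched against the column names of each experiment
mae <- MultiAssayExperiment(
  experiments = ExperimentList(expression = expression,
                               sensitivity = sensitivity),
  colData = DataFrame(tissue = c("lung", "lung", "breast", "skin"),
                      row.names = cell_lines)
)

# Selecting cell lines once subsets every experiment in the container
lung <- mae[, c("cellline1", "cellline2"), ]
```

The point of the container is the last line: one subsetting call keeps the -omics and sensitivity data in step.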

Topic leader: Levi Waldron @lwaldron
Scribe: Vincent Carey @vjcitn (Vince can I volunteer you?)

Any interested participants are invited to use the issue to ask questions, suggest other relevant topics for discussion, and/or express their interest in participating.

Copied from original issue: Bioconductor/EuroBioc2017#5

@lwaldron

lwaldron commented Jun 2, 2018

From @vjcitn on October 22, 2017 4:45

Levi, I'll try to function as a scribe.


@lwaldron

lwaldron commented Jun 2, 2018

From @bhaibeka on October 23, 2017 11:41

I strongly support this initiative of course. Many of these datasets are now available (see picture) and although PharmacoGx::PharmacoSet objects do their job, they do not deal efficiently with data access and storage.

@p-smirnov has deep experience with these pharmacogenomics datasets and would be interested in contributing.

Available datasets:
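[image: screen shot 2017-10-23 at 7 39 43 am]
https://user-images.githubusercontent.com/594954/31887189-65623074-b7c5-11e7-95fb-f34814f0035c.png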

@lwaldron

lwaldron commented Jun 2, 2018

From @vjcitn on October 23, 2017 13:4

Hi Ben -- where is that image from? Public domain? I am working on a proposal that might benefit from the elegance. Thanks, Vince


@lwaldron

lwaldron commented Jun 2, 2018

From @bhaibeka on October 24, 2017 20:5

I drew the picture from scratch, feel free to reuse. For more, you can borrow any slides from here: https://www.pmgenomics.ca/bhklab/research/presentations

@lwaldron

lwaldron commented Jun 2, 2018

From @p-smirnov on November 7, 2017 1:47

I would like to attend, just waiting for confirmation from the conference about registration. It would be great for PharmacoGx to leverage the MultiAssayExperiment class for data storage and would better integrate our package into Bioconductor.

@lwaldron

lwaldron commented Jun 2, 2018

From @lgatto on November 7, 2017 9:18

@p-smirnov haven't you received your invitation email yet?

@lwaldron

lwaldron commented Jun 2, 2018

From @p-smirnov on November 7, 2017 13:5

@lgatto I searched through my email and found it, from last Friday. It was sorted out of my inbox so I missed seeing it.

@lwaldron

lwaldron commented Jun 2, 2018

Great you can come @p-smirnov, I'm really looking forward to it!

@lwaldron

lwaldron commented Jun 2, 2018

Initial agenda. Understood now from Laurent's comment below that we have four hours, 1-5pm. So here is a tentative schedule. I've scheduled more time for the pharmacogenomics component only because I know the measurable outcome we hope to get from it, but I certainly don't mind rebalancing if the VariantExperiment discussion needs more time.

  1. (1-4pm) @p-smirnov / @lwaldron: representing pharmacogenomics data
    • how to represent dose x response x cell line data based on SummarizedExperiment. Want 1+ assays for summary measures like LC-50 (rows=compounds, columns=cell lines), but also want to store the complete dose-response data (for example, one assay with rows=compounds, columns=cell lines, 3rd dimension=dose, and a second assay with the third dimension=response?). What additional requirements would be added to SummarizedExperiment?
    • any additional requirements for MultiAssayExperiment to represent complete pharmacogenomics experiments as currently done by PharmacoSet in PharmacoGx
    • We should try to have a draft implementation with a coerced PharmacoSet object by the end of the meeting
  2. (4-5pm) @rcastelo: on-disk data structures for genome-scale variant analysis. Mainly, but not only, a discussion of a recent project at https://github.com/Bioconductor/VariantExperiment that implements a GDS backend for 'SummarizedExperiment' objects, and of how this could be extended to other objects such as 'VRanges' to store and access large variant genotype and annotation data sets.

Outcomes:

  • a draft pharmacogenomics class implementation with an existing PharmacoSet object coerced to this class
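
A minimal sketch of the layout floated in item 1, for discussion only. It assumes that SummarizedExperiment accepts assays with more than two dimensions as long as the first two match across assays. The assay names `LC50`, `dose` and `viability`, the compound and cell line names, the `tissue` column and all numbers are made up; this is not an agreed class definition.

```r
library(SummarizedExperiment)

set.seed(1)
compounds  <- paste0("compound", 1:3)   # hypothetical names
cell_lines <- paste0("cellline", 1:4)   # hypothetical names
n_dose     <- 5

# Summary measure: one value per compound x cell line
lc50 <- matrix(runif(12), nrow = 3, dimnames = list(compounds, cell_lines))

# Full curves: the third dimension indexes the dose level.
# Slice [, , k] of `dose` holds the k-th concentration for every pair.
dose <- array(rep(10^seq(-3, 1, length.out = n_dose), each = 12),
              dim = c(3, 4, n_dose),
              dimnames = list(compounds, cell_lines, NULL))
viability <- array(runif(3 * 4 * n_dose),
                   dim = c(3, 4, n_dose),
                   dimnames = list(compounds, cell_lines, NULL))

se <- SummarizedExperiment(
  assays  = list(LC50 = lc50, dose = dose, viability = viability),
  rowData = DataFrame(compound = compounds, row.names = compounds),
  colData = DataFrame(tissue = c("lung", "lung", "breast", "skin"),
                      row.names = cell_lines)
)

# Subsetting by compound and cell line carries every assay along
sub <- se[1:2, c("cellline1", "cellline3")]
dim(assay(sub, "viability"))
```

One limitation of this layout is that every compound x cell line pair needs the same number of dose levels; pairs tested at fewer doses would have to be padded with NA.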

@lwaldron

lwaldron commented Jun 2, 2018

From @lgatto on November 28, 2017 18:25

From how I read the schedule, I think we have two hours? Or can we extend this to use two sessions?

Yes, it's meant to run from 1pm to 5pm. We will be serving coffee at 3pm, but people are free to grab a cup and continue as they see fit.

@lwaldron

lwaldron commented Jun 2, 2018

From @lawremi on November 28, 2017 18:45

I feel bad that I can't make it to this SIG. I guess it's not feasible for me to attend remotely? Looking forward to the minutes.

@lwaldron

lwaldron commented Jun 2, 2018

From @lgatto on November 28, 2017 20:59

@lawremi - nothing stops you from using hangouts and a google doc/etherpad for remote participation.

@lwaldron

lwaldron commented Jun 2, 2018

@lawremi if you're willing to attend any of it between 1-5pm UK time (5-9am west coast time?), we'd certainly appreciate your presence.

@lwaldron

lwaldron commented Jun 2, 2018

From @lawremi on November 29, 2017 3:50

Unfortunately I'll be in Australia and I think that's 12-4 AM so probably not. I'll at least be trying to sleep ;)

@lwaldron

lwaldron commented Jun 2, 2018

A gist providing some dose-viability data to play with.
PDF output

source("https://gist.githubusercontent.com/lwaldron/ab3e6ab3ddc8815a01e3c46969aad130/raw/b85ab86c5b9ec3de1ec07d9dd33b1d01400edc29/FIMMdose-viability.R")
pset2se(fimm)

@lwaldron

lwaldron commented Jun 2, 2018

And some slides for pharmacogenomics and for on-disk data structures

@lwaldron

lwaldron commented Jun 2, 2018

From @federicomarini on December 4, 2017 14:17

Here's the link for the benchmarking work by Mike Smith we touched upon:

http://www.msmith.de/2017/11/17/10x-1/

@lwaldron

lwaldron commented Jun 2, 2018

From @vjcitn on December 4, 2017 14:20

I had volunteered to be a scribe for this meeting. Very rudimentary notes are at

https://docs.google.com/document/d/15FWsVlQEGUTn5ys0GRL56ixOHzG04J7kq1IPMiRQyKM/edit?usp=sharing


@lwaldron

lwaldron commented Jun 2, 2018

@bhaibeka @p-smirnov @vjcitn want to continue this BOF at Bioc2018 in July?

@vjcitn

vjcitn commented Jun 2, 2018

Levi, it is a good idea to try to continue this SIG. Thanks

@p-smirnov

I have a prototype of a "long" format way of storing drug sensitivity data I would like some feedback on.
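
For readers unfamiliar with the term, a rough sketch of what a "long" layout can look like in base R: one row per measurement instead of one cell per compound x cell line x dose. This is an illustration under made-up names and values, not the prototype mentioned above; the column names and the use of mean viability as a summary are assumptions.

```r
set.seed(1)

# One row per (compound, cell line, dose) measurement; all values are made up
long <- expand.grid(
  compound  = paste0("compound", 1:2),
  cell_line = paste0("cellline", 1:3),
  dose      = c(0.01, 0.1, 1, 10),
  stringsAsFactors = FALSE
)
long$viability <- runif(nrow(long))

# A ragged design needs no NA padding: here compound2 was never tested
# at the highest dose, so those rows are simply absent
long <- long[!(long$compound == "compound2" & long$dose == 10), ]

# A compound x cell line summary matrix can still be derived on demand.
# Mean viability is only a placeholder for a real curve summary.
summary_wide <- tapply(long$viability,
                       list(long$compound, long$cell_line),
                       mean)
dim(summary_wide)
```

The trade-off against a dense 3-D array is that compounds with different dose grids fit naturally, at the cost of repeating the identifiers on every row.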

@bhaibeka would you be able to attend?

@lwaldron

You're on the agenda @p-smirnov . Anyone else, just let me know, either in advance or during the session...

@lwaldron

Here are the (more or less) slides I presented: https://www.slideshare.net/LeviWaldron/why-reuse-core-classes

And the code I used to demo exploring the inheritance and methods of some classes:

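# SummarizedExperiment: superclasses, slots, and available methods
library(SummarizedExperiment)
extends("SummarizedExperiment")      # the class plus everything it inherits from
showClass("SummarizedExperiment")    # slots, superclasses, known subclasses
methods(class="SummarizedExperiment")
# methods listed for SummarizedExperiment but not for Vector
setdiff(methods(class="SummarizedExperiment"), methods(class="Vector"))

# SingleCellExperiment, compared with RangedSummarizedExperiment
library(SingleCellExperiment)
extends("SingleCellExperiment")
setdiff(methods(class="SingleCellExperiment"), methods(class="RangedSummarizedExperiment"))

# MultiAssayExperiment: open the class definition on GitHub, then inspect the class
library(MultiAssayExperiment)
browseURL("https://github.com/waldronlab/MultiAssayExperiment/blob/ce42a3e1508b32b8ddd783d8f0ad788d364e28e7/R/MultiAssayExperiment-class.R#L105")
extends("MultiAssayExperiment")
showClass("MultiAssayExperiment")
methods(class="MultiAssayExperiment")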

@lwaldron

And the repo @p-smirnov posted with some code to define a demo object: https://github.com/bhklab/longArray
