Support training from AnnCollection #3018

Open
jacobkimmel opened this issue Oct 17, 2024 · 2 comments

Comments

@jacobkimmel
Contributor

jacobkimmel commented Oct 17, 2024

Is your feature request related to a problem? Please describe.

  • Currently, scvi-tools does not support training from the new anndata.experimental.AnnCollection API
  • AnnCollection is great! For teams training models on large datasets, it's a game changer.

Describe the solution you'd like

  • Support for training from AnnCollection objects through the existing API would be great.
  • Ideally, users would not be required to make substantial changes to the existing scvi-tools workflow

Question for maintainers

  • I've implemented a solution that achieves the desired outcome without modifying any existing scvi-tools code.
  • In brief, I wrote a set of wrappers for AnnCollection that mimic the anndata.AnnData API in all the ways scvi-tools expects. We've successfully trained simple models with this solution.
  • In practice, users wrap their collection objects (wrapped_collection = Wrapper(collection)) then proceed with the scvi-tools workflow as normal (setup_anndata(wrapped_collection, ...), etc.).
  • Would the team be interested in incorporating this interface into the main scvi-tools? If so, I can send in a PR. I'd imagine it living as a separate module under .data.
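The wrapper approach described above can be sketched roughly as follows. This is a minimal illustration of the delegation pattern, not the actual implementation from the notebook: `CollectionWrapper` and `StubCollection` are hypothetical names, and the assumption that downstream code needs a writable `uns` dictionary that the collection itself lacks is made only for the sake of the example.

```python
class CollectionWrapper:
    """Hypothetical adapter: forwards attribute access to a wrapped collection
    and supplies attributes the collection does not provide itself."""

    def __init__(self, collection):
        self._collection = collection
        # Assumed requirement: a mutable metadata store that downstream
        # code can write to. It lives on the wrapper, not the collection.
        self.uns = {}

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, so `uns` is served locally
        # and everything else is delegated to the wrapped object.
        if name == "_collection":
            # Guard against infinite recursion if the wrapper is copied or
            # unpickled before __init__ has run.
            raise AttributeError(name)
        return getattr(self._collection, name)

    def __len__(self):
        # Number of observations, taken from the wrapped object's shape.
        return self._collection.shape[0]

    def __getitem__(self, index):
        # Indexing is passed straight through to the wrapped object.
        return self._collection[index]


class StubCollection:
    """Stand-in for a real collection so the sketch runs without anndata."""

    shape = (5, 3)
    obs_names = ["c0", "c1", "c2", "c3", "c4"]

    def __getitem__(self, index):
        return ("rows", index)


wrapped = CollectionWrapper(StubCollection())
wrapped.uns["registry_id"] = "abc"  # stored on the wrapper only
print(wrapped.shape, len(wrapped), wrapped.uns)
```

Delegating through `__getattr__` keeps the wrapper small: only the attributes that are missing or need different behavior are defined explicitly, and anything the collection already provides passes through unchanged.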
@canergen
Member

Hi, thanks for the suggestion. We are currently looking into supporting MappedCollection from lamindb. However, AnnCollection works with setup_anndata, while MappedCollection requires custom dataloaders. Do you use AnnCollection in disk-backed mode, or are the datasets loaded into memory?
Could you provide the code? We can then discuss within scverse, together with the AnnData developers, how to enable this and how stable AnnCollection is. We could have a function similar to, e.g., organize_multiome_anndatas in MultiVI:

def organize_multiome_anndatas(
.

@jacobkimmel
Contributor Author

However, AnnCollection works with setup_anndata, while MappedCollection requires custom dataloaders. Do you use AnnCollection in disk-backed mode, or are the datasets loaded into memory?

I set it up to use AnnCollection with backed AnnData objects. I don't really see an advantage to using AnnCollection if everything is in memory anyway -- the overhead of anndata.concat(...) is pretty minimal.

Here's a sample snippet of how I created the objects.

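# imports; AnnFaker is the wrapper class defined in the Colab notebook linked below
import gdown
import scanpy as sc
import scvi
from anndata.experimental import AnnCollection

# get some data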
gdown.download(url="https://drive.google.com/uc?id=1X5N9rOaIqiGxZRyr1fyZ6NpDPeATXoaC",
            output="pbmc_seurat_v4.h5ad", quiet=False)
gdown.download(url="https://drive.google.com/uc?id=1JgaXNwNeoEqX7zJL-jJD3cfXDGurMrq9",
            output="covid_cite.h5ad", quiet=False)

# load in backed
covid = sc.read('covid_cite.h5ad', backed="r")
pbmc = sc.read('pbmc_seurat_v4.h5ad', backed="r")

# make a collection
collection = AnnCollection([covid, pbmc], join_vars="inner", join_obs="inner", label='dataset')

# use the wrapper
wrapped_collection = AnnFaker(collection)

# train a model
scvi.model.SCVI.setup_anndata(
    wrapped_collection,
    layer="test",
    batch_key="dataset",
)

model = scvi.model.SCVI(wrapped_collection, n_latent=10)

model.train(max_epochs=20)
# training completes, latent matches expectations

Could you provide the code and we can then discuss within scverse with the AnnData developers how to enable this and how stable AnnCollection is?

Sure, here's a minimal implementation in colab: https://colab.research.google.com/drive/1v9B62IfLM8qBfgmvDYnCs3GZaaUvnG26?usp=sharing
