Interest in co-authoring an R-native scalable single-cell analysis pipeline #1029

stemangiola · 2023-03-03T00:55:54Z

Hello Team,

At WEHI (Melbourne), we developed a scalable pipeline for single-cell data preprocessing and analysis, which is currently used in a large-scale COVID-19 multi-omics project. It is implemented in Makeflow (https://cctools.readthedocs.io/en/latest/makeflow/), a make-like workflow system that works well with SLURM.

We plan to convert this pipeline to targets so that researchers can obtain analysis quality and scalability without leaving R.

However, I would very much like to have someone from the targets team on board to ensure that the implementation, documentation and testing result in robust, broadly applicable software, so that users do not run into issues setting up the pipeline on various high-performance computing infrastructures. I know that targets does not directly manage the backend schedulers, but I'm sure the targets team has plenty of experience in creating production-worthy pipelines.

The involvement I have in mind is mostly supervision and third-party testing, to ensure that we are using targets to its fullest potential and robustness.

In the hope of your interest, I show you the dependency structure to give you a feel for the pipeline, and an abstract about the rationale and impact of our endeavour.

Abstract:

Single-cell RNA sequencing (scRNA-seq) has been instrumental in understanding cellular heterogeneity at the single-cell level. The Human Cell Atlas (HCA) and other single-cell atlases have produced vast amounts of scRNA-seq data, necessitating scalable and efficient analysis pipelines. Pre-integration analyses that follow best practices are time-consuming and suitable for automation, allowing scientists to focus on the manual curation and integration phases.

While various pipelines exist, many require expertise outside of R. Here, we present a scalable and automated pipeline in R that follows best practices for pre-integration analysis of single-cell data. This pipeline is designed to handle large-scale, atlas-level data sets. It incorporates quality control metrics and ensures reproducibility and accuracy.

Being native to R and built on the targets framework, this pipeline package has a shallow learning curve and represents a convenient alternative to manual R analyses and job submission to computer clusters.

wlandau (Maintainer) · 2023-03-03T21:29:19Z

@stemangiola, it is encouraging that targets may play a role in this project. However, I do not have the capacity for this type of comprehensive review and open-ended involvement. If you encounter specific issues or odd behaviors as you implement your pipeline, I can help troubleshoot obstacles on a limited-scope basis as described at https://books.ropensci.org/targets/help.html. Since you are interested in the high-performance computing aspects of targets, I recommend you have a look at https://books.ropensci.org/targets/performance.html and https://books.ropensci.org/targets/hpc.html. In addition, the talks by @Aariq and @joelnitta at https://ropensci.org/commcalls/jan2023-targets/ are excellent and could definitely help you.

rcorty · 2023-04-29T22:43:10Z

I don't know enough about {targets} to fulfill what you're asking, but I'd love to read the resulting manuscript and use the product!

joelnitta · 2023-05-01T04:53:54Z

Cool project! Unfortunately I'm not able to help with this as a collaborator, but I am curious about your setup: do you use Docker containers? Is it a single container per workflow step, or a monolithic container for the whole thing?
