Interest in co-authoring an R-native scalable single-cell analysis pipeline #1029
Replies: 3 comments
-
@stemangiola, it is encouraging that
-
I don't know enough about {targets} to fulfill what you're asking, but I'd love to read the resulting manuscript and use the product!
-
Cool project! Unfortunately, I'm not able to help with this as a collaborator, but I am curious about your setup: do you use Docker containers? Is it a single container per workflow step, or a monolithic container for the whole thing?
-
Hello Team,
At WEHI (Melbourne), we developed a scalable pipeline for single-cell data preprocessing and analysis, currently used in a large-scale COVID-19 multi-omics project. It is implemented in makeflow (https://cctools.readthedocs.io/en/latest/makeflow/), a make-like workflow system that works well with SLURM.
We plan to convert this pipeline to R Targets so that researchers can obtain both analysis quality and scalability without leaving R.
However, I would very much like to have someone from Targets on the team to ensure that the implementation, documentation and testing result in robust, broadly applicable software, so that users do not run into issues when setting up the pipeline on different high-performance computing infrastructures. I know that Targets does not directly manage the backend schedulers, but I'm sure the Targets team has plenty of experience in creating production-worthy pipelines.
The involvement I have in mind is mostly supervision and third-party testing, and ensuring that we are using R Targets to its fullest potential and robustness.
In the hope of sparking your interest, I am sharing the dependency structure, to give you a feel for the pipeline, and an abstract about the rationale and impact of our endeavour.
Abstract:
Single-cell RNA sequencing (scRNA-seq) has been instrumental in understanding cellular heterogeneity at the single-cell level. The Human Cell Atlas (HCA) and other single-cell atlases have produced vast amounts of scRNA-seq data, necessitating scalable and efficient analysis pipelines. Pre-integration analyses that follow best practices are time-consuming and suitable for automation, allowing scientists to focus on the manual curation and integration phases.
While various pipelines exist, many require expertise outside of R. Here, we present a scalable and automated pipeline in R that follows best practices for pre-integration analysis of single-cell data. The pipeline is designed to handle large-scale data from atlas-level data sets. It incorporates quality control metrics and ensures reproducibility and accuracy.
Being native to R and built on the Targets framework, this pipeline package has a shallow learning curve and offers a convenient alternative to manual R analyses and job submission to computer clusters.