✨ Enable nb building & new curation #7

Merged: 5 commits, Aug 15, 2023
75 changes: 28 additions & 47 deletions docs/guide/bulk_rna_seq.ipynb
@@ -67,6 +67,7 @@
"import lnschema_bionty as lb\n",
"import pandas as pd\n",
"import os\n",
"import anndata as ad\n",
"from pathlib import Path\n",
"\n",
"ln.settings.verbosity = 3 # show hints"
@@ -95,7 +96,7 @@
"id": "3e1224fd",
"metadata": {},
"source": [
"[nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n",
"The Nextflow pipeline [nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n",
"\n",
"First, we create a new Transform object for our pipeline run."
]
@@ -130,7 +131,7 @@
"id": "b20dbc7d-0e75-4b06-8f7a-d540bffbdb44",
"metadata": {},
"source": [
"We download the [test data](https://github.com/nf-core/test-datasets/tree/rnaseq3) for the pipeline to track it with Lamin."
"We download the [test data](https://github.com/nf-core/test-datasets/tree/rnaseq3) for the pipeline which we track with Lamin."
]
},
{
@@ -140,6 +141,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%capture command\n",
"!git clone https://github.com/nf-core/test-datasets --single-branch --branch rnaseq3"
]
},
@@ -183,6 +185,7 @@
"outputs": [],
"source": [
"run.input_files.set(input_fastqs_file)\n",
"run.reference = \"lamin_rnaseq\"\n",
"run.reference_type = \"nextflow_name\""
]
},
@@ -201,7 +204,7 @@
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"LAMINDB_RUN_ID\"] = \"lamin_rnaseq\""
"os.environ[\"LAMINDB_RUN_ID\"] = run.reference"
]
},
{
@@ -216,13 +219,10 @@
"cell_type": "code",
"execution_count": null,
"id": "2219c55e",
"metadata": {
"jupyter": {
"outputs_hidden": true
}
},
"metadata": {},
"outputs": [],
"source": [
"%%capture command\n",
"!nextflow run nf-core/rnaseq -r 3.11.2 -profile test,docker --outdir rna-seq-results -name $LAMINDB_RUN_ID -resume"
]
},
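Note on the hunk above: the run name now flows from `run.reference` into `LAMINDB_RUN_ID` and from there into the `-name` flag of the Nextflow call. A minimal sketch of that hand-off, using only the standard library and not launching anything; the literal `"lamin_rnaseq"` stands in for `run.reference`, and building the command as a list is an illustration, not what the notebook does (it uses a `!` shell cell):

```python
import os
import shlex

# Stand-in for run.reference from the notebook (assumption for this sketch).
run_name = "lamin_rnaseq"
os.environ["LAMINDB_RUN_ID"] = run_name

# Same arguments as the notebook cell; the run name is read back
# from the environment instead of being hard-coded a second time.
command = [
    "nextflow", "run", "nf-core/rnaseq",
    "-r", "3.11.2",
    "-profile", "test,docker",
    "--outdir", "rna-seq-results",
    "-name", os.environ["LAMINDB_RUN_ID"],
    "-resume",
]

# Render the list as a single shell-safe string for inspection.
command_line = shlex.join(command)
print(command_line)
```

Keeping the name in one place means the tracked run and the Nextflow session cannot drift apart if the reference is renamed later.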
@@ -231,7 +231,7 @@
"id": "a56e8a22-94dd-413b-989d-f13f59addbe6",
"metadata": {},
"source": [
"As a first step, we ingest all results from the pipeline run."
"As a first step, we ingest all multiqc plots from the pipeline run."
]
},
{
@@ -256,29 +256,6 @@
"multiqc_file"
]
},
{
"cell_type": "markdown",
"id": "8e3813b0-d2c8-4126-bc96-a0fd68cc8b98",
"metadata": {},
"source": [
"Let's examine the multiqc report:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48361e66-f50b-45d6-ae6a-1d6c24426d81",
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"from IPython.display import IFrame\n",
"\n",
"# Copying file to a directory accessible by the IPython Tornado web server\n",
"shutil.copy(multiqc_file.stage(), \"./multiqc_report.html\")\n",
"IFrame(src=\"multiqc_report.html\", width=1000, height=600)"
]
},
{
"cell_type": "markdown",
"id": "29bae36c-dac6-4314-b85b-f3afd7e47fbd",
@@ -300,35 +277,39 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1a58047-0c25-4632-b355-69610c6176f3",
"cell_type": "markdown",
"id": "22c88eed-61e0-4d12-96bb-ea4e10f476c0",
"metadata": {},
"outputs": [],
"source": [
"salmon_gene_counts_table = ln.File.from_df(salmon_gene_counts_table_df, run=run)\n",
"ln.save(salmon_gene_counts_table)"
"We curate the count table analogously to {doc}`docs:/bulkrna`."
]
},
{
"cell_type": "markdown",
"id": "813ae546-3b76-4aaa-ace0-4621eeadd839",
"cell_type": "code",
"execution_count": null,
"id": "5b0ca2da-8bff-4750-972d-3f1c0cdb28e8",
"metadata": {},
"outputs": [],
"source": [
"We further track all genes that are associated with the count table."
"salmon_gene_counts_table_df = salmon_gene_counts_table_df.T\n",
"var = pd.DataFrame(\n",
" {\"gene_name\": salmon_gene_counts_table_df.loc[\"gene_name\"].values},\n",
" index=salmon_gene_counts_table_df.loc[\"gene_id\"],\n",
")\n",
"adata = ad.AnnData(salmon_gene_counts_table_df.iloc[2:].astype(\"float32\"), var=var)"
]
},
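Note on the cell above: it reshapes the genes-by-samples count table into a samples-by-genes matrix plus a gene annotation table before building the `AnnData` object. The sketch below shows the same reshaping with pandas only, on toy data. The values and sample names are hypothetical, and it assumes the table has `gene_id` and `gene_name` columns followed by one column per sample. It indexes by `gene_id` before transposing, which is a different route from the `.T` / `.iloc[2:]` approach in the diff, chosen here so that the matrix columns and the annotation index carry the same labels:

```python
import pandas as pd

# Toy stand-in for the Salmon gene count table (hypothetical values):
# one row per gene, two annotation columns, then one column per sample.
counts = pd.DataFrame(
    {
        "gene_id": ["G1", "G2", "G3"],
        "gene_name": ["A", "B", "C"],
        "sample_1": [1.0, 0.0, 5.0],
        "sample_2": [2.0, 3.0, 0.0],
    }
)

# Gene-level annotations, indexed by gene ID.
var = counts.set_index("gene_id")[["gene_name"]]

# Samples-by-genes matrix: drop the annotation column, transpose,
# and cast to float32 as the notebook cell does.
X = counts.set_index("gene_id").drop(columns="gene_name").T.astype("float32")

print(X)
print(var)
```

With this layout, `X` and `var` share the gene IDs as labels, which is the alignment the curation step relies on when it validates genes by stable ID.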
{
"cell_type": "code",
"execution_count": null,
"id": "e655b46d-2bee-404e-9ecc-0d219e97d976",
"id": "c1a58047-0c25-4632-b355-69610c6176f3",
"metadata": {},
"outputs": [],
"source": [
"genes = ln.FeatureSet.from_values(\n",
" salmon_gene_counts_table_df[\"gene_name\"], lb.Gene.symbol\n",
"curated_salmon_gene_counts_file = ln.File.from_anndata(\n",
" adata, description=\"Curated bulk RNA counts\", var_ref=lb.Gene.stable_id, run=run\n",
")\n",
"salmon_gene_counts_table.features.add_feature_set(genes, slot=\"rna\")"
"ln.save(curated_salmon_gene_counts_file)"
]
},
{
@@ -338,7 +319,7 @@
"metadata": {},
"outputs": [],
"source": [
"salmon_gene_counts_table.describe()"
"curated_salmon_gene_counts_file.describe()"
]
},
{
@@ -354,7 +335,7 @@
"id": "8bba6911-70b6-4a99-a95e-6c9659435af6",
"metadata": {},
"source": [
"Lamin makes it easy to track pipeline executions and to ingest and output files that can subsequently be used for custom downstream analyses. This is complementary to nf-tower."
"Lamin makes it easy to track pipeline executions and to ingest input and output files that can subsequently be used for advanced downstream analyses. This is complementary to nf-tower."
]
}
],
2 changes: 1 addition & 1 deletion docs/guide/index.md
@@ -9,5 +9,5 @@ This makes it both easy for the user to understand the documentation, and for th
```{toctree}
:maxdepth: 1

quickstart
bulk_rna_seq
```