✨ Enable nb building & new curation (#7)
* ✨ Use new cleanup API

Signed-off-by: zethson <[email protected]>

* ✨ Hide some output

Signed-off-by: zethson <[email protected]>

* 🎨 Use proper doc reference syntax

Signed-off-by: zethson <[email protected]>

* 🎨 Without html

Signed-off-by: zethson <[email protected]>

* 🎨 Remove IFrame

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Zethson authored Aug 15, 2023
1 parent bc133a1 commit 6b3b8d8
Showing 2 changed files with 29 additions and 48 deletions.
75 changes: 28 additions & 47 deletions docs/guide/bulk_rna_seq.ipynb
@@ -67,6 +67,7 @@
"import lnschema_bionty as lb\n",
"import pandas as pd\n",
"import os\n",
"import anndata as ad\n",
"from pathlib import Path\n",
"\n",
"ln.settings.verbosity = 3 # show hints"
@@ -95,7 +96,7 @@
"id": "3e1224fd",
"metadata": {},
"source": [
"[nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n",
"The Nextflow pipeline [nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n",
"\n",
"First, we create a new Transform object for our pipeline run."
]
@@ -130,7 +131,7 @@
"id": "b20dbc7d-0e75-4b06-8f7a-d540bffbdb44",
"metadata": {},
"source": [
"We download the [test data](https://github.com/nf-core/test-datasets/tree/rnaseq3) for the pipeline to track it with Lamin."
"We download the [test data](https://github.com/nf-core/test-datasets/tree/rnaseq3) for the pipeline which we track with Lamin."
]
},
{
@@ -140,6 +141,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%capture command\n",
"!git clone https://github.com/nf-core/test-datasets --single-branch --branch rnaseq3"
]
},
@@ -183,6 +185,7 @@
"outputs": [],
"source": [
"run.input_files.set(input_fastqs_file)\n",
"run.reference = \"lamin_rnaseq\"\n",
"run.reference_type = \"nextflow_name\""
]
},
@@ -201,7 +204,7 @@
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"LAMINDB_RUN_ID\"] = \"lamin_rnaseq\""
"os.environ[\"LAMINDB_RUN_ID\"] = run.reference"
]
},
{
@@ -216,13 +219,10 @@
"cell_type": "code",
"execution_count": null,
"id": "2219c55e",
"metadata": {
"jupyter": {
"outputs_hidden": true
}
},
"metadata": {},
"outputs": [],
"source": [
"%%capture command\n",
"!nextflow run nf-core/rnaseq -r 3.11.2 -profile test,docker --outdir rna-seq-results -name $LAMINDB_RUN_ID -resume"
]
},
@@ -231,7 +231,7 @@
"id": "a56e8a22-94dd-413b-989d-f13f59addbe6",
"metadata": {},
"source": [
"As a first step, we ingest all results from the pipeline run."
"As a first step, we ingest all multiqc plots from the pipeline run."
]
},
{
@@ -256,29 +256,6 @@
"multiqc_file"
]
},
{
"cell_type": "markdown",
"id": "8e3813b0-d2c8-4126-bc96-a0fd68cc8b98",
"metadata": {},
"source": [
"Let's examine the multiqc report:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48361e66-f50b-45d6-ae6a-1d6c24426d81",
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"from IPython.display import IFrame\n",
"\n",
"# Copying file to a directory accessible by the IPython Tornado web server\n",
"shutil.copy(multiqc_file.stage(), \"./multiqc_report.html\")\n",
"IFrame(src=\"multiqc_report.html\", width=1000, height=600)"
]
},
{
"cell_type": "markdown",
"id": "29bae36c-dac6-4314-b85b-f3afd7e47fbd",
@@ -300,35 +277,39 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1a58047-0c25-4632-b355-69610c6176f3",
"cell_type": "markdown",
"id": "22c88eed-61e0-4d12-96bb-ea4e10f476c0",
"metadata": {},
"outputs": [],
"source": [
"salmon_gene_counts_table = ln.File.from_df(salmon_gene_counts_table_df, run=run)\n",
"ln.save(salmon_gene_counts_table)"
"We curate the count table analogously to {doc}`docs:/bulkrna`."
]
},
{
"cell_type": "markdown",
"id": "813ae546-3b76-4aaa-ace0-4621eeadd839",
"cell_type": "code",
"execution_count": null,
"id": "5b0ca2da-8bff-4750-972d-3f1c0cdb28e8",
"metadata": {},
"outputs": [],
"source": [
"We further track all genes that are associated with the count table."
"salmon_gene_counts_table_df = salmon_gene_counts_table_df.T\n",
"var = pd.DataFrame(\n",
" {\"gene_name\": salmon_gene_counts_table_df.loc[\"gene_name\"].values},\n",
" index=salmon_gene_counts_table_df.loc[\"gene_id\"],\n",
")\n",
"adata = ad.AnnData(salmon_gene_counts_table_df.iloc[2:].astype(\"float32\"), var=var)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e655b46d-2bee-404e-9ecc-0d219e97d976",
"id": "c1a58047-0c25-4632-b355-69610c6176f3",
"metadata": {},
"outputs": [],
"source": [
"genes = ln.FeatureSet.from_values(\n",
" salmon_gene_counts_table_df[\"gene_name\"], lb.Gene.symbol\n",
"curated_salmon_gene_counts_file = ln.File.from_anndata(\n",
" adata, description=\"Curated bulk RNA counts\", var_ref=lb.Gene.stable_id, run=run\n",
")\n",
"salmon_gene_counts_table.features.add_feature_set(genes, slot=\"rna\")"
"ln.save(curated_salmon_gene_counts_file)"
]
},
{
@@ -338,7 +319,7 @@
"metadata": {},
"outputs": [],
"source": [
"salmon_gene_counts_table.describe()"
"curated_salmon_gene_counts_file.describe()"
]
},
{
@@ -354,7 +335,7 @@
"id": "8bba6911-70b6-4a99-a95e-6c9659435af6",
"metadata": {},
"source": [
"Lamin makes it easy to track pipeline executions and to ingest and output files that can subsequently be used for custom downstream analyses. This is complementary to nf-tower."
"Lamin makes it easy to track pipeline executions and to ingest input and output files that can subsequently be used for advanced downstream analyses. This is complementary to nf-tower."
]
}
],
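
The curation cell in this diff transposes the Salmon count table so that samples become rows and genes become columns, then separates the two identifier rows (`gene_id`, `gene_name`) as per-gene annotation before building the `AnnData` object. The reshaping can be sketched as follows. This is a minimal illustration, not the notebook's code: it uses plain Python lists and dicts instead of pandas and anndata, and the gene IDs, gene names, sample names, and the helper `to_samples_by_genes` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rows shaped like a gene-count table:
# one row per gene, two identifier columns, then one column per sample.
rows = [
    {"gene_id": "GENE0001", "gene_name": "geneA", "sample_1": 10, "sample_2": 0},
    {"gene_id": "GENE0002", "gene_name": "geneB", "sample_1": 3, "sample_2": 7},
    {"gene_id": "GENE0003", "gene_name": "geneC", "sample_1": 5, "sample_2": 1},
]

ID_COLUMNS = ("gene_id", "gene_name")


def to_samples_by_genes(
    table: List[dict],
) -> Tuple[List[str], Dict[str, Dict[str, str]], List[List[float]]]:
    """Split a genes-by-samples table into sample names, gene annotation, and a
    samples-by-genes matrix (hypothetical helper, for illustration only)."""
    # Every column that is not an identifier is treated as a sample.
    sample_names = [col for col in table[0] if col not in ID_COLUMNS]
    # Per-gene annotation keyed by gene_id (plays the role of `var`).
    var = {row["gene_id"]: {"gene_name": row["gene_name"]} for row in table}
    # One row per sample, one column per gene, values cast to float.
    matrix = [[float(row[sample]) for row in table] for sample in sample_names]
    return sample_names, var, matrix


sample_names, var, matrix = to_samples_by_genes(rows)
print(sample_names)
print(list(var))
print(matrix)
```

Keying the annotation by `gene_id` rather than `gene_name` mirrors the switch in this commit from validating against `lb.Gene.symbol` to `lb.Gene.stable_id`.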
2 changes: 1 addition & 1 deletion docs/guide/index.md
@@ -9,5 +9,5 @@ This makes it both easy for the user to understand the documentation, and for th
```{toctree}
:maxdepth: 1
quickstart
bulk_rna_seq
```
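
In the notebook diff, the run name is now set once as `run.reference` and copied into the `LAMINDB_RUN_ID` environment variable, which the `nextflow run ... -name $LAMINDB_RUN_ID` shell command then reads. This works because child processes inherit the parent's environment. A minimal sketch of that handoff, assuming a plain string in place of the Lamin `run` object (the variable `run_reference` and the helper `read_run_id_in_child` are hypothetical, and a Python child process stands in for the Nextflow call):

```python
import os
import subprocess
import sys

# Hypothetical stand-in for `run.reference` from the notebook.
run_reference = "lamin_rnaseq"

# Setting the variable in this process makes it visible to child processes.
os.environ["LAMINDB_RUN_ID"] = run_reference


def read_run_id_in_child() -> str:
    """Spawn a child process and return the LAMINDB_RUN_ID value it sees."""
    completed = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['LAMINDB_RUN_ID'])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


child_value = read_run_id_in_child()
print(child_value)
```

Deriving the environment variable from `run.reference` keeps the name in a single place, so the tracked run and the pipeline invocation cannot drift apart.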