Commit

💄 Fine tune wording
falexwolf committed Aug 18, 2023
1 parent 1728a30 commit 2b67628
Showing 1 changed file with 71 additions and 47 deletions.
118 changes: 71 additions & 47 deletions docs/guide/bulk_rna_seq.ipynb
@@ -15,25 +15,17 @@
"source": [
"[Nextflow](https://www.nextflow.io/) is a workflow management system used for executing scientific workflows across platforms scalably, portably, and reproducibly.\n",
"\n",
"The workflow [nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n"
"Here, we'll run `nf-core/rnaseq` to process `.fastq` files from bulk RNA sequencing using STAR, RSEM, HISAT2, Salmon with gene/isoform counts and extensive quality control ([reference](https://nf-co.re/rnaseq/3.12.0)).\n",
"\n",
"![](https://raw.githubusercontent.com/nf-core/rnaseq/3.12.0//docs/images/nf-core-rnaseq_metro_map_grey.png)\n"
]
},
{
"cell_type": "markdown",
"id": "531093fd-67af-40fd-b481-fbf68828bcfd",
"id": "b7c8e52d",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "f51e29c1",
"metadata": {},
"source": [
"To run this notebook, you need to load a LaminDB instance that has the `bionty` schema mounted.\n",
"\n",
"Here, we’ll create a test instance (skip if you’d like to run it using your instance):"
"Let's create a test instance:"
]
},
{
@@ -57,8 +49,7 @@
"metadata": {},
"outputs": [],
"source": [
"import lamindb as ln\n",
"from pathlib import Path"
"import lamindb as ln"
]
},
{
@@ -69,6 +60,14 @@
"## Download test data"
]
},
{
"cell_type": "markdown",
"id": "4f32ae96",
"metadata": {},
"source": [
"Download test data using git:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -88,7 +87,7 @@
"id": "be7f913a",
"metadata": {},
"source": [
"To keep track of the download, let's create a \"Download\" transform and a track a run pointing to the reference url:"
"Track the download:"
]
},
{
@@ -99,17 +98,16 @@
"outputs": [],
"source": [
"download = ln.Transform(name=\"Download\")\n",
"ln.track(\n",
" download, reference=\"https://github.com/nf-core/test-datasets\", reference_type=\"url\"\n",
")"
"download_url = \"https://github.com/nf-core/test-datasets\"\n",
"ln.track(download, reference=download_url, reference_type=\"url\")"
]
},
{
"cell_type": "markdown",
"id": "26d980c5",
"metadata": {},
"source": [
"Let's register the files we need from the download, they'll automatically be linked against the download run:"
"Register input files - they'll automatically be linked against the download run:"
]
},
{
@@ -123,18 +121,18 @@
},
"outputs": [],
"source": [
"input_fastqs_file = ln.File.from_dir(\"test-datasets/testdata/GSE110004/\")\n",
"ln.save(input_fastqs_file)\n",
"sample_sheet_file = ln.File(\"test-datasets/samplesheet/v3.10/samplesheet_test.csv\")\n",
"ln.save(sample_sheet_file)"
"sample_sheet = ln.File(\"test-datasets/samplesheet/v3.10/samplesheet_test.csv\")\n",
"ln.save(sample_sheet)\n",
"input_fastqs = ln.File.from_dir(\"test-datasets/testdata/GSE110004/\")\n",
"ln.save(input_fastqs)"
]
},
{
"cell_type": "markdown",
"id": "f915ff7a",
"metadata": {},
"source": [
"Let's visualize data lineage for one of the files:"
"Visualize data lineage for one of the files:"
]
},
{
@@ -144,23 +142,31 @@
"metadata": {},
"outputs": [],
"source": [
"sample_sheet_file.view_lineage()"
"sample_sheet.view_lineage()"
]
},
{
"cell_type": "markdown",
"id": "ecb68cf2-1188-4f8b-a2ab-01c60d5779b8",
"metadata": {},
"source": [
"## Track the nf-core rnaseq run"
"## Track the Nextflow run"
]
},
{
"cell_type": "markdown",
"id": "3b698d87",
"metadata": {},
"source": [
"(We'd start here if input files were tracked in the cloud with LaminDB rather than downloaded through git.)"
]
},
{
"cell_type": "markdown",
"id": "3e1224fd",
"metadata": {},
"source": [
"Let's now track the Nextflow workflow:"
"Track the Nextflow pipeline & run:"
]
},
{
@@ -176,7 +182,6 @@
" type=\"pipeline\",\n",
" reference=\"https://github.com/laminlabs/nextflow-lamin-usecases\",\n",
")\n",
"\n",
"ln.track(nextflow_bulkrna)"
]
},
@@ -185,7 +190,9 @@
"id": "670533a7",
"metadata": {},
"source": [
"If we now stage input files, they'll be tracked as run inputs (if input data is tracked in the cloud and registered in LaminDB, this is where we'd typcically start):"
"If we now stage input files, they'll be tracked as run inputs.\n",
"\n",
"(As data is already locally available in this test case, staging won't download anything.)"
]
},
{
@@ -199,16 +206,16 @@
},
"outputs": [],
"source": [
"sample_sheet_file.stage()\n",
"[input_fastq.stage() for input_fastq in input_fastqs_file]"
"sample_sheet.stage()\n",
"[input_fastq.stage() for input_fastq in input_fastqs]"
]
},
{
"cell_type": "markdown",
"id": "17f9905e-0a34-4335-b0c4-eb9b598c8eaf",
"metadata": {},
"source": [
"We'll pass the LaminDB run id to the nextflow run, so that we can easily find it from within Nextflow:"
"All data is now in place and we can run the nextflow pipeline:"
]
},
{
@@ -225,6 +232,14 @@
"!nextflow run nf-core/rnaseq -r 3.11.2 -profile test,docker --outdir rna-seq-results -name {ln.dev.run_context.run.id} -resume"
]
},
{
"cell_type": "markdown",
"id": "58eea7fc",
"metadata": {},
"source": [
"Here, we passed the LaminDB run id to nextflow so that we can query it from within nextflow."
]
},
{
"cell_type": "markdown",
"id": "fb81c953",
@@ -244,26 +259,27 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6e7b5f1d-b00b-43d3-bc46-83b14144a8ba",
"metadata": {
"tags": []
},
"id": "7140018a-9ef7-4136-a595-37b514c66a81",
"metadata": {},
"outputs": [],
"source": [
"# this would register 240 files, we don't need them here\n",
"# multiqc_results = ln.File.from_dir(\"rna-seq-results/multiqc/\")\n",
"# ln.save(multiqc_results)"
"multiqc_file = ln.File(\"rna-seq-results/multiqc/star_salmon/multiqc_report.html\")\n",
"multiqc_file.save()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7140018a-9ef7-4136-a595-37b514c66a81",
"cell_type": "markdown",
"id": "a588717f",
"metadata": {},
"outputs": [],
"source": [
"multiqc_file = ln.File(\"rna-seq-results/multiqc/star_salmon/multiqc_report.html\")\n",
"multiqc_file.save()"
":::{dropdown} How would I register all QC files?\n",
"\n",
"```python\n",
"multiqc_results = ln.File.from_dir(\"rna-seq-results/multiqc/\")\n",
"ln.save(multiqc_results)\n",
"```\n",
"\n",
":::"
]
},
{
@@ -285,12 +301,20 @@
"count_matrix.save()"
]
},
{
"cell_type": "markdown",
"id": "dd98074b",
"metadata": {},
"source": [
"## Link biological entities"
]
},
{
"cell_type": "markdown",
"id": "22c88eed-61e0-4d12-96bb-ea4e10f476c0",
"metadata": {},
"source": [
"To make it queryable by biological entities (genes, etc.), we can now proceed with: {doc}`docs:bulkrna`"
"To make the count matrix queryable by biological entities (genes, experimental metadata, etc.), we can now proceed with: {doc}`docs:bulkrna`"
]
},
{
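For readers who want to launch the pipeline outside a notebook cell, the `!nextflow run ...` invocation in the diff can be assembled as an argument list. This is a minimal sketch, not code from the commit: `build_nextflow_command` is a hypothetical helper, and `"example-run-id"` is a placeholder for the value the notebook reads from `ln.dev.run_context.run.id`. The flags and values are copied from the notebook cell.

```python
import shlex
from typing import List


def build_nextflow_command(run_id: str, outdir: str = "rna-seq-results") -> List[str]:
    """Assemble the `nextflow run` invocation shown in the notebook cell.

    Hypothetical helper for illustration; not part of LaminDB or Nextflow.
    """
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "-r", "3.11.2",             # pipeline revision used in the notebook
        "-profile", "test,docker",  # test profile, executed via Docker
        "--outdir", outdir,         # where the pipeline writes its results
        "-name", run_id,            # run name, set to the LaminDB run id
        "-resume",                  # reuse cached results where possible
    ]


# Placeholder id; in the notebook this comes from ln.dev.run_context.run.id
command = build_nextflow_command("example-run-id")
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run(command, check=True)`, which avoids interpolating the run id into a shell string.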
