💄 Fine tune wording

laminlabs · Aug 18, 2023 · 2b67628
1 parent 1728a30
Showing 1 changed file with 71 additions and 47 deletions.
diff --git a/docs/guide/bulk_rna_seq.ipynb b/docs/guide/bulk_rna_seq.ipynb
@@ -15,25 +15,17 @@
    "source": [
     "[Nextflow](https://www.nextflow.io/) is a workflow management system used for executing scientific workflows across platforms scalably, portably, and reproducibly.\n",
     "\n",
-    "The workflow [nf-core rnaseq](https://nf-co.re/rnaseq/3.12.0) is arguably one of the most popular pipelines for bulk RNA sequencing using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.\n"
+    "Here, we'll run `nf-core/rnaseq` to process `.fastq` files from bulk RNA sequencing using STAR, RSEM, HISAT2, or Salmon, with gene/isoform counts and extensive quality control ([reference](https://nf-co.re/rnaseq/3.12.0)).\n",
+    "\n",
+    "![](https://raw.githubusercontent.com/nf-core/rnaseq/3.12.0//docs/images/nf-core-rnaseq_metro_map_grey.png)\n"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "531093fd-67af-40fd-b481-fbf68828bcfd",
+   "id": "b7c8e52d",
    "metadata": {},
    "source": [
-    "## Setup"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f51e29c1",
-   "metadata": {},
-   "source": [
-    "To run this notebook, you need to load a LaminDB instance that has the `bionty` schema mounted.\n",
-    "\n",
-    "Here, we’ll create a test instance (skip if you’d like to run it using your instance):"
+    "Let's create a test instance:"
    ]
   },
   {
@@ -57,8 +49,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import lamindb as ln\n",
-    "from pathlib import Path"
+    "import lamindb as ln"
    ]
   },
   {
@@ -69,6 +60,14 @@
     "## Download test data"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4f32ae96",
+   "metadata": {},
+   "source": [
+    "Download test data using git:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -88,7 +87,7 @@
    "id": "be7f913a",
    "metadata": {},
    "source": [
-    "To keep track of the download, let's create a \"Download\" transform and a track a run pointing to the reference url:"
+    "Track the download:"
    ]
   },
   {
@@ -99,17 +98,16 @@
    "outputs": [],
    "source": [
     "download = ln.Transform(name=\"Download\")\n",
-    "ln.track(\n",
-    "    download, reference=\"https://github.com/nf-core/test-datasets\", reference_type=\"url\"\n",
-    ")"
+    "download_url = \"https://github.com/nf-core/test-datasets\"\n",
+    "ln.track(download, reference=download_url, reference_type=\"url\")"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "26d980c5",
    "metadata": {},
    "source": [
-    "Let's register the files we need from the download, they'll automatically be linked against the download run:"
+    "Register the input files; they'll automatically be linked to the download run:"
    ]
   },
   {
@@ -123,18 +121,18 @@
    },
    "outputs": [],
    "source": [
-    "input_fastqs_file = ln.File.from_dir(\"test-datasets/testdata/GSE110004/\")\n",
-    "ln.save(input_fastqs_file)\n",
-    "sample_sheet_file = ln.File(\"test-datasets/samplesheet/v3.10/samplesheet_test.csv\")\n",
-    "ln.save(sample_sheet_file)"
+    "sample_sheet = ln.File(\"test-datasets/samplesheet/v3.10/samplesheet_test.csv\")\n",
+    "ln.save(sample_sheet)\n",
+    "input_fastqs = ln.File.from_dir(\"test-datasets/testdata/GSE110004/\")\n",
+    "ln.save(input_fastqs)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "f915ff7a",
    "metadata": {},
    "source": [
-    "Let's visualize data lineage for one of the files:"
+    "Visualize data lineage for one of the files:"
    ]
   },
   {
@@ -144,23 +142,31 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "sample_sheet_file.view_lineage()"
+    "sample_sheet.view_lineage()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "ecb68cf2-1188-4f8b-a2ab-01c60d5779b8",
    "metadata": {},
    "source": [
-    "## Track the nf-core rnaseq run"
+    "## Track the Nextflow run"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3b698d87",
+   "metadata": {},
+   "source": [
+    "(We'd start here if input files were tracked in the cloud with LaminDB rather than downloaded through git.)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "3e1224fd",
    "metadata": {},
    "source": [
-    "Let's now track the Nextflow workflow:"
+    "Track the Nextflow pipeline and its run:"
    ]
   },
   {
@@ -176,7 +182,6 @@
     "    type=\"pipeline\",\n",
     "    reference=\"https://github.com/laminlabs/nextflow-lamin-usecases\",\n",
     ")\n",
-    "\n",
     "ln.track(nextflow_bulkrna)"
    ]
   },
@@ -185,7 +190,9 @@
    "id": "670533a7",
    "metadata": {},
    "source": [
-    "If we now stage input files, they'll be tracked as run inputs (if input data is tracked in the cloud and registered in LaminDB, this is where we'd typcically start):"
+    "If we now stage input files, they'll be tracked as run inputs.\n",
+    "\n",
+    "(As data is already locally available in this test case, staging won't download anything.)"
    ]
   },
   {
@@ -199,16 +206,16 @@
    },
    "outputs": [],
    "source": [
-    "sample_sheet_file.stage()\n",
-    "[input_fastq.stage() for input_fastq in input_fastqs_file]"
+    "sample_sheet.stage()\n",
+    "[input_fastq.stage() for input_fastq in input_fastqs]"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "17f9905e-0a34-4335-b0c4-eb9b598c8eaf",
    "metadata": {},
    "source": [
-    "We'll pass the LaminDB run id to the nextflow run, so that we can easily find it from within Nextflow:"
+    "All data is now in place and we can run the Nextflow pipeline:"
    ]
   },
   {
@@ -225,6 +232,14 @@
     "!nextflow run nf-core/rnaseq -r 3.11.2 -profile test,docker --outdir rna-seq-results -name {ln.dev.run_context.run.id} -resume"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "58eea7fc",
+   "metadata": {},
+   "source": [
+    "Here, we passed the LaminDB run id to Nextflow as the run name (`-name`) so that we can query it from within Nextflow."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "fb81c953",
@@ -244,26 +259,27 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6e7b5f1d-b00b-43d3-bc46-83b14144a8ba",
-   "metadata": {
-    "tags": []
-   },
+   "id": "7140018a-9ef7-4136-a595-37b514c66a81",
+   "metadata": {},
    "outputs": [],
    "source": [
-    "# this would register 240 files, we don't need them here\n",
-    "# multiqc_results = ln.File.from_dir(\"rna-seq-results/multiqc/\")\n",
-    "# ln.save(multiqc_results)"
+    "multiqc_file = ln.File(\"rna-seq-results/multiqc/star_salmon/multiqc_report.html\")\n",
+    "multiqc_file.save()"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7140018a-9ef7-4136-a595-37b514c66a81",
+   "cell_type": "markdown",
+   "id": "a588717f",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "multiqc_file = ln.File(\"rna-seq-results/multiqc/star_salmon/multiqc_report.html\")\n",
-    "multiqc_file.save()"
+    ":::{dropdown} How would I register all QC files?\n",
+    "\n",
+    "```python\n",
+    "multiqc_results = ln.File.from_dir(\"rna-seq-results/multiqc/\")\n",
+    "ln.save(multiqc_results)\n",
+    "```\n",
+    "\n",
+    ":::"
    ]
   },
   {
@@ -285,12 +301,20 @@
     "count_matrix.save()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "dd98074b",
+   "metadata": {},
+   "source": [
+    "## Link biological entities"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "22c88eed-61e0-4d12-96bb-ea4e10f476c0",
    "metadata": {},
    "source": [
-    "To make it queryable by biological entities (genes, etc.), we can now proceed with: {doc}`docs:bulkrna`"
+    "To make the count matrix queryable by biological entities (genes, experimental metadata, etc.), we can now proceed with: {doc}`docs:bulkrna`"
    ]
   },
   {