From be3bc0634a9b17988190f30c7c33c9e4c545dd21 Mon Sep 17 00:00:00 2001
From: Anh Nguyet Vu
Date: Thu, 15 Aug 2024 15:52:28 -0700
Subject: [PATCH] Update vignette

---
 vignettes/annotate-nf-processed-data.Rmd | 34 +++++++++++-------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/vignettes/annotate-nf-processed-data.Rmd b/vignettes/annotate-nf-processed-data.Rmd
index f39292c..1521262 100644
--- a/vignettes/annotate-nf-processed-data.Rmd
+++ b/vignettes/annotate-nf-processed-data.Rmd
@@ -45,9 +45,6 @@ get back a list of manifests that represent all useful dataset products from the

 These manifests can then be used to annotate the files as well as for creation of [Synapse Datasets](https://help.synapse.org/docs/Datasets.2611281979.html).

-### Limitations
-
-- If sample ids and other information are updated on the original raw input files, data must reannotated (synced).

 ## Set up

@@ -72,6 +69,11 @@ and get into format expected for downstream.
 4. Transfer other meta from input to output processed files (most important are `individualID`, basic individual attributes, `assay`).
 5. Set annotations for processed data type based on workflow default rules.

+Given the above steps, some potential issues should be noted:
+- Processed data files will also be missing or incorrect for anything annotations in that state for input files
+- If sample ids and other information are updated on the original raw input files, data must be reannotated.
+- Anything that deviates from a relatively standard workflow run, leading to changes in locations or naming of outputs,
+might yield poor results for the annotation functionality here or require more manual composition of steps.

 ## nf-rnaseq

@@ -105,16 +107,21 @@ input <- map_sample_input_ss(samplesheet)
 output <- map_sample_output_rnaseq(syn_out, fileview)
 meta <- processed_meta(input, output, workflow_link = wf_link)

-# View the first dataset manifest in meta
+```

-meta$manifests$`STAR and Salmon`
+Inspect some manifests:
+```{r, eval=FALSE}
+
 head(meta$manifests$SAMtools)
 ```

+```{r, eval=FALSE}
+head(meta$manifests$`STAR and Salmon`
+```

 ### Add provenance

-Add provenance for the files involved using `add_activity_batch`.
+Use `sample_io` to add provenance meta with `add_activity_batch`.

 "Workflow" provides the general name to the activity,
 while "workflow link" provides a more persistent reference to some version/part of the workflow,
@@ -129,18 +136,6 @@ prov <- add_activity_batch(sample_io$output_id,
 sample_io$input_id)
 ```

-### Validate manifest
-
-Manifests can be inspected and validated using schematic before submission.
-To do so, it has to be written to a .csv first.
-
-```{r rnaseq-meta-validate, eval=FALSE}
-
-fwrite(manifest_1, "manifest_1.csv")
-manifest_validate(data_type = , file_name = "manifest_1.csv")
-
-```
-
 ### Submit manifest

 ```{r rnaseq-meta-submit, eval=FALSE}
@@ -199,4 +194,5 @@ add_activity_batch(sample_io$output_id,

 ```

-Validating manifests, submitting manifests, and making datasets can follow the rna-seq examples above.
+After provenance, the rest of the workflow for manifest submission or creating datasets is like the nf-rnaseq example.
+