Merge branch 'main' of https://github.com/lincbrain/linc-docs

lincbrain · Dec 30, 2024 · ee7a7d3 · ee7a7d3
2 parents 52483c5 + 72da2cc
commit ee7a7d3
Showing 1 changed file with 89 additions and 43 deletions.
diff --git a/docs/upload.md b/docs/upload.md
@@ -1,24 +1,27 @@
 # Uploading data to lincbrain.org
 **Note:** The steps below require access to https://lincbrain.org, which is a private repository for LINC project investigators.
 
-## Create a new dataset or contribute to an existing dataset
-A dataset refers to a collection of brains that have been processed and imaged in a similar way. It typically contains data from multiple brains or samples, possibly imaged with multiple modalities.
+## Contribute to an existing dataset or create a new dataset
+Before you create a new dataset, please look over the existing datasets and make sure that there is no dataset appropriate for adding your new images to. Only create a new dataset as a last resort. If your images come from a brain that is already included in an existing dataset, definitely add to the existing dataset, even if the existing images come from a different part of that brain and/or were imaged with a different modality than yours. A dataset refers to a collection of brains that have been processed and imaged for a similar purpose. It typically contains data from multiple brains or samples, possibly imaged with multiple modalities.
 
+1. Contribute to an existing dataset
+    1. Log into the [LINC Data Platform](lincbrain.org).
+    1. Browse the existing datasets under the [SHARED DATASETS](https://lincbrain.org/dandiset) tab.
+    1. Contact the `Owner` by email requesting that they add you as an `Owner` of the dataset.
 1. Create a new dataset
     1. Log into the [LINC Data Platform](lincbrain.org).
+    1. Browse the existing datasets under the [SHARED DATASETS](https://lincbrain.org/dandiset) tab. Make sure that you have followed the guidelines above and there is no existing dataset that your new files can be added to.
     1. Click on the `NEW DATASET` button at the top right of the page.
     1. Fill out the title, description, and license. (The license option exists only because lincbrain.org is a clone of DANDI. It has no effect here, as lincbrain.org is a private repository.)
     1. Click on the `REGISTER DATASET` button to create the new dataset.
-1. Contribute to an existing dataset
-    1. Log into the [LINC Data Platform](lincbrain.org).
-    1. Browse the existing datasets under the [SHARED DATASETS](https://lincbrain.org/dandiset) tab.
-    1. Contact the `Owner` by email requesting that they add you as an `Owner` of the dataset.
 
 ## Install the dandi command-line interface (CLI)
 On your local machine, install the [dandi CLI package](https://pypi.org/project/dandi/) in a python environment:
 
 `pip install dandi`
 
+**Note:** If you are in the MGH LINC team, do not do this step. Instead, ask your labmates how to activate the existing dandi python environment.
+
 ## Copy your lincbrain API key
 Log into lincbrain.org and click on the button with your initials at the top right of the page. Copy your API key and enter it in the following environment variable on your local machine:
 
@@ -50,21 +53,28 @@ The above example will create a directory called `/path/to/my/staging/area/10101
 For more information, on the `--preserve-tree` option please see the [DANDI Handbook](https://www.dandiarchive.org/handbook/12_download/#download-the-dandisetyaml-file-and-a-specific-file-within-the-directory-tree-of-the-dandiset).
 
 ## Organize your data
-An example of how to organize a dataset that includes dMRI and histology data from two brains is shown below:
+An example from a (fictional) dataset that includes dMRI and histology data from two brains is shown below. This can be used as a guide for organizing your own dataset. A few things to watch out for:
+1. If you are creating a new dataset, you have to create the `dataset_description.json` file.
+1. If you are adding data from a new subject to an existing dataset, you have to add an entry with the new subject's information to the `participants.tsv` file.
+1. If you are adding data from a new sample (part of the brain) to an existing dataset, you have to add an entry with the new sample's information to the `samples.tsv` file.
+1. Any data files that you add must go under:
+   * `rawdata/` if they are raw (unprocessed) data in NIfTI or Zarr format
+   * `sourcedata/` if they are raw (unprocessed) data in other formats
+   * `derivatives/` if they are the outputs of pre-processing or other downstream analyses of the raw data 
 
 ```
 101010/
   dataset_description.json
   participants.tsv
+  samples.tsv
   rawdata/
     sub-Ken1/
       dwi/
-        sub-Ken1_acq-DSI_dwi.bval
-        sub-Ken1_acq-DSI_dwi.bvec
-        sub-Ken1_acq-DSI_dwi.json
-        sub-Ken1_acq-DSI_dwi.nii.gz
+        sub-Ken1_sample-brain_acq-DSI_dwi.bval
+        sub-Ken1_sample-brain_acq-DSI_dwi.bvec
+        sub-Ken1_sample-brain_acq-DSI_dwi.json
+        sub-Ken1_sample-brain_acq-DSI_dwi.nii.gz
       micr/
-        samples.tsv
         sub-Ken1_sample-slice0001_photo.json
         sub-Ken1_sample-slice0001_photo.tif
         sub-Ken1_sample-slice0001_stain-Nissl_BF.json
@@ -83,16 +93,15 @@ An example of how to organize a dataset that includes dMRI and histology data fr
         sub-Ken1_sample-slice0010_stain-LY_DF.tif
     sub-Ken2/  
       dwi/
-        sub-Ken2_acq-DSI_dwi.bval
-        sub-Ken2_acq-DSI_dwi.bvec
-        sub-Ken2_acq-DSI_dwi.json
-        sub-Ken2_acq-DSI_dwi.nii.gz
-        sub-Ken2_acq-MulShellMulTE_dwi.bval
-        sub-Ken2_acq-MulShellMulTE_dwi.bvec
-        sub-Ken2_acq-MulShellMulTE_dwi.json
-        sub-Ken2_acq-MulShellMulTE_dwi.nii.gz
+        sub-Ken2_sample-brain_acq-DSI_dwi.bval
+        sub-Ken2_sample-brain_acq-DSI_dwi.bvec
+        sub-Ken2_sample-brain_acq-DSI_dwi.json
+        sub-Ken2_sample-brain_acq-DSI_dwi.nii.gz
+        sub-Ken2_sample-brain_acq-MulShellMulTE_dwi.bval
+        sub-Ken2_sample-brain_acq-MulShellMulTE_dwi.bvec
+        sub-Ken2_sample-brain_acq-MulShellMulTE_dwi.json
+        sub-Ken2_sample-brain_acq-MulShellMulTE_dwi.nii.gz
       micr/
-        samples.tsv
         sub-Ken2_sample-slice0001_photo.json
         sub-Ken2_sample-slice0001_photo.tif
         sub-Ken2_sample-slice0001_stain-Nissl_BF.json
@@ -117,11 +126,28 @@ An example of how to organize a dataset that includes dMRI and histology data fr
         sub-Ken2_sample-slice0011_photo.tif
         sub-Ken2_sample-slice0011_stain-FR_DF.json
         sub-Ken2_sample-slice0011_stain-FR_DF.tif
+  derivatives/
+    sub-Ken1/
+      dwi/
+        ...
+      scenes/
+        ...
+      xfms/
+        ...
+    sub-Ken2/
+      dwi/
+        ...
+      scenes/
+        ...
+      xfms/
+        ...
 ```
 
 The files and subdirectories in this example dataset are described in detail below.
 
-### dataset_description.json
+### Top-level files
+
+#### dataset_description.json
 This text file is [described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html#dataset_descriptionjson). A minimal `dataset_description.json` would look like this:
 
 ```
@@ -131,7 +157,7 @@ This text file is [described in detail in the BIDS specification](https://bids-s
 }
 ```
 
-### participants.tsv
+#### participants.tsv
 This text file is [described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html#participants-file). For this dataset, the `participants.tsv` might look like this:
 
 ```
@@ -140,21 +166,54 @@ sub-Ken1 43 M healthy
 sub-Ken2 61 M hypertension
 ```
 
-### rawdata
-This directory contains one subdirectory for each brain, which contain one subdirectory for each modality, which in turn contain raw image data files named according to the BIDS specification.
+#### samples.tsv
+This text file is [described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html#samples-file). For this dataset, the `samples.tsv` would look like this:
 
-### dwi
+```
+participant_id sample_id sample_type
+sub-Ken1 sample-brain tissue
+sub-Ken1 sample-slice0001 tissue
+sub-Ken1 sample-slice0002 tissue
+sub-Ken1 sample-slice0009 tissue
+sub-Ken1 sample-slice0010 tissue
+sub-Ken2 sample-brain tissue
+sub-Ken2 sample-slice0001 tissue
+sub-Ken2 sample-slice0002 tissue
+sub-Ken2 sample-slice0003 tissue
+sub-Ken2 sample-slice0009 tissue
+sub-Ken2 sample-slice0010 tissue
+sub-Ken2 sample-slice0011 tissue
+```
+
+Label the sample as `brain` if the whole brain was imaged. Label it as `lefthemi` or `righthemi` if a whole hemisphere was imaged.
+
+### rawdata/
+This directory contains one subdirectory for each brain, which contains one subdirectory for each modality, which in turn contains raw image data files named according to the BIDS specification.
+
+The name of every image or other data file under `rawdata/` must start with a subject and sample label, in that order. Examples: 
+* `sub-Ken1_sample-brain_*.nii.gz`: NIfTI images of the whole brain of subject Ken1
+* `sub-Ken2_sample-slice0001_*.json`: metadata files associated with images of a section of subject Ken2
+
+#### dwi/
 This directory contains dMRI data files [as described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/magnetic-resonance-imaging-data.html#diffusion-imaging-data). 
 
-In this example the data include images (`.nii.gz`), b-value tables (`.bval`), gradient tables (`.bvec`), and metadata (`.json`). Data from Ken1 were acquired with a DSI scheme, whereas data from Ken2 were acquired both with a DSI scheme and with a multi-shell, multi-echo scheme.
+In this example the data include images (`.nii.gz`), b-value tables (`.bval`), gradient tables (`.bvec`), and metadata (`.json`). The name of every image or other data file under `rawdata/dwi/` must contain the subject, sample, and acquisition label, in that order, followed by the `dwi` suffix. Examples:
+* `sub-Ken1_sample-brain_acq-DSI_dwi.nii.gz`: NIfTI volumes of DWIs acquired with a DSI scheme from the whole brain of subject Ken1
+* `sub-Ken2_sample-brain_acq-MulShellMulTE_dwi.bval`: b-value table of a scan acquired with a multi-shell, multi-echo scheme, from the whole brain of subject Ken2
 
-### micr
+#### micr/
 This directory contains microscopy data files [as described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html).
 
 In this example the data include images (`.tif`) and metadata (`.json`) from multiple brain sections. For each section there is a blockface photo (`_photo`) and a histological stain (`_stain`). Sections from Ken1 and Ken2 were either processed with a Nissl stain and imaged under brightfield microscopy (`_BF`), or processed for the fluorescent tracer Lucifer Yellow (`LY`) and imaged under darkfield microscopy (`_DF`). Additional sections from Ken2 were processed for the fluorescent tracer Fluoro-Ruby (`FR`) and imaged under darkfield microscopy (`_DF`).
 
+### derivatives/
+This directory contains one subdirectory for each brain, which contains one subdirectory for each modality, which in turn contains any data files that were produced by analyzing the raw data of that modality. At the same level as the modality-specific directories, there can also be directories for saving files that are typically produced by co-analyzing images of more than one modality, e.g., `xfms/` for transform files or `scenes/` for scene files.
 
-#### high-res histology annotation
+The name of every image or data file under `derivatives/` must start its subject and sample label, in that order. For any files that have been transformed to a space other than the native space, i.e., the one where the images were acquired, this must be followed by a space label. Do not use a space label for files that are in the native space. The name of the file must end with a description label that includes the methodology followed by the name of the derivative. Examples:
+* `sub-Ken1_sample-brain_desc-DTI_FA.nii.gz`: a fractional anisotropy (FA) map derived from a diffusion tensor imaging (DTI) analysis of a whole-brain scan of subject Ken1, in its native (individual DWI) space
+* `sub-Ken2_sample-brain_space-CIT168_desc-CSD_tractography.trk`: a tractogram derived from a constrained spherical deconvolution (CSD) analysis of a whole-brain scan of subject Ken2, transformed to CIT168 template space
+
+#### Special case: high-res histology annotation
 All annotation files use the following naming scheme:
 `<dataset-name> + _desc-[label] + _suffix.ome.zarr`
 where `[label]` is replaced by the annotator's initials and `_suffix` indicates the type of segmentations being annotated. Specifically, when annotating discrete segmentations, use `_dseg` as the suffix (see the [BIDS spec on discrete segmentations](https://bids-specification.readthedocs.io/en/stable/derivatives/imaging.html#discrete-segmentations)).
@@ -170,19 +229,6 @@ index  name
 ...
 ```
 
-
-
-### samples.tsv
-This text file is [described in detail in the BIDS specification](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html#samples-file). For Ken1, the `samples.tsv` would look like this:
-
-```
-participant_id sample_id sample_type
-sub-Ken1 sample-slice0001 tissue
-sub-Ken1 sample-slice0002 tissue
-sub-Ken1 sample-slice0009 tissue
-sub-Ken1 sample-slice0010 tissue
-```
-
 ## Upload your data
 If you are uploading to a new or existing dataset on lincbrain.org, please ensure that you have the `dandiset.yaml` and `dataset_description.json` files in your local directory (i.e. the `/path/to/my/staging/area/101010` in the example above).
 
@@ -206,4 +252,4 @@ dandi delete -i linc /path/to/my/staging/area/101010/rawdata/Ken2/horses
 
 ```
 dandi delete -i linc "https://lincbrain.org/dandiset/101010/draft/files?location=rawdata%2Fsub-Ken2%2Fhorses"
-```
+```