20 changes: 10 additions & 10 deletions doc/cellmap/fibsem-transfer-flow.md
```mermaid
graph TD
SCOPE_DAT_A(<b>jeiss scope</b><br> z1_0-0-0.dat, z1_0-0-1.dat, <br> z1_0-1-0.dat, z1_0-1-1.dat) --> TRANSFER_DAT
SCOPE_DAT_B(<b>jeiss scope</b><br> zn_0-0-0.dat, zn_0-0-1.dat, <br> zn_0-1-0.dat, zn_0-1-1.dat) --> TRANSFER_DAT
TRANSFER_DAT{{transfer}} --> DM11_DAT_A & DM11_DAT_B
DM11_DAT_A(<b>prfs</b><br><s> z1_0-0-0.dat, z1_0-0-1.dat, <br> z1_0-1-0.dat, z1_0-1-1.dat </s>) --> DAT_TO_H5
DM11_DAT_B(<b>prfs</b><br><s> zn_0-0-0.dat, zn_0-0-1.dat <br> zn_0-1-0.dat, zn_0-1-1.dat </s>) --> DAT_TO_H5
DAT_TO_H5{{ convert and <br> remove prfs dats <br> after verification }} --> DM11_RAW_H5 & NRS_ALIGN_H5
DM11_RAW_H5(<b>prfs</b><br><s> z1.raw.h5, ...<br> zn.raw.h5 </s>) --> ARCHIVE_H5
NRS_ALIGN_H5(<b>nrs</b><br> z1.uint8.h5, ...<br> zn.uint8.h5)
ARCHIVE_H5{{archive and <br> remove prfs raw h5s <br> after verification}} --> NEARLINE_RAW_H5
NEARLINE_RAW_H5(<b>nearline</b><br> z1.raw-archive.h5, ...<br> zn.raw-archive.h5)
```
49 changes: 49 additions & 0 deletions doc/cellmap/project_management.md
# Project management
All projects are tracked on the [recon_fibsem repository](https://github.com/JaneliaSciComp/recon_fibsem) on GitHub.
This repository is used for documenting and exchanging the information necessary to ingest and process new datasets, tracking the progress of the processing, and reviewing the processed datasets.

```mermaid
sequenceDiagram
participant F as FIB-SEM <br/> shared resource
participant S as SciComp
participant G as GitHub
participant C as Collaborator

F->>G: Create new issue
G-->>S: Notify
activate S
S->>G: Include in project board
Note over S: Ingestion and processing
loop Review
S->>G: Post updates and questions
G-->>C: Notify
C->>G: Review and feedback
G-->>S: Notify
end
S->>C: Finalize and hand off
deactivate S
```

## Issues
For every new dataset, an issue is opened. The information to include and the processes this triggers are detailed in [part 1 of the pipeline documentation](steps/2-1_render_import.md). Here, we concentrate on managing the issues.

GitHub is mainly used to track the progress of the processing and to communicate with the collaborators. This can be sped up by assigning the issue to the responsible person, using labels, and tagging the people whose feedback is needed.

### Labels

There are three different categories of labels:

- **Workflow labels**: `00: imaging in progress`, `01: prep done`, `02: alignment done`, `03: alignment review done`, `04: z correction done`, `05: streak correction done`, `06: intensity correction done`, `07: straightening done`, `09: clean-up done`
- **Collaborator labels**: `Cell Map`, `eFIB-SEM SR` (the FIB-SEM shared resource), `Fly EM`
- **TODO labels**: `needs alignment`, `needs clean-up`, `needs cross-correlation`, `needs N5 creation`, `needs review`, `needs straightening`, `needs streak correction`, `non-rigid tile deformations`, `alignment review concerns`

Each dataset should have the right collaborator label. Apart from that, there are no strict rules about labeling. Use the TODO labels as needed to keep track of what still needs to be done. The workflow labels document the processing steps that have already been completed. When processing of a dataset is finished, all workflow labels below 06 are usually removed, since they are recorded in the issue history anyway.
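
Labels and assignees can also be set from the command line with the GitHub CLI (a minimal sketch, assuming `gh` is installed and authenticated; the issue number and username are placeholders):

```bash
# Sketch: add the collaborator label and a TODO label, and assign the issue
# (issue number and username are placeholders).
gh issue edit 123 --repo JaneliaSciComp/recon_fibsem \
    --add-label "Cell Map" --add-label "needs alignment" \
    --add-assignee <github-username>
```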

### Project board

All issues are collected on the [project board](https://github.com/JaneliaSciComp/recon_fibsem/projects/1), which has four columns:

- **Alignment**: All new issues are automatically entered in this column and stay there during all the standard processing steps.
- **Review**: Once processing is done, the issue is moved to this column, where Preibisch and the collaborators have a final look at it. If further processing is necessary, the issue stays in this column until all concerns have been addressed.
- **Done**: Once the dataset passes the review process, it is moved to this column to signal that we don’t do further processing on the dataset. The dataset is now ready to be handed off to the collaborator.
- **Cleaned Up**: Once the collaborator has taken over the dataset and has indicated that no further processing is necessary, the issue is moved to this column. At this stage, all intermediate data used for processing (including the 8-bit images for alignment) are deleted. This clean-up is done only irregularly, when disk space is scarce.
68 changes: 68 additions & 0 deletions doc/cellmap/steps/1_transfer.md
# Transfer data from scope to shared filesystem

```bash
# Commonly used paths
EMRP_ROOT="/groups/flyem/data/render/git/EM_recon_pipeline"
TRANSFER_INFO_DIR="src/resources/transfer_info/cellmap"
TRANSFER_DIR="/groups/flyem/home/flyem/bin/dat_transfer/2022"
```

## Get necessary metadata
```mermaid
flowchart LR
github(Issue on<br>GitHub)
json{{<tt>volume_transfer_info.&lt;dataset&gt;.json</tt>}}
scope(Scope)
github-- Extract data --->json
json<-. Check data ..->scope

```
There is an issue template that has to be filled out by the FIB-SEM shared resource to gather the data necessary for subsequent processing. This information should be converted to a file called `volume_transfer_info.<dataset>.json`, where `<dataset>` is the dataset name as specified by the collaborator. The project name must not contain dashes, because they cause problems with MongoDB. Put this file under `${TRANSFER_INFO_DIR}` in your own copy of this repository and commit it.
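
For illustration only, such a file might look roughly like this (a sketch with hypothetical values; the field names are taken from the checks described below, and the existing files under `${TRANSFER_INFO_DIR}` are the authoritative reference for the actual schema):

```bash
# Sketch only: structure and values are illustrative, not the authoritative schema.
cat > "${EMRP_ROOT}/${TRANSFER_INFO_DIR}/volume_transfer_info.<dataset>.json" <<'EOF'
{
  "data_set_id": "jrc_example_dataset",
  "root_dat_path": "/cygdrive/d/images/jrc_example_dataset",
  "root_keep_path": "/cygdrive/d/UploadFlags",
  "rows_per_z_layer": 2,
  "columns_per_z_layer": 4
}
EOF
```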

It is best to double-check the validity of the data from the GitHub issue; some post-processing is sometimes necessary to make sure all required data are present and correct. To this end, log in to the scope as user `flyem` and navigate to the `root_keep_path` specified in the issue, e.g.:
```bash
su flyem
ssh jeiss7.hhmi.org
cd /cygdrive/d/UploadFlags
```
Make sure that the following data are correct:
* `data_set_id` and `root_dat_path`, which can be directly seen from the filenames listed in the directory; the `root_dat_path` should not have any timestamp in it;
* `columns_per_z_layer` and `rows_per_z_layer`, which can be inferred from the file suffix, e.g., `*_0-1-3.dat` being the highest suffix means that there are 2 rows and 4 columns (see the sketch below).
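
A quick way to sanity-check the grid size from the scope directory (a sketch, assuming the suffix pattern shown above):

```bash
# Print the highest tile suffix; e.g. 0-1-3 implies 2 rows and 4 columns (indices are zero-based).
ls *.dat | sed -e 's/.*_//' -e 's/\.dat$//' | sort -t- -k2,2n -k3,3n | tail -1
```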

This is also a good point to check if there is enough space on the collaborator's drives for the data and to increase the quota if necessary. The correct group for billing can be found with `lsfgroup <username of the collaborator>`.
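
For example (a sketch; the group directory path is illustrative, and `df` only gives a rough view of free space rather than the actual quota):

```bash
# Look up the collaborator's billing group, then get a rough idea of free space on their share.
lsfgroup <username of the collaborator>
df -h /groups/<collaborator group>
```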

## Transfer and conversion
```mermaid
flowchart LR
jenkins(Jenkins<br>server)
scope(Scope)
prfs(prfs)
nrs(nrs)
nearline(nearline)
jenkins-.->|"Initiate transfer (30min)"|scope
scope-->|Raw .dat|prfs
jenkins-.->|"Initiate conversion (2h)"|prfs
prfs--->|16-bit HDF5|prfs
prfs--->|8-bit HDF5 for processing|nrs
prfs--->|move 16-bit HDF5 after conversion<br>remove corresponding .dat|nearline
```
To set up the transfer, make sure that `volume_transfer_info.<dataset>.json` is in `${EMRP_ROOT}/${TRANSFER_INFO_DIR}`. Then, go to `${TRANSFER_DIR}` and execute
```bash
./00_setup_transfer.sh cellmap/volume_transfer_info.<dataset>.json
```
This will copy the json file into `${TRANSFER_DIR}/config`, where it can be found by the processes run by the Jenkins server.
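
A quick sanity check (a sketch) is to confirm that the file actually arrived in the config directory:

```bash
# Verify that the transfer info was copied where the Jenkins processes expect it.
ls -l "${TRANSFER_DIR}/config/" | grep "volume_transfer_info.<dataset>"
```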

### Configuring and starting the Jenkins server for transfer
1. Log in to [the server](https://jenkins.int.janelia.org) and navigate to your scope under the FlyEM tab.
2. If the process is not enabled, enable it.
3. To test the process, go to the build steps in the configuration menu, set a short duration (e.g., 2 min instead of 29 min), and hit `Build now`. Make sure that the test run doesn't overlap with a scheduled run (which happens every 30 min; look at past runs and note that the times are in GMT).
4. If the run is successful (check the run's console output), set the duration back to 29 min and save the configuration.

The Jenkins process for conversion should always be enabled and runs every two hours. Note that everything resides under the FlyEM tab even if the current acquisition is done by the FIB-SEM shared resource for another collaborator; the shared resource was initially founded for FlyEM, so the name is historical.

## Set up processing directory
To set up a directory for subsequent processing, execute `11_setup_volume.sh` in `${TRANSFER_DIR}`. This will copy all relevant scripts for processing from this directory to a directory on `prfs` specified in `volume_transfer_info.<dataset>.json`.
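
A minimal sketch of the invocation (assuming, without confirmation, that the script takes the same transfer-info argument as `00_setup_transfer.sh`; check the script header for the actual arguments):

```bash
# Assumption: the argument mirrors 00_setup_transfer.sh; verify against the script itself.
cd "${TRANSFER_DIR}"
./11_setup_volume.sh cellmap/volume_transfer_info.<dataset>.json
```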

## Update GitHub issue
To automatically create a text with all relevant links for the GitHub issue, execute `gen_github_text.sh` in the processing directory. This text can be used to update the issue description.
Furthermore, assign the issue to the person responsible for the next steps, add the right labels, and add the issue to the project board.
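
The description update can also be sketched with the GitHub CLI (assuming `gh` is available and that the output of `gen_github_text.sh` is redirected to a file; the filename and issue number are placeholders):

```bash
# Sketch: capture the generated text, then use it as the issue description.
./gen_github_text.sh > github_text.md        # output filename is a placeholder
gh issue edit <issue-number> --repo JaneliaSciComp/recon_fibsem --body-file github_text.md
```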
25 changes: 25 additions & 0 deletions doc/cellmap/steps/2-1_render_import.md
# Reconstruction Part 1: Render Import
```bash
# Commonly used paths
RENDER_DIR="/groups/flyem/data/render/"
PROCESSING_DIR="<path to the processing directory set up in the previous step>"
```
We assume that the data has been successfully transferred and converted to HDF5 format, and that there is a directory set up for processing which we call `PROCESSING_DIR` (see [transfer documentation](1_transfer.md) for details). The first step for processing is to import the data into the Render database so that it can be accessed by the render clients.

## Generate TileSpecs
Go to a machine with access to `${RENDER_DIR}`, navigate to `${PROCESSING_DIR}` and run the following command:
```bash
./07_h5_to_render.sh
```
This launches a local Dask job that generates TileSpecs from the HDF5 files and uploads them to the render database. All necessary information is read from the `volume_transfer_info.<dataset>.json` file created in the previous step. After this step, a dynamically rendered stack can be accessed in the point match explorer and viewed in neuroglancer.
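
One way to sanity-check the import (a sketch, assuming a standard render web-service deployment; host, owner, project, and stack are placeholders for your setup) is to query the z values of the new stack:

```bash
# Sketch: list the z values of the freshly imported stack via the render web service.
curl "http://<render-host>:8080/render-ws/v1/owner/<owner>/project/<project>/stack/<stack>/zValues"
```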

## Set up preview volume: export to N5
NOTE: this is a feature under active development and the process will likely change in the near future. For now, the following steps are necessary.

Go to the Janelia compute cluster (e.g., `ssh login1.int.janelia.org`), navigate to `${PROCESSING_DIR}` and run the following command:
```bash
./99_append_to_export.sh <number of executors>
```
This will submit a couple of Spark cluster jobs and set up logging directories. The number of executors should be chosen based on the size of the dataset; currently, each executor occupies 10 cores on the cluster, and 3 more cores are needed for the driver. Script logs usually reside in `${PROCESSING_DIR}/logs`, but the Spark jobs set up additional log directories for each executor and the driver. These directories are printed to the console when executing the above command.
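
For example, under the sizing rule above, requesting 20 executors occupies roughly 20 × 10 + 3 = 203 cluster cores:

```bash
# Example sizing: 20 executors => 20 * 10 + 3 = 203 cores in total.
./99_append_to_export.sh 20
```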

Once finished, the location of the final N5 volume can be found either in the driver logs or by running `gen_github_text.sh` again.