# mti_nma

MTI workflow to run normal mode analysis on 3D nuclear segmentations.

This workflow currently relies on segmentations downloaded from LabKey, so running the pipeline as-is on LabKey data can only be done from within AICS.
## Installation

If installing somewhere other than AICS compute-cluster infrastructure (e.g. your local machine), you will need AICS certificates to be able to install the required package `lkaccess`. Instructions to set up certs on a macOS machine are as follows:

- Visit http://confluence.corp.alleninstitute.org/display/SF/Certs+on+OS+X
- Download the three `.crt` files, open each one, add it to the "System" keychain, and hit "Add" to trust it
- Download `pip_conf_setup.sh` to the project directory
- Install wget:

```
brew install wget
```

- Run the downloaded setup file:

```
sudo bash pip_conf_setup.sh
```
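After the setup script finishes, you can sanity-check that the pip configuration was written using pip's built-in config inspection (this assumes pip >= 10, which ships the `config` subcommand):

```
pip config list
```

If the script configured an internal package index, it should appear in this output.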
### Normal users

Clone this repository, then:

```
cd mti_nma
conda create --name mti_nma python=3.7
conda activate mti_nma
pip install -e .
```
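To confirm the install succeeded, a quick import check (this assumes the package is importable as `mti_nma`, matching the repository name):

```
python -c "import mti_nma; print('mti_nma imported OK')"
```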
### Developers

After following the "Normal users" installation instructions, run:

```
pip install -e .[all]
```
### Blender

Some visualizations included in this pipeline run Blender code, which requires Blender to be installed on your machine. You can download Blender for free here:

https://www.blender.org/download/

The pipeline currently has the Blender path set to the default macOS location. If you are using another operating system or have installed Blender in another location, you will need to pass the path to your Blender installation. An example is provided with the example run lines in the "Running" section below.
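For reference, these are typical locations of the Blender executable; the exact path depends on your Blender version and install method, so verify on your own machine:

```
# Default macOS app-bundle location (what the pipeline assumes)
/Applications/Blender.app/Contents/MacOS/Blender

# Typical Linux location when extracted from a release tarball
/path/to/blender-2.82-linux64/blender
```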
## Running

- Global config settings are in `.config`
- Individual project steps are in `mti_nma/steps/<step_name>`
- To run an individual step such as `norm`, the command-line workflow would be:

```
mti_nma norm pull
mti_nma norm run
mti_nma norm push
```

- To run the entire workflow, from the command line use:

```
mti_nma all pull
mti_nma all run
mti_nma all push
```
There are several parameters you may want to pass (e.g. the number of samples), and one which you MUST pass if you are not running on a Mac with Blender installed in the default location: the path to your Blender application. To pass these (or other parameters), you would run a line like the following:

- Generic parameter:

```
mti_nma all run --<parameter_name> <parameter_value>
```

- Multiple structures, passed as a comma-separated list (e.g. `--structs Nuc,Cell`):

```
mti_nma all run --structs <struct1>,<struct2>
```

- Blender path:

```
mti_nma all run --path_blender <your_blender_app_path>
```

- Blender path and number of samples:

```
mti_nma all run --nsamples <int> --path_blender <your_blender_app_path>
```
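Putting these together, a hypothetical full run might look like the following (the structure names, sample count, and Blender path here are illustrative; substitute your own values):

```
mti_nma all run \
    --structs Nuc,Cell \
    --nsamples 200 \
    --path_blender /Applications/Blender.app/Contents/MacOS/Blender
```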
## Dask

By default, all of the actual computation (per FOV, per cell, etc.) runs in parallel using either Dask or thread pools. This means that when you run `mti_nma all run`, it will use all of the cores on your current machine for processing, with each core acting as a Dask worker.

As tracing errors with Dask can be a bit tricky, if you want to turn this off, simply run the step individually, or use the `--debug` flag:

```
mti_nma all run --debug
```

With this flag, the processing will happen on threads instead of Dask workers.
If you want to utilize the SLURM cluster with Dask, you can provide the `--distributed` flag (`mti_nma all run --distributed`). However, if you are doing this, you will need to tell the pipeline where Blender is located as well. A full command for this would be:

```
mti_nma all run --path_blender /allen/aics/modeling/jacksonb/applications/blender-2.82-linux64/blender --distributed
```
If you aren't running in debug mode, following the processing is a bit more difficult. When the pipeline starts, however, a message should be printed to `log.info` that says something like:

```
[INFO: 121 2020-03-05 11:21:49,236] Dask dashboard available at: http://127.0.0.1:8787/status
```

If you are running on your local machine, that link will simply work, and you can open the page to monitor what Dask is working on. If you are running on the cluster, the link points to the machine you are connected to in the cluster, and you should run a port-forwarding command to view the Dask monitor on your local machine, i.e. in a new terminal:

```
ssh -A -J slurm-master -L {port_provided_by_log}:{node_you_are_on}:{port_provided_by_log} {node_you_are_on}
```
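For example, if the log reports port 8787 and your pipeline is running on a node named `n87` (both values here are illustrative), the forwarding command would be:

```
ssh -A -J slurm-master -L 8787:n87:8787 n87
```

The dashboard should then be reachable at http://127.0.0.1:8787/status in your local browser.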
Additionally, logs for every worker started by the pipeline will be placed in the directory:

```
~/.dask_logs/mti_nma/{datetime_of_pipeline_start}/
```
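A quick way to find the logs from the most recent run (standard shell commands; the file names inside each run directory depend on the workers that were started):

```
# List pipeline runs, newest first
ls -lt ~/.dask_logs/mti_nma/

# Inspect the worker logs from a given run
ls ~/.dask_logs/mti_nma/{datetime_of_pipeline_start}/
```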
## Running on SLURM

The following commands are the minimal set needed to get set up for running the pipeline with a large `nsamples`:

- Connect to slurm-master and request a CPU node on which to start the Dask scheduler / run the pipeline:

  ```
  ssh slurm-master
  srun -c 8 -p aics_cpu_general --pty bash
  ```

- Navigate to your `mti_nma` project folder:

  ```
  cd projects/mti_nma
  ```

- Set up or connect to your conda environment:
  - Set up a new conda environment:

    ```
    conda create --name mti_nma python=3.7 -y
    conda activate mti_nma
    pip install -e .
    ```

    OR:

  - Connect to your existing conda environment:

    ```
    conda activate mti_nma
    ```

- Run the pipeline:

  ```
  mti_nma all run --path_blender /allen/aics/modeling/jacksonb/applications/blender-2.82-linux64/blender --nsamples {int} --distributed
  ```

- (Optional) Connect to the Dask UI. In a new terminal:

  ```
  ssh -A -J <username>@slurm-master.corp.alleninstitute.org -L {port_provided_by_log}:{node_you_are_on}:{port_provided_by_log} <username>@{node_you_are_on}
  ```
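To fill in `{node_you_are_on}`, print the compute node's hostname from inside your `srun` session (a standard Linux command); `{port_provided_by_log}` comes from the "Dask dashboard available at" line in the pipeline log:

```
# Run inside the srun session on the compute node
hostname
```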