Repository to test the refactoring of the photo-z related pipelines from the DES Science Portal using Parsl.
The workflow was developed for the best performance on the resources currently available in the LIneA environment; however, it can be used in any environment that meets the following requirements:
- Conda installed
- LePhare installed
- For use with HTCondor:
  - the workstation must be a submission machine for HTCondor.
  - the workstation must share the following resources with all HTCondor nodes:
    - the file system / directory containing the photoz-parsl repository
    - the input data
- Clone the repository and create an environment with Conda:
  git clone https://github.com/linea-it/photoz-parsl && cd photoz-parsl
  conda create -n parsl-env python=3.9
  conda activate parsl-env
  pip install -r requirements.pip
- Copy the file that sets the environment:
  cp env.sh.template env.sh
- Edit env.sh, adding the path to Conda (CONDAPATH), the path to this repository (PHZ_ROOT), and the LePhare directory (LEPHAREDIR):
  export CONDAPATH=<conda path> # e.g.: /home/fulano/miniconda3/bin
  export PHZ_ROOT=<photoz-parsl repository path>
  export LEPHAREDIR=<LePhare dir>
- Set up the environment:
  source env.sh
- Copy the workflow configuration file:
  cp config.yml.template config.yml
- Edit config.yml with information about the input data and settings:
  phz_root_dir: <repository path>
  executor: local # where the code runs; there are currently two options: "local" and "htcondor"
  inputs:
    photometric_data: <photometric data path>
    zphot: <zphot.para path>
  settings:
    photo_corr: <column name to magnitude correction> # e.g.: ebv
    photo_type: <magnitude column> # e.g.: SOF_BDF_MAG_{}_CORRECTED
    err_type: <magnitude error column> # e.g.: SOF_BDF_MAG_ERR_{}
    bands: <band list> # e.g.: [g,r,i,z]
    partitions: <partition numbers>
    index: <index column> # e.g.: coadd_objects_id
    lephare_bin: <lephare bin> # e.g.: $LEPHAREDIR/source
  test_environment:
    turn_on: True
    limit_sample: [1,3] # how many files and how many partitions the code will use, e.g.: [1,3] means 1 file and 3 partitions
- To see the usage help for the pipeline:
  python pz-run.py -h
  usage: pz-run.py [-h] [-w WORKING_DIR] config_path

  positional arguments:
    config_path           yaml config path

  optional arguments:
    -h, --help            show this help message and exit
    -w WORKING_DIR, --working_dir WORKING_DIR
                          run directory
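The help text above is consistent with a standard `argparse` interface. As a minimal sketch (an assumption about how such a parser could be declared, not the actual contents of `pz-run.py`), the following reproduces the same arguments. The function name `build_parser`, the `None` default, and the `runs/test` directory are hypothetical:

```python
import argparse


def build_parser():
    """Build a parser mirroring the options shown in the help output."""
    parser = argparse.ArgumentParser(prog="pz-run.py")
    # Required positional argument: path to the YAML configuration file.
    parser.add_argument("config_path", help="yaml config path")
    # Optional run directory; the default of None is an assumption.
    parser.add_argument("-w", "--working_dir", default=None, help="run directory")
    return parser


# Parse an explicit argument list instead of sys.argv so the example
# can be run anywhere; "runs/test" is a hypothetical directory.
args = build_parser().parse_args(["sample-data/sample.yml", "-w", "runs/test"])
print(args.config_path, args.working_dir)
```

Note that `argparse` stores `--working_dir` under the attribute `working_dir`, so a script built this way would read the run directory from `args.working_dir`.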
Prepare the configuration files by running the following script:
python prep-env.py
Run pz-run.py passing in the example configuration file:
python pz-run.py sample-data/sample.yml
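Before launching a run, it can be useful to sanity-check the values entered in config.yml. The sketch below works on an already-parsed configuration dictionary (for example, the result of loading the YAML file) and is not part of the repository. It assumes that the `{}` in `photo_type` and `err_type` is a placeholder filled once per band, and that `limit_sample` is a two-element list; the names `expand_band_columns` and `check_config` are hypothetical:

```python
VALID_EXECUTORS = {"local", "htcondor"}  # the two options named in config.yml


def expand_band_columns(template, bands):
    """Fill the `{}` placeholder of a column template once per band.

    Assumption: band names are inserted as written; the real workflow
    may transform them (e.g. change their case) before use.
    """
    return [template.format(band) for band in bands]


def check_config(config):
    """Return a list of problems found in an already-parsed config dict."""
    problems = []
    if config.get("executor") not in VALID_EXECUTORS:
        problems.append(f"executor must be one of {sorted(VALID_EXECUTORS)}")
    settings = config.get("settings", {})
    for key in ("photo_type", "err_type"):
        # Assumption: both column templates need a band placeholder.
        if "{}" not in str(settings.get(key, "")):
            problems.append(f"settings.{key} should contain a '{{}}' placeholder")
    test_env = config.get("test_environment", {})
    if test_env.get("turn_on"):
        limit = test_env.get("limit_sample")
        if not (isinstance(limit, list) and len(limit) == 2):
            problems.append("limit_sample should be [n_files, n_partitions]")
    return problems


# Example values taken from the "e.g." comments in config.yml.
example = {
    "executor": "local",
    "settings": {
        "photo_type": "SOF_BDF_MAG_{}_CORRECTED",
        "err_type": "SOF_BDF_MAG_ERR_{}",
        "bands": ["g", "r", "i", "z"],
    },
    "test_environment": {"turn_on": True, "limit_sample": [1, 3]},
}
print(expand_band_columns(example["settings"]["photo_type"], example["settings"]["bands"]))
print(check_config(example))
```

Returning a list of problems rather than raising on the first one lets all configuration mistakes be reported in a single pass.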
Parsl includes a flexible monitoring system to capture program and task state as well as resource usage over time.
To activate the monitoring system:
- Copy the file that activates the monitoring:
  cp monitoring.sh.template monitoring.sh
- Edit monitoring.sh, adding the path to Conda (CONDAPATH) and the path to this repository (PHZ_ROOT):
  export CONDAPATH=<conda path> # e.g.: /home/fulano/miniconda3/bin
  export PHZ_ROOT=<photoz-parsl repository path>
- Then run:
  source monitoring.sh
- To view the monitoring dashboard, open http://localhost:55555/
Note: if your workstation is remote, you will need to create an SSH tunnel that maps port 55555 on the remote server to your local machine, for example with ssh -L 55555:localhost:55555 <user>@<remote host>.
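If the page does not load, a quick way to tell whether anything is listening on the port (directly or through the tunnel) is to attempt a TCP connection. This is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and the check says nothing about which service is answering:

```python
import socket


def is_port_open(host="localhost", port=55555, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("monitoring reachable:", is_port_open())
```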