This framework is a modernised version of DeepJet and DeepJetCore, taking advantage of modern packages like PyTorch, numpy, awkward, coffea, uproot and law. You will be able to read in ROOT files, extract features needed for a training of the DeepJet model, perform a training, make predictions using a trained model and evaluate the output/performance.
For software setup, conda is used. This ensures portability to most machines but is not mandatory for running the framework.
# clone repository
git clone ssh://[email protected]:7999/cms-btv/b-hive.git
# set up conda env
conda env create -n b_hive -f env.yml
# activate env
conda activate b_hive
# set up environment variables
source setup.sh
In the last step of setup.sh, a local script called local_setup.sh is sourced.
This should be created by the user and specifies the working directory, where results should be placed.
For example:
#!/bin/bash
export DATA_PATH=/net/scratch/YOURDIRECTORY/BTV/training/
If this file is not created or $DATA_PATH is not set otherwise, everything will be placed in the results directory.
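The fallback behaviour described above can be pictured with a short sketch. This is not the actual content of setup.sh. It only illustrates one common way such a fallback is written, and the default location `$PWD/results` is an assumption made for this example.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the real setup.sh of this repository.

# Source the user-specific settings only if the file exists.
if [ -f "local_setup.sh" ]; then
    source "local_setup.sh"
fi

# Fall back to a local results directory when DATA_PATH is unset or empty.
# The path "$PWD/results" is an assumption made for this illustration.
export DATA_PATH="${DATA_PATH:-$PWD/results}"

echo "Results will be written to: $DATA_PATH"
```

With this pattern, a value exported in local_setup.sh or earlier in the shell session takes precedence, and the default is used only when nothing else has set the variable.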
- Every time you want to use the framework, you need to source setup.sh by executing
source setup.sh
in the shell.
To get familiar with the possibilities, running the basic law index command
law index --verbose
will print the available commands.
Every task has specific parameters in order to steer its behaviour, for example the number of training epochs or a debug flag.
ToDo
To perform a task, simply execute
law run $TASK_NAME
in the shell. The currently available tasks are
- DatasetConstructorTask: reads in ROOT files and stores the relevant branches in numpy files,
- TrainingTask: performs a training with the previously generated numpy files,
- InferenceTask: performs a prediction using the previously trained model and
- PlottingTask: generates ROC curves using the output of the prediction
Due to the usage of law, the framework will check if previous steps in the chain have already been completed and automatically execute them if necessary or fall back on intermediate results to execute the requested task.
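To make the idea of reusing intermediate results more concrete, here is a toy shell sketch. It does not use law and is not how the framework is implemented. The step names, the marker files and the helper functions `prerequisite_of` and `run_step` are all hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Toy illustration of "run only what is missing"; this is not law itself.
WORKDIR="$(mktemp -d)"   # stands in for the directory holding task outputs
EXECUTED=""              # records which steps actually ran

# Print the (hypothetical) prerequisite of a step; empty for the first step.
prerequisite_of() {
    case "$1" in
        training)  echo "dataset" ;;
        inference) echo "training" ;;
        plotting)  echo "inference" ;;
        *)         echo "" ;;
    esac
}

# Run a step, first completing any prerequisite whose output is missing.
run_step() {
    local step="$1"
    local prereq
    # An existing marker file means the step is complete and is reused.
    if [ -e "$WORKDIR/$step.done" ]; then
        return 0
    fi
    prereq="$(prerequisite_of "$step")"
    if [ -n "$prereq" ]; then
        run_step "$prereq"
    fi
    touch "$WORKDIR/$step.done"
    EXECUTED="$EXECUTED $step"
}

run_step inference       # nothing exists yet: dataset, training, inference run
FIRST_RUN="$EXECUTED"

EXECUTED=""
run_step plotting        # earlier outputs exist: only plotting runs
SECOND_RUN="$EXECUTED"

echo "first request ran:$FIRST_RUN"
echo "second request ran:$SECOND_RUN"
```

In the same way, requesting a later task in the framework should trigger only the steps whose outputs do not exist yet.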