This REANA reproducible analysis example studies the Higgs-to-four-lepton decay channel that led to the experimental discovery of the Higgs boson in 2012. The example uses CMS open data collected in 2011 and 2012.
Making a research data analysis reproducible means providing "runnable recipes" that address (1) where the input data is, (2) what software was used to analyse the data, (3) which computing environments were used to run the software and (4) which computational workflow steps were taken to run the analysis. These recipes make it possible to instantiate the analysis on the computational cloud and run it to obtain (5) output results.
The analysis takes the following inputs:
- the list of CMS validated runs included in the inputs directory: Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
- a set of data files in the ROOT format, processed from CMS public datasets, included in the inputs directory:
  - DoubleE11.root
  - DoubleE12.root
  - DoubleMu11.root
  - DoubleMu12.root
  - DY1011.root
  - DY1012.root
  - DY101Jets12.root
  - DY50Mag12.root
  - DY50TuneZ11.root
  - DY50TuneZ12.root
  - DYTo2mu12.root
  - HZZ11.root
  - HZZ12.root
  - TTBar11.root
  - TTBar12.root
  - TTJets11.root
  - TTJets12.root
  - ZZ2mu2e11.root
  - ZZ2mu2e12.root
  - ZZ4e11.root
  - ZZ4e12.root
  - ZZ4mu11.root
  - ZZ4mu12.root
- CMS collision data from 2011 and 2012 accessed "live" during analysis via CERN Open Data portal:
- CMS simulated data from 2011 and 2012 accessed "live" during analysis via CERN Open Data portal:
The analysis will consist of two stages. In the first stage, we shall process the original collision data (using demoanalyzer_cfg_level3data.py) and simulated data (using demoanalyzer_cfg_level3MC.py) for one Higgs signal candidate with reduced statistics. In the second stage, we shall plot the results (using M4Lnormdatall_lvl3.cc). The HiggsDemoAnalyzer directory contains the analysis code plugin for the CMSSW analysis framework.
In order to be able to rerun the analysis even several years in the future, we need to "encapsulate the current compute environment", for example to freeze the software package versions our analysis is using. We shall achieve this by preparing a Docker container image for our analysis steps.
This analysis example runs within the CMSSW analysis framework that was packaged for Docker in clelange/cmssw.
The analysis workflow is simple and consists of the two above-mentioned stages:
                               START
                              /     \
                            /         \
                          /             \
+-------------------------+             +------------------------+
| process collision data  |             | process simulated data |
+-------------------------+             +------------------------+
              \                                     /
                \ Higgs4L1file.root               / DoubleMuParked2012C_10000_Higgs.root
                  \                             /
                    +-------------------------+
                    | produce final plot      |
                    +-------------------------+
                                 |
                                 | mass4l_combine_userlvl3.pdf
                                 V
                               STOP
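The dependency structure of the diagram can also be written down as a small directed graph. The following is a minimal sketch in Python, assuming Python 3.9 or later for the standard-library graphlib module. The step names used here (process_collision_data and so on) are hypothetical labels for the boxes in the diagram; they are not taken from the workflow files.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
# The step names are hypothetical labels for the boxes in the diagram.
WORKFLOW = {
    "process_collision_data": set(),
    "process_simulated_data": set(),
    "produce_final_plot": {"process_collision_data", "process_simulated_data"},
}


def execution_batches(graph):
    """Return a list of batches; steps within one batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(WORKFLOW), start=1):
        print(number, ", ".join(batch))
```

The two processing steps land in the first batch because neither depends on the other, which is why a workflow engine is free to run them in parallel before producing the final plot.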
We shall use the CWL workflow specification to express the computational workflow (workflow/workflow.cwl) and its individual steps.
The example produces a plot showing the Higgs signal, saved as mass4l_combine_userlvl3.pdf.
Optional
If you would like to test the analysis locally (i.e. outside of the REANA platform), you can proceed as follows.
Using pure Docker:
$ docker run -i -t --rm \
    -v `pwd`/inputs:/inputs \
    -v `pwd`/code:/code \
    -v `pwd`/outputs:/outputs \
    clelange/cmssw:5_3_32 \
    /bin/bash -c 'cp -r /code/HiggsExample20112012 .; \
                  scram b; \
                  cd /code/HiggsExample20112012/Level3; \
                  cmsRun ./demoanalyzer_cfg_level3data.py'
$ docker run -i -t --rm \
    -v `pwd`/inputs:/inputs \
    -v `pwd`/code:/code \
    -v `pwd`/outputs:/outputs \
    clelange/cmssw:5_3_32 \
    /bin/bash -c 'cp -r /code/HiggsExample20112012 .; \
                  scram b; \
                  cd /code/HiggsExample20112012/Level3; \
                  cmsRun demoanalyzer_cfg_level3MC.py'
$ docker run -i -t --rm \
    -v `pwd`/inputs:/inputs \
    -v `pwd`/code:/code \
    -v `pwd`/outputs:/outputs \
    clelange/cmssw:5_3_32 \
    /bin/bash -c 'cd /code/HiggsExample20112012/Level3; \
                  root -b -l -q ./M4Lnormdatall_lvl3.cc'
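The three commands above differ only in the shell snippet passed to /bin/bash -c. As an illustration, they could be assembled from Python as sketched below. The helper name docker_argv and the constant names are made up for this example; the Docker options, image and paths are the ones already shown above.

```python
import os
import subprocess

IMAGE = "clelange/cmssw:5_3_32"
MOUNTS = ("inputs", "code", "outputs")  # host directories mounted at /<name>
LEVEL3 = "/code/HiggsExample20112012/Level3"

# Shell snippets copied from the three docker commands above.
STEPS = {
    "data": "cp -r /code/HiggsExample20112012 .; scram b; "
            f"cd {LEVEL3}; cmsRun ./demoanalyzer_cfg_level3data.py",
    "mc": "cp -r /code/HiggsExample20112012 .; scram b; "
          f"cd {LEVEL3}; cmsRun demoanalyzer_cfg_level3MC.py",
    "plot": f"cd {LEVEL3}; root -b -l -q ./M4Lnormdatall_lvl3.cc",
}


def docker_argv(step, workdir=None):
    """Build the docker argument list for one analysis step."""
    workdir = os.path.abspath(workdir or os.getcwd())
    argv = ["docker", "run", "-i", "-t", "--rm"]
    for name in MOUNTS:
        argv += ["-v", f"{os.path.join(workdir, name)}:/{name}"]
    return argv + [IMAGE, "/bin/bash", "-c", STEPS[step]]


if __name__ == "__main__":
    # Run the two processing steps first, then the plotting step.
    for step in ("data", "mc", "plot"):
        subprocess.run(docker_argv(step), check=True)
```

Since the -t option asks Docker for a terminal, a script like this would need to be started from an interactive shell, just like the commands it wraps.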
Using CWL:
$ cwltool --outdir=./outputs ./workflow/workflow.cwl ./workflow/input.yaml
We start by creating a reana.yaml file describing the above analysis structure with its inputs, code, runtime environment, computational workflow steps and expected outputs:
version: 0.3.0
inputs:
  files:
    - code/HiggsExample20112012/HiggsDemoAnalyzer/src/HiggsDemoAnalyzerGit.cc
    - code/HiggsExample20112012/Level3/demoanalyzer_cfg_level3data.py
    - code/HiggsExample20112012/Level3/demoanalyzer_cfg_level3MC.py
    - code/HiggsExample20112012/Level3/M4Lnormdatall_lvl3.cc
  parameters:
    input: workflow/input.yaml
workflow:
  type: cwl
  file: workflow/workflow.cwl
environments:
  - type: docker
    image: clelange/cmssw:5_3_32
outputs:
  files:
    - results/mass4l_combine_userlvl3.pdf
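Before uploading anything, it can help to confirm that every local file the specification refers to actually exists. The sketch below works on the specification written out as a plain Python dict (a YAML parser could produce the same structure, but none is used here). The function names referenced_files and missing_files are hypothetical and are not part of REANA.

```python
from pathlib import Path

# The specification above, written out as a Python dict for illustration.
SPEC = {
    "version": "0.3.0",
    "inputs": {
        "files": [
            "code/HiggsExample20112012/HiggsDemoAnalyzer/src/HiggsDemoAnalyzerGit.cc",
            "code/HiggsExample20112012/Level3/demoanalyzer_cfg_level3data.py",
            "code/HiggsExample20112012/Level3/demoanalyzer_cfg_level3MC.py",
            "code/HiggsExample20112012/Level3/M4Lnormdatall_lvl3.cc",
        ],
        "parameters": {"input": "workflow/input.yaml"},
    },
    "workflow": {"type": "cwl", "file": "workflow/workflow.cwl"},
    "environments": [{"type": "docker", "image": "clelange/cmssw:5_3_32"}],
    "outputs": {"files": ["results/mass4l_combine_userlvl3.pdf"]},
}


def referenced_files(spec):
    """Collect the local paths the specification expects to exist before a run."""
    inputs = spec.get("inputs", {})
    paths = list(inputs.get("files", []))
    parameters = inputs.get("parameters", {})
    if "input" in parameters:
        paths.append(parameters["input"])
    paths.append(spec["workflow"]["file"])
    return paths


def missing_files(spec, root="."):
    """Return the referenced paths that are not regular files under root."""
    return [p for p in referenced_files(spec) if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    for path in missing_files(SPEC):
        print("missing:", path)
```

Output files are deliberately left out of the check, since they only exist once the workflow has run.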
We can now install the REANA command-line client, run the analysis and download the resulting plots:
$ # install REANA client:
$ mkvirtualenv reana-client
$ pip install reana-client
$ # connect to some REANA cloud instance:
$ export REANA_SERVER_URL=https://reana.cern.ch/
$ export REANA_ACCESS_TOKEN=XXXXXXX
$ # create new workflow:
$ reana-client create -n my-analysis
$ export REANA_WORKON=my-analysis
$ # upload input code and data to the workspace:
$ reana-client upload ./code ./data
$ # start computational workflow:
$ reana-client start
$ # ... should be finished in about a minute:
$ reana-client status
$ # list workspace files:
$ reana-client list
$ # download output results:
$ reana-client download results/mass4l_combine_userlvl3.pdf
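The same session can be scripted. The sketch below drives the reana-client commands listed above through Python's subprocess module and stops at the first command that fails. It assumes that reana-client is already installed and that REANA_ACCESS_TOKEN is set in the calling environment; the helper name run_session is hypothetical.

```python
import os
import subprocess

# Commands copied from the session above, in order.
COMMANDS = [
    ["reana-client", "create", "-n", "my-analysis"],
    ["reana-client", "upload", "./code", "./data"],
    ["reana-client", "start"],
    ["reana-client", "status"],
    ["reana-client", "list"],
    ["reana-client", "download", "results/mass4l_combine_userlvl3.pdf"],
]


def run_session(commands=COMMANDS, server="https://reana.cern.ch/",
                workflow="my-analysis", runner=subprocess.run):
    """Run each command in order; with check=True a non-zero exit raises."""
    # Mirror the two `export` lines of the session in the child environment.
    env = dict(os.environ, REANA_SERVER_URL=server, REANA_WORKON=workflow)
    for argv in commands:
        runner(argv, env=env, check=True)
    return len(commands)


if __name__ == "__main__":
    run_session()
```

Note that the status is queried only once here. A real script would wait until the workflow has finished before attempting the download.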
Please see the REANA-Client documentation for a more detailed explanation of typical reana-client usage scenarios.
This example is based on the original open data analysis by Jomhari, Nur Zulaiha; Geiser, Achim; Bin Anuar, Afiq Aizuddin, "Higgs-to-four-lepton analysis example using 2011-2012 data", CERN Open Data Portal, 2017. DOI: 10.7483/OPENDATA.CMS.JKB8.RR42
The list of contributors to this REANA example in alphabetical order: