implemented Philip's review

haddadanas · Feb 14, 2024 · 3597ca6 · 3597ca6
1 parent 8cbcfa9
commit 3597ca6
Showing 1 changed file with 72 additions and 46 deletions.
diff --git a/docs/user_guide/examples.md b/docs/user_guide/examples.md
@@ -1,6 +1,6 @@
-# Examples
+# Creating your own task
 
-In this section, additional examples on the use of columnflow are presented.
+In this section, additional examples on the usage of columnflow are presented.
 
 ## Writing new tasks
 
@@ -10,6 +10,7 @@ This section will help you create your first columnflow task and contains some t
 Tasks in columnflow are based on {external+law:py:class}`~law.task.base.BaseTask` from the [law package](https://law.readthedocs.io/en/latest/index.html) by Marcel Rieger, which in turn is based on the [luigi package](https://luigi.readthedocs.io/en/stable/tasks.html) by Spotify.
 Refer to the documentation of these packages for more information on the task class and a detailed description of the implemented methods.
 
+(example_task_class_section)=
 ### Example: Task Class
 
 ```python showLineNumbers
@@ -24,32 +25,44 @@ class MyTask(AnalysisTask):
 
     # specify sandbox for the task
     sandbox = dev_sandbox(law.config.get("analysis", "my_columnar_sandbox"))
+
+    # Define a custom name space with which the task is called
+    # If not defined cf is usually used as default
+    task_namespace = "hbt"
 
     # class parameters
     param1 = luigi.Parameter(
-                default="default_value",
-                description="String parameter with default value",
-            )
+        default="default_value",
+        description="String parameter with default value",
+    )
     param2 = law.CSVParameter(
-                default=("*",), # parsed similar to wildcard imports
-                description="Parameter accepting comma-seperated values."
-                    "Default is a tuple with all knows values connected to this attribute."
-            ) 
+        default=("*",), # parsed similar to wildcard imports
+        description="Parameter accepting comma-seperated values."
+            "Default is a tuple with all knows values connected to this attribute."
+    )
+    param_bool = luigi.BoolParameter(
+        default=False,
+        description="Boolean parameter with default value `False`",
+    )
 
     # container for upstream requirements for convenience
     reqs = Requirements()
 
     def requires(self):
-        # Branch Dependencies on other tasks
-        return [SomeOtherTask]
+        # If needed get the requirements from super class (usually defined as dict)
+        req = super().requires()
+        # Add branch Dependencies on other tasks
+        req["SomeOtherTask"] = SomeOtherTask
+
+        return req
 
     def workflow_requires(self):
         # Workflow dependencies on other workflows needed if pilot is not activated
         reqs = super().workflow_requires()
         if not self.pilot:
             reqs["NewRequirement"] = SomeOtherTask
 
-        return
+        return reqs
 
     def output(self):
         # Path to the output file if needed
@@ -60,9 +73,9 @@ class MyTask(AnalysisTask):
         pass
 ```
 
-### The "super" class:
+### The "super" class
 Every task is defined by a Python class, which inherits from a task base or another fully functional task.
-The most basic base in the framework is {py:class}`~columnflow.tasks.framework.base.Task`.
+The most basic base in the framework is {py:class}`~columnflow.tasks.framework.base.BaseTask`.
 However, for tasks implemented in an analysis, it is highly recommended to inherit from one of the following derived subclasses:
 - {py:class}`~columnflow.tasks.framework.base.AnalysisTask`
 - {py:class}`~columnflow.tasks.framework.base.ConfigTask`
@@ -80,23 +93,23 @@ For example, if you have a task that needs a calibrator or depends on a task tha
 After making the choice of the superclass(es), the class is implemented at its core by the following methods (refer to the [luigi task documentation](https://luigi.readthedocs.io/en/stable/tasks.html) for more information on the methods):
 
 - {external+law:py:meth}`~law.task.base.BaseTask.run()`: This method contains the logic for executing the task.
-- `requires()`: This method returns a collection of tasks that this task depends on.
-- `output()`: This method returns a collection of output targets for the task.
-
-### The sandbox:
+- {external+law:py:meth}`~law.tasks.ForestMerge.requires()`: This method returns a collection of tasks that this task depends on.
+- {external+law:py:meth}`~law.task.base.WrapperTask.output()`: This method returns a collection of output targets for the task.
 
-#TODO cross reference to Bogdan's documentation
+### The sandbox
 
-Should your task require a special software enviroment (specific version of python, special packages, or other specific software), it is recommended to define a custom sandbox for this task.
+Should your task require a special software enviroment (special packages, or other specific software), it is recommended to define a custom sandbox for this task.
 For the task to use the sandbox, it is to be specified using the class attribute ```sandbox```, which is inherited from {external+law:py:class}`~law.sandbox.base.Sandbox`.
 The sandbox can be defined in the task class itself using the syntax shown above, elsewise the default sandbox defined in the analysis config file is used.
 
+More details on using and setting up the sandboxed are givin in the {doc}`sandbox section <sandbox>`.
 
-### The parameters:
+
+### The parameters
 
 Like the arguments of a function, the parameters of a task are the variables that are passed to the task when it is run from the command line (``law run MyTask --param1 foo``), making the task very flexible.
 The parameters are defined as class attributes and are identified as such by the class method `get_params()`.
-The parameters returned by this method are parsed from the command line by the class method `get_param_values(params, args, kwargs)`.
+The parameters returned by this method are parsed from the command line by the class method `get_param_values(params, *args, **kwargs)`.
 If a parameter is not provided in the command line, the default value is used.
 Finally, the parameters are initialized in the `__init__()` method of the task with the task object itself and can be accessed across all methods of its instance.
 
@@ -107,41 +120,45 @@ If needed, one can also define custom parameters.
 However, it is important to inherit the parameter class from the package used for the task (luigi or law).
 Otherwise, the parameter will not work correctly.
 
-### The requirements:
+### The requirements
 
-#TODO explain branches if not explained in law
+<!-- #TODO explain branches if not explained in law -->
 
-As noticed in the [code snipped](#example-task-class), a task can define three different requirement keywords, which have different functionality in the framework.
+As noticed in the {ref}`code snipped <example_task_class_section>`, a task can define three different requirement keywords, which have different functionality in the framework.
 
-The ```requires()``` methode in a task, which is inherited from {external+law:py:class}`law.task.base.Task`, defines all required outputs for a task to run on a branch level.
+The ```requires()``` method in a task, which is inherited from {external+law:py:class}`law.task.base.Task`, defines all required outputs for a task to run on a branch level.
 If an output does not exist, the task responsible for this output will be scheduled and executed before the desired task is run.
 
-The {external+law:py:meth}`~law.tasks.ForestMerge.workflow_requires()` is analog to the former methode, however it defines the dependencies on a workflow level, which can be very useful, when working remotely and running jobs on remote clusters.
+The {external+law:py:meth}`~law.tasks.ForestMerge.workflow_requires()` is analog to the former method.
+However, it defines the dependencies on a workflow level, which can be very useful when working remotely and running jobs on remote clusters.
 Every task defines a workflow, which contains all available branches of this task.
 When executing the workflow, the task will run in all (specified) branches.
 If the desired workflow requires the output of another workflow, the latter will be executed beforehand, which means that all needed branches (with no existing output) from the required workflow will be executed before any of the branches from the current workflow runs (Workflow runs vertically).
 This behaviour can be modified, by deactivating the ```pilot``` parameter.
 In this case, the workflow first run through the branches of the executed task after each other, where all dependencies of each branch are executed when needed by the branch (workflow propagates horizontally).
 This is very useful, when parallelising jobs on remote clusters, since multiple branches can run simultaneously, without the need to run all branches from one task beforehand.
 
-Last, the class attribute ```reqs```, that is a {py:class}`~columnflow.tasks.framework.base.Requirements` object, does not directly have to do with the other two requirements, but is rather for analyses specific configuration.
-Strictly speaking, it is not needed for the task to run, however it offers a container for the requirements, which becomes very convenient, when for example a required task should be replaced by a customized subclass of the same task.
+Finally, the class attribute ```reqs```, that is a {py:class}`~columnflow.tasks.framework.base.Requirements` object, does not directly have to do with the other two requirements, but is rather for analysis-specific configuration.
+Strictly speaking, it is not needed for the task to run.
+However it offers a container for the requirements, which becomes very convenient when for example a required task should be replaced by a customized subclass of the same task.
 Instead of having to edit the requirements in all the class methods that depends on this requirement, the ```reqs``` attribute can be simply modified with:
 ```python
 MyTask.reqs.RequiredTask = ModifiedRequiredTask
 ```
-Therefore, using the class attribute ```reqs```, when working with the requirements within the class is conveninet.
+Therefore, using the class attribute ```reqs```, when working with the requirements within the class is convenient.
 
 
-![req plot](cf_plot.svg "image")
+:::{figure} ../plots/user_guide/cf_plot.pdf
+:width: 100%
+:::
 
 ### RunOnceTask
 
 The {external+law:py:class}`~law.tasks.RunOnceTask` is a special task that is used to run a task only once, which can be very useful in many situations.
 For example, when a task is required by multiple tasks or multiple branches of a workflow, but the task itself should only be executed once independent of the branches, or a task does not produce any file output but is needed to be marked as done (e.g. a task only produces text output on the CLI).
 
 This is implemented by (additionally) inheriting from {external+law:py:class}`~law.tasks.RunOnceTask`.
-Moreover, to mark the task as complete, the {external=law:py:meth}`~law.tasks.RunOnceTask.mark_complete()` method is to be added at the end of the ```run()``` method after the logic of the task is executed, or the complete```run()``` method is either to be wrapped by the {external=law:py:meth}`~law.tasks.RunOnceTask.complete_on_success` decorator.
+Moreover, to mark the task as complete, the {external+law:py:meth}`~law.tasks.RunOnceTask.mark_complete()` method is to be added at the end of the ```run()``` method after the logic of the task is executed, or the complete ```run()``` method is either to be wrapped by the {external+law:py:meth}`~law.tasks.RunOnceTask.complete_on_success` decorator.
 One must be careful with the latter, since the decorator will mark the task as complete even if it fails.
 Therefore, it is recommended to also add a check at the end and raise an error in case the task fails.
 
@@ -166,38 +183,40 @@ class MyPrintingTask(AnalysisTask, law.tasks.RunOnceTask):
 
 ### Add the task to the analysis
 
-After creating a new task, make sure to add the new task to the law config file in your analysis folder, in order to run the task.
-The new task has to be specified under the ```[outputs]``` section as follows:
+After creating a new task, the law config file in your analysis folder may need to be adjusted, in order to run the task.
+If not already there, the folder of the new task has not been added to the ```[modules]``` section of the law config file (using python dot path notation).
+If the task produces any output, the new task has to be specified under the ```[outputs]``` section together with the saving location as follows:
 
 ```ini
+[modules]
+
+path.to.the.tasks.folder
+
 [outputs]
 
 lfn_sources: wlcg_fs_desy_store
 
-cf.MyNewTask: wlcg
+hbt.MyNewTask: wlcg
 ...
 ```
 ### Executing the Task
 
-after sourcing the law environment, the task can be executed using the command:
+After new sourcing the law environment or by running `law index` in a setup enviroment, the task can be executed using the command:
 ```bash
 law run cf.MyTask --param1 foo
 ```
 The parameters can be given using the syntax ```--param1 foo```, where ```param1``` is the name of the parameter and ```foo``` is the value of the parameter.
 If the parameter's name include an underscore, the underscore is replaced by a hyphen (```--param_1 foo``` -> ```--param-1 foo```).
-If the parameter is a boolean, the parameter can be set to ```True``` by only typing ```--param-boolean```.
+If the parameter is a boolean, the parameter can be set to ```True``` by only typing ```--param-bool```.
 If the parameter is a list, the values can be given as a comma-separated list (```--param2 foo,bar,baz```).
 
-## Custom law.cfg file for users without grid certificate (CMS specific?)
+## Custom law.cfg file for users without grid certificate (CMS specific)
 
 Users, who do not have a grid certificate, are not able to run jobs on the grid or store and fetch data from it.
 However, they can still run jobs locally.
-For this purpose, a custom law config file should be created in the analysis folder, which specifies a local dcache and the path to the nanoAoDs for every required campaing, and set the output locations for the tasks to local.
-It is recommended to create a copy of the default law config file and modify it as needed and then redirect the setup to the custom config file by setting the ```LAW_CONFIG_FILE``` environment variable in the `setup.sh` file to the path of the custom config file as follows:
-
-```bash
-export LAW_CONFIG_FILE="${LAW_CONFIG_FILE:-${ANALYSIS_DIR}/law.nocert.cfg}"
-```
+For this purpose, a custom law config file should be created in the analysis folder, which specifies a local dcache and the path to the nanoAODs for every required campaign, and set the output locations for the tasks to local.
+It is recommended to apply modifications on the config file in a new custom law config file and not directly in the default one.
+The custom config should inherit from the default config (as shown below) and can be modify as needed by adding or overloading fields.
 
 The costum config file should contain the following sections:
 ```ini
@@ -207,7 +226,7 @@ inherit: $HBT_BASE/law.cfg
 
 [local_dcache]
 # define path to the local dcache used later for the lfn (Local File Name) sources 
-base: path/to/local/dcache
+base: path/to/different/file/system
 
 [local_campaigns_name]
 base: path/to/local/campaigns/nanoAOD/directory
@@ -233,7 +252,7 @@ selection_modules: analysis.selection.{python_module_name}
 
 [outputs]
 
-lfn_sources: local_desy_dcache
+lfn_sources: local_dcache
 
 # output locations per task family
 # for local targets : "local[, STORE_PATH]"
@@ -245,4 +264,11 @@ cf.Task3: /shared/path/to/store/output
 
 ```
 
+It is important to redirect the setup to the custom config file by setting the ```LAW_CONFIG_FILE``` environment variable in the `setup.sh` file to the path of the custom config file as follows:
+
+```bash
+export LAW_CONFIG_FILE="${LAW_CONFIG_FILE:-${ANALYSIS_DIR}/law.nocert.cfg}"
+```
+
+For more details, refer to the [law config documentation](https://law.readthedocs.io/en/latest/config.html) for more information on the law config file.
 A template for the custom [config file](https://raw.githubusercontent.com/uhh-cms/hh2bbtautau/dev/law.nocert.cfg) can be found under the dev branch in the repository of the [hh2bbtautau analysis](https://github.com/uhh-cms/hh2bbtautau).