Initial HTCondor-CE 24 docs
timtheisen committed Oct 31, 2024
1 parent 8874789 commit b4ec449
Showing 21 changed files with 254 additions and 709 deletions.
6 changes: 3 additions & 3 deletions docs/architecture.md
@@ -54,7 +54,7 @@ owners that want to start contributing to a computing grid with minimal effort.
![HTCondor-CE-Bosco](img/bosco.png)

If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own
[HTCondor-CE](v23/installation/htcondor-ce.md) because the Hosted CE has not yet been optimized for such loads.
[HTCondor-CE](v24/installation/htcondor-ce.md) because the Hosted CE has not yet been optimized for such loads.

How the CE is Customized
------------------------
@@ -63,11 +63,11 @@ Aside from the [basic configuration] required in the CE installation, there are
you decide any customization is required at all):

- **Deciding which Virtual Organizations (VOs) are allowed to run at your site:** HTCondor-CE leverages HTCondor's
built-in ability to [authenticate incoming jobs](v23/configuration/authentication.md) based on their OAuth
built-in ability to [authenticate incoming jobs](v24/configuration/authentication.md) based on their OAuth
token credentials.
- **How to filter and transform the pilot jobs to be run on your batch system:** Filtering and transforming pilot jobs
(i.e., setting site-specific attributes or resource limits), requires configuration of your site’s job routes.
For examples of common job routes, consult the [job router configuration](v23/configuration/job-router-overview.md)
For examples of common job routes, consult the [job router configuration](v24/configuration/job-router-overview.md)
pages.

How Security Works
4 changes: 2 additions & 2 deletions docs/index.md
@@ -40,15 +40,15 @@ Benefits of running the HTCondor-CE:

- **Scalability:** HTCondor-CE is capable of supporting ~16k concurrent RARs
- **Debugging tools:** HTCondor-CE offers
[many tools to help troubleshoot](v23/troubleshooting/debugging-tools.md) issues with RARs
[many tools to help troubleshoot](v24/troubleshooting/debugging-tools.md) issues with RARs
- **Routing as configuration:** HTCondor-CE’s mechanism to transform and submit RARs is customized via configuration
variables, which means that customizations will persist across upgrades and will not involve modification of
software internals to route jobs

Getting HTCondor-CE
-------------------

Learn how to get and install HTCondor-CE through our [documentation](v23/installation/htcondor-ce.md).
Learn how to get and install HTCondor-CE through our [documentation](v24/installation/htcondor-ce.md).

Contact Us
----------
4 changes: 0 additions & 4 deletions docs/v23/releases.md
@@ -16,10 +16,6 @@ Known bugs affecting HTCondor-CEs can be found in
Updating to HTCondor-CE 23
--------------------------

!!! note "Updating from HTCondor-CE < 6"
If updating to HTCondor-CE 23 from HTCondor-CE < 6, be sure to also consult the HTCondor-CE 6
[upgrade instructions](../v6/releases.md#600).

!!! tip "Finding relevant configuration changes"
When updating HTCondor-CE RPMs, `.rpmnew` and `.rpmsave` files may be created containing new defaults that you
should merge or new defaults that have replaced your customizations, respectively.
File renamed without changes.
@@ -22,8 +22,6 @@ In this example, we set the routed job on hold if the job is idle and has been s
tried to start more than once.
This will catch jobs which are starting and stopping multiple times.

=== "ClassAd Transform"

```hl_lines="5 8"
JOB_ROUTER_ROUTE_Condor_Pool @=jrt
UNIVERSE VANILLA
@@ -37,39 +35,18 @@ This will catch jobs which are starting and stopping multiple times.

JOB_ROUTER_ROUTE_NAMES = Condor_Pool
```
=== "Deprecated Syntax"

```hl_lines="7 10"
JOB_ROUTER_ENTRIES @=jre
[
TargetUniverse = 5;
name = "Condor_Pool";
# Puts the routed job on hold if the job's been idle and has been started at least
# once or if the job has tried to start more than once
set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;
# Release routed jobs if the condor_starter couldn't start the executable and
# 'VMGAHP_ERR_INTERNAL' is in the HoldReason
set_PeriodicRelease = HoldReasonCode == 6 && regexp("VMGAHP_ERR_INTERNAL", HoldReason);
]
@jre

JOB_ROUTER_ROUTE_NAMES = Condor_Pool
```

Setting routed job requirements
-------------------------------

If you need to set requirements on your routed job, you will need to use `SET REQUIREMENTS` or `set_Requirements`
instead of `Requirements` for ClassAd transform and deprecated syntaxes, respectively.
If you need to set requirements on your routed job, you will need to use `SET REQUIREMENTS`
instead of `Requirements`.
The `Requirements` attribute filters jobs coming into your CE into different job routes whereas the set function will
set conditions on the routed job that must be met by the worker node it lands on.
For more information on requirements, consult the
[HTCondor manual](https://htcondor.readthedocs.io/en/lts/users-manual/submitting-a-job.html#about-requirements-and-rank).

To ensure that your job lands on a Linux machine in your pool:

=== "ClassAd Transform"

```hl_lines="3"
JOB_ROUTER_ROUTE_Condor_Pool @=jrt
UNIVERSE VANILLA
@@ -79,41 +56,17 @@ To ensure that your job lands on a Linux machine in your pool:
JOB_ROUTER_ROUTE_NAMES = Condor_Pool
```

=== "Deprecated Syntax"

```hl_lines="5"
JOB_ROUTER_ENTRIES @=jre
[
TargetUniverse = 5;
name = "Condor_Pool";
set_Requirements = (TARGET.OpSys == "LINUX");
]
@jre

JOB_ROUTER_ROUTE_NAMES = Condor_Pool
```

### Preserving original job requirements ###

To preserve and include the original job requirements, rather than just setting new requirements, you can use `COPY
Requirements` or `copy_Requirements` to store the current value of `Requirements` to another variable, which we'll call
`original_requirements`.
To do this, replace the above `SET Requirements` or `set_Requirements` lines with:

=== "ClassAd Transform"

```
SET Requirements = ($(MY.Requirements)) && (<YOUR REQUIREMENTS EXPRESSION>)
```

=== "Deprecated Syntax"

```
copy_Requirements = "original_requirements";
set_Requirements = original_requirements && ...;
```


### Setting the accounting group based on the credential of the submitted job ###

A common need in the CE is to want to set the accounting identity of the routed job using information from the credential
@@ -129,8 +82,6 @@ Because of this, the default CE config will copy all attributes that match `Auth

Example of setting the accounting group from AuthToken or x509 attributes.

=== "ClassAd Transform"

```
JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap
CLASSAD_USER_MAPFILE_AcctGroupMap = <path-to-mapfile>
102 changes: 102 additions & 0 deletions docs/v24/configuration/job-router-overview.md
@@ -0,0 +1,102 @@
Job Router Configuration Overview
=================================

The [HTCondor Job Router](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html) is at the heart of
HTCondor-CE and allows admins to transform and direct jobs to specific batch systems.
Customizations are made in the form of job routes where each route corresponds to a separate job transformation:
If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed
job') that is then submitted to the batch system.
The CE package comes with default routes located in `/etc/condor-ce/config.d/02-ce-*.conf` that provide enough basic
functionality for a small site.

If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an
overview of how to configure your HTCondor-CE Job Router.

!!! note "Definitions"
- **Incoming Job**: A job which was submitted to HTCondor-CE from an external source.
- **Routed Job**: A job that has been transformed by the Job Router.

Route Syntaxes
--------------

HTCondor-CE 5 introduced the ability to write job routes using [ClassAd transform syntax](#classad-transforms).

### ClassAd transforms ###

The HTCondor [ClassAd transforms](https://htcondor.readthedocs.io/en/lts/classads/transforms.html) were
originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool.
In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 added the configuration
necessary to support routes written as ClassAd transforms.
If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms
in the following order:

1. Each transform in `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job
1. The first transform from `JOB_ROUTER_ROUTE_NAMES` whose requirements are met by the job.
See [the section on route matching](#how-jobs-match-to-routes) below.
1. Each transform in `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job
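
As a rough illustration of that ordering, the sketch below defines one pre-route transform, one route, and one
post-route transform.
The transform names `Tag_Incoming` and `Stamp_Routed` and the attributes `HypotheticalIncomingTag` and
`HypotheticalRoutedStamp` are made up for this example.
It also assumes that pre- and post-route transforms are declared through `JOB_ROUTER_TRANSFORM_<name>` variables, so
check the HTCondor manual for your version before relying on it.

```
# Hypothetical pre-route transform: considered before any route is applied
JOB_ROUTER_TRANSFORM_Tag_Incoming @=jrt
    SET HypotheticalIncomingTag = "seen-by-pre-route"
@jrt
JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) Tag_Incoming

# The route itself: only the first matching name in JOB_ROUTER_ROUTE_NAMES is applied
JOB_ROUTER_ROUTE_Condor_Pool @=jrt
    UNIVERSE VANILLA
@jrt
JOB_ROUTER_ROUTE_NAMES = Condor_Pool

# Hypothetical post-route transform: considered after the route has been applied
JOB_ROUTER_TRANSFORM_Stamp_Routed @=jrt
    SET HypotheticalRoutedStamp = "seen-by-post-route"
@jrt
JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Stamp_Routed
```

Appending to the existing lists with `$(...)`, rather than overwriting them, is intended to keep any pre- and
post-route transforms that are already configured.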

### Required syntax ###

For existing HTCondor-CEs, it's required that administrators stop using the deprecated syntax and
transition to ClassAd transforms now.

For new HTCondor-CEs, it's required that administrators start with ClassAd transforms.
The [ClassAd transform](#classad-transforms) syntax provides many benefits including:

- Statements being evaluated in [the order they are written](writing-job-routes.md#editing-attributes)
- Use of variables that are not included in the resultant job ad
- Use of simple case-like logic

Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by
including transforms in the lists of `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES`,
respectively.

### Converting to ClassAd transforms ###

Existing HTCondor-CEs that use the deprecated syntax can be converted to the ClassAd transform syntax with the
following steps:

1. Output the current configuration by running the following:

condor_ce_config_val -summary > summary-file

2. Convert the stored configuration by running the following:

condor_transform_ads -convert:file summary-file > 90-converted-job-routes.conf

3. Place the `90-converted-job-routes.conf` file generated by the previous command into `/etc/condor-ce/config.d`.

!!! note "Potential need to rename generated config"
The files in `/etc/condor-ce/config.d` are read in lexicographical order.
So if you define your current job router configuration in `/etc/condor-ce/config.d` in a file that is read
later, e.g. `95-local.conf`, you will need to rename your generated config file, e.g. `96-generated-job-routes.conf`.

4. Tweak new job routes as needed. For assistance please reach out to [[email protected]](mailto:[email protected])
5. Restart the HTCondor-CE

!!! note "Not Using Custom Job Routes?"
Conversion from the deprecated syntax to ClassAd transform syntax is only necessary if custom job routes have
been configured.

How Jobs Match to Routes
------------------------

The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in
[condor_ce_q](../troubleshooting/debugging-tools.md#condor_ce_q)) that meet the following constraints:

- The job has not already been considered by the Job Router
- The job's universe is standard or vanilla

If the incoming job meets the above constraints, then the job is matched to the first route in `JOB_ROUTER_ROUTE_NAMES`
whose requirements are satisfied by the job's ClassAd.
Additionally:

- Transforms in
`JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` may also have their own
requirements that determine whether or not that transform is applied.
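
To illustrate first-match ordering, here is a minimal sketch with two routes.
The route name `Slurm_GPU` and the use of `RequestGPUs` as the filter are assumptions chosen for this example, not
defaults shipped with HTCondor-CE.

```
# Considered first: only jobs whose ad requests at least one GPU match this route
JOB_ROUTER_ROUTE_Slurm_GPU @=jrt
    REQUIREMENTS RequestGPUs > 0
    GridResource = "batch slurm"
@jrt

# Considered second: no REQUIREMENTS, so this route should act as a catch-all
JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt
    GridResource = "batch slurm"
@jrt

# Order matters: routes are considered left to right
JOB_ROUTER_ROUTE_NAMES = Slurm_GPU Slurm_Cluster
```

With this ordering, a job with `RequestGPUs = 1` should be routed by `Slurm_GPU` and any other job should fall through
to `Slurm_Cluster`.
Reversing the names in `JOB_ROUTER_ROUTE_NAMES` would be expected to send every job to `Slurm_Cluster`, because the
catch-all route would be considered first.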

Getting Help
------------

If you have any questions or issues with configuring job routes, please [contact us](../../index.md#contact-us) for
assistance.
File renamed without changes.
@@ -6,10 +6,7 @@ This page contains information about job routes that can be used if you are runn
Setting a default batch queue
-----------------------------

To set a default queue for routed jobs, set the variable or attribute `default_queue` for the ClassAd
transform and deprecated syntax, respectively:

=== "ClassAd Transform"
To set a default queue for routed jobs, set the variable `default_queue`:

```hl_lines="3"
JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt
@@ -20,20 +17,6 @@ transform and deprecated syntax, respectively:
JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster
```

=== "Deprecated Syntax"

```hl_lines="5"
JOB_ROUTER_ENTRIES @=jre
[
GridResource = "batch slurm";
name = "Slurm_Cluster";
set_default_queue = "osg_queue";
]
@jre

JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster
```

Setting batch system directives
-------------------------------

@@ -45,29 +28,17 @@ submit script.
ClassAd attributes can be passed from the routed job to the local submit attributes script via
`default_CERequirements` attribute, which takes a comma-separated list of other attributes:

=== "ClassAd Transform"

```
SET foo = "X"
SET bar = "Y"
SET default_CERequirements = "foo,bar"
```

=== "Deprecated Syntax"

```
set_foo = "X";
set_bar = "Y";
set_default_CERequirements = "foo,bar";
```

This sets `foo` to the string `X` and `bar` to the string `Y` in the environment of the local submit attributes script.

The following example sets the maximum walltime to 1 hour and the accounting group to the `x509UserProxyFirstFQAN`
attribute of the job submitted to a PBS batch system:

=== "ClassAd Transform"

```hl_lines="4 5 6"
JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt
GridResource = "batch slurm"
@@ -79,21 +50,6 @@ attribute of the job submitted to a PBS batch system:
JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster
```

=== "Deprecated Syntax"

```hl_lines="5 6 7"
JOB_ROUTER_ENTRIES @=jre [
GridResource = "batch slurm";
name = "Slurm_Cluster";
set_Walltime = 3600;
set_AccountingGroup = x509UserProxyFirstFQAN;
set_default_CERequirements = "WallTime,AccountingGroup";
]
@jre

JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster
```

With `/etc/blahp/pbs_local_submit_attributes.sh` containing:

```
@@ -60,7 +60,7 @@ Inserting IDTOKENs into the routed job's sandbox
If you want to insert IDTOKENS into the routed job's sandbox you can use the `SendIDTokens` route command, or
the `JOB_ROUTER_SEND_ROUTE_IDTOKENS` global configuration variable. Tokens
sent using this mechanism must be named and declared using the `JOB_ROUTER_CREATE_IDTOKEN_NAMES`
and [`JOB_ROUTER_CREATE_IDTOKEN_<name>`](https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#JOB_ROUTER_CREATE_IDTOKEN_%3CNAME%3E) configuration variables. Tokens whose names are declared in
and [`JOB_ROUTER_CREATE_IDTOKEN_<name>`](https://htcondor.readthedocs.io/en/lts/admin-manual/configuration-macros.html#JOB_ROUTER_CREATE_IDTOKEN_%3CNAME%3E) configuration variables. Tokens whose names are declared in
the `JOB_ROUTER_SEND_ROUTE_IDTOKENS` configuration variable are sent by default for each route that does
not have a `SendIDTokens` command.
