From b4ec449cc6e6a07127feea46047d387351f6ceb5 Mon Sep 17 00:00:00 2001 From: Tim Theisen Date: Thu, 31 Oct 2024 09:28:58 -0500 Subject: [PATCH] Initial HTCondor-CE 24 docs --- docs/architecture.md | 6 +- docs/index.md | 4 +- docs/v23/releases.md | 4 - .../configuration/authentication.md | 0 .../configuration/htcondor-routes.md | 53 +- docs/v24/configuration/job-router-overview.md | 102 ++++ .../configuration/local-batch-system.md | 0 .../configuration/non-htcondor-routes.md | 46 +- .../configuration/optional-configuration.md | 2 +- .../configuration/writing-job-routes.md | 461 +----------------- .../installation/central-collector.md | 2 +- docs/{v6 => v24}/installation/htcondor-ce.md | 8 +- docs/{v6 => v24}/operation.md | 0 docs/{v6 => v24}/reference.md | 0 docs/{v6 => v24}/releases.md | 40 +- docs/{v6 => v24}/remote-job-submission.md | 0 .../troubleshooting/common-issues.md | 56 ++- .../troubleshooting/debugging-tools.md | 59 ++- docs/{v6 => v24}/troubleshooting/logs.md | 2 +- .../troubleshooting/remote-troubleshooting.md | 0 docs/v6/configuration/job-router-overview.md | 118 ----- 21 files changed, 254 insertions(+), 709 deletions(-) rename docs/{v6 => v24}/configuration/authentication.md (100%) rename docs/{v6 => v24}/configuration/htcondor-routes.md (79%) create mode 100644 docs/v24/configuration/job-router-overview.md rename docs/{v6 => v24}/configuration/local-batch-system.md (100%) rename docs/{v6 => v24}/configuration/non-htcondor-routes.md (70%) rename docs/{v6 => v24}/configuration/optional-configuration.md (98%) rename docs/{v6 => v24}/configuration/writing-job-routes.md (56%) rename docs/{v6 => v24}/installation/central-collector.md (99%) rename docs/{v6 => v24}/installation/htcondor-ce.md (95%) rename docs/{v6 => v24}/operation.md (100%) rename docs/{v6 => v24}/reference.md (100%) rename docs/{v6 => v24}/releases.md (52%) rename docs/{v6 => v24}/remote-job-submission.md (100%) rename docs/{v6 => v24}/troubleshooting/common-issues.md (91%) rename 
docs/{v6 => v24}/troubleshooting/debugging-tools.md (89%) rename docs/{v6 => v24}/troubleshooting/logs.md (98%) rename docs/{v6 => v24}/troubleshooting/remote-troubleshooting.md (100%) delete mode 100644 docs/v6/configuration/job-router-overview.md diff --git a/docs/architecture.md b/docs/architecture.md index 28067ff0f..3e721da97 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -54,7 +54,7 @@ owners that want to start contributing to a computing grid with minimal effort. ![HTCondor-CE-Bosco](img/bosco.png) If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own -[HTCondor-CE](v23/installation/htcondor-ce.md) because the Hosted CE has not yet been optimized for such loads. +[HTCondor-CE](v24/installation/htcondor-ce.md) because the Hosted CE has not yet been optimized for such loads. How the CE is Customized ------------------------ @@ -63,11 +63,11 @@ Aside from the [basic configuration] required in the CE installation, there are you decide any customization is required at all): - **Deciding which Virtual Organizations (VOs) are allowed to run at your site:** HTCondor-CE leverages HTCondor's - built-in ability to [authenticate incoming jobs](v23/configuration/authentication.md) based on their OAuth + built-in ability to [authenticate incoming jobs](v24/configuration/authentication.md) based on their OAuth token credentials. - **How to filter and transform the pilot jobs to be run on your batch system:** Filtering and transforming pilot jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site’s job routes. - For examples of common job routes, consult the [job router configuration](v23/configuration/job-router-overview.md) + For examples of common job routes, consult the [job router configuration](v24/configuration/job-router-overview.md) pages. 
How Security Works diff --git a/docs/index.md b/docs/index.md index f66d4d3d7..f775fe43d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -40,7 +40,7 @@ Benefits of running the HTCondor-CE: - **Scalability:** HTCondor-CE is capable of supporting ~16k concurrent RARs - **Debugging tools:** HTCondor-CE offers - [many tools to help troubleshoot](v23/troubleshooting/debugging-tools.md) issues with RARs + [many tools to help troubleshoot](v24/troubleshooting/debugging-tools.md) issues with RARs - **Routing as configuration:** HTCondor-CE’s mechanism to transform and submit RARs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs @@ -48,7 +48,7 @@ Benefits of running the HTCondor-CE: Getting HTCondor-CE ------------------- -Learn how to get and install HTCondor-CE through our [documentation](v23/installation/htcondor-ce.md). +Learn how to get and install HTCondor-CE through our [documentation](v24/installation/htcondor-ce.md). Contact Us ---------- diff --git a/docs/v23/releases.md b/docs/v23/releases.md index af66eabea..08944d043 100644 --- a/docs/v23/releases.md +++ b/docs/v23/releases.md @@ -16,10 +16,6 @@ Known bugs affecting HTCondor-CEs can be found in Updating to HTCondor-CE 23 -------------------------- -!!! note "Updating from HTCondor-CE < 6" - If updating to HTCondor-CE 23 from HTCondor-CE < 6, be sure to also consult the HTCondor-CE 6 - [upgrade instructions](../v6/releases.md#600). - !!! tip "Finding relevant configuration changes" When updating HTCondor-CE RPMs, `.rpmnew` and `.rpmsave` files may be created containing new defaults that you should merge or new defaults that have replaced your customzations, respectively. 
diff --git a/docs/v6/configuration/authentication.md b/docs/v24/configuration/authentication.md similarity index 100% rename from docs/v6/configuration/authentication.md rename to docs/v24/configuration/authentication.md diff --git a/docs/v6/configuration/htcondor-routes.md b/docs/v24/configuration/htcondor-routes.md similarity index 79% rename from docs/v6/configuration/htcondor-routes.md rename to docs/v24/configuration/htcondor-routes.md index 5d9c56a71..8fcbd277a 100644 --- a/docs/v6/configuration/htcondor-routes.md +++ b/docs/v24/configuration/htcondor-routes.md @@ -22,8 +22,6 @@ In this example, we set the routed job on hold if the job is idle and has been s tried to start more than once. This will catch jobs which are starting and stopping multiple times. -=== "ClassAd Transform" - ```hl_lines="5 8" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -37,30 +35,11 @@ This will catch jobs which are starting and stopping multiple times. JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="7 10" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - # Puts the routed job on hold if the job's been idle and has been started at least - # once or if the job has tried to start more than once - set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1; - # Release routed jobs if the condor_starter couldn't start the executable and - # 'VMGAHP_ERR_INTERNAL' is in the HoldReason - set_PeriodicRelease = HoldReasonCode == 6 && regexp("VMGAHP_ERR_INTERNAL", HoldReason); - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - Setting routed job requirements ------------------------------- -If you need to set requirements on your routed job, you will need to use `SET REQUIREMENTS` or `set_Requirements` -instead of `Requirements` for ClassAd transform and deprecated syntaxes, respectively. 
+If you need to set requirements on your routed job, you will need to use `SET REQUIREMENTS` +instead of `Requirements`. The `Requirements` attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the @@ -68,8 +47,6 @@ For more information on requirements, consult the To ensure that your job lands on a Linux machine in your pool: -=== "ClassAd Transform" - ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Pool @jrt UNIVERSE VANILLA @@ -79,20 +56,6 @@ To ensure that your job lands on a Linux machine in your pool: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - set_Requirements = (TARGET.OpSys == "LINUX"); - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Preserving original job requirements ### To preserve and include the original job requirements, rather than just setting new requirements, you can use `COPY @@ -100,20 +63,10 @@ Requirements` or `copy_Requirements` to store the current value of `Requirements `original_requirements`. To do this, replace the above `SET Requirements` or `set_Requirements` lines with: -=== "ClassAd Transform" - ``` SET Requirements = ($(MY.Requirements)) && () ``` -=== "Deprecated Syntax" - - ``` - copy_Requirements = "original_requirements"; - set_Requirements = original_requirements && ...; - ``` - - ### Setting the accounting group based on the credential of the submitted job ### A common need in the CE is to want to set the accounting identity of the routed job using information from the credential @@ -129,8 +82,6 @@ Because of this, the default CE config will copy all attributes that match `Auth Example of setting the accounting group from AuthToken or x509 attributes. 
-=== "ClassAd Transform" - ``` JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap CLASSAD_USER_MAPFILE_AcctGroupMap = diff --git a/docs/v24/configuration/job-router-overview.md b/docs/v24/configuration/job-router-overview.md new file mode 100644 index 000000000..f19bc2aa3 --- /dev/null +++ b/docs/v24/configuration/job-router-overview.md @@ -0,0 +1,102 @@ +Job Router Configuration Overview +================================= + +The [HTCondor Job Router](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html) is at the heart of +HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. +Customizations are made in the form of job routes where each route corresponds to a separate job transformation: +If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed +job') that is then submitted to the batch system. +The CE package comes with default routes located in `/etc/condor-ce/config.d/02-ce-*.conf` that provide enough basic +functionality for a small site. + +If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an +overview of how to configure your HTCondor-CE Job Router. + +!!! note "Definitions" + - **Incoming Job**: A job which was submitted to HTCondor-CE from an external source. + - **Routed Job**: A job that has been transformed by the Job Router. + +Route Syntaxes +-------------- + +HTCondor-CE 5 introduced the ability to write job routes using [ClassAd transform syntax](#classad-transforms). + +### ClassAd transforms ### + +The HTCondor [ClassAd transforms](https://htcondor.readthedocs.io/en/lts/classads/transforms.html) were +originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. 
+In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 added the configuration +necessary to support routes written as ClassAd transforms. +If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms +in the following order: + +1. Each transform in `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job +1. The first transform from `JOB_ROUTER_ROUTE_NAMES` whose requirements are met by the job. + See [the section on route matching](#how-jobs-match-to-routes) below. +1. Each transform in `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job + +### Required syntax ### + +Administrators of existing HTCondor-CEs must stop using the deprecated syntax and +transition to ClassAd transforms now. + +Administrators of new HTCondor-CEs must start with ClassAd transforms. +The [ClassAd transform](#classad-transforms) syntax provides many benefits including: + +- Statements being evaluated in [the order they are written](writing-job-routes.md#editing-attributes) +- Use of variables that are not included in the resultant job ad +- Use of simple case-like logic + +Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by +including transforms in the lists of `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES`, +respectively. + +### Converting to ClassAd transforms ### + +Administrators of existing HTCondor-CEs that use the deprecated syntax can take the following steps to convert to the ClassAd +transform syntax: + +1. Output the current configuration by running the following: + + condor_ce_config_val -summary > summary-file + +2. Convert the stored configuration by running the following: + + condor_transform_ads -convert:file summary-file > 90-converted-job-routes.conf + +3. 
Place the `90-converted-job-routes.conf` from the previous command into `/etc/condor-ce/config.d`. + + !!! note "Potential need to rename generated config" + The files in `/etc/condor-ce/config.d` are read in lexicographical order. + So if you define your current job router configuration in `/etc/condor-ce/config.d` in a file that is read + later, e.g. `95-local.conf`, you will need to rename your generated config file, e.g. `96-generated-job-routes.conf`. + +4. Tweak new job routes as needed. For assistance please reach out to [htcondor-users@cs.wisc.edu](mailto:htcondor-users@cs.wisc.edu). +5. Restart the HTCondor-CE + +!!! note "Not Using Custom Job Routes?" + Conversion of job router syntax from the deprecated syntax to ClassAd transform syntax only needs to occur if custom job + routes have been configured. + +How Jobs Match to Routes +------------------------ + +The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in +[condor_ce_q](../troubleshooting/debugging-tools.md#condor_ce_q)) that meet the following constraints: + +- The job has not already been considered by the Job Router +- The job's universe is standard or vanilla + +If the incoming job meets the above constraints, then the job is matched to the first route in `JOB_ROUTER_ROUTE_NAMES` +whose requirements are satisfied by the job's ClassAd. +Additionally: + +- Transforms in + `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` may also have their own + requirements that determine whether or not that transform is applied. + +Getting Help +------------ + +If you have any questions or issues with configuring job routes, please [contact us](../../index.md#contact-us) for +assistance. 
diff --git a/docs/v6/configuration/local-batch-system.md b/docs/v24/configuration/local-batch-system.md similarity index 100% rename from docs/v6/configuration/local-batch-system.md rename to docs/v24/configuration/local-batch-system.md diff --git a/docs/v6/configuration/non-htcondor-routes.md b/docs/v24/configuration/non-htcondor-routes.md similarity index 70% rename from docs/v6/configuration/non-htcondor-routes.md rename to docs/v24/configuration/non-htcondor-routes.md index a1db44399..3f2db8613 100644 --- a/docs/v6/configuration/non-htcondor-routes.md +++ b/docs/v24/configuration/non-htcondor-routes.md @@ -6,10 +6,7 @@ This page contains information about job routes that can be used if you are runn Setting a default batch queue ----------------------------- -To set a default queue for routed jobs, set the variable or attribute `default_queue` for the ClassAd -transform and deprecated syntax, respectively: - -=== "ClassAd Transform" +To set a default queue for routed jobs, set the variable `default_queue`: ```hl_lines="3" JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt @@ -20,20 +17,6 @@ transform and deprecated syntax, respectively: JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - GridResource = "batch slurm"; - name = "Slurm_Cluster"; - set_default_queue = "osg_queue"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster - ``` - Setting batch system directives ------------------------------- @@ -45,29 +28,17 @@ submit script. 
ClassAd attributes can be passed from the routed job to the local submit attributes script via `default_CERequirements` attribute, which takes a comma-separated list of other attributes: -=== "ClassAd Transform" - ``` SET foo = "X" SET bar = "Y" SET default_CERequirements = "foo,bar" ``` -=== "Deprecated Syntax" - - ``` - set_foo = "X"; - set_bar = "Y"; - set_default_CERequirements = "foo,bar"; - ``` - This sets `foo` to the string `X` and `bar` to the string `Y` in the environment of the local submit attributes script. The following example sets the maximum walltime to 1 hour and the accounting group to the `x509UserProxyFirstFQAN` attribute of the job submitted to a PBS batch system: -=== "ClassAd Transform" - ```hl_lines="4 5 6" JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = "batch slurm" @@ -79,21 +50,6 @@ attribute of the job submitted to a PBS batch system: JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster ``` -=== "Deprecated Syntax" - - ```hl_lines="5 6 7" - JOB_ROUTER_ENTRIES @=jre [ - GridResource = "batch slurm"; - name = "Slurm_Cluster"; - set_Walltime = 3600; - set_AccountingGroup = x509UserProxyFirstFQAN; - set_default_CERequirements = "WallTime,AccountingGroup"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster - ``` - With `/etc/blahp/pbs_local_submit_attributes.sh` containing: ``` diff --git a/docs/v6/configuration/optional-configuration.md b/docs/v24/configuration/optional-configuration.md similarity index 98% rename from docs/v6/configuration/optional-configuration.md rename to docs/v24/configuration/optional-configuration.md index 6178516bf..147176386 100644 --- a/docs/v6/configuration/optional-configuration.md +++ b/docs/v24/configuration/optional-configuration.md @@ -60,7 +60,7 @@ Inserting IDTOKENs into the routed job's sandbox If you want to insert IDTOKENS into the routed job's sandbox you can use the `SendIDTokens` route command, or the `JOB_ROUTER_SEND_ROUTE_IDTOKENS` global configuration variable. 
Tokens sent using this mechanism must be named and declared using the `JOB_ROUTER_CREATE_IDTOKEN_NAMES` -and [`JOB_ROUTER_CREATE_IDTOKEN_`](https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#JOB_ROUTER_CREATE_IDTOKEN_%3CNAME%3E) configuration variables. Tokens whose names are declared in +and [`JOB_ROUTER_CREATE_IDTOKEN_`](https://htcondor.readthedocs.io/en/lts/admin-manual/configuration-macros.html#JOB_ROUTER_CREATE_IDTOKEN_%3CNAME%3E) configuration variables. Tokens whose names are declared in the `JOB_ROUTER_SEND_ROUTE_IDTOKENS` configuration variable are sent by default for each route that does not have a `SendIDTokens` command. diff --git a/docs/v6/configuration/writing-job-routes.md b/docs/v24/configuration/writing-job-routes.md similarity index 56% rename from docs/v6/configuration/writing-job-routes.md rename to docs/v24/configuration/writing-job-routes.md index 6e8bd9c8f..fad63895d 100644 --- a/docs/v6/configuration/writing-job-routes.md +++ b/docs/v24/configuration/writing-job-routes.md @@ -1,78 +1,17 @@ Writing Job Routes ================== -This document contains documentation for HTCondor-CE Job Router configurations with equivalent examples for the -[ClassAd transform](job-router-overview.md#classad-transforms) and -[deprecated](job-router-overview.md#deprecated-syntax) syntaxes. +This document contains documentation for HTCondor-CE Job Router configurations with examples for the +[ClassAd transform](job-router-overview.md#classad-transforms). Configuration from this page should be written to files in `/etc/condor-ce/config.d/`, whose contents are parsed in lexicographic order with subsequent variables overriding earlier ones. 
-Each example is displayed in code blocks with tabs to switch between the two syntaxes: - -=== "ClassAd Transform" +Each example is displayed in code blocks: ``` This is an example for the ClassAd transform syntax ``` -=== "Deprecated syntax" - - ``` - This is an example for the deprecated syntax - ``` - -Syntax Differences ------------------- - -!!! warning "Planned Removal of Deprecated Syntax" - - `JOB_ROUTER_DEFAULTS`, `JOB_ROUTER_ENTRIES`, `JOB_ROUTER_ENTRIES_CMD`, and `JOB_ROUTER_ENTRIES_FILE` are - deprecated and will be removed for *V24* of the HTCondor Software Suite. New configuration syntax for the job router - is defined using `JOB_ROUTER_ROUTE_NAMES` and `JOB_ROUTER_ROUTE_[name]`. - - For new syntax example vist: - [HTCondor Documentation - Job Router](https://htcondor.readthedocs.io/en/latest/grid-computing/job-router.html#an-example-configuration) - - **Note:** The removal will occur during the lifetime of the HTCondor *V23* feature series. - - -In HTCondor-CE 5, the [deprecated syntax](job-router-overview.md#deprecated-syntax) continues to be the default and -administrator's can move to the [ClassAd transform syntax](job-router-overview.md#classad-transforms) by setting the -following in a file in `/etc/condor-ce/config.d/`: - -``` -JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False -``` - -The [ClassAd transform](job-router-overview.md#classad-transforms) syntax provides many benefits including: - -- Statements being evaluated in [the order they are written](#editing-attributes) -- Use of variables that are not included in the resultant job ad -- Use simple case-like logic - -Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by -including transforms in the lists of `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES`, -respectively. 
- -For examples of the ClassAd transform syntax, you can inspect default job router transforms packaged with HTCondor-CE -with the following command: - -``` bash -user@host $ condor_ce_config_val -dump JOB_ROUTER_TRANSFORM_ -``` - -### Differences in `MY.` and `TARGET.` ### - -In addition to the above, the behavior of the `MY.` and `TARGET.` ClassAd attribute prefixes has changed between the two -different syntaxes: - -- **In ClassAd transform syntax,** `MY.` always refers to the incoming job's attributes and can be referenced within - `$()`, e.g. `$(MY.Owner)` refers to the mapped user of the incoming job. - `TARGET` is only used in [SET](#setting-attributes) expressions to refer to attributes in the slot ad (HTCondor - pools only). -- **In the deprecated syntax,** `MY.` refers to attributes in the job route and `TARGET.` refers to attributes in the - incoming job ad for [copy\_](#copying-attributes), [delete\_](#removing-attributes), and - [eval\_set\_](#setting-attributes-with-classad-expressions) functions. - However, in expressions defined by [set\_*](#setting-attributes), `MY.` refers to the attributes in the incoming job - ad and `TARGET.` refers to the attribute in the slot ad (HTCondor pools only). - Required Fields --------------- @@ -83,10 +22,8 @@ Default routes can be found in `/usr/share/condor-ce/config.d/02-ce-`) for the ClassAd transform syntax or with the `name` attribute for the deprecated syntax: - -=== "ClassAd Transform" +To identify routes, you will need to assign a name to the route, in the name of the configuration macro +(i.e., `JOB_ROUTER_ROUTE_`): ```hl_lines="1" JOB_ROUTER_ROUTE_Condor_Pool @=jrt @@ -96,18 +33,6 @@ To identify routes, you will need to assign a name to the route, either in the n JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="4" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - ] - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - !!! 
warning "Naming restrictions" - Route names should only contain alphanumeric and `_` characters. - Routes specified by `JOB_ROUTER_ROUTE_*` will override routes with the same name in `JOB_ROUTER_ENTRIES` @@ -129,8 +54,6 @@ For all other batch systems, the `GridResource` attribute needs to be set to `"b (where `` can be one of `pbs`, `slurm`, `lsf`, or `sge`). -=== "ClassAd Transform" - ```hl_lines="2 6 7" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -143,37 +66,17 @@ For all other batch systems, the `GridResource` attribute needs to be set to `"b JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm ``` -=== "Deprecated syntax" - - ```hl_lines="3 7 8" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - ] - [ - GridResource = "batch slurm"; - name = "My_Slurm"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm - ``` - Writing Multiple Routes ----------------------- If your batch system needs incoming jobs to be sorted (e.g. if different VO's need to go to separate queues), you will -need to write multiple job routes where each route is a separate `JOB_ROUTER_ROUTE_*` macro in the ClassAd transform -syntax and enclosed by square brackets in the deprecated syntax. +need to write multiple job routes where each route is a separate `JOB_ROUTER_ROUTE_*` macro. Additionally, the route names must be added to `JOB_ROUTER_ROUTE_NAMES` in the order that you want their requirements statements compared to incoming jobs. The following routes takes incoming jobs that have a `queue` attribute set to `"prod"` and sets `IsProduction = True`. All other jobs will be routed with `IsProduction = False`. -=== "ClassAd Transform" - ```hl_lines="1 7 12" JOB_ROUTER_ROUTE_Production_Jobs @=jrt REQUIREMENTS queue == "prod" @@ -189,33 +92,11 @@ All other jobs will be routed with `IsProduction = False`. 
JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="2 7 8 12 15" - JOB_ROUTER_ENTRIES @=jre - [ - Requirements = (TARGET.queue == "prod"); - TargetUniverse = 5; - set_IsProduction = True; - name = "Production_Jobs"; - ] - [ - TargetUniverse = 5; - set_IsProduction = False; - name = "Condor_Pool"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool - ``` - Writing Comments ---------------- To write comments you can use `#` to comment a line: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt # This is a comment @@ -225,20 +106,6 @@ To write comments you can use `#` to comment a line: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - # This is a comment - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - Setting Attributes for All Routes --------------------------------- @@ -267,28 +134,10 @@ To apply the same transform after your pre-route and route transforms, append th JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Periodic_Hold ``` -### Deprecated syntax - -To set an attribute that will be applied to all routes, you will need to ensure that `MERGE_JOB_ROUTER_DEFAULT_ADS` is -set to `True` (check the value with [condor\_ce\_config\_val](../troubleshooting/debugging-tools.md#condor_ce_config_val)) -and use the [set_](#setting-attributes) function in the `JOB_ROUTER_DEFAULTS`. -The following configuration sets the `Periodic_Hold` attribute for all routes: - -```hl_lines="7" -# Use the defaults generated by the condor_ce_router_defaults script. 
To add -# additional defaults, add additional lines of the form: -# -# JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_foo = 1;] -# -MERGE_JOB_ROUTER_DEFAULT_ADS=True -JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;] -``` - Filtering Jobs Based On… ------------------------ -To filter jobs, use the route's `REQUIREMENTS` or `Requirements` attribute for ClassAd transforms and deprecated -syntaxes, respectively. +To filter jobs, use the route's `REQUIREMENTS` attribute. Incoming jobs will be evaluated against the ClassAd expression set in the route's requirements and if the expression evaluates to `TRUE`, the route will match. More information on the syntax of ClassAd's can be found in the @@ -296,9 +145,6 @@ More information on the syntax of ClassAd's can be found in the For an example on how incoming jobs interact with filtering in job routes, consult [this document](../remote-job-submission.md). -In the deprecated syntax, you may need to specify `TARGET.` to refer to differentiate between job and route attributes. -See [this section](#differences-in-my-and-target) for more details. - !!! note If you have an HTCondor batch system, note the difference with [set\_requirements](htcondor-routes.md#setting-routed-job-requirements): @@ -309,8 +155,6 @@ To filter jobs based on their pilot job queue attribute, your routes will need a incoming job's `queue` attribute. 
The following entry routes jobs to HTCondor if the incoming job (specified by `TARGET`) is an `analy` (Analysis) glidein: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS queue == "prod" @@ -320,28 +164,12 @@ The following entry routes jobs to HTCondor if the incoming job (specified by `T JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="3" - JOB_ROUTER_ENTRIES @=jre - [ - Requirements = (TARGET.queue == "prod"); - TargetUniverse = 5; - name = "Condor_Pool"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = My_HTCONDOR - ``` - ### Mapped user ### To filter jobs based on what local account the incoming job was mapped to, your routes will need a requirements expression using the incoming job's `Owner` attribute. The following entry routes jobs to the HTCondor batch system if the mapped user is `usatlas2`: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS Owner == "usatlas2" @@ -351,25 +179,9 @@ The following entry routes jobs to the HTCondor batch system if the mapped user JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="3" - JOB_ROUTER_ENTRIES @=jre - [ - Requirements = (TARGET.Owner == "usatlas2"); - TargetUniverse = 5; - name = "Condor_Pool"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = My_HTCONDOR - ``` - Alternatively, you can match based on regular expression. 
The following entry routes jobs to the HTCondor batch system if the mapped user begins with `usatlas`: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp("^usatlas", Owner) @@ -379,20 +191,6 @@ The following entry routes jobs to the HTCondor batch system if the mapped user JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="3" - JOB_ROUTER_ENTRIES @=jre - [ - Requirements = regexp("^usatlas", TARGET.Owner); - TargetUniverse = 5; - name = "Condor_Pool"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = My_HTCONDOR - ``` - ### VOMS attribute ### To filter jobs based on the subject of the job's proxy, your routes will need a requirements expression using the @@ -400,8 +198,6 @@ incoming job's `x509UserProxyFirstFQAN` attribute. The following entry routes jobs to the HTCondor batch system if the proxy subject contains `/cms/Role=Pilot`: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp("\/cms\/Role\=pilot", x509UserProxyFirstFQAN) @@ -411,20 +207,6 @@ The following entry routes jobs to the HTCondor batch system if the proxy subjec JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated syntax" - - ```hl_lines="3" - JOB_ROUTER_ENTRIES @=jre - [ - Requirements = regexp("\/cms\/Role\=pilot", TARGET.x509UserProxyFirstFQAN); - TargetUniverse = 5; - name = "Condor_Pool"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = My_HTCONDOR - ``` - Setting a Default… ------------------ @@ -447,10 +229,7 @@ CONDORCE_MAX_JOBS = 10000 ### Maximum memory ### -To set a default maximum memory (in MB) for routed jobs, set the variable or attribute `default_maxMemory` for the -ClassAd transform and deprecated syntax, respectively: - -=== "ClassAd Transform" +To set a default maximum memory (in MB) for routed jobs, set the variable `default_maxMemory`: ```hl_lines="4" JOB_ROUTER_ROUTE_Condor_Pool @=jrt @@ -462,29 +241,11 @@ ClassAd transform and deprecated syntax, respectively: 
JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="6" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - # Set the requested memory to 1 GB - set_default_maxMemory = 1000; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Number of cores to request ### -To set a default number of cores for routed jobs, set the variable or attribute `default_xcount` for the ClassAd -transform and deprecated syntax, respectively: +To set a default number of cores for routed jobs, set the variable `default_xcount`: -=== "ClassAd Transform" - ```hl_lines="4" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -495,28 +256,11 @@ transform and deprecated syntax, respectively: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="6" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - # Set the requested cores to 8 - set_default_xcount = 8; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Number of gpus to request ### To set a default number of GPUs for routed jobs, set the job ClassAd attribute `RequestGPUs` in the route transform: -=== "ClassAd Transform" - ```hl_lines="4" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -528,14 +272,11 @@ transform: ``` The `DEFAULT` keyword works for any job attribute other than those mentioned above that require the use of -alternative names for defaulting in the CE. The deprecated syntax has no keyword for defaulting. +alternative names for defaulting in the CE. 
### Maximum walltime ### -To set a default number of cores for routed jobs, set the variable or attribute `default_maxWallTime` for the ClassAd -transform and deprecated syntax, respectively: - -=== "ClassAd Transform" +To set a default maximum walltime (in minutes) for routed jobs, set the variable `default_maxWallTime`: ```hl_lines="4" JOB_ROUTER_ROUTE_Condor_Pool @=jrt @@ -547,21 +288,6 @@ transform and deprecated syntax, respectively: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="6" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - # Set the max walltime to 1 hr - set_default_maxWallTime = 60; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - Setting Job Environments ------------------------ @@ -613,12 +339,9 @@ For example, the following HTCondor-CE configuration would result in the followi ``` To set environment variables per job route, based on incoming job attributes, or using ClassAd functions, add -`default_pilot_job_env` or `set_default_pilot_job_env` to your job route configuration for ClassAd transforms and -deprecated syntax, respectively.
+`default_pilot_job_env` to your job route configuration: For example, the following HTCondor-CE configuration would result in this environment for a job with these attributes: -=== "ClassAd Transform" - ```hl_lines="3 4 5" JOB_ROUTER_Condor_Pool @=jrt UNIVERSE VANILLA @@ -630,22 +353,6 @@ For example, the following HTCondor-CE configuration would result in this enviro JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5 6 7" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - set_default_pilot_job_env = strcat("WN_SCRATCH_DIR=/nobackup", - " PILOT_COLLECTOR=", JOB_COLLECTOR, - " ACCOUNTING_GROUP=", toLower(JOB_VO)); - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - === "Incoming Job Attributes" ``` @@ -670,16 +377,16 @@ Editing Attributes… ------------------- The following functions are operations that can be used to take incoming job attributes and modify them for the routed -job for the ClassAd transform and deprecated syntax, respectively: +job: 1. `COPY`, `copy_*` 2. `DELETE`, `delete_*` 3. `SET`, `set_*` 4. `EVALSET`, `eval_set_*` -The above operations are evaluated in order differently depending on your chosen syntax: +The above operations are evaluated in order: -- **If you are using ClassAd transforms**, each function is evaluated in order of appearance. +- Each function is evaluated in order of appearance. For example, the following will set `FOO` in the routed job to the incoming job's `Owner` attribute and then subsequently remove `FOO` from the routed job: @@ -688,29 +395,16 @@ The above operations are evaluated in order differently depending on your chosen DELETE FOO @jrt -- **If you are using the deprecated syntax**, each class of operations is evaluated in the order specified above, - i.e. all `copy_*`, before `delete_*`, etc. 
- For example, if the attribute `FOO` is set using `eval_set_FOO` in the `JOB_ROUTER_DEFAULTS`, you'll be unable to use - `delete_foo` to remove it from your jobs since the attribute is set using `eval_set_foo` after the deletion occurs - according to the order of operations. - To get around this, we can take advantage of the fact that operations defined in `JOB_ROUTER_DEFAULTS` get - overridden by the same operation in `JOB_ROUTER_ENTRIES`. - So to 'delete' `FOO`, you could add `eval_set_foo = ""` to the route in the `JOB_ROUTER_ENTRIES`, resulting in `foo` - being set to the empty string in the routed job. - More documentation can be found in the [HTCondor manual](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html#routing-table-entry-commands-and-macro-values) ### Copying attributes -To copy the value of an attribute of the incoming job to an attribute of the routed job, use `COPY` or `copy_` for -ClassAd transform and deprecated syntaxes, respectively.. +To copy the value of an attribute of the incoming job to an attribute of the routed job, use `COPY`: The following route copies the `Environment` attribute of the incoming job and sets the attribute `Original_Environment` on the routed job to the same value: -=== "ClassAd Transform" - ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -720,29 +414,12 @@ on the routed job to the same value: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - copy_Environment = "Original_Environment"; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Removing attributes -To remove an attribute of the incoming job from the routed job, use `DELETE` or `delete_` for ClassAd transform and -deprecated syntaxes, respectively. +To remove an attribute of the incoming job from the routed job, use `DELETE`. 
The following route removes the `Environment` attribute from the routed job: -=== "ClassAd Transform" - ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -752,27 +429,11 @@ The following route removes the `Environment` attribute from the routed job: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - delete_Environment = True; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Setting attributes -To set an attribute on the routed job, use `SET` or `set_` for ClassAd transform and deprecated syntaxes, respectively. +To set an attribute on the routed job, use `SET`. The following route sets the Job's `Rank` attribute to 5: -=== "ClassAd Transform" - ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -782,28 +443,11 @@ The following route sets the Job's `Rank` attribute to 5: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - set_Rank = 5; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Setting attributes with ClassAd expressions -To set an attribute to a ClassAd expression to be evaluated, use `EVALSET` or `eval_set` for ClassAd transform and -deprecated syntaxes, respectively. +To set an attribute to a ClassAd expression to be evaluated, use `EVALSET`. 
The following route sets the `Experiment` attribute to `atlas.osguser` if the Owner of the incoming job is `osguser`: -=== "ClassAd Transform" - ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @@ -813,20 +457,6 @@ The following route sets the `Experiment` attribute to `atlas.osguser` if the Ow JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - eval_set_Experiment = strcat("atlas.", Owner); - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - Limiting the Number of Jobs --------------------------- @@ -836,14 +466,12 @@ This section outlines how to limit the number of total or idle jobs in a specifi !!! note If you are using an HTCondor batch system, limiting the number of jobs is not the preferred solution: HTCondor manages fair share on its own via - [user priorities and group accounting](http://research.cs.wisc.edu/htcondor/manual/v8.6/3_6User_Priorities.html). + [user priorities and group accounting](https://htcondor.readthedocs.io/en/lts/admin-manual/user-priorities-negotiation.html). 
### Total jobs To set a limit on the number of jobs for a specific route, -set the [MaxJobs](http://research.cs.wisc.edu/htcondor/manual/v8.6/5_4HTCondor_Job.html#57134) attribute: - -=== "ClassAd Transform" +set the [MaxJobs](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html#index-6) attribute: ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Poole @=jrt @@ -854,26 +482,10 @@ set the [MaxJobs](http://research.cs.wisc.edu/htcondor/manual/v8.6/5_4HTCondor_J JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - MaxJobs = 100; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - ### Idle jobs To set a limit on the number of idle jobs for a specific route, -set the [MaxIdleJobs](http://research.cs.wisc.edu/htcondor/manual/v8.6/5_4HTCondor_Job.html#57135) attribute: - -=== "ClassAd Transform" +set the [MaxIdleJobs](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html#index-7) attribute: ```hl_lines="3" JOB_ROUTER_ROUTE_Condor_Poole @=jrt @@ -884,20 +496,6 @@ set the [MaxIdleJobs](http://research.cs.wisc.edu/htcondor/manual/v8.6/5_4HTCond JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="5" - JOB_ROUTER_ENTRIES @=jre - [ - TargetUniverse = 5; - name = "Condor_Pool"; - MaxIdleJobs = 100; - ] - @jre - - JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - Debugging Routes ---------------- @@ -910,8 +508,6 @@ JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT Then wrap the problematic attribute in `debug()`: -=== "ClassAd Transform" - ```hl_lines="2" JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET Experiment = debug(strcat("atlas", Name)) @@ -921,19 +517,6 @@ Then wrap the problematic attribute in `debug()`: JOB_ROUTER_ROUTE_NAMES = Condor_Pool ``` -=== "Deprecated Syntax" - - ```hl_lines="4" - JOB_ROUTER_ENTRIES @=jre - [ - name = "Condor_Pool"; - eval_set_Experiment = debug(strcat("atlas", Name)); - ] - @jre - - 
JOB_ROUTER_ROUTE_NAMES = Condor_Pool - ``` - You will find the debugging output in `/var/log/condor-ce/JobRouterLog`. Getting Help diff --git a/docs/v6/installation/central-collector.md b/docs/v24/installation/central-collector.md similarity index 99% rename from docs/v6/installation/central-collector.md rename to docs/v24/installation/central-collector.md index f5bdce9ce..88b78299f 100644 --- a/docs/v6/installation/central-collector.md +++ b/docs/v24/installation/central-collector.md @@ -35,7 +35,7 @@ There are some one-time (per host) steps to prepare in advance: - Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) - Obtain root access to the host -- Prepare the [EPEL](https://fedoraproject.org/wiki/EPEL) and [HTCondor](https://research.cs.wisc.edu/htcondor/yum/) Yum +- Prepare the [EPEL](https://fedoraproject.org/wiki/EPEL) and [HTCondor](https://htcondor.readthedocs.io/en/lts/getting-htcondor/from-our-repositories.html) Yum repositories - Install CA certificates and VO data into `/etc/grid-security/certificates` and `/etc/grid-security/vomsdir`, respectively diff --git a/docs/v6/installation/htcondor-ce.md b/docs/v24/installation/htcondor-ce.md similarity index 95% rename from docs/v6/installation/htcondor-ce.md rename to docs/v24/installation/htcondor-ce.md index 6e953b4ae..0d890961a 100644 --- a/docs/v6/installation/htcondor-ce.md +++ b/docs/v24/installation/htcondor-ce.md @@ -1,5 +1,5 @@ -Installing HTCondor-CE 6 -======================== +Installing HTCondor-CE 24 +========================= !!! tip "Joining the OSG Consortium (OSG)?" If you are installing an HTCondor-CE for the OSG, consult the @@ -11,7 +11,7 @@ It is configured to use the [Job Router daemon](https://htcondor.readthedocs.io/ to delegate resource allocation requests by transforming and submitting them to the site’s batch system. See the [home page](../../index.md) for more details on the features and architecture of HTCondor-CE. 
-Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 6 from the +Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 24 from the [CHTC yum repositories](https://htcondor.org/downloads/htcondor-ce.html). Before Starting @@ -34,7 +34,7 @@ There are some one-time (per host) steps to prepare in advance: - Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) - Obtain root access to the host -- Prepare the [EPEL](https://fedoraproject.org/wiki/EPEL) and [HTCondor Development](https://research.cs.wisc.edu/htcondor/yum/) Yum +- Prepare the [EPEL](https://fedoraproject.org/wiki/EPEL) and [HTCondor Development](https://htcondor.readthedocs.io/en/lts/getting-htcondor/from-our-repositories.html) Yum repositories - Install CA certificates and VO data into `/etc/grid-security/certificates` and `/etc/grid-security/vomsdir`, respectively diff --git a/docs/v6/operation.md b/docs/v24/operation.md similarity index 100% rename from docs/v6/operation.md rename to docs/v24/operation.md diff --git a/docs/v6/reference.md b/docs/v24/reference.md similarity index 100% rename from docs/v6/reference.md rename to docs/v24/reference.md diff --git a/docs/v6/releases.md b/docs/v24/releases.md similarity index 52% rename from docs/v6/releases.md rename to docs/v24/releases.md index b1bf0784e..aecf9371c 100644 --- a/docs/v6/releases.md +++ b/docs/v24/releases.md @@ -1,7 +1,7 @@ Releases ======== -HTCondor-CE 6 is distributed via RPM and are available from the following Yum repositories: +HTCondor-CE 24 is distributed via RPM and is available from the following Yum repositories: - [HTCondor LTS and Feature Releases](https://htcondor.org/htcondor/download/) - [The OSG Consortium](https://osg-htc.org/docs/common/yum/) @@ -13,8 +13,12 @@ Known Issues Known bugs affecting HTCondor-CEs can be found in
[Jira](https://opensciencegrid.atlassian.net/issues/?jql=project%20%3D%20HTCONDOR%20AND%20status%20not%20in%20(done%2C%20abandoned)%20and%20component%20%3D%20htcondor-ce%20and%20issuetype%20%3D%20bug) -Updating to HTCondor-CE 6 -------------------------- +Updating to HTCondor-CE 24 +-------------------------- + +!!! note "Updating from HTCondor-CE < 23" + If updating to HTCondor-CE 24 from HTCondor-CE < 23, be sure to also consult the HTCondor-CE 23 + [upgrade instructions](../v23/releases.md#600). !!! tip "Finding relevant configuration changes" When updating HTCondor-CE RPMs, `.rpmnew` and `.rpmsave` files may be created containing new defaults that you @@ -24,34 +28,26 @@ Updating to HTCondor-CE 6 :::console root@host # find /etc/condor-ce/ -name '*.rpmnew' -name '*.rpmsave' -HTCondor-CE 6 is a major release that aligns its security model with -[HTCondor 9.0's improved security model](https://htcondor.readthedocs.io/en/v9_0/version-history/upgrading-from-88-to-90-series.html). -As such, upgrades from older versions of HTCondor-CE may require manual intervention. +HTCondor-CE 24 is very close in functionality to HTCondor-CE 23. +As such, upgrading should be very easy. -HTCondor-CE 6 Version History ------------------------------ +HTCondor-CE 24 Version History +------------------------------ -This section contains release notes for each version of HTCondor-CE 6. +This section contains release notes for each version of HTCondor-CE 24. Full HTCondor-CE version history can be found on [GitHub](https://github.com/htcondor/htcondor-ce/releases). 
-### 6.0.1 ### +### **October 31, 2024:** 24.1.1 ### -[This release](https://github.com/htcondor/htcondor-ce/releases/tag/v6.0.1) includes the following new features: +[This release](https://github.com/htcondor/htcondor-ce/releases/tag/v24.1.1) includes the following new features: -- Add grid CA and host certificate/key locations to default SSL search paths -- Verifies that HTCondor-CE can access the local HTCondor's `SPOOL` directory -- Can use `condor_ce_trace` without SciToken to test batch system integration -- `condor_ce_upgrade_check` checks compatibility with HTCondor 23.0 -- Adds deprecation warnings for old job router configuration syntax +- Initial HTCondor-CE 24.1.1 release -### 6.0.0 ### +### **October 31, 2024:** 24.0.1 ### -[This release](https://github.com/htcondor/htcondor-ce/releases/tag/v6.0.0) includes the following new features: +[This release](https://github.com/htcondor/htcondor-ce/releases/tag/v24.0.1) includes the following new features: -- Align HTCondor-CE security configuration with HTCondor defaults -- Add example configuration on how to ban users -- Add `condor_ce_transform_ads` command -- Improve essential directory checking and creation at startup +- Initial HTCondor-CE 24.0.1 release Getting Help ------------ diff --git a/docs/v6/remote-job-submission.md b/docs/v24/remote-job-submission.md similarity index 100% rename from docs/v6/remote-job-submission.md rename to docs/v24/remote-job-submission.md diff --git a/docs/v6/troubleshooting/common-issues.md b/docs/v24/troubleshooting/common-issues.md similarity index 91% rename from docs/v6/troubleshooting/common-issues.md rename to docs/v24/troubleshooting/common-issues.md index 12e31e136..6cf2102d6 100644 --- a/docs/v6/troubleshooting/common-issues.md +++ b/docs/v24/troubleshooting/common-issues.md @@ -422,7 +422,7 @@ Notice the failures in the above message: `Remote Mapping: gsi@unmapped` and `Au ### Jobs go on hold -Jobs will be put on held with a `HoldReason` attribute that can be 
inspected with +Jobs can be put on hold with a `HoldReason` attribute that can be inspected with [condor\_ce\_q](debugging-tools.md#condor_ce_q): ``` console @@ -430,6 +430,20 @@ user@host $ condor_ce_q -l -attr HoldReason HoldReason = "CE job in status 5 put on hold by SYSTEM_PERIODIC_HOLD due to no matching routes, route job limit, or route failure threshold." ``` +The CE (and CE client) will put a job on hold when it encounters a problem +with the job that it doesn't know how to resolve. + +If the HTCondor schedd believes that the existing job it has submitted +to a remote queue may be recoverable, then it will leave the remote job +queued and keep the `GridJobId` attribute defined in the local job ad. +If you release the local job (with `condor_ce_release`), then the schedd +will attempt to re-establish contact with the remote scheduler. + +If the schedd believes the existing remote job is not recoverable, then it +will remove the job from the remote queue and set `GridJobId` to `Undefined` +in the local job ad. If you release the local job, then a new job instance +will be submitted to the remote scheduler. + #### Held jobs: no matching routes, route job limit, or route failure threshold Jobs on the CE will be put on hold if they are not claimed by the job router within 30 minutes. @@ -441,7 +455,7 @@ The most common cases for this behavior are as follows: - **The route(s) that the job matches to are full:** See [limiting the number of jobs](../configuration/writing-job-routes.md#limiting-the-number-of-jobs). - **The job router is throttling submission to your batch system due to submission failures:** - See the HTCondor manual for [FailureRateThreshold](http://research.cs.wisc.edu/htcondor/manual/v8.6/5_4HTCondor_Job.html#55958). + See the HTCondor manual for [FailureRateThreshold](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html#index-8).
Check for errors in the [JobRouterLog](logs.md#jobrouterlog) or [GridmanagerLog](logs.md#gridmanagerlog) for HTCondor and non-HTCondor batch systems, respectively. @@ -465,10 +479,10 @@ Ensure that the owner of the job generates their proxy with `voms-proxy-init`. #### Held jobs: Invalid job universe -The HTCondor-CE only accepts jobs that have `universe` in their submit files set to `vanilla`, `standard`, `local`, or +The HTCondor-CE only accepts jobs that have `universe` in their submit files set to `vanilla`, `local`, or `scheduler`. These universes also have corresponding integer values that can be found in the -[HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/12_Appendix_A.html#104736). +[HTCondor manual](https://htcondor.readthedocs.io/en/lts/codes-other-values/job-universe-numbers.html). **Next actions** @@ -550,6 +564,40 @@ This means that the `condor_job_router_info` (note this is not the CE version), 2. You have installed HTCondor in a non-standard location that is not in your `PATH`. 3. The `condor_job_router_info` tool itself wasn't available until Condor-8.2.3-1.1 (available in osg-upcoming). +### Jobs removed from the local batch system + +When the CE removes a job from the local batch system, it may be due to +a problem the CE encountered with managing the job or it may be at the +behest of the submitter to the CE (which may be a remote HTCondor +Access Point). + +Given a specific job ID in the CE logs, first find the job ad in the CE +queue with the `condor_ce_q` tool and check the value of the `GridJobId` +attribute: + +``` console +user@host $ condor_ce_q -af GridJobId +``` + +If the job is no longer in the queue, you will have to check the history +using the `condor_ce_history` tool: + +``` console +user@host $ condor_ce_history -af GridJobId +``` + +If the `GridJobId` is *undefined*, then the CE did the removal due to a +problem interacting with the local batch system.
+Check the `HoldReason` and `LastHoldReason` attributes for why the CE +removed the job. + +If `GridJobId` is not *undefined* and is set to some value, then the +submitter to the CE removed the job. +If the submitter is a remote HTCondor Access Point, its daemons may have +done the removal as part of putting its local job on hold. +In that case, the `HoldReason` attribute in the remote job queue should +indicate the source of the problem. + Getting Help ------------ diff --git a/docs/v6/troubleshooting/debugging-tools.md b/docs/v24/troubleshooting/debugging-tools.md similarity index 89% rename from docs/v6/troubleshooting/debugging-tools.md rename to docs/v24/troubleshooting/debugging-tools.md index c0c051b8e..8c78c01ac 100644 --- a/docs/v6/troubleshooting/debugging-tools.md +++ b/docs/v24/troubleshooting/debugging-tools.md @@ -143,8 +143,9 @@ Authorized: TRUE !!! note If you run the `condor_ce_ping` command on the CE that you are testing, omit the `-name` and `-pool` - options. `condor_ce_ping` takes the same arguments as `condor_ping` and is documented in the - [HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_ping.html). + options. `condor_ce_ping` takes the same arguments as + `condor_ping`, which is documented in the + [HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_ping.html). ### Troubleshooting ### @@ -182,8 +183,8 @@ user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:96 !!! note If you run the `condor_ce_q` command on the CE that you are testing, omit the `-name` and `-pool` options. - `condor_ce_q` takes the same arguments as `condor_q` and is documented in the - [HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_q.html). + `condor_ce_q` takes the same arguments as `condor_q`, which is documented in the + [HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_q.html).
### Troubleshooting ### @@ -248,8 +249,8 @@ user@host $ condor_ce_history -name condorce.example.com -pool condorce.example. !!! note If you run the `condor_ce_history` command on the CE that you are testing, omit the `-name` and `-pool` options. - `condor_ce_history` takes the same arguments as `condor_history` and is documented in the - [HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_history.html). + `condor_ce_history` takes the same arguments as `condor_history`, which is documented in the + [HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_history.html). condor_ce_job_router_info @@ -258,8 +259,7 @@ condor_ce_job_router_info ### Usage ### Use the `condor_ce_job_router_info` command to help troubleshoot your routes and how jobs will match to them. -To see all of your routes (the output is long because it combines your routes with the -[JOB\_ROUTER\_DEFAULTS](../configuration/job-router-overview.md#deprecated-syntax) configuration variable): +To see all of your routes: ``` console root@host # condor_ce_job_router_info -config @@ -343,8 +343,38 @@ and their statuses: user@host $ condor_ce_router_q ``` -`condor_ce_router_q` takes the same options as `condor_router_q` and `condor_q` and is documented in the -[HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_router_q.html) +`condor_ce_router_q` takes the same options as `condor_router_q`, +which is documented in the +[HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_router_q.html). + + +condor_ce_test_token +-------------------- + +### Usage ### + +Use the `condor_ce_test_token` command to test SciTokens +authentication in the CE. +It will create a token with an issuer and subject that you specify and +configure the CE daemons to accept that token as if it had been +generated by the given issuer (for one hour).
+The token is printed to stdout; use it with `condor_ce_submit` to test +that SciTokens authentication and user mapping operate correctly. + +To create a temporary SciToken that appears to be issued by the +SciTokens demo issuer: + +``` console +root@host # condor_ce_test_token --issuer https://demo.scitokens.org \ +--audience ANY --scope condor:/WRITE --subject alice@foo.edu +``` + +!!! note + You must run `condor_ce_test_token` on the CE that you are testing + as the root user. + `condor_ce_test_token` takes the same arguments as + `condor_test_token`, which is documented in the + [HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_test_token.html). condor_ce_status @@ -358,8 +388,8 @@ To see the daemons running on a CE, run the following command: user@host $ condor_ce_status -any ``` -`condor_ce_status` takes the same arguments as `condor_status`, which are documented in the -[HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_status.html). +`condor_ce_status` takes the same arguments as `condor_status`, which is documented in the +[HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_status.html). !!! note ""Missing" Worker Nodes" An HTCondor-CE will not show any worker nodes (e.g. `Machine` entries in the `condor_ce_status -any` output) if @@ -404,8 +434,9 @@ following command: user@host $ condor_ce_config_val -config ``` -`condor_ce_config_val` takes the same arguments as `condor_config_val` and is documented in the -[HTCondor manual](http://research.cs.wisc.edu/htcondor/manual/v8.6/condor_config_val.html). +`condor_ce_config_val` takes the same arguments as +`condor_config_val`, which is documented in the +[HTCondor manual](https://htcondor.readthedocs.io/en/lts/man-pages/condor_config_val.html).
condor_ce_reconfig diff --git a/docs/v6/troubleshooting/logs.md b/docs/v24/troubleshooting/logs.md similarity index 98% rename from docs/v6/troubleshooting/logs.md rename to docs/v24/troubleshooting/logs.md index 2e15706e9..64b8c7a2b 100644 --- a/docs/v6/troubleshooting/logs.md +++ b/docs/v24/troubleshooting/logs.md @@ -253,7 +253,7 @@ SharedPortLog The HTCondor-CE shared port log keeps track of all connections to all of the HTCondor-CE daemons other than the collector. This log is a good place to check if experiencing connectivity issues with HTCondor-CE. More information can be found -[here](http://research.cs.wisc.edu/htcondor/manual/v8.6/3_9Networking_includes.html#SECTION00492000000000000000). +[here](https://htcondor.readthedocs.io/en/lts/admin-manual/networking.html#reducing-port-usage-with-the-condor-shared-port-daemon). - Location: `/var/log/condor-ce/SharedPortLog` - Key contents: Every attempt to connect to HTCondor-CE (except collector queries) diff --git a/docs/v6/troubleshooting/remote-troubleshooting.md b/docs/v24/troubleshooting/remote-troubleshooting.md similarity index 100% rename from docs/v6/troubleshooting/remote-troubleshooting.md rename to docs/v24/troubleshooting/remote-troubleshooting.md diff --git a/docs/v6/configuration/job-router-overview.md b/docs/v6/configuration/job-router-overview.md deleted file mode 100644 index 88a1de177..000000000 --- a/docs/v6/configuration/job-router-overview.md +++ /dev/null @@ -1,118 +0,0 @@ -Job Router Configuration Overview -================================= - -The [HTCondor Job Router](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html) is at the heart of -HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. 
-Customizations are made in the form of job routes where each route corresponds to a separate job transformation: -If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed -job') that is then submitted to the batch system. -The CE package comes with default routes located in `/etc/condor-ce/config.d/02-ce-*.conf` that provide enough basic -functionality for a small site. - -If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an -overview of how to configure your HTCondor-CE Job Router - -!!! note "Definitions" - - **Incoming Job**: A job which was submitted to HTCondor-CE from an external source. - - **Routed Job**: A job that has been transformed by the Job Router. - -Route Syntaxes --------------- - -HTCondor-CE 5 introduces the ability to write job routes using [ClassAd transform syntax](#classad-transforms) in -addition to the [existing configuration syntax](#deprecated-syntax). -The old route configuration syntax continues to be the default in HTCondor-CE 5 but there are benefits to transitioning -to the new syntax as [outlined below](#choosing-a-syntax). - -### ClassAd transforms ### - -The HTCondor [ClassAd transforms](https://htcondor.readthedocs.io/en/lts/classads/transforms.html) were -originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. -In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 adds the configuration -necessary to support routes written as ClassAd transforms. -If configured to use trasnform-based routes, HTCondor-CE routes and transforms jobs that by chaining ClassAd transforms -in the following order: - -1. Each transform in `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job -1. The first transform from `JOB_ROUTER_ROUTE_NAMES` whose requirements are met by the job. 
- See [the section on route matching](#how-jobs-match-to-routes) below. -1. Each transform in `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` whose requirements are met by the job - -### Deprecated syntax ### - -!!! warning "Planned Removal of Deprecated Syntax" - - `JOB_ROUTER_DEFAULTS`, `JOB_ROUTER_ENTRIES`, `JOB_ROUTER_ENTRIES_CMD`, and `JOB_ROUTER_ENTRIES_FILE` are - deprecated and will be removed for *V24* of the HTCondor Software Suite. New configuration syntax for the job router - is defined using `JOB_ROUTER_ROUTE_NAMES` and `JOB_ROUTER_ROUTE_[name]`. - - For new syntax example vist: - [HTCondor Documentation - Job Router](https://htcondor.readthedocs.io/en/latest/grid-computing/job-router.html#an-example-configuration) - - **Note:** The removal will occur during the lifetime of the HTCondor *V23* feature series. - -Since the inception of HTCondor-CE, job routes have been written as a -[list of ClassAds](https://htcondor.readthedocs.io/en/lts/grid-computing/job-router.html#deprecated-router-configuration). -Each job route’s [ClassAd](http://research.cs.wisc.edu/htcondor/manual/v8.6/4_1HTCondor_s_ClassAd.html) is constructed -by combining each entry from the `JOB_ROUTER_ENTRIES` with the `JOB_ROUTER_DEFAULTS`: - -- `JOB_ROUTER_ENTRIES` is a configuration variable whose default is set in `/etc/condor-ce/config.d/02-ce-*.conf` but - may be overriden by the administrator in subsequent files in `/etc/condor-ce/config.d/`. -- `JOB_ROUTER_DEFAULTS` is a generated configuration variable that sets default job route values that are required for - HTCondor-CE's functionality. 
- To view its contents in a readable format, run the following command: - - :::console - user@host $ condor_ce_config_val JOB_ROUTER_DEFAULTS | sed 's/;/;\n/g' - -Take care when modifying attributes in `JOB_ROUTER_DEFAULTS`: you may -[add new attributes](writing-job-routes.md#setting-attributes-for-all-routes) and override attributes that are -[set_*](writing-job-routes.md#setting-attributes) in `JOB_ROUTER_DEFAULTS`. - -!!! danger "The following may break your HTCondor-CE" - - Do **not** set the `JOB_ROUTER_DEFAULTS` configuration variable yourself. This will cause the CE to stop - functioning. - - If a value is set in `JOB_ROUTER_DEFAULTS` with `eval_set_`, override it by using - `eval_set_` in the `JOB_ROUTER_ENTRIES`. - Do this at your own risk as it may cause the CE to break. - -### Choosing a syntax ### - -For existing HTCondor-CEs, it's recommended that administrators continue to use the deprecated syntax (the default) and -transition to ClassAd transforms at their own pace. - -For new HTCondor-CEs, it's recommended that administrators start with ClassAd transforms. -The [ClassAd transform](#classad-transforms) syntax provides many benefits including: - -- Statements being evaluated in [the order they are written](writing-job-routes.md#editing-attributes) -- Use of variables that are not included in the resultant job ad -- Use simple case-like logic - -Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by -including transforms in the lists of `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES`, -respectively. 
- -How Jobs Match to Routes ------------------------- - -The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in -[condor_ce_q](../troubleshooting/debugging-tools.md#condor_ce_q)) that meet the following constraints: - -- The job has not already been considered by the Job Router -- The job's universe is standard or vanilla - -If the incoming job meets the above constraints, then the job is matched to the first route in `JOB_ROUTER_ROUTE_NAMES` -whose requirements are satisfied by the job's ClassAd. -Additionally: - -- **If you are using the [ClassAd transform syntax](#classad-transforms),** transforms in - `JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES` and `JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES` may also have their own - requirements that determine whether or not that transform is applied. -- **If you are using the [deprecated syntax](#deprecated-syntax),** you may configure the Job Router to evenly - distribute jobs across all matching routes (i.e., round-robin matching). - To do so, add the following configuration to a file in `/etc/condor-ce/config.d/`: - - JOB_ROUTER_ROUND_ROBIN_SELECTION = True - -Getting Help ------------- - -If you have any questions or issues with configuring job routes, please [contact us](../../index.md#contact-us) for -assistance.