diff --git a/404.html b/404.html
index cce711d22..16ddf83f3 100644
--- a/404.html
+++ b/404.html
@@ -194,7 +194,7 @@
-
+
HTCondor-CE 6
@@ -672,7 +672,7 @@
-
+
Installation
@@ -705,7 +705,7 @@
-
+
Authentication
@@ -717,7 +717,7 @@
-
+
Local Batch System
@@ -750,7 +750,7 @@
-
+
Overview
@@ -762,7 +762,7 @@
-
+
Writing Job Routes
@@ -774,7 +774,7 @@
-
+
For HTCondor Batch Systems
@@ -786,7 +786,7 @@
-
+
For Non-HTCondor Batch Systems
@@ -804,7 +804,7 @@
-
+
Optional Configuration
@@ -822,7 +822,7 @@
-
+
Operation
@@ -855,7 +855,7 @@
-
+
Common Issues
@@ -867,7 +867,7 @@
-
+
Debugging Tools
@@ -879,7 +879,7 @@
-
+
Helpful Logs
@@ -918,7 +918,7 @@
-
+
Submit Jobs Remotely
@@ -930,7 +930,7 @@
-
+
Remote Troubleshooting
@@ -942,7 +942,7 @@
-
+
Install a Central Collector
@@ -960,7 +960,7 @@
-
+
Releases
@@ -972,7 +972,7 @@
-
+
Reference
diff --git a/architecture/index.html b/architecture/index.html
index 31a2054a9..257eb9984 100644
--- a/architecture/index.html
+++ b/architecture/index.html
@@ -203,7 +203,7 @@
-
+
HTCondor-CE 6
@@ -758,7 +758,7 @@
-
+
Installation
@@ -791,7 +791,7 @@
-
+
Authentication
@@ -803,7 +803,7 @@
-
+
Local Batch System
@@ -836,7 +836,7 @@
-
+
Overview
@@ -848,7 +848,7 @@
-
+
Writing Job Routes
@@ -860,7 +860,7 @@
-
+
For HTCondor Batch Systems
@@ -872,7 +872,7 @@
-
+
For Non-HTCondor Batch Systems
@@ -890,7 +890,7 @@
-
+
Optional Configuration
@@ -908,7 +908,7 @@
-
+
Operation
@@ -941,7 +941,7 @@
-
+
Common Issues
@@ -953,7 +953,7 @@
-
+
Debugging Tools
@@ -965,7 +965,7 @@
-
+
Helpful Logs
@@ -1004,7 +1004,7 @@
-
+
Submit Jobs Remotely
@@ -1016,7 +1016,7 @@
-
+
Remote Troubleshooting
@@ -1028,7 +1028,7 @@
-
+
Install a Central Collector
@@ -1046,7 +1046,7 @@
-
+
Releases
@@ -1058,7 +1058,7 @@
-
+
Reference
@@ -1132,17 +1132,17 @@ Hosted CE over SSH
If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own
-HTCondor-CE because the Hosted CE has not yet been optimized for such loads.
+HTCondor-CE because the Hosted CE has not yet been optimized for such loads.
How the CE is Customized
Aside from the [basic configuration] required in the CE installation, there are two main ways to customize your CE (if
you decide any customization is required at all):
Deciding which Virtual Organizations (VOs) are allowed to run at your site: HTCondor-CE leverages HTCondor's
- built-in ability to authenticate incoming jobs based on their OAuth
+ built-in ability to authenticate incoming jobs based on their OAuth
token credentials.
How to filter and transform the pilot jobs to be run on your batch system: Filtering and transforming pilot jobs
(i.e., setting site-specific attributes or resource limits), requires configuration of your site’s job routes.
- For examples of common job routes, consult the job router configuration
+ For examples of common job routes, consult the job router configuration
pages.
How Security Works
diff --git a/index.html b/index.html
index a25fa020b..c02349bf6 100644
--- a/index.html
+++ b/index.html
@@ -203,7 +203,7 @@
-
+
HTCondor-CE 6
@@ -742,7 +742,7 @@
-
+
Installation
@@ -775,7 +775,7 @@
-
+
Authentication
@@ -787,7 +787,7 @@
-
+
Local Batch System
@@ -820,7 +820,7 @@
-
+
Overview
@@ -832,7 +832,7 @@
-
+
Writing Job Routes
@@ -844,7 +844,7 @@
-
+
For HTCondor Batch Systems
@@ -856,7 +856,7 @@
-
+
For Non-HTCondor Batch Systems
@@ -874,7 +874,7 @@
-
+
Optional Configuration
@@ -892,7 +892,7 @@
-
+
Operation
@@ -925,7 +925,7 @@
-
+
Common Issues
@@ -937,7 +937,7 @@
-
+
Debugging Tools
@@ -949,7 +949,7 @@
-
+
Helpful Logs
@@ -988,7 +988,7 @@
-
+
Submit Jobs Remotely
@@ -1000,7 +1000,7 @@
-
+
Remote Troubleshooting
@@ -1012,7 +1012,7 @@
-
+
Install a Central Collector
@@ -1030,7 +1030,7 @@
-
+
Releases
@@ -1042,7 +1042,7 @@
-
+
Reference
@@ -1104,13 +1104,13 @@ Routing as configuration: HTCondor-CE’s mechanism to transform and submit RARs is customized via configuration
variables, which means that customizations will persist across upgrades and will not involve modification of
software internals to route jobs
Getting HTCondor-CE
-Learn how to get and install HTCondor-CE through our documentation .
+Learn how to get and install HTCondor-CE through our documentation .
HTCondor-CE is developed and maintained by the Center for High Throughput Computing .
If you have questions or issues regarding HTCondor-CE, please see the
diff --git a/search/search_index.json b/search/search_index.json
index a81c36b3b..8a907db78 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"HTCondor-CE \u00b6 The HTCondor-CE software is a Compute Entrypoint (CE) based on HTCondor for sites that are part of a larger computing grid (e.g. European Grid Infrastructure , The OSG Consortium ). As such, HTCondor-CE serves as a \"door\" for incoming resource allocation requests (RARs) \u2014 it handles authorization and delegation of these requests to a grid site's local batch system. Supported batch systems include Grid Engine , HTCondor , LSF , PBS Pro / Torque , and Slurm . For an introduction to HTCondor-CE, watch our recorded webinar from the EGI Community Webinar Programme: What is a Compute Entrypoint? \u00b6 A Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. These resource allocation requests are submitted as pilot jobs that create an environment for end-user jobs to match and ultimately run within the pilot job. CEs are made up of a thin layer of software that you install on a machine that already has the ability to submit and manage jobs in your local batch system. What is HTCondor-CE? \u00b6 HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint. It is configured to use the HTCondor Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. 
Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting ~16k concurrent RARs Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with RARs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit RARs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs Getting HTCondor-CE \u00b6 Learn how to get and install HTCondor-CE through our documentation . Contact Us \u00b6 HTCondor-CE is developed and maintained by the Center for High Throughput Computing . If you have questions or issues regarding HTCondor-CE, please see the HTCondor support page for how to contact us.","title":"Overview"},{"location":"#htcondor-ce","text":"The HTCondor-CE software is a Compute Entrypoint (CE) based on HTCondor for sites that are part of a larger computing grid (e.g. European Grid Infrastructure , The OSG Consortium ). As such, HTCondor-CE serves as a \"door\" for incoming resource allocation requests (RARs) \u2014 it handles authorization and delegation of these requests to a grid site's local batch system. Supported batch systems include Grid Engine , HTCondor , LSF , PBS Pro / Torque , and Slurm . For an introduction to HTCondor-CE, watch our recorded webinar from the EGI Community Webinar Programme:","title":"HTCondor-CE"},{"location":"#what-is-a-compute-entrypoint","text":"A Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. These resource allocation requests are submitted as pilot jobs that create an environment for end-user jobs to match and ultimately run within the pilot job. 
CEs are made up of a thin layer of software that you install on a machine that already has the ability to submit and manage jobs in your local batch system.","title":"What is a Compute Entrypoint?"},{"location":"#what-is-htcondor-ce","text":"HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint. It is configured to use the HTCondor Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting ~16k concurrent RARs Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with RARs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit RARs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs","title":"What is HTCondor-CE?"},{"location":"#getting-htcondor-ce","text":"Learn how to get and install HTCondor-CE through our documentation .","title":"Getting HTCondor-CE"},{"location":"#contact-us","text":"HTCondor-CE is developed and maintained by the Center for High Throughput Computing . If you have questions or issues regarding HTCondor-CE, please see the HTCondor support page for how to contact us.","title":"Contact Us"},{"location":"architecture/","text":"How Jobs Run \u00b6 Once an incoming pilot job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the Job Router creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original pilot job, which in turn notifies the original submitter (e.g., job factory) of any updates. 
When the batch job job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter. On HTCondor batch systems \u00b6 For a site with an HTCondor batch system , the Job Router uses HTCondor protocols to place a transformed copy of the pilot job directly into the batch system\u2019s scheduler, meaning that the routed job is also the batch system job. Thus, there are three representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, file transfer is handled natively between the two sets of daemons by the underlying HTCondor software. If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when modifying any configuration. On other batch systems \u00b6 For non-HTCondor batch systems, the Job Router transforms the pilot job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID and the routed job\u2019s ID Non-HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its \"spool\" directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes. 
Hosted CE over SSH \u00b6 The Hosted CE is designed to be an HTCondor-CE as a Service offered by a central grid operations team. Hosted CEs submit jobs to remote clusters over SSH, providing a simple starting point for opportunistic resource owners that want to start contributing to a computing grid with minimal effort. If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. How the CE is Customized \u00b6 Aside from the [basic configuration] required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which Virtual Organizations (VOs) are allowed to run at your site: HTCondor-CE leverages HTCondor's built-in ability to authenticate incoming jobs based on their OAuth token credentials. How to filter and transform the pilot jobs to be run on your batch system: Filtering and transforming pilot jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. For examples of common job routes, consult the job router configuration pages. How Security Works \u00b6 In the grid, security depends on a PKI infrastructure involving Certificate Authorities (CAs) where CAs sign and issue certificates. When these clients and hosts wish to communicate with each other, the identities of each party is confirmed by cross-checking their certificates with the signing CA and establishing trust. In its default configuration, HTCondor-CE supports token-based authentication and authorization to the remote submitter's credentials.","title":"Architecture"},{"location":"architecture/#how-jobs-run","text":"Once an incoming pilot job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the Job Router creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). 
After submission, HTCondor-CE monitors the batch system job and communicates its status to the original pilot job, which in turn notifies the original submitter (e.g., job factory) of any updates. When the batch job job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter.","title":"How Jobs Run"},{"location":"architecture/#on-htcondor-batch-systems","text":"For a site with an HTCondor batch system , the Job Router uses HTCondor protocols to place a transformed copy of the pilot job directly into the batch system\u2019s scheduler, meaning that the routed job is also the batch system job. Thus, there are three representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, file transfer is handled natively between the two sets of daemons by the underlying HTCondor software. If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when modifying any configuration.","title":"On HTCondor batch systems"},{"location":"architecture/#on-other-batch-systems","text":"For non-HTCondor batch systems, the Job Router transforms the pilot job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. 
Thus, there are four representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID and the routed job\u2019s ID Non-HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its \"spool\" directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes.","title":"On other batch systems"},{"location":"architecture/#hosted-ce-over-ssh","text":"The Hosted CE is designed to be an HTCondor-CE as a Service offered by a central grid operations team. Hosted CEs submit jobs to remote clusters over SSH, providing a simple starting point for opportunistic resource owners that want to start contributing to a computing grid with minimal effort. If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads.","title":"Hosted CE over SSH"},{"location":"architecture/#how-the-ce-is-customized","text":"Aside from the [basic configuration] required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which Virtual Organizations (VOs) are allowed to run at your site: HTCondor-CE leverages HTCondor's built-in ability to authenticate incoming jobs based on their OAuth token credentials. How to filter and transform the pilot jobs to be run on your batch system: Filtering and transforming pilot jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. 
For examples of common job routes, consult the job router configuration pages.","title":"How the CE is Customized"},{"location":"architecture/#how-security-works","text":"In the grid, security depends on a PKI infrastructure involving Certificate Authorities (CAs) where CAs sign and issue certificates. When these clients and hosts wish to communicate with each other, the identities of each party is confirmed by cross-checking their certificates with the signing CA and establishing trust. In its default configuration, HTCondor-CE supports token-based authentication and authorization to the remote submitter's credentials.","title":"How Security Works"},{"location":"v23/operation/","text":"Operating an HTCondor-CE \u00b6 To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled then perform the validation steps below. Managing HTCondor-CE services \u00b6 In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are: Software Service name Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Validating HTCondor-CE \u00b6 To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . Verify that all the necessary daemons are running with condor_ce_status -any . 
Verify the CE's network configuration using condor_ce_host_network_check . Verify that jobs can complete successfully using condor_ce_trace . Draining an HTCondor-CE \u00b6 To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again. Checking User Authentication \u00b6 The authentication method for submitting jobs to an HTCondor-CE is SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. Lines from different client requests may be interleaved. Getting Help \u00b6 If any of the above validation steps fail, consult the troubleshooting guide . 
If that still doesn't resolve your issue, please contact us for assistance.","title":"Operation"},{"location":"v23/operation/#operating-an-htcondor-ce","text":"To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled then perform the validation steps below.","title":"Operating an HTCondor-CE"},{"location":"v23/operation/#managing-htcondor-ce-services","text":"In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are: Software Service name Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing HTCondor-CE services"},{"location":"v23/operation/#validating-htcondor-ce","text":"To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . Verify that all the necessary daemons are running with condor_ce_status -any . Verify the CE's network configuration using condor_ce_host_network_check . 
Verify that jobs can complete successfully using condor_ce_trace .","title":"Validating HTCondor-CE"},{"location":"v23/operation/#draining-an-htcondor-ce","text":"To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again.","title":"Draining an HTCondor-CE"},{"location":"v23/operation/#checking-user-authentication","text":"The authentication method for submitting jobs to an HTCondor-CE is SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. Lines from different client requests may be interleaved.","title":"Checking User Authentication"},{"location":"v23/operation/#getting-help","text":"If any of the above validation steps fail, consult the troubleshooting guide . If that still doesn't resolve your issue, please contact us for assistance.","title":"Getting Help"},{"location":"v23/reference/","text":"Reference \u00b6 Configuration \u00b6 The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. 
Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (i.e., 99-local.conf will override values in 01-ce-auth.conf ) For a detailed order of the way configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config Users \u00b6 The following users are needed by HTCondor-CE at all sites: User Comment condor The HTCondor-CE will be run as root, but perform most of its operations as the condor user. Certificates \u00b6 File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /grid-security/hostkey.pem Networking \u00b6 Service Name Protocol Port Number Inbound Outbound Comment Htcondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connection to all internal site servers, such as the batch system head-node only ephemeral outgoing ports are necessary.","title":"Reference"},{"location":"v23/reference/#reference","text":"","title":"Reference"},{"location":"v23/reference/#configuration","text":"The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. 
Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (i.e., 99-local.conf will override values in 01-ce-auth.conf ) For a detailed order of the way configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config","title":"Configuration"},{"location":"v23/reference/#users","text":"The following users are needed by HTCondor-CE at all sites: User Comment condor The HTCondor-CE will be run as root, but perform most of its operations as the condor user.","title":"Users"},{"location":"v23/reference/#certificates","text":"File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /grid-security/hostkey.pem","title":"Certificates"},{"location":"v23/reference/#networking","text":"Service Name Protocol Port Number Inbound Outbound Comment Htcondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connection to all internal site servers, such as the batch system head-node only ephemeral outgoing ports are necessary.","title":"Networking"},{"location":"v23/releases/","text":"Releases \u00b6 HTCondor-CE 23 is distributed via RPM and are available from the following Yum repositories: HTCondor LTS and Feature Releases The OSG Consortium Known Issues \u00b6 Known bugs affecting HTCondor-CEs can be found in Jira Updating to HTCondor-CE 23 \u00b6 Updating from HTCondor-CE < 6 If updating to HTCondor-CE 23 from HTCondor-CE < 6, be sure to also consult the HTCondor-CE 6 upgrade instructions . Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or new defaults that have replaced your customzations, respectively. 
To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -name '*.rpmsave' HTCondor-CE 23 is very close in functionality to HTCondor-CE 6. As such, upgrading should be very easy. HTCondor-CE 23 Version History \u00b6 This section contains release notes for each version of HTCondor-CE 23. Full HTCondor-CE version history can be found on GitHub . October 30, 2024: 23.10.1 \u00b6 This release includes the following new features: Fix certificate subject parsing in condor_ce_host_network_check October 30, 2024: 23.0.17 \u00b6 This release includes the following new features: Remove obsolete GSI configuration August 8, 2024: 23.9.1 \u00b6 This release includes the following new features: Use new Job Router syntax by default Update configuration files to work with HTCondor 23.9.1 and later Use BatchQueue job attribute in CE routes July 24, 2024: 23.0.13 \u00b6 This release includes the following new features: Package condor_ce_upgrade_check July 16, 2024: 23.0.12 \u00b6 This release includes the following new features: Fix whole node GPU request expression for non-HTCondor batch systems April 11, 2024: 23.0.8 \u00b6 This release includes the following new features: Fix memory request being ignored for whole node jobs March 14, 2024: 23.0.6 \u00b6 This release includes the following new features: Fix CE job route transform for job environment Fix CERequirements when the default_CERequirements is not set Add condor_ce_test_token tool to generate short lived SciToken for tests Remove GSI from security method list to eliminate annoying warnings January 4, 2024: 23.0.3 \u00b6 This release includes the following new features: Ensure that jobs requesting GPUs land on HTCondor EPs with GPUs November 16, 2023: 23.0.1 \u00b6 This release includes the following new features: Add condor_ce_test_token command September 21, 2023: 23.0.0 \u00b6 This release includes the following new features: Add grid CA and host certificate/key 
locations to default SSL search paths Verifies that HTCondor-CE can access the local HTCondor's SPOOL directory Can use condor_ce_trace without SciToken to test batch system integration condor_ce_upgrade_check checks compatibility with HTCondor 23.0 Adds deprecation warnings for old job router configuration syntax Getting Help \u00b6 If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Releases"},{"location":"v23/releases/#releases","text":"HTCondor-CE 23 is distributed via RPM and are available from the following Yum repositories: HTCondor LTS and Feature Releases The OSG Consortium","title":"Releases"},{"location":"v23/releases/#known-issues","text":"Known bugs affecting HTCondor-CEs can be found in Jira","title":"Known Issues"},{"location":"v23/releases/#updating-to-htcondor-ce-23","text":"Updating from HTCondor-CE < 6 If updating to HTCondor-CE 23 from HTCondor-CE < 6, be sure to also consult the HTCondor-CE 6 upgrade instructions . Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or new defaults that have replaced your customzations, respectively. To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -name '*.rpmsave' HTCondor-CE 23 is very close in functionality to HTCondor-CE 6. As such, upgrading should be very easy.","title":"Updating to HTCondor-CE 23"},{"location":"v23/releases/#htcondor-ce-23-version-history","text":"This section contains release notes for each version of HTCondor-CE 23. 
Full HTCondor-CE version history can be found on GitHub .","title":"HTCondor-CE 23 Version History"},{"location":"v23/releases/#october-30-2024-23101","text":"This release includes the following new features: Fix certificate subject parsing in condor_ce_host_network_check","title":"October 30, 2024: 23.10.1"},{"location":"v23/releases/#october-30-2024-23017","text":"This release includes the following new features: Remove obsolete GSI configuration","title":"October 30, 2024: 23.0.17"},{"location":"v23/releases/#august-8-2024-2391","text":"This release includes the following new features: Use new Job Router syntax by default Update configuration files to work with HTCondor 23.9.1 and later Use BatchQueue job attribute in CE routes","title":"August 8, 2024: 23.9.1"},{"location":"v23/releases/#july-24-2024-23013","text":"This release includes the following new features: Package condor_ce_upgrade_check","title":"July 24, 2024: 23.0.13"},{"location":"v23/releases/#july-16-2024-23012","text":"This release includes the following new features: Fix whole node GPU request expression for non-HTCondor batch systems","title":"July 16, 2024: 23.0.12"},{"location":"v23/releases/#april-11-2024-2308","text":"This release includes the following new features: Fix memory request being ignored for whole node jobs","title":"April 11, 2024: 23.0.8"},{"location":"v23/releases/#march-14-2024-2306","text":"This release includes the following new features: Fix CE job route transform for job environment Fix CERequirements when the default_CERequirements is not set Add condor_ce_test_token tool to generate short lived SciToken for tests Remove GSI from security method list to eliminate annoying warnings","title":"March 14, 2024: 23.0.6"},{"location":"v23/releases/#january-4-2024-2303","text":"This release includes the following new features: Ensure that jobs requesting GPUs land on HTCondor EPs with GPUs","title":"January 4, 2024: 
23.0.3"},{"location":"v23/releases/#november-16-2023-2301","text":"This release includes the following new features: Add condor_ce_test_token command","title":"November 16, 2023: 23.0.1"},{"location":"v23/releases/#september-21-2023-2300","text":"This release includes the following new features: Add grid CA and host certificate/key locations to default SSL search paths Verifies that HTCondor-CE can access the local HTCondor's SPOOL directory Can use condor_ce_trace without SciToken to test batch system integration condor_ce_upgrade_check checks compatibility with HTCondor 23.0 Adds deprecation warnings for old job router configuration syntax","title":"September 21, 2023: 23.0.0"},{"location":"v23/releases/#getting-help","text":"If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Getting Help"},{"location":"v23/remote-job-submission/","text":"Submitting Jobs Remotely to an HTCondor-CE \u00b6 This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods: With dedicated tools for quickly verifying end-to-end job submission, and From an existing HTCondor submit host, useful for developing pilot submission infrastructure If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation . Before Starting \u00b6 Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following: The HTCondor-CE administrator has independently verified their HTCondor-CE The HTCondor-CE administrator has added your credential information (e.g. SciToken or grid proxy) to the HTCondor-CE authentication configuration Your credentials are valid and unexpired Submission with Debugging Tools \u00b6 The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. 
To use these tools, install the RPM package from the relevant Yum repository : root@host # yum install htcondor-ce-client Verify end-to-end submission \u00b6 The HTCondor-CE client package includes a debugging tool that performs tests of end-to-end job submission called condor_ce_trace . To submit a diagnostic job with condor_ce_trace , run the following command: user@host $ condor_ce_trace --debug Replacing with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. If you do not see the expected output, refer to the troubleshooting guide . CONDOR_CE_TRACE_ATTEMPTS For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS= . (Optional) Requesting resources \u00b6 condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources as configured by the HTCondor-CE you are debugging. To request specific resources (or other job attributes), you can specify the --attribute option on the command line: user@host $ condor_ce_trace --debug \\ --attribute='+resource1=value1'... \\ --attribute='+resourceN=valueN' \\ ce.htcondor.org For example, the following command submits a test job requesting 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue: user@host $ condor_ce_trace --debug \\ --attribute='+xcount=4' \\ --attribute='+maxMemory=4000' \\ --attribute='+maxWallTime=120' \\ --attribute='+remote_queue=osg' \\ ce.htcondor.org For a list of other attributes that can be set with the --attribute option, consult the submit file commands section. Note Non-HTCondor batch systems may need additional HTCondor-CE configuration to support these job attributes. See the batch system integration for details on how to support them. 
Submission with HTCondor Submit \u00b6 If you need to submit more complicated jobs than a trace job as described above (e.g. for developing pilot job submission infrastructures) and have access to an HTCondor submit host, you can use standard HTCondor submission tools. Submit the job \u00b6 To submit jobs to a remote HTCondor-CE (or any other externally facing HTCondor SchedD) from an HTCondor submit host, you need to construct an HTCondor submit file describing an HTCondor-C job : Write a submit file, ce_test.sub : # Required for remote HTCondor-CE submission universe = grid use_x509userproxy = true grid_resource = condor ce.htcondor.org ce.htcondor.org:9619 # Files executable = ce_test.sh output = ce_test.out error = ce_test.err log = ce_test.log # File transfer behavior ShouldTransferFiles = YES WhenToTransferOutput = ON_EXIT # Optional resource requests #+xcount = 4 # Request 4 cores #+maxMemory = 4000 # Request 4GB of RAM #+maxWallTime = 120 # Request 2 hrs of wall clock time #+remote_queue = \"osg\" # Request the OSG queue # Submit a single job queue Replacing ce_test.sh with the path to the executable you wish to run and ce.htcondor.org with the hostname of the CE you wish to test. Note The grid_resource line should start with condor and is not related to which batch system you are using. Submit the job: user@host $ condor_submit ce_test.sub Tracking job progress \u00b6 You can track job progress by querying the local queue: user@host $ condor_q As well as the remote HTCondor-CE queue: user@host $ condor_q -name -pool :9619 Replacing with the FQDN of the HTCondor-CE. For reference, condor_q -help status will provide details of job status codes. user@host $ condor_q -help status | tail JobStatus codes: 1 I IDLE 2 R RUNNING 3 X REMOVED 4 C COMPLETED 5 H HELD 6 > TRANSFERRING_OUTPUT 7 S SUSPENDED Troubleshooting \u00b6 All interactions between condor_submit and the HTCondor-CE will be recorded in the file specified by the log command in your submit file. 
This includes acknowledgement of the job in your local queue, connection to the CE, and a record of job completion: 000 (786.000.000) 12/09 16:49:55 Job submitted from host: <131.225.154.68:53134> ... 027 (786.000.000) 12/09 16:50:09 Job submitted to grid resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 GridJobId: condor ce.htcondor.org ce.htcondor.org:9619 796.0 ... 005 (786.000.000) 12/09 16:52:19 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 0 - Run Bytes Sent By Job 0 - Run Bytes Received By Job 0 - Total Bytes Sent By Job 0 - Total Bytes Received By Job If there are issues contacting the HTCondor-CE, you will see error messages about a Down Globus Resource : 020 (788.000.000) 12/09 16:56:17 Detected Down Globus Resource RM-Contact: ce.htcondor.org ... 026 (788.000.000) 12/09 16:56:17 Detected Down Grid Resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 This indicates a communication issue with the HTCondor-CE that may be diagnosed with condor_ce_ping . Submit File Commands \u00b6 The following table is a reference of commands that are commonly included in HTCondor submit files used for HTCondor-CE resource allocation requests. A more comprehensive list of submit file commands specific to HTCondor can be found in the HTCondor manual . HTCondor string values If you are setting an attribute to a string value, make sure to enclose the string in double-quotes ( \" ) Command Description arguments Arguments that will be provided to the executable for the resource allocation request. error Path to the file on the client host that stores stderr from the resource allocation request. executable Path to the file on the client host that the resource allocation request will execute. 
input Path to the file on the client host that stores input to be piped into the stdin of the resource allocation request. +maxMemory The amount of memory in MB that you wish to allocate to the resource allocation request. +maxWallTime The maximum walltime (in minutes) the resource allocation request is allowed to run before it is removed. output Path to the file on the client host that stores stdout from the resource allocation request. +remote_queue Assign resource allocation request to the target queue in the scheduler. transfer_input_files A comma-delimited list of all the files and directories to be transferred into the working directory for the resource allocation request, before the resource allocation request is started. transfer_output_files A comma-delimited list of all the files and directories to be transferred back to the client, after the resource allocation request completes. +WantWholeNode When set to True , request entire node for the resource allocation request (HTCondor batch systems only) +xcount The number of cores to allocate for the resource allocation request. 
Getting Help \u00b6 If you have any questions or issues with job submission, please contact us for assistance.","title":"Submit Jobs Remotely"},{"location":"v23/remote-job-submission/#submitting-jobs-remotely-to-an-htcondor-ce","text":"This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods: With dedicated tools for quickly verifying end-to-end job submission, and From an existing HTCondor submit host, useful for developing pilot submission infrastructure If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation .","title":"Submitting Jobs Remotely to an HTCondor-CE"},{"location":"v23/remote-job-submission/#before-starting","text":"Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following: The HTCondor-CE administrator has independently verified their HTCondor-CE The HTCondor-CE administrator has added your credential information (e.g. SciToken or grid proxy) to the HTCondor-CE authentication configuration Your credentials are valid and unexpired","title":"Before Starting"},{"location":"v23/remote-job-submission/#submission-with-debugging-tools","text":"The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. To use these tools, install the RPM package from the relevant Yum repository : root@host # yum install htcondor-ce-client","title":"Submission with Debugging Tools"},{"location":"v23/remote-job-submission/#verify-end-to-end-submission","text":"The HTCondor-CE client package includes a debugging tool that performs tests of end-to-end job submission called condor_ce_trace . To submit a diagnostic job with condor_ce_trace , run the following command: user@host $ condor_ce_trace --debug Replacing with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. 
If you do not see the expected output, refer to the troubleshooting guide . CONDOR_CE_TRACE_ATTEMPTS For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS= .","title":"Verify end-to-end submission"},{"location":"v23/remote-job-submission/#optional-requesting-resources","text":"condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources as configured by the HTCondor-CE you are debugging. To request specific resources (or other job attributes), you can specify the --attribute option on the command line: user@host $ condor_ce_trace --debug \\ --attribute='+resource1=value1'... \\ --attribute='+resourceN=valueN' \\ ce.htcondor.org For example, the following command submits a test job requesting 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue: user@host $ condor_ce_trace --debug \\ --attribute='+xcount=4' \\ --attribute='+maxMemory=4000' \\ --attribute='+maxWallTime=120' \\ --attribute='+remote_queue=osg' \\ ce.htcondor.org For a list of other attributes that can be set with the --attribute option, consult the submit file commands section. Note Non-HTCondor batch systems may need additional HTCondor-CE configuration to support these job attributes. See the batch system integration for details on how to support them.","title":"(Optional) Requesting resources"},{"location":"v23/remote-job-submission/#submission-with-htcondor-submit","text":"If you need to submit more complicated jobs than a trace job as described above (e.g. 
for developing pilot job submission infrastructures) and have access to an HTCondor submit host, you can use standard HTCondor submission tools.","title":"Submission with HTCondor Submit"},{"location":"v23/remote-job-submission/#submit-the-job","text":"To submit jobs to a remote HTCondor-CE (or any other externally facing HTCondor SchedD) from an HTCondor submit host, you need to construct an HTCondor submit file describing an HTCondor-C job : Write a submit file, ce_test.sub : # Required for remote HTCondor-CE submission universe = grid use_x509userproxy = true grid_resource = condor ce.htcondor.org ce.htcondor.org:9619 # Files executable = ce_test.sh output = ce_test.out error = ce_test.err log = ce_test.log # File transfer behavior ShouldTransferFiles = YES WhenToTransferOutput = ON_EXIT # Optional resource requests #+xcount = 4 # Request 4 cores #+maxMemory = 4000 # Request 4GB of RAM #+maxWallTime = 120 # Request 2 hrs of wall clock time #+remote_queue = \"osg\" # Request the OSG queue # Submit a single job queue Replacing ce_test.sh with the path to the executable you wish to run and ce.htcondor.org with the hostname of the CE you wish to test. Note The grid_resource line should start with condor and is not related to which batch system you are using. Submit the job: user@host $ condor_submit ce_test.sub","title":"Submit the job"},{"location":"v23/remote-job-submission/#tracking-job-progress","text":"You can track job progress by querying the local queue: user@host $ condor_q As well as the remote HTCondor-CE queue: user@host $ condor_q -name -pool :9619 Replacing with the FQDN of the HTCondor-CE. For reference, condor_q -help status will provide details of job status codes. 
user@host $ condor_q -help status | tail JobStatus codes: 1 I IDLE 2 R RUNNING 3 X REMOVED 4 C COMPLETED 5 H HELD 6 > TRANSFERRING_OUTPUT 7 S SUSPENDED","title":"Tracking job progress"},{"location":"v23/remote-job-submission/#troubleshooting","text":"All interactions between condor_submit and the HTCondor-CE will be recorded in the file specified by the log command in your submit file. This includes acknowledgement of the job in your local queue, connection to the CE, and a record of job completion: 000 (786.000.000) 12/09 16:49:55 Job submitted from host: <131.225.154.68:53134> ... 027 (786.000.000) 12/09 16:50:09 Job submitted to grid resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 GridJobId: condor ce.htcondor.org ce.htcondor.org:9619 796.0 ... 005 (786.000.000) 12/09 16:52:19 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 0 - Run Bytes Sent By Job 0 - Run Bytes Received By Job 0 - Total Bytes Sent By Job 0 - Total Bytes Received By Job If there are issues contacting the HTCondor-CE, you will see error messages about a Down Globus Resource : 020 (788.000.000) 12/09 16:56:17 Detected Down Globus Resource RM-Contact: ce.htcondor.org ... 026 (788.000.000) 12/09 16:56:17 Detected Down Grid Resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 This indicates a communication issue with the HTCondor-CE that may be diagnosed with condor_ce_ping .","title":"Troubleshooting"},{"location":"v23/remote-job-submission/#submit-file-commands","text":"The following table is a reference of commands that are commonly included in HTCondor submit files used for HTCondor-CE resource allocation requests. A more comprehensive list of submit file commands specific to HTCondor can be found in the HTCondor manual . 
HTCondor string values If you are setting an attribute to a string value, make sure to enclose the string in double-quotes ( \" ) Command Description arguments Arguments that will be provided to the executable for the resource allocation request. error Path to the file on the client host that stores stderr from the resource allocation request. executable Path to the file on the client host that the resource allocation request will execute. input Path to the file on the client host that stores input to be piped into the stdin of the resource allocation request. +maxMemory The amount of memory in MB that you wish to allocate to the resource allocation request. +maxWallTime The maximum walltime (in minutes) the resource allocation request is allowed to run before it is removed. output Path to the file on the client host that stores stdout from the resource allocation request. +remote_queue Assign resource allocation request to the target queue in the scheduler. transfer_input_files A comma-delimited list of all the files and directories to be transferred into the working directory for the resource allocation request, before the resource allocation request is started. transfer_output_files A comma-delimited list of all the files and directories to be transferred back to the client, after the resource allocation request completes. +WantWholeNode When set to True , request entire node for the resource allocation request (HTCondor batch systems only) +xcount The number of cores to allocate for the resource allocation request.","title":"Submit File Commands"},{"location":"v23/remote-job-submission/#getting-help","text":"If you have any questions or issues with job submission, please contact us for assistance.","title":"Getting Help"},{"location":"v23/configuration/authentication/","text":"Configuring Authentication \u00b6 To authenticate job submission from external users and VOs, the HTCondor-CE service uses X.509 certificates for SciTokens and SSL authentication. 
Built-in Mapfiles \u00b6 HTCondor-CE uses unified HTCondor mapfiles stored in /etc/condor-ce/mapfiles.d/*.conf to map incoming jobs with credentials to local Unix accounts. These files are parsed in lexicographic order and HTCondor-CE will use the first line that matches for the authentication method that the client and your HTCondor-CE negotiates. Each mapfile line consists of three fields: HTCondor authentication method Incoming credential principal formatted as a Perl Compatible Regular Expression (PCRE) Local account Applying mapping changes When changing your HTCondor-CE mappings, run condor_ce_reconfig to apply your changes. SciTokens \u00b6 To allow clients with SciToken or WLCG tokens to submit jobs to your HTCondor-CE, add lines of the following format: SCITOKENS /,/ Replacing (escaping any / with \\/ , , and with the token issuer ( iss ), token subject ( sub ), and the unix account under which the job should run, respectively. For example, to map any token from the OSG VO regardless of the token sub , add the following line to a *.conf file in /etc/condor-ce/mapfiles.d/ : SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg Configuring Certificates \u00b6 HTCondor-CE uses X.509 host certificates and certificate authorities (CAs) when authenticating SciToken and SSL connections. By default, HTCondor-CE uses the default system locations to locate CAs and host certificate when authenticating SciToken and SSL connections. But traditionally, CEs and their clients have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your HTCondor-CE to use grid or system certificates for authentication: If your SSL or SciTokens clients will be interacting with your CE using grid certificates or you are using a grid certificate as your host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key Next Steps \u00b6 At this point, you should have an HTCondor-CE that will take credentials from incoming jobs and map them to local Unix accounts. 
The next step is to configure the CE for your local batch system so that HTCondor-CE knows where to route your jobs.","title":"Authentication"},{"location":"v23/configuration/authentication/#configuring-authentication","text":"To authenticate job submission from external users and VOs, the HTCondor-CE service uses X.509 certificates for SciTokens and SSL authentication.","title":"Configuring Authentication"},{"location":"v23/configuration/authentication/#built-in-mapfiles","text":"HTCondor-CE uses unified HTCondor mapfiles stored in /etc/condor-ce/mapfiles.d/*.conf to map incoming jobs with credentials to local Unix accounts. These files are parsed in lexicographic order and HTCondor-CE will use the first line that matches for the authentication method that the client and your HTCondor-CE negotiates. Each mapfile line consists of three fields: HTCondor authentication method Incoming credential principal formatted as a Perl Compatible Regular Expression (PCRE) Local account Applying mapping changes When changing your HTCondor-CE mappings, run condor_ce_reconfig to apply your changes.","title":"Built-in Mapfiles"},{"location":"v23/configuration/authentication/#scitokens","text":"To allow clients with SciToken or WLCG tokens to submit jobs to your HTCondor-CE, add lines of the following format: SCITOKENS /,/ Replacing (escaping any / with \\/ , , and with the token issuer ( iss ), token subject ( sub ), and the unix account under which the job should run, respectively. For example, to map any token from the OSG VO regardless of the token sub , add the following line to a *.conf file in /etc/condor-ce/mapfiles.d/ : SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg","title":"SciTokens"},{"location":"v23/configuration/authentication/#configuring-certificates","text":"HTCondor-CE uses X.509 host certificates and certificate authorities (CAs) when authenticating SciToken and SSL connections. 
By default, HTCondor-CE uses the default system locations to locate CAs and host certificate when authenticating SciToken and SSL connections. But traditionally, CEs and their clients have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . Choose one of the following options to configure your HTCondor-CE to use grid or system certificates for authentication: If your SSL or SciTokens clients will be interacting with your CE using grid certificates or you are using a grid certificate as your host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key","title":"Configuring 
Certificates"},{"location":"v23/configuration/authentication/#next-steps","text":"At this point, you should have an HTCondor-CE that will take credentials from incoming jobs and map them to local Unix accounts. The next step is to configure the CE for your local batch system so that HTCondor-CE knows where to route your jobs.","title":"Next Steps"},{"location":"v23/configuration/htcondor-routes/","text":"For HTCondor Batch Systems \u00b6 This page contains information about job routes that can be used if you are running an HTCondor pool at your site. Setting periodic hold or release \u00b6 Avoid setting PERIODIC_REMOVE expressions The HTCondor Job Router will automatically resubmit jobs that are removed by the underlying batch system, which can result in unintended churn. Therefore, it is recommended to append removal expressions to HTCondor-CE's configuration by adding the following to a file in /etc/condor-ce/config.d/ SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || To release or put routed jobs on hold if they meet certain criteria, use the Periodic* family of attributes. By default, periodic expressions are evaluated once every 300 seconds but this can be changed by setting PERIODIC_EXPR_INTERVAL in your local HTCondor configuration. In this example, we set the routed job on hold if the job is idle and has been started at least once or if the job has tried to start more than once. This will catch jobs which are starting and stopping multiple times. 
ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once SET PeriodicHold ((NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1) # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason SET PeriodicRelease = (HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1; # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason set_PeriodicRelease = HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting routed job requirements \u00b6 If you need to set requirements on your routed job, you will need to use SET REQUIREMENTS or set_Requirements instead of Requirements for ClassAd transform and deprecated syntaxes, respectively. The Requirements attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the HTCondor manual . 
To ensure that your job lands on a Linux machine in your pool: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Requirements = (TARGET.OpSys == \"LINUX\") @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Requirements = (TARGET.OpSys == \"LINUX\"); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Preserving original job requirements \u00b6 To preserve and include the original job requirements, rather than just setting new requirements, you can use COPY Requirements or copy_Requirements to store the current value of Requirements to another variable, which we'll call original_requirements . To do this, replace the above SET Requirements or set_Requirements lines with: ClassAd Transform SET Requirements = ($(MY.Requirements)) && () Deprecated Syntax copy_Requirements = \"original_requirements\"; set_Requirements = original_requirements && ...; Setting the accounting group based on the credential of the submitted job \u00b6 A common need in the CE is to set the accounting identity of the routed job using information from the credential of the submitter of the job. This originally was done using information from the x509 certificate, in particular X509UserProxyVOName and x509UserProxySubject . With the switch to SCITOKENs, the equivalent job attributes are AuthTokenIssuer and AuthTokenSubject . It is important to understand that the condor_schedd treats AuthTokenSubject and AuthTokenIssuer as secure attributes. The values of these attributes cannot be supplied by the condor_job_router directly; they will be set based on what credential the condor_job_router uses to submit the routed job. Because of this, the value of these attributes in the routed job is almost never the same as the value in the original job. This is different from the way the x509* job attributes behaved. 
Because of this, the default CE config will copy all attributes that match AuthToken* to orig_AuthToken* before the route transforms are applied. Example of setting the accounting group from AuthToken or x509 attributes. ClassAd Transform JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap CLASSAD_USER_MAPFILE_AcctGroupMap = JOB_ROUTER_TRANSFORM_SetAcctGroup @=end REQUIREMENTS (orig_AuthTokenSubject ?: x509UserProxySubject) isnt undefined EVALSET AcctGroup UserMap(\"AcctGroupMap\", orig_AuthTokenSubject ?: x509UserProxySubject, AcctGroup) EVALSET AccountingGroup join(\".\", AcctGroup, Owner) @end JOB_ROUTER_PRE_ROUTE_TRANSFORMS = $(JOB_ROUTER_PRE_ROUTE_TRANSFORMS) SetAcctGroup Refer to the HTCondor documentation for information on mapfiles . Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"For HTCondor Batch Systems"},{"location":"v23/configuration/htcondor-routes/#for-htcondor-batch-systems","text":"This page contains information about job routes that can be used if you are running an HTCondor pool at your site.","title":"For HTCondor Batch Systems"},{"location":"v23/configuration/htcondor-routes/#setting-periodic-hold-or-release","text":"Avoid setting PERIODIC_REMOVE expressions The HTCondor Job Router will automatically resubmit jobs that are removed by the underlying batch system, which can result in unintended churn. Therefore, it is recommended to append removal expressions to HTCondor-CE's configuration by adding the following to a file in /etc/condor-ce/config.d/ SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || To release or put routed jobs on hold if they meet certain criteria, use the Periodic* family of attributes. By default, periodic expressions are evaluated once every 300 seconds but this can be changed by setting PERIODIC_EXPR_INTERVAL in your local HTCondor configuration. 
In this example, we set the routed job on hold if the job is idle and has been started at least once or if the job has tried to start more than once. This will catch jobs which are starting and stopping multiple times. ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once SET PeriodicHold ((NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1) # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason SET PeriodicRelease = (HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1; # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason set_PeriodicRelease = HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting periodic hold or release"},{"location":"v23/configuration/htcondor-routes/#setting-routed-job-requirements","text":"If you need to set requirements on your routed job, you will need to use SET REQUIREMENTS or set_Requirements instead of Requirements for ClassAd transform and deprecated syntaxes, respectively. The Requirements attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the HTCondor manual . 
To ensure that your job lands on a Linux machine in your pool: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Requirements = (TARGET.OpSys == \"LINUX\") @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Requirements = (TARGET.OpSys == \"LINUX\"); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting routed job requirements"},{"location":"v23/configuration/htcondor-routes/#preserving-original-job-requirements","text":"To preserve and include the original job requirements, rather than just setting new requirements, you can use COPY Requirements or copy_Requirements to store the current value of Requirements to another variable, which we'll call original_requirements . To do this, replace the above SET Requirements or set_Requirements lines with: ClassAd Transform SET Requirements = ($(MY.Requirements)) && () Deprecated Syntax copy_Requirements = \"original_requirements\"; set_Requirements = original_requirements && ...;","title":"Preserving original job requirements"},{"location":"v23/configuration/htcondor-routes/#setting-the-accounting-group-based-on-the-credential-of-the-submitted-job","text":"A common need in the CE is to set the accounting identity of the routed job using information from the credential of the submitter of the job. This originally was done using information from the x509 certificate, in particular X509UserProxyVOName and x509UserProxySubject . With the switch to SCITOKENs, the equivalent job attributes are AuthTokenIssuer and AuthTokenSubject . It is important to understand that the condor_schedd treats AuthTokenSubject and AuthTokenIssuer as secure attributes. The values of these attributes cannot be supplied by the condor_job_router directly; they will be set based on what credential the condor_job_router uses to submit the routed job. 
Because of this the value of these attributes in the routed job is almost never the same as the value in the original job. This is different from the way the x509* job attributes behaved. Because of this, the default CE config will copy all attributes that match AuthToken* to orig_AuthToken* before the route transforms are applied. Example of setting the accounting group from AuthToken or x509 attributes. ClassAd Transform JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap CLASSAD_USER_MAPFILE_AcctGroupMap = JOB_ROUTER_TRANSFORM_SetAcctGroup @=end REQUIREMENTS (orig_AuthTokenSubject ?: x509UserProxySubject) isnt undefined EVALSET AcctGroup UserMap(\"AcctGroupMap\", orig_AuthTokenSubject ?: x509UserProxySubject, AcctGroup) EVALSET AccountingGroup join(\".\", AcctGroup, Owner) @end JOB_ROUTER_PRE_ROUTE_TRANSFORMS = $(JOB_ROUTER_PRE_ROUTE_TRANSFORMS) SetAcctGroup Refer to the HTCondor documentation for information on mapfiles .","title":"Setting the accounting group based on the credential of the submitted job"},{"location":"v23/configuration/htcondor-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v23/configuration/job-router-overview/","text":"Job Router Configuration Overview \u00b6 The HTCondor Job Router is at the heart of HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. Customizations are made in the form of job routes where each route corresponds to a separate job transformation: If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed job') that is then submitted to the batch system. The CE package comes with default routes located in /etc/condor-ce/config.d/02-ce-*.conf that provide enough basic functionality for a small site. 
If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an overview of how to configure your HTCondor-CE Job Router Definitions Incoming Job : A job which was submitted to HTCondor-CE from an external source. Routed Job : A job that has been transformed by the Job Router. Route Syntaxes \u00b6 HTCondor-CE 5 introduced the ability to write job routes using ClassAd transform syntax in addition to the existing configuration syntax . The old route configuration syntax is no longer the default in HTCondor-CE 23.x and one should transition to the new syntax as outlined below . ClassAd transforms \u00b6 The HTCondor ClassAd transforms were originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 adds the configuration necessary to support routes written as ClassAd transforms. If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms in the following order: Each transform in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES whose requirements are met by the job The first transform from JOB_ROUTER_ROUTE_NAMES whose requirements are met by the job. See the section on route matching below. Each transform in JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES whose requirements are met by the job Deprecated syntax \u00b6 Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed for V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For a ClassAd transform syntax example, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. 
Since the inception of HTCondor-CE, job routes have been written as a list of ClassAds . Each job route\u2019s ClassAd is constructed by combining each entry from the JOB_ROUTER_ENTRIES with the JOB_ROUTER_DEFAULTS : JOB_ROUTER_ENTRIES is a configuration variable whose default is set in /etc/condor-ce/config.d/02-ce-*.conf but may be overridden by the administrator in subsequent files in /etc/condor-ce/config.d/ . JOB_ROUTER_DEFAULTS is a generated configuration variable that sets default job route values that are required for HTCondor-CE's functionality. To view its contents in a readable format, run the following command: user@host $ condor_ce_config_val JOB_ROUTER_DEFAULTS | sed 's/;/;\\n/g' Take care when modifying attributes in JOB_ROUTER_DEFAULTS : you may add new attributes and override attributes that are set_* in JOB_ROUTER_DEFAULTS . The following may break your HTCondor-CE Do not set the JOB_ROUTER_DEFAULTS configuration variable yourself. This will cause the CE to stop functioning. If a value is set in JOB_ROUTER_DEFAULTS with eval_set_ , override it by using eval_set_ in the JOB_ROUTER_ENTRIES . Do this at your own risk as it may cause the CE to break. Choosing a syntax \u00b6 For existing HTCondor-CEs, it's recommended that administrators stop using the deprecated syntax and transition to ClassAd transforms now. For new HTCondor-CEs, it's recommended that administrators start with ClassAd transforms. The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
Converting to ClassAd transforms \u00b6 Existing HTCondor-CEs that use the deprecated syntax can be converted to the ClassAd transform syntax with the following steps: Output the current configuration by running the following: condor_ce_config_val -summary > summary-file Convert the stored configuration by running the following: condor_transform_ads -convert:file summary-file > 90-converted-job-routes.conf Place the 90-converted-job-routes.conf from the previous command into /etc/condor-ce/config.d . Potential need to rename generated config The files in /etc/condor-ce/config.d are read in lexicographical order. So if you define your current job router configuration in /etc/condor-ce/config.d in a file that is read later, e.g. 95-local.conf , you will need to rename your generated config file, e.g. 96-generated-job-routes.conf . Tweak new job routes as needed. For assistance please reach out to htcondor-users@cs.wisc.edu Set JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False (see this documentation) in the HTCondor-CE's configuration. Restart the HTCondor-CE Not Using Custom Job Routes? Conversion of job router syntax from the deprecated syntax to ClassAd transform syntax needs to occur if custom job routes have been configured. How Jobs Match to Routes \u00b6 The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in condor_ce_q ) that meet the following constraints: The job has not already been considered by the Job Router The job's universe is standard or vanilla If the incoming job meets the above constraints, then the job is matched to the first route in JOB_ROUTER_ROUTE_NAMES whose requirements are satisfied by the job's ClassAd. Additionally: If you are using the ClassAd transform syntax , transforms in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES may also have their own requirements that determine whether or not that transform is applied. 
If you are using the deprecated syntax , you may configure the Job Router to evenly distribute jobs across all matching routes (i.e., round-robin matching). To do so, add the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUND_ROBIN_SELECTION = True Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Overview"},{"location":"v23/configuration/job-router-overview/#job-router-configuration-overview","text":"The HTCondor Job Router is at the heart of HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. Customizations are made in the form of job routes where each route corresponds to a separate job transformation: If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed job') that is then submitted to the batch system. The CE package comes with default routes located in /etc/condor-ce/config.d/02-ce-*.conf that provide enough basic functionality for a small site. If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an overview of how to configure your HTCondor-CE Job Router Definitions Incoming Job : A job which was submitted to HTCondor-CE from an external source. Routed Job : A job that has been transformed by the Job Router.","title":"Job Router Configuration Overview"},{"location":"v23/configuration/job-router-overview/#route-syntaxes","text":"HTCondor-CE 5 introduced the ability to write job routes using ClassAd transform syntax in addition to the existing configuration syntax . 
The old route configuration syntax is no longer the default in HTCondor-CE 23.x and one should transition to the new syntax as outlined below .","title":"Route Syntaxes"},{"location":"v23/configuration/job-router-overview/#classad-transforms","text":"The HTCondor ClassAd transforms were originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 adds the configuration necessary to support routes written as ClassAd transforms. If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms in the following order: Each transform in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES whose requirements are met by the job The first transform from JOB_ROUTER_ROUTE_NAMES whose requirements are met by the job. See the section on route matching below. Each transform in JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES whose requirements are met by the job","title":"ClassAd transforms"},{"location":"v23/configuration/job-router-overview/#deprecated-syntax","text":"Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed for V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For a ClassAd transform syntax example, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. Since the inception of HTCondor-CE, job routes have been written as a list of ClassAds . 
Each job route\u2019s ClassAd is constructed by combining each entry from the JOB_ROUTER_ENTRIES with the JOB_ROUTER_DEFAULTS : JOB_ROUTER_ENTRIES is a configuration variable whose default is set in /etc/condor-ce/config.d/02-ce-*.conf but may be overridden by the administrator in subsequent files in /etc/condor-ce/config.d/ . JOB_ROUTER_DEFAULTS is a generated configuration variable that sets default job route values that are required for HTCondor-CE's functionality. To view its contents in a readable format, run the following command: user@host $ condor_ce_config_val JOB_ROUTER_DEFAULTS | sed 's/;/;\\n/g' Take care when modifying attributes in JOB_ROUTER_DEFAULTS : you may add new attributes and override attributes that are set_* in JOB_ROUTER_DEFAULTS . The following may break your HTCondor-CE Do not set the JOB_ROUTER_DEFAULTS configuration variable yourself. This will cause the CE to stop functioning. If a value is set in JOB_ROUTER_DEFAULTS with eval_set_ , override it by using eval_set_ in the JOB_ROUTER_ENTRIES . Do this at your own risk as it may cause the CE to break.","title":"Deprecated syntax"},{"location":"v23/configuration/job-router-overview/#choosing-a-syntax","text":"For existing HTCondor-CEs, it's recommended that administrators stop using the deprecated syntax and transition to ClassAd transforms now. For new HTCondor-CEs, it's recommended that administrators start with ClassAd transforms. 
The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively.","title":"Choosing a syntax"},{"location":"v23/configuration/job-router-overview/#converting-to-classad-transforms","text":"Existing HTCondor-CEs that use the deprecated syntax can be converted to the ClassAd transform syntax with the following steps: Output the current configuration by running the following: condor_ce_config_val -summary > summary-file Convert the stored configuration by running the following: condor_transform_ads -convert:file summary-file > 90-converted-job-routes.conf Place the 90-converted-job-routes.conf from the previous command into /etc/condor-ce/config.d . Potential need to rename generated config The files in /etc/condor-ce/config.d are read in lexicographical order. So if you define your current job router configuration in /etc/condor-ce/config.d in a file that is read later, e.g. 95-local.conf , you will need to rename your generated config file, e.g. 96-generated-job-routes.conf . Tweak new job routes as needed. For assistance please reach out to htcondor-users@cs.wisc.edu Set JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False (see this documentation) in the HTCondor-CE's configuration. Restart the HTCondor-CE Not Using Custom Job Routes? 
Conversion of job router syntax from the deprecated syntax to ClassAd transform syntax needs to occur if custom job routes have been configured.","title":"Converting to ClassAd transforms"},{"location":"v23/configuration/job-router-overview/#how-jobs-match-to-routes","text":"The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in condor_ce_q ) that meet the following constraints: The job has not already been considered by the Job Router The job's universe is standard or vanilla If the incoming job meets the above constraints, then the job is matched to the first route in JOB_ROUTER_ROUTE_NAMES whose requirements are satisfied by the job's ClassAd. Additionally: If you are using the ClassAd transform syntax , transforms in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES may also have their own requirements that determine whether or not that transform is applied. If you are using the deprecated syntax , you may configure the Job Router to evenly distribute jobs across all matching routes (i.e., round-robin matching). To do so, add the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUND_ROBIN_SELECTION = True","title":"How Jobs Match to Routes"},{"location":"v23/configuration/job-router-overview/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v23/configuration/local-batch-system/","text":"Configuring for the Local Batch System \u00b6 Before HTCondor-CE can submit jobs to your local batch system, it has to be configured to do so. The configuration will differ depending on if your local batch system is HTCondor or one of the other supported batch systems. Choose the section corresponding to your batch system below. 
HTCondor Batch Systems \u00b6 To configure HTCondor-CE for an HTCondor batch system, set JOB_ROUTER_SCHEDD2_POOL to your site's central manager host and port: JOB_ROUTER_SCHEDD2_POOL = cm.chtc.wisc.edu:9618 Additionally, set JOB_ROUTER_SCHEDD2_SPOOL to the location of the local batch SPOOL directory on the CE host if it is different than the default location ( /var/lib/condor/spool ). Non-HTCondor Batch Systems \u00b6 Configuring the BLAHP \u00b6 HTCondor-CE uses the Batch Language ASCII Helper Protocol (BLAHP) to submit and track jobs to non-HTCondor batch systems. If your batch system tools are installed in a non-standard location (i.e., outside of /usr/bin/ ), set the corresponding *_binpath variable in /etc/blah.config to the directory containing your batch system tools: If your batch system is... Then change the following configuration variable... LSF lsf_binpath PBS/Torque pbs_binpath SGE sge_binpath Slurm slurm_binpath For example, if your Slurm binaries (e.g. sbatch ) exist in /opt/slurm/bin , you would set the following: slurm_binpath=/opt/slurm/bin/ Sharing the SPOOL directory \u00b6 Non-HTCondor batch systems require a shared file system configuration to support file transfer from the HTCondor-CE to your site's worker nodes. The current recommendation is to run a dedicated NFS server on the CE host . In this setup, HTCondor-CE writes to the local spool directory, the NFS server shares the directory, and each worker node mounts the directory in the same location as on the CE. For example, if your spool directory is /var/lib/condor-ce (the default), you must mount the shared directory to /var/lib/condor-ce on the worker nodes. Note If you choose not to host the NFS server on your CE, you will need to turn off root squash so that the HTCondor-CE daemons can write to the spool directory. You can control the value of the spool directory by setting SPOOL in /etc/condor-ce/config.d/99-local.conf (create this file if it doesn't exist). 
For example, the following sets the SPOOL directory to /home/condor : SPOOL = /home/condor Note The shared spool directory must be readable and writeable by the condor user for HTCondor-CE to function correctly.","title":"Local Batch System"},{"location":"v23/configuration/local-batch-system/#configuring-for-the-local-batch-system","text":"Before HTCondor-CE can submit jobs to your local batch system, it has to be configured to do so. The configuration will differ depending on if your local batch system is HTCondor or one of the other supported batch systems. Choose the section corresponding to your batch system below.","title":"Configuring for the Local Batch System"},{"location":"v23/configuration/local-batch-system/#htcondor-batch-systems","text":"To configure HTCondor-CE for an HTCondor batch system, set JOB_ROUTER_SCHEDD2_POOL to your site's central manager host and port: JOB_ROUTER_SCHEDD2_POOL = cm.chtc.wisc.edu:9618 Additionally, set JOB_ROUTER_SCHEDD2_SPOOL to the location of the local batch SPOOL directory on the CE host if it is different than the default location ( /var/lib/condor/spool ).","title":"HTCondor Batch Systems"},{"location":"v23/configuration/local-batch-system/#non-htcondor-batch-systems","text":"","title":"Non-HTCondor Batch Systems"},{"location":"v23/configuration/local-batch-system/#configuring-the-blahp","text":"HTCondor-CE uses the Batch Language ASCII Helper Protocol (BLAHP) to submit and track jobs to non-HTCondor batch systems. If your batch system tools are installed in a non-standard location (i.e., outside of /usr/bin/ ), set the corresponding *_binpath variable in /etc/blah.config to the directory containing your batch system tools: If your batch system is... Then change the following configuration variable... LSF lsf_binpath PBS/Torque pbs_binpath SGE sge_binpath Slurm slurm_binpath For example, if your Slurm binaries (e.g. 
sbatch ) exist in /opt/slurm/bin , you would set the following: slurm_binpath=/opt/slurm/bin/","title":"Configuring the BLAHP"},{"location":"v23/configuration/local-batch-system/#sharing-the-spool-directory","text":"Non-HTCondor batch systems require a shared file system configuration to support file transfer from the HTCondor-CE to your site's worker nodes. The current recommendation is to run a dedicated NFS server on the CE host . In this setup, HTCondor-CE writes to the local spool directory, the NFS server shares the directory, and each worker node mounts the directory in the same location as on the CE. For example, if your spool directory is /var/lib/condor-ce (the default), you must mount the shared directory to /var/lib/condor-ce on the worker nodes. Note If you choose not to host the NFS server on your CE, you will need to turn off root squash so that the HTCondor-CE daemons can write to the spool directory. You can control the value of the spool directory by setting SPOOL in /etc/condor-ce/config.d/99-local.conf (create this file if it doesn't exist). For example, the following sets the SPOOL directory to /home/condor : SPOOL = /home/condor Note The shared spool directory must be readable and writeable by the condor user for HTCondor-CE to function correctly.","title":"Sharing the SPOOL directory"},{"location":"v23/configuration/non-htcondor-routes/","text":"For Non-HTCondor Batch Systems \u00b6 This page contains information about job routes that can be used if you are running a non-HTCondor pool at your site. 
Setting a default batch queue \u00b6 To set a default queue for routed jobs, set the variable or attribute default_queue for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" default_queue = osg_queue @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_default_queue = \"osg_queue\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Setting batch system directives \u00b6 To write batch system directives that are not supported in the route examples above, you will need to edit the job submit script for your local batch system in /etc/blahp/ (e.g., if your local batch system is Slurm, edit /etc/blahp/slurm_local_submit_attributes.sh ). This file is sourced during submit time and anything printed to stdout is appended to the generated batch system job submit script. ClassAd attributes can be passed from the routed job to the local submit attributes script via default_CERequirements attribute, which takes a comma-separated list of other attributes: ClassAd Transform SET foo = \"X\" SET bar = \"Y\" SET default_CERequirements = \"foo,bar\" Deprecated Syntax set_foo = \"X\"; set_bar = \"Y\"; set_default_CERequirements = \"foo,bar\"; This sets foo to the string X and bar to the string Y in the environment of the local submit attributes script. 
The following example sets the maximum walltime to 1 hour and the accounting group to the x509UserProxyFirstFQAN attribute of the job submitted to a PBS batch system: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" SET Walltime = 3600 SET AccountingGroup = x509UserProxyFirstFQAN SET default_CERequirements = \"WallTime,AccountingGroup\" @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_Walltime = 3600; set_AccountingGroup = x509UserProxyFirstFQAN; set_default_CERequirements = \"WallTime,AccountingGroup\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster With /etc/blahp/pbs_local_submit_attributes.sh containing: #!/bin/bash echo \"#PBS -l walltime=$Walltime\" echo \"#PBS -A $AccountingGroup\" This results in the following being appended to the script that gets submitted to your batch system: #PBS -l walltime=3600 #PBS -A Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"For Non-HTCondor Batch Systems"},{"location":"v23/configuration/non-htcondor-routes/#for-non-htcondor-batch-systems","text":"This page contains information about job routes that can be used if you are running a non-HTCondor pool at your site.","title":"For Non-HTCondor Batch Systems"},{"location":"v23/configuration/non-htcondor-routes/#setting-a-default-batch-queue","text":"To set a default queue for routed jobs, set the variable or attribute default_queue for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" default_queue = osg_queue @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_default_queue = \"osg_queue\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster","title":"Setting a default batch 
queue"},{"location":"v23/configuration/non-htcondor-routes/#setting-batch-system-directives","text":"To write batch system directives that are not supported in the route examples above, you will need to edit the job submit script for your local batch system in /etc/blahp/ (e.g., if your local batch system is Slurm, edit /etc/blahp/slurm_local_submit_attributes.sh ). This file is sourced during submit time and anything printed to stdout is appended to the generated batch system job submit script. ClassAd attributes can be passed from the routed job to the local submit attributes script via default_CERequirements attribute, which takes a comma-separated list of other attributes: ClassAd Transform SET foo = \"X\" SET bar = \"Y\" SET default_CERequirements = \"foo,bar\" Deprecated Syntax set_foo = \"X\"; set_bar = \"Y\"; set_default_CERequirements = \"foo,bar\"; This sets foo to the string X and bar to the string Y in the environment of the local submit attributes script. The following example sets the maximum walltime to 1 hour and the accounting group to the x509UserProxyFirstFQAN attribute of the job submitted to a PBS batch system: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" SET Walltime = 3600 SET AccountingGroup = x509UserProxyFirstFQAN SET default_CERequirements = \"WallTime,AccountingGroup\" @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_Walltime = 3600; set_AccountingGroup = x509UserProxyFirstFQAN; set_default_CERequirements = \"WallTime,AccountingGroup\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster With /etc/blahp/pbs_local_submit_attributes.sh containing: #!/bin/bash echo \"#PBS -l walltime=$Walltime\" echo \"#PBS -A $AccountingGroup\" This results in the following being appended to the script that gets submitted to your batch system: #PBS -l walltime=3600 #PBS -A ","title":"Setting batch system 
directives"},{"location":"v23/configuration/non-htcondor-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v23/configuration/optional-configuration/","text":"Optional Configuration \u00b6 The following configuration steps are optional and will not be required for all sites. If you do not need any of the following special configurations, skip to the page for verifying your HTCondor-CE . Configuring for Multiple Network Interfaces \u00b6 If you have multiple network interfaces with different hostnames, the HTCondor-CE daemons need to know which hostname and interface to use when communicating with each other. Set NETWORK_HOSTNAME and NETWORK_INTERFACE to the hostname and IP address of your public interface, respectively, in /etc/condor-ce/config.d/99-local.conf with the lines: NETWORK_HOSTNAME = condorce.example.com NETWORK_INTERFACE = 127.0.0.1 Replace condorce.example.com with your public interface\u2019s hostname and 127.0.0.1 with your public interface\u2019s IP address. Limiting or Disabling Locally Running Jobs \u00b6 If you want to limit or disable jobs running locally on your CE, you will need to configure HTCondor-CE's local and scheduler universes. Local and scheduler universes allow jobs to be run on the CE itself, mainly for remote troubleshooting. Pilot jobs will not run as local/scheduler universe jobs so leaving them enabled does NOT turn your CE into another worker node. The two universes are effectively the same (the local universe launches a starter process for each job), so we will be configuring them in unison. 
To change the default limit on the number of locally run jobs (the current default is 20), add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Where is the maximum number of jobs allowed to run locally To only allow a specific user to start locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = target.Owner =?= \"\" START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Change for the username allowed to run jobs locally To disable locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = False START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Inserting IDTOKENs into the routed job's sandbox \u00b6 If you want to insert IDTOKENS into the routed job's sandbox you can use the SendIDTokens route command, or the JOB_ROUTER_SEND_ROUTE_IDTOKENS global configuration variable. Tokens sent using this mechanism must be named and declared using the JOB_ROUTER_CREATE_IDTOKEN_NAMES and JOB_ROUTER_CREATE_IDTOKEN_ configuration variables. Tokens whose names are declared in the JOB_ROUTER_SEND_ROUTE_IDTOKENS configuration variable are sent by default for each route that does not have a SendIDTokens command. 
To declare IDTOKENS for inclusion in glide-in jobs for the purpose of advertising to a collector add something like the following to /etc/condor-ce/config.d/99-local-ce-token.conf : JOB_ROUTER_CREATE_IDTOKEN_NAMES = name1 name2 JOB_ROUTER_CREATE_IDTOKEN_name1 @=end sub = \"name1@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name1\" filename = \"ce_name1.idtoken\" owner = \"owner1\" @end JOB_ROUTER_CREATE_IDTOKEN_Name2 @=end sub = \"name2@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name2\" filename = \"ce_name2.idtoken\" owner = \"owner2\" @end To insert one of the above IDTOKENS in the sandbox of a routed job , include the token name in the SendIDTokens route command like this. SendIDTokens = \"Name2\" Route commands SendIDTokens is a route command, not a job attribute. This means that you will not be able to manipulate it through transform verbs such as EVALSET . To add an IDTOKEN to a routed job in addition to the default tokens , build a string containing the token name along with the value of the global configuration variable like this. SendIDTokens = \"Name2 $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\" You can use an attribute of the source job to choose the IDTOKEN by writing an expression like this. SendIDTokens = strcat( My.Owner, \" $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\") It is presumed that the value of My.Owner above is the same as the of an IDTOKEN and as the owner field of that token. For instance, the Fermilab CE config uses the above SendIDTokens expression and the following token declarations at the time of this guide. 
JOB_ROUTER_CREATE_IDTOKEN_NAMES = fermilab3 osg JOB_ROUTER_CREATE_IDTOKEN_fermilab3 @=end sub = \"fermilabpilot@fnal.gov\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_fermilab3.idtoken\" owner = \"fermilab\" @end JOB_ROUTER_CREATE_IDTOKEN_osg @=end sub = \"osgpilot@fnal.gov\" kid = \"POOL\" lifetime = 600 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_osg.idtoken\" owner = \"osg\" @end Enabling the Monitoring Web Interface \u00b6 The HTCondor-CE View is an optional web interface to the status of your CE. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce service Verify the service by entering your CE's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf . Uploading Accounting Records to APEL \u00b6 Batch System Support HTCondor-CE only supports generation of APEL accounting records for HTCondor batch systems. For sites outside of the OSG that need to upload the APEL accounting records, HTCondor-CE supports uploading batch and blah APEL records for HTCondor batch systems. Please refer to EGI's HTCondor-CE Accounting Documentation . Enabling BDII Integration \u00b6 Batch System Support HTCondor-CE only supports reporting BDII information for HTCondor batch systems. HTCondor-CE supports reporting BDII information for all HTCondor-CE endpoints and batch information for an HTCondor batch system. To make this information available, perform the following instructions on your site BDII host. 
Install the HTCondor-CE BDII package: root@host # yum install htcondor-ce-bdii Configure HTCondor ( /etc/condor/config.d/ ) on your site BDII host to point to your central manager: CONDOR_HOST = Replacing with the hostname of your HTCondor central manager Configure BDII static information by modifying /etc/condor/config.d/99-ce-bdii.conf Additionally, install the HTCondor-CE BDII package on each of your HTCondor-CE hosts: root@host # yum install htcondor-ce-bdii","title":"Optional Configuration"},{"location":"v23/configuration/optional-configuration/#optional-configuration","text":"The following configuration steps are optional and will not be required for all sites. If you do not need any of the following special configurations, skip to the page for verifying your HTCondor-CE .","title":"Optional Configuration"},{"location":"v23/configuration/optional-configuration/#configuring-for-multiple-network-interfaces","text":"If you have multiple network interfaces with different hostnames, the HTCondor-CE daemons need to know which hostname and interface to use when communicating with each other. Set NETWORK_HOSTNAME and NETWORK_INTERFACE to the hostname and IP address of your public interface, respectively, in /etc/condor-ce/config.d/99-local.conf directory with the line: NETWORK_HOSTNAME = condorce.example.com NETWORK_INTERFACE = 127.0.0.1 Replacing condorce.example.com text with your public interface\u2019s hostname and 127.0.0.1 with your public interface\u2019s IP address.","title":"Configuring for Multiple Network Interfaces"},{"location":"v23/configuration/optional-configuration/#limiting-or-disabling-locally-running-jobs","text":"If you want to limit or disable jobs running locally on your CE, you will need to configure HTCondor-CE's local and scheduler universes. Local and scheduler universes allow jobs to be run on the CE itself, mainly for remote troubleshooting. 
Pilot jobs will not run as local/scheduler universe jobs so leaving them enabled does NOT turn your CE into another worker node. The two universes are effectively the same (scheduler universe launches a starter process for each job), so we will be configuring them in unison. To change the default limit on the number of locally run jobs (the current default is 20), add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Where is the maximum number of jobs allowed to run locally To only allow a specific user to start locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = target.Owner =?= \"\" START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Change for the username allowed to run jobs locally To disable locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = False START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)","title":"Limiting or Disabling Locally Running Jobs"},{"location":"v23/configuration/optional-configuration/#inserting-idtokens-into-the-routed-jobs-sandbox","text":"If you want to insert IDTOKENS into the routed job's sandbox you can use the SendIDTokens route command, or the JOB_ROUTER_SEND_ROUTE_IDTOKENS global configuration variable. Tokens sent using this mechanism must be named and declared using the JOB_ROUTER_CREATE_IDTOKEN_NAMES and JOB_ROUTER_CREATE_IDTOKEN_ configuration variables. Tokens whose names are declared in the JOB_ROUTER_SEND_ROUTE_IDTOKENS configuration variable are sent by default for each route that does not have a SendIDTokens command. 
To declare IDTOKENS for inclusion in glide-in jobs for the purpose of advertising to a collector add something like the following to /etc/condor-ce/config.d/99-local-ce-token.conf : JOB_ROUTER_CREATE_IDTOKEN_NAMES = name1 name2 JOB_ROUTER_CREATE_IDTOKEN_name1 @=end sub = \"name1@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name1\" filename = \"ce_name1.idtoken\" owner = \"owner1\" @end JOB_ROUTER_CREATE_IDTOKEN_Name2 @=end sub = \"name2@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name2\" filename = \"ce_name2.idtoken\" owner = \"owner2\" @end To insert one of the above IDTOKENS in the sandbox of a routed job , include the token name in the SendIDTokens route command like this. SendIDTokens = \"Name2\" Route commands SendIDTokens is a route command, not a job attribute. This means that you will not be able to manipulate it through transform verbs such as EVALSET . To add an IDTOKEN to a routed job in addition to the default tokens , build a string containing the token name along with the value of the global configuration variable like this. SendIDTokens = \"Name2 $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\" You can use an attribute of the source job to choose the IDTOKEN by writing an expression like this. SendIDTokens = strcat( My.Owner, \" $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\") It is presumed that the value of My.Owner above is the same as the of an IDTOKEN and as the owner field of that token. For instance, the Fermilab CE config uses the above SendIDTokens expression and the following token declarations at the time of this guide. 
JOB_ROUTER_CREATE_IDTOKEN_NAMES = fermilab3 osg JOB_ROUTER_CREATE_IDTOKEN_fermilab3 @=end sub = \"fermilabpilot@fnal.gov\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_fermilab3.idtoken\" owner = \"fermilab\" @end JOB_ROUTER_CREATE_IDTOKEN_osg @=end sub = \"osgpilot@fnal.gov\" kid = \"POOL\" lifetime = 600 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_osg.idtoken\" owner = \"osg\" @end","title":"Inserting IDTOKENs into the routed job's sandbox"},{"location":"v23/configuration/optional-configuration/#enabling-the-monitoring-web-interface","text":"The HTCondor-CE View is an optional web interface to the status of your CE. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce service Verify the service by entering your CE's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf .","title":"Enabling the Monitoring Web Interface"},{"location":"v23/configuration/optional-configuration/#uploading-accounting-records-to-apel","text":"Batch System Support HTCondor-CE only supports generation of APEL accounting records for HTCondor batch systems. For sites outside of the OSG that need to upload the APEL accounting records, HTCondor-CE supports uploading batch and blah APEL records for HTCondor batch systems. Please refer to EGI's HTCondor-CE Accounting Documentation .","title":"Uploading Accounting Records to APEL"},{"location":"v23/configuration/optional-configuration/#enabling-bdii-integration","text":"Batch System Support HTCondor-CE only supports reporting BDII information for HTCondor batch systems. 
HTCondor-CE supports reporting BDII information for all HTCondor-CE endpoints and batch information for an HTCondor batch system. To make this information available, perform the following instructions on your site BDII host. Install the HTCondor-CE BDII package: root@host # yum install htcondor-ce-bdii Configure HTCondor ( /etc/condor/config.d/ ) on your site BDII host to point to your central manager: CONDOR_HOST = Replacing with the hostname of your HTCondor central manager Configure BDII static information by modifying /etc/condor/config.d/99-ce-bdii.conf Additionally, install the HTCondor-CE BDII package on each of your HTCondor-CE hosts: root@host # yum install htcondor-ce-bdii","title":"Enabling BDII Integration"},{"location":"v23/configuration/writing-job-routes/","text":"Writing Job Routes \u00b6 This document describes HTCondor-CE Job Router configurations with equivalent examples for the ClassAd transform and deprecated syntaxes. Configuration from this page should be written to files in /etc/condor-ce/config.d/ , whose contents are parsed in lexicographic order with subsequent variables overriding earlier ones. Each example is displayed in code blocks with tabs to switch between the two syntaxes: ClassAd Transform This is an example for the ClassAd transform syntax Deprecated syntax This is an example for the deprecated syntax Syntax Differences \u00b6 Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed for V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. 
In HTCondor-CE 23, the deprecated syntax continues to be the default and administrators can move to the ClassAd transform syntax by setting the following in a file in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. For examples of the ClassAd transform syntax, you can inspect default job router transforms packaged with HTCondor-CE with the following command: user@host $ condor_ce_config_val -dump JOB_ROUTER_TRANSFORM_ Differences in MY. and TARGET. \u00b6 In addition to the above, the behavior of the MY. and TARGET. ClassAd attribute prefixes has changed between the two different syntaxes: In ClassAd transform syntax, MY. always refers to the incoming job's attributes and can be referenced within $() , e.g. $(MY.Owner) refers to the mapped user of the incoming job. TARGET is only used in SET expressions to refer to attributes in the slot ad (HTCondor pools only). In the deprecated syntax, MY. refers to attributes in the job route and TARGET. refers to attributes in the incoming job ad for copy_ , delete_ , and eval_set_ functions. However, in expressions defined by set_* , MY. refers to the attributes in the incoming job ad and TARGET. refers to the attribute in the slot ad (HTCondor pools only). Required Fields \u00b6 The minimum requirements for a route are that you specify the type of batch system that jobs should be routed to and a name for each route. Default routes can be found in /usr/share/condor-ce/config.d/02-ce--defaults.conf , provided by the htcondor-ce- packages. 
Route name \u00b6 To identify routes, you will need to assign a name to the route, either in the name of the configuration macro (i.e., JOB_ROUTER_ROUTE_ ) for the ClassAd transform syntax or with the name attribute for the deprecated syntax: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Naming restrictions Route names should only contain alphanumeric and _ characters. Routes specified by JOB_ROUTER_ROUTE_* will override routes with the same name in JOB_ROUTER_ENTRIES The name of the route will be useful in debugging since it shows up in the output of condor_ce_job_router_info ; the JobRouterLog ; in the ClassAd of the routed job, which can be viewed with condor_q and condor_history for HTCondor batch systems; and in the ClassAd of the routed job, which can be viewed with condor_ce_q or condor_ce_history for non-HTCondor batch systems. Batch system \u00b6 Each route needs to indicate the type of batch system that jobs should be routed to. For HTCondor batch systems, the UNIVERSE command or TargetUniverse attribute needs to be set to \"VANILLA\" or 5 , respectively. For all other batch systems, the GridResource attribute needs to be set to \"batch \" (where can be one of pbs , slurm , lsf , or sge ). ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_My_Slurm @=jrt GridResource = \"batch slurm\" @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] [ GridResource = \"batch slurm\"; name = \"My_Slurm\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Writing Multiple Routes \u00b6 If your batch system needs incoming jobs to be sorted (e.g. 
if different VOs need to go to separate queues), you will need to write multiple job routes where each route is a separate JOB_ROUTER_ROUTE_* macro in the ClassAd transform syntax and enclosed by square brackets in the deprecated syntax. Additionally, the route names must be added to JOB_ROUTER_ROUTE_NAMES in the order that you want their requirements statements compared to incoming jobs. The following routes take incoming jobs that have a queue attribute set to \"prod\" and set IsProduction = True . All other jobs will be routed with IsProduction = False . ClassAd Transform JOB_ROUTER_ROUTE_Production_Jobs @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA SET IsProduction = True @jrt JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET IsProduction = False @jrt JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; set_IsProduction = True; name = \"Production_Jobs\"; ] [ TargetUniverse = 5; set_IsProduction = False; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Writing Comments \u00b6 To write comments you can use # to comment a line: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt # This is a comment UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # This is a comment ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting Attributes for All Routes \u00b6 ClassAd transform \u00b6 With the ClassAd transform syntax, any function from the Editing Attributes section can be applied before or after your routes are considered by appending the names of transforms specified by JOB_ROUTER_TRANSFORM_ to the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
The following configuration sets the Periodic_Hold attribute for all routed jobs before any route transforms are applied: JOB_ROUTER_TRANSFORM_Periodic_Hold @=jrt SET Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1 @jrt JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) Periodic_Hold To apply the same transform after your pre-route and route transforms, append the name of the transform to JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES instead: JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Periodic_Hold Deprecated syntax \u00b6 To set an attribute that will be applied to all routes, you will need to ensure that MERGE_JOB_ROUTER_DEFAULT_ADS is set to True (check the value with condor_ce_config_val ) and use the set_ function in the JOB_ROUTER_DEFAULTS . The following configuration sets the Periodic_Hold attribute for all routes: # Use the defaults generated by the condor_ce_router_defaults script. To add # additional defaults, add additional lines of the form: # # JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_foo = 1;] # MERGE_JOB_ROUTER_DEFAULT_ADS=True JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;] Filtering Jobs Based On\u2026 \u00b6 To filter jobs, use the route's REQUIREMENTS or Requirements attribute for ClassAd transforms and deprecated syntaxes, respectively. Incoming jobs will be evaluated against the ClassAd expression set in the route's requirements and if the expression evaluates to TRUE , the route will match. More information on the syntax of ClassAds can be found in the HTCondor manual . For an example of how incoming jobs interact with filtering in job routes, consult this document . In the deprecated syntax, you may need to specify TARGET. to differentiate between job and route attributes. See this section for more details. 
Note If you have an HTCondor batch system, note the difference with set_requirements : Pilot job queue \u00b6 To filter jobs based on their pilot job queue attribute, your routes will need a requirements expression using the incoming job's queue attribute. The following entry routes jobs to HTCondor if the incoming job (specified by TARGET ) is a prod (production) glidein: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Mapped user \u00b6 To filter jobs based on what local account the incoming job was mapped to, your routes will need a requirements expression using the incoming job's Owner attribute. The following entry routes jobs to the HTCondor batch system if the mapped user is usatlas2 : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS Owner == \"usatlas2\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.Owner == \"usatlas2\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Alternatively, you can match based on regular expression. The following entry routes jobs to the HTCondor batch system if the mapped user begins with usatlas : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"^usatlas\", Owner) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"^usatlas\", TARGET.Owner); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool VOMS attribute \u00b6 To filter jobs based on the first VOMS FQAN of the job's proxy, your routes will need a requirements expression using the incoming job's x509UserProxyFirstFQAN attribute. 
The following entry routes jobs to the HTCondor batch system if the proxy's first FQAN contains /cms/Role=pilot : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"\\/cms\\/Role\\=pilot\", x509UserProxyFirstFQAN) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"\\/cms\\/Role\\=pilot\", TARGET.x509UserProxyFirstFQAN); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting a Default\u2026 \u00b6 This section outlines how to set default job limits, memory, cores, and maximum walltime. For an example of how users can override these defaults, consult this document . Maximum number of jobs \u00b6 To set a default limit to the maximum number of jobs per route, you can edit the configuration variable CONDORCE_MAX_JOBS in /etc/condor-ce/config.d/01-ce-router.conf : CONDORCE_MAX_JOBS = 10000 Note The above configuration is to be placed directly into the HTCondor-CE configuration instead of a job route or transform. 
Maximum memory \u00b6 To set a default maximum memory (in MB) for routed jobs, set the variable or attribute default_maxMemory for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested memory to 1 GB default_maxMemory = 1000 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested memory to 1 GB set_default_maxMemory = 1000; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Number of cores to request \u00b6 To set a default number of cores for routed jobs, set the variable or attribute default_xcount for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested cores to 8 default_xcount = 8 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested cores to 8 set_default_xcount = 8; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Number of gpus to request \u00b6 To set a default number of GPUs for routed jobs, set the job ClassAd attribute RequestGPUs in the route transform: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # If the job does not already have a RequestGPUs value set it to 1 DEFAULT RequestGPUs = 1 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool The DEFAULT keyword works for any job attribute other than those mentioned above that require the use of alternative names for defaulting in the CE. The deprecated syntax has no keyword for defaulting. 
Maximum walltime \u00b6 To set a default maximum walltime (in minutes) for routed jobs, set the variable or attribute default_maxWallTime for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the max walltime to 1 hr default_maxWallTime = 60 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the max walltime to 1 hr set_default_maxWallTime = 60; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting Job Environments \u00b6 HTCondor-CE offers two different methods for setting environment variables of routed jobs: CONDORCE_PILOT_JOB_ENV configuration, which should be used for setting environment variables for all routed jobs to static strings. default_pilot_job_env or set_default_pilot_job_env job route configuration, which should be used for setting environment variables: Per job route To values based on incoming job attributes Using ClassAd functions Both of these methods use the new HTCondor format of the environment command , which is described by environment variable/value pairs separated by whitespace and enclosed in double-quotes. For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=proxy.wisc.edu\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu Contents of CONDORCE_PILOT_JOB_ENV can reference other HTCondor-CE configuration using HTCondor's configuration $() macro expansion . 
For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration LOCAL_PROXY = proxy.wisc.edu CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=$(LOCAL_PROXY)\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu To set environment variables per job route, based on incoming job attributes, or using ClassAd functions, add default_pilot_job_env or set_default_pilot_job_env to your job route configuration for ClassAd transforms and deprecated syntax, respectively. For example, the following HTCondor-CE configuration would result in this environment for a job with these attributes: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Incoming Job Attributes JOB_COLLECTOR = \"collector.wisc.edu\" JOB_VO = \"GLOW\" Resulting Environment WN_SCRATCH_DIR = /nobackup PILOT_COLLECTOR = collector.wisc.edu ACCOUNTING_GROUP = glow Debugging job route environment expressions While constructing default_pilot_job_env or set_default_pilot_job_env expressions, try wrapping your expression in debug() to help with any issues that may arise. Make sure to remove debug() after you're done! 
Editing Attributes\u2026 \u00b6 The following functions are operations that can be used to take incoming job attributes and modify them for the routed job for the ClassAd transform and deprecated syntax, respectively: COPY , copy_* DELETE , delete_* SET , set_* EVALSET , eval_set_* The order in which the above operations are evaluated depends on your chosen syntax: If you are using ClassAd transforms , each function is evaluated in order of appearance. For example, the following will set FOO in the routed job to the incoming job's Owner attribute and then subsequently remove FOO from the routed job: JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET FOO = \"$(MY.Owner)\" DELETE FOO @jrt If you are using the deprecated syntax , each class of operations is evaluated in the order specified above, i.e. all copy_* , before delete_* , etc. For example, if the attribute FOO is set using eval_set_FOO in the JOB_ROUTER_DEFAULTS , you'll be unable to use delete_foo to remove it from your jobs since the attribute is set using eval_set_foo after the deletion occurs according to the order of operations. To get around this, we can take advantage of the fact that operations defined in JOB_ROUTER_DEFAULTS get overridden by the same operation in JOB_ROUTER_ENTRIES . So to 'delete' FOO , you could add eval_set_foo = \"\" to the route in the JOB_ROUTER_ENTRIES , resulting in foo being set to the empty string in the routed job. More documentation can be found in the HTCondor manual Copying attributes \u00b6 To copy the value of an attribute of the incoming job to an attribute of the routed job, use COPY or copy_ for ClassAd transform and deprecated syntaxes, respectively. 
The following route copies the Environment attribute of the incoming job and sets the attribute Original_Environment on the routed job to the same value: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA COPY Environment Original_Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; copy_Environment = \"Original_Environment\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Removing attributes \u00b6 To remove an attribute of the incoming job from the routed job, use DELETE or delete_ for ClassAd transform and deprecated syntaxes, respectively. The following route removes the Environment attribute from the routed job: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA DELETE Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; delete_Environment = True; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting attributes \u00b6 To set an attribute on the routed job, use SET or set_ for ClassAd transform and deprecated syntaxes, respectively. The following route sets the Job's Rank attribute to 5: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Rank = 5 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Rank = 5; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting attributes with ClassAd expressions \u00b6 To set an attribute to a ClassAd expression to be evaluated, use EVALSET or eval_set for ClassAd transform and deprecated syntaxes, respectively. 
The following route sets the Experiment attribute to atlas.osguser if the Owner of the incoming job is osguser : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA EVALSET Experiment = strcat(\"atlas.\", Owner) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; eval_set_Experiment = strcat(\"atlas.\", Owner); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Limiting the Number of Jobs \u00b6 This section outlines how to limit the number of total or idle jobs in a specific route (i.e., if this limit is reached, jobs will no longer be placed in this route). Note If you are using an HTCondor batch system, limiting the number of jobs is not the preferred solution: HTCondor manages fair share on its own via user priorities and group accounting . Total jobs \u00b6 To set a limit on the number of jobs for a specific route, set the MaxJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Idle jobs \u00b6 To set a limit on the number of idle jobs for a specific route, set the MaxIdleJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxIdleJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxIdleJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Debugging Routes \u00b6 To help debug expressions in your routes, you can use the debug() function.
First, set the debug mode for the JobRouter by editing a file in /etc/condor-ce/config.d/ to read JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT Then wrap the problematic attribute in debug() : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET Experiment = debug(strcat(\"atlas\", Name)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ name = \"Condor_Pool\"; eval_set_Experiment = debug(strcat(\"atlas\", Name)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool You will find the debugging output in /var/log/condor-ce/JobRouterLog . Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Writing Job Routes"},{"location":"v23/configuration/writing-job-routes/#writing-job-routes","text":"This document describes HTCondor-CE Job Router configurations with equivalent examples for the ClassAd transform and deprecated syntaxes. Configuration from this page should be written to files in /etc/condor-ce/config.d/ , whose contents are parsed in lexicographic order with subsequent variables overriding earlier ones. Each example is displayed in code blocks with tabs to switch between the two syntaxes: ClassAd Transform This is an example for the ClassAd transform syntax Deprecated syntax This is an example for the deprecated syntax","title":"Writing Job Routes"},{"location":"v23/configuration/writing-job-routes/#syntax-differences","text":"Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed in V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series.
In HTCondor-CE 23, the deprecated syntax continues to be the default and administrators can move to the ClassAd transform syntax by setting the following in a file in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. For examples of the ClassAd transform syntax, you can inspect default job router transforms packaged with HTCondor-CE with the following command: user@host $ condor_ce_config_val -dump JOB_ROUTER_TRANSFORM_","title":"Syntax Differences"},{"location":"v23/configuration/writing-job-routes/#differences-in-my-and-target","text":"In addition to the above, the behavior of the MY. and TARGET. ClassAd attribute prefixes differs between the two syntaxes: In ClassAd transform syntax, MY. always refers to the incoming job's attributes and can be referenced within $() , e.g. $(MY.Owner) refers to the mapped user of the incoming job. TARGET. is only used in SET expressions to refer to attributes in the slot ad (HTCondor pools only). In the deprecated syntax, MY. refers to attributes in the job route and TARGET. refers to attributes in the incoming job ad for copy_ , delete_ , and eval_set_ functions. However, in expressions defined by set_* , MY. refers to the attributes in the incoming job ad and TARGET. refers to the attribute in the slot ad (HTCondor pools only).","title":"Differences in MY.
and TARGET."},{"location":"v23/configuration/writing-job-routes/#required-fields","text":"The minimum requirements for a route are that you specify the type of batch system that jobs should be routed to and a name for each route. Default routes can be found in /usr/share/condor-ce/config.d/02-ce--defaults.conf , provided by the htcondor-ce- packages.","title":"Required Fields"},{"location":"v23/configuration/writing-job-routes/#route-name","text":"To identify routes, you will need to assign a name to the route, either in the name of the configuration macro (i.e., JOB_ROUTER_ROUTE_ ) for the ClassAd transform syntax or with the name attribute for the deprecated syntax: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Naming restrictions Route names should only contain alphanumeric and _ characters. Routes specified by JOB_ROUTER_ROUTE_* will override routes with the same name in JOB_ROUTER_ENTRIES The name of the route will be useful in debugging since it shows up in the output of condor_ce_job_router_info ; the JobRouterLog ; in the ClassAd of the routed job, which can be viewed with condor_q and condor_history for HTCondor batch systems; and in the ClassAd of the routed job, which can be viewed with condor_ce_q or condor_ce_history for non-HTCondor batch systems.","title":"Route name"},{"location":"v23/configuration/writing-job-routes/#batch-system","text":"Each route needs to indicate the type of batch system that jobs should be routed to. For HTCondor batch systems, the UNIVERSE command or TargetUniverse attribute needs to be set to \"VANILLA\" or 5 , respectively. For all other batch systems, the GridResource attribute needs to be set to \"batch \" (where can be one of pbs , slurm , lsf , or sge ).
ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_My_Slurm @=jrt GridResource = \"batch slurm\" @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] [ GridResource = \"batch slurm\"; name = \"My_Slurm\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm","title":"Batch system"},{"location":"v23/configuration/writing-job-routes/#writing-multiple-routes","text":"If your batch system needs incoming jobs to be sorted (e.g. if different VOs need to go to separate queues), you will need to write multiple job routes where each route is a separate JOB_ROUTER_ROUTE_* macro in the ClassAd transform syntax and enclosed by square brackets in the deprecated syntax. Additionally, the route names must be added to JOB_ROUTER_ROUTE_NAMES in the order that you want their requirements statements compared to incoming jobs. The following routes take incoming jobs that have a queue attribute set to \"prod\" and set IsProduction = True . All other jobs will be routed with IsProduction = False .
ClassAd Transform JOB_ROUTER_ROUTE_Production_Jobs @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA SET IsProduction = True @jrt JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET IsProduction = False @jrt JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; set_IsProduction = True; name = \"Production_Jobs\"; ] [ TargetUniverse = 5; set_IsProduction = False; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool","title":"Writing Multiple Routes"},{"location":"v23/configuration/writing-job-routes/#writing-comments","text":"To write comments you can use # to comment a line: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt # This is a comment UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # This is a comment ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Writing Comments"},{"location":"v23/configuration/writing-job-routes/#setting-attributes-for-all-routes","text":"","title":"Setting Attributes for All Routes"},{"location":"v23/configuration/writing-job-routes/#classad-transform","text":"With the ClassAd transform syntax, any function from the Editing Attributes section can be applied before or after your routes are considered by appending the names of transforms specified by JOB_ROUTER_TRANSFORM_ to the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
The following configuration sets the Periodic_Hold attribute for all routed jobs before any route transforms are applied: JOB_ROUTER_TRANSFORM_Periodic_Hold @=jrt SET Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1 @jrt JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) Periodic_Hold To apply the same transform after your pre-route and route transforms, append the name of the transform to JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES instead: JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Periodic_Hold","title":"ClassAd transform"},{"location":"v23/configuration/writing-job-routes/#deprecated-syntax","text":"To set an attribute that will be applied to all routes, you will need to ensure that MERGE_JOB_ROUTER_DEFAULT_ADS is set to True (check the value with condor_ce_config_val ) and use the set_ function in the JOB_ROUTER_DEFAULTS . The following configuration sets the Periodic_Hold attribute for all routes: # Use the defaults generated by the condor_ce_router_defaults script. To add # additional defaults, add additional lines of the form: # # JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_foo = 1;] # MERGE_JOB_ROUTER_DEFAULT_ADS=True JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;]","title":"Deprecated syntax"},{"location":"v23/configuration/writing-job-routes/#filtering-jobs-based-on","text":"To filter jobs, use the route's REQUIREMENTS or Requirements attribute for ClassAd transforms and deprecated syntaxes, respectively. Incoming jobs will be evaluated against the ClassAd expression set in the route's requirements and if the expression evaluates to TRUE , the route will match. More information on the syntax of ClassAds can be found in the HTCondor manual . For an example of how incoming jobs interact with filtering in job routes, consult this document .
In the deprecated syntax, you may need to specify TARGET. to differentiate between job and route attributes. See this section for more details. Note If you have an HTCondor batch system, note the difference with set_requirements :","title":"Filtering Jobs Based On\u2026"},{"location":"v23/configuration/writing-job-routes/#pilot-job-queue","text":"To filter jobs based on their pilot job queue attribute, your routes will need a requirements expression using the incoming job's queue attribute. The following entry routes jobs to HTCondor if the incoming job (specified by TARGET in the deprecated syntax) has its queue attribute set to prod : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Pilot job queue"},{"location":"v23/configuration/writing-job-routes/#mapped-user","text":"To filter jobs based on what local account the incoming job was mapped to, your routes will need a requirements expression using the incoming job's Owner attribute. The following entry routes jobs to the HTCondor batch system if the mapped user is usatlas2 : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS Owner == \"usatlas2\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.Owner == \"usatlas2\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Alternatively, you can match based on regular expression.
The following entry routes jobs to the HTCondor batch system if the mapped user begins with usatlas : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"^usatlas\", Owner) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"^usatlas\", TARGET.Owner); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Mapped user"},{"location":"v23/configuration/writing-job-routes/#voms-attribute","text":"To filter jobs based on the VOMS attributes of the job's proxy, your routes will need a requirements expression using the incoming job's x509UserProxyFirstFQAN attribute. The following entry routes jobs to the HTCondor batch system if the proxy's first FQAN contains /cms/Role=pilot : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"\\/cms\\/Role\\=pilot\", x509UserProxyFirstFQAN) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"\\/cms\\/Role\\=pilot\", TARGET.x509UserProxyFirstFQAN); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"VOMS attribute"},{"location":"v23/configuration/writing-job-routes/#setting-a-default","text":"This section outlines how to set default job limits, memory, cores, and maximum walltime.
For an example of how users can override these defaults, consult this document .","title":"Setting a Default\u2026"},{"location":"v23/configuration/writing-job-routes/#maximum-number-of-jobs","text":"To set a default limit to the maximum number of jobs per route, you can edit the configuration variable CONDORCE_MAX_JOBS in /etc/condor-ce/config.d/01-ce-router.conf : CONDORCE_MAX_JOBS = 10000 Note The above configuration is to be placed directly into the HTCondor-CE configuration instead of a job route or transform.","title":"Maximum number of jobs"},{"location":"v23/configuration/writing-job-routes/#maximum-memory","text":"To set a default maximum memory (in MB) for routed jobs, set the variable or attribute default_maxMemory for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested memory to 1 GB default_maxMemory = 1000 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested memory to 1 GB set_default_maxMemory = 1000; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Maximum memory"},{"location":"v23/configuration/writing-job-routes/#number-of-cores-to-request","text":"To set a default number of cores for routed jobs, set the variable or attribute default_xcount for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested cores to 8 default_xcount = 8 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested cores to 8 set_default_xcount = 8; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Number of cores to request"},{"location":"v23/configuration/writing-job-routes/#number-of-gpus-to-request","text":"To set a default number of GPUs for routed jobs, set the job ClassAd attribute
RequestGPUs in the route transform: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # If the job does not already have a RequestGPUs value set it to 1 DEFAULT RequestGPUs = 1 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool The DEFAULT keyword works for any job attribute other than those mentioned above that require the use of alternative names for defaulting in the CE. The deprecated syntax has no keyword for defaulting.","title":"Number of gpus to request"},{"location":"v23/configuration/writing-job-routes/#maximum-walltime","text":"To set a default maximum walltime (in minutes) for routed jobs, set the variable or attribute default_maxWallTime for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the max walltime to 1 hr default_maxWallTime = 60 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the max walltime to 1 hr set_default_maxWallTime = 60; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Maximum walltime"},{"location":"v23/configuration/writing-job-routes/#setting-job-environments","text":"HTCondor-CE offers two different methods for setting environment variables of routed jobs: CONDORCE_PILOT_JOB_ENV configuration, which should be used for setting environment variables for all routed jobs to static strings. default_pilot_job_env or set_default_pilot_job_env job route configuration, which should be used for setting environment variables: Per job route To values based on incoming job attributes Using ClassAd functions Both of these methods use the new HTCondor format of the environment command , which is described by environment variable/value pairs separated by whitespace and enclosed in double-quotes.
For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=proxy.wisc.edu\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu Contents of CONDORCE_PILOT_JOB_ENV can reference other HTCondor-CE configuration using HTCondor's configuration $() macro expansion . For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration LOCAL_PROXY = proxy.wisc.edu CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=$(LOCAL_PROXY)\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu To set environment variables per job route, based on incoming job attributes, or using ClassAd functions, add default_pilot_job_env or set_default_pilot_job_env to your job route configuration for ClassAd transforms and deprecated syntax, respectively. 
For example, the following HTCondor-CE configuration would result in this environment for a job with these attributes: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Incoming Job Attributes JOB_COLLECTOR = \"collector.wisc.edu\" JOB_VO = \"GLOW\" Resulting Environment WN_SCRATCH_DIR = /nobackup PILOT_COLLECTOR = collector.wisc.edu ACCOUNTING_GROUP = glow Debugging job route environment expressions While constructing default_pilot_job_env or set_default_pilot_job_env expressions, try wrapping your expression in debug() to help with any issues that may arise. Make sure to remove debug() after you're done!","title":"Setting Job Environments"},{"location":"v23/configuration/writing-job-routes/#editing-attributes","text":"The following functions are operations that can be used to take incoming job attributes and modify them for the routed job for the ClassAd transform and deprecated syntax, respectively: COPY , copy_* DELETE , delete_* SET , set_* EVALSET , eval_set_* The order in which the above operations are evaluated depends on your chosen syntax: If you are using ClassAd transforms , each function is evaluated in order of appearance. For example, the following will set FOO in the routed job to the incoming job's Owner attribute and then subsequently remove FOO from the routed job: JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET FOO = \"$(MY.Owner)\" DELETE FOO @jrt If you are using the deprecated syntax , each class of operations is evaluated in the order specified above, i.e.
all copy_* , before delete_* , etc. For example, if the attribute FOO is set using eval_set_FOO in the JOB_ROUTER_DEFAULTS , you'll be unable to use delete_foo to remove it from your jobs since the attribute is set using eval_set_foo after the deletion occurs according to the order of operations. To get around this, we can take advantage of the fact that operations defined in JOB_ROUTER_DEFAULTS get overridden by the same operation in JOB_ROUTER_ENTRIES . So to 'delete' FOO , you could add eval_set_foo = \"\" to the route in the JOB_ROUTER_ENTRIES , resulting in foo being set to the empty string in the routed job. More documentation can be found in the HTCondor manual.","title":"Editing Attributes\u2026"},{"location":"v23/configuration/writing-job-routes/#copying-attributes","text":"To copy the value of an attribute of the incoming job to an attribute of the routed job, use COPY or copy_ for ClassAd transform and deprecated syntaxes, respectively. The following route copies the Environment attribute of the incoming job and sets the attribute Original_Environment on the routed job to the same value: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA COPY Environment Original_Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; copy_Environment = \"Original_Environment\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Copying attributes"},{"location":"v23/configuration/writing-job-routes/#removing-attributes","text":"To remove an attribute of the incoming job from the routed job, use DELETE or delete_ for ClassAd transform and deprecated syntaxes, respectively.
The following route removes the Environment attribute from the routed job: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA DELETE Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; delete_Environment = True; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Removing attributes"},{"location":"v23/configuration/writing-job-routes/#setting-attributes","text":"To set an attribute on the routed job, use SET or set_ for ClassAd transform and deprecated syntaxes, respectively. The following route sets the Job's Rank attribute to 5: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Rank = 5 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Rank = 5; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting attributes"},{"location":"v23/configuration/writing-job-routes/#setting-attributes-with-classad-expressions","text":"To set an attribute to a ClassAd expression to be evaluated, use EVALSET or eval_set for ClassAd transform and deprecated syntaxes, respectively. The following route sets the Experiment attribute to atlas.osguser if the Owner of the incoming job is osguser : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA EVALSET Experiment = strcat(\"atlas.\", Owner) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; eval_set_Experiment = strcat(\"atlas.\", Owner); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting attributes with ClassAd expressions"},{"location":"v23/configuration/writing-job-routes/#limiting-the-number-of-jobs","text":"This section outlines how to limit the number of total or idle jobs in a specific route (i.e., if this limit is reached, jobs will no longer be placed in this route). 
Note If you are using an HTCondor batch system, limiting the number of jobs is not the preferred solution: HTCondor manages fair share on its own via user priorities and group accounting .","title":"Limiting the Number of Jobs"},{"location":"v23/configuration/writing-job-routes/#total-jobs","text":"To set a limit on the number of jobs for a specific route, set the MaxJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Total jobs"},{"location":"v23/configuration/writing-job-routes/#idle-jobs","text":"To set a limit on the number of idle jobs for a specific route, set the MaxIdleJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxIdleJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxIdleJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Idle jobs"},{"location":"v23/configuration/writing-job-routes/#debugging-routes","text":"To help debug expressions in your routes, you can use the debug() function.
First, set the debug mode for the JobRouter by editing a file in /etc/condor-ce/config.d/ to read JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT Then wrap the problematic attribute in debug() : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET Experiment = debug(strcat(\"atlas\", Name)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ name = \"Condor_Pool\"; eval_set_Experiment = debug(strcat(\"atlas\", Name)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool You will find the debugging output in /var/log/condor-ce/JobRouterLog .","title":"Debugging Routes"},{"location":"v23/configuration/writing-job-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v23/installation/central-collector/","text":"Installing an HTCondor-CE Central Collector \u00b6 The HTCondor-CE Central Collector is an information service designed to provide an overview and descriptions of grid services. Based on the HTCondorView Server , the Central Collector accepts ClassAds from site HTCondor-CEs by default but may accept from other services using the HTCondor Python Bindings . By distributing configuration to each member site, a central grid team can coordinate the information that site HTCondor-CEs should advertise. Additionally, the HTCondor-CE View web server may be installed alongside a Central Collector to display pilot job statistics across its grid, as well as information for each site HTCondor-CE. For example, the OSG Central Collector can be viewed at https://collector.opensciencegrid.org . Use this page to learn how to install, configure, and run an HTCondor-CE Central Collector as part of your central operations.
Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE Central Collector service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE Central Collector host Network ports: Site HTCondor-CEs must be able to contact the Central Collector on port 9619 (TCP). Additionally, the optional HTCondor-CE View web server should be accessible on port 80 (TCP). There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively Installing a Central Collector \u00b6 Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install the Central Collector software: root@host # yum install htcondor-ce-collector Configuring a Central Collector \u00b6 Like a site HTCondor-CE, the Central Collector uses X.509 host certificates and certificate authorities (CAs) when authenticating SSL connections. By default, the Central Collector uses the default system locations to locate CAs and host certificate when authenticating SSL connections, i.e. for SSL authentication methods. But traditionally, the Central Collector and HTCondor-CEs have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your Central Collector to use grid or system certificates for authentication: If your site HTCondor-CEs will be advertising to your Central Collector using grid certificates or you are using a grid certificate for your Central Collector's host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key Optional configuration \u00b6 The following configuration steps are optional and will not be required for all Central Collectors. If you do not need any of the following special configurations, skip to the section on next steps . Banning HTCondor-CEs \u00b6 By default, Central Collectors accept ClassAds from all HTCondor-CEs with a valid and accepted certificate. 
If you want to stop accepting ClassAds from a particular HTCondor-CE, add its hostname to DENY_ADVERTISE_SCHEDD in /etc/condor-ce/config.d/01-ce-collector.conf . For example: DENY_ADVERTISE_SCHEDD = $(DENY_ADVERTISE_SCHEDD), misbehaving-ce-1.bad-domain.com, misbehaving-ce-2.bad-domain.com Configuring HTCondor-CE View \u00b6 The HTCondor-CE View is an optional web interface to the status of all HTCondor-CEs advertising to your Central Collector. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce-collector service Verify the service by entering your Central Collector's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf . Distributing Configuration to Site HTCondor-CEs \u00b6 To make the Central Collector truly useful, each site HTCondor-CE in your organization will need to configure their HTCondor-CEs to advertise to your Central Collector(s) along with any custom information that may be of interest. For example, the OSG provides default configuration to OSG sites through an osg-ce metapackage and configuration tools. 
Following the Filesystem Hierarchy Standard , the following configuration should be set by HTCondor-CE administrators in /etc/condor-ce/config.d/ or by packagers in /usr/share/condor-ce/config.d/ : Set CONDOR_VIEW_HOST to a comma-separated list of Central Collectors: CONDOR_VIEW_HOST = collector.htcondor.org:9619, collector1.htcondor.org:9619, collector2.htcondor.org:9619 Append arbitrary attributes to SCHEDD_ATTRS containing custom information in any number of arbitrary configuration attributes: ATTR_NAME_1 = value1 ATTR_NAME_2 = value2 SCHEDD_ATTRS = $(SCHEDD_ATTRS) ATTR_NAME_1 ATTR_NAME_2 For example, OSG sites advertise information describing their OSG Topology registrations, local batch system, and local resources: OSG_Resource = \"local\" OSG_ResourceGroup = \"\" OSG_BatchSystems = \"condor\" OSG_ResourceCatalog = { \\ [ \\ AllowedVOs = { \"osg\" }; \\ CPUs = 2; \\ MaxWallTime = 1440; \\ Memory = 10000; \\ Name = \"test\"; \\ Requirements = TARGET.RequestCPUs <= CPUs && TARGET.RequestMemory <= Memory && member(TARGET.VO, AllowedVOs); \\ Transform = [ set_MaxMemory = RequestMemory; set_xcount = RequestCPUs; ]; \\ ] \\ } SCHEDD_ATTRS = $(SCHEDD_ATTRS) OSG_Resource OSG_ResourceGroup OSG_BatchSystems OSG_ResourceCatalog Verifying a Central Collector \u00b6 To verify that you have a working installation of a Central Collector, ensure that all the relevant services are started and enabled then perform the validation steps below. Managing Central Collector services \u00b6 In addition to the Central Collector service itself, there are a number of supporting services in your installation. The specific services are: Software Service name HTCondor-CE condor-ce-collector Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... 
Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Validating a Central Collector \u00b6 Getting Help \u00b6 If you have any questions or issues with the installation process, please contact us for assistance.","title":"Install a Central Collector"},{"location":"v23/installation/central-collector/#installing-an-htcondor-ce-central-collector","text":"The HTCondor-CE Central Collector is an information service designed to provide an overview and descriptions of grid services. Based on the HTCondorView Server , the Central Collector accepts ClassAds from site HTCondor-CEs by default but may accept from other services using the HTCondor Python Bindings . By distributing configuration to each member site, a central grid team can coordinate the information that site HTCondor-CEs should advertise. Additionally, the HTCondor-CE View web server may be installed alongside a Central Collector to display pilot job statistics across its grid, as well as information for each site HTCondor-CE. For example, the OSG Central Collector can be viewed at https://collector.opensciencegrid.org . Use this page to learn how to install, configure, and run an HTCondor-CE Central Collector as part of your central operations.","title":"Installing an HTCondor-CE Central Collector"},{"location":"v23/installation/central-collector/#before-starting","text":"Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE Central Collector service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE Central Collector host Network ports: Site HTCondor-CEs must be able to contact the Central Collector on port 9619 (TCP). 
Additionally, the optional HTCondor-CE View web server should be accessible on port 80 (TCP). There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively","title":"Before Starting"},{"location":"v23/installation/central-collector/#installing-a-central-collector","text":"Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install the Central Collector software: root@host # yum install htcondor-ce-collector","title":"Installing a Central Collector"},{"location":"v23/installation/central-collector/#configuring-a-central-collector","text":"Like a site HTCondor-CE, the Central Collector uses X.509 host certificates and certificate authorities (CAs) when authenticating SSL connections. By default, the Central Collector uses the default system locations to locate CAs and host certificate when authenticating SSL connections, i.e. for SSL authentication methods. But traditionally, the Central Collector and HTCondor-CEs have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your Central Collector to use grid or system certificates for authentication: If your site HTCondor-CEs will be advertising to your Central Collector using grid certificates or you are using a grid certificate for your Central Collector's host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key","title":"Configuring a Central Collector"},{"location":"v23/installation/central-collector/#optional-configuration","text":"The following configuration steps are optional and will not be required for all Central Collectors. 
If you do not need any of the following special configurations, skip to the section on next steps .","title":"Optional configuration"},{"location":"v23/installation/central-collector/#banning-htcondor-ces","text":"By default, Central Collectors accept ClassAds from all HTCondor-CEs with a valid and accepted certificate. If you want to stop accepting ClassAds from a particular HTCondor-CE, add its hostname to DENY_ADVERTISE_SCHEDD in /etc/condor-ce/config.d/01-ce-collector.conf . For example: DENY_ADVERTISE_SCHEDD = $(DENY_ADVERTISE_SCHEDD), misbehaving-ce-1.bad-domain.com, misbehaving-ce-2.bad-domain.com","title":"Banning HTCondor-CEs"},{"location":"v23/installation/central-collector/#configuring-htcondor-ce-view","text":"The HTCondor-CE View is an optional web interface to the status of all HTCondor-CEs advertising to your Central Collector. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce-collector service Verify the service by entering your Central Collector's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf .","title":"Configuring HTCondor-CE View"},{"location":"v23/installation/central-collector/#distributing-configuration-to-site-htcondor-ces","text":"To make the Central Collector truly useful, each site HTCondor-CE in your organization will need to configure their HTCondor-CEs to advertise to your Central Collector(s) along with any custom information that may be of interest. For example, the OSG provides default configuration to OSG sites through an osg-ce metapackage and configuration tools. 
Following the Filesystem Hierarchy Standard , the following configuration should be set by HTCondor-CE administrators in /etc/condor-ce/config.d/ or by packagers in /usr/share/condor-ce/config.d/ : Set CONDOR_VIEW_HOST to a comma-separated list of Central Collectors: CONDOR_VIEW_HOST = collector.htcondor.org:9619, collector1.htcondor.org:9619, collector2.htcondor.org:9619 Append arbitrary attributes to SCHEDD_ATTRS containing custom information in any number of arbitrary configuration attributes: ATTR_NAME_1 = value1 ATTR_NAME_2 = value2 SCHEDD_ATTRS = $(SCHEDD_ATTRS) ATTR_NAME_1 ATTR_NAME_2 For example, OSG sites advertise information describing their OSG Topology registrations, local batch system, and local resources: OSG_Resource = \"local\" OSG_ResourceGroup = \"\" OSG_BatchSystems = \"condor\" OSG_ResourceCatalog = { \\ [ \\ AllowedVOs = { \"osg\" }; \\ CPUs = 2; \\ MaxWallTime = 1440; \\ Memory = 10000; \\ Name = \"test\"; \\ Requirements = TARGET.RequestCPUs <= CPUs && TARGET.RequestMemory <= Memory && member(TARGET.VO, AllowedVOs); \\ Transform = [ set_MaxMemory = RequestMemory; set_xcount = RequestCPUs; ]; \\ ] \\ } SCHEDD_ATTRS = $(SCHEDD_ATTRS) OSG_Resource OSG_ResourceGroup OSG_BatchSystems OSG_ResourceCatalog","title":"Distributing Configuration to Site HTCondor-CEs"},{"location":"v23/installation/central-collector/#verifying-a-central-collector","text":"To verify that you have a working installation of a Central Collector, ensure that all the relevant services are started and enabled then perform the validation steps below.","title":"Verifying a Central Collector"},{"location":"v23/installation/central-collector/#managing-central-collector-services","text":"In addition to the Central Collector service itself, there are a number of supporting services in your installation. The specific services are: Software Service name HTCondor-CE condor-ce-collector Start and enable the services in the order listed and stop them in reverse order. 
As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing Central Collector services"},{"location":"v23/installation/central-collector/#validating-a-central-collector","text":"","title":"Validating a Central Collector"},{"location":"v23/installation/central-collector/#getting-help","text":"If you have any questions or issues with the installation process, please contact us for assistance.","title":"Getting Help"},{"location":"v23/installation/htcondor-ce/","text":"Installing HTCondor-CE 23 \u00b6 Joining the OSG Consortium (OSG)? If you are installing an HTCondor-CE for the OSG, consult the OSG-specific documentation . HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint solution for computing grids (e.g. European Grid Infrastructure , The OSG Consortium ). It is configured to use the Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. See the home page for more details on the features and architecture of HTCondor-CE. Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 23 from the CHTC yum repositories . 
Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Submit host: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster running supported batch system software (Grid Engine, HTCondor, LSF, PBS/Torque, Slurm) File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Development Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively Installing HTCondor-CE \u00b6 Important HTCondor-CE must be installed on a host that is configured to submit jobs to your batch system. The details of this setup are site-specific by nature and therefore beyond the scope of this document. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Select the appropriate convenience RPM: If your batch system is... Then use the following package... Grid Engine htcondor-ce-sge HTCondor htcondor-ce-condor LSF htcondor-ce-lsf PBS/Torque htcondor-ce-pbs SLURM htcondor-ce-slurm Install the CE software: root@host # yum install Where is the package you selected in the above step. 
Next Steps \u00b6 At this point, you should have all the necessary binaries, scripts, and default configurations. The next step is to configure authentication to allow for remote submission to your HTCondor-CE. Getting Help \u00b6 If you have any questions or issues with the installation process, please contact us for assistance.","title":"Installation"},{"location":"v23/installation/htcondor-ce/#installing-htcondor-ce-23","text":"Joining the OSG Consortium (OSG)? If you are installing an HTCondor-CE for the OSG, consult the OSG-specific documentation . HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint solution for computing grids (e.g. European Grid Infrastructure , The OSG Consortium ). It is configured to use the Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. See the home page for more details on the features and architecture of HTCondor-CE. Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 23 from the CHTC yum repositories .","title":"Installing HTCondor-CE 23"},{"location":"v23/installation/htcondor-ce/#before-starting","text":"Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Submit host: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster running supported batch system software (Grid Engine, HTCondor, LSF, PBS/Torque, Slurm) File Systems : Non-HTCondor batch systems require a shared file system 
between the HTCondor-CE host and the batch system worker nodes. There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Development Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively","title":"Before Starting"},{"location":"v23/installation/htcondor-ce/#installing-htcondor-ce","text":"Important HTCondor-CE must be installed on a host that is configured to submit jobs to your batch system. The details of this setup are site-specific by nature and therefore beyond the scope of this document. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Select the appropriate convenience RPM: If your batch system is... Then use the following package... Grid Engine htcondor-ce-sge HTCondor htcondor-ce-condor LSF htcondor-ce-lsf PBS/Torque htcondor-ce-pbs SLURM htcondor-ce-slurm Install the CE software: root@host # yum install Where is the package you selected in the above step.","title":"Installing HTCondor-CE"},{"location":"v23/installation/htcondor-ce/#next-steps","text":"At this point, you should have all the necessary binaries, scripts, and default configurations. 
The next step is to configure authentication to allow for remote submission to your HTCondor-CE.","title":"Next Steps"},{"location":"v23/installation/htcondor-ce/#getting-help","text":"If you have any questions or issues with the installation process, please contact us for assistance.","title":"Getting Help"},{"location":"v23/troubleshooting/common-issues/","text":"Common Issues \u00b6 Known Issues \u00b6 SUBMIT_ATTRS are not applied to jobs on the local HTCondor \u00b6 If you are adding attributes to jobs submitted to your HTCondor pool with SUBMIT_ATTRS , these will not be applied to jobs that are entering your pool from the HTCondor-CE. To get around this, you will want to add the attributes to your job routes . If the CE is the only entry point for jobs into your pool, you can get rid of SUBMIT_ATTRS on your backend. Otherwise, you will have to maintain your list of attributes both in your list of routes and in your SUBMIT_ATTRS . General Troubleshooting Items \u00b6 Making sure packages are up-to-date \u00b6 It is important to make sure that the HTCondor-CE and related RPMs are up-to-date. root@host # yum update \"htcondor-ce*\" blahp condor If you just want to see the packages to update, but do not want to perform the update now, answer N at the prompt. Verify package contents \u00b6 If the contents of your HTCondor-CE packages have been changed, the CE may cease to function properly. To verify the contents of your packages (ignoring changes to configuration files): user@host $ rpm -q --verify htcondor-ce htcondor-ce-client blahp | grep -v '/var/' | awk '$2 != \"c\" {print $0}' If the verification command returns output, this means that your packages have been changed. To fix this, you can reinstall the packages: user@host $ yum reinstall htcondor-ce htcondor-ce-client blahp Note The reinstall command may place original versions of configuration files alongside the versions that you have modified. 
If this is the case, the reinstall command will notify you that the original versions will have an .rpmnew suffix. Further inspection of these files may be required as to whether or not you need to merge them into your current configuration. Verify clocks are synchronized \u00b6 Like all network-based authentication, HTCondor-CE is sensitive to time skews. Make sure the clock on your CE is synchronized using a utility such as ntpd . Additionally, HTCondor itself is sensitive to time skews on the NFS server. If you see empty stdout / err being returned to the submitter, verify there is no NFS server time skew. HTCondor-CE Troubleshooting Items \u00b6 This section contains common issues you may encounter using HTCondor-CE and next actions to take when you do. Before troubleshooting, we recommend increasing the log level: Write the following into /etc/condor-ce/config.d/99-local.conf to increase the log level for all daemons: ALL_DEBUG = D_ALWAYS:2 D_CAT Ensure that the configuration is in place: root@host # condor_ce_reconfig Reproduce the issue Note Before spending any time on troubleshooting, you should ensure that the state of configuration is as expected by running condor_ce_reconfig . Daemons fail to start \u00b6 If there are errors in your configuration of HTCondor-CE, this may cause some of its required daemons to fail to start up. Check the following subsections in order: Symptoms Daemon startup failure may manifest in many ways; the following are a few symptoms of the problem. The service fails to start: root@host # service condor-ce start Starting Condor-CE daemons: [ FAIL ] condor_ce_q fails with a lengthy error message: user@host $ condor_ce_q Error: Extra Info: You probably saw this error because the condor_schedd is not running on the machine you are trying to query. If the condor_schedd is not running, the Condor system will not be able to find an address and port to connect to and satisfy this request. 
Please make sure the Condor daemons are running and try again. Extra Info: If the condor_schedd is running on the machine you are trying to query and you still see the error, the most likely cause is that you have setup a personal Condor, you have not defined SCHEDD_NAME in your condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings in your config file, or you must use the -name option to condor_q. Please see the Condor manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE. Next actions If the MasterLog is filled with ERROR:SECMAN...TCP connection to collector...failed : This is likely due to a misconfiguration for a host with multiple network interfaces. Verify that you have followed the instructions in this section of the optional configuration page. If the MasterLog is filled with DC_AUTHENTICATE errors: The HTCondor-CE daemons use the host certificate to authenticate with each other. Verify that your host certificate\u2019s DN matches one of the regular expressions found in /etc/condor-ce/condor_mapfile . If the SchedLog is filled with Can\u2019t find address for negotiator : You can ignore this error! The negotiator daemon is used in HTCondor batch systems to match jobs with resources but since HTCondor-CE does not manage any resources directly, it does not run one. Jobs fail to submit to the CE \u00b6 If a user is having issues submitting jobs to the CE and you've ruled out general connectivity or firewalls as the culprit, then you may have encountered an authentication or authorization issue. 
You may see error messages like the following in your SchedLog : 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to gsi@unmapped from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done! The detailed debug output of condor_ce_ping -d can provide useful data from the client side. The following are several potential causes and how to check and correct them. Jobs fail to submit: Verify SSL configuration on the CE \u00b6 Your machine must have a valid host certificate and private key, and the CE must be configured to use them. See the documentation about Configuring Certificates for details. If the CE can't read its host certificate and private key, you will see an error like the following in /var/log/condor-ce/SchedLog if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error loading private key from file 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error initializing server security context 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error creating SSL context Next actions If your host certificate is installed under /etc/grid-security/ , ensure the CE is configured to look for it there (see configuring certificates ). 
Jobs fail to submit: Verify SSL configuration on the client \u00b6 The CE client tools on the client machine must be configured to recognize the Certificate Authority (CA) that issued the CE's host certificate. If the client tools don't trust your CE's host certificate's CA, then the output of condor_ce_trace -d will include something like the following: 10/07/21 16:39:10 (D_SECURITY) -Error with certificate at depth: 0 10/07/21 16:39:10 (D_SECURITY) issuer = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/CN=OSG Test CA 10/07/21 16:39:10 (D_SECURITY) subject = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=Services/CN=4c75de0db10c.htcondor.org 10/07/21 16:39:10 (D_SECURITY) err 20:unable to get local issuer certificate 10/07/21 16:39:10 (D_SECURITY) Tried to connect: -1 10/07/21 16:39:10 (D_SECURITY) SSL: library failure: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed If your CE is using a grid certificate (i.e. one installed under /etc/grid-security/ ), then the client machine will need an /etc/grid-security/certificates/ directory containing the CA files for your grid certificate, and the CE client tools must be configured to look there for the CA files. The CE configuration files on the client machine will need to include the following: AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates Jobs fail to submit: Verify SciToken contents \u00b6 If SciTokens is the authentication method being used, you can examine the token's payload for some common errors. If you have access to the token itself, you can decode it at jwt.io . The token's payload will appear in /var/log/condor-ce/AuditLog* files, like so: 10/05/21 18:34:06 (D_AUDIT) Examining SciToken with payload {}. 
The token's payload will look something like this: { \"aud\": \"ANY\", \"ver\": \"scitokens:2.0\", \"scope\": \"condor:/READ condor:/WRITE\", \"exp\": 1633488473, \"sub\": \"htcondor-ce-dev\", \"iss\": \"https://demo.scitokens.org\", \"iat\": 1633459675, \"nbf\": 1633459675, \"jti\": \"cb84b7af-ed21-450d-a50e-552a5cd2904c\" } Next actions If any of the following checks fail, the user will need a new, corrected, token. Check that the aud (audience) value is either ANY , https://wlcg.cern.ch/jwt/v1/any , or matches one of the items from condor_ce_config_val SCITOKENS_SERVER_AUDIENCE (i.e. :9619 ). Tokens with an invalid aud value will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 15:55:39 (D_SECURITY) SCITOKENS:2:Failed to verify token and generate ACLs: token verification failed: 'aud' claim verification failed. Check that the scope value includes the string condor:/READ condor:/WRITE or compute.cancel compute.create compute.modify compute.read . Tokens with an invalid scope value will appear in /var/log/condor-ce/SchedLog with the following errors: 10/05/21 18:41:50 (D_ALWAYS) DC_AUTHENTICATE: authentication of <172.17.0.3:40489> was successful but resulted in a limited authorization which did not include this command (60021 DC_NOP_WRITE), so aborting. Check that the exp (expiration) value is in the future. Tokens that have expired will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/05/21 18:10:55 (D_SECURITY) SCITOKENS:2:Failed to deserialize scitoken: token verification failed: token expired Check that the nbf (not before) value is in the past. Jobs fail to submit: Check user mapping \u00b6 The CE must be able to map the identity of the job submitter to a local OS account, used for storing the job sandbox and running the job under the local batch system. This mapping is done via a set of mapfiles . 
If no mapping is available, then job submission will fail. If a SciToken can't be mapped and the D_SECURITY debug level is enabled, then you will see this in the SchedLog file: 10/05/21 18:56:04 (D_SECURITY) Failed to map SCITOKENS authenticated identity 'https://demo.scitokens.org,htcondor-ce-dev', failing authentication to give another authentication method a go. Next actions Check the files in /etc/condor-ce/mapfiles.d/ and ensure that the user's authentication method and identity are present (possibly via a regular expression), and that the mapped OS account exists on your CE and cluster. Jobs stay idle on the CE \u00b6 Check the following subsections in order, but note that jobs may take several minutes or longer to run if the CE is busy. Idle jobs on CE: Make sure the underlying batch system can run jobs \u00b6 HTCondor-CE delegates jobs to your batch system, which is then responsible for matching jobs to worker nodes. If you cannot manually submit jobs (e.g., condor_submit , qsub ) on the CE host to your batch system, then HTCondor-CE won't be able to either. Procedure Manually create and submit a simple job (e.g., one that runs sleep ) Check for errors in the submission itself Watch the job in the batch system queue (e.g., condor_q , qstat ) If the job does not run, check for errors on the batch system Next actions Consult troubleshooting documentation or support avenues for your batch system. Once you can run simple manual jobs on your batch system, try submitting to the HTCondor-CE again. Idle jobs on CE: Is the job router handling the incoming job? \u00b6 Jobs on the CE will be put on hold if they do not match any job routes after 30 minutes, but you can check a few things if you suspect that the jobs are not being matched. Check if the JobRouter sees a job before that by looking at the job router log and looking for the text src=\u2026claimed job . 
Next actions Use condor_ce_job_router_info to see why your idle job does not match any routes Idle jobs on CE: Verify correct operation between the CE and your local batch system \u00b6 For HTCondor batch systems \u00b6 HTCondor-CE submits jobs directly to an HTCondor batch system via the JobRouter, so any issues with the CE/local batch system interaction will appear in the JobRouterLog . Next actions Check the JobRouterLog for failures. Verify that the local HTCondor is functional. Use condor_ce_config_val to verify that the JOB_ROUTER_SCHEDD2_NAME , JOB_ROUTER_SCHEDD2_POOL , and JOB_ROUTER_SCHEDD2_SPOOL configuration variables are set to the hostname of your CE, the hostname and port of your local HTCondor\u2019s collector, and the location of your local HTCondor\u2019s spool directory, respectively. Use condor_config_val QUEUE_SUPER_USER_MAY_IMPERSONATE and verify that it is set to .* . For non-HTCondor batch systems \u00b6 HTCondor-CE submits jobs to a non-HTCondor batch system via the Gridmanager, so any issues with the CE/local batch system interaction will appear in the GridmanagerLog . Look for gm state change\u2026 lines to figure out where the issues are occurring. Next actions If you see failures in the GridmanagerLog during job submission: Save the submit files by adding the appropriate entry to blah.config and submit it manually to the batch system. If that succeeds, make sure that the BLAHP knows where your binaries are located by setting the _binpath in /etc/blah.config . If you see failures in the GridmanagerLog during queries for job status: Query the resultant job with your batch system tools from the CE. If you can, the BLAHP uses scripts to query for status in /usr/libexec/blahp/_status.sh (e.g., /usr/libexec/blahp/lsf_status.sh ) that take the argument batch system/YYYYMMDD/job ID (e.g., lsf/20141008/65053 ). 
Run the appropriate status script for your batch system and upon success, you should see the following output: root@host # /usr/libexec/blahp/lsf_status.sh lsf/20141008/65053 [ BatchjobId = \"894862\"; JobStatus = 4; ExitCode = 0; WorkerNode = \"atl-prod08\" ] If the script fails, request help from the OSG. Idle jobs on CE: Verify ability to change permissions on key files \u00b6 HTCondor-CE needs the ability to write and chown files in its spool directory and if it cannot, jobs will not run at all. Spool permission errors can appear in the SchedLog and the JobRouterLog . Symptoms 09/17/14 14:45:42 Error: Unable to chown '/var/lib/condor-ce/spool/1/0/cluster1.proc0.subproc0/env' from 12345 to 54321 Next actions As root, try to change ownership of the file or directory in question. If the file does not exist, a parent directory may have improper permissions. Verify that there aren't any underlying file system issues in the specified location Jobs stay idle on a remote host submitting to the CE \u00b6 If you are submitting your job from a separate submit host to the CE, it stays idle in the queue forever, and you do not see a resultant job in the CE's queue, this means that your job cannot contact the CE for submission or it is not authorized to run there. Note that jobs may take several minutes or longer if the CE is busy. Remote idle jobs: Can you contact the CE? \u00b6 To check basic connectivity to a CE, use condor_ce_ping : Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE ERROR: couldn't locate condorce.example.com! Next actions Make sure that the HTCondor-CE daemons are running with condor_ce_status . Verify that your CE is reachable from your submit host, replacing condorce.example.com with the hostname of your CE: user@host $ ping condorce.example.com Remote idle jobs: Are you authorized to run jobs on the CE? 
\u00b6 The CE will only run jobs from users that authenticate through the HTCondor-CE configuration . You can use condor_ce_ping to check if you are authorized and what user your proxy is being mapped to. Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE Remote Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Local Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Session ID: condorce:3343:1412790611:0 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: GSI Remote Mapping: gsi@unmapped Authorized: FALSE Notice the failures in the above message: Remote Mapping: gsi@unmapped and Authorized: FALSE Next actions Verify that an authentication method is set up on the CE Verify that your user DN is mapped to an existing system user Jobs go on hold \u00b6 Jobs can be put on hold with a HoldReason attribute that can be inspected with condor_ce_q : user@host $ condor_ce_q -l -attr HoldReason HoldReason = \"CE job in status 5 put on hold by SYSTEM_PERIODIC_HOLD due to no matching routes, route job limit, or route failure threshold.\" The CE (and CE client) will put a job on hold when it encounters a problem with the job that it doesn't know how to resolve. If the HTCondor schedd believes that the existing job it has submitted to a remote queue may be recoverable, then it will leave the remote job queued and keep the GridJobId attribute defined in the local job ad. If you release the local job (with condor_ce_release ), then the schedd will attempt to re-establish contact with the remote scheduler. If the schedd believes the existing remote job is not recoverable, then it will remove the job from the remote queue and set GridJobId to Undefined in the local job ad. If you release the local job, then a new job instance will be submitted to the remote scheduler. 
Held jobs: no matching routes, route job limit, or route failure threshold \u00b6 Jobs on the CE will be put on hold if they are not claimed by the job router within 30 minutes. The most common cases for this behavior are as follows: The job does not match any job routes: use condor_ce_job_router_info to see why your idle job does not match any routes . The route(s) that the job matches to are full: See limiting the number of jobs . The job router is throttling submission to your batch system due to submission failures: See the HTCondor manual for FailureRateThreshold . Check for errors in the JobRouterLog or GridmanagerLog for HTCondor and non-HTCondor batch systems, respectively. Note It is expected that jobs from remote submitters will temporarily be held with Spooling input data files as the reason. Once the input files have transferred the job will continue. Held jobs: Missing/expired user proxy \u00b6 HTCondor-CE requires a valid user proxy for each job that is submitted. You can check the status of your proxy with the following user@host $ voms-proxy-info -all Next actions Ensure that the owner of the job generates their proxy with voms-proxy-init . Held jobs: Invalid job universe \u00b6 The HTCondor-CE only accepts jobs that have universe in their submit files set to vanilla , local , or scheduler . These universes also have corresponding integer values that can be found in the HTCondor manual . Next actions Ensure jobs submitted locally, from the CE host, are submitted with universe = vanilla Ensure jobs submitted from a remote submit point are submitted with: universe = grid grid_resource = condor condorce.example.com condorce.example.com:9619 replacing condorce.example.com with the hostname of the CE. Identifying the corresponding job ID on the local batch system \u00b6 When troubleshooting interactions between your CE and your local batch system, you will need to associate the CE job ID and the resultant job ID on the batch system. 
The methods for finding the resultant job ID differ between batch systems. HTCondor batch systems \u00b6 To inspect the CE\u2019s job ad, use condor_ce_q or condor_ce_history : Use condor_ce_q if the job is still in the CE\u2019s queue: user@host $ condor_ce_q -af RoutedToJobId Use condor_ce_history if the job has left the CE\u2019s queue: user@host $ condor_ce_history -af RoutedToJobId Parse the JobRouterLog for the CE\u2019s job ID. Non-HTCondor batch systems \u00b6 When HTCondor-CE records the corresponding batch system job ID, it is written in the form // : lsf/20141206/482046 To inspect the CE\u2019s job ad, use condor_ce_q : user@host $ condor_ce_q -af GridJobId Parse the GridmanagerLog for the CE\u2019s job ID. Jobs removed from the local HTCondor pool become resubmitted (HTCondor batch systems only) \u00b6 By design, HTCondor-CE will resubmit jobs that have been removed from the underlying HTCondor pool. Therefore, to remove misbehaving jobs, they will need to be removed on the CE level following these steps: Identify the misbehaving job ID in your batch system queue Find the job's corresponding CE job ID: user@host $ condor_q -af RoutedFromJobId Use condor_ce_rm to remove the CE job from the queue Missing HTCondor tools \u00b6 Most of the HTCondor-CE tools are just wrappers around existing HTCondor tools that load the CE-specific configuration. If you are trying to use HTCondor-CE tools and you see the following error: user@host $ condor_ce_job_router_info /usr/bin/condor_ce_job_router_info: line 6: exec: condor_job_router_info: not found This means that the condor_job_router_info (note this is not the CE version), is not in your PATH . Next Actions Either the condor RPM is missing or there are some other issues with it (try rpm --verify condor ). You have installed HTCondor in a non-standard location that is not in your PATH . The condor_job_router_info tool itself wasn't available until Condor-8.2.3-1.1 (available in osg-upcoming). 
Jobs removed from the local batch system \u00b6 When the CE removes a job from the local batch system, it may be due to a problem the CE encountered with managing the job or it may be at the behest of the submitter to the CE (which may be a remote HTCondor Access Point). Given a specific job ID in the CE logs, first find the job ad in CE queue with the condor_ce_q tool and check the value of the GridJobID attribute: user@host $ condor_ce_q -af GridJobId If the job is no longer in the queue, you will have to check the history using the condor_ce_history tool: user@host $ condor_ce_history -af GridJobId If the GridJobId is undefined , then the CE did the removal due to a problem interacting with the local batch system. Check the HoldReason and LastHoldReason attributes for why the CE removed the job. If GridJobID is not undefined , and is set to some value, then the submitter to the CE removed the job. If the submitter is a remote HTCondor Access Point, its daemons may have done the removal as part of putting its local job on hold. In that case, the HoldReason attribute in the remote job queue should indicate the source of the problem. Getting Help \u00b6 If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Common Issues"},{"location":"v23/troubleshooting/common-issues/#common-issues","text":"","title":"Common Issues"},{"location":"v23/troubleshooting/common-issues/#known-issues","text":"","title":"Known Issues"},{"location":"v23/troubleshooting/common-issues/#submit_attrs-are-not-applied-to-jobs-on-the-local-htcondor","text":"If you are adding attributes to jobs submitted to your HTCondor pool with SUBMIT_ATTRS , these will not be applied to jobs that are entering your pool from the HTCondor-CE. To get around this, you will want to add the attributes to your job routes . If the CE is the only entry point for jobs into your pool, you can get rid of SUBMIT_ATTRS on your backend. 
Otherwise, you will have to maintain your list of attributes both in your list of routes and in your SUBMIT_ATTRS .","title":"SUBMIT_ATTRS are not applied to jobs on the local HTCondor"},{"location":"v23/troubleshooting/common-issues/#general-troubleshooting-items","text":"","title":"General Troubleshooting Items"},{"location":"v23/troubleshooting/common-issues/#making-sure-packages-are-up-to-date","text":"It is important to make sure that the HTCondor-CE and related RPMs are up-to-date. root@host # yum update \"htcondor-ce*\" blahp condor If you just want to see the packages to update, but do not want to perform the update now, answer N at the prompt.","title":"Making sure packages are up-to-date"},{"location":"v23/troubleshooting/common-issues/#verify-package-contents","text":"If the contents of your HTCondor-CE packages have been changed, the CE may cease to function properly. To verify the contents of your packages (ignoring changes to configuration files): user@host $ rpm -q --verify htcondor-ce htcondor-ce-client blahp | grep -v '/var/' | awk '$2 != \"c\" {print $0}' If the verification command returns output, this means that your packages have been changed. To fix this, you can reinstall the packages: user@host $ yum reinstall htcondor-ce htcondor-ce-client blahp Note The reinstall command may place original versions of configuration files alongside the versions that you have modified. If this is the case, the reinstall command will notify you that the original versions will have an .rpmnew suffix. Further inspection of these files may be required as to whether or not you need to merge them into your current configuration.","title":"Verify package contents"},{"location":"v23/troubleshooting/common-issues/#verify-clocks-are-synchronized","text":"Like all network-based authentication, HTCondor-CE is sensitive to time skews. Make sure the clock on your CE is synchronized using a utility such as ntpd . 
Additionally, HTCondor itself is sensitive to time skews on the NFS server. If you see empty stdout / err being returned to the submitter, verify there is no NFS server time skew.","title":"Verify clocks are synchronized"},{"location":"v23/troubleshooting/common-issues/#htcondor-ce-troubleshooting-items","text":"This section contains common issues you may encounter using HTCondor-CE and next actions to take when you do. Before troubleshooting, we recommend increasing the log level: Write the following into /etc/condor-ce/config.d/99-local.conf to increase the log level for all daemons: ALL_DEBUG = D_ALWAYS:2 D_CAT Ensure that the configuration is in place: root@host # condor_ce_reconfig Reproduce the issue Note Before spending any time on troubleshooting, you should ensure that the state of configuration is as expected by running condor_ce_reconfig .","title":"HTCondor-CE Troubleshooting Items"},{"location":"v23/troubleshooting/common-issues/#daemons-fail-to-start","text":"If there are errors in your configuration of HTCondor-CE, this may cause some of its required daemons to fail to start up. Check the following subsections in order: Symptoms Daemon startup failure may manifest in many ways; the following are a few symptoms of the problem. The service fails to start: root@host # service condor-ce start Starting Condor-CE daemons: [ FAIL ] condor_ce_q fails with a lengthy error message: user@host $ condor_ce_q Error: Extra Info: You probably saw this error because the condor_schedd is not running on the machine you are trying to query. If the condor_schedd is not running, the Condor system will not be able to find an address and port to connect to and satisfy this request. Please make sure the Condor daemons are running and try again. 
Extra Info: If the condor_schedd is running on the machine you are trying to query and you still see the error, the most likely cause is that you have setup a personal Condor, you have not defined SCHEDD_NAME in your condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings in your config file, or you must use the -name option to condor_q. Please see the Condor manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE. Next actions If the MasterLog is filled with ERROR:SECMAN...TCP connection to collector...failed : This is likely due to a misconfiguration for a host with multiple network interfaces. Verify that you have followed the instructions in this section of the optional configuration page. If the MasterLog is filled with DC_AUTHENTICATE errors: The HTCondor-CE daemons use the host certificate to authenticate with each other. Verify that your host certificate\u2019s DN matches one of the regular expressions found in /etc/condor-ce/condor_mapfile . If the SchedLog is filled with Can\u2019t find address for negotiator : You can ignore this error! The negotiator daemon is used in HTCondor batch systems to match jobs with resources but since HTCondor-CE does not manage any resources directly, it does not run one.","title":"Daemons fail to start"},{"location":"v23/troubleshooting/common-issues/#jobs-fail-to-submit-to-the-ce","text":"If a user is having issues submitting jobs to the CE and you've ruled out general connectivity or firewalls as the culprit, then you may have encountered an authentication or authorization issue. 
You may see error messages like the following in your SchedLog : 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to gsi@unmapped from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done! The detailed debug output of condor_ce_ping -d can provide useful data from the client side. The following are several potential causes and how to check and correct them.","title":"Jobs fail to submit to the CE"},{"location":"v23/troubleshooting/common-issues/#jobs-fail-to-submit-verify-ssl-configuration-on-the-ce","text":"Your machine must have a valid host certificate and private key, and the CE must be configured to use them. See the documentation about Configuring Certificates for details. 
If the CE can't read its host certificate and private key, you will see an error like the following in /var/log/condor-ce/SchedLog if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error loading private key from file 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error initializing server security context 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error creating SSL context Next actions If your host certificate is installed under /etc/grid-security/ , ensure the CE is configured to look for it there (see configuring certificates ).","title":"Jobs fail to submit: Verify SSL configuration on the CE"},{"location":"v23/troubleshooting/common-issues/#jobs-fail-to-submit-verify-ssl-configuration-on-the-client","text":"The CE client tools on the client machine must be configured to recognize the Certificate Authority (CA) that issued the CE's host certificate. If the client tools don't trust your CE's host certificate's CA, then the output of condor_ce_trace -d will include something like the following: 10/07/21 16:39:10 (D_SECURITY) -Error with certificate at depth: 0 10/07/21 16:39:10 (D_SECURITY) issuer = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/CN=OSG Test CA 10/07/21 16:39:10 (D_SECURITY) subject = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=Services/CN=4c75de0db10c.htcondor.org 10/07/21 16:39:10 (D_SECURITY) err 20:unable to get local issuer certificate 10/07/21 16:39:10 (D_SECURITY) Tried to connect: -1 10/07/21 16:39:10 (D_SECURITY) SSL: library failure: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed If your CE is using a grid certificate (i.e. one installed under /etc/grid-security/ ), then the client machine will need an /etc/grid-security/certificates/ directory containing the CA files for your grid certificate, and the CE client tools must be configured to look there for the CA files. 
The CE configuration files on the client machine will need to include the following: AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates","title":"Jobs fail to submit: Verify SSL configuration on the client"},{"location":"v23/troubleshooting/common-issues/#jobs-fail-to-submit-verify-scitoken-contents","text":"If SciTokens is the authentication method being used, you can examine the token's payload for some common errors. If you have access to the token itself, you can decode it at jwt.io . The token's payload will appear in /var/log/condor-ce/AuditLog* files, like so: 10/05/21 18:34:06 (D_AUDIT) Examining SciToken with payload {}. The token's payload will look something like this: { \"aud\": \"ANY\", \"ver\": \"scitokens:2.0\", \"scope\": \"condor:/READ condor:/WRITE\", \"exp\": 1633488473, \"sub\": \"htcondor-ce-dev\", \"iss\": \"https://demo.scitokens.org\", \"iat\": 1633459675, \"nbf\": 1633459675, \"jti\": \"cb84b7af-ed21-450d-a50e-552a5cd2904c\" } Next actions If any of the following checks fail, the user will need a new, corrected, token. Check that the aud (audience) value is either ANY , https://wlcg.cern.ch/jwt/v1/any , or matches one of the items from condor_ce_config_val SCITOKENS_SERVER_AUDIENCE (i.e. :9619 ). Tokens with an invalid aud value will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 15:55:39 (D_SECURITY) SCITOKENS:2:Failed to verify token and generate ACLs: token verification failed: 'aud' claim verification failed. Check that the scope value includes the string condor:/READ condor:/WRITE or compute.cancel compute.create compute.modify compute.read . Tokens with an invalid scope value will appear in /var/log/condor-ce/SchedLog with the following errors: 10/05/21 18:41:50 (D_ALWAYS) DC_AUTHENTICATE: authentication of <172.17.0.3:40489> was successful but resulted in a limited authorization which did not include this command (60021 DC_NOP_WRITE), so aborting. 
Check that the exp (expiration) value is in the future. Tokens that have expired will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/05/21 18:10:55 (D_SECURITY) SCITOKENS:2:Failed to deserialize scitoken: token verification failed: token expired Check that the nbf (not before) value is in the past.","title":"Jobs fail to submit: Verify SciToken contents"},{"location":"v23/troubleshooting/common-issues/#jobs-fail-to-submit-check-user-mapping","text":"The CE must be able to map the identity of the job submitter to a local OS account, used for storing the job sandbox and running the job under the local batch system. This mapping is done via a set of mapfiles . If no mapping is available, then job submission will fail. If a SciToken can't be mapped and the D_SECURITY debug level is enabled, then you will see this in the SchedLog file: 10/05/21 18:56:04 (D_SECURITY) Failed to map SCITOKENS authenticated identity 'https://demo.scitokens.org,htcondor-ce-dev', failing authentication to give another authentication method a go. Next actions Check the files in /etc/condor-ce/mapfiles.d/ and ensure that the user's authentication method and identity are present (possibly via a regular expression), and that the mapped OS account exists on your CE and cluster.","title":"Jobs fail to submit: Check user mapping"},{"location":"v23/troubleshooting/common-issues/#jobs-stay-idle-on-the-ce","text":"Check the following subsections in order, but note that jobs may take several minutes or longer to run if the CE is busy.","title":"Jobs stay idle on the CE"},{"location":"v23/troubleshooting/common-issues/#idle-jobs-on-ce-make-sure-the-underlying-batch-system-can-run-jobs","text":"HTCondor-CE delegates jobs to your batch system, which is then responsible for matching jobs to worker nodes. If you cannot manually submit jobs (e.g., condor_submit , qsub ) on the CE host to your batch system, then HTCondor-CE won't be able to either. 
Procedure Manually create and submit a simple job (e.g., one that runs sleep ) Check for errors in the submission itself Watch the job in the batch system queue (e.g., condor_q , qstat ) If the job does not run, check for errors on the batch system Next actions Consult troubleshooting documentation or support avenues for your batch system. Once you can run simple manual jobs on your batch system, try submitting to the HTCondor-CE again.","title":"Idle jobs on CE: Make sure the underlying batch system can run jobs"},{"location":"v23/troubleshooting/common-issues/#idle-jobs-on-ce-is-the-job-router-handling-the-incoming-job","text":"Jobs on the CE will be put on hold if they do not match any job routes after 30 minutes, but you can check a few things if you suspect that the jobs are not being matched. Check if the JobRouter sees a job before that by looking at the job router log and looking for the text src=\u2026claimed job . Next actions Use condor_ce_job_router_info to see why your idle job does not match any routes","title":"Idle jobs on CE: Is the job router handling the incoming job?"},{"location":"v23/troubleshooting/common-issues/#idle-jobs-on-ce-verify-correct-operation-between-the-ce-and-your-local-batch-system","text":"","title":"Idle jobs on CE: Verify correct operation between the CE and your local batch system"},{"location":"v23/troubleshooting/common-issues/#for-htcondor-batch-systems","text":"HTCondor-CE submits jobs directly to an HTCondor batch system via the JobRouter, so any issues with the CE/local batch system interaction will appear in the JobRouterLog . Next actions Check the JobRouterLog for failures. Verify that the local HTCondor is functional. 
Use condor_ce_config_val to verify that the JOB_ROUTER_SCHEDD2_NAME , JOB_ROUTER_SCHEDD2_POOL , and JOB_ROUTER_SCHEDD2_SPOOL configuration variables are set to the hostname of your CE, the hostname and port of your local HTCondor\u2019s collector, and the location of your local HTCondor\u2019s spool directory, respectively. Use condor_config_val QUEUE_SUPER_USER_MAY_IMPERSONATE and verify that it is set to .* .","title":"For HTCondor batch systems"},{"location":"v23/troubleshooting/common-issues/#for-non-htcondor-batch-systems","text":"HTCondor-CE submits jobs to a non-HTCondor batch system via the Gridmanager, so any issues with the CE/local batch system interaction will appear in the GridmanagerLog . Look for gm state change\u2026 lines to figure out where the issues are occurring. Next actions If you see failures in the GridmanagerLog during job submission: Save the submit files by adding the appropriate entry to blah.config and submit them manually to the batch system. If that succeeds, make sure that the BLAHP knows where your binaries are located by setting the _binpath in /etc/blah.config . If you see failures in the GridmanagerLog during queries for job status: Query the resultant job with your batch system tools from the CE. If you can, the BLAHP uses scripts to query for status in /usr/libexec/blahp/_status.sh (e.g., /usr/libexec/blahp/lsf_status.sh ) that take the argument batch system/YYYYMMDD/job ID (e.g., lsf/20141008/65053 ). 
Run the appropriate status script for your batch system and upon success, you should see the following output: root@host # /usr/libexec/blahp/lsf_status.sh lsf/20141008/65053 [ BatchjobId = \"894862\"; JobStatus = 4; ExitCode = 0; WorkerNode = \"atl-prod08\" ] If the script fails, request help from the OSG.","title":"For non-HTCondor batch systems"},{"location":"v23/troubleshooting/common-issues/#idle-jobs-on-ce-verify-ability-to-change-permissions-on-key-files","text":"HTCondor-CE needs the ability to write and chown files in its spool directory and if it cannot, jobs will not run at all. Spool permission errors can appear in the SchedLog and the JobRouterLog . Symptoms 09/17/14 14:45:42 Error: Unable to chown '/var/lib/condor-ce/spool/1/0/cluster1.proc0.subproc0/env' from 12345 to 54321 Next actions As root, try to change ownership of the file or directory in question. If the file does not exist, a parent directory may have improper permissions. Verify that there aren't any underlying file system issues in the specified location","title":"Idle jobs on CE: Verify ability to change permissions on key files"},{"location":"v23/troubleshooting/common-issues/#jobs-stay-idle-on-a-remote-host-submitting-to-the-ce","text":"If you are submitting your job from a separate submit host to the CE, it stays idle in the queue forever, and you do not see a resultant job in the CE's queue, this means that your job cannot contact the CE for submission or it is not authorized to run there. Note that jobs may take several minutes or longer if the CE is busy.","title":"Jobs stay idle on a remote host submitting to the CE"},{"location":"v23/troubleshooting/common-issues/#remote-idle-jobs-can-you-contact-the-ce","text":"To check basic connectivity to a CE, use condor_ce_ping : Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE ERROR: couldn't locate condorce.example.com! 
Next actions Make sure that the HTCondor-CE daemons are running with condor_ce_status . Verify that your CE is reachable from your submit host, replacing condorce.example.com with the hostname of your CE: user@host $ ping condorce.example.com","title":"Remote idle jobs: Can you contact the CE?"},{"location":"v23/troubleshooting/common-issues/#remote-idle-jobs-are-you-authorized-to-run-jobs-on-the-ce","text":"The CE will only run jobs from users that authenticate through the HTCondor-CE configuration . You can use condor_ce_ping to check if you are authorized and what user your proxy is being mapped to. Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE Remote Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Local Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Session ID: condorce:3343:1412790611:0 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: GSI Remote Mapping: gsi@unmapped Authorized: FALSE Notice the failures in the above message: Remote Mapping: gsi@unmapped and Authorized: FALSE Next actions Verify that an authentication method is set up on the CE Verify that your user DN is mapped to an existing system user","title":"Remote idle jobs: Are you authorized to run jobs on the CE?"},{"location":"v23/troubleshooting/common-issues/#jobs-go-on-hold","text":"Jobs can be put on hold with a HoldReason attribute that can be inspected with condor_ce_q : user@host $ condor_ce_q -l -attr HoldReason HoldReason = \"CE job in status 5 put on hold by SYSTEM_PERIODIC_HOLD due to no matching routes, route job limit, or route failure threshold.\" The CE (and CE client) will put a job on hold when it encounters a problem with the job that it doesn't know how to resolve. 
If the HTCondor schedd believes that the existing job it has submitted to a remote queue may be recoverable, then it will leave the remote job queued and keep the GridJobId attribute defined in the local job ad. If you release the local job (with condor_ce_release ), then the schedd will attempt to re-establish contact with the remote scheduler. If the schedd believes the existing remote job is not recoverable, then it will remove the job from the remote queue and set GridJobId to Undefined in the local job ad. If you release the local job, then a new job instance will be submitted to the remote scheduler.","title":"Jobs go on hold"},{"location":"v23/troubleshooting/common-issues/#held-jobs-no-matching-routes-route-job-limit-or-route-failure-threshold","text":"Jobs on the CE will be put on hold if they are not claimed by the job router within 30 minutes. The most common cases for this behavior are as follows: The job does not match any job routes: use condor_ce_job_router_info to see why your idle job does not match any routes . The route(s) that the job matches to are full: See limiting the number of jobs . The job router is throttling submission to your batch system due to submission failures: See the HTCondor manual for FailureRateThreshold . Check for errors in the JobRouterLog or GridmanagerLog for HTCondor and non-HTCondor batch systems, respectively. Note It is expected that jobs from remote submitters will temporarily be held with Spooling input data files as the reason. Once the input files have transferred the job will continue.","title":"Held jobs: no matching routes, route job limit, or route failure threshold"},{"location":"v23/troubleshooting/common-issues/#held-jobs-missingexpired-user-proxy","text":"HTCondor-CE requires a valid user proxy for each job that is submitted. 
You can check the status of your proxy with the following user@host $ voms-proxy-info -all Next actions Ensure that the owner of the job generates their proxy with voms-proxy-init .","title":"Held jobs: Missing/expired user proxy"},{"location":"v23/troubleshooting/common-issues/#held-jobs-invalid-job-universe","text":"The HTCondor-CE only accepts jobs that have universe in their submit files set to vanilla , local , or scheduler . These universes also have corresponding integer values that can be found in the HTCondor manual . Next actions Ensure jobs submitted locally, from the CE host, are submitted with universe = vanilla Ensure jobs submitted from a remote submit point are submitted with: universe = grid grid_resource = condor condorce.example.com condorce.example.com:9619 replacing condorce.example.com with the hostname of the CE.","title":"Held jobs: Invalid job universe"},{"location":"v23/troubleshooting/common-issues/#identifying-the-corresponding-job-id-on-the-local-batch-system","text":"When troubleshooting interactions between your CE and your local batch system, you will need to associate the CE job ID and the resultant job ID on the batch system. 
The method for finding the resultant job ID differs between batch systems.","title":"Identifying the corresponding job ID on the local batch system"},{"location":"v23/troubleshooting/common-issues/#htcondor-batch-systems","text":"To inspect the CE\u2019s job ad, use condor_ce_q or condor_ce_history : Use condor_ce_q if the job is still in the CE\u2019s queue: user@host $ condor_ce_q -af RoutedToJobId Use condor_ce_history if the job has left the CE\u2019s queue: user@host $ condor_ce_history -af RoutedToJobId Parse the JobRouterLog for the CE\u2019s job ID.","title":"HTCondor batch systems"},{"location":"v23/troubleshooting/common-issues/#non-htcondor-batch-systems","text":"When HTCondor-CE records the corresponding batch system job ID, it is written in the form // : lsf/20141206/482046 To inspect the CE\u2019s job ad, use condor_ce_q : user@host $ condor_ce_q -af GridJobId Parse the GridmanagerLog for the CE\u2019s job ID.","title":"Non-HTCondor batch systems"},{"location":"v23/troubleshooting/common-issues/#jobs-removed-from-the-local-htcondor-pool-become-resubmitted-htcondor-batch-systems-only","text":"By design, HTCondor-CE will resubmit jobs that have been removed from the underlying HTCondor pool. Therefore, to remove misbehaving jobs, they will need to be removed on the CE level following these steps: Identify the misbehaving job ID in your batch system queue Find the job's corresponding CE job ID: user@host $ condor_q -af RoutedFromJobId Use condor_ce_rm to remove the CE job from the queue","title":"Jobs removed from the local HTCondor pool become resubmitted (HTCondor batch systems only)"},{"location":"v23/troubleshooting/common-issues/#missing-htcondor-tools","text":"Most of the HTCondor-CE tools are just wrappers around existing HTCondor tools that load the CE-specific configuration. 
If you are trying to use HTCondor-CE tools and you see the following error: user@host $ condor_ce_job_router_info /usr/bin/condor_ce_job_router_info: line 6: exec: condor_job_router_info: not found This means that the condor_job_router_info (note this is not the CE version), is not in your PATH . Next Actions Either the condor RPM is missing or there are some other issues with it (try rpm --verify condor ). You have installed HTCondor in a non-standard location that is not in your PATH . The condor_job_router_info tool itself wasn't available until Condor-8.2.3-1.1 (available in osg-upcoming).","title":"Missing HTCondor tools"},{"location":"v23/troubleshooting/common-issues/#jobs-removed-from-the-local-batch-system","text":"When the CE removes a job from the local batch system, it may be due to a problem the CE encountered with managing the job or it may be at the behest of the submitter to the CE (which may be a remote HTCondor Access Point). Given a specific job ID in the CE logs, first find the job ad in CE queue with the condor_ce_q tool and check the value of the GridJobID attribute: user@host $ condor_ce_q -af GridJobId If the job is no longer in the queue, you will have to check the history using the condor_ce_history tool: user@host $ condor_ce_history -af GridJobId If the GridJobId is undefined , then the CE did the removal due to a problem interacting with the local batch system. Check the HoldReason and LastHoldReason attributes for why the CE removed the job. If GridJobID is not undefined , and is set to some value, then the submitter to the CE removed the job. If the submitter is a remote HTCondor Access Point, its daemons may have done the removal as part of putting its local job on hold. 
In that case, the HoldReason attribute in the remote job queue should indicate the source of the problem.","title":"Jobs removed from the local batch system"},{"location":"v23/troubleshooting/common-issues/#getting-help","text":"If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Getting Help"},{"location":"v23/troubleshooting/debugging-tools/","text":"Debugging Tools \u00b6 HTCondor-CE has its own separate set of the HTCondor tools with ce in the name (e.g., condor_ce_submit vs condor_submit ). Some of the commands are only for the CE (e.g., condor_ce_run and condor_ce_trace ) but many of them are just HTCondor commands configured to interact with the CE (e.g., condor_ce_q , condor_ce_status ). It is important to differentiate the two: condor_ce_config_val will provide configuration values for your HTCondor-CE while condor_config_val will provide configuration values for your HTCondor batch system. If you are not running an HTCondor batch system, the non-CE commands will return errors. condor_ce_trace \u00b6 Usage \u00b6 condor_ce_trace is a useful tool for testing end-to-end job submission. It contacts both the CE\u2019s Schedd and Collector daemons to verify your permission to submit to the CE, displays the submit script that it submits to the CE, and tracks the resultant job. Note You must have generated a proxy (e.g., voms-proxy-init ) and your DN must be added to your chosen authentication method . user@host $ condor_ce_trace condorce.example.com Replacing the condorce.example.com with the hostname of the CE. If you are familiar with the output of condor commands, the command also takes a --debug option that displays verbose condor output. 
Troubleshooting \u00b6 If the command fails with \u201cFailed ping\u2026\u201d: Make sure that the HTCondor-CE daemons are running on the CE If you see \u201cgsi@unmapped\u201d in the \u201cRemote Mapping\u201d line: Either your credentials are not mapped on the CE or authentication is not set up at all. To set up authentication, refer to our configuration document . If the job submits but does not complete: Look at the status of the job and perform the relevant troubleshooting steps . condor_ce_host_network_check \u00b6 Usage \u00b6 condor_ce_host_network_check is a tool for testing an HTCondor-CE's networking configuration: root@host # condor_ce_host_network_check Starting analysis of host networking for HTCondor-CE System hostname: fermicloud360.fnal.gov FQDN matches hostname Forward resolution of hostname fermicloud360.fnal.gov is 131.225.155.96. Backward resolution of IPv4 131.225.155.96 is fermicloud360.fnal.gov. Forward and backward resolution match! HTCondor is considering all network interfaces and addresses. HTCondor would pick address of 131.225.155.96 as primary address. HTCondor primary address 131.225.155.96 matches system preferred address. Host network configuration should work with HTCondor-CE Troubleshooting \u00b6 If the tool reports that Host network configuration not expected to work with HTCondor-CE , ensure that forward and reverse DNS resolution return the public IP and hostname. condor_ce_run \u00b6 Usage \u00b6 condor_ce_run is a tool that submits a simple job to your CE, so it is useful for quickly submitting jobs through your CE. To submit a job to the CE and run the env command on the remote batch system: Note You must have generated a proxy (e.g., voms-proxy-init ) and your DN must be added to your chosen authentication method . user@host $ condor_ce_run -r condorce.example.com:9619 /bin/env Replacing the condorce.example.com with the hostname of the CE. 
If you are troubleshooting an HTCondor-CE that you do not have a login for and the CE accepts local universe jobs, you can run commands locally on the CE with condor_ce_run with the -l option. The following example outputs the JobRouterLog of the CE in question: user@host $ condor_ce_run -lr condorce.example.com:9619 cat /var/log/condor-ce/JobRouterLog Replacing the condorce.example.com text with the hostname of the CE. To disable this feature on your CE, consult this section of the install documentation. Troubleshooting \u00b6 If you do not see any results: condor_ce_run does not display results until the job completes on the CE, which may take several minutes or longer if the CE is busy. In the meantime, you can use condor_ce_q in a separate terminal to track the job on the CE. If you never see any results, use condor_ce_trace to pinpoint errors. If you see an error message that begins with \u201cFailed to\u2026\u201d: Check connectivity to the CE with condor_ce_trace or condor_ce_ping condor_ce_submit \u00b6 See this documentation for details condor_ce_ping \u00b6 Usage \u00b6 Use the following condor_ce_ping command to test your ability to submit jobs to an HTCondor-CE, replacing condorce.example.com with the hostname of your CE: user@host $ condor_ce_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE The following shows successful output where I am able to submit jobs ( Authorized: TRUE ) as the glow user ( Remote Mapping: glow@users.opensciencegrid.org ): Remote Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Local Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Session ID: condorce:27407:1412286981:3 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: GSI Remote Mapping: glow@users.opensciencegrid.org Authorized: TRUE Note If you run the condor_ce_ping command on the CE that you are testing, omit the -name and -pool options. 
condor_ce_ping takes the same arguments as condor_ping , which is documented in the HTCondor manual . Troubleshooting \u00b6 If you see \u201cERROR: couldn\u2019t locate (null)\u201d , that means the HTCondor-CE schedd (the daemon that schedules jobs) cannot be reached. To track down the issue, increase debugging levels on the CE: MASTER_DEBUG = D_ALWAYS:2 D_CAT SCHEDD_DEBUG = D_ALWAYS:2 D_CAT Then look in the MasterLog and SchedLog for any errors. If you see \u201cgsi@unmapped\u201d in the \u201cRemote Mapping\u201d line , this means that either your credentials are not mapped on the CE or that authentication is not set up at all. To set up authentication, refer to our configuration document . condor_ce_q \u00b6 Usage \u00b6 condor_ce_q can display job status or specific job attributes for jobs that are still in the CE\u2019s queue. To list jobs that are queued on a CE: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 To inspect the full ClassAd for a specific job, specify the -l flag and the job ID: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 -l Note If you run the condor_ce_q command on the CE that you are testing, omit the -name and -pool options. condor_ce_q takes the same arguments as condor_q , which is documented in the HTCondor manual . Troubleshooting \u00b6 If the jobs that you are submitting to a CE are not completing, condor_ce_q can tell you the status of your jobs. If the schedd is not running: You will see a lengthy message about being unable to contact the schedd. To track down the issue, increase the debugging levels on the CE with: MASTER_DEBUG = D_ALWAYS:2 D_CAT SCHEDD_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig Then look in the MasterLog and SchedLog on the CE for any errors. 
If there are issues with contacting the collector: You will see the following message: user@host $ condor_ce_q -pool ce1.accre.vanderbilt.edu -name ce1.accre.vanderbilt.edu -- Failed to fetch ads from: <129.59.197.223:9620?sock`33630_8b33_4> : ce1.accre.vanderbilt.edu This may be due to network issues or bad HTCondor daemon permissions. To fix the latter issue, ensure that the ALLOW_READ configuration value is not set: user@host $ condor_ce_config_val -v ALLOW_READ Not defined: ALLOW_READ If it is defined, remove it from the file that is returned in the output. If a job is held: There should be an accompanying HoldReason that will tell you why it is being held. The HoldReason is in the job\u2019s ClassAd, so you can use the long form of condor_ce_q to extract its value: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 -l | grep HoldReason If a job is idle: The most common cause is that it is not matching any routes in the CE\u2019s job router. To find out whether this is the case, use the condor_ce_job_router_info . condor_ce_history \u00b6 Usage \u00b6 condor_ce_history can display job status or specific job attributes for jobs that have left the CE\u2019s queue. To list jobs that have run on the CE: user@host $ condor_ce_history -name condorce.example.com -pool condorce.example.com:9619 To inspect the full ClassAd for a specific job, specify the -l flag and the job ID: user@host $ condor_ce_history -name condorce.example.com -pool condorce.example.com:9619 -l Note If you run the condor_ce_history command on the CE that you are testing, omit the -name and -pool options. condor_ce_history takes the same arguments as condor_history , which is documented in the HTCondor manual . condor_ce_job_router_info \u00b6 Usage \u00b6 Use the condor_ce_job_router_info command to help troubleshoot your routes and how jobs will match to them. 
To see all of your routes (the output is long because it combines your routes with the JOB_ROUTER_DEFAULTS configuration variable): root@host # condor_ce_job_router_info -config To see how the job router is handling a job that is currently in the CE\u2019s queue, analyze the output of condor_ce_q (replace the with the job ID that you are interested in): root@host # condor_ce_q -l | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - To inspect a job that has already left the queue, use condor_ce_history instead of condor_ce_q : root@host # condor_ce_history -l | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - Note If the proxy for the job has expired, the job will not match any routes. To work around this constraint: root@host # condor_ce_history -l | sed \"s/^\\(x509UserProxyExpiration\\) = .*/\\1 = `date +%s --date '+1 sec'`/\" | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - Alternatively, you can provide a file containing a job\u2019s ClassAd as the input and edit attributes within that file: root@host # condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads Troubleshooting \u00b6 If the job does not match any route: You can identify this case when you see 0 candidate jobs found in the condor_job_router_info output. This message means that, when compared to your job\u2019s ClassAd, the Umbrella constraint does not evaluate to true . When troubleshooting, look at all of the expressions prior to the target.ProcId >= 0 expression, because it and everything following it is logic that the job router added so that routed jobs do not get routed again. 
If your job matches more than one route: the tool will tell you by showing all matching routes after the job ID: Checking Job src=162,0 against all routes Route Matches: Local_PBS Route Matches: Condor_Test To troubleshoot why this is occurring, look at the combined Requirements expressions for all routes and compare it to the job\u2019s ClassAd provided. The combined Requirements expression is highlighted below: Umbrella constraint: ((target.x509userproxysubject =!= UNDEFINED) && (target.x509UserProxyExpiration =!= UNDEFINED) && (time() < target.x509UserProxyExpiration) && (target.JobUniverse =?= 5 || target.JobUniverse =?= 1)) && ( (target.osgTestPBS is true) || (true) ) && (target.ProcId >= 0 && target.JobStatus == 1 && (target.StageInStart is undefined || target.StageInFinish isnt undefined) && target.Managed isnt \"ScheddDone\" && target.Managed isnt \"Extenal\" && target.Owner isnt Undefined && target.RoutedBy isnt \"htcondor-ce\") Both routes evaluate to true for the job\u2019s ClassAd because it contained osgTestPBS = true . Make sure your routes are mutually exclusive, otherwise you may have jobs routed incorrectly! See the job route configuration page for more details. If it is unclear why jobs are matching a route: wrap the route's requirements expression in debug() and check the JobRouterLog for more information. condor_ce_router_q \u00b6 Usage \u00b6 If you have multiple job routes and many jobs, condor_ce_router_q is a useful tool to see how jobs are being routed and their statuses: user@host $ condor_ce_router_q condor_ce_router_q takes the same options as condor_router_q , which is documented in the HTCondor manual condor_ce_test_token \u00b6 Usage \u00b6 Use the condor_ce_test_token command to test SciTokens authentication in the CE. It will create a token with an issuer and subject that you specify and configure the CE daemons to accept that token as if it had been generated by the given issuer (for one hour). 
The token is printed to stdout; use it with condor_ce_submit to test that SciTokens authentication and user mapping operate correctly. To create a temporary SciToken that appears to be issued by the SciTokens demo issuer: root@host # condor_ce_test_token --issuer https://demo.scitokens.org --audience ANY --scope condor:/WRITE --subject alice@foo.edu Note You must run condor_ce_test_token on the CE that you are testing as the root user. condor_ce_test_token takes the same arguments as condor_test_token , which is documented in the HTCondor manual . condor_ce_status \u00b6 Usage \u00b6 To see the daemons running on a CE, run the following command: user@host $ condor_ce_status -any condor_ce_status takes the same arguments as condor_status , which is documented in the HTCondor manual . \"Missing\" Worker Nodes An HTCondor-CE will not show any worker nodes (e.g. Machine entries in the condor_ce_status -any output) if it does not have any running GlideinWMS pilot jobs. This is expected since HTCondor-CE only forwards incoming pilot jobs to your batch system and does not match jobs to worker nodes. Troubleshooting \u00b6 If the output of condor_ce_status -any does not show at least the following daemons: Collector Scheduler DaemonMaster Job_Router Increase the debug level and consult the HTCondor-CE logs for errors. condor_ce_config_val \u00b6 Usage \u00b6 To see the value of configuration variables and where they are set, use condor_ce_config_val . Primarily, this tool is used with the other troubleshooting tools to make sure your configuration is set properly. 
To see the value of a single variable and where it is set: user@host $ condor_ce_config_val -v To see a list of all configuration variables and their values: user@host $ condor_ce_config_val -dump To see a list of all the files that are used to create your configuration and the order that they are parsed, use the following command: user@host $ condor_ce_config_val -config condor_ce_config_val takes the same arguments as condor_config_val , which is documented in the HTCondor manual . condor_ce_reconfig \u00b6 Usage \u00b6 To ensure that your configuration changes have taken effect, run condor_ce_reconfig . user@host $ condor_ce_reconfig condor_ce_{on,off,restart} \u00b6 Usage \u00b6 To turn on/off/restart HTCondor-CE daemons, use the following commands: root@host # condor_ce_on root@host # condor_ce_off root@host # condor_ce_restart The HTCondor-CE service uses the previous commands with default values. Using these commands directly gives you more fine-grained control over the behavior of HTCondor-CE's on/off/restart: If you have installed a new version of HTCondor-CE and want to restart the CE under the new version, run the following command: root@host # condor_ce_restart -fast This will cause HTCondor-CE to restart and quickly reconnect to all running jobs. If you need to stop running new jobs, run the following: root@host # condor_ce_off -peaceful This will cause HTCondor-CE to accept new jobs without starting them and will wait for currently running jobs to complete before shutting down. Getting Help \u00b6 If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Debugging Tools"},{"location":"v23/troubleshooting/debugging-tools/#debugging-tools","text":"HTCondor-CE has its own separate set of the HTCondor tools with ce in the name (e.g., condor_ce_submit vs condor_submit ). 
Some of the commands are only for the CE (e.g., condor_ce_run and condor_ce_trace ) but many of them are just HTCondor commands configured to interact with the CE (e.g., condor_ce_q , condor_ce_status ). It is important to differentiate the two: condor_ce_config_val will provide configuration values for your HTCondor-CE while condor_config_val will provide configuration values for your HTCondor batch system. If you are not running an HTCondor batch system, the non-CE commands will return errors.","title":"Debugging Tools"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_trace","text":"","title":"condor_ce_trace"},{"location":"v23/troubleshooting/debugging-tools/#usage","text":"condor_ce_trace is a useful tool for testing end-to-end job submission. It contacts both the CE\u2019s Schedd and Collector daemons to verify your permission to submit to the CE, displays the submit script that it submits to the CE, and tracks the resultant job. Note You must have generated a proxy (e.g., voms-proxy-init ) and your DN must be added to your chosen authentication method . user@host $ condor_ce_trace condorce.example.com Replacing the condorce.example.com with the hostname of the CE. If you are familiar with the output of condor commands, the command also takes a --debug option that displays verbose condor output.","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting","text":"If the command fails with \u201cFailed ping\u2026\u201d: Make sure that the HTCondor-CE daemons are running on the CE If you see \u201cgsi@unmapped\u201d in the \u201cRemote Mapping\u201d line: Either your credentials are not mapped on the CE or authentication is not set up at all. To set up authentication, refer to our configuration document . 
If the job submits but does not complete: Look at the status of the job and perform the relevant troubleshooting steps .","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_host_network_check","text":"","title":"condor_ce_host_network_check"},{"location":"v23/troubleshooting/debugging-tools/#usage_1","text":"condor_ce_host_network_check is a tool for testing an HTCondor-CE's networking configuration: root@host # condor_ce_host_network_check Starting analysis of host networking for HTCondor-CE System hostname: fermicloud360.fnal.gov FQDN matches hostname Forward resolution of hostname fermicloud360.fnal.gov is 131.225.155.96. Backward resolution of IPv4 131.225.155.96 is fermicloud360.fnal.gov. Forward and backward resolution match! HTCondor is considering all network interfaces and addresses. HTCondor would pick address of 131.225.155.96 as primary address. HTCondor primary address 131.225.155.96 matches system preferred address. Host network configuration should work with HTCondor-CE","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_1","text":"If the tool reports that Host network configuration not expected to work with HTCondor-CE , ensure that forward and reverse DNS resolution return the public IP and hostname.","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_run","text":"","title":"condor_ce_run"},{"location":"v23/troubleshooting/debugging-tools/#usage_2","text":"condor_ce_run is a tool that submits a simple job to your CE, so it is useful for quickly submitting jobs through your CE. To submit a job to the CE and run the env command on the remote batch system: Note You must have generated a proxy (e.g., voms-proxy-init ) and your DN must be added to your chosen authentication method . user@host $ condor_ce_run -r condorce.example.com:9619 /bin/env Replacing the condorce.example.com with the hostname of the CE. 
If you are troubleshooting an HTCondor-CE that you do not have a login for and the CE accepts local universe jobs, you can run commands locally on the CE with condor_ce_run with the -l option. The following example outputs the JobRouterLog of the CE in question: user@host $ condor_ce_run -lr condorce.example.com:9619 cat /var/log/condor-ce/JobRouterLog Replacing the condorce.example.com text with the hostname of the CE. To disable this feature on your CE, consult this section of the install documentation.","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_2","text":"If you do not see any results: condor_ce_run does not display results until the job completes on the CE, which may take several minutes or longer if the CE is busy. In the meantime, you can use condor_ce_q in a separate terminal to track the job on the CE. If you never see any results, use condor_ce_trace to pinpoint errors. If you see an error message that begins with \u201cFailed to\u2026\u201d: Check connectivity to the CE with condor_ce_trace or condor_ce_ping","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_submit","text":"See this documentation for details","title":"condor_ce_submit"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_ping","text":"","title":"condor_ce_ping"},{"location":"v23/troubleshooting/debugging-tools/#usage_3","text":"Use the following condor_ce_ping command to test your ability to submit jobs to an HTCondor-CE, replacing condorce.example.com with the hostname of your CE: user@host $ condor_ce_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE The following shows successful output where I am able to submit jobs ( Authorized: TRUE ) as the glow user ( Remote Mapping: glow@users.opensciencegrid.org ): Remote Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Local Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Session ID: condorce:27407:1412286981:3 Instruction: WRITE Command: 
60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: GSI Remote Mapping: glow@users.opensciencegrid.org Authorized: TRUE Note If you run the condor_ce_ping command on the CE that you are testing, omit the -name and -pool options. condor_ce_ping takes the same arguments as condor_ping , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_3","text":"If you see \u201cERROR: couldn\u2019t locate (null)\u201d , that means the HTCondor-CE schedd (the daemon that schedules jobs) cannot be reached. To track down the issue, increase debugging levels on the CE: MASTER_DEBUG = D_ALWAYS:2 D_CAT SCHEDD_DEBUG = D_ALWAYS:2 D_CAT Then look in the MasterLog and SchedLog for any errors. If you see \u201cgsi@unmapped\u201d in the \u201cRemote Mapping\u201d line , this means that either your credentials are not mapped on the CE or that authentication is not set up at all. To set up authentication, refer to our configuration document .","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_q","text":"","title":"condor_ce_q"},{"location":"v23/troubleshooting/debugging-tools/#usage_4","text":"condor_ce_q can display job status or specific job attributes for jobs that are still in the CE\u2019s queue. To list jobs that are queued on a CE: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 To inspect the full ClassAd for a specific job, specify the -l flag and the job ID: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 -l Note If you run the condor_ce_q command on the CE that you are testing, omit the -name and -pool options. 
condor_ce_q takes the same arguments as condor_q , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_4","text":"If the jobs that you are submitting to a CE are not completing, condor_ce_q can tell you the status of your jobs. If the schedd is not running: You will see a lengthy message about being unable to contact the schedd. To track down the issue, increase the debugging levels on the CE with: MASTER_DEBUG = D_ALWAYS:2 D_CAT SCHEDD_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig Then look in the MasterLog and SchedLog on the CE for any errors. If there are issues with contacting the collector: You will see the following message: user@host $ condor_ce_q -pool ce1.accre.vanderbilt.edu -name ce1.accre.vanderbilt.edu -- Failed to fetch ads from: <129.59.197.223:9620?sock`33630_8b33_4> : ce1.accre.vanderbilt.edu This may be due to network issues or bad HTCondor daemon permissions. To fix the latter issue, ensure that the ALLOW_READ configuration value is not set: user@host $ condor_ce_config_val -v ALLOW_READ Not defined: ALLOW_READ If it is defined, remove it from the file that is returned in the output. If a job is held: There should be an accompanying HoldReason that will tell you why it is being held. The HoldReason is in the job\u2019s ClassAd, so you can use the long form of condor_ce_q to extract its value: user@host $ condor_ce_q -name condorce.example.com -pool condorce.example.com:9619 -l | grep HoldReason If a job is idle: The most common cause is that it is not matching any routes in the CE\u2019s job router. 
To find out whether this is the case, use the condor_ce_job_router_info .","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_history","text":"","title":"condor_ce_history"},{"location":"v23/troubleshooting/debugging-tools/#usage_5","text":"condor_ce_history can display job status or specific job attributes for jobs that have left the CE\u2019s queue. To list jobs that have run on the CE: user@host $ condor_ce_history -name condorce.example.com -pool condorce.example.com:9619 To inspect the full ClassAd for a specific job, specify the -l flag and the job ID: user@host $ condor_ce_history -name condorce.example.com -pool condorce.example.com:9619 -l Note If you run the condor_ce_history command on the CE that you are testing, omit the -name and -pool options. condor_ce_history takes the same arguments as condor_history , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_job_router_info","text":"","title":"condor_ce_job_router_info"},{"location":"v23/troubleshooting/debugging-tools/#usage_6","text":"Use the condor_ce_job_router_info command to help troubleshoot your routes and how jobs will match to them. To see all of your routes (the output is long because it combines your routes with the JOB_ROUTER_DEFAULTS configuration variable): root@host # condor_ce_job_router_info -config To see how the job router is handling a job that is currently in the CE\u2019s queue, analyze the output of condor_ce_q (replace the with the job ID that you are interested in): root@host # condor_ce_q -l | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - To inspect a job that has already left the queue, use condor_ce_history instead of condor_ce_q : root@host # condor_ce_history -l | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - Note If the proxy for the job has expired, the job will not match any routes. 
To work around this constraint: root@host # condor_ce_history -l | sed \"s/^\\(x509UserProxyExpiration\\) = .*/\\1 = `date +%s --date '+1 sec'`/\" | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads - Alternatively, you can provide a file containing a job\u2019s ClassAd as the input and edit attributes within that file: root@host # condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads ","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_5","text":"If the job does not match any route: You can identify this case when you see 0 candidate jobs found in the condor_job_router_info output. This message means that, when compared to your job\u2019s ClassAd, the Umbrella constraint does not evaluate to true . When troubleshooting, look at all of the expressions prior to the target.ProcId >= 0 expression, because it and everything following it is logic that the job router added so that routed jobs do not get routed again. If your job matches more than one route: the tool will tell you by showing all matching routes after the job ID: Checking Job src=162,0 against all routes Route Matches: Local_PBS Route Matches: Condor_Test To troubleshoot why this is occurring, look at the combined Requirements expressions for all routes and compare it to the job\u2019s ClassAd provided. 
The combined Requirements expression is highlighted below: Umbrella constraint: ((target.x509userproxysubject =!= UNDEFINED) && (target.x509UserProxyExpiration =!= UNDEFINED) && (time() < target.x509UserProxyExpiration) && (target.JobUniverse =?= 5 || target.JobUniverse =?= 1)) && ( (target.osgTestPBS is true) || (true) ) && (target.ProcId >= 0 && target.JobStatus == 1 && (target.StageInStart is undefined || target.StageInFinish isnt undefined) && target.Managed isnt \"ScheddDone\" && target.Managed isnt \"External\" && target.Owner isnt Undefined && target.RoutedBy isnt \"htcondor-ce\") Both routes evaluate to true for the job\u2019s ClassAd because it contained osgTestPBS = true . Make sure your routes are mutually exclusive; otherwise, you may have jobs routed incorrectly! See the job route configuration page for more details. If it is unclear why jobs are matching a route: wrap the route's requirements expression in debug() and check the JobRouterLog for more information.","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_router_q","text":"","title":"condor_ce_router_q"},{"location":"v23/troubleshooting/debugging-tools/#usage_7","text":"If you have multiple job routes and many jobs, condor_ce_router_q is a useful tool to see how jobs are being routed and their statuses: user@host $ condor_ce_router_q condor_ce_router_q takes the same options as condor_router_q , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_test_token","text":"","title":"condor_ce_test_token"},{"location":"v23/troubleshooting/debugging-tools/#usage_8","text":"Use the condor_ce_test_token command to test SciTokens authentication in the CE. It will create a token with an issuer and subject that you specify and configure the CE daemons to accept that token as if it had been generated by the given issuer (for one hour).
The token is printed to stdout; use it with condor_ce_submit to test that SciTokens authentication and user mapping operate correctly. To create a temporary SciToken that appears to be issued by the SciTokens demo issuer: root@host # condor_ce_test_token --issuer https://demo.scitokens.org --audience ANY --scope condor:/WRITE --subject alice@foo.edu Note You must run condor_ce_test_token on the CE that you are testing as the root user. condor_ce_test_token takes the same arguments as condor_test_token , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_status","text":"","title":"condor_ce_status"},{"location":"v23/troubleshooting/debugging-tools/#usage_9","text":"To see the daemons running on a CE, run the following command: user@host $ condor_ce_status -any condor_ce_status takes the same arguments as condor_status , which is documented in the HTCondor manual . \"Missing\" Worker Nodes An HTCondor-CE will not show any worker nodes (e.g. Machine entries in the condor_ce_status -any output) if it does not have any running GlideinWMS pilot jobs. This is expected since HTCondor-CE only forwards incoming pilot jobs to your batch system and does not match jobs to worker nodes.","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#troubleshooting_6","text":"If the output of condor_ce_status -any does not show at least the following daemons: Collector Scheduler DaemonMaster Job_Router Increase the debug level and consult the HTCondor-CE logs for errors.","title":"Troubleshooting"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_config_val","text":"","title":"condor_ce_config_val"},{"location":"v23/troubleshooting/debugging-tools/#usage_10","text":"To see the value of configuration variables and where they are set, use condor_ce_config_val . This tool is primarily used with the other troubleshooting tools to make sure your configuration is set properly.
To see the value of a single variable and where it is set: user@host $ condor_ce_config_val -v To see a list of all configuration variables and their values: user@host $ condor_ce_config_val -dump To see a list of all the files that are used to create your configuration and the order that they are parsed, use the following command: user@host $ condor_ce_config_val -config condor_ce_config_val takes the same arguments as condor_config_val , which is documented in the HTCondor manual .","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_reconfig","text":"","title":"condor_ce_reconfig"},{"location":"v23/troubleshooting/debugging-tools/#usage_11","text":"To ensure that your configuration changes have taken effect, run condor_ce_reconfig . user@host $ condor_ce_reconfig","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#condor_ce_onoffrestart","text":"","title":"condor_ce_{on,off,restart}"},{"location":"v23/troubleshooting/debugging-tools/#usage_12","text":"To turn on/off/restart HTCondor-CE daemons, use the following commands: root@host # condor_ce_on root@host # condor_ce_off root@host # condor_ce_restart The HTCondor-CE service uses the previous commands with default values. Using these commands directly gives you more fine-grained control over the behavior of HTCondor-CE's on/off/restart: If you have installed a new version of HTCondor-CE and want to restart the CE under the new version, run the following command: root@host # condor_ce_restart -fast This will cause HTCondor-CE to restart and quickly reconnect to all running jobs. 
If you need to stop running new jobs, run the following: root@host # condor_ce_off -peaceful This will cause HTCondor-CE to accept new jobs without starting them and will wait for currently running jobs to complete before shutting down.","title":"Usage"},{"location":"v23/troubleshooting/debugging-tools/#getting-help","text":"If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Getting Help"},{"location":"v23/troubleshooting/logs/","text":"Helpful Logs \u00b6 MasterLog \u00b6 The HTCondor-CE master log tracks status of all of the other HTCondor daemons and thus contains valuable information if they fail to start. Location: /var/log/condor-ce/MasterLog Key contents: Start-up, shut-down, and communication with other HTCondor daemons Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: MASTER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig What to look for \u00b6 Successful daemon start-up. The following line shows that the Collector daemon started successfully: 10/07/14 14:20:27 Started DaemonCore process \"/usr/sbin/condor_collector -f -port 9619\", pid and pgroup = 7318 SchedLog \u00b6 The HTCondor-CE schedd log contains information on all jobs that are submitted to the CE. It contains valuable information when trying to troubleshoot authentication issues. Location: /var/log/condor-ce/SchedLog Key contents: Every job submitted to the CE User authorization events Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: SCHEDD_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig What to look for \u00b6 Job is submitted to the CE queue: 10/07/14 16:52:17 Submitting new job 234.0 In this example, the ID of the submitted job is 234.0 . 
Job owner is authorized and mapped: 10/07/14 16:52:17 Command=QMGMT_WRITE_CMD, peer=<131.225.154.68:42262> 10/07/14 16:52:17 AuthMethod=GSI, AuthId=/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=People/CN=Brian Lin 1047, /GLOW/Role=NULL/Capability=NULL, In this example, the job is authorized with the job\u2019s proxy subject using GSI and is mapped to the glow user. User job submission fails due to improper authentication or authorization: 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done Missing negotiator: 10/18/14 17:32:21 Can't find address for negotiator 10/18/14 17:32:21 Failed to send RESCHEDULE to unknown daemon: Since HTCondor-CE does not manage any resources, it does not run a negotiator daemon by default and this error message is expected. 
In the same vein, you may see messages that there are 0 worker nodes: 06/23/15 11:15:03 Number of Active Workers 0 Corrupted job_queue.log : 02/07/17 10:55:49 WARNING: Encountered corrupt log record _654 (byte offset 5046225) 02/07/17 10:55:49 103 1354325.0 PeriodicRemove ( StageInFinish > 0 ) 105 02/07/17 10:55:49 Lines following corrupt log record _654 (up to 3): 02/07/17 10:55:49 103 1346101.0 RemoteWallClockTime 116668.000000 02/07/17 10:55:49 104 1346101.0 WallClockCheckpoint 02/07/17 10:55:49 104 1346101.0 ShadowBday 02/07/17 10:55:49 ERROR \"Error: corrupt log record _654 (byte offset 5046225) occurred inside closed transaction, recovery failed\" at line 1080 in file /builddir/build/BUILD/condor-8.4.8/src/condor_utils/classad_log.cpp This means /var/lib/condor-ce/spool/job_queue.log has been corrupted and you will need to hand-remove the offending record by searching for the text specified after the Lines following corrupt log record... line. The most common culprit of the corruption is that the disk containing the job_queue.log has filled up. To avoid this problem, you can change the location of job_queue.log by setting JOB_QUEUE_LOG in /etc/condor-ce/config.d/ to a path, preferably one on a large SSD. JobRouterLog \u00b6 The HTCondor-CE job router log is produced by the job router itself and thus contains valuable information when trying to troubleshoot issues with job routing.
Location: /var/log/condor-ce/JobRouterLog Key contents: Every attempt to route a job Routing success messages Job attribute changes, based on chosen route Job submission errors to an HTCondor batch system Corresponding job IDs on an HTCondor batch system Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig Known Errors \u00b6 (HTCondor batch systems only) If you see the following error message: Can't find address of schedd This means that HTCondor-CE cannot communicate with your HTCondor batch system. Verify that the condor service is running on the HTCondor-CE host and is configured for your central manager. (HTCondor batch systems only) If you see the following error message: JobRouter failure (src=2810466.0,dest=47968.0,route=MWT2_UCORE): giving up, because submitted job is still not in job queue mirror (submitted 605 seconds ago). Perhaps it has been removed? Ensure that condor_config_val SPOOL and condor_ce_config_val JOB_ROUTER_SCHEDD2_SPOOL return the same value. If they don't, change the value of JOB_ROUTER_SCHEDD2_SPOOL in your HTCondor-CE configuration to match SPOOL from your HTCondor configuration. If you have D_ALWAYS:2 turned on for the job router, you will see errors like the following: 06/12/15 14:00:28 HOOK_UPDATE_JOB_INFO not configured. You can safely ignore these. What to look for \u00b6 Job is considered for routing: 09/17/14 15:00:56 JobRouter (src=86.0,route=Local_LSF): found candidate job In parentheses are the original HTCondor-CE job ID (e.g., 86.0 ) and the route (e.g., Local_LSF ).
Job is successfully routed: 09/17/14 15:00:57 JobRouter (src=86.0,route=Local_LSF): claimed job Finding the corresponding job ID on your HTCondor batch system: 09/17/14 15:00:57 JobRouter (src=86.0,dest=205.0,route=Local_Condor): claimed job In parentheses are the original HTCondor-CE job ID (e.g., 86.0 ) and the resultant job ID on the HTCondor batch system (e.g., 205.0 ). If your job is not routed, there will not be any evidence of it within the log itself. To investigate why your jobs are not being considered for routing, use the condor_ce_job_router_info tool. HTCondor batch systems only : The following error occurs when the job router daemon cannot submit the routed job: 10/19/14 13:09:15 Can't resolve collector condorce.example.com; skipping 10/19/14 13:09:15 ERROR (pool condorce.example.com) Can't find address of schedd 10/19/14 13:09:15 JobRouter failure (src=5.0,route=Local_Condor): failed to submit job GridmanagerLog \u00b6 The HTCondor-CE grid manager log tracks the submission and status of jobs on non-HTCondor batch systems. It contains valuable information when trying to troubleshoot jobs that have been routed but failed to complete. Details on how to read the Gridmanager log can be found on the HTCondor Wiki . Location: /var/log/condor-ce/GridmanagerLog Key contents: Every attempt to submit a job to a batch system or other grid resource Status updates of submitted jobs Corresponding job IDs on non-HTCondor batch systems Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: MAX_GRIDMANAGER_LOG = 6h MAX_NUM_GRIDMANAGER_LOG = 8 GRIDMANAGER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig What to look for \u00b6 Job is submitted to the batch system: 09/17/14 09:51:34 [12997] (85.0) gm state change: GM_SUBMIT_SAVE -> GM_SUBMITTED Every state change the Gridmanager tracks should have the job ID in parentheses (e.g., (85.0)).
Job status being updated: 09/17/14 15:07:24 [25543] (87.0) gm state change: GM_SUBMITTED -> GM_POLL_ACTIVE 09/17/14 15:07:24 [25543] GAHP[25563] <- 'BLAH_JOB_STATUS 3 lsf/20140917/482046' 09/17/14 15:07:24 [25543] GAHP[25563] -> 'S' 09/17/14 15:07:25 [25543] GAHP[25563] <- 'RESULTS' 09/17/14 15:07:25 [25543] GAHP[25563] -> 'R' 09/17/14 15:07:25 [25543] GAHP[25563] -> 'S' '1' 09/17/14 15:07:25 [25543] GAHP[25563] -> '3' '0' 'No Error' '4' '[ BatchjobId = \"482046\"; JobStatus = 4; ExitCode = 0; WorkerNode = \"atl-prod08\" ]' The first line tells us that the Gridmanager is initiating a status update and the following lines are the results. The most interesting line is the second highlighted section that notes the job ID on the batch system and its status. If there are errors querying the job on the batch system, they will appear here. Finding the corresponding job ID on your non-HTCondor batch system: 09/17/14 15:07:24 [25543] (87.0) gm state change: GM_SUBMITTED -> GM_POLL_ACTIVE 09/17/14 15:07:24 [25543] GAHP[25563] <- 'BLAH_JOB_STATUS 3 lsf/20140917/482046' On the first line, after the timestamp and PID of the Gridmanager process, you will find the CE\u2019s job ID in parentheses, (87.0) . At the end of the second line, you will find the batch system, date, and batch system job id separated by slashes, lsf/20140917/482046 . Job completion on the batch system: 09/17/14 15:07:25 [25543] (87.0) gm state change: GM_TRANSFER_OUTPUT -> GM_DONE_SAVE SharedPortLog \u00b6 The HTCondor-CE shared port log keeps track of all connections to all of the HTCondor-CE daemons other than the collector. This log is a good place to check if experiencing connectivity issues with HTCondor-CE. More information can be found here . 
Location: /var/log/condor-ce/SharedPortLog Key contents: Every attempt to connect to HTCondor-CE (except collector queries) Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: SHARED_PORT_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig BLAHP Configuration File \u00b6 HTCondor-CE uses the BLAHP to submit jobs to your local non-HTCondor batch system using your batch system's client tools. Location: /etc/blah.config Key contents: Locations of the batch system's client binaries and logs Location to save files that are submitted to the local batch system You can also tell the BLAHP to save the files that are being submitted to the local batch system by adding the following line: blah_debug_save_submit_info= The BLAHP will then create a directory with the format bl_* for each submission to the local jobmanager with the submit file and proxy used. Note Whitespace is important so do not put any spaces around the = sign. In addition, the directory must be created and HTCondor-CE should have sufficient permissions to create directories within . Getting Help \u00b6 If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Helpful Logs"},{"location":"v23/troubleshooting/logs/#helpful-logs","text":"","title":"Helpful Logs"},{"location":"v23/troubleshooting/logs/#masterlog","text":"The HTCondor-CE master log tracks status of all of the other HTCondor daemons and thus contains valuable information if they fail to start.
Location: /var/log/condor-ce/MasterLog Key contents: Start-up, shut-down, and communication with other HTCondor daemons Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: MASTER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig","title":"MasterLog"},{"location":"v23/troubleshooting/logs/#what-to-look-for","text":"Successful daemon start-up. The following line shows that the Collector daemon started successfully: 10/07/14 14:20:27 Started DaemonCore process \"/usr/sbin/condor_collector -f -port 9619\", pid and pgroup = 7318","title":"What to look for"},{"location":"v23/troubleshooting/logs/#schedlog","text":"The HTCondor-CE schedd log contains information on all jobs that are submitted to the CE. It contains valuable information when trying to troubleshoot authentication issues. Location: /var/log/condor-ce/SchedLog Key contents: Every job submitted to the CE User authorization events Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: SCHEDD_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig","title":"SchedLog"},{"location":"v23/troubleshooting/logs/#what-to-look-for_1","text":"Job is submitted to the CE queue: 10/07/14 16:52:17 Submitting new job 234.0 In this example, the ID of the submitted job is 234.0 . Job owner is authorized and mapped: 10/07/14 16:52:17 Command=QMGMT_WRITE_CMD, peer=<131.225.154.68:42262> 10/07/14 16:52:17 AuthMethod=GSI, AuthId=/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=People/CN=Brian Lin 1047, /GLOW/Role=NULL/Capability=NULL, In this example, the job is authorized with the job\u2019s proxy subject using GSI and is mapped to the glow user. 
User job submission fails due to improper authentication or authorization: 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done Missing negotiator: 10/18/14 17:32:21 Can't find address for negotiator 10/18/14 17:32:21 Failed to send RESCHEDULE to unknown daemon: Since HTCondor-CE does not manage any resources, it does not run a negotiator daemon by default and this error message is expected. 
In the same vein, you may see messages that there are 0 worker nodes: 06/23/15 11:15:03 Number of Active Workers 0 Corrupted job_queue.log : 02/07/17 10:55:49 WARNING: Encountered corrupt log record _654 (byte offset 5046225) 02/07/17 10:55:49 103 1354325.0 PeriodicRemove ( StageInFinish > 0 ) 105 02/07/17 10:55:49 Lines following corrupt log record _654 (up to 3): 02/07/17 10:55:49 103 1346101.0 RemoteWallClockTime 116668.000000 02/07/17 10:55:49 104 1346101.0 WallClockCheckpoint 02/07/17 10:55:49 104 1346101.0 ShadowBday 02/07/17 10:55:49 ERROR \"Error: corrupt log record _654 (byte offset 5046225) occurred inside closed transaction, recovery failed\" at line 1080 in file /builddir/build/BUILD/condor-8.4.8/src/condor_utils/classad_log.cpp This means /var/lib/condor-ce/spool/job_queue.log has been corrupted and you will need to hand-remove the offending record by searching for the text specified after the Lines following corrupt log record... line. The most common culprit of the corruption is that the disk containing the job_queue.log has filled up. To avoid this problem, you can change the location of job_queue.log by setting JOB_QUEUE_LOG in /etc/condor-ce/config.d/ to a path, preferably one on a large SSD.","title":"What to look for"},{"location":"v23/troubleshooting/logs/#jobrouterlog","text":"The HTCondor-CE job router log is produced by the job router itself and thus contains valuable information when trying to troubleshoot issues with job routing.
Location: /var/log/condor-ce/JobRouterLog Key contents: Every attempt to route a job Routing success messages Job attribute changes, based on chosen route Job submission errors to an HTCondor batch system Corresponding job IDs on an HTCondor batch system Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig","title":"JobRouterLog"},{"location":"v23/troubleshooting/logs/#known-errors","text":"(HTCondor batch systems only) If you see the following error message: Can't find address of schedd This means that HTCondor-CE cannot communicate with your HTCondor batch system. Verify that the condor service is running on the HTCondor-CE host and is configured for your central manager. (HTCondor batch systems only) If you see the following error message: JobRouter failure (src=2810466.0,dest=47968.0,route=MWT2_UCORE): giving up, because submitted job is still not in job queue mirror (submitted 605 seconds ago). Perhaps it has been removed? Ensure that condor_config_val SPOOL and condor_ce_config_val JOB_ROUTER_SCHEDD2_SPOOL return the same value. If they don't, change the value of JOB_ROUTER_SCHEDD2_SPOOL in your HTCondor-CE configuration to match SPOOL from your HTCondor configuration. If you have D_ALWAYS:2 turned on for the job router, you will see errors like the following: 06/12/15 14:00:28 HOOK_UPDATE_JOB_INFO not configured. You can safely ignore these.","title":"Known Errors"},{"location":"v23/troubleshooting/logs/#what-to-look-for_2","text":"Job is considered for routing: 09/17/14 15:00:56 JobRouter (src=86.0,route=Local_LSF): found candidate job In parentheses are the original HTCondor-CE job ID (e.g., 86.0 ) and the route (e.g., Local_LSF ).
Job is successfully routed: 09/17/14 15:00:57 JobRouter (src=86.0,route=Local_LSF): claimed job Finding the corresponding job ID on your HTCondor batch system: 09/17/14 15:00:57 JobRouter (src=86.0,dest=205.0,route=Local_Condor): claimed job In parentheses are the original HTCondor-CE job ID (e.g., 86.0 ) and the resultant job ID on the HTCondor batch system (e.g., 205.0 ). If your job is not routed, there will not be any evidence of it within the log itself. To investigate why your jobs are not being considered for routing, use the condor_ce_job_router_info tool. HTCondor batch systems only : The following error occurs when the job router daemon cannot submit the routed job: 10/19/14 13:09:15 Can't resolve collector condorce.example.com; skipping 10/19/14 13:09:15 ERROR (pool condorce.example.com) Can't find address of schedd 10/19/14 13:09:15 JobRouter failure (src=5.0,route=Local_Condor): failed to submit job","title":"What to look for"},{"location":"v23/troubleshooting/logs/#gridmanagerlog","text":"The HTCondor-CE grid manager log tracks the submission and status of jobs on non-HTCondor batch systems. It contains valuable information when trying to troubleshoot jobs that have been routed but failed to complete. Details on how to read the Gridmanager log can be found on the HTCondor Wiki . Location: /var/log/condor-ce/GridmanagerLog
Key contents: Every attempt to submit a job to a batch system or other grid resource Status updates of submitted jobs Corresponding job IDs on non-HTCondor batch systems Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: MAX_GRIDMANAGER_LOG = 6h MAX_NUM_GRIDMANAGER_LOG = 8 GRIDMANAGER_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig","title":"GridmanagerLog"},{"location":"v23/troubleshooting/logs/#what-to-look-for_3","text":"Job is submitted to the batch system: 09/17/14 09:51:34 [12997] (85.0) gm state change: GM_SUBMIT_SAVE -> GM_SUBMITTED Every state change the Gridmanager tracks should have the job ID in parentheses (e.g., (85.0)). Job status being updated: 09/17/14 15:07:24 [25543] (87.0) gm state change: GM_SUBMITTED -> GM_POLL_ACTIVE 09/17/14 15:07:24 [25543] GAHP[25563] <- 'BLAH_JOB_STATUS 3 lsf/20140917/482046' 09/17/14 15:07:24 [25543] GAHP[25563] -> 'S' 09/17/14 15:07:25 [25543] GAHP[25563] <- 'RESULTS' 09/17/14 15:07:25 [25543] GAHP[25563] -> 'R' 09/17/14 15:07:25 [25543] GAHP[25563] -> 'S' '1' 09/17/14 15:07:25 [25543] GAHP[25563] -> '3' '0' 'No Error' '4' '[ BatchjobId = \"482046\"; JobStatus = 4; ExitCode = 0; WorkerNode = \"atl-prod08\" ]' The first line tells us that the Gridmanager is initiating a status update and the following lines are the results. The most interesting line is the second highlighted section that notes the job ID on the batch system and its status. If there are errors querying the job on the batch system, they will appear here. Finding the corresponding job ID on your non-HTCondor batch system: 09/17/14 15:07:24 [25543] (87.0) gm state change: GM_SUBMITTED -> GM_POLL_ACTIVE 09/17/14 15:07:24 [25543] GAHP[25563] <- 'BLAH_JOB_STATUS 3 lsf/20140917/482046' On the first line, after the timestamp and PID of the Gridmanager process, you will find the CE\u2019s job ID in parentheses, (87.0) .
At the end of the second line, you will find the batch system, date, and batch system job id separated by slashes, lsf/20140917/482046 . Job completion on the batch system: 09/17/14 15:07:25 [25543] (87.0) gm state change: GM_TRANSFER_OUTPUT -> GM_DONE_SAVE","title":"What to look for"},{"location":"v23/troubleshooting/logs/#sharedportlog","text":"The HTCondor-CE shared port log keeps track of all connections to all of the HTCondor-CE daemons other than the collector. This log is a good place to check if you are experiencing connectivity issues with HTCondor-CE. More information can be found here . Location: /var/log/condor-ce/SharedPortLog Key contents: Every attempt to connect to HTCondor-CE (except collector queries) Increasing the debug level: Set the following value in /etc/condor-ce/config.d/99-local.conf on the CE host: SHARED_PORT_DEBUG = D_ALWAYS:2 D_CAT To apply these changes, reconfigure HTCondor-CE: root@host # condor_ce_reconfig","title":"SharedPortLog"},{"location":"v23/troubleshooting/logs/#blahp-configuration-file","text":"HTCondor-CE uses the BLAHP to submit jobs to your local non-HTCondor batch system using your batch system's client tools. Location: /etc/blah.config Key contents: Locations of the batch system's client binaries and logs Location to save files that are submitted to the local batch system You can also tell the BLAHP to save the files that are being submitted to the local batch system by adding the following line: blah_debug_save_submit_info= The BLAHP will then create a directory with the format bl_* for each submission to the local jobmanager with the submit file and proxy used. Note Whitespace is important so do not put any spaces around the = sign.
In addition, the directory must be created and HTCondor-CE should have sufficient permissions to create directories within .","title":"BLAHP Configuration File"},{"location":"v23/troubleshooting/logs/#getting-help","text":"If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Getting Help"},{"location":"v23/troubleshooting/remote-troubleshooting/","text":"Troubleshooting Remote HTCondor-CEs \u00b6 Since HTCondor-CE is built on top of HTCondor, it's possible to perform quite a bit of troubleshooting from a remote client with access to the HTCondor command-line tools. For testing end-to-end resource request submission or remote interactive-like access, a grid operator will need access to a host with the htcondor-ce-client installed. This document outlines the steps that a grid operator can perform in order to troubleshoot a remote HTCondor-CE. Verifying Network Connectivity \u00b6 Before performing any troubleshooting of the remote HTCondor-CE service, it's important to verify that the HTCondor-CE can be contacted on its HTCondor-CE port (default: 9619 ) at the specified fully qualified domain name (FQDN). Verifying DNS \u00b6 As noted in the HTCondor-CE installation document , an HTCondor-CE must have forward and reverse DNS records. To verify DNS, use a tool like nslookup : $ nslookup htcondor-ce.chtc.wisc.edu Server: 144.92.254.254 Address: 144.92.254.254#53 Non-authoritative answer: Name: htcondor-ce.chtc.wisc.edu Address: 128.104.100.65 Name: htcondor-ce.chtc.wisc.edu Address: 2607:f388:107c:501:216:3eff:fe89:aa3 $ nslookup 128.104.100.65 65.100.104.128.in-addr.arpa name = htcondor-ce.chtc.wisc.edu. Authoritative answers can be found from: 104.128.in-addr.arpa nameserver = dns2.itd.umich.edu. 104.128.in-addr.arpa nameserver = adns1.doit.wisc.edu. 104.128.in-addr.arpa nameserver = adns3.doit.wisc.edu. 104.128.in-addr.arpa nameserver = adns2.doit.wisc.edu.
adns2.doit.wisc.edu internet address = 144.92.20.99 dns2.itd.umich.edu internet address = 192.12.80.222 adns3.doit.wisc.edu internet address = 144.92.104.21 adns1.doit.wisc.edu internet address = 144.92.9.21 adns2.doit.wisc.edu has AAAA address 2607:f388:d:2::1006 adns2.doit.wisc.edu has AAAA address 2607:f388::a53:2 adns3.doit.wisc.edu has AAAA address 2607:f388:2:2001::100b adns3.doit.wisc.edu has AAAA address 2607:f388::a53:3 adns1.doit.wisc.edu has AAAA address 2607:f388:2:2001::100a adns1.doit.wisc.edu has AAAA address 2607:f388::a53:1 If not, the HTCondor-CE administrator will have to register the appropriate DNS records. Verifying service connectivity \u00b6 After verifying DNS, check to see if the remote HTCondor-CE is listening on the appropriate port: $ telnet htcondor-ce.chtc.wisc.edu 9619 Trying 128.104.100.65... Connected to htcondor-ce.chtc.wisc.edu. Escape character is '^]'. If not, the HTCondor-CE administrator will have to ensure that the service is running and/or open up their firewall. Verifying Configuration \u00b6 Once you've verified network connectivity, you can start verifying the HTCondor-CE daemons. Inspecting daemons \u00b6 To inspect the running daemons of a remote HTCondor-CE, use condor_status : $ condor_status -any -pool htcondor-ce.chtc.wisc.edu:9619 MyType TargetType Name Collector None My Pool - htcondor-ce.chtc.wisc.edu@htcondor-ce.c Job_Router None htcondor-ce@htcondor-ce.chtc.wisc.edu Scheduler None htcondor-ce.chtc.wisc.edu DaemonMaster None htcondor-ce.chtc.wisc.edu Submitter None nu_lhcb@users.htcondor.org If you don't see the appropriate daemons, ask the administrator to follow these troubleshooting steps . Submitter ad When querying daemons for an HTCondor-CE, you may see Submitter ads for each user with jobs in the queue. These ads are used to collect per-user stats that are available to the HTCondor-CE administrator.
You can inspect the details of a specific daemon with per-daemon and -long options: $ condor_status -pool htcondor-ce.chtc.wisc.edu:9619 -collector -long ActiveQueryWorkers = 0 ActiveQueryWorkersPeak = 1 AddressV1 = \"{[ p=\\\"primary\\\"; a=\\\"128.104.100.65\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ], [ p=\\\"IPv4\\\"; a=\\\"128.104.100.65\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ], [ p=\\\"IPv6\\\"; a=\\\"2607:f388:107c:501:216:3eff:fe89:aa3\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ]}\" CollectorIpAddr = \"<128.104.100.65:9619?addrs=128.104.100.65-9619+[2607-f388-107c-501-216-3eff-fe89-aa3]-9619&alias=htcondor-ce.chtc.wisc.edu&noUDP&sock=collector>\" CondorAdmin = \"root@htcondor-ce.chtc.wisc.edu\" CondorPlatform = \"$CondorPlatform: x86_64_CentOS7 $\" CondorVersion = \"$CondorVersion: 8.9.8 Jun 29 2020 BuildID: 508520 PackageID: 8.9.8-0.508520 $\" CurrentJobsRunningAll = 0 CurrentJobsRunningGrid = 0 CurrentJobsRunningJava = 0 Inspecting resource requests \u00b6 To inspect resource requests submitted to a remote HTCondor-CE, use condor_q : $ condor_q -all -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -- Schedd: htcondor-ce.chtc.wisc.edu : <128.104.100.65:9619?... 
@ 10/29/20 15:31:45 OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS nu_lhcb ID: 24631 10/18 12:06 33 _ _ _ 35 24631.5-15 nu_lhcb ID: 24632 10/18 12:23 7 _ _ _ 9 24632.2-4 nu_lhcb ID: 24635 10/18 14:23 3 _ _ _ 5 24635.0-1 nu_lhcb ID: 24636 10/18 14:40 5 _ _ _ 9 24636.0-6 nu_lhcb ID: 24637 10/18 14:58 7 _ _ _ 8 24637.2 nu_lhcb ID: 24638 10/18 15:15 7 _ _ _ 8 24638.1 You can inspect the details of a specific resource request with the -long option: $ condor_q -all -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -long 24631 .5 Arguments = \"\" BufferBlockSize = 32768 BufferSize = 524288 BytesRecvd = 58669.0 BytesSent = 184201.0 ClusterId = 24631 Cmd = \"DIRAC_Ullq7V_pilotwrapper.py\" CommittedSlotTime = 0 CommittedSuspensionTime = 0 CommittedTime = 0 Retrieving HTCondor-CE configuration \u00b6 To verify a remote HTCondor-CE configuration, use condor_config_val : $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -dump # Configuration from master on htcondor-ce.chtc.wisc.edu < 128 .104.100.65:9619?addrs = 128 .104.100.65-9619+ [ 2607 -f388-107c-501-216-3eff-fe89-aa3 ] -9619 & alias = htcondor-ce.chtc.wisc.edu & noUDP & sock = master_1744571_b0a0> ABORT_ON_EXCEPTION = false ACCOUNTANT_HOST = ACCOUNTANT_LOCAL_DOMAIN = ActivationTimer = ifThenElse(JobStart =!= UNDEFINED, (time() - JobStart), 0) ActivityTimer = (time() - EnteredCurrentActivity) ADD_WINDOWS_FIREWALL_EXCEPTION = $(CondorIsAdmin) ADVERTISE_IPV4_FIRST = $(PREFER_IPV4) ALL_DEBUG = D:CAT D_ALWAYS:2 ALLOW_ADMIN_COMMANDS = true If you know the name of the configuration variable, you can query for it directly: $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -verbose JOB_ROUTER_SCHEDD2_NAME JOB_ROUTER_SCHEDD2_NAME = htcondor-ce.chtc.wisc.edu # at: /etc/condor-ce/config.d/02-ce-condor.conf, line 20 # raw: JOB_ROUTER_SCHEDD2_NAME = $( FULL_HOSTNAME ) Verifying Resource Request Submission \u00b6 After verifying 
that all the remote HTCondor-CE daemons are up , you can start submitting resource requests! Verifying authentication \u00b6 Before submitting a successful resource request, you will want to verify that you have submit privileges. For this, you will need a credential such as a grid proxy: $ voms-proxy-info subject : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246/CN=41319870 issuer : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246 identity : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246 type : RFC compliant proxy strength : 1024 bits path : /tmp/x509up_u1000 timeleft : 3:55:22 After you have retrieved your credential, verify that you have the ability to submit requests to the remote HTCondor-CE (i.e., WRITE access) with condor_ping : $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS = SCITOKENS,GSI $ export _condor_SEC_TOOL_DEBUG = D_SECURITY:2 # Extremely verbose debugging for troubleshooting authentication issues $ condor_ping -name htcondor-ce.chtc.wisc.edu \\ -pool htcondor-ce.chtc.wisc.edu:9619 \\ -verbose \\ -debug \\ WRITE [...] Remote Version: $CondorVersion: 8.9.8 Jun 29 2020 BuildID: 508520 PackageID: 8.9.8-0.508520 $ Local Version: $CondorVersion: 8.9.9 Aug 26 2020 BuildID: 515894 PackageID: 8.9.9-0.515894 PRE-RELEASE-UWCS $ Session ID: htcondor-ce:2980451:1604006441:0 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: FS,GSI Remote Mapping: blin@users.htcondor.org Authorized: TRUE If condor_ping fails, ask the administrator to follow this troubleshooting section , set SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2 , and cross-check the HTCondor-CE SchedLog for authentication issues. Submitting a trace request \u00b6 HTCondor-CE client This section requires an installation of the htcondor-ce-client . 
The easiest way to troubleshoot end-to-end resource request submission is condor_ce_trace , available in the htcondor-ce-client . Follow this documentation for detailed instructions for installing and using condor_ce_trace . Advanced Troubleshooting \u00b6 HTCondor-CE client This section requires an installation of the htcondor-ce-client . If the issue at hand is complicated or the communication turnaround time with the administrator is too long, it is often more expedient to grant the operator direct access to the HTCondor-CE host. Instead of direct login access, HTCondor-CE has the ability to allow a remote operator to run commands on the host as an unprivileged user. This requires permission to submit resource requests as well as an HTCondor-CE that is configured to run local universe jobs: $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -verbose START_LOCAL_UNIVERSE START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < 20 # at: /etc/condor-ce/config.d/03-managed-fork.conf, line 11 # raw: START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < 20 # default: TotalLocalJobsRunning < 200 After verifying that you can submit resource requests and that the HTCondor-CE supports local universe, use condor_ce_run to run commands on the remote HTCondor-CE host: $ condor_ce_run -lr htcondor-ce.chtc.wisc.edu /bin/sh -c 'condor_q -all' -- Schedd: htcondor-ce.chtc.wisc.edu : <128.104.100.65:9618?... 
@ 10/29/20 17:42:27 OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS nu_lhcb ID: 530251 7/13 22:20 _ _ _ _ 1 530251.0 nu_lhcb ID: 586158 10/28 05:15 _ _ _ 1 1 586158.0 nu_lhcb ID: 586213 10/28 06:23 _ 1 _ _ 1 586213.0 nu_lhcb ID: 586228 10/28 06:23 _ 1 _ _ 1 586228.0 nu_lhcb ID: 586254 10/28 06:23 _ 1 _ _ 1 586254.0 nu_lhcb ID: 586289 10/28 07:58 _ _ _ 1 1 586289.0 Getting Help \u00b6 If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Remote Troubleshooting"},{"location":"v23/troubleshooting/remote-troubleshooting/#troubleshooting-remote-htcondor-ces","text":"Since HTCondor-CE is built on top of HTCondor, it's possible to perform quite a bit of troubleshooting from a remote client with access to the HTCondor command-line tools. For testing end-to-end resource request submission or remote interactive-like access, a grid operator will need access to a host with the htcondor-ce-client installed. This document outlines the steps that a grid operator can perform in order to troubleshoot a remote HTCondor-CE.","title":"Troubleshooting Remote HTCondor-CEs"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-network-connectivity","text":"Before performing any troubleshooting of the remote HTCondor-CE service, it's important to verify that the HTCondor-CE can be contacted on its HTCondor-CE port (default: 9619 ) at the specified fully qualified domain name (FQDN).","title":"Verifying Network Connectivity"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-dns","text":"As noted in the HTCondor-CE installation document , an HTCondor-CE must have forward and reverse DNS records. 
To verify DNS, use a tool like nslookup : $ nslookup htcondor-ce.chtc.wisc.edu Server: 144.92.254.254 Address: 144.92.254.254#53 Non-authoritative answer: Name: htcondor-ce.chtc.wisc.edu Address: 128.104.100.65 Name: htcondor-ce.chtc.wisc.edu Address: 2607:f388:107c:501:216:3eff:fe89:aa3 $ nslookup 128 .104.100.65 65.100.104.128.in-addr.arpa name = htcondor-ce.chtc.wisc.edu. Authoritative answers can be found from: 104.128.in-addr.arpa nameserver = dns2.itd.umich.edu. 104.128.in-addr.arpa nameserver = adns1.doit.wisc.edu. 104.128.in-addr.arpa nameserver = adns3.doit.wisc.edu. 104.128.in-addr.arpa nameserver = adns2.doit.wisc.edu. adns2.doit.wisc.edu internet address = 144.92.20.99 dns2.itd.umich.edu internet address = 192.12.80.222 adns3.doit.wisc.edu internet address = 144.92.104.21 adns1.doit.wisc.edu internet address = 144.92.9.21 adns2.doit.wisc.edu has AAAA address 2607:f388:d:2::1006 adns2.doit.wisc.edu has AAAA address 2607:f388::a53:2 adns3.doit.wisc.edu has AAAA address 2607:f388:2:2001::100b adns3.doit.wisc.edu has AAAA address 2607:f388::a53:3 adns1.doit.wisc.edu has AAAA address 2607:f388:2:2001::100a adns1.doit.wisc.edu has AAAA address 2607:f388::a53:1 If not, the HTCondor-CE administrator will have to register the appropriate DNS records.","title":"Verifying DNS"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-service-connectivity","text":"After verifying DNS, check to see if the remote HTCondor-CE is listening on the appropriate port: $ telnet htcondor-ce.chtc.wisc.edu 9619 Trying 128.104.100.65... Connected to htcondor-ce.chtc.wisc.edu. Escape character is '^]'. 
If not, the HTCondor-CE administrator will have to ensure that the service is running and/or open up their firewall.","title":"Verifying service connectivity"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-configuration","text":"Once you've verified network connectivity, you can start verifying the HTCondor-CE daemons.","title":"Verifying Configuration"},{"location":"v23/troubleshooting/remote-troubleshooting/#inspecting-daemons","text":"To inspect the running daemons of a remote HTCondor-CE, use condor_status : $ condor_status -any -pool htcondor-ce.chtc.wisc.edu:9619 MyType TargetType Name Collector None My Pool - htcondor-ce.chtc.wisc.edu@htcondor-ce.c Job_Router None htcondor-ce@htcondor-ce.chtc.wisc.edu Scheduler None htcondor-ce.chtc.wisc.edu DaemonMaster None htcondor-ce.chtc.wisc.edu Submitter None nu_lhcb@users.htcondor.org If you don't see the appropriate daemons, ask the administrator to follow these troubleshooting steps . Submitter ad When querying daemons for an HTCondor-CE, you may see Submitter ads for each user with jobs in the queue. These ads are used to collect per-user stats that are available to the HTCondor-CE administrator. 
You can inspect the details of a specific daemon with per-daemon and -long options: $ condor_status -pool htcondor-ce.chtc.wisc.edu:9619 -collector -long ActiveQueryWorkers = 0 ActiveQueryWorkersPeak = 1 AddressV1 = \"{[ p=\\\"primary\\\"; a=\\\"128.104.100.65\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ], [ p=\\\"IPv4\\\"; a=\\\"128.104.100.65\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ], [ p=\\\"IPv6\\\"; a=\\\"2607:f388:107c:501:216:3eff:fe89:aa3\\\"; port=9619; n=\\\"Internet\\\"; alias=\\\"htcondor-ce.chtc.wisc.edu\\\"; spid=\\\"collector\\\"; noUDP=true; ]}\" CollectorIpAddr = \"<128.104.100.65:9619?addrs=128.104.100.65-9619+[2607-f388-107c-501-216-3eff-fe89-aa3]-9619&alias=htcondor-ce.chtc.wisc.edu&noUDP&sock=collector>\" CondorAdmin = \"root@htcondor-ce.chtc.wisc.edu\" CondorPlatform = \"$CondorPlatform: x86_64_CentOS7 $\" CondorVersion = \"$CondorVersion: 8.9.8 Jun 29 2020 BuildID: 508520 PackageID: 8.9.8-0.508520 $\" CurrentJobsRunningAll = 0 CurrentJobsRunningGrid = 0 CurrentJobsRunningJava = 0","title":"Inspecting daemons"},{"location":"v23/troubleshooting/remote-troubleshooting/#inspecting-resource-requests","text":"To inspect resource requests submitted to a remote HTCondor-CE, use condor_q : $ condor_q -all -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -- Schedd: htcondor-ce.chtc.wisc.edu : <128.104.100.65:9619?... 
@ 10/29/20 15:31:45 OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS nu_lhcb ID: 24631 10/18 12:06 33 _ _ _ 35 24631.5-15 nu_lhcb ID: 24632 10/18 12:23 7 _ _ _ 9 24632.2-4 nu_lhcb ID: 24635 10/18 14:23 3 _ _ _ 5 24635.0-1 nu_lhcb ID: 24636 10/18 14:40 5 _ _ _ 9 24636.0-6 nu_lhcb ID: 24637 10/18 14:58 7 _ _ _ 8 24637.2 nu_lhcb ID: 24638 10/18 15:15 7 _ _ _ 8 24638.1 You can inspect the details of a specific resource request with the -long option: $ condor_q -all -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -long 24631 .5 Arguments = \"\" BufferBlockSize = 32768 BufferSize = 524288 BytesRecvd = 58669.0 BytesSent = 184201.0 ClusterId = 24631 Cmd = \"DIRAC_Ullq7V_pilotwrapper.py\" CommittedSlotTime = 0 CommittedSuspensionTime = 0 CommittedTime = 0","title":"Inspecting resource requests"},{"location":"v23/troubleshooting/remote-troubleshooting/#retrieving-htcondor-ce-configuration","text":"To verify a remote HTCondor-CE configuration, use condor_config_val : $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -dump # Configuration from master on htcondor-ce.chtc.wisc.edu < 128 .104.100.65:9619?addrs = 128 .104.100.65-9619+ [ 2607 -f388-107c-501-216-3eff-fe89-aa3 ] -9619 & alias = htcondor-ce.chtc.wisc.edu & noUDP & sock = master_1744571_b0a0> ABORT_ON_EXCEPTION = false ACCOUNTANT_HOST = ACCOUNTANT_LOCAL_DOMAIN = ActivationTimer = ifThenElse(JobStart =!= UNDEFINED, (time() - JobStart), 0) ActivityTimer = (time() - EnteredCurrentActivity) ADD_WINDOWS_FIREWALL_EXCEPTION = $(CondorIsAdmin) ADVERTISE_IPV4_FIRST = $(PREFER_IPV4) ALL_DEBUG = D:CAT D_ALWAYS:2 ALLOW_ADMIN_COMMANDS = true If you know the name of the configuration variable, you can query for it directly: $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -verbose JOB_ROUTER_SCHEDD2_NAME JOB_ROUTER_SCHEDD2_NAME = htcondor-ce.chtc.wisc.edu # at: /etc/condor-ce/config.d/02-ce-condor.conf, line 20 # raw: 
JOB_ROUTER_SCHEDD2_NAME = $( FULL_HOSTNAME )","title":"Retrieving HTCondor-CE configuration"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-resource-request-submission","text":"After verifying that all the remote HTCondor-CE daemons are up , you can start submitting resource requests!","title":"Verifying Resource Request Submission"},{"location":"v23/troubleshooting/remote-troubleshooting/#verifying-authentication","text":"Before submitting a successful resource request, you will want to verify that you have submit privileges. For this, you will need a credential such as a grid proxy: $ voms-proxy-info subject : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246/CN=41319870 issuer : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246 identity : /DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246 type : RFC compliant proxy strength : 1024 bits path : /tmp/x509up_u1000 timeleft : 3:55:22 After you have retrieved your credential, verify that you have the ability to submit requests to the remote HTCondor-CE (i.e., WRITE access) with condor_ping : $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS = SCITOKENS,GSI $ export _condor_SEC_TOOL_DEBUG = D_SECURITY:2 # Extremely verbose debugging for troubleshooting authentication issues $ condor_ping -name htcondor-ce.chtc.wisc.edu \\ -pool htcondor-ce.chtc.wisc.edu:9619 \\ -verbose \\ -debug \\ WRITE [...] 
Remote Version: $CondorVersion: 8.9.8 Jun 29 2020 BuildID: 508520 PackageID: 8.9.8-0.508520 $ Local Version: $CondorVersion: 8.9.9 Aug 26 2020 BuildID: 515894 PackageID: 8.9.9-0.515894 PRE-RELEASE-UWCS $ Session ID: htcondor-ce:2980451:1604006441:0 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: FS,GSI Remote Mapping: blin@users.htcondor.org Authorized: TRUE If condor_ping fails, ask the administrator to follow this troubleshooting section , set SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2 , and cross-check the HTCondor-CE SchedLog for authentication issues.","title":"Verifying authentication"},{"location":"v23/troubleshooting/remote-troubleshooting/#submitting-a-trace-request","text":"HTCondor-CE client This section requires an installation of the htcondor-ce-client . The easiest way to troubleshoot end-to-end resource request submission is condor_ce_trace , available in the htcondor-ce-client . Follow this documentation for detailed instructions for installing and using condor_ce_trace .","title":"Submitting a trace request"},{"location":"v23/troubleshooting/remote-troubleshooting/#advanced-troubleshooting","text":"HTCondor-CE client This section requires an installation of the htcondor-ce-client . If the issue at hand is complicated or the communication turnaround time with the administrator is too long, it is often more expedient to grant the operator direct access to the HTCondor-CE host. Instead of direct login access, HTCondor-CE has the ability to allow a remote operator to run commands on the host as an unprivileged user. 
This requires permission to submit resource requests as well as an HTCondor-CE that is configured to run local universe jobs: $ condor_config_val -name htcondor-ce.chtc.wisc.edu -pool htcondor-ce.chtc.wisc.edu:9619 -verbose START_LOCAL_UNIVERSE START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < 20 # at: /etc/condor-ce/config.d/03-managed-fork.conf, line 11 # raw: START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < 20 # default: TotalLocalJobsRunning < 200 After verifying that you can submit resource requests and that the HTCondor-CE supports local universe, use condor_ce_run to run commands on the remote HTCondor-CE host: $ condor_ce_run -lr htcondor-ce.chtc.wisc.edu /bin/sh -c 'condor_q -all' -- Schedd: htcondor-ce.chtc.wisc.edu : <128.104.100.65:9618?... @ 10/29/20 17:42:27 OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS nu_lhcb ID: 530251 7/13 22:20 _ _ _ _ 1 530251.0 nu_lhcb ID: 586158 10/28 05:15 _ _ _ 1 1 586158.0 nu_lhcb ID: 586213 10/28 06:23 _ 1 _ _ 1 586213.0 nu_lhcb ID: 586228 10/28 06:23 _ 1 _ _ 1 586228.0 nu_lhcb ID: 586254 10/28 06:23 _ 1 _ _ 1 586254.0 nu_lhcb ID: 586289 10/28 07:58 _ _ _ 1 1 586289.0","title":"Advanced Troubleshooting"},{"location":"v23/troubleshooting/remote-troubleshooting/#getting-help","text":"If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Getting Help"},{"location":"v6/operation/","text":"Operating an HTCondor-CE \u00b6 To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled then perform the validation steps below. Managing HTCondor-CE services \u00b6 In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. 
The specific services are: Software Service name Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Validating HTCondor-CE \u00b6 To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . Verify that all the necessary daemons are running with condor_ce_status -any . Verify the CE's network configuration using condor_ce_host_network_check . Verify that jobs can complete successfully using condor_ce_trace . Draining an HTCondor-CE \u00b6 To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again. Checking User Authentication \u00b6 The authentication method for submitting jobs to an HTCondor-CE is SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . 
If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. Lines from different client requests may be interleaved. Getting Help \u00b6 If any of the above validation steps fail, consult the troubleshooting guide . If that still doesn't resolve your issue, please contact us for assistance.","title":"Operation"},{"location":"v6/operation/#operating-an-htcondor-ce","text":"To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled then perform the validation steps below.","title":"Operating an HTCondor-CE"},{"location":"v6/operation/#managing-htcondor-ce-services","text":"In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are: Software Service name Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing HTCondor-CE services"},{"location":"v6/operation/#validating-htcondor-ce","text":"To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. 
For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . Verify that all the necessary daemons are running with condor_ce_status -any . Verify the CE's network configuration using condor_ce_host_network_check . Verify that jobs can complete successfully using condor_ce_trace .","title":"Validating HTCondor-CE"},{"location":"v6/operation/#draining-an-htcondor-ce","text":"To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again.","title":"Draining an HTCondor-CE"},{"location":"v6/operation/#checking-user-authentication","text":"The authentication method for submitting jobs to an HTCondor-CE is SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. Lines from different client requests may be interleaved.","title":"Checking User Authentication"},{"location":"v6/operation/#getting-help","text":"If any of the above validation steps fail, consult the troubleshooting guide . 
If that still doesn't resolve your issue, please contact us for assistance.","title":"Getting Help"},{"location":"v6/reference/","text":"Reference \u00b6 Configuration \u00b6 The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (e.g., 99-local.conf will override values in 01-ce-auth.conf ) To see the detailed order in which configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config Users \u00b6 The following users are needed by HTCondor-CE at all sites: User Comment condor HTCondor-CE runs as root but performs most of its operations as the condor user. Certificates \u00b6 File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /etc/grid-security/hostkey.pem Networking \u00b6 Service Name Protocol Port Number Inbound Outbound Comment HTCondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connections to all internal site servers, such as the batch system head-node; only ephemeral outgoing ports are necessary.","title":"Reference"},{"location":"v6/reference/#reference","text":"","title":"Reference"},{"location":"v6/reference/#configuration","text":"The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. 
Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (e.g., 99-local.conf will override values in 01-ce-auth.conf ) To see the detailed order in which configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config","title":"Configuration"},{"location":"v6/reference/#users","text":"The following users are needed by HTCondor-CE at all sites: User Comment condor HTCondor-CE runs as root but performs most of its operations as the condor user.","title":"Users"},{"location":"v6/reference/#certificates","text":"File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /etc/grid-security/hostkey.pem","title":"Certificates"},{"location":"v6/reference/#networking","text":"Service Name Protocol Port Number Inbound Outbound Comment HTCondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connections to all internal site servers, such as the batch system head-node; only ephemeral outgoing ports are necessary.","title":"Networking"},{"location":"v6/releases/","text":"Releases \u00b6 HTCondor-CE 6 is distributed via RPM and is available from the following Yum repositories: HTCondor LTS and Feature Releases The OSG Consortium Known Issues \u00b6 Known bugs affecting HTCondor-CEs can be found in Jira Updating to HTCondor-CE 6 \u00b6 Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or new defaults that have replaced your customizations, respectively. To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -o -name '*.rpmsave' HTCondor-CE 6 is a major release that aligns its security model with HTCondor 9.0's improved security model . 
As such, upgrades from older versions of HTCondor-CE may require manual intervention. HTCondor-CE 6 Version History \u00b6 This section contains release notes for each version of HTCondor-CE 6. Full HTCondor-CE version history can be found on GitHub . 6.0.1 \u00b6 This release includes the following new features: Add grid CA and host certificate/key locations to default SSL search paths Verifies that HTCondor-CE can access the local HTCondor's SPOOL directory Can use condor_ce_trace without SciToken to test batch system integration condor_ce_upgrade_check checks compatibility with HTCondor 23.0 Adds deprecation warnings for old job router configuration syntax 6.0.0 \u00b6 This release includes the following new features: Align HTCondor-CE security configuration with HTCondor defaults Add example configuration on how to ban users Add condor_ce_transform_ads command Improve essential directory checking and creation at startup Getting Help \u00b6 If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Releases"},{"location":"v6/releases/#releases","text":"HTCondor-CE 6 is distributed via RPM and is available from the following Yum repositories: HTCondor LTS and Feature Releases The OSG Consortium","title":"Releases"},{"location":"v6/releases/#known-issues","text":"Known bugs affecting HTCondor-CEs can be found in Jira","title":"Known Issues"},{"location":"v6/releases/#updating-to-htcondor-ce-6","text":"Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or new defaults that have replaced your customizations, respectively. To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -o -name '*.rpmsave' HTCondor-CE 6 is a major release that aligns its security model with HTCondor 9.0's improved security model . 
As such, upgrades from older versions of HTCondor-CE may require manual intervention.","title":"Updating to HTCondor-CE 6"},{"location":"v6/releases/#htcondor-ce-6-version-history","text":"This section contains release notes for each version of HTCondor-CE 6. Full HTCondor-CE version history can be found on GitHub .","title":"HTCondor-CE 6 Version History"},{"location":"v6/releases/#601","text":"This release includes the following new features: Add grid CA and host certificate/key locations to default SSL search paths Verifies that HTCondor-CE can access the local HTCondor's SPOOL directory Can use condor_ce_trace without SciToken to test batch system integration condor_ce_upgrade_check checks compatibility with HTCondor 23.0 Adds deprecation warnings for old job router configuration syntax","title":"6.0.1"},{"location":"v6/releases/#600","text":"This release includes the following new features: Align HTCondor-CE security configuration with HTCondor defaults Add example configuration on how to ban users Add condor_ce_transform_ads command Improve essential directory checking and creation at startup","title":"6.0.0"},{"location":"v6/releases/#getting-help","text":"If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Getting Help"},{"location":"v6/remote-job-submission/","text":"Submitting Jobs Remotely to an HTCondor-CE \u00b6 This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods: With dedicated tools for quickly verifying end-to-end job submission, and From an existing HTCondor submit host, useful for developing pilot submission infrastructure If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation . 
Before Starting \u00b6 Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following: The HTCondor-CE administrator has independently verified their HTCondor-CE The HTCondor-CE administrator has added your credential information (e.g. SciToken or grid proxy) to the HTCondor-CE authentication configuration Your credentials are valid and unexpired Submission with Debugging Tools \u00b6 The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. To use these tools, install the RPM package from the relevant Yum repository : root@host # yum install htcondor-ce-client Verify end-to-end submission \u00b6 The HTCondor-CE client package includes a debugging tool that performs tests of end-to-end job submission called condor_ce_trace . To submit a diagnostic job with condor_ce_trace , run the following command: user@host $ condor_ce_trace --debug Replacing with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. If you do not see the expected output, refer to the troubleshooting guide . CONDOR_CE_TRACE_ATTEMPTS For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS= . (Optional) Requesting resources \u00b6 condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources as configured by the HTCondor-CE you are debugging. To request specific resources (or other job attributes), you can specify the --attribute option on the command line: user@host $ condor_ce_trace --debug \\ --attribute='+resource1=value1'... 
\\ --attribute='+resourceN=valueN' \\ ce.htcondor.org For example, to submit a test job requesting 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue, run the following command: user@host $ condor_ce_trace --debug \\ --attribute='+xcount=4' \\ --attribute='+maxMemory=4000' \\ --attribute='+maxWallTime=120' \\ --attribute='+remote_queue=osg' \\ ce.htcondor.org For a list of other attributes that can be set with the --attribute option, consult the submit file commands section. Note Non-HTCondor batch systems may need additional HTCondor-CE configuration to support these job attributes. See the batch system integration for details on how to support them. Submission with HTCondor Submit \u00b6 If you need to submit more complicated jobs than a trace job as described above (e.g. for developing pilot job submission infrastructures) and have access to an HTCondor submit host, you can use standard HTCondor submission tools. Submit the job \u00b6 To submit jobs to a remote HTCondor-CE (or any other externally facing HTCondor SchedD) from an HTCondor submit host, you need to construct an HTCondor submit file describing an HTCondor-C job : Write a submit file, ce_test.sub : # Required for remote HTCondor-CE submission universe = grid use_x509userproxy = true grid_resource = condor ce.htcondor.org ce.htcondor.org:9619 # Files executable = ce_test.sh output = ce_test.out error = ce_test.err log = ce_test.log # File transfer behavior ShouldTransferFiles = YES WhenToTransferOutput = ON_EXIT # Optional resource requests #+xcount = 4 # Request 4 cores #+maxMemory = 4000 # Request 4GB of RAM #+maxWallTime = 120 # Request 2 hrs of wall clock time #+remote_queue = \"osg\" # Request the OSG queue # Submit a single job queue Replacing ce_test.sh with the path to the executable you wish to run and ce.htcondor.org with the hostname of the CE you wish to test. 
Note The grid_resource line should start with condor and is not related to which batch system you are using. Submit the job: user@host $ condor_submit ce_test.sub Tracking job progress \u00b6 You can track job progress by querying the local queue: user@host $ condor__q As well as the remote HTCondor-CE queue: user@host $ condor__q -name -pool :9619 Replacing with the FQDN of the HTCondor-CE. For reference, condor_q -help status will provide details of job status codes. user@host $ condor_q -help status | tail JobStatus codes: 1 I IDLE 2 R RUNNING 3 X REMOVED 4 C COMPLETED 5 H HELD 6 > TRANSFERRING_OUTPUT 7 S SUSPENDED Troubleshooting \u00b6 All interactions between condor_submit and the HTCondor-CE will be recorded in the file specified by the log command in your submit file. This includes acknowledgement of the job in your local queue, connection to the CE, and a record of job completion: 000 (786.000.000) 12/09 16:49:55 Job submitted from host: <131.225.154.68:53134> ... 027 (786.000.000) 12/09 16:50:09 Job submitted to grid resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 GridJobId: condor ce.htcondor.org ce.htcondor.org:9619 796.0 ... 005 (786.000.000) 12/09 16:52:19 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 0 - Run Bytes Sent By Job 0 - Run Bytes Received By Job 0 - Total Bytes Sent By Job 0 - Total Bytes Received By Job If there are issues contacting the HTCondor-CE, you will see error messages about a Down Globus Resource : 020 (788.000.000) 12/09 16:56:17 Detected Down Globus Resource RM-Contact: ce.htcondor.org ... 
026 (788.000.000) 12/09 16:56:17 Detected Down Grid Resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 This indicates a communication issue with the HTCondor-CE that may be diagnosed with condor_ce_ping . Submit File Commands \u00b6 The following table is a reference of commands that are commonly included in HTCondor submit files used for HTCondor-CE resource allocation requests. A more comprehensive list of submit file commands specific to HTCondor can be found in the HTCondor manual . HTCondor string values If you are setting an attribute to a string value, make sure to enclose the string in double-quotes ( \" ) Command Description arguments Arguments that will be provided to the executable for the resource allocation request. error Path to the file on the client host that stores stderr from the resource allocation request. executable Path to the file on the client host that the resource allocation request will execute. input Path to the file on the client host that stores input to be piped into the stdin of the resource allocation request. +maxMemory The amount of memory in MB that you wish to allocate to the resource allocation request. +maxWallTime The maximum walltime (in minutes) the resource allocation request is allowed to run before it is removed. output Path to the file on the client host that stores stdout from the resource allocation request. +remote_queue Assign resource allocation request to the target queue in the scheduler. transfer_input_files A comma-delimited list of all the files and directories to be transferred into the working directory for the resource allocation request, before the resource allocation request is started. transfer_output_files A comma-delimited list of all the files and directories to be transferred back to the client, after the resource allocation request completes. 
+WantWholeNode When set to True , request entire node for the resource allocation request (HTCondor batch systems only) +xcount The number of cores to allocate for the resource allocation request. Getting Help \u00b6 If you have any questions or issues with job submission, please contact us for assistance.","title":"Submit Jobs Remotely"},{"location":"v6/remote-job-submission/#submitting-jobs-remotely-to-an-htcondor-ce","text":"This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods: With dedicated tools for quickly verifying end-to-end job submission, and From an existing HTCondor submit host, useful for developing pilot submission infrastructure If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation .","title":"Submitting Jobs Remotely to an HTCondor-CE"},{"location":"v6/remote-job-submission/#before-starting","text":"Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following: The HTCondor-CE administrator has independently verified their HTCondor-CE The HTCondor-CE administrator has added your credential information (e.g. SciToken or grid proxy) to the HTCondor-CE authentication configuration Your credentials are valid and unexpired","title":"Before Starting"},{"location":"v6/remote-job-submission/#submission-with-debugging-tools","text":"The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. To use these tools, install the RPM package from the relevant Yum repository : root@host # yum install htcondor-ce-client","title":"Submission with Debugging Tools"},{"location":"v6/remote-job-submission/#verify-end-to-end-submission","text":"The HTCondor-CE client package includes a debugging tool that performs tests of end-to-end job submission called condor_ce_trace . 
To submit a diagnostic job with condor_ce_trace , run the following command: user@host $ condor_ce_trace --debug Replacing with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. If you do not see the expected output, refer to the troubleshooting guide . CONDOR_CE_TRACE_ATTEMPTS For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS= .","title":"Verify end-to-end submission"},{"location":"v6/remote-job-submission/#optional-requesting-resources","text":"condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources as configured by the HTCondor-CE you are debugging. To request specific resources (or other job attributes), you can specify the --attribute option on the command line: user@host $ condor_ce_trace --debug \\ --attribute='+resource1=value1'... \\ --attribute='+resourceN=valueN' \\ ce.htcondor.org For example, the following command submits a test job requesting 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue, run the following command: user@host $ condor_ce_trace --debug \\ --attribute='+xcount=4' \\ --attribute='+maxMemory=4000' \\ --attribute='+maxWallTime=120' \\ --attribute='+remote_queue=osg' \\ ce.htcondor.org For a list of other attributes that can be set with the --attribute option, consult the submit file commands section. Note Non-HTCondor batch systems may need additional HTCondor-CE configuration to support these job attributes. 
See the batch system integration for details on how to support them.","title":"(Optional) Requesting resources"},{"location":"v6/remote-job-submission/#submission-with-htcondor-submit","text":"If you need to submit more complicated jobs than a trace job as described above (e.g. for developing pilot job submission infrastructures) and have access to an HTCondor submit host, you can use standard HTCondor submission tools.","title":"Submission with HTCondor Submit"},{"location":"v6/remote-job-submission/#submit-the-job","text":"To submit jobs to a remote HTCondor-CE (or any other externally facing HTCondor SchedD) from an HTCondor submit host, you need to construct an HTCondor submit file describing an HTCondor-C job : Write a submit file, ce_test.sub : # Required for remote HTCondor-CE submission universe = grid use_x509userproxy = true grid_resource = condor ce.htcondor.org ce.htcondor.org:9619 # Files executable = ce_test.sh output = ce_test.out error = ce_test.err log = ce_test.log # File transfer behavior ShouldTransferFiles = YES WhenToTransferOutput = ON_EXIT # Optional resource requests #+xcount = 4 # Request 4 cores #+maxMemory = 4000 # Request 4GB of RAM #+maxWallTime = 120 # Request 2 hrs of wall clock time #+remote_queue = \"osg\" # Request the OSG queue # Submit a single job queue Replacing ce_test.sh with the path to the executable you wish to run and ce.htcondor.org with the hostname of the CE you wish to test. Note The grid_resource line should start with condor and is not related to which batch system you are using. Submit the job: user@host $ condor_submit ce_test.sub","title":"Submit the job"},{"location":"v6/remote-job-submission/#tracking-job-progress","text":"You can track job progress by querying the local queue: user@host $ condor__q As well as the remote HTCondor-CE queue: user@host $ condor__q -name -pool :9619 Replacing with the FQDN of the HTCondor-CE. For reference, condor_q -help status will provide details of job status codes. 
user@host $ condor_q -help status | tail JobStatus codes: 1 I IDLE 2 R RUNNING 3 X REMOVED 4 C COMPLETED 5 H HELD 6 > TRANSFERRING_OUTPUT 7 S SUSPENDED","title":"Tracking job progress"},{"location":"v6/remote-job-submission/#troubleshooting","text":"All interactions between condor_submit and the HTCondor-CE will be recorded in the file specified by the log command in your submit file. This includes acknowledgement of the job in your local queue, connection to the CE, and a record of job completion: 000 (786.000.000) 12/09 16:49:55 Job submitted from host: <131.225.154.68:53134> ... 027 (786.000.000) 12/09 16:50:09 Job submitted to grid resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 GridJobId: condor ce.htcondor.org ce.htcondor.org:9619 796.0 ... 005 (786.000.000) 12/09 16:52:19 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 0 - Run Bytes Sent By Job 0 - Run Bytes Received By Job 0 - Total Bytes Sent By Job 0 - Total Bytes Received By Job If there are issues contacting the HTCondor-CE, you will see error messages about a Down Globus Resource : 020 (788.000.000) 12/09 16:56:17 Detected Down Globus Resource RM-Contact: ce.htcondor.org ... 026 (788.000.000) 12/09 16:56:17 Detected Down Grid Resource GridResource: condor ce.htcondor.org ce.htcondor.org:9619 This indicates a communication issue with the HTCondor-CE that may be diagnosed with condor_ce_ping .","title":"Troubleshooting"},{"location":"v6/remote-job-submission/#submit-file-commands","text":"The following table is a reference of commands that are commonly included in HTCondor submit files used for HTCondor-CE resource allocation requests. A more comprehensive list of submit file commands specific to HTCondor can be found in the HTCondor manual . 
HTCondor string values If you are setting an attribute to a string value, make sure to enclose the string in double-quotes ( \" ) Command Description arguments Arguments that will be provided to the executable for the resource allocation request. error Path to the file on the client host that stores stderr from the resource allocation request. executable Path to the file on the client host that the resource allocation request will execute. input Path to the file on the client host that stores input to be piped into the stdin of the resource allocation request. +maxMemory The amount of memory in MB that you wish to allocate to the resource allocation request. +maxWallTime The maximum walltime (in minutes) the resource allocation request is allowed to run before it is removed. output Path to the file on the client host that stores stdout from the resource allocation request. +remote_queue Assign resource allocation request to the target queue in the scheduler. transfer_input_files A comma-delimited list of all the files and directories to be transferred into the working directory for the resource allocation request, before the resource allocation request is started. transfer_output_files A comma-delimited list of all the files and directories to be transferred back to the client, after the resource allocation request completes. +WantWholeNode When set to True , request entire node for the resource allocation request (HTCondor batch systems only) +xcount The number of cores to allocate for the resource allocation request.","title":"Submit File Commands"},{"location":"v6/remote-job-submission/#getting-help","text":"If you have any questions or issues with job submission, please contact us for assistance.","title":"Getting Help"},{"location":"v6/configuration/authentication/","text":"Configuring Authentication \u00b6 To authenticate job submission from external users and VOs, the HTCondor-CE service uses X.509 certificates for SciTokens and SSL authentication. 
Built-in Mapfiles \u00b6 HTCondor-CE uses unified HTCondor mapfiles stored in /etc/condor-ce/mapfiles.d/*.conf to map incoming jobs with credentials to local Unix accounts. These files are parsed in lexicographic order and HTCondor-CE will use the first line that matches for the authentication method that the client and your HTCondor-CE negotiates. Each mapfile line consists of three fields: HTCondor authentication method Incoming credential principal formatted as a Perl Compatible Regular Expression (PCRE) Local account Applying mapping changes When changing your HTCondor-CE mappings, run condor_ce_reconfig to apply your changes. SciTokens \u00b6 To allow clients with SciToken or WLCG tokens to submit jobs to your HTCondor-CE, add lines of the following format: SCITOKENS /,/ Replacing (escaping any / with \\/ , , and with the token issuer ( iss ), token subject ( sub ), and the unix account under which the job should run, respectively. For example, to map any token from the OSG VO regardless of the token sub , add the following line to a *.conf file in /etc/condor-ce/mapfiles.d/ : SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg Configuring Certificates \u00b6 HTCondor-CE uses X.509 host certificates and certificate authorities (CAs) when authenticating SciToken and SSL connections. By default, HTCondor-CE uses the default system locations to locate CAs and host certificate when authenticating SciToken and SSL connections. But traditionally, CEs and their clients have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your HTCondor-CE to use grid or system certificates for authentication: If your SSL or SciTokens clients will be interacting with your CE using grid certificates or you are using a grid certificate as your host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key Next Steps \u00b6 At this point, you should have an HTCondor-CE that will take credentials from incoming jobs and map them to local Unix accounts. 
The next step is to configure the CE for your local batch system so that HTCondor-CE knows where to route your jobs.","title":"Authentication"},{"location":"v6/configuration/authentication/#configuring-authentication","text":"To authenticate job submission from external users and VOs, the HTCondor-CE service uses X.509 certificates for SciTokens and SSL authentication.","title":"Configuring Authentication"},{"location":"v6/configuration/authentication/#built-in-mapfiles","text":"HTCondor-CE uses unified HTCondor mapfiles stored in /etc/condor-ce/mapfiles.d/*.conf to map incoming jobs with credentials to local Unix accounts. These files are parsed in lexicographic order and HTCondor-CE will use the first line that matches for the authentication method that the client and your HTCondor-CE negotiates. Each mapfile line consists of three fields: HTCondor authentication method Incoming credential principal formatted as a Perl Compatible Regular Expression (PCRE) Local account Applying mapping changes When changing your HTCondor-CE mappings, run condor_ce_reconfig to apply your changes.","title":"Built-in Mapfiles"},{"location":"v6/configuration/authentication/#scitokens","text":"To allow clients with SciToken or WLCG tokens to submit jobs to your HTCondor-CE, add lines of the following format: SCITOKENS /,/ Replacing (escaping any / with \\/ , , and with the token issuer ( iss ), token subject ( sub ), and the unix account under which the job should run, respectively. For example, to map any token from the OSG VO regardless of the token sub , add the following line to a *.conf file in /etc/condor-ce/mapfiles.d/ : SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg","title":"SciTokens"},{"location":"v6/configuration/authentication/#configuring-certificates","text":"HTCondor-CE uses X.509 host certificates and certificate authorities (CAs) when authenticating SciToken and SSL connections. 
By default, HTCondor-CE uses the default system locations to locate CAs and host certificate when authenticating SciToken and SSL connections. But traditionally, CEs and their clients have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . Choose one of the following options to configure your HTCondor-CE to use grid or system certificates for authentication: If your SSL or SciTokens clients will be interacting with your CE using grid certificates or you are using a grid certificate as your host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key","title":"Configuring 
Certificates"},{"location":"v6/configuration/authentication/#next-steps","text":"At this point, you should have an HTCondor-CE that will take credentials from incoming jobs and map them to local Unix accounts. The next step is to configure the CE for your local batch system so that HTCondor-CE knows where to route your jobs.","title":"Next Steps"},{"location":"v6/configuration/htcondor-routes/","text":"For HTCondor Batch Systems \u00b6 This page contains information about job routes that can be used if you are running an HTCondor pool at your site. Setting periodic hold or release \u00b6 Avoid setting PERIODIC_REMOVE expressions The HTCondor Job Router will automatically resubmit jobs that are removed by the underlying batch system, which can result in unintended churn. Therefore, it is recommended to append removal expressions to HTCondor-CE's configuration by adding the following to a file in /etc/condor-ce/config.d/ SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || To release or put routed jobs on hold if they meet certain criteria, use the Periodic* family of attributes. By default, periodic expressions are evaluated once every 300 seconds but this can be changed by setting PERIODIC_EXPR_INTERVAL in your local HTCondor configuration. In this example, we set the routed job on hold if the job is idle and has been started at least once or if the job has tried to start more than once. This will catch jobs which are starting and stopping multiple times. 
ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once SET PeriodicHold ((NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1) # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason SET PeriodicRelease = (HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1; # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason set_PeriodicRelease = HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting routed job requirements \u00b6 If you need to set requirements on your routed job, you will need to use SET REQUIREMENTS or set_Requirements instead of Requirements for ClassAd transform and deprecated syntaxes, respectively. The Requirements attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the HTCondor manual . 
To ensure that your job lands on a Linux machine in your pool: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @jrt UNIVERSE VANILLA SET Requirements = (TARGET.OpSys == \"LINUX\") @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Requirements = (TARGET.OpSys == \"LINUX\"); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Preserving original job requirements \u00b6 To preserve and include the original job requirements, rather than just setting new requirements, you can use COPY Requirements or copy_Requirements to store the current value of Requirements to another variable, which we'll call original_requirements . To do this, replace the above SET Requirements or set_Requirements lines with: ClassAd Transform SET Requirements = ($(MY.Requirements)) && () Deprecated Syntax copy_Requirements = \"original_requirements\"; set_Requirements = original_requirements && ...; Setting the accounting group based on the credential of the submitted job \u00b6 A common need in the CE is to want to set the accounting identity of the routed job using information from the credential of the submitter of the job. This originally was done using information from the x509 certificate, in particular X509UserProxyVOName and x509UserProxySubject . With the switch to SCITOKENs, the equivalent job attributes are AuthTokenIssuer and AuthTokenSubject . It is important to understand that the condor_schedd treats AuthTokenSubject and AuthTokenIssuer as secure attributes. The values of these attributes cannot be supplied by the condor_job_router directly, they will be set based on what credential the condor_job_router uses to submit the routed job. Because of this the value of these attributes in the routed job is almost never the same as the value in the original job. This is different from the way the x509* job attributes behaved. 
Because of this, the default CE config will copy all attributes that match AuthToken* to orig_AuthToken* before the route transforms are applied. Example of setting the accounting group from AuthToken or x509 attributes. ClassAd Transform JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap CLASSAD_USER_MAPFILE_AcctGroupMap = JOB_ROUTER_TRANSFORM_SetAcctGroup @=end REQUIREMENTS (orig_AuthTokenSubject ?: x509UserProxySubject) isnt undefined EVALSET AcctGroup UserMap(\"AcctGroupMap\", orig_AuthTokenSubject ?: x509UserProxySubject, AcctGroup) EVALSET AccountingGroup join(\".\", AcctGroup, Owner) @end JOB_ROUTER_PRE_ROUTE_TRANSFORMS = $(JOB_ROUTER_PRE_ROUTE_TRANSFORMS) SetAcctGroup Refer to the HTCondor documentation for information on mapfiles . Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"For HTCondor Batch Systems"},{"location":"v6/configuration/htcondor-routes/#for-htcondor-batch-systems","text":"This page contains information about job routes that can be used if you are running an HTCondor pool at your site.","title":"For HTCondor Batch Systems"},{"location":"v6/configuration/htcondor-routes/#setting-periodic-hold-or-release","text":"Avoid setting PERIODIC_REMOVE expressions The HTCondor Job Router will automatically resubmit jobs that are removed by the underlying batch system, which can result in unintended churn. Therefore, it is recommended to append removal expressions to HTCondor-CE's configuration by adding the following to a file in /etc/condor-ce/config.d/ SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || To release or put routed jobs on hold if they meet certain criteria, use the Periodic* family of attributes. By default, periodic expressions are evaluated once every 300 seconds but this can be changed by setting PERIODIC_EXPR_INTERVAL in your local HTCondor configuration. 
In this example, we set the routed job on hold if the job is idle and has been started at least once or if the job has tried to start more than once. This will catch jobs which are starting and stopping multiple times. ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once SET PeriodicHold ((NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1) # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason SET PeriodicRelease = (HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Puts the routed job on hold if the job's been idle and has been started at least # once or if the job has tried to start more than once set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1; # Release routed jobs if the condor_starter couldn't start the executable and # 'VMGAHP_ERR_INTERNAL' is in the HoldReason set_PeriodicRelease = HoldReasonCode == 6 && regexp(\"VMGAHP_ERR_INTERNAL\", HoldReason); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting periodic hold or release"},{"location":"v6/configuration/htcondor-routes/#setting-routed-job-requirements","text":"If you need to set requirements on your routed job, you will need to use SET REQUIREMENTS or set_Requirements instead of Requirements for ClassAd transform and deprecated syntaxes, respectively. The Requirements attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the HTCondor manual . 
To ensure that your job lands on a Linux machine in your pool: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Requirements = (TARGET.OpSys == \"LINUX\") @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Requirements = (TARGET.OpSys == \"LINUX\"); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting routed job requirements"},{"location":"v6/configuration/htcondor-routes/#preserving-original-job-requirements","text":"To preserve and include the original job requirements, rather than just setting new requirements, you can use COPY Requirements or copy_Requirements to store the current value of Requirements to another variable, which we'll call original_requirements . To do this, replace the above SET Requirements or set_Requirements lines with: ClassAd Transform SET Requirements = ($(MY.Requirements)) && () Deprecated Syntax copy_Requirements = \"original_requirements\"; set_Requirements = original_requirements && ...;","title":"Preserving original job requirements"},{"location":"v6/configuration/htcondor-routes/#setting-the-accounting-group-based-on-the-credential-of-the-submitted-job","text":"A common need in the CE is to set the accounting identity of the routed job using information from the credential of the submitter of the job. This originally was done using information from the x509 certificate, in particular X509UserProxyVOName and x509UserProxySubject . With the switch to SCITOKENs, the equivalent job attributes are AuthTokenIssuer and AuthTokenSubject . It is important to understand that the condor_schedd treats AuthTokenSubject and AuthTokenIssuer as secure attributes. The values of these attributes cannot be supplied by the condor_job_router directly; they will be set based on what credential the condor_job_router uses to submit the routed job. 
Because of this the value of these attributes in the routed job is almost never the same as the value in the original job. This is different from the way the x509* job attributes behaved. Because of this, the default CE config will copy all attributes that match AuthToken* to orig_AuthToken* before the route transforms are applied. Example of setting the accounting group from AuthToken or x509 attributes. ClassAd Transform JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap CLASSAD_USER_MAPFILE_AcctGroupMap = JOB_ROUTER_TRANSFORM_SetAcctGroup @=end REQUIREMENTS (orig_AuthTokenSubject ?: x509UserProxySubject) isnt undefined EVALSET AcctGroup UserMap(\"AcctGroupMap\", orig_AuthTokenSubject ?: x509UserProxySubject, AcctGroup) EVALSET AccountingGroup join(\".\", AcctGroup, Owner) @end JOB_ROUTER_PRE_ROUTE_TRANSFORMS = $(JOB_ROUTER_PRE_ROUTE_TRANSFORMS) SetAcctGroup Refer to the HTCondor documentation for information on mapfiles .","title":"Setting the accounting group based on the credential of the submitted job"},{"location":"v6/configuration/htcondor-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v6/configuration/job-router-overview/","text":"Job Router Configuration Overview \u00b6 The HTCondor Job Router is at the heart of HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. Customizations are made in the form of job routes where each route corresponds to a separate job transformation: If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed job') that is then submitted to the batch system. The CE package comes with default routes located in /etc/condor-ce/config.d/02-ce-*.conf that provide enough basic functionality for a small site. 
If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an overview of how to configure your HTCondor-CE Job Router. Definitions Incoming Job : A job which was submitted to HTCondor-CE from an external source. Routed Job : A job that has been transformed by the Job Router. Route Syntaxes \u00b6 HTCondor-CE 5 introduces the ability to write job routes using ClassAd transform syntax in addition to the existing configuration syntax . The old route configuration syntax continues to be the default in HTCondor-CE 5 but there are benefits to transitioning to the new syntax as outlined below . ClassAd transforms \u00b6 The HTCondor ClassAd transforms were originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 adds the configuration necessary to support routes written as ClassAd transforms. If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms in the following order: Each transform in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES whose requirements are met by the job The first transform from JOB_ROUTER_ROUTE_NAMES whose requirements are met by the job. See the section on route matching below. Each transform in JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES whose requirements are met by the job Deprecated syntax \u00b6 Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed for V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. 
Since the inception of HTCondor-CE, job routes have been written as a list of ClassAds . Each job route\u2019s ClassAd is constructed by combining each entry from the JOB_ROUTER_ENTRIES with the JOB_ROUTER_DEFAULTS : JOB_ROUTER_ENTRIES is a configuration variable whose default is set in /etc/condor-ce/config.d/02-ce-*.conf but may be overridden by the administrator in subsequent files in /etc/condor-ce/config.d/ . JOB_ROUTER_DEFAULTS is a generated configuration variable that sets default job route values that are required for HTCondor-CE's functionality. To view its contents in a readable format, run the following command: user@host $ condor_ce_config_val JOB_ROUTER_DEFAULTS | sed 's/;/;\\n/g' Take care when modifying attributes in JOB_ROUTER_DEFAULTS : you may add new attributes and override attributes that are set_* in JOB_ROUTER_DEFAULTS . The following may break your HTCondor-CE Do not set the JOB_ROUTER_DEFAULTS configuration variable yourself. This will cause the CE to stop functioning. If a value is set in JOB_ROUTER_DEFAULTS with eval_set_ , override it by using eval_set_ in the JOB_ROUTER_ENTRIES . Do this at your own risk as it may cause the CE to break. Choosing a syntax \u00b6 For existing HTCondor-CEs, it's recommended that administrators continue to use the deprecated syntax (the default) and transition to ClassAd transforms at their own pace. For new HTCondor-CEs, it's recommended that administrators start with ClassAd transforms. The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
How Jobs Match to Routes \u00b6 The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in condor_ce_q ) that meet the following constraints: The job has not already been considered by the Job Router The job's universe is standard or vanilla If the incoming job meets the above constraints, then the job is matched to the first route in JOB_ROUTER_ROUTE_NAMES whose requirements are satisfied by the job's ClassAd. Additionally: If you are using the ClassAd transform syntax , transforms in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES may also have their own requirements that determine whether or not that transform is applied. If you are using the deprecated syntax , you may configure the Job Router to evenly distribute jobs across all matching routes (i.e., round-robin matching). To do so, add the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUND_ROBIN_SELECTION = True Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Overview"},{"location":"v6/configuration/job-router-overview/#job-router-configuration-overview","text":"The HTCondor Job Router is at the heart of HTCondor-CE and allows admins to transform and direct jobs to specific batch systems. Customizations are made in the form of job routes where each route corresponds to a separate job transformation: If an incoming job matches a job route's requirements, the route creates a transformed job (referred to as the 'routed job') that is then submitted to the batch system. The CE package comes with default routes located in /etc/condor-ce/config.d/02-ce-*.conf that provide enough basic functionality for a small site. 
If you have needs beyond delegating all incoming jobs to your batch system as they are, this document provides an overview of how to configure your HTCondor-CE Job Router. Definitions Incoming Job : A job which was submitted to HTCondor-CE from an external source. Routed Job : A job that has been transformed by the Job Router.","title":"Job Router Configuration Overview"},{"location":"v6/configuration/job-router-overview/#route-syntaxes","text":"HTCondor-CE 5 introduces the ability to write job routes using ClassAd transform syntax in addition to the existing configuration syntax . The old route configuration syntax continues to be the default in HTCondor-CE 5 but there are benefits to transitioning to the new syntax as outlined below .","title":"Route Syntaxes"},{"location":"v6/configuration/job-router-overview/#classad-transforms","text":"The HTCondor ClassAd transforms were originally introduced to HTCondor to perform in-place transformations of user jobs submitted to an HTCondor pool. In the HTCondor 8.9 series, the Job Router was updated to support transforms and HTCondor-CE 5 adds the configuration necessary to support routes written as ClassAd transforms. If configured to use transform-based routes, HTCondor-CE routes and transforms jobs by chaining ClassAd transforms in the following order: Each transform in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES whose requirements are met by the job The first transform from JOB_ROUTER_ROUTE_NAMES whose requirements are met by the job. See the section on route matching below. Each transform in JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES whose requirements are met by the job","title":"ClassAd transforms"},{"location":"v6/configuration/job-router-overview/#deprecated-syntax","text":"Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed for V24 of the HTCondor Software Suite. 
New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. Since the inception of HTCondor-CE, job routes have been written as a list of ClassAds . Each job route\u2019s ClassAd is constructed by combining each entry from the JOB_ROUTER_ENTRIES with the JOB_ROUTER_DEFAULTS : JOB_ROUTER_ENTRIES is a configuration variable whose default is set in /etc/condor-ce/config.d/02-ce-*.conf but may be overridden by the administrator in subsequent files in /etc/condor-ce/config.d/ . JOB_ROUTER_DEFAULTS is a generated configuration variable that sets default job route values that are required for HTCondor-CE's functionality. To view its contents in a readable format, run the following command: user@host $ condor_ce_config_val JOB_ROUTER_DEFAULTS | sed 's/;/;\\n/g' Take care when modifying attributes in JOB_ROUTER_DEFAULTS : you may add new attributes and override attributes that are set_* in JOB_ROUTER_DEFAULTS . The following may break your HTCondor-CE Do not set the JOB_ROUTER_DEFAULTS configuration variable yourself. This will cause the CE to stop functioning. If a value is set in JOB_ROUTER_DEFAULTS with eval_set_ , override it by using eval_set_ in the JOB_ROUTER_ENTRIES . Do this at your own risk as it may cause the CE to break.","title":"Deprecated syntax"},{"location":"v6/configuration/job-router-overview/#choosing-a-syntax","text":"For existing HTCondor-CEs, it's recommended that administrators continue to use the deprecated syntax (the default) and transition to ClassAd transforms at their own pace. For new HTCondor-CEs, it's recommended that administrators start with ClassAd transforms. 
The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively.","title":"Choosing a syntax"},{"location":"v6/configuration/job-router-overview/#how-jobs-match-to-routes","text":"The Job Router considers incoming jobs in the HTCondor-CE SchedD (i.e., jobs visible in condor_ce_q ) that meet the following constraints: The job has not already been considered by the Job Router The job's universe is standard or vanilla If the incoming job meets the above constraints, then the job is matched to the first route in JOB_ROUTER_ROUTE_NAMES whose requirements are satisfied by the job's ClassAd. Additionally: If you are using the ClassAd transform syntax , transforms in JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES may also have their own requirements that determine whether or not that transform is applied. If you are using the deprecated syntax , you may configure the Job Router to evenly distribute jobs across all matching routes (i.e., round-robin matching). To do so, add the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUND_ROBIN_SELECTION = True","title":"How Jobs Match to Routes"},{"location":"v6/configuration/job-router-overview/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v6/configuration/local-batch-system/","text":"Configuring for the Local Batch System \u00b6 Before HTCondor-CE can submit jobs to your local batch system, it has to be configured to do so. 
The configuration will differ depending on if your local batch system is HTCondor or one of the other supported batch systems. Choose the section corresponding to your batch system below. HTCondor Batch Systems \u00b6 To configure HTCondor-CE for an HTCondor batch system, set JOB_ROUTER_SCHEDD2_POOL to your site's central manager host and port: JOB_ROUTER_SCHEDD2_POOL = cm.chtc.wisc.edu:9618 Additionally, set JOB_ROUTER_SCHEDD2_SPOOL to the location of the local batch SPOOL directory on the CE host if it is different than the default location ( /var/lib/condor/spool ). Non-HTCondor Batch Systems \u00b6 Configuring the BLAHP \u00b6 HTCondor-CE uses the Batch Language ASCII Helper Protocol (BLAHP) to submit and track jobs to non-HTCondor batch systems. If your batch system tools are installed in a non-standard location (i.e., outside of /usr/bin/ ), set the corresponding *_binpath variable in /etc/blah.config to the directory containing your batch system tools: If your batch system is... Then change the following configuration variable... LSF lsf_binpath PBS/Torque pbs_binpath SGE sge_binpath Slurm slurm_binpath For example, if your Slurm binaries (e.g. sbatch ) exist in /opt/slurm/bin , you would set the following: slurm_binpath=/opt/slurm/bin/ Sharing the SPOOL directory \u00b6 Non-HTCondor batch systems require a shared file system configuration to support file transfer from the HTCondor-CE to your site's worker nodes. The current recommendation is to run a dedicated NFS server on the CE host . In this setup, HTCondor-CE writes to the local spool directory, the NFS server shares the directory, and each worker node mounts the directory in the same location as on the CE. For example, if your spool directory is /var/lib/condor-ce (the default), you must mount the shared directory to /var/lib/condor-ce on the worker nodes. 
Note If you choose not to host the NFS server on your CE, you will need to turn off root squash so that the HTCondor-CE daemons can write to the spool directory. You can control the value of the spool directory by setting SPOOL in /etc/condor-ce/config.d/99-local.conf (create this file if it doesn't exist). For example, the following sets the SPOOL directory to /home/condor : SPOOL = /home/condor Note The shared spool directory must be readable and writeable by the condor user for HTCondor-CE to function correctly.","title":"Local Batch System"},{"location":"v6/configuration/local-batch-system/#configuring-for-the-local-batch-system","text":"Before HTCondor-CE can submit jobs to your local batch system, it has to be configured to do so. The configuration will differ depending on if your local batch system is HTCondor or one of the other supported batch systems. Choose the section corresponding to your batch system below.","title":"Configuring for the Local Batch System"},{"location":"v6/configuration/local-batch-system/#htcondor-batch-systems","text":"To configure HTCondor-CE for an HTCondor batch system, set JOB_ROUTER_SCHEDD2_POOL to your site's central manager host and port: JOB_ROUTER_SCHEDD2_POOL = cm.chtc.wisc.edu:9618 Additionally, set JOB_ROUTER_SCHEDD2_SPOOL to the location of the local batch SPOOL directory on the CE host if it is different than the default location ( /var/lib/condor/spool ).","title":"HTCondor Batch Systems"},{"location":"v6/configuration/local-batch-system/#non-htcondor-batch-systems","text":"","title":"Non-HTCondor Batch Systems"},{"location":"v6/configuration/local-batch-system/#configuring-the-blahp","text":"HTCondor-CE uses the Batch Language ASCII Helper Protocol (BLAHP) to submit and track jobs to non-HTCondor batch systems. 
If your batch system tools are installed in a non-standard location (i.e., outside of /usr/bin/ ), set the corresponding *_binpath variable in /etc/blah.config to the directory containing your batch system tools: If your batch system is... Then change the following configuration variable... LSF lsf_binpath PBS/Torque pbs_binpath SGE sge_binpath Slurm slurm_binpath For example, if your Slurm binaries (e.g. sbatch ) exist in /opt/slurm/bin , you would set the following: slurm_binpath=/opt/slurm/bin/","title":"Configuring the BLAHP"},{"location":"v6/configuration/local-batch-system/#sharing-the-spool-directory","text":"Non-HTCondor batch systems require a shared file system configuration to support file transfer from the HTCondor-CE to your site's worker nodes. The current recommendation is to run a dedicated NFS server on the CE host . In this setup, HTCondor-CE writes to the local spool directory, the NFS server shares the directory, and each worker node mounts the directory in the same location as on the CE. For example, if your spool directory is /var/lib/condor-ce (the default), you must mount the shared directory to /var/lib/condor-ce on the worker nodes. Note If you choose not to host the NFS server on your CE, you will need to turn off root squash so that the HTCondor-CE daemons can write to the spool directory. You can control the value of the spool directory by setting SPOOL in /etc/condor-ce/config.d/99-local.conf (create this file if it doesn't exist). For example, the following sets the SPOOL directory to /home/condor : SPOOL = /home/condor Note The shared spool directory must be readable and writeable by the condor user for HTCondor-CE to function correctly.","title":"Sharing the SPOOL directory"},{"location":"v6/configuration/non-htcondor-routes/","text":"For Non-HTCondor Batch Systems \u00b6 This page contains information about job routes that can be used if you are running a non-HTCondor pool at your site. 
Setting a default batch queue \u00b6 To set a default queue for routed jobs, set the variable or attribute default_queue for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" default_queue = osg_queue @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_default_queue = \"osg_queue\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Setting batch system directives \u00b6 To write batch system directives that are not supported in the route examples above, you will need to edit the job submit script for your local batch system in /etc/blahp/ (e.g., if your local batch system is Slurm, edit /etc/blahp/slurm_local_submit_attributes.sh ). This file is sourced during submit time and anything printed to stdout is appended to the generated batch system job submit script. ClassAd attributes can be passed from the routed job to the local submit attributes script via default_CERequirements attribute, which takes a comma-separated list of other attributes: ClassAd Transform SET foo = \"X\" SET bar = \"Y\" SET default_CERequirements = \"foo,bar\" Deprecated Syntax set_foo = \"X\"; set_bar = \"Y\"; set_default_CERequirements = \"foo,bar\"; This sets foo to the string X and bar to the string Y in the environment of the local submit attributes script. 
The following example sets the maximum walltime to 1 hour and the accounting group to the x509UserProxyFirstFQAN attribute of the job submitted to a PBS batch system: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" SET Walltime = 3600 SET AccountingGroup = x509UserProxyFirstFQAN SET default_CERequirements = \"WallTime,AccountingGroup\" @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_Walltime = 3600; set_AccountingGroup = x509UserProxyFirstFQAN; set_default_CERequirements = \"WallTime,AccountingGroup\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster With /etc/blahp/pbs_local_submit_attributes.sh containing: #!/bin/bash echo \"#PBS -l walltime=$Walltime\" echo \"#PBS -A $AccountingGroup\" This results in the following being appended to the script that gets submitted to your batch system: #PBS -l walltime=3600 #PBS -A Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"For Non-HTCondor Batch Systems"},{"location":"v6/configuration/non-htcondor-routes/#for-non-htcondor-batch-systems","text":"This page contains information about job routes that can be used if you are running a non-HTCondor pool at your site.","title":"For Non-HTCondor Batch Systems"},{"location":"v6/configuration/non-htcondor-routes/#setting-a-default-batch-queue","text":"To set a default queue for routed jobs, set the variable or attribute default_queue for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" default_queue = osg_queue @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_default_queue = \"osg_queue\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster","title":"Setting a default batch 
queue"},{"location":"v6/configuration/non-htcondor-routes/#setting-batch-system-directives","text":"To write batch system directives that are not supported in the route examples above, you will need to edit the job submit script for your local batch system in /etc/blahp/ (e.g., if your local batch system is Slurm, edit /etc/blahp/slurm_local_submit_attributes.sh ). This file is sourced during submit time and anything printed to stdout is appended to the generated batch system job submit script. ClassAd attributes can be passed from the routed job to the local submit attributes script via default_CERequirements attribute, which takes a comma-separated list of other attributes: ClassAd Transform SET foo = \"X\" SET bar = \"Y\" SET default_CERequirements = \"foo,bar\" Deprecated Syntax set_foo = \"X\"; set_bar = \"Y\"; set_default_CERequirements = \"foo,bar\"; This sets foo to the string X and bar to the string Y in the environment of the local submit attributes script. The following example sets the maximum walltime to 1 hour and the accounting group to the x509UserProxyFirstFQAN attribute of the job submitted to a PBS batch system: ClassAd Transform JOB_ROUTER_ROUTE_Slurm_Cluster @=jrt GridResource = \"batch slurm\" SET Walltime = 3600 SET AccountingGroup = x509UserProxyFirstFQAN SET default_CERequirements = \"WallTime,AccountingGroup\" @jrt JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ GridResource = \"batch slurm\"; name = \"Slurm_Cluster\"; set_Walltime = 3600; set_AccountingGroup = x509UserProxyFirstFQAN; set_default_CERequirements = \"WallTime,AccountingGroup\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Slurm_Cluster With /etc/blahp/pbs_local_submit_attributes.sh containing: #!/bin/bash echo \"#PBS -l walltime=$Walltime\" echo \"#PBS -A $AccountingGroup\" This results in the following being appended to the script that gets submitted to your batch system: #PBS -l walltime=3600 #PBS -A ","title":"Setting batch system 
directives"},{"location":"v6/configuration/non-htcondor-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v6/configuration/optional-configuration/","text":"Optional Configuration \u00b6 The following configuration steps are optional and will not be required for all sites. If you do not need any of the following special configurations, skip to the page for verifying your HTCondor-CE . Configuring for Multiple Network Interfaces \u00b6 If you have multiple network interfaces with different hostnames, the HTCondor-CE daemons need to know which hostname and interface to use when communicating with each other. Set NETWORK_HOSTNAME and NETWORK_INTERFACE to the hostname and IP address of your public interface, respectively, in /etc/condor-ce/config.d/99-local.conf directory with the line: NETWORK_HOSTNAME = condorce.example.com NETWORK_INTERFACE = 127.0.0.1 Replacing condorce.example.com text with your public interface\u2019s hostname and 127.0.0.1 with your public interface\u2019s IP address. Limiting or Disabling Locally Running Jobs \u00b6 If you want to limit or disable jobs running locally on your CE, you will need to configure HTCondor-CE's local and scheduler universes. Local and scheduler universes allow jobs to be run on the CE itself, mainly for remote troubleshooting. Pilot jobs will not run as local/scheduler universe jobs so leaving them enabled does NOT turn your CE into another worker node. The two universes are effectively the same (scheduler universe launches a starter process for each job), so we will be configuring them in unison. 
To change the default limit on the number of locally run jobs (the current default is 20), add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Where is the maximum number of jobs allowed to run locally To only allow a specific user to start locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = target.Owner =?= \"\" START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Change for the username allowed to run jobs locally To disable locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = False START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Inserting IDTOKENs into the routed job's sandbox \u00b6 If you want to insert IDTOKENS into the routed job's sandbox you can use the SendIDTokens route command, or the JOB_ROUTER_SEND_ROUTE_IDTOKENS global configuration variable. Tokens sent using this mechanism must be named and declared using the JOB_ROUTER_CREATE_IDTOKEN_NAMES and JOB_ROUTER_CREATE_IDTOKEN_ configuration variables. Tokens whose names are declared in the JOB_ROUTER_SEND_ROUTE_IDTOKENS configuration variable are sent by default for each route that does not have a SendIDTokens command. 
To declare IDTOKENS for inclusion in glide-in jobs for the purpose of advertising to a collector add something like the following to /etc/condor-ce/config.d/99-local-ce-token.conf : JOB_ROUTER_CREATE_IDTOKEN_NAMES = name1 name2 JOB_ROUTER_CREATE_IDTOKEN_name1 @=end sub = \"name1@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name1\" filename = \"ce_name1.idtoken\" owner = \"owner1\" @end JOB_ROUTER_CREATE_IDTOKEN_Name2 @=end sub = \"name2@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name2\" filename = \"ce_name2.idtoken\" owner = \"owner2\" @end To insert one of the above IDTOKENS in the sandbox of a routed job , include the token name in the SendIDTokens route command like this. SendIDTokens = \"Name2\" Route commands SendIDTokens is a route command, not a job attribute. This means that you will not be able to manipulate it through transform verbs such as EVALSET . To add an IDTOKEN to a routed job in addition to the default tokens , build a string containing the token name along with the value of the global configuration variable like this. SendIDTokens = \"Name2 $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\" You can use an attribute of the source job to choose the IDTOKEN by writing an expression like this. SendIDTokens = strcat( My.Owner, \" $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\") It is presumed that the value of My.Owner above is the same as the of an IDTOKEN and as the owner field of that token. For instance, the Fermilab CE config uses the above SendIDTokens expression and the following token declarations at the time of this guide. 
JOB_ROUTER_CREATE_IDTOKEN_NAMES = fermilab3 osg JOB_ROUTER_CREATE_IDTOKEN_fermilab3 @=end sub = \"fermilabpilot@fnal.gov\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_fermilab3.idtoken\" owner = \"fermilab\" @end JOB_ROUTER_CREATE_IDTOKEN_osg @=end sub = \"osgpilot@fnal.gov\" kid = \"POOL\" lifetime = 600 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_osg.idtoken\" owner = \"osg\" @end Enabling the Monitoring Web Interface \u00b6 The HTCondor-CE View is an optional web interface to the status of your CE. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce service Verify the service by entering your CE's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf . Uploading Accounting Records to APEL \u00b6 Batch System Support HTCondor-CE only supports generation of APEL accounting records for HTCondor batch systems. For sites outside of the OSG that need to upload the APEL accounting records, HTCondor-CE supports uploading batch and blah APEL records for HTCondor batch systems. Please refer to EGI's HTCondor-CE Accounting Documentation . Enabling BDII Integration \u00b6 Batch System Support HTCondor-CE only supports reporting BDII information for HTCondor batch systems. HTCondor-CE supports reporting BDII information for all HTCondor-CE endpoints and batch information for an HTCondor batch system. To make this information available, perform the following instructions on your site BDII host. 
Install the HTCondor-CE BDII package: root@host # yum install htcondor-ce-bdii Configure HTCondor ( /etc/condor/config.d/ ) on your site BDII host to point to your central manager: CONDOR_HOST = Replacing with the hostname of your HTCondor central manager Configure BDII static information by modifying /etc/condor/config.d/99-ce-bdii.conf Additionally, install the HTCondor-CE BDII package on each of your HTCondor-CE hosts: root@host # yum install htcondor-ce-bdii","title":"Optional Configuration"},{"location":"v6/configuration/optional-configuration/#optional-configuration","text":"The following configuration steps are optional and will not be required for all sites. If you do not need any of the following special configurations, skip to the page for verifying your HTCondor-CE .","title":"Optional Configuration"},{"location":"v6/configuration/optional-configuration/#configuring-for-multiple-network-interfaces","text":"If you have multiple network interfaces with different hostnames, the HTCondor-CE daemons need to know which hostname and interface to use when communicating with each other. Set NETWORK_HOSTNAME and NETWORK_INTERFACE to the hostname and IP address of your public interface, respectively, by adding the following lines to /etc/condor-ce/config.d/99-local.conf : NETWORK_HOSTNAME = condorce.example.com NETWORK_INTERFACE = 127.0.0.1 Replace condorce.example.com with your public interface\u2019s hostname and 127.0.0.1 with your public interface\u2019s IP address.","title":"Configuring for Multiple Network Interfaces"},{"location":"v6/configuration/optional-configuration/#limiting-or-disabling-locally-running-jobs","text":"If you want to limit or disable jobs running locally on your CE, you will need to configure HTCondor-CE's local and scheduler universes. Local and scheduler universes allow jobs to be run on the CE itself, mainly for remote troubleshooting. 
Pilot jobs will not run as local/scheduler universe jobs so leaving them enabled does NOT turn your CE into another worker node. The two universes are effectively the same (scheduler universe launches a starter process for each job), so we will be configuring them in unison. To change the default limit on the number of locally run jobs (the current default is 20), add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Where is the maximum number of jobs allowed to run locally To only allow a specific user to start locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = target.Owner =?= \"\" START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE) Change for the username allowed to run jobs locally To disable locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf : START_LOCAL_UNIVERSE = False START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)","title":"Limiting or Disabling Locally Running Jobs"},{"location":"v6/configuration/optional-configuration/#inserting-idtokens-into-the-routed-jobs-sandbox","text":"If you want to insert IDTOKENS into the routed job's sandbox you can use the SendIDTokens route command, or the JOB_ROUTER_SEND_ROUTE_IDTOKENS global configuration variable. Tokens sent using this mechanism must be named and declared using the JOB_ROUTER_CREATE_IDTOKEN_NAMES and JOB_ROUTER_CREATE_IDTOKEN_ configuration variables. Tokens whose names are declared in the JOB_ROUTER_SEND_ROUTE_IDTOKENS configuration variable are sent by default for each route that does not have a SendIDTokens command. 
To declare IDTOKENS for inclusion in glide-in jobs for the purpose of advertising to a collector add something like the following to /etc/condor-ce/config.d/99-local-ce-token.conf : JOB_ROUTER_CREATE_IDTOKEN_NAMES = name1 name2 JOB_ROUTER_CREATE_IDTOKEN_name1 @=end sub = \"name1@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name1\" filename = \"ce_name1.idtoken\" owner = \"owner1\" @end JOB_ROUTER_CREATE_IDTOKEN_Name2 @=end sub = \"name2@users.htcondor.org\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/name2\" filename = \"ce_name2.idtoken\" owner = \"owner2\" @end To insert one of the above IDTOKENS in the sandbox of a routed job , include the token name in the SendIDTokens route command like this. SendIDTokens = \"Name2\" Route commands SendIDTokens is a route command, not a job attribute. This means that you will not be able to manipulate it through transform verbs such as EVALSET . To add an IDTOKEN to a routed job in addition to the default tokens , build a string containing the token name along with the value of the global configuration variable like this. SendIDTokens = \"Name2 $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\" You can use an attribute of the source job to choose the IDTOKEN by writing an expression like this. SendIDTokens = strcat( My.Owner, \" $(JOB_ROUTER_SEND_ROUTE_IDTOKENS)\") It is presumed that the value of My.Owner above is the same as the of an IDTOKEN and as the owner field of that token. For instance, the Fermilab CE config uses the above SendIDTokens expression and the following token declarations at the time of this guide. 
JOB_ROUTER_CREATE_IDTOKEN_NAMES = fermilab3 osg JOB_ROUTER_CREATE_IDTOKEN_fermilab3 @=end sub = \"fermilabpilot@fnal.gov\" kid = \"POOL\" lifetime = 3900 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_fermilab3.idtoken\" owner = \"fermilab\" @end JOB_ROUTER_CREATE_IDTOKEN_osg @=end sub = \"osgpilot@fnal.gov\" kid = \"POOL\" lifetime = 600 scope = \"ADVERTISE_STARTD, ADVERTISE_MASTER, READ\" dir = \"/etc/condor-ce/gltokens/fermilab\" filename = \"ce_osg.idtoken\" owner = \"osg\" @end","title":"Inserting IDTOKENs into the routed job's sandbox"},{"location":"v6/configuration/optional-configuration/#enabling-the-monitoring-web-interface","text":"The HTCondor-CE View is an optional web interface to the status of your CE. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce service Verify the service by entering your CE's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf .","title":"Enabling the Monitoring Web Interface"},{"location":"v6/configuration/optional-configuration/#uploading-accounting-records-to-apel","text":"Batch System Support HTCondor-CE only supports generation of APEL accounting records for HTCondor batch systems. For sites outside of the OSG that need to upload the APEL accounting records, HTCondor-CE supports uploading batch and blah APEL records for HTCondor batch systems. Please refer to EGI's HTCondor-CE Accounting Documentation .","title":"Uploading Accounting Records to APEL"},{"location":"v6/configuration/optional-configuration/#enabling-bdii-integration","text":"Batch System Support HTCondor-CE only supports reporting BDII information for HTCondor batch systems. 
HTCondor-CE supports reporting BDII information for all HTCondor-CE endpoints and batch information for an HTCondor batch system. To make this information available, perform the following steps on your site BDII host. Install the HTCondor-CE BDII package: root@host # yum install htcondor-ce-bdii Configure HTCondor ( /etc/condor/config.d/ ) on your site BDII host to point to your central manager: CONDOR_HOST = Replacing with the hostname of your HTCondor central manager Configure BDII static information by modifying /etc/condor/config.d/99-ce-bdii.conf Additionally, install the HTCondor-CE BDII package on each of your HTCondor-CE hosts: root@host # yum install htcondor-ce-bdii","title":"Enabling BDII Integration"},{"location":"v6/configuration/writing-job-routes/","text":"Writing Job Routes \u00b6 This document describes HTCondor-CE Job Router configurations with equivalent examples for the ClassAd transform and deprecated syntaxes. Configuration from this page should be written to files in /etc/condor-ce/config.d/ , whose contents are parsed in lexicographic order with subsequent variables overriding earlier ones. Each example is displayed in code blocks with tabs to switch between the two syntaxes: ClassAd Transform This is an example for the ClassAd transform syntax Deprecated syntax This is an example for the deprecated syntax Syntax Differences \u00b6 Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed in V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. 
In HTCondor-CE 5, the deprecated syntax continues to be the default and administrators can move to the ClassAd transform syntax by setting the following in a file in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. For examples of the ClassAd transform syntax, you can inspect default job router transforms packaged with HTCondor-CE with the following command: user@host $ condor_ce_config_val -dump JOB_ROUTER_TRANSFORM_ Differences in MY. and TARGET. \u00b6 In addition to the above, the behavior of the MY. and TARGET. ClassAd attribute prefixes has changed between the two different syntaxes: In ClassAd transform syntax, MY. always refers to the incoming job's attributes and can be referenced within $() , e.g. $(MY.Owner) refers to the mapped user of the incoming job. TARGET is only used in SET expressions to refer to attributes in the slot ad (HTCondor pools only). In the deprecated syntax, MY. refers to attributes in the job route and TARGET. refers to attributes in the incoming job ad for copy_ , delete_ , and eval_set_ functions. However, in expressions defined by set_* , MY. refers to the attributes in the incoming job ad and TARGET. refers to the attribute in the slot ad (HTCondor pools only). Required Fields \u00b6 The minimum requirements for a route are that you specify the type of batch system that jobs should be routed to and a name for each route. Default routes can be found in /usr/share/condor-ce/config.d/02-ce--defaults.conf , provided by the htcondor-ce- packages. 
Route name \u00b6 To identify routes, you will need to assign a name to the route, either in the name of the configuration macro (i.e., JOB_ROUTER_ROUTE_ ) for the ClassAd transform syntax or with the name attribute for the deprecated syntax: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Naming restrictions Route names should only contain alphanumeric and _ characters. Routes specified by JOB_ROUTER_ROUTE_* will override routes with the same name in JOB_ROUTER_ENTRIES . The name of the route will be useful in debugging since it shows up in the output of condor_ce_job_router_info ; the JobRouterLog ; in the ClassAd of the routed job, which can be viewed with condor_q and condor_history for HTCondor batch systems; and in the ClassAd of the routed job, which can be viewed with condor_ce_q or condor_ce_history for non-HTCondor batch systems. Batch system \u00b6 Each route needs to indicate the type of batch system that jobs should be routed to. For HTCondor batch systems, the UNIVERSE command or TargetUniverse attribute needs to be set to \"VANILLA\" or 5 , respectively. For all other batch systems, the GridResource attribute needs to be set to \"batch \" (where can be one of pbs , slurm , lsf , or sge ). ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_My_Slurm @=jrt GridResource = \"batch slurm\" @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] [ GridResource = \"batch slurm\"; name = \"My_Slurm\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Writing Multiple Routes \u00b6 If your batch system needs incoming jobs to be sorted (e.g. 
if different VOs need to go to separate queues), you will need to write multiple job routes, where each route is a separate JOB_ROUTER_ROUTE_* macro in the ClassAd transform syntax or is enclosed by square brackets in the deprecated syntax. Additionally, the route names must be added to JOB_ROUTER_ROUTE_NAMES in the order that you want their requirements statements compared to incoming jobs. The following routes take incoming jobs that have a queue attribute set to \"prod\" and set IsProduction = True . All other jobs will be routed with IsProduction = False . ClassAd Transform JOB_ROUTER_ROUTE_Production_Jobs @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA SET IsProduction = True @jrt JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET IsProduction = False @jrt JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; set_IsProduction = True; name = \"Production_Jobs\"; ] [ TargetUniverse = 5; set_IsProduction = False; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Writing Comments \u00b6 To write comments you can use # to comment a line: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt # This is a comment UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # This is a comment ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting Attributes for All Routes \u00b6 ClassAd transform \u00b6 With the ClassAd transform syntax, any function from the Editing Attributes section can be applied before or after your routes are considered by appending the names of transforms specified by JOB_ROUTER_TRANSFORM_ to the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
The following configuration sets the Periodic_Hold attribute for all routed jobs before any route transforms are applied: JOB_ROUTER_TRANSFORM_Periodic_Hold @=jrt SET Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1 @jrt JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) Periodic_Hold To apply the same transform after your pre-route and route transforms, append the name of the transform to JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES instead: JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Periodic_Hold Deprecated syntax \u00b6 To set an attribute that will be applied to all routes, you will need to ensure that MERGE_JOB_ROUTER_DEFAULT_ADS is set to True (check the value with condor_ce_config_val ) and use the set_ function in the JOB_ROUTER_DEFAULTS . The following configuration sets the Periodic_Hold attribute for all routes: # Use the defaults generated by the condor_ce_router_defaults script. To add # additional defaults, add additional lines of the form: # # JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_foo = 1;] # MERGE_JOB_ROUTER_DEFAULT_ADS=True JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;] Filtering Jobs Based On\u2026 \u00b6 To filter jobs, use the route's REQUIREMENTS or Requirements attribute for ClassAd transforms and deprecated syntaxes, respectively. Incoming jobs will be evaluated against the ClassAd expression set in the route's requirements and if the expression evaluates to TRUE , the route will match. More information on the syntax of ClassAds can be found in the HTCondor manual . For an example on how incoming jobs interact with filtering in job routes, consult this document . In the deprecated syntax, you may need to specify TARGET. to differentiate between job and route attributes. See this section for more details. 
Note If you have an HTCondor batch system, note the difference with set_requirements : Pilot job queue \u00b6 To filter jobs based on their pilot job queue attribute, your routes will need a requirements expression using the incoming job's queue attribute. The following entry routes jobs to HTCondor if the queue attribute of the incoming job (referenced through TARGET in the deprecated syntax) is set to prod : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Mapped user \u00b6 To filter jobs based on what local account the incoming job was mapped to, your routes will need a requirements expression using the incoming job's Owner attribute. The following entry routes jobs to the HTCondor batch system if the mapped user is usatlas2 : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS Owner == \"usatlas2\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.Owner == \"usatlas2\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Alternatively, you can match based on a regular expression. The following entry routes jobs to the HTCondor batch system if the mapped user begins with usatlas : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"^usatlas\", Owner) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"^usatlas\", TARGET.Owner); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool VOMS attribute \u00b6 To filter jobs based on the VOMS attributes of the job's proxy, your routes will need a requirements expression using the incoming job's x509UserProxyFirstFQAN attribute. 
The following entry routes jobs to the HTCondor batch system if the first FQAN of the proxy contains /cms/Role=pilot : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"\\/cms\\/Role\\=pilot\", x509UserProxyFirstFQAN) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"\\/cms\\/Role\\=pilot\", TARGET.x509UserProxyFirstFQAN); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting a Default\u2026 \u00b6 This section outlines how to set default job limits, memory, cores, and maximum walltime. For an example on how users can override these defaults, consult this document . Maximum number of jobs \u00b6 To set a default limit to the maximum number of jobs per route, you can edit the configuration variable CONDORCE_MAX_JOBS in /etc/condor-ce/config.d/01-ce-router.conf : CONDORCE_MAX_JOBS = 10000 Note The above configuration is to be placed directly into the HTCondor-CE configuration instead of a job route or transform. 
Maximum memory \u00b6 To set a default maximum memory (in MB) for routed jobs, set the variable or attribute default_maxMemory for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested memory to 1 GB default_maxMemory = 1000 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested memory to 1 GB set_default_maxMemory = 1000; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Number of cores to request \u00b6 To set a default number of cores for routed jobs, set the variable or attribute default_xcount for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested cores to 8 default_xcount = 8 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested cores to 8 set_default_xcount = 8; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Number of gpus to request \u00b6 To set a default number of GPUs for routed jobs, set the job ClassAd attribute RequestGPUs in the route transform: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # If the job does not already have a RequestGPUs value set it to 1 DEFAULT RequestGPUs = 1 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool The DEFAULT keyword works for any job attribute other than those mentioned above that require the use of alternative names for defaulting in the CE. The deprecated syntax has no keyword for defaulting. 
Maximum walltime \u00b6 To set a default maximum walltime (in minutes) for routed jobs, set the variable or attribute default_maxWallTime for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the max walltime to 1 hr default_maxWallTime = 60 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the max walltime to 1 hr set_default_maxWallTime = 60; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting Job Environments \u00b6 HTCondor-CE offers two different methods for setting environment variables of routed jobs: CONDORCE_PILOT_JOB_ENV configuration, which should be used for setting environment variables for all routed jobs to static strings. default_pilot_job_env or set_default_pilot_job_env job route configuration, which should be used for setting environment variables: Per job route To values based on incoming job attributes Using ClassAd functions Both of these methods use the new HTCondor format of the environment command , which is described by environment variable/value pairs separated by whitespace and enclosed in double-quotes. For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=proxy.wisc.edu\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu Contents of CONDORCE_PILOT_JOB_ENV can reference other HTCondor-CE configuration using HTCondor's configuration $() macro expansion . 
For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration LOCAL_PROXY = proxy.wisc.edu CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=$(LOCAL_PROXY)\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu To set environment variables per job route, based on incoming job attributes, or using ClassAd functions, add default_pilot_job_env or set_default_pilot_job_env to your job route configuration for ClassAd transforms and deprecated syntax, respectively. For example, the following HTCondor-CE configuration would result in this environment for a job with these attributes: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Incoming Job Attributes JOB_COLLECTOR = \"collector.wisc.edu\" JOB_VO = \"GLOW\" Resulting Environment WN_SCRATCH_DIR = /nobackup PILOT_COLLECTOR = collector.wisc.edu ACCOUNTING_GROUP = glow Debugging job route environment expressions While constructing default_pilot_job_env or set_default_pilot_job_env expressions, try wrapping your expression in debug() to help with any issues that may arise. Make sure to remove debug() after you're done! 
Editing Attributes\u2026 \u00b6 The following functions are operations that can be used to take incoming job attributes and modify them for the routed job for the ClassAd transform and deprecated syntax, respectively: COPY , copy_* DELETE , delete_* SET , set_* EVALSET , eval_set_* The order in which the above operations are evaluated depends on your chosen syntax: If you are using ClassAd transforms , each function is evaluated in order of appearance. For example, the following will set FOO in the routed job to the incoming job's Owner attribute and then subsequently remove FOO from the routed job: JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET FOO = \"$(MY.Owner)\" DELETE FOO @jrt If you are using the deprecated syntax , each class of operations is evaluated in the order specified above, i.e. all copy_* , before delete_* , etc. For example, if the attribute FOO is set using eval_set_FOO in the JOB_ROUTER_DEFAULTS , you'll be unable to use delete_FOO to remove it from your jobs since the attribute is set using eval_set_FOO after the deletion occurs according to the order of operations. To get around this, we can take advantage of the fact that operations defined in JOB_ROUTER_DEFAULTS get overridden by the same operation in JOB_ROUTER_ENTRIES . So to 'delete' FOO , you could add eval_set_FOO = \"\" to the route in the JOB_ROUTER_ENTRIES , resulting in FOO being set to the empty string in the routed job. More documentation can be found in the HTCondor manual . Copying attributes \u00b6 To copy the value of an attribute of the incoming job to an attribute of the routed job, use COPY or copy_ for ClassAd transform and deprecated syntaxes, respectively. 
The following route copies the Environment attribute of the incoming job and sets the attribute Original_Environment on the routed job to the same value: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA COPY Environment Original_Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; copy_Environment = \"Original_Environment\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Removing attributes \u00b6 To remove an attribute of the incoming job from the routed job, use DELETE or delete_ for ClassAd transform and deprecated syntaxes, respectively. The following route removes the Environment attribute from the routed job: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA DELETE Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; delete_Environment = True; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting attributes \u00b6 To set an attribute on the routed job, use SET or set_ for ClassAd transform and deprecated syntaxes, respectively. The following route sets the job's Rank attribute to 5: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Rank = 5 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Rank = 5; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Setting attributes with ClassAd expressions \u00b6 To set an attribute to a ClassAd expression to be evaluated, use EVALSET or eval_set_ for ClassAd transform and deprecated syntaxes, respectively. 
The following route sets the Experiment attribute to atlas.osguser if the Owner of the incoming job is osguser : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA EVALSET Experiment = strcat(\"atlas.\", Owner) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; eval_set_Experiment = strcat(\"atlas.\", Owner); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Limiting the Number of Jobs \u00b6 This section outlines how to limit the number of total or idle jobs in a specific route (i.e., if this limit is reached, jobs will no longer be placed in this route). Note If you are using an HTCondor batch system, limiting the number of jobs is not the preferred solution: HTCondor manages fair share on its own via user priorities and group accounting . Total jobs \u00b6 To set a limit on the number of jobs for a specific route, set the MaxJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Idle jobs \u00b6 To set a limit on the number of idle jobs for a specific route, set the MaxIdleJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxIdleJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxIdleJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Debugging Routes \u00b6 To help debug expressions in your routes, you can use the debug() function. 
First, set the debug mode for the JobRouter by editing a file in /etc/condor-ce/config.d/ to read JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT Then wrap the problematic attribute in debug() : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET Experiment = debug(strcat(\"atlas\", Name)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ name = \"Condor_Pool\"; eval_set_Experiment = debug(strcat(\"atlas\", Name)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool You will find the debugging output in /var/log/condor-ce/JobRouterLog . Getting Help \u00b6 If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Writing Job Routes"},{"location":"v6/configuration/writing-job-routes/#writing-job-routes","text":"This document describes HTCondor-CE Job Router configurations with equivalent examples for the ClassAd transform and deprecated syntaxes. Configuration from this page should be written to files in /etc/condor-ce/config.d/ , whose contents are parsed in lexicographic order with subsequent variables overriding earlier ones. Each example is displayed in code blocks with tabs to switch between the two syntaxes: ClassAd Transform This is an example for the ClassAd transform syntax Deprecated syntax This is an example for the deprecated syntax","title":"Writing Job Routes"},{"location":"v6/configuration/writing-job-routes/#syntax-differences","text":"Planned Removal of Deprecated Syntax JOB_ROUTER_DEFAULTS , JOB_ROUTER_ENTRIES , JOB_ROUTER_ENTRIES_CMD , and JOB_ROUTER_ENTRIES_FILE are deprecated and will be removed in V24 of the HTCondor Software Suite. New configuration syntax for the job router is defined using JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ROUTE_[name] . For examples of the new syntax, visit: HTCondor Documentation - Job Router Note: The removal will occur during the lifetime of the HTCondor V23 feature series. 
In HTCondor-CE 5, the deprecated syntax continues to be the default and administrators can move to the ClassAd transform syntax by setting the following in a file in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False The ClassAd transform syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case-like logic Additionally, it is now easier to include job transformations that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. For examples of the ClassAd transform syntax, you can inspect default job router transforms packaged with HTCondor-CE with the following command: user@host $ condor_ce_config_val -dump JOB_ROUTER_TRANSFORM_","title":"Syntax Differences"},{"location":"v6/configuration/writing-job-routes/#differences-in-my-and-target","text":"In addition to the above, the behavior of the MY. and TARGET. ClassAd attribute prefixes has changed between the two different syntaxes: In ClassAd transform syntax, MY. always refers to the incoming job's attributes and can be referenced within $() , e.g. $(MY.Owner) refers to the mapped user of the incoming job. TARGET is only used in SET expressions to refer to attributes in the slot ad (HTCondor pools only). In the deprecated syntax, MY. refers to attributes in the job route and TARGET. refers to attributes in the incoming job ad for copy_ , delete_ , and eval_set_ functions. However, in expressions defined by set_* , MY. refers to the attributes in the incoming job ad and TARGET. refers to the attribute in the slot ad (HTCondor pools only).","title":"Differences in MY. 
and TARGET."},{"location":"v6/configuration/writing-job-routes/#required-fields","text":"The minimum requirements for a route are that you specify the type of batch system that jobs should be routed to and a name for each route. Default routes can be found in /usr/share/condor-ce/config.d/02-ce--defaults.conf , provided by the htcondor-ce- packages.","title":"Required Fields"},{"location":"v6/configuration/writing-job-routes/#route-name","text":"To identify routes, you will need to assign a name to the route, either in the name of the configuration macro (i.e., JOB_ROUTER_ROUTE_ ) for the ClassAd transform syntax or with the name attribute for the deprecated syntax: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Naming restrictions Route names should only contain alphanumeric and _ characters. Routes specified by JOB_ROUTER_ROUTE_* will override routes with the same name in JOB_ROUTER_ENTRIES The name of the route will be useful in debugging since it shows up in the output of condor_ce_job_router_info ; the JobRouterLog ; in the ClassAd of the routed job, which can be viewed with condor_q and condor_history for HTCondor batch systems; and in the ClassAd of the routed job, which can be viewed with condor_ce_q or condor_ce_history for non-HTCondor batch systems.","title":"Route name"},{"location":"v6/configuration/writing-job-routes/#batch-system","text":"Each route needs to indicate the type of batch system that jobs should be routed to. For HTCondor batch systems, the UNIVERSE command or TargetUniverse attribute needs to be set to \"VANILLA\" or 5 , respectively. For all other batch systems, the GridResource attribute needs to be set to \"batch \" (where can be one of pbs , slurm , lsf , or sge ). 
ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_My_Slurm @=jrt GridResource = \"batch slurm\" @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; ] [ GridResource = \"batch slurm\"; name = \"My_Slurm\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool My_Slurm","title":"Batch system"},{"location":"v6/configuration/writing-job-routes/#writing-multiple-routes","text":"If your batch system needs incoming jobs to be sorted (e.g. if different VOs need to go to separate queues), you will need to write multiple job routes where each route is a separate JOB_ROUTER_ROUTE_* macro in the ClassAd transform syntax and enclosed by square brackets in the deprecated syntax. Additionally, the route names must be added to JOB_ROUTER_ROUTE_NAMES in the order that you want their requirements statements compared to incoming jobs. The following routes take incoming jobs that have a queue attribute set to \"prod\" and set IsProduction = True . All other jobs will be routed with IsProduction = False . 
ClassAd Transform JOB_ROUTER_ROUTE_Production_Jobs @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA SET IsProduction = True @jrt JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET IsProduction = False @jrt JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; set_IsProduction = True; name = \"Production_Jobs\"; ] [ TargetUniverse = 5; set_IsProduction = False; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Production_Jobs Condor_Pool","title":"Writing Multiple Routes"},{"location":"v6/configuration/writing-job-routes/#writing-comments","text":"To write comments you can use # to comment a line: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt # This is a comment UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # This is a comment ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Writing Comments"},{"location":"v6/configuration/writing-job-routes/#setting-attributes-for-all-routes","text":"","title":"Setting Attributes for All Routes"},{"location":"v6/configuration/writing-job-routes/#classad-transform","text":"With the ClassAd transform syntax, any function from the Editing Attributes section can be applied before or after your routes are considered by appending the names of transforms specified by JOB_ROUTER_TRANSFORM_ to the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
The following configuration sets the Periodic_Hold attribute for all routed jobs before any route transforms are applied: JOB_ROUTER_TRANSFORM_Periodic_Hold @=jrt SET Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1 @jrt JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) Periodic_Hold To apply the same transform after your pre-route and route transforms, append the name of the transform to JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES instead: JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES) Periodic_Hold","title":"ClassAd transform"},{"location":"v6/configuration/writing-job-routes/#deprecated-syntax","text":"To set an attribute that will be applied to all routes, you will need to ensure that MERGE_JOB_ROUTER_DEFAULT_ADS is set to True (check the value with condor_ce_config_val ) and use the set_ function in the JOB_ROUTER_DEFAULTS . The following configuration sets the Periodic_Hold attribute for all routes: # Use the defaults generated by the condor_ce_router_defaults script. To add # additional defaults, add additional lines of the form: # # JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_foo = 1;] # MERGE_JOB_ROUTER_DEFAULT_ADS=True JOB_ROUTER_DEFAULTS = $(JOB_ROUTER_DEFAULTS) [set_Periodic_Hold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;]","title":"Deprecated syntax"},{"location":"v6/configuration/writing-job-routes/#filtering-jobs-based-on","text":"To filter jobs, use the route's REQUIREMENTS or Requirements attribute for ClassAd transforms and deprecated syntaxes, respectively. Incoming jobs will be evaluated against the ClassAd expression set in the route's requirements and if the expression evaluates to TRUE , the route will match. More information on the syntax of ClassAds can be found in the HTCondor manual . For an example on how incoming jobs interact with filtering in job routes, consult this document . In the deprecated syntax, you may need to specify TARGET. 
to differentiate between job and route attributes. See this section for more details. Note If you have an HTCondor batch system, note the difference with set_requirements :","title":"Filtering Jobs Based On\u2026"},{"location":"v6/configuration/writing-job-routes/#pilot-job-queue","text":"To filter jobs based on their pilot job queue attribute, your routes will need a requirements expression using the incoming job's queue attribute. The following entry routes jobs to HTCondor if the queue attribute of the incoming job (specified by TARGET ) is \"prod\" : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS queue == \"prod\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.queue == \"prod\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Pilot job queue"},{"location":"v6/configuration/writing-job-routes/#mapped-user","text":"To filter jobs based on what local account the incoming job was mapped to, your routes will need a requirements expression using the incoming job's Owner attribute. The following entry routes jobs to the HTCondor batch system if the mapped user is usatlas2 : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS Owner == \"usatlas2\" UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = (TARGET.Owner == \"usatlas2\"); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Alternatively, you can match based on regular expression. 
The following entry routes jobs to the HTCondor batch system if the mapped user begins with usatlas : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"^usatlas\", Owner) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"^usatlas\", TARGET.Owner); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Mapped user"},{"location":"v6/configuration/writing-job-routes/#voms-attribute","text":"To filter jobs based on the VOMS attribute of the job's proxy, your routes will need a requirements expression using the incoming job's x509UserProxyFirstFQAN attribute. The following entry routes jobs to the HTCondor batch system if the proxy's first VOMS FQAN contains /cms/Role=pilot : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt REQUIREMENTS regexp(\"\\/cms\\/Role\\=pilot\", x509UserProxyFirstFQAN) UNIVERSE VANILLA @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated syntax JOB_ROUTER_ENTRIES @=jre [ Requirements = regexp(\"\\/cms\\/Role\\=pilot\", TARGET.x509UserProxyFirstFQAN); TargetUniverse = 5; name = \"Condor_Pool\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"VOMS attribute"},{"location":"v6/configuration/writing-job-routes/#setting-a-default","text":"This section outlines how to set default job limits, memory, cores, and maximum walltime. 
For an example on how users can override these defaults, consult this document .","title":"Setting a Default\u2026"},{"location":"v6/configuration/writing-job-routes/#maximum-number-of-jobs","text":"To set a default limit to the maximum number of jobs per route, you can edit the configuration variable CONDORCE_MAX_JOBS in /etc/condor-ce/config.d/01-ce-router.conf : CONDORCE_MAX_JOBS = 10000 Note The above configuration is to be placed directly into the HTCondor-CE configuration instead of a job route or transform.","title":"Maximum number of jobs"},{"location":"v6/configuration/writing-job-routes/#maximum-memory","text":"To set a default maximum memory (in MB) for routed jobs, set the variable or attribute default_maxMemory for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested memory to 1 GB default_maxMemory = 1000 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested memory to 1 GB set_default_maxMemory = 1000; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Maximum memory"},{"location":"v6/configuration/writing-job-routes/#number-of-cores-to-request","text":"To set a default number of cores for routed jobs, set the variable or attribute default_xcount for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the requested cores to 8 default_xcount = 8 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the requested cores to 8 set_default_xcount = 8; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Number of cores to request"},{"location":"v6/configuration/writing-job-routes/#number-of-gpus-to-request","text":"To set a default number of GPUs for routed jobs, set the job ClassAd attribute 
RequestGPUs in the route transform: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # If the job does not already have a RequestGPUs value set it to 1 DEFAULT RequestGPUs = 1 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool The DEFAULT keyword works for any job attribute other than those mentioned above that require the use of alternative names for defaulting in the CE. The deprecated syntax has no keyword for defaulting.","title":"Number of gpus to request"},{"location":"v6/configuration/writing-job-routes/#maximum-walltime","text":"To set a default maximum walltime (in minutes) for routed jobs, set the variable or attribute default_maxWallTime for the ClassAd transform and deprecated syntax, respectively: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA # Set the max walltime to 1 hr default_maxWallTime = 60 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; # Set the max walltime to 1 hr set_default_maxWallTime = 60; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Maximum walltime"},{"location":"v6/configuration/writing-job-routes/#setting-job-environments","text":"HTCondor-CE offers two different methods for setting environment variables of routed jobs: CONDORCE_PILOT_JOB_ENV configuration, which should be used for setting environment variables for all routed jobs to static strings. default_pilot_job_env or set_default_pilot_job_env job route configuration, which should be used for setting environment variables: Per job route To values based on incoming job attributes Using ClassAd functions Both of these methods use the new HTCondor format of the environment command , which is described by environment variable/value pairs separated by whitespace and enclosed in double-quotes. 
For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=proxy.wisc.edu\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu Contents of CONDORCE_PILOT_JOB_ENV can reference other HTCondor-CE configuration using HTCondor's configuration $() macro expansion . For example, the following HTCondor-CE configuration would result in the following environment for all routed jobs: HTCondor-CE Configuration LOCAL_PROXY = proxy.wisc.edu CONDORCE_PILOT_JOB_ENV = \"WN_SCRATCH_DIR=/nobackup/ http_proxy=$(LOCAL_PROXY)\" Resulting Environment WN_SCRATCH_DIR = /nobackup/ http_proxy = proxy.wisc.edu To set environment variables per job route, based on incoming job attributes, or using ClassAd functions, add default_pilot_job_env or set_default_pilot_job_env to your job route configuration for ClassAd transforms and deprecated syntax, respectively. 
For example, the following HTCondor-CE configuration would result in this environment for a job with these attributes: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_default_pilot_job_env = strcat(\"WN_SCRATCH_DIR=/nobackup\", \" PILOT_COLLECTOR=\", JOB_COLLECTOR, \" ACCOUNTING_GROUP=\", toLower(JOB_VO)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool Incoming Job Attributes JOB_COLLECTOR = \"collector.wisc.edu\" JOB_VO = \"GLOW\" Resulting Environment WN_SCRATCH_DIR = /nobackup PILOT_COLLECTOR = collector.wisc.edu ACCOUNTING_GROUP = glow Debugging job route environment expressions While constructing default_pilot_job_env or set_default_pilot_job_env expressions, try wrapping your expression in debug() to help with any issues that may arise. Make sure to remove debug() after you're done!","title":"Setting Job Environments"},{"location":"v6/configuration/writing-job-routes/#editing-attributes","text":"The following functions are operations that can be used to take incoming job attributes and modify them for the routed job for the ClassAd transform and deprecated syntax, respectively: COPY , copy_* DELETE , delete_* SET , set_* EVALSET , eval_set_* The above operations are evaluated in order differently depending on your chosen syntax: If you are using ClassAd transforms , each function is evaluated in order of appearance. For example, the following will set FOO in the routed job to the incoming job's Owner attribute and then subsequently remove FOO from the routed job: JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET FOO = \"$(MY.Owner)\" DELETE FOO @jrt If you are using the deprecated syntax , each class of operations is evaluated in the order specified above, i.e. 
all copy_* , before delete_* , etc. For example, if the attribute FOO is set using eval_set_FOO in the JOB_ROUTER_DEFAULTS , you'll be unable to use delete_foo to remove it from your jobs since the attribute is set using eval_set_foo after the deletion occurs according to the order of operations. To get around this, we can take advantage of the fact that operations defined in JOB_ROUTER_DEFAULTS get overridden by the same operation in JOB_ROUTER_ENTRIES . So to 'delete' FOO , you could add eval_set_foo = \"\" to the route in the JOB_ROUTER_ENTRIES , resulting in foo being set to the empty string in the routed job. More documentation can be found in the HTCondor manual.","title":"Editing Attributes\u2026"},{"location":"v6/configuration/writing-job-routes/#copying-attributes","text":"To copy the value of an attribute of the incoming job to an attribute of the routed job, use COPY or copy_ for ClassAd transform and deprecated syntaxes, respectively. The following route copies the Environment attribute of the incoming job and sets the attribute Original_Environment on the routed job to the same value: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA COPY Environment Original_Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; copy_Environment = \"Original_Environment\"; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Copying attributes"},{"location":"v6/configuration/writing-job-routes/#removing-attributes","text":"To remove an attribute of the incoming job from the routed job, use DELETE or delete_ for ClassAd transform and deprecated syntaxes, respectively. 
The following route removes the Environment attribute from the routed job: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA DELETE Environment @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; delete_Environment = True; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Removing attributes"},{"location":"v6/configuration/writing-job-routes/#setting-attributes","text":"To set an attribute on the routed job, use SET or set_ for ClassAd transform and deprecated syntaxes, respectively. The following route sets the Job's Rank attribute to 5: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA SET Rank = 5 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; set_Rank = 5; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting attributes"},{"location":"v6/configuration/writing-job-routes/#setting-attributes-with-classad-expressions","text":"To set an attribute to a ClassAd expression to be evaluated, use EVALSET or eval_set for ClassAd transform and deprecated syntaxes, respectively. The following route sets the Experiment attribute to atlas.osguser if the Owner of the incoming job is osguser : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA EVALSET Experiment = strcat(\"atlas.\", Owner) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; eval_set_Experiment = strcat(\"atlas.\", Owner); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Setting attributes with ClassAd expressions"},{"location":"v6/configuration/writing-job-routes/#limiting-the-number-of-jobs","text":"This section outlines how to limit the number of total or idle jobs in a specific route (i.e., if this limit is reached, jobs will no longer be placed in this route). 
Note If you are using an HTCondor batch system, limiting the number of jobs is not the preferred solution: HTCondor manages fair share on its own via user priorities and group accounting .","title":"Limiting the Number of Jobs"},{"location":"v6/configuration/writing-job-routes/#total-jobs","text":"To set a limit on the number of jobs for a specific route, set the MaxJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Total jobs"},{"location":"v6/configuration/writing-job-routes/#idle-jobs","text":"To set a limit on the number of idle jobs for a specific route, set the MaxIdleJobs attribute: ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt UNIVERSE VANILLA MaxIdleJobs = 100 @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ TargetUniverse = 5; name = \"Condor_Pool\"; MaxIdleJobs = 100; ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool","title":"Idle jobs"},{"location":"v6/configuration/writing-job-routes/#debugging-routes","text":"To help debug expressions in your routes, you can use the debug() function. 
First, set the debug mode for the JobRouter by editing a file in /etc/condor-ce/config.d/ to read JOB_ROUTER_DEBUG = D_ALWAYS:2 D_CAT Then wrap the problematic attribute in debug() : ClassAd Transform JOB_ROUTER_ROUTE_Condor_Pool @=jrt EVALSET Experiment = debug(strcat(\"atlas\", Name)) @jrt JOB_ROUTER_ROUTE_NAMES = Condor_Pool Deprecated Syntax JOB_ROUTER_ENTRIES @=jre [ name = \"Condor_Pool\"; eval_set_Experiment = debug(strcat(\"atlas\", Name)); ] @jre JOB_ROUTER_ROUTE_NAMES = Condor_Pool You will find the debugging output in /var/log/condor-ce/JobRouterLog .","title":"Debugging Routes"},{"location":"v6/configuration/writing-job-routes/#getting-help","text":"If you have any questions or issues with configuring job routes, please contact us for assistance.","title":"Getting Help"},{"location":"v6/installation/central-collector/","text":"Installing an HTCondor-CE Central Collector \u00b6 The HTCondor-CE Central Collector is an information service designed to provide an overview and descriptions of grid services. Based on the HTCondorView Server , the Central Collector accepts ClassAds from site HTCondor-CEs by default but may accept from other services using the HTCondor Python Bindings . By distributing configuration to each member site, a central grid team can coordinate the information that site HTCondor-CEs should advertise. Additionally, the HTCondor-CE View web server may be installed alongside a Central Collector to display pilot job statistics across its grid, as well as information for each site HTCondor-CE. For example, the OSG Central Collector can be viewed at https://collector.opensciencegrid.org . Use this page to learn how to install, configure, and run an HTCondor-CE Central Collector as part of your central operations. 
Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE Central Collector service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE Central Collector host Network ports: Site HTCondor-CEs must be able to contact the Central Collector on port 9619 (TCP). Additionally, the optional HTCondor-CE View web server should be accessible on port 80 (TCP). There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively Installing a Central Collector \u00b6 Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install the Central Collector software: root@host # yum install htcondor-ce-collector Configuring a Central Collector \u00b6 Like a site HTCondor-CE, the Central Collector uses X.509 host certificates and certificate authorities (CAs) when authenticating SSL connections. By default, the Central Collector uses the default system locations to locate CAs and host certificate when authenticating SSL connections, i.e. for SSL authentication methods. But traditionally, the Central Collector and HTCondor-CEs have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your Central Collector to use grid or system certificates for authentication: If your site HTCondor-CEs will be advertising to your Central Collector using grid certificates or you are using a grid certificate for your Central Collector's host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key Optional configuration \u00b6 The following configuration steps are optional and will not be required for all Central Collectors. If you do not need any of the following special configurations, skip to the section on next steps . Banning HTCondor-CEs \u00b6 By default, Central Collectors accept ClassAds from all HTCondor-CEs with a valid and accepted certificate. 
If you want to stop accepting ClassAds from a particular HTCondor-CE, add its hostname to DENY_ADVERTISE_SCHEDD in /etc/condor-ce/config.d/01-ce-collector.conf . For example: DENY_ADVERTISE_SCHEDD = $(DENY_ADVERTISE_SCHEDD), misbehaving-ce-1.bad-domain.com, misbehaving-ce-2.bad-domain.com Configuring HTCondor-CE View \u00b6 The HTCondor-CE View is an optional web interface to the status of all HTCondor-CEs advertising to your Central Collector. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce-collector service Verify the service by entering your Central Collector's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf . Distributing Configuration to Site HTCondor-CEs \u00b6 To make the Central Collector truly useful, each site HTCondor-CE in your organization will need to configure their HTCondor-CEs to advertise to your Central Collector(s) along with any custom information that may be of interest. For example, the OSG provides default configuration to OSG sites through an osg-ce metapackage and configuration tools. 
Following the Filesystem Hierarchy Standard , the following configuration should be set by HTCondor-CE administrators in /etc/condor-ce/config.d/ or by packagers in /usr/share/condor-ce/config.d/ : Set CONDOR_VIEW_HOST to a comma-separated list of Central Collectors: CONDOR_VIEW_HOST = collector.htcondor.org:9619, collector1.htcondor.org:9619, collector2.htcondor.org:9619 Append arbitrary attributes to SCHEDD_ATTRS containing custom information in any number of arbitrarily configuration attributes: ATTR_NAME_1 = value1 ATTR_NAME_2 = value2 SCHEDD_ATTRS = $(SCHEDD_ATTRS) ATTR_NAME_1 ATTR_NAME_2 For example, OSG sites advertise information describing their OSG Topology registrations, local batch system, and local resourcess: OSG_Resource = \"local\" OSG_ResourceGroup = \"\" OSG_BatchSystems = \"condor\" OSG_ResourceCatalog = { \\ [ \\ AllowedVOs = { \"osg\" }; \\ CPUs = 2; \\ MaxWallTime = 1440; \\ Memory = 10000; \\ Name = \"test\"; \\ Requirements = TARGET.RequestCPUs <= CPUs && TARGET.RequestMemory <= Memory && member(TARGET.VO, AllowedVOs); \\ Transform = [ set_MaxMemory = RequestMemory; set_xcount = RequestCPUs; ]; \\ ] \\ } SCHEDD_ATTRS = $(SCHEDD_ATTRS) OSG_Resource OSG_ResourceGroup OSG_BatchSystems OSG_ResourceCatalog Verifying a Central Collector \u00b6 To verify that you have a working installation of a Central Collector, ensure that all the relevant services are started and enabled then perform the validation steps below. Managing Central Collector services \u00b6 In addition to the Central Collector service itself, there are a number of supporting services in your installation. The specific services are: Software Service name HTCondor-CE condor-ce-collector Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... 
Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Validating a Central Collector \u00b6 Getting Help \u00b6 If you have any questions or issues with the installation process, please contact us for assistance.","title":"Install a Central Collector"},{"location":"v6/installation/central-collector/#installing-an-htcondor-ce-central-collector","text":"The HTCondor-CE Central Collector is an information service designed to provide an overview and descriptions of grid services. Based on the HTCondorView Server , the Central Collector accepts ClassAds from site HTCondor-CEs by default but may accept from other services using the HTCondor Python Bindings . By distributing configuration to each member site, a central grid team can coordinate the information that site HTCondor-CEs should advertise. Additionally, the HTCondor-CE View web server may be installed alongside a Central Collector to display pilot job statistics across its grid, as well as information for each site HTCondor-CE. For example, the OSG Central Collector can be viewed at https://collector.opensciencegrid.org . Use this page to learn how to install, configure, and run an HTCondor-CE Central Collector as part of your central operations.","title":"Installing an HTCondor-CE Central Collector"},{"location":"v6/installation/central-collector/#before-starting","text":"Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE Central Collector service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE Central Collector host Network ports: Site HTCondor-CEs must be able to contact the Central Collector on port 9619 (TCP). 
Additionally, the optional HTCondor-CE View web server should be accessible on port 80 (TCP). There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively","title":"Before Starting"},{"location":"v6/installation/central-collector/#installing-a-central-collector","text":"Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install the Central Collector software: root@host # yum install htcondor-ce-collector","title":"Installing a Central Collector"},{"location":"v6/installation/central-collector/#configuring-a-central-collector","text":"Like a site HTCondor-CE, the Central Collector uses X.509 host certificates and certificate authorities (CAs) when authenticating SSL connections. By default, the Central Collector uses the default system locations to locate CAs and host certificate when authenticating SSL connections, i.e. for SSL authentication methods. But traditionally, the Central Collector and HTCondor-CEs have authenticated with each other using specialized grid certificates (e.g. certificates issued by IGTF CAs ) located in /etc/grid-security/ . 
Choose one of the following options to configure your Central Collector to use grid or system certificates for authentication: If your site HTCondor-CEs will be advertising to your Central Collector using grid certificates or you are using a grid certificate for your Central Collector's host certificate: Set the following configuration in /etc/condor-ce/config.d/01-ce-auth.conf : AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates AUTH_SSL_SERVER_CAFILE = AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates AUTH_SSL_CLIENT_CAFILE = Install your host certificate and key into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem root@host # chmod 644 /etc/grid-security/hostcert.pem root@host # chmod 600 /etc/grid-security/hostkey.pem Otherwise, use the default system locations: Install your host certificate and key into /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key , respectively Set the ownership and Unix permissions of the host certificate and key root@host # chown root:root /etc/pki/tls/certs/localhost.crt /etc/pki/tls/private/localhost.key root@host # chmod 644 /etc/pki/tls/certs/localhost.crt root@host # chmod 600 /etc/pki/tls/private/localhost.key","title":"Configuring a Central Collector"},{"location":"v6/installation/central-collector/#optional-configuration","text":"The following configuration steps are optional and will not be required for all Central Collectors. 
If you do not need any of the following special configurations, skip to the section on next steps .","title":"Optional configuration"},{"location":"v6/installation/central-collector/#banning-htcondor-ces","text":"By default, Central Collectors accept ClassAds from all HTCondor-CEs with a valid and accepted certificate. If you want to stop accepting ClassAds from a particular HTCondor-CE, add its hostname to DENY_ADVERTISE_SCHEDD in /etc/condor-ce/config.d/01-ce-collector.conf . For example: DENY_ADVERTISE_SCHEDD = $(DENY_ADVERTISE_SCHEDD), misbehaving-ce-1.bad-domain.com, misbehaving-ce-2.bad-domain.com","title":"Banning HTCondor-CEs"},{"location":"v6/installation/central-collector/#configuring-htcondor-ce-view","text":"The HTCondor-CE View is an optional web interface to the status of all HTCondor-CEs advertising to your Central Collector. To run the HTCondor-CE View, install the appropriate package and set the relevant configuration. Begin by installing the htcondor-ce-view package: root@host # yum install htcondor-ce-view Restart the condor-ce-collector service Verify the service by entering your Central Collector's hostname into your web browser The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf .","title":"Configuring HTCondor-CE View"},{"location":"v6/installation/central-collector/#distributing-configuration-to-site-htcondor-ces","text":"To make the Central Collector truly useful, each site HTCondor-CE in your organization will need to configure their HTCondor-CEs to advertise to your Central Collector(s) along with any custom information that may be of interest. For example, the OSG provides default configuration to OSG sites through an osg-ce metapackage and configuration tools. 
Following the Filesystem Hierarchy Standard , the following configuration should be set by HTCondor-CE administrators in /etc/condor-ce/config.d/ or by packagers in /usr/share/condor-ce/config.d/ : Set CONDOR_VIEW_HOST to a comma-separated list of Central Collectors: CONDOR_VIEW_HOST = collector.htcondor.org:9619, collector1.htcondor.org:9619, collector2.htcondor.org:9619 Append arbitrary attributes to SCHEDD_ATTRS containing custom information in any number of arbitrary configuration attributes: ATTR_NAME_1 = value1 ATTR_NAME_2 = value2 SCHEDD_ATTRS = $(SCHEDD_ATTRS) ATTR_NAME_1 ATTR_NAME_2 For example, OSG sites advertise information describing their OSG Topology registrations, local batch system, and local resources: OSG_Resource = \"local\" OSG_ResourceGroup = \"\" OSG_BatchSystems = \"condor\" OSG_ResourceCatalog = { \\ [ \\ AllowedVOs = { \"osg\" }; \\ CPUs = 2; \\ MaxWallTime = 1440; \\ Memory = 10000; \\ Name = \"test\"; \\ Requirements = TARGET.RequestCPUs <= CPUs && TARGET.RequestMemory <= Memory && member(TARGET.VO, AllowedVOs); \\ Transform = [ set_MaxMemory = RequestMemory; set_xcount = RequestCPUs; ]; \\ ] \\ } SCHEDD_ATTRS = $(SCHEDD_ATTRS) OSG_Resource OSG_ResourceGroup OSG_BatchSystems OSG_ResourceCatalog","title":"Distributing Configuration to Site HTCondor-CEs"},{"location":"v6/installation/central-collector/#verifying-a-central-collector","text":"To verify that you have a working installation of a Central Collector, ensure that all the relevant services are started and enabled, then perform the validation steps below.","title":"Verifying a Central Collector"},{"location":"v6/installation/central-collector/#managing-central-collector-services","text":"In addition to the Central Collector service itself, there are a number of supporting services in your installation. The specific services are: Software Service name HTCondor-CE condor-ce-collector Start and enable the services in the order listed and stop them in reverse order. 
As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing Central Collector services"},{"location":"v6/installation/central-collector/#validating-a-central-collector","text":"","title":"Validating a Central Collector"},{"location":"v6/installation/central-collector/#getting-help","text":"If you have any questions or issues with the installation process, please contact us for assistance.","title":"Getting Help"},{"location":"v6/installation/htcondor-ce/","text":"Installing HTCondor-CE 6 \u00b6 Joining the OSG Consortium (OSG)? If you are installing an HTCondor-CE for the OSG, consult the OSG-specific documentation . HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint solution for computing grids (e.g. European Grid Infrastructure , The OSG Consortium ). It is configured to use the Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. See the home page for more details on the features and architecture of HTCondor-CE. Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 6 from the CHTC yum repositories . 
Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Submit host: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster running supported batch system software (Grid Engine, HTCondor, LSF, PBS/Torque, Slurm) File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Development Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively Installing HTCondor-CE \u00b6 Important HTCondor-CE must be installed on a host that is configured to submit jobs to your batch system. The details of this setup are site-specific by nature and therefore beyond the scope of this document. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Select the appropriate convenience RPM: If your batch system is... Then use the following package... Grid Engine htcondor-ce-sge HTCondor htcondor-ce-condor LSF htcondor-ce-lsf PBS/Torque htcondor-ce-pbs SLURM htcondor-ce-slurm Install the CE software: root@host # yum install Where is the package you selected in the above step. 
Next Steps \u00b6 At this point, you should have all the necessary binaries, scripts, and default configurations. The next step is to configure authentication to allow for remote submission to your HTCondor-CE. Getting Help \u00b6 If you have any questions or issues with the installation process, please contact us for assistance.","title":"Installation"},{"location":"v6/installation/htcondor-ce/#installing-htcondor-ce-6","text":"Joining the OSG Consortium (OSG)? If you are installing an HTCondor-CE for the OSG, consult the OSG-specific documentation . HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint solution for computing grids (e.g. European Grid Infrastructure , The OSG Consortium ). It is configured to use the Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. See the home page for more details on the features and architecture of HTCondor-CE. Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE 6 from the CHTC yum repositories .","title":"Installing HTCondor-CE 6"},{"location":"v6/installation/htcondor-ce/#before-starting","text":"Before starting the installation process, consider the following points (consulting the reference page as necessary): User IDs: If they do not exist already, the installation will create the condor Linux user (UID 4716) SSL certificate: The HTCondor-CE service uses a host certificate and key for SSL authentication DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Submit host: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster running supported batch system software (Grid Engine, HTCondor, LSF, PBS/Torque, Slurm) File Systems : Non-HTCondor batch systems require a shared file system between the 
HTCondor-CE host and the batch system worker nodes. There are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system (Red Hat Enterprise Linux variant 7) Obtain root access to the host Prepare the EPEL and HTCondor Development Yum repositories Install CA certificates and VO data into /etc/grid-security/certificates and /etc/grid-security/vomsdir , respectively","title":"Before Starting"},{"location":"v6/installation/htcondor-ce/#installing-htcondor-ce","text":"Important HTCondor-CE must be installed on a host that is configured to submit jobs to your batch system. The details of this setup are site-specific by nature and therefore beyond the scope of this document. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Select the appropriate convenience RPM: If your batch system is... Then use the following package... Grid Engine htcondor-ce-sge HTCondor htcondor-ce-condor LSF htcondor-ce-lsf PBS/Torque htcondor-ce-pbs SLURM htcondor-ce-slurm Install the CE software: root@host # yum install Where is the package you selected in the above step.","title":"Installing HTCondor-CE"},{"location":"v6/installation/htcondor-ce/#next-steps","text":"At this point, you should have all the necessary binaries, scripts, and default configurations. 
The next step is to configure authentication to allow for remote submission to your HTCondor-CE.","title":"Next Steps"},{"location":"v6/installation/htcondor-ce/#getting-help","text":"If you have any questions or issues with the installation process, please contact us for assistance.","title":"Getting Help"},{"location":"v6/troubleshooting/common-issues/","text":"Common Issues \u00b6 Known Issues \u00b6 SUBMIT_ATTRS are not applied to jobs on the local HTCondor \u00b6 If you are adding attributes to jobs submitted to your HTCondor pool with SUBMIT_ATTRS , these will not be applied to jobs that are entering your pool from the HTCondor-CE. To get around this, you will want to add the attributes to your job routes . If the CE is the only entry point for jobs into your pool, you can get rid of SUBMIT_ATTRS on your backend. Otherwise, you will have to maintain your list of attributes both in your list of routes and in your SUBMIT_ATTRS . General Troubleshooting Items \u00b6 Making sure packages are up-to-date \u00b6 It is important to make sure that the HTCondor-CE and related RPMs are up-to-date. root@host # yum update \"htcondor-ce*\" blahp condor If you just want to see the packages to update, but do not want to perform the update now, answer N at the prompt. Verify package contents \u00b6 If the contents of your HTCondor-CE packages have been changed, the CE may cease to function properly. To verify the contents of your packages (ignoring changes to configuration files): user@host $ rpm -q --verify htcondor-ce htcondor-ce-client blahp | grep -v '/var/' | awk '$2 != \"c\" {print $0}' If the verification command returns output, this means that your packages have been changed. To fix this, you can reinstall the packages: user@host $ yum reinstall htcondor-ce htcondor-ce-client blahp Note The reinstall command may place original versions of configuration files alongside the versions that you have modified. 
If this is the case, the reinstall command will notify you that the original versions will have an .rpmnew suffix. Further inspection of these files may be required as to whether or not you need to merge them into your current configuration. Verify clocks are synchronized \u00b6 Like all network-based authentication, HTCondor-CE is sensitive to time skews. Make sure the clock on your CE is synchronized using a utility such as ntpd . Additionally, HTCondor itself is sensitive to time skews on the NFS server. If you see empty stdout / err being returned to the submitter, verify there is no NFS server time skew. HTCondor-CE Troubleshooting Items \u00b6 This section contains common issues you may encounter using HTCondor-CE and next actions to take when you do. Before troubleshooting, we recommend increasing the log level: Write the following into /etc/condor-ce/config.d/99-local.conf to increase the log level for all daemons: ALL_DEBUG = D_ALWAYS:2 D_CAT Ensure that the configuration is in place: root@host # condor_ce_reconfig Reproduce the issue Note Before spending any time on troubleshooting, you should ensure that the state of configuration is as expected by running condor_ce_reconfig . Daemons fail to start \u00b6 If there are errors in your configuration of HTCondor-CE, this may cause some of its required daemons to fail to start up. Check the following subsections in order: Symptoms Daemon startup failure may manifest in many ways; the following are a few symptoms of the problem. The service fails to start: root@host # service condor-ce start Starting Condor-CE daemons: [ FAIL ] condor_ce_q fails with a lengthy error message: user@host $ condor_ce_q Error: Extra Info: You probably saw this error because the condor_schedd is not running on the machine you are trying to query. If the condor_schedd is not running, the Condor system will not be able to find an address and port to connect to and satisfy this request. 
Please make sure the Condor daemons are running and try again. Extra Info: If the condor_schedd is running on the machine you are trying to query and you still see the error, the most likely cause is that you have setup a personal Condor, you have not defined SCHEDD_NAME in your condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings in your config file, or you must use the -name option to condor_q. Please see the Condor manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE. Next actions If the MasterLog is filled with ERROR:SECMAN...TCP connection to collector...failed : This is likely due to a misconfiguration for a host with multiple network interfaces. Verify that you have followed the instructions in this section of the optional configuration page. If the MasterLog is filled with DC_AUTHENTICATE errors: The HTCondor-CE daemons use the host certificate to authenticate with each other. Verify that your host certificate\u2019s DN matches one of the regular expressions found in /etc/condor-ce/condor_mapfile . If the SchedLog is filled with Can\u2019t find address for negotiator : You can ignore this error! The negotiator daemon is used in HTCondor batch systems to match jobs with resources but since HTCondor-CE does not manage any resources directly, it does not run one. Jobs fail to submit to the CE \u00b6 If a user is having issues submitting jobs to the CE and you've ruled out general connectivity or firewalls as the culprit, then you may have encountered an authentication or authorization issue. 
You may see error messages like the following in your SchedLog : 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to gsi@unmapped from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done! The detailed debug output of condor_ce_ping -d can provide useful data from the client side. The following are several potential causes and how to check and correct them. Jobs fail to submit: Verify SSL configuration on the CE \u00b6 Your machine must have a valid host certificate and private key, and the CE must be configured to use them. See the documentation about Configuring Certificates for details. If the CE can't read its host certificate and private key, you will see an error like the following in /var/log/condor-ce/SchedLog if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error loading private key from file 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error initializing server security context 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error creating SSL context Next actions If your host certificate is installed under /etc/grid-security/ , ensure the CE is configured to look for it there (see configuring certificates ). 
Jobs fail to submit: Verify SSL configuration on the client \u00b6 The CE client tools on the client machine must be configured to recognize the Certificate Authority (CA) that issued the CE's host certificate. If the client tools don't trust your CE's host certificate's CA, then the output of condor_ce_trace -d will include something like the following: 10/07/21 16:39:10 (D_SECURITY) -Error with certificate at depth: 0 10/07/21 16:39:10 (D_SECURITY) issuer = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/CN=OSG Test CA 10/07/21 16:39:10 (D_SECURITY) subject = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=Services/CN=4c75de0db10c.htcondor.org 10/07/21 16:39:10 (D_SECURITY) err 20:unable to get local issuer certificate 10/07/21 16:39:10 (D_SECURITY) Tried to connect: -1 10/07/21 16:39:10 (D_SECURITY) SSL: library failure: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed If your CE is using a grid certificate (i.e. one installed under /etc/grid-security/ ), then the client machine will need an /etc/grid-security/certificates/ directory containing the CA files for your grid certificate, and the CE client tools must be configured to look there for the CA files. The CE configuration files on the client machine will need to include the following: AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates Jobs fail to submit: Verify SciToken contents \u00b6 If SciTokens is the authentication method being used, you can examine the token's payload for some common errors. If you have access to the token itself, you can decode it at jwt.io . The token's payload will appear in /var/log/condor-ce/AuditLog* files, like so: 10/05/21 18:34:06 (D_AUDIT) Examining SciToken with payload {}. 
The token's payload will look something like this: { \"aud\": \"ANY\", \"ver\": \"scitokens:2.0\", \"scope\": \"condor:/READ condor:/WRITE\", \"exp\": 1633488473, \"sub\": \"htcondor-ce-dev\", \"iss\": \"https://demo.scitokens.org\", \"iat\": 1633459675, \"nbf\": 1633459675, \"jti\": \"cb84b7af-ed21-450d-a50e-552a5cd2904c\" } Next actions If any of the following checks fail, the user will need a new, corrected, token. Check that the aud (audience) value is either ANY , https://wlcg.cern.ch/jwt/v1/any , or matches one of the items from condor_ce_config_val SCITOKENS_SERVER_AUDIENCE (i.e. :9619 ). Tokens with an invalid aud value will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 15:55:39 (D_SECURITY) SCITOKENS:2:Failed to verify token and generate ACLs: token verification failed: 'aud' claim verification failed. Check that the scope value includes the string condor:/READ condor:/WRITE or compute.cancel compute.create compute.modify compute.read . Tokens with an invalid scope value will appear in /var/log/condor-ce/SchedLog with the following errors: 10/05/21 18:41:50 (D_ALWAYS) DC_AUTHENTICATE: authentication of <172.17.0.3:40489> was successful but resulted in a limited authorization which did not include this command (60021 DC_NOP_WRITE), so aborting. Check that the exp (expiration) value is in the future. Tokens that have expired will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/05/21 18:10:55 (D_SECURITY) SCITOKENS:2:Failed to deserialize scitoken: token verification failed: token expired Check that the nbf (not before) value is in the past. Jobs fail to submit: Check user mapping \u00b6 The CE must be able to map the identity of the job submitter to a local OS account, used for storing the job sandbox and running the job under the local batch system. This mapping is done via a set of mapfiles . 
If no mapping is available, then job submission will fail. If a SciToken can't be mapped and the D_SECURITY debug level is enabled, then you will see this in the SchedLog file: 10/05/21 18:56:04 (D_SECURITY) Failed to map SCITOKENS authenticated identity 'https://demo.scitokens.org,htcondor-ce-dev', failing authentication to give another authentication method a go. Next actions Check the files in /etc/condor-ce/mapfiles.d/ and ensure that the user's authentication method and identity are present (possibly via a regular expression), and that the mapped OS account exists on your CE and cluster. Jobs stay idle on the CE \u00b6 Check the following subsections in order, but note that jobs may take several minutes or longer to run if the CE is busy. Idle jobs on CE: Make sure the underlying batch system can run jobs \u00b6 HTCondor-CE delegates jobs to your batch system, which is then responsible for matching jobs to worker nodes. If you cannot manually submit jobs (e.g., condor_submit , qsub ) on the CE host to your batch system, then HTCondor-CE won't be able to either. Procedure Manually create and submit a simple job (e.g., one that runs sleep ) Check for errors in the submission itself Watch the job in the batch system queue (e.g., condor_q , qstat ) If the job does not run, check for errors on the batch system Next actions Consult troubleshooting documentation or support avenues for your batch system. Once you can run simple manual jobs on your batch system, try submitting to the HTCondor-CE again. Idle jobs on CE: Is the job router handling the incoming job? \u00b6 Jobs on the CE will be put on hold if they do not match any job routes after 30 minutes, but you can check a few things if you suspect that the jobs are not being matched. Check if the JobRouter sees a job before that by looking at the job router log and looking for the text src=\u2026claimed job . 
Next actions Use condor_ce_job_router_info to see why your idle job does not match any routes Idle jobs on CE: Verify correct operation between the CE and your local batch system \u00b6 For HTCondor batch systems \u00b6 HTCondor-CE submits jobs directly to an HTCondor batch system via the JobRouter, so any issues with the CE/local batch system interaction will appear in the JobRouterLog . Next actions Check the JobRouterLog for failures. Verify that the local HTCondor is functional. Use condor_ce_config_val to verify that the JOB_ROUTER_SCHEDD2_NAME , JOB_ROUTER_SCHEDD2_POOL , and JOB_ROUTER_SCHEDD2_SPOOL configuration variables are set to the hostname of your CE, the hostname and port of your local HTCondor\u2019s collector, and the location of your local HTCondor\u2019s spool directory, respectively. Use condor_config_val QUEUE_SUPER_USER_MAY_IMPERSONATE and verify that it is set to .* . For non-HTCondor batch systems \u00b6 HTCondor-CE submits jobs to a non-HTCondor batch system via the Gridmanager, so any issues with the CE/local batch system interaction will appear in the GridmanagerLog . Look for gm state change\u2026 lines to figure out where the issues are occurring. Next actions If you see failures in the GridmanagerLog during job submission: Save the submit files by adding the appropriate entry to blah.config and submit it manually to the batch system. If that succeeds, make sure that the BLAHP knows where your binaries are located by setting the _binpath in /etc/blah.config . If you see failures in the GridmanagerLog during queries for job status: Query the resultant job with your batch system tools from the CE. If you can, the BLAHP uses scripts to query for status in /usr/libexec/blahp/_status.sh (e.g., /usr/libexec/blahp/lsf_status.sh ) that take the argument batch system/YYYYMMDD/job ID (e.g., lsf/20141008/65053 ). 
Run the appropriate status script for your batch system and upon success, you should see the following output: root@host # /usr/libexec/blahp/lsf_status.sh lsf/20141008/65053 [ BatchjobId = \"894862\"; JobStatus = 4; ExitCode = 0; WorkerNode = \"atl-prod08\" ] If the script fails, request help from the OSG. Idle jobs on CE: Verify ability to change permissions on key files \u00b6 HTCondor-CE needs the ability to write and chown files in its spool directory and if it cannot, jobs will not run at all. Spool permission errors can appear in the SchedLog and the JobRouterLog . Symptoms 09/17/14 14:45:42 Error: Unable to chown '/var/lib/condor-ce/spool/1/0/cluster1.proc0.subproc0/env' from 12345 to 54321 Next actions As root, try to change ownership of the file or directory in question. If the file does not exist, a parent directory may have improper permissions. Verify that there aren't any underlying file system issues in the specified location Jobs stay idle on a remote host submitting to the CE \u00b6 If you are submitting your job from a separate submit host to the CE, it stays idle in the queue forever, and you do not see a resultant job in the CE's queue, this means that your job cannot contact the CE for submission or it is not authorized to run there. Note that jobs may take several minutes or longer if the CE is busy. Remote idle jobs: Can you contact the CE? \u00b6 To check basic connectivity to a CE, use condor_ce_ping : Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE ERROR: couldn't locate condorce.example.com! Next actions Make sure that the HTCondor-CE daemons are running with condor_ce_status . Verify that your CE is reachable from your submit host, replacing condorce.example.com with the hostname of your CE: user@host $ ping condorce.example.com Remote idle jobs: Are you authorized to run jobs on the CE? 
\u00b6 The CE will only run jobs from users that authenticate through the HTCondor-CE configuration . You can use condor_ce_ping to check if you are authorized and what user your proxy is being mapped to. Symptoms user@host $ condor_ping -verbose -name condorce.example.com -pool condorce.example.com:9619 WRITE Remote Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Local Version: $CondorVersion: 8.0.7 Sep 24 2014 $ Session ID: condorce:3343:1412790611:0 Instruction: WRITE Command: 60021 Encryption: none Integrity: MD5 Authenticated using: GSI All authentication methods: GSI Remote Mapping: gsi@unmapped Authorized: FALSE Notice the failures in the above message: Remote Mapping: gsi@unmapped and Authorized: FALSE Next actions Verify that an authentication method is set up on the CE Verify that your user DN is mapped to an existing system user Jobs go on hold \u00b6 Jobs will be put on hold with a HoldReason attribute that can be inspected with condor_ce_q : user@host $ condor_ce_q -l -attr HoldReason HoldReason = \"CE job in status 5 put on hold by SYSTEM_PERIODIC_HOLD due to no matching routes, route job limit, or route failure threshold.\" Held jobs: no matching routes, route job limit, or route failure threshold \u00b6 Jobs on the CE will be put on hold if they are not claimed by the job router within 30 minutes. The most common cases for this behavior are as follows: The job does not match any job routes: use condor_ce_job_router_info to see why your idle job does not match any routes . The route(s) that the job matches to are full: See limiting the number of jobs . The job router is throttling submission to your batch system due to submission failures: See the HTCondor manual for FailureRateThreshold . Check for errors in the JobRouterLog or GridmanagerLog for HTCondor and non-HTCondor batch systems, respectively. Note It is expected that jobs from remote submitters will temporarily be held with Spooling input data files as the reason. 
Once the input files have transferred, the job will continue. Held jobs: Missing/expired user proxy \u00b6 HTCondor-CE requires a valid user proxy for each job that is submitted. You can check the status of your proxy with the following: user@host $ voms-proxy-info -all Next actions Ensure that the owner of the job generates their proxy with voms-proxy-init . Held jobs: Invalid job universe \u00b6 The HTCondor-CE only accepts jobs that have universe in their submit files set to vanilla , standard , local , or scheduler . These universes also have corresponding integer values that can be found in the HTCondor manual . Next actions Ensure jobs submitted locally, from the CE host, are submitted with universe = vanilla Ensure jobs submitted from a remote submit point are submitted with: universe = grid grid_resource = condor condorce.example.com condorce.example.com:9619 replacing condorce.example.com with the hostname of the CE. Identifying the corresponding job ID on the local batch system \u00b6 When troubleshooting interactions between your CE and your local batch system, you will need to associate the CE job ID with the resultant job ID on the batch system. The methods for finding the resultant job ID differ between batch systems. HTCondor batch systems \u00b6 To inspect the CE\u2019s job ad, use condor_ce_q or condor_ce_history : Use condor_ce_q if the job is still in the CE\u2019s queue: user@host $ condor_ce_q -af RoutedToJobId Use condor_ce_history if the job has left the CE\u2019s queue: user@host $ condor_ce_history -af RoutedToJobId Parse the JobRouterLog for the CE\u2019s job ID. Non-HTCondor batch systems \u00b6 When HTCondor-CE records the corresponding batch system job ID, it is written in the form // : lsf/20141206/482046 To inspect the CE\u2019s job ad, use condor_ce_q : user@host $ condor_ce_q -af GridJobId Parse the GridmanagerLog for the CE\u2019s job ID. 
Jobs removed from the local HTCondor pool become resubmitted (HTCondor batch systems only) \u00b6 By design, HTCondor-CE will resubmit jobs that have been removed from the underlying HTCondor pool. Therefore, to remove misbehaving jobs, they will need to be removed on the CE level following these steps: Identify the misbehaving job ID in your batch system queue Find the job's corresponding CE job ID: user@host $ condor_q -af RoutedFromJobId Use condor_ce_rm to remove the CE job from the queue Missing HTCondor tools \u00b6 Most of the HTCondor-CE tools are just wrappers around existing HTCondor tools that load the CE-specific configuration. If you are trying to use HTCondor-CE tools and you see the following error: user@host $ condor_ce_job_router_info /usr/bin/condor_ce_job_router_info: line 6: exec: condor_job_router_info: not found This means that the condor_job_router_info (note this is not the CE version), is not in your PATH . Next Actions Either the condor RPM is missing or there are some other issues with it (try rpm --verify condor ). You have installed HTCondor in a non-standard location that is not in your PATH . The condor_job_router_info tool itself wasn't available until Condor-8.2.3-1.1 (available in osg-upcoming). Getting Help \u00b6 If you have any questions or issues about troubleshooting remote HTCondor-CEs, please contact us for assistance.","title":"Common Issues"},{"location":"v6/troubleshooting/common-issues/#common-issues","text":"","title":"Common Issues"},{"location":"v6/troubleshooting/common-issues/#known-issues","text":"","title":"Known Issues"},{"location":"v6/troubleshooting/common-issues/#submit_attrs-are-not-applied-to-jobs-on-the-local-htcondor","text":"If you are adding attributes to jobs submitted to your HTCondor pool with SUBMIT_ATTRS , these will not be applied to jobs that are entering your pool from the HTCondor-CE. To get around this, you will want to add the attributes to your job routes . 
If the CE is the only entry point for jobs into your pool, you can get rid of SUBMIT_ATTRS on your backend. Otherwise, you will have to maintain your list of attributes both in your list of routes and in your SUBMIT_ATTRS .","title":"SUBMIT_ATTRS are not applied to jobs on the local HTCondor"},{"location":"v6/troubleshooting/common-issues/#general-troubleshooting-items","text":"","title":"General Troubleshooting Items"},{"location":"v6/troubleshooting/common-issues/#making-sure-packages-are-up-to-date","text":"It is important to make sure that the HTCondor-CE and related RPMs are up-to-date. root@host # yum update \"htcondor-ce*\" blahp condor If you just want to see the packages to update, but do not want to perform the update now, answer N at the prompt.","title":"Making sure packages are up-to-date"},{"location":"v6/troubleshooting/common-issues/#verify-package-contents","text":"If the contents of your HTCondor-CE packages have been changed, the CE may cease to function properly. To verify the contents of your packages (ignoring changes to configuration files): user@host $ rpm -q --verify htcondor-ce htcondor-ce-client blahp | grep -v '/var/' | awk '$2 != \"c\" {print $0}' If the verification command returns output, this means that your packages have been changed. To fix this, you can reinstall the packages: user@host $ yum reinstall htcondor-ce htcondor-ce-client blahp Note The reinstall command may place original versions of configuration files alongside the versions that you have modified. If this is the case, the reinstall command will notify you that the original versions will have an .rpmnew suffix. Further inspection of these files may be required as to whether or not you need to merge them into your current configuration.","title":"Verify package contents"},{"location":"v6/troubleshooting/common-issues/#verify-clocks-are-synchronized","text":"Like all network-based authentication, HTCondor-CE is sensitive to time skews. 
Make sure the clock on your CE is synchronized using a utility such as ntpd . Additionally, HTCondor itself is sensitive to time skews on the NFS server. If you see empty stdout / err being returned to the submitter, verify there is no NFS server time skew.","title":"Verify clocks are synchronized"},{"location":"v6/troubleshooting/common-issues/#htcondor-ce-troubleshooting-items","text":"This section contains common issues you may encounter using HTCondor-CE and next actions to take when you do. Before troubleshooting, we recommend increasing the log level: Write the following into /etc/condor-ce/config.d/99-local.conf to increase the log level for all daemons: ALL_DEBUG = D_ALWAYS:2 D_CAT Ensure that the configuration is in place: root@host # condor_ce_reconfig Reproduce the issue Note Before spending any time on troubleshooting, you should ensure that the state of configuration is as expected by running condor_ce_reconfig .","title":"HTCondor-CE Troubleshooting Items"},{"location":"v6/troubleshooting/common-issues/#daemons-fail-to-start","text":"If there are errors in your configuration of HTCondor-CE, this may cause some of its required daemons to fail to start. Check the following subsections in order: Symptoms Daemon startup failure may manifest in many ways; the following are a few symptoms of the problem. The service fails to start: root@host # service condor-ce start Starting Condor-CE daemons: [ FAIL ] condor_ce_q fails with a lengthy error message: user@host $ condor_ce_q Error: Extra Info: You probably saw this error because the condor_schedd is not running on the machine you are trying to query. If the condor_schedd is not running, the Condor system will not be able to find an address and port to connect to and satisfy this request. Please make sure the Condor daemons are running and try again. 
Extra Info: If the condor_schedd is running on the machine you are trying to query and you still see the error, the most likely cause is that you have setup a personal Condor, you have not defined SCHEDD_NAME in your condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings in your config file, or you must use the -name option to condor_q. Please see the Condor manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE. Next actions If the MasterLog is filled with ERROR:SECMAN...TCP connection to collector...failed : This is likely due to a misconfiguration for a host with multiple network interfaces. Verify that you have followed the instructions in this section of the optional configuration page. If the MasterLog is filled with DC_AUTHENTICATE errors: The HTCondor-CE daemons use the host certificate to authenticate with each other. Verify that your host certificate\u2019s DN matches one of the regular expressions found in /etc/condor-ce/condor_mapfile . If the SchedLog is filled with Can\u2019t find address for negotiator : You can ignore this error! The negotiator daemon is used in HTCondor batch systems to match jobs with resources but since HTCondor-CE does not manage any resources directly, it does not run one.","title":"Daemons fail to start"},{"location":"v6/troubleshooting/common-issues/#jobs-fail-to-submit-to-the-ce","text":"If a user is having issues submitting jobs to the CE and you've ruled out general connectivity or firewalls as the culprit, then you may have encountered an authentication or authorization issue. 
You may see error messages like the following in your SchedLog : 08/30/16 16:52:56 DC_AUTHENTICATE: required authentication of 72.33.0.189 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpUlYa) 08/30/16 16:53:12 PERMISSION DENIED to gsi@unmapped from host 72.33.0.189 for command 60021 (DC_NOP_WRITE), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 72.33.0.189,dyn-72-33-0-189.uwnet.wisc.edu, hostname size = 1, original ip address = 72.33.0.189 08/30/16 16:53:12 DC_AUTHENTICATE: Command not authorized, done! The detailed debug output of condor_ce_ping -d can provide useful data from the client side. The following are several potential causes and how to check and correct them.","title":"Jobs fail to submit to the CE"},{"location":"v6/troubleshooting/common-issues/#jobs-fail-to-submit-verify-ssl-configuration-on-the-ce","text":"Your machine must have a valid host certificate and private key, and the CE must be configured to use them. See the documentation about Configuring Certificates for details. 
If the CE can't read its host certificate and private key, you will see an error like the following in /var/log/condor-ce/SchedLog if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error loading private key from file 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error initializing server security context 10/07/21 17:52:01 (D_SECURITY) SSL Auth: Error creating SSL context Next actions If your host certificate is installed under /etc/grid-security/ , ensure the CE is configured to look for it there (see configuring certificates ).","title":"Jobs fail to submit: Verify SSL configuration on the CE"},{"location":"v6/troubleshooting/common-issues/#jobs-fail-to-submit-verify-ssl-configuration-on-the-client","text":"The CE client tools on the client machine must be configured to recognize the Certificate Authority (CA) that issued the CE's host certificate. If the client tools don't trust your CE's host certificate's CA, then the output of condor_ce_trace -d will include something like the following: 10/07/21 16:39:10 (D_SECURITY) -Error with certificate at depth: 0 10/07/21 16:39:10 (D_SECURITY) issuer = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/CN=OSG Test CA 10/07/21 16:39:10 (D_SECURITY) subject = /DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=Services/CN=4c75de0db10c.htcondor.org 10/07/21 16:39:10 (D_SECURITY) err 20:unable to get local issuer certificate 10/07/21 16:39:10 (D_SECURITY) Tried to connect: -1 10/07/21 16:39:10 (D_SECURITY) SSL: library failure: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed If your CE is using a grid certificate (i.e. one installed under /etc/grid-security/ ), then the client machine will need an /etc/grid-security/certificates/ directory containing the CA files for your grid certificate, and the CE client tools must be configured to look there for the CA files. 
The CE configuration files on the client machine will need to include the following: AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates","title":"Jobs fail to submit: Verify SSL configuration on the client"},{"location":"v6/troubleshooting/common-issues/#jobs-fail-to-submit-verify-scitoken-contents","text":"If SciTokens is the authentication method being used, you can examine the token's payload for some common errors. If you have access to the token itself, you can decode it at jwt.io . The token's payload will appear in /var/log/condor-ce/AuditLog* files, like so: 10/05/21 18:34:06 (D_AUDIT) Examining SciToken with payload {}. The token's payload will look something like this: { \"aud\": \"ANY\", \"ver\": \"scitokens:2.0\", \"scope\": \"condor:/READ condor:/WRITE\", \"exp\": 1633488473, \"sub\": \"htcondor-ce-dev\", \"iss\": \"https://demo.scitokens.org\", \"iat\": 1633459675, \"nbf\": 1633459675, \"jti\": \"cb84b7af-ed21-450d-a50e-552a5cd2904c\" } Next actions If any of the following checks fail, the user will need a new, corrected, token. Check that the aud (audience) value is either ANY , https://wlcg.cern.ch/jwt/v1/any , or matches one of the items from condor_ce_config_val SCITOKENS_SERVER_AUDIENCE (i.e. :9619 ). Tokens with an invalid aud value will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/07/21 15:55:39 (D_SECURITY) SCITOKENS:2:Failed to verify token and generate ACLs: token verification failed: 'aud' claim verification failed. Check that the scope value includes the string condor:/READ condor:/WRITE or compute.cancel compute.create compute.modify compute.read . Tokens with an invalid scope value will appear in /var/log/condor-ce/SchedLog with the following errors: 10/05/21 18:41:50 (D_ALWAYS) DC_AUTHENTICATE: authentication of <172.17.0.3:40489> was successful but resulted in a limited authorization which did not include this command (60021 DC_NOP_WRITE), so aborting. 
Check that the exp (expiration) value is in the future. Tokens that have expired will appear in /var/log/condor-ce/SchedLog with the following errors if D_SECURITY is enabled in SCHEDD_DEBUG : 10/05/21 18:10:55 (D_SECURITY) SCITOKENS:2:Failed to deserialize scitoken: token verification failed: token expired Check that the nbf (not before) value is in the past.","title":"Jobs fail to submit: Verify SciToken contents"},{"location":"v6/troubleshooting/common-issues/#jobs-fail-to-submit-check-user-mapping","text":"The CE must be able to map the identity of the job submitter to a local OS account, used for storing the job sandbox and running the job under the local batch system. This mapping is done via a set of mapfiles . If no mapping is available, then job submission will fail. If a SciToken can't be mapped and the D_SECURITY debug level is enabled, then you will see this in the SchedLog file: 10/05/21 18:56:04 (D_SECURITY) Failed to map SCITOKENS authenticated identity 'https://demo.scitokens.org,htcondor-ce-dev', failing authentication to give another authentication method a go. Next actions Check the files in /etc/condor-ce/mapfiles.d/ and ensure that the user's authentication method and identity are present (possibly via a regular expression), and that the mapped OS account exists on your CE and cluster.","title":"Jobs fail to submit: Check user mapping"},{"location":"v6/troubleshooting/common-issues/#jobs-stay-idle-on-the-ce","text":"Check the following subsections in order, but note that jobs may take several minutes or longer to run if the CE is busy.","title":"Jobs stay idle on the CE"},{"location":"v6/troubleshooting/common-issues/#idle-jobs-on-ce-make-sure-the-underlying-batch-system-can-run-jobs","text":"HTCondor-CE delegates jobs to your batch system, which is then responsible for matching jobs to worker nodes. If you cannot manually submit jobs (e.g., condor_submit , qsub ) on the CE host to your batch system, then HTCondor-CE won't be able to either. 
Procedure Manually create and submit a simple job (e.g., one that runs sleep ) Check for errors in the submission itself Watch the job in the batch system queue (e.g., condor_q , qstat ) If the job does not run, check for errors on the batch system Next actions Consult troubleshooting documentation or support avenues for your batch system. Once you can run simple manual jobs on your batch system, try submitting to the HTCondor-CE again.","title":"Idle jobs on CE: Make sure the underlying batch system can run jobs"},{"location":"v6/troubleshooting/common-issues/#idle-jobs-on-ce-is-the-job-router-handling-the-incoming-job","text":"Jobs on the CE will be put on hold if they do not match any job routes after 30 minutes, but you can check a few things if you suspect that the jobs are not being matched. Check if the JobRouter sees a job before that by looking at the job router log and looking for the text src=\u2026claimed job . Next actions Use condor_ce_job_router_info to see why your idle job does not match any routes","title":"Idle jobs on CE: Is the job router handling the incoming job?"},{"location":"v6/troubleshooting/common-issues/#idle-jobs-on-ce-verify-correct-operation-between-the-ce-and-your-local-batch-system","text":"","title":"Idle jobs on CE: Verify correct operation between the CE and your local batch system"},{"location":"v6/troubleshooting/common-issues/#for-htcondor-batch-systems","text":"HTCondor-CE submits jobs directly to an HTCondor batch system via the JobRouter, so any issues with the CE/local batch system interaction will appear in the JobRouterLog . Next actions Check the JobRouterLog for failures. Verify that the local HTCondor is functional. 
Use condor_ce_config_val to verify that the JOB_ROUTER_SCHEDD2_NAME , JOB_ROUTER_SCHEDD2_POOL , and JOB_ROUTER_SCHEDD2_SPOOL configuration variables are set to the hostname of your CE, the hostname and port of your local HTCondor\u2019s collector, and the location of your local HTCondor\u2019s spool directory, respectively. Use condor_config_val QUEUE_SUPER_USER_MAY_IMPERSONATE and verify that it is set to .* .","title":"For HTCondor batch systems"},{"location":"v6/troubleshooting/common-issues/#for-non-htcondor-batch-systems","text":"HTCondor-CE submits jobs to a non-HTCondor batch system via the Gridmanager, so any issues with the CE/local batch system interaction will appear in the GridmanagerLog . Look for gm state change\u2026 lines to figure out where the issues are occurring. Next actions If you see failures in the GridmanagerLog during job submission: Save the submit files by adding the appropriate entry to blah.config and submit it manually to the batch system. If that succeeds, make sure that the BLAHP knows where your binaries are located by setting the