From 5567c6af51a7fbee7dcd4fba0ede239e22bacea4 Mon Sep 17 00:00:00 2001 From: <> Date: Tue, 11 Jun 2024 22:25:43 +0000 Subject: [PATCH] Deployed 29ca33935 with MkDocs version: 1.3.0 --- release/updating-to-osg-23/index.html | 3 +-- search/search_index.json | 2 +- sitemap.xml.gz | Bin 860 -> 860 bytes 3 files changed, 2 insertions(+), 3 deletions(-) diff --git a/release/updating-to-osg-23/index.html b/release/updating-to-osg-23/index.html index f22a9e386..1d7f0a03d 100644 --- a/release/updating-to-osg-23/index.html +++ b/release/updating-to-osg-23/index.html @@ -2115,8 +2115,7 @@
After updating your RPMs and updating your configuration, turn on the HTCondor-CE service:
- :::console
- root@host # systemctl start condor-ce
+root@host # systemctl start condor-ce
Updating Your HTCondor Hosts¶
diff --git a/search/search_index.json b/search/search_index.json
index 6e3e813d8..ef7e91199 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"OSG Site Documentation \u00b6 User documentation If you are a researcher interested in accessing OSG computational capacity, please consult our user documentation instead. The OSG Consortium provides common service and support for capacity providers and scientific institutions (i.e., \"sites\") using a distributed fabric of high throughput computational services. The OSG Consortium does not own computational capacity but provides software and services to users and capacity providers alike to enable the opportunistic usage and sharing of capacity. This documentation aims to provide HTC/HPC system administrators with the necessary information to contribute computational capacity to the OSG Consortium. Contributing to the OSG \u00b6 We offer two models for sites to contribute capacity to the OSG Consortium: one where OSG staff hosts and maintains capacity provisioning services for users; and the traditional model where the site hosts and maintains these same services. In both of these cases, the following will be needed: An existing compute cluster running on a supported operating system with a supported batch system: Grid Engine , HTCondor , LSF , PBS Pro / Torque , or Slurm . Outbound network connectivity from your cluster's worker nodes Temporary scratch space on each worker node Don't meet the requirements? If your site does not meet the above conditions, please contact us to discuss your options for contributing to the OSG Consortium. OSG-hosted services \u00b6 To contribute computational capacity with OSG-hosted services, your site will also need the following: Allow SSH access to your local cluster's login host from a known IP address Shared home directories on each cluster node Next steps If you are interested in OSG-hosted services, please contact us for a consultation, even if your site does not meet the conditions as outlined above! Self-hosted services \u00b6 If you are interested in contributing capacity by hosting your own OSG services, please continue with the site planning page.","title":"Home"},{"location":"#osg-site-documentation","text":"User documentation If you are a researcher interested in accessing OSG computational capacity, please consult our user documentation instead. The OSG Consortium provides common service and support for capacity providers and scientific institutions (i.e., \"sites\") using a distributed fabric of high throughput computational services. The OSG Consortium does not own computational capacity but provides software and services to users and capacity providers alike to enable the opportunistic usage and sharing of capacity. This documentation aims to provide HTC/HPC system administrators with the necessary information to contribute computational capacity to the OSG Consortium.","title":"OSG Site Documentation"},{"location":"#contributing-to-the-osg","text":"We offer two models for sites to contribute capacity to the OSG Consortium: one where OSG staff hosts and maintains capacity provisioning services for users; and the traditional model where the site hosts and maintains these same services. In both of these cases, the following will be needed: An existing compute cluster running on a supported operating system with a supported batch system: Grid Engine , HTCondor , LSF , PBS Pro / Torque , or Slurm . 
Outbound network connectivity from your cluster's worker nodes Temporary scratch space on each worker node Don't meet the requirements? If your site does not meet the above conditions, please contact us to discuss your options for contributing to the OSG Consortium.","title":"Contributing to the OSG"},{"location":"#osg-hosted-services","text":"To contribute computational capacity with OSG-hosted services, your site will also need the following: Allow SSH access to your local cluster's login host from a known IP address Shared home directories on each cluster node Next steps If you are interested in OSG-hosted services, please contact us for a consultation, even if your site does not meet the conditions as outlined above!","title":"OSG-hosted services"},{"location":"#self-hosted-services","text":"If you are interested in contributing capacity by hosting your own OSG services, please continue with the site planning page.","title":"Self-hosted services"},{"location":"site-maintenance/","text":"Site Maintenance \u00b6 This document outlines how to maintain your OSG site, including steps to take if you suspect that OSG jobs are causing issues. Handle Misbehaving Jobs \u00b6 In rare instances, you may experience issues at your site caused by misbehaving jobs (e.g., over-utilization of memory) from an OSG community or Virtual Organization (VO). If this occurs, you should immediately stop accepting job submissions from the OSG and remove the offending jobs: Configure your batch system to stop accepting jobs from the VO: For HTCondor batch systems, set the following in /etc/condor/config.d/ on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: SUBMIT_REQUIREMENT_Ban_OSG = (Owner != \"\") SUBMIT_REQUIREMENT_Ban_OSG_REASON = \"OSG pilot job submission temporarily disabled\" SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) Ban_OSG Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, disable the relevant Slurm partition : [root@host] # scontrol update PartitionName = State = DOWN Replacing with the name of the partition where you are sending OSG jobs. Remove the VO's jobs: For HTCondor batch systems, run the following command on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: [root@access-point] # condor_rm Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, run the following command: [root@host] # scancel -u Replacing with the name of the local Unix account corresponding to the problematic VO. Let us know so that we can track down the offending software or user: the same issue that you're experiencing may also be affecting other sites! Keep OSG Software Updated \u00b6 It is important to keep your software and data (e.g., CAs and VO client) up-to-date with the latest OSG release. See the release notes for your installed release series: OSG 3.6 release notes To stay abreast of software releases, we recommend subscribing to the osg-sites@opensciencegrid.org mailing list. 
Notify OSG of Major Changes \u00b6 To avoid potential issues with OSG job submissions, please notify us of major changes to your site, including: Major OS version changes on the worker nodes (e.g., upgraded from EL 7 to EL 8) Adding or removing container support through singularity or apptainer Policy changes regarding OSG resource requests (e.g., number of cores or GPUs, memory usage, or maximum walltime) Scheduled or unscheduled downtimes Site topology changes such as additions, modifications, or retirements of OSG services Changes to site contacts, such as administrative or security staff Help \u00b6 If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Site Maintenance"},{"location":"site-maintenance/#site-maintenance","text":"This document outlines how to maintain your OSG site, including steps to take if you suspect that OSG jobs are causing issues.","title":"Site Maintenance"},{"location":"site-maintenance/#handle-misbehaving-jobs","text":"In rare instances, you may experience issues at your site caused by misbehaving jobs (e.g., over-utilization of memory) from an OSG community or Virtual Organization (VO). If this occurs, you should immediately stop accepting job submissions from the OSG and remove the offending jobs: Configure your batch system to stop accepting jobs from the VO: For HTCondor batch systems, set the following in /etc/condor/config.d/ on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: SUBMIT_REQUIREMENT_Ban_OSG = (Owner != \"\") SUBMIT_REQUIREMENT_Ban_OSG_REASON = \"OSG pilot job submission temporarily disabled\" SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) Ban_OSG Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, disable the relevant Slurm partition : [root@host] # scontrol update PartitionName = State = DOWN Replacing with the name of the partition where you are sending OSG jobs. Remove the VO's jobs: For HTCondor batch systems, run the following command on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: [root@access-point] # condor_rm Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, run the following command: [root@host] # scancel -u Replacing with the name of the local Unix account corresponding to the problematic VO. Let us know so that we can track down the offending software or user: the same issue that you're experiencing may also be affecting other sites!","title":"Handle Misbehaving Jobs"},{"location":"site-maintenance/#keep-osg-software-updated","text":"It is important to keep your software and data (e.g., CAs and VO client) up-to-date with the latest OSG release. 
See the release notes for your installed release series: OSG 3.6 release notes To stay abreast of software releases, we recommend subscribing to the osg-sites@opensciencegrid.org mailing list.","title":"Keep OSG Software Updated"},{"location":"site-maintenance/#notify-osg-of-major-changes","text":"To avoid potential issues with OSG job submissions, please notify us of major changes to your site, including: Major OS version changes on the worker nodes (e.g., upgraded from EL 7 to EL 8) Adding or removing container support through singularity or apptainer Policy changes regarding OSG resource requests (e.g., number of cores or GPUs, memory usage, or maximum walltime) Scheduled or unscheduled downtimes Site topology changes such as additions, modifications, or retirements of OSG services Changes to site contacts, such as administrative or security staff","title":"Notify OSG of Major Changes"},{"location":"site-maintenance/#help","text":"If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Help"},{"location":"site-planning/","text":"Site Planning \u00b6 The OSG vision is to integrate computing across different resource types and business models to allow campus IT to offer a maximally flexible high throughput computing (HTC) environment for their researchers. This document is for System Administrators and aims to provide an overview of the different options to consider when planning to share resources via the OSG. After reading, you should be able to understand what software or services you want to provide to support your researchers Note This document covers the most common options. OSG is a diverse infrastructure: depending on what groups you want to support, you may need to install additional services. Coordinate with your local researchers. OSG Site Services \u00b6 The OSG Software stack tries to provide a uniform computing and storage fabric across many independently-managed computing and storage resources. These individual services will be accessed by virtual organizations (VOs), which will delegate the resources to scientists, researchers, and students. Sharing is a fundamental principle for the OSG: your site is encouraged to support as many OSG-registered VOs as local conditions allow. Autonomy is another principle: you are not required to support any VOs you do not want. As the administrator, your task is to make your existing computing and storage resources available to and reliable for your supported VOs. We break this down into three tasks: Getting \"pilot jobs\" submitted to your site batch system. Establishing an OSG runtime environment for running jobs. Delivering data to payload applications to be processed. There are multiple approaches for each item, depending on the VOs you support, and time you have to invest in the OSG. Note An essential concept in the OSG is the \"pilot job\". The pilot, which arrives at your batch system, is sent by the VO to get a resource allocation. However, it does not contain any research payload. Once started, it will connect back to a resource pool and pull down individuals' research \"payload jobs\". Hence, we do not think about submitting \"jobs\" to sites but rather \"resource requests\". Pilot Jobs \u00b6 Traditionally, an OSG Compute Entrypoint (CE) provides remote access for VOs to submit pilot jobs to your local batch system . 
There are two options for accepting pilot jobs at your site: Hosted CE : OSG will run and operate the CE services at no cost; the site only needs to provide a SSH pubkey-based authentication access to the central OSG host. OSG will interface with the VO and submit pilots directly to your batch system via SSH. By far, this is the simplest option : however, it is less-scalable and the site delegates many of the scheduling decisions to the OSG. Contact help@osg-htc.org for more information on the hosted CE. OSG CE : The traditional option where the site installs and operates a HTCondor-based CE on a dedicated host. This provides the best scalability and flexibility, but may require an ongoing time investment from the site. The OSG CE install and operation is covered in this documentation page . There are additional ways that pilots can be started at a site (either by the site administrator or an end-user); see resource sharing for more details. Runtime environment \u00b6 The OSG requires a very minimal runtime environment that can be deployed via tarball , RPM , or through a global filesystem on your cluster's worker nodes. We believe that all research applications should be portable and self-contained, with no OS dependencies. This provides access to the most resources and minimizes the presence at sites. However, this ideal is often difficult to achieve in practice. For sites that want to support a uniform runtime environment, we provide a global filesystem called CVMFS that VOs can use to distribute their own software dependencies. Finally, many researchers use applications that require a specific OS environment - not just individual dependencies - that is distributed as a container. OSG supports the use of the Singularity container runtime with Docker-based image distribution. Data Services \u00b6 Whether accessed through CVMFS or command-line software like curl , the majority of software is moved via HTTP in cache-friendly patterns. All sites are highly encouraged to use an HTTP proxy to reduce the load on the WAN from the cluster. Depending on the VOs you want to support, additional data services may be necessary: Some VOs elect to stream their larger input data from offsite using OSG's Data Federation . User jobs can make use of the OSG Data Federation without any services at your site but you may wish to run one or more of the following services: Data Cache to further reduce load on your connection to the WAN. Data Origin to allow local users to stage their data into the OSG Data Federation. The largest sites will additionally run large-scale data services such as a \"storage element\". This is often required for sites that want to support more complex organizations such as ATLAS or CMS. Site Policies \u00b6 Sites are encouraged to clearly specify and communicate their local policies regarding resource access. One common mechanism to do this is post them on a web page and make this page part of your site registration . Written policies help external entities understand what your site wants to accomplish with the OSG -- and are often internally clarifying. In line of our principle of sharing , we encourage you to allow virtual organizations registered with the OSG \"opportunistic use\" of your resources. You may need to preempt those jobs when higher priority jobs come around. The end-users using the OSG generally prefer having access to your site subject to preemption over having no access at all. 
Getting Help \u00b6 If you need help with planning your site, follow the contact instructions .","title":"Site Planning"},{"location":"site-planning/#site-planning","text":"The OSG vision is to integrate computing across different resource types and business models to allow campus IT to offer a maximally flexible high throughput computing (HTC) environment for their researchers. This document is for System Administrators and aims to provide an overview of the different options to consider when planning to share resources via the OSG. After reading, you should be able to understand what software or services you want to provide to support your researchers Note This document covers the most common options. OSG is a diverse infrastructure: depending on what groups you want to support, you may need to install additional services. Coordinate with your local researchers.","title":"Site Planning"},{"location":"site-planning/#osg-site-services","text":"The OSG Software stack tries to provide a uniform computing and storage fabric across many independently-managed computing and storage resources. These individual services will be accessed by virtual organizations (VOs), which will delegate the resources to scientists, researchers, and students. Sharing is a fundamental principle for the OSG: your site is encouraged to support as many OSG-registered VOs as local conditions allow. Autonomy is another principle: you are not required to support any VOs you do not want. As the administrator, your task is to make your existing computing and storage resources available to and reliable for your supported VOs. We break this down into three tasks: Getting \"pilot jobs\" submitted to your site batch system. Establishing an OSG runtime environment for running jobs. Delivering data to payload applications to be processed. There are multiple approaches for each item, depending on the VOs you support, and time you have to invest in the OSG. Note An essential concept in the OSG is the \"pilot job\". The pilot, which arrives at your batch system, is sent by the VO to get a resource allocation. However, it does not contain any research payload. Once started, it will connect back to a resource pool and pull down individuals' research \"payload jobs\". Hence, we do not think about submitting \"jobs\" to sites but rather \"resource requests\".","title":"OSG Site Services"},{"location":"site-planning/#pilot-jobs","text":"Traditionally, an OSG Compute Entrypoint (CE) provides remote access for VOs to submit pilot jobs to your local batch system . There are two options for accepting pilot jobs at your site: Hosted CE : OSG will run and operate the CE services at no cost; the site only needs to provide a SSH pubkey-based authentication access to the central OSG host. OSG will interface with the VO and submit pilots directly to your batch system via SSH. By far, this is the simplest option : however, it is less-scalable and the site delegates many of the scheduling decisions to the OSG. Contact help@osg-htc.org for more information on the hosted CE. OSG CE : The traditional option where the site installs and operates a HTCondor-based CE on a dedicated host. This provides the best scalability and flexibility, but may require an ongoing time investment from the site. The OSG CE install and operation is covered in this documentation page . 
There are additional ways that pilots can be started at a site (either by the site administrator or an end-user); see resource sharing for more details.","title":"Pilot Jobs"},{"location":"site-planning/#runtime-environment","text":"The OSG requires a very minimal runtime environment that can be deployed via tarball , RPM , or through a global filesystem on your cluster's worker nodes. We believe that all research applications should be portable and self-contained, with no OS dependencies. This provides access to the most resources and minimizes the presence at sites. However, this ideal is often difficult to achieve in practice. For sites that want to support a uniform runtime environment, we provide a global filesystem called CVMFS that VOs can use to distribute their own software dependencies. Finally, many researchers use applications that require a specific OS environment - not just individual dependencies - that is distributed as a container. OSG supports the use of the Singularity container runtime with Docker-based image distribution.","title":"Runtime environment"},{"location":"site-planning/#data-services","text":"Whether accessed through CVMFS or command-line software like curl , the majority of software is moved via HTTP in cache-friendly patterns. All sites are highly encouraged to use an HTTP proxy to reduce the load on the WAN from the cluster. Depending on the VOs you want to support, additional data services may be necessary: Some VOs elect to stream their larger input data from offsite using OSG's Data Federation . User jobs can make use of the OSG Data Federation without any services at your site but you may wish to run one or more of the following services: Data Cache to further reduce load on your connection to the WAN. Data Origin to allow local users to stage their data into the OSG Data Federation. The largest sites will additionally run large-scale data services such as a \"storage element\". This is often required for sites that want to support more complex organizations such as ATLAS or CMS.","title":"Data Services"},{"location":"site-planning/#site-policies","text":"Sites are encouraged to clearly specify and communicate their local policies regarding resource access. One common mechanism to do this is post them on a web page and make this page part of your site registration . Written policies help external entities understand what your site wants to accomplish with the OSG -- and are often internally clarifying. In line of our principle of sharing , we encourage you to allow virtual organizations registered with the OSG \"opportunistic use\" of your resources. You may need to preempt those jobs when higher priority jobs come around. The end-users using the OSG generally prefer having access to your site subject to preemption over having no access at all.","title":"Site Policies"},{"location":"site-planning/#getting-help","text":"If you need help with planning your site, follow the contact instructions .","title":"Getting Help"},{"location":"site-verification/","text":"Site Verification \u00b6 After installing and registering services from the site planning document , you will need to perform some verification steps before your site can scale up to full production . 
Verify OSG Software \u00b6 To verify your site's installation of OSG Software, you will need to: Submit local test jobs Contact the OSG for end-to-end tests of pilot job submission Check that OSG usage is reported to the GRACC Local verification \u00b6 It is useful to submit jobs from within your site to verify CE's ability to submit jobs to your local batch system. Consult the document for submitting jobs into an HTCondor-CE for detailed instructions on how to test job submission. Verify end-to-end pilot job submission \u00b6 Once you have validated job submission from within your site, request test pilot jobs from OSG Factory Operations and provide the following information: The fully qualified domain name of the CE Registered OSG resource name Supported OS version of your worker nodes (e.g., EL7, EL8, or a combination) Support for multicore jobs Support for GPUs Maximum job walltime Maximum job memory usage Once the Factory Operations team has enough information, they will start submitting pilots to your CE. Initially, this will be a handful of pilots at a time but once the factory verifies that pilot jobs are running successfully, that number will be ramped up. Verify reporting and monitoring \u00b6 To verify that your site is correctly reporting to the OSG, visit OSG's Accounting Portal and select your registered OSG site name from the Site dropdown. If you don't see your site in the dropdown, please contact us for assistance . Scale Up to Full Production \u00b6 After verifying end-to-end pilot job submission and usage reporting, your site is ready for production! In the same OSG Factory Operations ticket that you opened above , let OSG staff know when you are ready to accept production pilots. After requesting production pilots, review the documentation for how to maintain an OSG site . Getting Help \u00b6 If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Site Verification"},{"location":"site-verification/#site-verification","text":"After installing and registering services from the site planning document , you will need to perform some verification steps before your site can scale up to full production .","title":"Site Verification"},{"location":"site-verification/#verify-osg-software","text":"To verify your site's installation of OSG Software, you will need to: Submit local test jobs Contact the OSG for end-to-end tests of pilot job submission Check that OSG usage is reported to the GRACC","title":"Verify OSG Software"},{"location":"site-verification/#local-verification","text":"It is useful to submit jobs from within your site to verify CE's ability to submit jobs to your local batch system. Consult the document for submitting jobs into an HTCondor-CE for detailed instructions on how to test job submission.","title":"Local verification"},{"location":"site-verification/#verify-end-to-end-pilot-job-submission","text":"Once you have validated job submission from within your site, request test pilot jobs from OSG Factory Operations and provide the following information: The fully qualified domain name of the CE Registered OSG resource name Supported OS version of your worker nodes (e.g., EL7, EL8, or a combination) Support for multicore jobs Support for GPUs Maximum job walltime Maximum job memory usage Once the Factory Operations team has enough information, they will start submitting pilots to your CE. 
Initially, this will be a handful of pilots at a time but once the factory verifies that pilot jobs are running successfully, that number will be ramped up.","title":"Verify end-to-end pilot job submission"},{"location":"site-verification/#verify-reporting-and-monitoring","text":"To verify that your site is correctly reporting to the OSG, visit OSG's Accounting Portal and select your registered OSG site name from the Site dropdown. If you don't see your site in the dropdown, please contact us for assistance .","title":"Verify reporting and monitoring"},{"location":"site-verification/#scale-up-to-full-production","text":"After verifying end-to-end pilot job submission and usage reporting, your site is ready for production! In the same OSG Factory Operations ticket that you opened above , let OSG staff know when you are ready to accept production pilots. After requesting production pilots, review the documentation for how to maintain an OSG site .","title":"Scale Up to Full Production"},{"location":"site-verification/#getting-help","text":"If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Getting Help"},{"location":"common/ca/","text":"Installing Certificate Authorities (CAs) \u00b6 The certificate authorities (CAs) provide the trust roots for the public key infrastructure OSG uses to maintain integrity of its sites and services. This document provides details of various options to install the Certificate Authority (CA) certificates and have up-to-date certificate revocation lists (CRLs) on your OSG hosts. We provide three options for installing CA certificates that offer varying levels of control: Install an RPM for a specific set of CA certificates ( default ) Install osg-ca-scripts , a set of scripts that provide fine-grained CA management Install an RPM that doesn't install any CAs. This is useful if you'd like to manage CAs yourself while satisfying RPM dependencies. Prior to following the instructions on this page, you must enable our yum repositories Installing CA Certificates \u00b6 Please choose one of the three options to install CA certificates. Option 1: Install an RPM for a specific set of CA certificates \u00b6 Note This option is the default if you install OSG software without pre-installing CAs. For example, yum install osg-ce will bring in osg-ca-certs by default. In the OSG repositories, you will find two different sets of predefined CA certificates: ( default ) The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. If you chose... Then run the following command... OSG CA certificates yum install osg-ca-certs IGTF CA certificates yum install igtf-ca-certs To automatically keep your RPM installation of CAs up to date, we recommend the OSG CA certificates updater service. Option 2: Install osg-ca-scripts \u00b6 The osg-ca-scripts package provides scripts to install and update predefined sets of CAs with the ability to add or remove specific CAs. The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. Install the osg-ca-scripts package: root@host # yum install osg-ca-scripts Choose and install the CA certificate set: If you choose... Then run the following command... 
OSG CA certificates osg-ca-manage setupCA --location root --url osg IGTF CA certificates osg-ca-manage setupCA --location root --url igtf Enable the osg-update-certs-cron service to enable periodic CA updates. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable (Optional) To add a new CA: osg-ca-manage add [--dir ] --hash (Optional) To remove a CA osg-ca-manage remove --hash A complete set of options available though osg-ca-manage command, can be found in the osg-ca-manage documentation Option 3: Site-managed CAs \u00b6 If you want to handle the list of CAs completely internally to your site, you can utilize the empty-ca-certs RPM to satisfy RPM dependencies while not actually installing any CAs. To install this RPM, run the following command: root@host # yum install empty-ca-certs \u2013-enablerepo = osg-empty Warning If you choose this option, you are responsible for installing and maintaining the CA certificates. They must be installed in /etc/grid-security/certificates , or a symlink must be made from that location to the directory that contains the CA certificates. Installing other CAs \u00b6 In addition to the above CAs, you can install other CAs via RPM. These only work with the RPMs that provide CAs (that is, osg-ca-certs and the like, but not osg-ca-scripts .) They are in addition to the above RPMs, so do not only install these extra CAs. Set of CAs RPM name Installation command (as root) cilogon-openid cilogon-openid-ca-cert yum install cilogon-openid-ca-cert Verifying CA Certificates \u00b6 After installing or updating the CA certificates, they can be verified with the following command: root@host # curl --cacert \\ --capath \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" Where is the path to a valid X.509 CA certificate and is the path to the directory containing the installed CA certificates. For example, the following command can be used to verify a default OSG CA certificate installation: root@host # curl --cacert /etc/grid-security/certificates/cilogon-osg.pem \\ --capath /etc/grid-security/certificates/ \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 22005 0 22005 0 0 86633 0 --:--:-- --:--:-- --:--:-- 499k CA certificate installation verified If you do not see CA certificate installation verified this means that your CA certificate installation is broken. First, ensure that your CA installation is up-to-date and if you continue to see issues please contact us . Keeping CA Certificates Up-to-date \u00b6 It is important to keep CA certificates up-to-date for services and their clients to maintain integrity of production services. 
To verify that your CA certificates are on the latest version on a given host, determine the most recently released versions and the method by which your CA certificates have been installed: Retrieve the versions of the most recently released IGTF CA certificates and OSG CA certificates Determine which of the three CA certificate installation methods you are using: # rpm -q igtf-ca-certs osg-ca-certs osg-ca-scripts empty-ca-certs Based on which package is installed from the output in the previous step, choose one of the following options: If igtf-ca-certs or osg-ca-certs is installed , compare the installed version from step 2 to the corresponding version from step 1. If the version is older than the corresponding version from step 1, continue onto option 1 to upgrade your current installation and keep your installation up-to-date. If the versions match, your CA certificates are up-to-date! If osg-ca-scripts is installed , run the following command to update your CA certificates: # osg-ca-manage refreshCA And continue to the instructions in option 2 to enable automatic updates of your CA certificates. If empty-ca-scripts is installed , then you are responsible for maintaining your own CA certificates as outlined in option 3 . If none of the packages are installed , your host likely does not need CA certificates and you are done. Managing Certificate Revocation Lists \u00b6 In addition to CA certificates, you must have updated Certificate Revocation Lists (CRLs). CRLs contain certificate blacklists that OSG software uses to ensure that your hosts are only talking to valid clients or servers. To maintain up to date CAs, you will need to run the fetch-crl services. Note Normally fetch-crl is installed when you install the rest of the software and you do not need to explicitly install it. If you do wish to install it manually, run the following command: root@host # yum install fetch-crl If you do not wish to change the frequency of fetch-crl updates (default: every 6 hours) or use syslog for fetch-crl output, skip to the service management section Optional: configuring fetch-crl \u00b6 The following sub-sections contain optional configuration instructions. Note Note that the nosymlinks option in the configuration files refers to ignoring links within the certificates directory (e.g. two different names for the same file). It is perfectly fine if the path of the CA certificates directory itself ( infodir ) is a link to a directory. Changing the frequency of fetch-crl-cron \u00b6 To modify the times that fetch-crl-cron runs, edit /etc/cron.d/fetch-crl . Logging with syslog \u00b6 fetch-crl can produce quite a bit of output when run in verbose mode. To send fetch-crl output to syslog, use the following instructions: Change the configuration file to enable syslog: logmode = syslog syslogfacility = daemon Make sure the file /var/log/daemon exists, e.g. touching the file Change /etc/logrotate.d files to rotate it Managing fetch-crl services \u00b6 fetch-crl is installed as two different system services. The fetch-crl-boot service runs fetch-crl and is intended to only be enabled or disabled. The fetch-crl-cron service runs fetch-crl every 6 hours (with a random sleep time included). Both services are disabled by default. At the very minimum, the fetch-crl-cron service needs to be enabled and started, otherwise services will begin to fail as existing CRLs expire. 
Software Service name Notes Fetch CRL fetch-crl.timer (EL8-only) Runs fetch-crl every 6 hours and on boot fetch-crl-cron (EL7-only) Runs fetch-crl every 6 hours fetch-crl-boot (EL7-only) Runs fetch-crl immediately and on boot Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Getting Help \u00b6 To get assistance, please use the this page . References \u00b6 Some guides on X.509 certificates: Useful commands: http://security.ncsa.illinois.edu/research/grid-howtos/usefulopenssl.html Install GSI authentication on a server: http://security.ncsa.illinois.edu/research/wssec/gsihttps/ Certificates how-to: http://www.nordugrid.org/documents/certificate_howto.html See this page for examples of verifying certificates. Related software: osg-ca-manage osg-ca-certs-updater Configuration files \u00b6 Package File Description Location Comment All CA Packages CA File Location /etc/grid-security/certificates All CA Packages Index files /etc/grid-security/certificates/INDEX.html or /etc/grid-security/certificates/INDEX.txt Latest version also available at http://repo.opensciencegrid.org/cadist/ All CA Packages Change Log /etc/grid-security/certificates/CHANGES Latest version also available at http://repo.opensciencegrid.org/cadist/CHANGES osg-ca-certs or igtf-ca-certs contain only CA files osg-ca-scripts Configuration File for osg-update-certs /etc/osg/osg-update-certs.conf This file may be edited by hand, though it is recommended to use osg-ca-manage to set configuration parameters. fetch-crl-3.x Configuration file /etc/fetch-crl.conf The index and change log files contain a summary of all the CA distributed and their version. Logs files \u00b6 Package File Description Location osg-ca-scripts Log file of osg-update-certs /var/log/osg-update-certs.log osg-ca-scripts Stdout of osg-update-certs /var/log/osg-ca-certs-status.system.out osg-ca-scripts Stdout of osg-ca-manage /var/log/osg-ca-manage.system.out osg-ca-scripts Stdout of initial CA setup /var/log/osg-setup-ca-certificates.system.out","title":"Overview"},{"location":"common/ca/#installing-certificate-authorities-cas","text":"The certificate authorities (CAs) provide the trust roots for the public key infrastructure OSG uses to maintain integrity of its sites and services. This document provides details of various options to install the Certificate Authority (CA) certificates and have up-to-date certificate revocation lists (CRLs) on your OSG hosts. We provide three options for installing CA certificates that offer varying levels of control: Install an RPM for a specific set of CA certificates ( default ) Install osg-ca-scripts , a set of scripts that provide fine-grained CA management Install an RPM that doesn't install any CAs. This is useful if you'd like to manage CAs yourself while satisfying RPM dependencies. 
Prior to following the instructions on this page, you must enable our yum repositories","title":"Installing Certificate Authorities (CAs)"},{"location":"common/ca/#installing-ca-certificates","text":"Please choose one of the three options to install CA certificates.","title":"Installing CA Certificates"},{"location":"common/ca/#option-1-install-an-rpm-for-a-specific-set-of-ca-certificates","text":"Note This option is the default if you install OSG software without pre-installing CAs. For example, yum install osg-ce will bring in osg-ca-certs by default. In the OSG repositories, you will find two different sets of predefined CA certificates: ( default ) The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. If you chose... Then run the following command... OSG CA certificates yum install osg-ca-certs IGTF CA certificates yum install igtf-ca-certs To automatically keep your RPM installation of CAs up to date, we recommend the OSG CA certificates updater service.","title":"Option 1: Install an RPM for a specific set of CA certificates"},{"location":"common/ca/#option-2-install-osg-ca-scripts","text":"The osg-ca-scripts package provides scripts to install and update predefined sets of CAs with the ability to add or remove specific CAs. The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. Install the osg-ca-scripts package: root@host # yum install osg-ca-scripts Choose and install the CA certificate set: If you choose... Then run the following command... OSG CA certificates osg-ca-manage setupCA --location root --url osg IGTF CA certificates osg-ca-manage setupCA --location root --url igtf Enable the osg-update-certs-cron service to enable periodic CA updates. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable (Optional) To add a new CA: osg-ca-manage add [--dir ] --hash (Optional) To remove a CA osg-ca-manage remove --hash A complete set of options available though osg-ca-manage command, can be found in the osg-ca-manage documentation","title":"Option 2: Install osg-ca-scripts"},{"location":"common/ca/#option-3-site-managed-cas","text":"If you want to handle the list of CAs completely internally to your site, you can utilize the empty-ca-certs RPM to satisfy RPM dependencies while not actually installing any CAs. To install this RPM, run the following command: root@host # yum install empty-ca-certs \u2013-enablerepo = osg-empty Warning If you choose this option, you are responsible for installing and maintaining the CA certificates. They must be installed in /etc/grid-security/certificates , or a symlink must be made from that location to the directory that contains the CA certificates.","title":"Option 3: Site-managed CAs"},{"location":"common/ca/#installing-other-cas","text":"In addition to the above CAs, you can install other CAs via RPM. These only work with the RPMs that provide CAs (that is, osg-ca-certs and the like, but not osg-ca-scripts .) They are in addition to the above RPMs, so do not only install these extra CAs. 
Set of CAs RPM name Installation command (as root) cilogon-openid cilogon-openid-ca-cert yum install cilogon-openid-ca-cert","title":"Installing other CAs"},{"location":"common/ca/#verifying-ca-certificates","text":"After installing or updating the CA certificates, they can be verified with the following command: root@host # curl --cacert \\ --capath \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" Where is the path to a valid X.509 CA certificate and is the path to the directory containing the installed CA certificates. For example, the following command can be used to verify a default OSG CA certificate installation: root@host # curl --cacert /etc/grid-security/certificates/cilogon-osg.pem \\ --capath /etc/grid-security/certificates/ \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 22005 0 22005 0 0 86633 0 --:--:-- --:--:-- --:--:-- 499k CA certificate installation verified If you do not see CA certificate installation verified this means that your CA certificate installation is broken. First, ensure that your CA installation is up-to-date and if you continue to see issues please contact us .","title":"Verifying CA Certificates"},{"location":"common/ca/#keeping-ca-certificates-up-to-date","text":"It is important to keep CA certificates up-to-date for services and their clients to maintain integrity of production services. To verify that your CA certificates are on the latest version on a given host, determine the most recently released versions and the method by which your CA certificates have been installed: Retrieve the versions of the most recently released IGTF CA certificates and OSG CA certificates Determine which of the three CA certificate installation methods you are using: # rpm -q igtf-ca-certs osg-ca-certs osg-ca-scripts empty-ca-certs Based on which package is installed from the output in the previous step, choose one of the following options: If igtf-ca-certs or osg-ca-certs is installed , compare the installed version from step 2 to the corresponding version from step 1. If the version is older than the corresponding version from step 1, continue onto option 1 to upgrade your current installation and keep your installation up-to-date. If the versions match, your CA certificates are up-to-date! If osg-ca-scripts is installed , run the following command to update your CA certificates: # osg-ca-manage refreshCA And continue to the instructions in option 2 to enable automatic updates of your CA certificates. If empty-ca-scripts is installed , then you are responsible for maintaining your own CA certificates as outlined in option 3 . If none of the packages are installed , your host likely does not need CA certificates and you are done.","title":"Keeping CA Certificates Up-to-date"},{"location":"common/ca/#managing-certificate-revocation-lists","text":"In addition to CA certificates, you must have updated Certificate Revocation Lists (CRLs). CRLs contain certificate blacklists that OSG software uses to ensure that your hosts are only talking to valid clients or servers. To maintain up to date CAs, you will need to run the fetch-crl services. Note Normally fetch-crl is installed when you install the rest of the software and you do not need to explicitly install it. 
If you do wish to install it manually, run the following command: root@host # yum install fetch-crl If you do not wish to change the frequency of fetch-crl updates (default: every 6 hours) or use syslog for fetch-crl output, skip to the service management section","title":"Managing Certificate Revocation Lists"},{"location":"common/ca/#optional-configuring-fetch-crl","text":"The following sub-sections contain optional configuration instructions. Note Note that the nosymlinks option in the configuration files refers to ignoring links within the certificates directory (e.g. two different names for the same file). It is perfectly fine if the path of the CA certificates directory itself ( infodir ) is a link to a directory.","title":"Optional: configuring fetch-crl"},{"location":"common/ca/#changing-the-frequency-of-fetch-crl-cron","text":"To modify the times that fetch-crl-cron runs, edit /etc/cron.d/fetch-crl .","title":"Changing the frequency of fetch-crl-cron"},{"location":"common/ca/#logging-with-syslog","text":"fetch-crl can produce quite a bit of output when run in verbose mode. To send fetch-crl output to syslog, use the following instructions: Change the configuration file to enable syslog: logmode = syslog syslogfacility = daemon Make sure the file /var/log/daemon exists, e.g. touching the file Change /etc/logrotate.d files to rotate it","title":"Logging with syslog"},{"location":"common/ca/#managing-fetch-crl-services","text":"fetch-crl is installed as two different system services. The fetch-crl-boot service runs fetch-crl and is intended to only be enabled or disabled. The fetch-crl-cron service runs fetch-crl every 6 hours (with a random sleep time included). Both services are disabled by default. At the very minimum, the fetch-crl-cron service needs to be enabled and started, otherwise services will begin to fail as existing CRLs expire. Software Service name Notes Fetch CRL fetch-crl.timer (EL8-only) Runs fetch-crl every 6 hours and on boot fetch-crl-cron (EL7-only) Runs fetch-crl every 6 hours fetch-crl-boot (EL7-only) Runs fetch-crl immediately and on boot Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing fetch-crl services"},{"location":"common/ca/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"common/ca/#references","text":"Some guides on X.509 certificates: Useful commands: http://security.ncsa.illinois.edu/research/grid-howtos/usefulopenssl.html Install GSI authentication on a server: http://security.ncsa.illinois.edu/research/wssec/gsihttps/ Certificates how-to: http://www.nordugrid.org/documents/certificate_howto.html See this page for examples of verifying certificates. 
Related software: osg-ca-manage osg-ca-certs-updater","title":"References"},{"location":"common/ca/#configuration-files","text":"Package File Description Location Comment All CA Packages CA File Location /etc/grid-security/certificates All CA Packages Index files /etc/grid-security/certificates/INDEX.html or /etc/grid-security/certificates/INDEX.txt Latest version also available at http://repo.opensciencegrid.org/cadist/ All CA Packages Change Log /etc/grid-security/certificates/CHANGES Latest version also available at http://repo.opensciencegrid.org/cadist/CHANGES osg-ca-certs or igtf-ca-certs contain only CA files osg-ca-scripts Configuration File for osg-update-certs /etc/osg/osg-update-certs.conf This file may be edited by hand, though it is recommended to use osg-ca-manage to set configuration parameters. fetch-crl-3.x Configuration file /etc/fetch-crl.conf The index and change log files contain a summary of all the CA distributed and their version.","title":"Configuration files"},{"location":"common/ca/#logs-files","text":"Package File Description Location osg-ca-scripts Log file of osg-update-certs /var/log/osg-update-certs.log osg-ca-scripts Stdout of osg-update-certs /var/log/osg-ca-certs-status.system.out osg-ca-scripts Stdout of osg-ca-manage /var/log/osg-ca-manage.system.out osg-ca-scripts Stdout of initial CA setup /var/log/osg-setup-ca-certificates.system.out","title":"Logs files"},{"location":"common/contact-registration/","text":"Registering Contact Information \u00b6 OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. The OSG contact management service is backed by InCommon federation , meaning that contacts may register with the OSG using their institutional identities with familiar Single Sign-On forms. Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). How do I register a mailing list? If you would like to register a mailing list as a contact for your site, please contact us directly . Submitting an Application \u00b6 To register with the OSG, submit an application using the self-signup process: Visit https://osg-htc.org/register You will be presented with a Single-Sign On page. Select your insitution and sign in with your insitutional credentials: Help, my institution does not show up in the drop-down! If your institution does not show up in the drop-down menu, then your institution is not part of the InCommon federation . In this case, we recommend using an ORCID account instead, registering a new one if necessary. After you have signed in, you will be presented with the self-signup form. Click the \"BEGIN\" button: Enter your name, email address, GitHub username (optional), and a comment describing why you are registering as a participant in the OSG Consortium. Your institution may provide defaults for your name and email address but you may override these values. Once you have updated all the fields to your liking, click the \"SUBMIT\" button: Verifying Your Email Address \u00b6 After submitting your registration application, you will receive an email from registry@cilogon.org to verify your email address. 
Follow the link in the email and click the \"Accept\" button to complete the verification: Wait for URL redirection After clicking the email verification link, be sure to let the page to completely load (you will be redirected back to this page), otherwise you may have issues completing your registration. If you believe this has happened to you, please contact us for assistance. Help, my email verification link has expired! If the email verification link has expired, please contact us to request a new verification link. Waiting for Approval \u00b6 After verifying your email address, your registration application must be approved by OSG staff. Once your registration application has been approved, you will receive a confirmation email: Once you have received your confirmation email, you may start using OSG services such as registering your resources . OASIS Managers: Adding an SSH Key \u00b6 After approval by OSG staff, OASIS managers must upload a public SSH key before being able to access the OASIS login host: Visit https://osg-htc.org/register and login if prompted Click your name in the top right to get a dropdown and click the My Profile (OSG) button On the right-side of your profile, click the Authenticators link: On the authenticators page, click the Manage button: On the SSH keys page, click the Add SSH Key link: Finally, upload your public SSH key from your computer: Getting Help \u00b6 For assistance with the OSG contact registration process, please use this page .","title":"Contact Information"},{"location":"common/contact-registration/#registering-contact-information","text":"OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. The OSG contact management service is backed by InCommon federation , meaning that contacts may register with the OSG using their institutional identities with familiar Single Sign-On forms. Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). How do I register a mailing list? If you would like to register a mailing list as a contact for your site, please contact us directly .","title":"Registering Contact Information"},{"location":"common/contact-registration/#submitting-an-application","text":"To register with the OSG, submit an application using the self-signup process: Visit https://osg-htc.org/register You will be presented with a Single-Sign On page. Select your insitution and sign in with your insitutional credentials: Help, my institution does not show up in the drop-down! If your institution does not show up in the drop-down menu, then your institution is not part of the InCommon federation . In this case, we recommend using an ORCID account instead, registering a new one if necessary. After you have signed in, you will be presented with the self-signup form. Click the \"BEGIN\" button: Enter your name, email address, GitHub username (optional), and a comment describing why you are registering as a participant in the OSG Consortium. Your institution may provide defaults for your name and email address but you may override these values. 
Once you have updated all the fields to your liking, click the \"SUBMIT\" button:","title":"Submitting an Application"},{"location":"common/contact-registration/#verifying-your-email-address","text":"After submitting your registration application, you will receive an email from registry@cilogon.org to verify your email address. Follow the link in the email and click the \"Accept\" button to complete the verification: Wait for URL redirection After clicking the email verification link, be sure to let the page load completely (you will be redirected back to this page), otherwise you may have issues completing your registration. If you believe this has happened to you, please contact us for assistance. Help, my email verification link has expired! If the email verification link has expired, please contact us to request a new verification link.","title":"Verifying Your Email Address"},{"location":"common/contact-registration/#waiting-for-approval","text":"After verifying your email address, your registration application must be approved by OSG staff. Once your registration application has been approved, you will receive a confirmation email: Once you have received your confirmation email, you may start using OSG services such as registering your resources .","title":"Waiting for Approval"},{"location":"common/contact-registration/#oasis-managers-adding-an-ssh-key","text":"After approval by OSG staff, OASIS managers must upload a public SSH key before being able to access the OASIS login host: Visit https://osg-htc.org/register and log in if prompted Click your name in the top right to get a dropdown and click the My Profile (OSG) button On the right side of your profile, click the Authenticators link: On the authenticators page, click the Manage button: On the SSH keys page, click the Add SSH Key link: Finally, upload your public SSH key from your computer:","title":"OASIS Managers: Adding an SSH Key"},{"location":"common/contact-registration/#getting-help","text":"For assistance with the OSG contact registration process, please use this page .","title":"Getting Help"},{"location":"common/help/","text":"How to Get Help \u00b6 This page is aimed at OSG site administrators looking for support. Help for OSG users can be found at our support desk . Security Incidents \u00b6 Security incidents can be reported by following the instructions on the Incident Discovery and Reporting page. Software or Service Support \u00b6 If you are experiencing issues with OSG software or services, please consult the following resources before opening a support inquiry: Troubleshooting sections or pages for the problematic software Recent OSG Software release notes OSG 23 OSG 3.6 Outage information for OSG services Submitting support inquiries \u00b6 If your problem still hasn't been resolved by consulting the resources above, please submit a support inquiry with the information noted below: If you came to this page from an installation guide, please provide the following information: Commands and output from any Troubleshooting sections or pages The OSG system profile ( osg-profile.txt ), generated by running the following command: root@host # osg-system-profiler Submit a support inquiry to the system based on the VOs that you are associated with: If you are primarily associated with... Submit new tickets to... LHC VOs GGUS Anyone else help@osg-htc.org Community-specific support \u00b6 Some OSG VOs have dedicated forums or mechanisms for community-specific support.
If your VO provides user support, that should be a user's first line of support because the VO is most familiar with your applications and requirements. The list of support centers for OSG VOs can be found here . Resources for CMS sites: http://www.uscms.org/uscms_at_work/physics/computing/grid/index.shtml CMS Hyper News: https://hypernews.cern.ch/HyperNews/CMS/get/osg-tier3.html CMS Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/USTier3Computing","title":"Help / Security Incidents"},{"location":"common/help/#how-to-get-help","text":"This page is aimed at OSG site administrators looking for support. Help for OSG users can be found at our support desk .","title":"How to Get Help"},{"location":"common/help/#security-incidents","text":"Security incidents can be reported by following the instructions on the Incident Discovery and Reporting page.","title":"Security Incidents"},{"location":"common/help/#software-or-service-support","text":"If you are experiencing issues with OSG software or services, please consult the following resources before opening a support inquiry: Troubleshooting sections or pages for the problematic software Recent OSG Software release notes OSG 23 OSG 3.6 Outage information for OSG services","title":"Software or Service Support"},{"location":"common/help/#submitting-support-inquiries","text":"If your problem still hasn't been resolved by consulting the resources above, please submit a support inquiry with the information noted below: If you came to this page from an installation guide, please provide the following information: Commands and output from any Troubleshooting sections or pages The OSG system profile ( osg-profile.txt ), generated by running the following command: root@host # osg-system-profiler Submit a support inquiry to the system based on the VOs that you are associated with: If you are primarily associated with... Submit new tickets to... LHC VOs GGUS Anyone else help@osg-htc.org","title":"Submitting support inquiries"},{"location":"common/help/#community-specific-support","text":"Some OSG VOs have dedicated forums or mechanisms for community-specific support. If your VO provides user support, that should be a user's first line of support because the VO is most familiar with your applications and requirements. The list of support centers for OSG VOs can be found here . Resources for CMS sites: http://www.uscms.org/uscms_at_work/physics/computing/grid/index.shtml CMS Hyper News: https://hypernews.cern.ch/HyperNews/CMS/get/osg-tier3.html CMS Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/USTier3Computing","title":"Community-specific support"},{"location":"common/registration/","text":"Registering with the OSG Consortium \u00b6 OSG staff keep a registry containing active projects, collaborations (a.k.a. virtual organizations or VOs), resources, and resource downtimes stored as YAML files in the topology GitHub repository . This registry is used for accounting data , contact information, and resource availability. Use this page to learn how to register information in the OSG Consortium.
Registration Requirements \u00b6 The instructions in this document require the following: A GitHub account A working knowledge of GitHub collaboration OSG contact registration Registering Contacts \u00b6 OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. To register your contact information with the OSG Consortium, follow the instructions in this document . Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). Registering Resources \u00b6 An OSG resource is a host that provides services to OSG campuses and collaborations; some examples are Compute Entrypoints, storage endpoints, or perfSONAR hosts. See the full list of services that should be registered in the OSG topology here . OSG resources are stored under a hierarchy of facilities, sites, and resource groups, defined as follows: Facility : The institution or company name where your resource is located. Site : Smaller than a facility; typically represents a computing center or an academic department. Frequently used as the display name for accounting dashboards . Resource Group : A logical grouping of resources at a site, i.e. all resources associated with a specific computing cluster. Multi-resource downtimes are easiest to declare across a resource group. Production and testing resources must be placed into separate resource groups. Resource : A host that provides services, e.g. Compute Entrypoints, storage endpoints, or perfSONAR hosts. Throughout this document, you will be asked to substitute your own facility, site, resource group, and resource names when registering with the OSG. If you don't already know the relevant names for your resource, use the following naming conventions: Level Naming convention Facility Unabbreviated institution or company name, e.g. University of Wisconsin - Madison Site Computing center or academic department, e.g. CHTC , MWT2 ATLAS UC , San Diego Supercomputer Center The only characters allowed in Site names are letters, numbers, underscores, hyphens, and spaces; i.e., a Site name must match the regular expression ^[A-Za-z0-9_ -]+$ Resource Group Abbreviated facility, site, and cluster name. Resource groups used for testing purposes should have an -ITB or - ITB suffix, e.g. TCNJ-ELSA-ITB Resource In all capital letters, -- , for example: TCNJ-ELSA-CE or NMSU-AGGIE-GRID-SQUID If you don't know which VO to use, pick OSG . OSG resources are stored in the GitHub repository as YAML files under a directory structure that reflects the above hierarchy, i.e. topology///.yaml from the root of the topology repository . New site \u00b6 To register a site, first choose a name for it (see the naming conventions table above ). The site name will appear in OSG accounting in places such as the GRACC site dashboard . Once you have chosen a site name, open the following in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///SITE.yaml (replacing and with the facility and the site name that you chose ).
\"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the site template as a guide. You may leave the ID field blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Adding AggieGrid cluster for New Mexico State Searching for resources \u00b6 Whether you are registering a new resource or modifying an existing resource, start by searching for the FQDN of your host to avoid any duplicate registrations: Open the topology repository in your browser. Search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search doesn't return any results , skip to these instructions for registering a new resource. If the search returns a single YAML file , open the link to the YAML file and skip to these instructions for modifying existing resources. If the search returns more than one YAML file , please contact us . Note If you are adding a new service to a host which is already registered as a resource, follow the instructions for modifying existing resources. New resources \u00b6 Before registering a new resource, make sure that its FQDN is not already registered . To register a new resource, follow the instructions below: Find the facility, site, and resource group for your resource in the topology repository under this directory structure: topology///.yaml . When searching for these, keep in mind that case and spaces matter. If you do not have a facility, contact help@osg-htc.org for help. If you have a facility but not a site, first follow the instructions for registering a site above. If you have a facility and a site but not a resource group, pick a resource group name . Once you have your facility, site, and resource group, follow the instructions below, replacing instances of , , and with the corresponding names that you chose above : If your resource group already exists under your facility and site, open the following URL in your browser: https://github.com/opensciencegrid/topology/edit/master/topology///.yaml For example, to add a resource to the CHTC resource group for the CHTC site at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/edit/master/topology/University of Wisconsin/CHTC/CHTC.yaml If your resource group does not exist, open the following URL in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///.yaml For example, to create a CHTC-Slurm-HPC resource group for the Center for High Throughput Computing ( CHTC ) at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/new/master?filename=topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the resource group template as a guide. 
You may leave any ID or GroupID fields blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Adding a new compute entrypoint to the CHTC Modifying existing resources \u00b6 To modify an existing resource, follow these instructions: Find the resource that you would like to modify by searching GitHub , and open the link to the YAML file. Click the branch selector button next to the file path and select the master branch. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. Make sure that the formatting and indentation of the modified entry does not change. If you are adding a new service to a host that is already registered as a resource, add the new service to the existing resource; do not create a new resource for the same host. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating administrative contact information for CHTC-glidein2 Retiring resources \u00b6 To retire an already registered resource, set Active: false . For example: ... Production: true Resources: GLOW: Active: false ... Services: CE: Description: Compute Entrypoint Details: hidden: false If the Active attribute does not already exist within the resource definition, add it. If your resource becomes available again, set Active: true . Registering Resource Downtimes \u00b6 Resource downtime is a finite period of time for which one or more of the services of a registered resource are unavailable. Warning If you expect your resource to be indefinitely unavailable, retire the resource instead of registering a downtime. Downtimes are stored in YAML files alongside the resource group YAML files as described here . For example, downtimes for resources in the CHTC-Slurm-HPC resource group of the CHTC site at the University of Wisconsin can be found and registered in the following file, relative to the root of the topology repository : topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC_downtime.yaml Note Do not put downtime updates in the same pull request as other topology updates. Registering new downtime \u00b6 To register a new downtime for a resource or for multiple resources that are part of a resource group, you will use webforms to generate the contents of the downtime entry, copy it into the downtime file corresponding to your resource, and submit it as a GitHub pull request. Follow the instructions below: Open one of the downtime generation webforms in your browser: Use the resource downtime generator if you only need to declare a downtime for a single resource. Use the resource group downtime generator if you need to declare a downtime for multiple resources across a resource group. Select your facility, site, resource group, and/or resource from the corresponding lists. For the single resource downtime form: Select all the services that will be down. To select multiple, use Control-Click on Windows and Linux, or Command-Click on macOS.
Fill the other fields with information about the downtime. Click the Generate button. If the information is valid, a block of text will be displayed in the box labeled Generated YAML . Otherwise, check for error messages and fix your input. Follow the instructions shown below the generated block of text. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Wait for OSG staff to approve and merge your new downtime. Modifying existing downtime \u00b6 In case an already registered downtime is incorrect or needs to be updated to reflect new information, you can modify existing downtime entries using the GitHub editor. Failure Changes to the ID or CreatedTime fields will be rejected. To modify an existing downtime entry for a registered resource, manually make the changes in the matching downtime YAML file. Follow the instructions below: Open the topology repository in your browser. If you do not know the facility, site, and resource group of the resource the downtime entry refers to, search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search returns a single YAML file , note the name of the facility, site, and resource group and continue to the next step. If the search doesn't return any results or returns more than one YAML file , please contact us . Open the following URL in your browser using the facility, site, and resource group names to replace , , and , respectively: https://github.com/opensciencegrid/topology/edit/master/topology///_downtime.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the downtime template as a reference. Make sure that the formatting and indentation of the modified entry does not change. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Move forward end date for CHTC-glidein2 regular maintenance Wait for OSG staff to approve and merge your modified downtime. Registering Virtual Organizations \u00b6 Virtual Organizations (VOs) are sets of groups or individuals defined by some common cyber-infrastructure need. This can be a scientific experiment, a university campus or a distributed research effort. A VO represents all its members and their common needs in a distributed computing environment. A VO also includes the group\u2019s computing/storage resources and services. For more information about VOs, see this page . Info Before submitting a registration for a new VO, please contact us describing your organization's computing needs. VO information is stored as YAML files in the virtual-organizations directory of the topology repository . To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your VO in the list, open the file and continue to the next step. If you do not see your VO in the list, click the Create new file button: In the new file dialog, enter .yaml , replacing with the name of your VO.
\"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the VO template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the GLOW VO Registering Projects \u00b6 Info Before submitting a registration for a new project, please contact us describing your organization's computing needs. Project information is stored as YAML files in the projects directory of the topology repository . To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your project in the list, open the file and continue to the next step. If you do not see your project in the list, click Create new file button: In the new file dialog, enter .yaml , replacing with the name of your project. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the project template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the Mu2e project Getting Help \u00b6 To get assistance, please use the this page .","title":"Resources and Collaborations"},{"location":"common/registration/#registering-with-the-osg-consortium","text":"OSG staff keeps a registry containing active projects, collaborations (a.k.a. virtual organizations or VOs), resources, and resource downtimes stored as YAML files in the topology GitHub repository . This registry is used for accounting data , contact information, and resource availability. Use this page to learn how to register information in the OSG Consortium.","title":"Registering with the OSG Consortium"},{"location":"common/registration/#registration-requirements","text":"The instructions in this document require the following: A GitHub account A working knowledge of GitHub collaboration OSG contact registration","title":"Registration Requirements"},{"location":"common/registration/#registering-contacts","text":"OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinating in case of security incidents or troubleshooting services. To register your contact information with the OSG Consortium, follow the instructions in this document . Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. 
All other data is public (such as name, GitHub username, and any association with particular services or collaborations).","title":"Registering Contacts"},{"location":"common/registration/#registering-resources","text":"An OSG resource is a host that provides services to OSG campuses and collaborations; some examples are Compute Entrypoints, storage endpoints, or perfSONAR hosts. See the full list of services that should be registered in the OSG topology here . OSG resources are stored under a hierarchy of facilities, sites, and resource groups, defined as follows: Facility : The institution or company name where your resource is located. Site : Smaller than a facility; typically represents a computing center or an academic department. Frequently used as the display name for accounting dashboards . Resource Group : A logical grouping of resources at a site, i.e. all resources associated with a specific computing cluster. Multi-resource downtimes are easiest to declare across a resource group. Production and testing resources must be placed into separate resource groups. Resource : A host that provides services, e.g. Compute Entrypoints, storage endpoints, or perfSONAR hosts. Throughout this document, you will be asked to substitute your own facility, site, resource group, and resource names when registering with the OSG. If you don't already know the relevant names for your resource, use the following naming conventions: Level Naming convention Facility Unabbreviated institution or company name, e.g. University of Wisconsin - Madison Site Computing center or academic department, e.g. CHTC , MWT2 ATLAS UC , San Diego Supercomputer Center The only characters allowed in Site names are letters, numbers, underscores, hyphens, and spaces; i.e., a Site name must match the regular expression ^[A-Za-z0-9_ -]+$ Resource Group Abbreviated facility, site, and cluster name. Resource groups used for testing purposes should have an -ITB or - ITB suffix, e.g. TCNJ-ELSA-ITB Resource In all capital letters, -- , for example: TCNJ-ELSA-CE or NMSU-AGGIE-GRID-SQUID If you don't know which VO to use, pick OSG . OSG resources are stored in the GitHub repository as YAML files under a directory structure that reflects the above hierarchy, i.e. topology///.yaml from the root of the topology repository .","title":"Registering Resources"},{"location":"common/registration/#new-site","text":"To register a site, first choose a name for it (see the naming conventions table above ). The site name will appear in OSG accounting in places such as the GRACC site dashboard . Once you have chosen a site name, open the following in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///SITE.yaml (replacing and with the facility and the site name that you chose ). \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the site template as a guide. You may leave the ID field blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo.
Provide a descriptive commit message, for example: Adding AggieGrid cluster for New Mexico State","title":"New site"},{"location":"common/registration/#searching-for-resources","text":"Whether you are registering a new resource or modifying an existing resource, start by searching for the FQDN of your host to avoid any duplicate registrations: Open the topology repository in your browser. Search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search doesn't return any results , skip to these instructions for registering a new resource. If the search returns a single YAML file , open the link to the YAML file and skip to these instructions for modifying existing resources. If the search returns more than one YAML file , please contact us . Note If you are adding a new service to a host which is already registered as a resource, follow the instructions for modifying existing resources.","title":"Searching for resources"},{"location":"common/registration/#new-resources","text":"Before registering a new resource, make sure that its FQDN is not already registered . To register a new resource, follow the instructions below: Find the facility, site, and resource group for your resource in the topology repository under this directory structure: topology///.yaml . When searching for these, keep in mind that case and spaces matter. If you do not have a facility, contact help@osg-htc.org for help. If you have a facility but not a site, first follow the instructions for registering a site above. If you have a facility and a site but not a resource group, pick a resource group name . Once you have your facility, site, and resource group, follow the instructions below, replacing instances of , , and with the corresponding names that you chose above : If your resource group already exists under your facility and site, open the following URL in your browser: https://github.com/opensciencegrid/topology/edit/master/topology///.yaml For example, to add a resource to the CHTC resource group for the CHTC site at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/edit/master/topology/University of Wisconsin/CHTC/CHTC.yaml If your resource group does not exist, open the following URL in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///.yaml For example, to create a CHTC-Slurm-HPC resource group for the Center for High Throughput Computing ( CHTC ) at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/new/master?filename=topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. 
Provide a descriptive commit message, for example: Adding a new compute entrypoint to the CHTC","title":"New resources"},{"location":"common/registration/#modifying-existing-resources","text":"To modify an existing resource, follow these instructions: Find the resource that you would like to modify by searching GitHub , and open the link to the YAML file. Click the branch selector button next to the file path and select the master branch. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. Make sure that the formatting and indentation of the modified entry does not change. If you are adding a new service to a host that is already registered as a resource, add the new service to the existing resource; do not create a new resource for the same host. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating administrative contact information for CHTC-glidein2","title":"Modifying existing resources"},{"location":"common/registration/#retiring-resources","text":"To retire an already registered resource, set Active: false . For example: ... Production: true Resources: GLOW: Active: false ... Services: CE: Description: Compute Entrypoint Details: hidden: false If the Active attribute does not already exist within the resource definition, add it. If your resource becomes available again, set Active: true .","title":"Retiring resources"},{"location":"common/registration/#registering-resource-downtimes","text":"Resource downtime is a finite period of time for which one or more of the services of a registered resource are unavailable. Warning If you expect your resource to be indefinitely unavailable, retire the resource instead of registering a downtime. Downtimes are stored in YAML files alongside the resource group YAML files as described here . For example, downtimes for resources in the CHTC-Slurm-HPC resource group of the CHTC site at the University of Wisconsin can be found and registered in the following file, relative to the root of the topology repository : topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC_downtime.yaml Note Do not put downtime updates in the same pull request as other topology updates.","title":"Registering Resource Downtimes"},{"location":"common/registration/#registering-new-downtime","text":"To register a new downtime for a resource or for multiple resources that are part of a resource group, you will use webforms to generate the contents of the downtime entry, copy it into the downtime file corresponding to your resource, and submit it as a GitHub pull request. Follow the instructions below: Open one of the downtime generation webforms in your browser: Use the resource downtime generator if you only need to declare a downtime for a single resource. Use the resource group downtime generator if you need to declare a downtime for multiple resources across a resource group. Select your facility, site, resource group, and/or resource from the corresponding lists. For the single resource downtime form: Select all the services that will be down.
To select multiple, use Control-Click on Windows and Linux, or Command-Click on macOS. Fill the other fields with information about the downtime. Click the Generate button. If the information is valid, a block of text will be displayed in the box labeled Generated YAML . Otherwise, check for error messages and fix your input. Follow the instructions shown below the generated block of text. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Wait for OSG staff to approve and merge your new downtime.","title":"Registering new downtime"},{"location":"common/registration/#modifying-existing-downtime","text":"In case an already registered downtime is incorrect or needs to be updated to reflect new information, you can modify existing downtime entries using the GitHub editor. Failure Changes to the ID or CreatedTime fields will be rejected. To modify an existing downtime entry for a registered resource, manually make the changes in the matching downtime YAML file. Follow the instructions below: Open the topology repository in your browser. If you do not know the facility, site, and resource group of the resource the downtime entry refers to, search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search returns a single YAML file , note the name of the facility, site, and resource group and continue to the next step. If the search doesn't return any results or returns more than one YAML file , please contact us . Open the following URL in your browser using the facility, site, and resource group names to replace , , and , respectively: https://github.com/opensciencegrid/topology/edit/master/topology///_downtime.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the downtime template as a reference. Make sure that the formatting and indentation of the modified entry does not change. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Move forward end date for CHTC-glidein2 regular maintenance Wait for OSG staff to approve and merge your modified downtime.","title":"Modifying existing downtime"},{"location":"common/registration/#registering-virtual-organizations","text":"Virtual Organizations (VOs) are sets of groups or individuals defined by some common cyber-infrastructure need. This can be a scientific experiment, a university campus or a distributed research effort. A VO represents all its members and their common needs in a distributed computing environment. A VO also includes the group\u2019s computing/storage resources and services. For more information about VOs, see this page . Info Before submitting a registration for a new VO, please contact us describing your organization's computing needs. VO information is stored as YAML files in the virtual-organizations directory of the topology repository .
To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your VO in the list, open the file and continue to the next step. If you do not see your VO in the list, click the Create new file button: In the new file dialog, enter .yaml , replacing with the name of your VO. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the VO template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the GLOW VO","title":"Registering Virtual Organizations"},{"location":"common/registration/#registering-projects","text":"Info Before submitting a registration for a new project, please contact us describing your organization's computing needs. Project information is stored as YAML files in the projects directory of the topology repository . To modify a project's information or register a new project, follow the instructions below: Open the topology repository in your browser. If you see your project in the list, open the file and continue to the next step. If you do not see your project in the list, click the Create new file button: In the new file dialog, enter .yaml , replacing with the name of your project. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the project template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the Mu2e project","title":"Registering Projects"},{"location":"common/registration/#getting-help","text":"To get assistance, please use this page .","title":"Getting Help"},{"location":"common/yum/","text":"OSG Yum Repositories \u00b6 This document introduces Yum repositories and how they are used in the OSG. If you are unfamiliar with Yum, see the documentation on using Yum and RPM . Repositories \u00b6 The OSG hosts multiple repositories at repo.opensciencegrid.org that are intended for public use: The OSG Yum repositories... Contain RPMs that... osg , osg-upcoming are considered production-ready (default). osg-testing , osg-upcoming-testing have passed developer or integration testing but not acceptance testing osg-development , osg-upcoming-development have not passed developer, integration or acceptance testing. Do not use without instruction from the OSG Software and Release Team. osg-contrib have been contributed from outside of the OSG Software and Release Team. See this section for details. Note The upcoming repositories contain newer software that might require manual action after an update.
They are not enabled by default and must be enabled in addition to the main osg repository. See the upcoming software section for details. OSG's RPM packages also rely on external packages provided by supported OSes and EPEL. You must have the following repositories available and enabled: OS repositories, including the following ones that aren't enabled by default: extras (SL 7, CentOS 7, CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) Server-Extras (RHEL 7) powertools (CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) CodeReady Builder (RHEL 8) or crb (all EL9 variants) EPEL repositories OSG repositories If any of these repositories are missing, you may end up with installation issues or missing dependencies. Danger Other repositories, such as jpackage , dag , or rpmforge , are not supported and you may encounter problems if you use them. Upcoming Software \u00b6 Certain sites have requested new versions of software that would be considered \"disruptive\" or \"experimental\": upgrading to them would likely require manual intervention after their installation. We do not want sites to unwittingly upgrade to these versions. We have placed such software in separate repositories. Their names start with osg-upcoming and have the same structure as our standard repositories, as well as the same guarantees of quality and production-readiness. There are separate sets of upcoming repositories for each release series. For example, the OSG 23 repos have corresponding 23-upcoming repos . The upcoming repositories are meant to be layered on top of our standard repositories: installing software from the upcoming repositories requires also enabling the standard repositories from the same release. Contrib Software \u00b6 In addition to our regular software repositories, we also have a contrib (short for \"contributed\") software repository. This is software that does not go through the same software testing and release processes as the official OSG Software release, but may be useful to you. Particularly, contrib software is not guaranteed to be compatible with the rest of the OSG Software stack nor is it supported by the OSG. The definitive list of software in the contrib repository can be found here: OSG 23 EL8 contrib software repository OSG 23 EL9 contrib software repository OSG 3.6 EL7 contrib software repository OSG 3.6 EL8 contrib software repository OSG 3.6 EL9 contrib software repository If you would like to distribute your software in the OSG contrib repository, please contact us with a description of your software, what users it serves, and relevant RPM packaging. Installing Yum Repositories \u00b6 Install the Yum priorities plugin (EL7) \u00b6 The Yum priorities plugin is used to tell Yum to prefer OSG packages over EPEL or OS packages. It is important to install and enable the Yum priorities plugin before installing OSG Software to ensure that you are getting the OSG-supported versions. This plugin is built into Yum on EL8 and EL9 distributions. Install the Yum priorities package: root@host # yum install yum-plugin-priorities Ensure that /etc/yum.conf has the following line in the [main] section: plugins=1 Enable additional OS repositories \u00b6 Some packages depend on packages that are in OS repositories not enabled by default. The repositories to enable, as well as the instructions to enable them, are OS-dependent. Note A repository is enabled if it has enabled=1 in its definition, or if the enabled line is missing (i.e. it is enabled unless specified otherwise.)
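For a quick sanity check, you can list which repositories Yum currently considers enabled and turn on a missing one by its repo ID. This is a minimal sketch: the repo IDs (extras, powertools, crb, and so on) and file names vary by OS, and dnf config-manager assumes the dnf-plugins-core package is installed.

```console
# List the repositories that are currently enabled on this host
root@host # yum repolist enabled

# Example: enable the "crb" repo by ID on an EL9 variant
# (requires the dnf-plugins-core package for dnf config-manager)
root@host # dnf config-manager --set-enabled crb
```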
SL 7 \u00b6 Install the yum-conf-extras RPM package. Ensure that the sl-extras repo in /etc/yum.repos.d/sl-extras.repo is enabled. CentOS 7 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/CentOS-Base.repo is enabled. CentOS Stream 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/CentOS-Stream-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/CentOS-Stream-PowerTools.repo is enabled. Rocky Linux 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/Rocky-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/Rocky-PowerTools.repo is enabled. AlmaLinux 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/almalinux.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/almalinux-powertools.repo is enabled. RHEL 7 \u00b6 Ensure that the Server-Extras channel is enabled. RHEL 8 \u00b6 Ensure that the CodeReady Linux Builder channel is enabled. See Red Hat's instructions on how to enable this repo. Rocky Linux 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/rocky.repo is enabled AlmaLinux 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/almalinux-crb.repo is enabled CentOS Stream 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/centos.repo is enabled Install the EPEL repositories \u00b6 OSG software depends on packages distributed via the EPEL repositories. You must install and enable these first. Install the EPEL repository, if not already present. Choose the right version to match your OS version. # # EPEL 7 (For RHEL 7, CentOS 7, and SL 7) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm # # EPEL 8 (For RHEL 8 and CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm # # EPEL 9 (For RHEL 9 and CentOS Stream 9, Rocky Linux 9, AlmaLinux 9) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm Verify that /etc/yum.repos.d/epel.repo exists; the [epel] section should contain: The line enabled=1 Either no priority setting, or a priority setting that is 99 or higher Warning If you have your own mirror or configuration of the EPEL repository, you MUST verify that the priority of the EPEL repository is either missing, or 99 or a higher number. The OSG repositories must have a better (numerically lower) priority than the EPEL repositories; otherwise, you might have dependency resolution (\"depsolving\") issues. Install the OSG Repositories \u00b6 This document assumes a fresh install. For instructions on upgrading from one OSG series to another, see the release series document . Install the OSG repository for your OS version and the OSG release series that you wish to use: OSG 23 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el8-release-latest.rpm OSG 23 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el9-release-latest.rpm OSG 3.6 EL7: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm OSG 3.6 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el8-release-latest.rpm OSG 3.6 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el9-release-latest.rpm The only OSG repository enabled by default is the release one. If you want to enable another one (e.g. osg-testing ), then edit its file (e.g. 
/etc/yum.repos.d/osg-testing.repo ) and change the enabled option from 0 to 1: [osg-testing] name=OSG Software for Enterprise Linux 7 - Testing - $basearch #baseurl=https://repo.opensciencegrid.org/osg/3.6/el7/testing/$basearch mirrorlist=https://repo.opensciencegrid.org/mirror/osg/3.6/el7/testing/$basearch failovermethod=priority priority=98 enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG-2 Optional Configuration \u00b6 Enable automatic security updates \u00b6 For production services, we suggest only changing software versions during controlled downtime. Therefore we recommend security-only automatic updates or disabling automatic updates entirely. Note Automatic updates for EL8 and EL9 variants are provided in the dnf-automatic RPM, which is not installed by default. To enable only security related automatic updates: On EL 7 variants, edit /etc/yum/yum-cron.conf and set update_cmd = security On EL8 and EL9 variants, edit /etc/dnf/automatic.conf and set upgrade_type = security CentOS 7, CentOS Stream 8, and CentOS Stream 9 do not support security-only automatic updates; doing any of the above steps will prevent automatic updates from happening at all. To disable automatic updates entirely: On EL7 variants, run: root@host # service yum-cron stop On EL8 and EL9 variants, run: root@host # systemctl disable --now dnf-automatic.timer Configuring Spacewalk priorities \u00b6 Sites using Spacewalk to manage RPM packages will need to configure OSG Yum repository priorities using their Spacewalk ID. For example, if the OSG 3.4 repository's Spacewalk ID is centos_7_osg34_dev , modify /etc/yum/pluginconf.d/90-osg.conf to include the following: [centos_7_osg_34_dev] priority = 98 Repository Mirrors \u00b6 If you run a large site (>20 nodes), you should consider setting up a local mirror for the OSG repositories. A local Yum mirror allows you to reduce the amount of external bandwidth used when updating or installing packages. Add the following to a file in /etc/cron.d : * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg/ /var/www/html/osg/ Or, to mirror only a single repository: * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg//el9/development /var/www/html/osg//el7 Replace with the OSG release you would like to use (e.g. 23-main ) and with a number between 0 and 59. On your worker node, you can replace the baseurl line of /etc/yum.repos.d/osg.repo with the appropriate URL for your mirror. If you are interested in having your mirror be part of the OSG's default set of mirrors, please file a support ticket . Reference \u00b6 Basic use of Yum","title":"OSG Yum Repos"},{"location":"common/yum/#osg-yum-repositories","text":"This document introduces Yum repositories and how they are used in the OSG. If you are unfamiliar with Yum, see the documentation on using Yum and RPM .","title":"OSG Yum Repositories"},{"location":"common/yum/#repositories","text":"The OSG hosts multiple repositories at repo.opensciencegrid.org that are intended for public use: The OSG Yum repositories... Contain RPMs that... osg , osg-upcoming are considered production-ready (default). osg-testing , osg-upcoming-testing have passed developer or integration testing but not acceptance testing osg-development , osg-upcoming-development have not passed developer, integration or acceptance testing. Do not use without instruction from the OSG Software and Release Team. 
osg-contrib have been contributed from outside of the OSG Software and Release Team. See this section for details. Note The upcoming repositories contain newer software that might require manual action after an update. They are not enabled by default and must be enabled in addition to the main osg repository. See the upcoming software section for details. OSG's RPM packages also rely on external packages provided by supported OSes and EPEL. You must have the following repositories available and enabled: OS repositories, including the following ones that aren't enabled by default: extras (SL 7, CentOS 7, CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) Server-Extras (RHEL 7) powertools (CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) CodeReady Builder (RHEL 8) or crb (all EL9 variants) EPEL repositories OSG repositories If any of these repositories are missing, you may end up with installation issues or missing dependencies. Danger Other repositories, such as jpackage , dag , or rpmforge , are not supported and you may encounter problems if you use them.","title":"Repositories"},{"location":"common/yum/#upcoming-software","text":"Certain sites have requested new versions of software that would be considered \"disruptive\" or \"experimental\": upgrading to them would likely require manual intervention after their installation. We do not want sites to unwittingly upgrade to these versions. We have placed such software in separate repositories. Their names start with osg-upcoming and have the same structure as our standard repositories, as well as the same guarantees of quality and production-readiness. There are separate sets of upcoming repositories for each release series. For example, the OSG 23 repos have corresponding 23-upcoming repos . The upcoming repositories are meant to be layered on top of our standard repositories: installing software from the upcoming repositories requires also enabling the standard repositories from the same release.","title":"Upcoming Software"},{"location":"common/yum/#contrib-software","text":"In addition to our regular software repositories, we also have a contrib (short for \"contributed\") software repository. This is software that does not go through the same software testing and release processes as the official OSG Software release, but may be useful to you. Particularly, contrib software is not guaranteed to be compatible with the rest of the OSG Software stack nor is it supported by the OSG. The definitive list of software in the contrib repository can be found here: OSG 23 EL8 contrib software repository OSG 23 EL9 contrib software repository OSG 3.6 EL7 contrib software repository OSG 3.6 EL8 contrib software repository OSG 3.6 EL9 contrib software repository If you would like to distribute your software in the OSG contrib repository, please contact us with a description of your software, what users it serves, and relevant RPM packaging.","title":"Contrib Software"},{"location":"common/yum/#installing-yum-repositories","text":"","title":"Installing Yum Repositories"},{"location":"common/yum/#install-the-yum-priorities-plugin-el7","text":"The Yum priorities plugin is used to tell Yum to prefer OSG packages over EPEL or OS packages. It is important to install and enable the Yum priorities plugin before installing OSG Software to ensure that you are getting the OSG-supported versions. This plugin is built into Yum on EL8 and EL9 distributions.
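Once the OSG and EPEL repository definitions are in place (installed later in this document), a quick way to confirm the priority relationship is to compare the priority lines directly. This is only a sketch and assumes the default osg.repo and epel.repo file names:

```console
# The OSG repos ship with priority=98; EPEL should have no priority line
# or a value of 99 or higher, so the OSG-supported builds are preferred.
root@host # grep -H priority /etc/yum.repos.d/osg.repo /etc/yum.repos.d/epel.repo
```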
Install the Yum priorities package: root@host # yum install yum-plugin-priorities Ensure that /etc/yum.conf has the following line in the [main] section: plugins=1","title":"Install the Yum priorities plugin (EL7)"},{"location":"common/yum/#enable-additional-os-repositories","text":"Some packages depend on packages that are in OS repositories not enabled by default. The repositories to enable, as well as the instructions to enable them, are OS-dependent. Note A repository is enabled if it has enabled=1 in its definition, or if the enabled line is missing (i.e. it is enabled unless specified otherwise.)","title":"Enable additional OS repositories"},{"location":"common/yum/#sl-7","text":"Install the yum-conf-extras RPM package. Ensure that the sl-extras repo in /etc/yum.repos.d/sl-extras.repo is enabled.","title":"SL 7"},{"location":"common/yum/#centos-7","text":"Ensure that the extras repo in /etc/yum.repos.d/CentOS-Base.repo is enabled.","title":"CentOS 7"},{"location":"common/yum/#centos-stream-8","text":"Ensure that the extras repo in /etc/yum.repos.d/CentOS-Stream-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/CentOS-Stream-PowerTools.repo is enabled.","title":"CentOS Stream 8"},{"location":"common/yum/#rocky-linux-8","text":"Ensure that the extras repo in /etc/yum.repos.d/Rocky-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/Rocky-PowerTools.repo is enabled.","title":"Rocky Linux 8"},{"location":"common/yum/#almalinux-8","text":"Ensure that the extras repo in /etc/yum.repos.d/almalinux.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/almalinux-powertools.repo is enabled.","title":"AlmaLinux 8"},{"location":"common/yum/#rhel-7","text":"Ensure that the Server-Extras channel is enabled.","title":"RHEL 7"},{"location":"common/yum/#rhel-8","text":"Ensure that the CodeReady Linux Builder channel is enabled. See Red Hat's instructions on how to enable this repo.","title":"RHEL 8"},{"location":"common/yum/#rocky-linux-9","text":"Ensure that the crb repo in /etc/yum.repos.d/rocky.repo is enabled","title":"Rocky Linux 9"},{"location":"common/yum/#almalinux-9","text":"Ensure that the crb repo in /etc/yum.repos.d/almalinux-crb.repo is enabled","title":"AlmaLinux 9"},{"location":"common/yum/#centos-stream-9","text":"Ensure that the crb repo in /etc/yum.repos.d/centos.repo is enabled","title":"CentOS Stream 9"},{"location":"common/yum/#install-the-epel-repositories","text":"OSG software depends on packages distributed via the EPEL repositories. You must install and enable these first. Install the EPEL repository, if not already present. Choose the right version to match your OS version. # # EPEL 7 (For RHEL 7, CentOS 7, and SL 7) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm # # EPEL 8 (For RHEL 8 and CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm # # EPEL 9 (For RHEL 9 and CentOS Stream 9, Rocky Linux 9, AlmaLinux 9) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm Verify that /etc/yum.repos.d/epel.repo exists; the [epel] section should contain: The line enabled=1 Either no priority setting, or a priority setting that is 99 or higher Warning If you have your own mirror or configuration of the EPEL repository, you MUST verify that the priority of the EPEL repository is either missing, or 99 or a higher number. 
The OSG repositories must have a better (numerically lower) priority than the EPEL repositories; otherwise, you might have dependency resolution (\"depsolving\") issues.","title":"Install the EPEL repositories"},{"location":"common/yum/#install-the-osg-repositories","text":"This document assumes a fresh install. For instructions on upgrading from one OSG series to another, see the release series document . Install the OSG repository for your OS version and the OSG release series that you wish to use: OSG 23 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el8-release-latest.rpm OSG 23 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el9-release-latest.rpm OSG 3.6 EL7: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm OSG 3.6 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el8-release-latest.rpm OSG 3.6 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el9-release-latest.rpm The only OSG repository enabled by default is the release one. If you want to enable another one (e.g. osg-testing ), then edit its file (e.g. /etc/yum.repos.d/osg-testing.repo ) and change the enabled option from 0 to 1: [osg-testing] name=OSG Software for Enterprise Linux 7 - Testing - $basearch #baseurl=https://repo.opensciencegrid.org/osg/3.6/el7/testing/$basearch mirrorlist=https://repo.opensciencegrid.org/mirror/osg/3.6/el7/testing/$basearch failovermethod=priority priority=98 enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG-2","title":"Install the OSG Repositories"},{"location":"common/yum/#optional-configuration","text":"","title":"Optional Configuration"},{"location":"common/yum/#enable-automatic-security-updates","text":"For production services, we suggest only changing software versions during controlled downtime. Therefore we recommend security-only automatic updates or disabling automatic updates entirely. Note Automatic updates for EL8 and EL9 variants are provided in the dnf-automatic RPM, which is not installed by default. To enable only security related automatic updates: On EL 7 variants, edit /etc/yum/yum-cron.conf and set update_cmd = security On EL8 and EL9 variants, edit /etc/dnf/automatic.conf and set upgrade_type = security CentOS 7, CentOS Stream 8, and CentOS Stream 9 do not support security-only automatic updates; doing any of the above steps will prevent automatic updates from happening at all. To disable automatic updates entirely: On EL7 variants, run: root@host # service yum-cron stop On EL8 and EL9 variants, run: root@host # systemctl disable --now dnf-automatic.timer","title":"Enable automatic security updates"},{"location":"common/yum/#configuring-spacewalk-priorities","text":"Sites using Spacewalk to manage RPM packages will need to configure OSG Yum repository priorities using their Spacewalk ID. For example, if the OSG 3.4 repository's Spacewalk ID is centos_7_osg34_dev , modify /etc/yum/pluginconf.d/90-osg.conf to include the following: [centos_7_osg_34_dev] priority = 98","title":"Configuring Spacewalk priorities"},{"location":"common/yum/#repository-mirrors","text":"If you run a large site (>20 nodes), you should consider setting up a local mirror for the OSG repositories. A local Yum mirror allows you to reduce the amount of external bandwidth used when updating or installing packages. 
Add the following to a file in /etc/cron.d : <minute> * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg/ /var/www/html/osg/ Or, to mirror only a single repository: <minute> * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg/<release>/el9/development /var/www/html/osg/<release>/el9/development Replace <release> with the OSG release you would like to use (e.g. 23-main ) and <minute> with a number between 0 and 59. On your worker node, you can replace the baseurl line of /etc/yum.repos.d/osg.repo with the appropriate URL for your mirror. If you are interested in having your mirror be part of the OSG's default set of mirrors, please file a support ticket .","title":"Repository Mirrors"},{"location":"common/yum/#reference","text":"Basic use of Yum","title":"Reference"},{"location":"compute-element/covid-19/","text":"Supporting COVID-19 Research on the OSG \u00b6 Info The instructions in this document are deprecated, as COVID-19 jobs are no longer prioritized. There are a few options available for sites with computing resources who want to support the important and urgent work of COVID-19 researchers using the OSG. As we're currently routing such projects through the OSG VO, your site can be configured to accept pilots that exclusively run OSG VO jobs relating to COVID-19 research (among other pilots you support), allowing you to prioritize these pilots and account for this usage separately from other OSG activity. To support COVID-19 work, the overall process includes the following: Make the site computing resources available through a HTCondor-CE if you have not already done so. You can install a locally-managed instance or ask OSG to host the CE on your behalf. If neither solution is viable, or you'd like to discuss the options, please send email to help@osg-htc.org and we'll work with you to arrive at the best solution. If you already provide resources through an OSG Hosted CE, skip to this section . Enable the OSG VO on your HTCondor-CE. Set up a job route specific to COVID-19 pilot jobs (documented below). The job route will allow you to prioritize these jobs using local policy in your site's cluster. (Optional) To attract more user jobs, install CVMFS and Apptainer on your site's worker nodes Send email to help@osg-htc.org requesting that your CE receive COVID-19 pilots. We will need to know the CE hostname and any special restrictions that might apply to these pilots. Setting up a COVID-19 Job Route \u00b6 By default, COVID-19 pilots will look identical to OSG pilots except they will have the attribute IsCOVID19 = true . They do not require mapping to a distinct Unix account but can be sent to a prioritized queue or accounting group. Job routes are controlled by the JOB_ROUTER_ENTRIES configuration variable in HTCondor-CE. Customizations may be placed in /etc/condor-ce/config.d/ where files are parsed in lexicographical order, e.g. JOB_ROUTER_ENTRIES specified in 50-covid-routes.conf will override JOB_ROUTER_ENTRIES in 02-local-slurm.conf .
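If you are unsure which file's definition of a variable will take effect, you can ask HTCondor-CE where the value was set; a quick check (using the same tool referenced in the verification steps below) looks like this:
root@host # condor_ce_config_val -verbose JOB_ROUTER_ENTRIES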
For Non-HTCondor batch systems \u00b6 To add a new route for COVID-19 pilots for non-HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"covid19\"; Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing slurm in the GridResource attribute with the appropriate value for your batch system (e.g., lsf , pbs , sge , or slurm ); and the value of set_default_queue with the name of the partition or queue of your local batch system dedicated to COVID-19 work. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) If your configuration does not already define JOB_ROUTER_ROUTE_NAMES , you need to add the name of all previous routes to it, leaving OSG_COVID19_Jobs at the start of the list. For example: JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, Local_Condor, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration For HTCondor batch systems \u00b6 Similarly, at an HTCondor site, one can place these jobs into a separate accounting group by providing the set_AcctGroup and eval_set_AccountingGroup attributes in a new job route. To add a new route for COVID-19 pilots for HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; TargetUniverse = 5; set_AcctGroup = \"covid19\"; eval_set_AccountingGroup = strcat(AcctGroup, \".\", Owner); Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing covid19 in set_AcctGroup with the name of the accounting group that you would like to use for COVID-19 jobs. Ensure that COVID-19 jobs match to the new route.
Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration Verifying the COVID-19 Job Route \u00b6 To verify that your HTCondor-CE is configured to support COVID-19 jobs, perform the following steps: Ensure that the OSG_COVID19_Jobs route appears with all of your other previously enabled routes: condor_ce_job_router_info -config Known issue: removing old routes If your HTCondor-CE has jobs associated with a route that is removed from your configuration, this will result in a crashing Job Router. If you accidentally remove an old route, restore the route or remove all jobs associated with said route. Ensure that COVID-19 jobs will match to your new job route: For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: OSG_COVID19_Jobs should be the first route in the routing table: condor_ce_config_val -verbose JOB_ROUTER_ROUTE_NAMES For older versions of HTCondor: the Requirements expression of your OSG_COVID19_Jobs route must contain (TARGET.IsCOVID19 =?= true) and all other routes must contain (TARGET.IsCOVID19 =!= true) in their Requirements expression. After requesting COVID-19 jobs , verify that jobs are being routed appropriately by examining pilots with condor_ce_router_q . Requesting COVID-19 Jobs \u00b6 To receive COVID-19 pilot jobs, send an email to help@osg-htc.org with the subject Requesting COVID-19 pilots and the following information: Whether you want to receive only COVID-19 jobs, or if you want to accept COVID-19 and other OSG jobs The hostname(s) of your HTCondor-CE(s) Any other restrictions that may apply to these jobs (e.g. number of available cores) Viewing COVID-19 Contributions \u00b6 You can view how many hours COVID-19 projects have consumed at your site with this GRACC dashboard . Getting Help \u00b6 To get assistance, please use this page .","title":"Supporting COVID-19 Research on the OSG"},{"location":"compute-element/covid-19/#supporting-covid-19-research-on-the-osg","text":"Info The instructions in this document are deprecated, as COVID-19 jobs are no longer prioritized. There are a few options available for sites with computing resources who want to support the important and urgent work of COVID-19 researchers using the OSG. As we're currently routing such projects through the OSG VO, your site can be configured to accept pilots that exclusively run OSG VO jobs relating to COVID-19 research (among other pilots you support), allowing you to prioritize these pilots and account for this usage separately from other OSG activity.
To support COVID-19 work, the overall process includes the following: Make the site computing resources available through a HTCondor-CE if you have not already done so. You can install a locally-managed instance or ask OSG to host the CE on your behalf. If neither solution is viable, or you'd like to discuss the options, please send email to help@osg-htc.org and we'll work with you to arrive at the best solution. If you already provide resources through an OSG Hosted CE, skip to this section . Enable the OSG VO on your HTCondor-CE. Setup a job route specific to COVID-19 pilot jobs (documented below). The job route will allow you to prioritize these jobs using local policy in your site's cluster. (Optional) To attract more user jobs, install CVMFS and Apptainer on your site's worker nodes Send email to help@osg-htc.org requesting that your CE receive COVID-19 pilots. We will need to know the CE hostname and any special restrictions that might apply to these pilots.","title":"Supporting COVID-19 Research on the OSG"},{"location":"compute-element/covid-19/#setting-up-a-covid-19-job-route","text":"By default, COVID-19 pilots will look identical to OSG pilots except they will have the attribute IsCOVID19 = true . They do not require mapping to a distinct Unix account but can be sent to a prioritized queue or accounting group. Job routes are controlled by the JOB_ROUTER_ENTRIES configuration variable in HTCondor-CE. Customizations may be placed in /etc/condor-ce/config.d/ where files are parsed in lexicographical order, e.g. JOB_ROUTER_ENTRIES specified in 50-covid-routes.conf will override JOB_ROUTER_ENTRIES in 02-local-slurm.conf .","title":"Setting up a COVID-19 Job Route"},{"location":"compute-element/covid-19/#for-non-htcondor-batch-systems","text":"To add a new route for COVID-19 pilots for non-HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"covid19\"; Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing slurm in the GridResource attribute with the appropriate value for your batch system (e.g., lsf , pbs , sge , or slurm ); and the value of set_default_queue with the name of the partition or queue of your local batch system dedicated to COVID-19 work. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) If your configuration does not already define JOB_ROUTER_ROUTE_NAMES , you need to add the name of all previous routes to it, leaving OSG_COVID19_Jobs at the start of the list. For example: JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, Local_Condor, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. 
For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration","title":"For Non-HTCondor batch systems"},{"location":"compute-element/covid-19/#for-htcondor-batch-systems","text":"Similarly, at an HTCondor site, one can place these jobs into a separate accounting group by providing the set_AcctGroup and eval_set_AccountingGroup attributes in a new job route. To add a new route for COVID-19 pilots for HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; TargetUniverse = 5; set_AcctGroup = \"covid19\"; eval_set_AccountingGroup = strcat(AcctGroup, \".\", Owner); Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing covid19 in set_AcctGroup with the name of the accounting group that you would like to use for COVID-19 jobs. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration","title":"For HTCondor batch systems"},{"location":"compute-element/covid-19/#verifying-the-covid-19-job-route","text":"To verify that your HTCondor-CE is configured to support COVID-19 jobs, perform the following steps: Ensure that the OSG_COVID19_Jobs route appears with all of your other previously enabled routes: condor_ce_job_router_info -config Known issue: removing old routes If your HTCondor-CE has jobs associated with a route that is removed from your configuration, this will result in a crashing Job Router. If you accidentally remove an old route, restore the route or remove all jobs associated with said route.
Ensure that COVID-19 jobs will match to your new job route: For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: OSG_COVID19_Jobs should be the first route in the routing table: condor_ce_config_val -verbose JOB_ROUTER_ROUTE_NAMES For older versions of HTCondor: the Requirements expresison of your OSG_COVID19_Jobs route must contain (TARGET.IsCOVID19 =?= true) and all other routes must contain (TARGET.IsCOVID19 =!= true) in their Requirements expression. After requesting COVID-19 jobs , verify that jobs are being routed appropriately, by examining pilots with condor_ce_router_q .","title":"Verifying the COVID-19 Job Route"},{"location":"compute-element/covid-19/#requesting-covid-19-jobs","text":"To receive COVID-19 pilot jobs, send an email to help@osg-htc.org with the subject Requesting COVID-19 pilots and the following information: Whether you want to receive only COVID-19 jobs, or if you want to accept COVID-19 and other OSG jobs The hostname(s) of your HTCondor-CE(s) Any other restrictions that may apply to these jobs (e.g. number of available cores)","title":"Requesting COVID-19 Jobs"},{"location":"compute-element/covid-19/#viewing-covid-19-contributions","text":"You can view how many hours that COVID-19 projects have consumed at your site with this GRACC dashboard .","title":"Viewing COVID-19 Contributions"},{"location":"compute-element/covid-19/#getting-help","text":"To get assistance, please use this page .","title":"Getting Help"},{"location":"compute-element/hosted-ce/","text":"Requesting an OSG Hosted CE \u00b6 An OSG Hosted Compute Entrypoint (CE) is the entry point for resource requests coming from the OSG; it handles authorization and delegation of resource requests to your existing campus HPC/HTC cluster. Many sites set up their compute entrypoint locally. As an alternative, OSG offers a no-cost Hosted CE option wherein the OSG team will host and operate the HTCondor Compute Entrypoint, and configure it for the communities that you choose to support. This document explains the requirements and the procedure for requesting an OSG Hosted CE. Running more than 10,000 resource requests The Hosted CE can support thousands of concurrent resource request submissions. If you wish to run your own local compute entrypoint or expect to support more than 10,000 concurrently running OSG resource requests, see this page for installing the HTCondor-CE. Before Starting \u00b6 Before preparing your cluster for OSG resource requests, consider the following requirements: An existing compute cluster with a supported batch system running on a supported operating system Outbound network connectivity from the worker nodes (they can be behind NAT) One or more Unix accounts on your cluster's submit server with the following capabilities: Accessible via SSH key Use of SSH remote port forwarding ( AllowTcpForwarding yes ) and SSH multiplexing ( MaxSessions 10 or greater) Permission to submit jobs to your local cluster. Shared user home directories between the submit server and the worker nodes. Not required for HTCondor clusters: see this section for more details. Temporary scratch space on each worker node; site administrators should ensure that files in this directory are regularly cleaned out. OSG resource contributors must inform the OSG of any relevant changes to their site. Site downtimes For an improved turnaround time regarding an outage or downtime at your site, contact us and include downtime in the subject or body of the email. 
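For reference, the SSH capabilities listed in the requirements above correspond to sshd_config settings along these lines (a sketch only; adjust to your site's security policy, for example by scoping them to the OSG accounts with a Match block):
AllowTcpForwarding yes
MaxSessions 10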
For additional technical details, please consult the reference section below. Don't meet the requirements? If your site does not meet these conditions, please contact us to discuss your options for contributing to the OSG. Scheduling a Planning Consultation \u00b6 Before participating in the OSG, either as a computational resource contributor or consumer, we ask that you contact us to set up a consultation. During this consultation, OSG staff will introduce you and your team to the OSG and develop a plan to meet your resource contribution and/or research goals. Preparing Your Local Cluster \u00b6 After the consultation, ensure that your local cluster meets the requirements as outlined above . In particular, you should now know which accounts to create for the communities that you wish to serve at your cluster. Also consider the size and number of jobs that the OSG should send to your site (e.g., number of cores, memory, GPUs, walltime) as well as their scheduling policy (e.g. preemptible backfill partitions). Additionally, OSG staff may have directed you to follow installation instructions from one or more of the following sections: (Recommended) Providing access to CVMFS \u00b6 Maximize resource utilization; required for GPU support Installing CVMFS on your cluster makes your resources more attractive to OSG user jobs! Additionally, if you plan to contribute GPUs to the OSG, installation of CVMFS is required . Many users in the OSG make use of software modules and/or containers provided by their collaborations or by the OSG Research Facilitation team. In order to support these users without having to install specific software modules on your cluster, you may provide a distributed software repository system called CernVM File System (CVMFS). In order to provide CVMFS at your site, you will need the following: A cluster-wide Frontier Squid proxy service with at least 50GB of cache space; installation instructions for Frontier Squid are provided here . A local CVMFS cache per worker node (10 GB minimum, 20 GB recommended) After setting up the Frontier Squid proxy and worker node local caches, install CVMFS on each worker node. (HTCondor clusters only) Installing the OSG Worker Node Client \u00b6 Skip this section if you have CVMFS or shared home directories! If you have CVMFS installed or shared home directories on your worker nodes, you can skip manual installation of the OSG Worker Node Client. All OSG sites need to provide the OSG Worker Node Client on each worker node in their local cluster. This is normally handled by OSG staff for a Hosted CE but that requires shared home directories across the cluster. However, for sites with an HTCondor batch system, often there is no shared filesystem set up. If you run an HTCondor site and it is easier to install and maintain the Worker Node Client on each worker node than to install CVMFS or maintain a shared file system, you have the following options: Install the Worker Node Client from RPM Install the Worker Node Client from tarball Requesting an OSG Hosted CE \u00b6 After preparing your local cluster, apply for a Hosted CE by filling out the cluster integration questionnaire. Your answers will help our operators submit resource requests to your local cluster of the appropriate size and scale. Cluster Integration Questionnaire Can I change my answers at a later date? Yes! If you want the OSG to change the size (i.e.
CPU, RAM), type (e.g., GPU requests), or number of resource requests, contact us with the FQDN of your login host and the details of your changes. Finalizing Installation \u00b6 After applying for an OSG Hosted CE, our staff will contact you with the following information: IP ranges of OSG hosted services Public SSH key to be installed in the OSG accounts Once this is done, OSG staff will work with you and your team to begin submitting resource requests to your site, first with some tests, then with a steady ramp-up to full production. Validating contributions \u00b6 In addition to any internal validation processes that you may have, the OSG provides monitoring to view which communities and projects within said communities are accessing your site, their fields of science, and home institution. Below is an example of the monitoring views that will be available for your cluster. To view your contributions, select your site from the Facility dropdown of the Payload job summary dashboard. Note that accounting data may take up to 24 hours to display. Reference \u00b6 User accounts \u00b6 Each resource pool in the OSG Consortium that uses Hosted CEs is mapped to your site as a fixed, specific account; we request the account names are of the form osg01 through osg20 . The mappings from Unix usernames to resource pools are as follows: Username Pool Supported Research osg01 OSPool Projects (primarily single PI) supported directly by the OSG organization osg02 GLOW Projects coming from the Center for High Throughput Computing at the University of Wisconsin-Madison osg03 HCC Projects coming from the Holland Computing Center at the University of Nebraska\u2013Lincoln osg04 CMS High-energy physics experiment from the Large Hadron Collider at CERN osg05 Fermilab Experiments from the Fermi National Accelerator Laboratory osg07 IGWN Gravitational wave detection experiments osg08 IGWN Gravitational wave detection experiments osg09 ATLAS High-energy physics experiment from the Large Hadron Collider at CERN osg10 GlueX Study of quark and gluon degrees of freedom in hadrons using high-energy photons osg11 DUNE Experiment for neutrino science and proton decay studies osg12 IceCube Research based on data from the IceCube neutrino detector osg13 XENON Dark matter search experiment osg14 JLab Experiments from the Thomas Jefferson National Accelerator Facility osg15 - osg20 - Unassigned For example, the activities in your batch system corresponding to the user osg02 will always be associated with the GLOW resource pool. Security \u00b6 OSG takes multiple precautions to maintain security and prevent unauthorized usage of resources: Access to the OSG system with SSH keys are restricted to the OSG staff maintaining them Users are carefully vetted before they are allowed to submit jobs to OSG Jobs running through OSG can be traced back to the user that submitted them Job submission can quickly be disabled if needed Our security team is readily contactable in case of an emergency: https://osg-htc.org/security/#reporting-a-security-incident How to Get Help \u00b6 Is your site not receiving jobs from an OSG Hosted CE? Consult our status page for Hosted CE outages. 
If there isn't an outage, you need help with setup, or otherwise have questions, contact us .","title":"Request a Hosted CE"},{"location":"compute-element/hosted-ce/#requesting-an-osg-hosted-ce","text":"An OSG Hosted Compute Entrypoint (CE) is the entry point for resource requests coming from the OSG; it handles authorization and delegation of resource requests to your existing campus HPC/HTC cluster. Many sites set up their compute entrypoint locally. As an alternative, OSG offers a no-cost Hosted CE option wherein the OSG team will host and operate the HTCondor Compute Entrypoint, and configure it for the communities that you choose to support. This document explains the requirements and the procedure for requesting an OSG Hosted CE. Running more than 10,000 resource requests The Hosted CE can support thousands of concurrent resource request submissions. If you wish to run your own local compute entrypoint or expect to support more than 10,000 concurrently running OSG resource requests, see this page for installing the HTCondor-CE.","title":"Requesting an OSG Hosted CE"},{"location":"compute-element/hosted-ce/#before-starting","text":"Before preparing your cluster for OSG resource requests, consider the following requirements: An existing compute cluster with a supported batch system running on a supported operating system Outbound network connectivity from the worker nodes (they can be behind NAT) One or more Unix accounts on your cluster's submit server with the following capabilities: Accessible via SSH key Use of SSH remote port forwarding ( AllowTcpForwarding yes ) and SSH multiplexing ( MaxSessions 10 or greater) Permission to submit jobs to your local cluster. Shared user home directories between the submit server and the worker nodes. Not required for HTCondor clusters: see this section for more details. Temporary scratch space on each worker node; site administrators should ensure that files in this directory are regularly cleaned out. OSG resource contributors must inform the OSG of any relevant changes to their site. Site downtimes For an improved turnaround time regarding an outage or downtime at your site, contact us and include downtime in the subject or body of the email. For additional technical details, please consult the reference section below. Don't meet the requirements? If your site does not meet these conditions, please contact us to discuss your options for contributing to the OSG.","title":"Before Starting"},{"location":"compute-element/hosted-ce/#scheduling-a-planning-consultation","text":"Before participating in the OSG, either as a computational resource contributor or consumer, we ask that you contact us to set up a consultation. During this consultation, OSG staff will introduce you and your team to the OSG and develop a plan to meet your resource contribution and/or research goals.","title":"Scheduling a Planning Consultation"},{"location":"compute-element/hosted-ce/#preparing-your-local-cluster","text":"After the consultation, ensure that your local cluster meets the requirements as outlined above . In particular, you should now know which accounts to create for the communities that you wish to serve at your cluster. Also consider the size and number of jobs that the OSG should send to your site (e.g., number of cores, memory, GPUs, walltime) as well as their scheduling policy (e.g. preemptible backfill partitions). 
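If you end up supporting the standard pool accounts described in the user accounts reference (osg01 through osg20), a minimal sketch for creating them might look like the following (hypothetical; adjust the comment, UID range, home directory location, and shell to your site's conventions, and remember that for non-HTCondor clusters the home directories must be shared with the worker nodes):
root@host # for i in $(seq -w 1 20); do useradd -m -c "OSG pilot account" "osg$i"; done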
Additionally, OSG staff may have directed you to follow installation instructions from one or more of the following sections:","title":"Preparing Your Local Cluster"},{"location":"compute-element/hosted-ce/#recommended-providing-access-to-cvmfs","text":"Maximize resource utilization; required for GPU support Installing CVMFS on your cluster makes your resources more attractive to OSG user jobs! Additionally, if you plan to contribute GPUs to the OSG, installation of CVMFS is required . Many users in the OSG make of use software modules and/or containers provided by their collaborations or by the OSG Research Facilitation team. In order to support these users without having to install specific software modules on your cluster, you may provide a distributed software repository system called CernVM File System (CVMFS). In order to provide CVMFS at your site, you will need the following: A cluster-wide Frontier Squid proxy service with at least 50GB of cache space; installation instructions for Frontier Squid are provided here . A local CVMFS cache per worker node (10 GB minimum, 20 GB recommended) After setting up the Frontier Squid proxy and worker node local caches, install CVMFS on each worker node.","title":"(Recommended) Providing access to CVMFS"},{"location":"compute-element/hosted-ce/#htcondor-clusters-only-installing-the-osg-worker-node-client","text":"Skip this section if you have CVMFS or shared home directories! If you have CVMFS installed or shared home directories on your worker nodes, you can skip manual installation of the OSG Worker Node Client. All OSG sites need to provide the OSG Worker Node Client on each worker node in their local cluster. This is normally handled by OSG staff for a Hosted CE but that requires shared home directories across the cluster. However, for sites with an HTCondor batch system, often there is no shared filesystem set up. If you run an HTCondor site and it is easier to install and maintain the Worker Node Client on each worker node than to install CVMFS or maintain shared file system, you have the following options: Install the Worker Node Client from RPM Install the Worker Node Client from tarball","title":"(HTCondor clusters only) Installing the OSG Worker Node Client"},{"location":"compute-element/hosted-ce/#requesting-an-osg-hosted-ce_1","text":"After preparing your local cluster, apply for a Hosted CE by filling out the cluster integration questionnaire. Your answers will help our operators submit resource requests to your local cluster of the appropriate size and scale. Cluster Integration Questionnaire Can I change my answers at a later date? Yes! If you want the OSG to change the size (i.e. 
CPU, RAM), type (e.g., GPU requests), or number of resource requests, contact us with the FQDN of your login host and the details of your changes.","title":"Requesting an OSG Hosted CE"},{"location":"compute-element/hosted-ce/#finalizing-installation","text":"After applying for an OSG Hosted CE, our staff will contact you with the following information: IP ranges of OSG hosted services Public SSH key to be installed in the OSG accounts Once this is done, OSG staff will work with you and your team to begin submitting resource requests to your site, first with some tests, then with a steady ramp-up to full production.","title":"Finalizing Installation"},{"location":"compute-element/hosted-ce/#validating-contributions","text":"In addition to any internal validation processes that you may have, the OSG provides monitoring to view which communities and projects within said communities are accessing your site, their fields of science, and home institution. Below is an example of the monitoring views that will be available for your cluster. To view your contributions, select your site from the Facility dropdown of the Payload job summary dashboard. Note that accounting data may take up to 24 hours to display.","title":"Validating contributions"},{"location":"compute-element/hosted-ce/#reference","text":"","title":"Reference"},{"location":"compute-element/hosted-ce/#user-accounts","text":"Each resource pool in the OSG Consortium that uses Hosted CEs is mapped to your site as a fixed, specific account; we request the account names are of the form osg01 through osg20 . The mappings from Unix usernames to resource pools are as follows: Username Pool Supported Research osg01 OSPool Projects (primarily single PI) supported directly by the OSG organization osg02 GLOW Projects coming from the Center for High Throughput Computing at the University of Wisconsin-Madison osg03 HCC Projects coming from the Holland Computing Center at the University of Nebraska\u2013Lincoln osg04 CMS High-energy physics experiment from the Large Hadron Collider at CERN osg05 Fermilab Experiments from the Fermi National Accelerator Laboratory osg07 IGWN Gravitational wave detection experiments osg08 IGWN Gravitational wave detection experiments osg09 ATLAS High-energy physics experiment from the Large Hadron Collider at CERN osg10 GlueX Study of quark and gluon degrees of freedom in hadrons using high-energy photons osg11 DUNE Experiment for neutrino science and proton decay studies osg12 IceCube Research based on data from the IceCube neutrino detector osg13 XENON Dark matter search experiment osg14 JLab Experiments from the Thomas Jefferson National Accelerator Facility osg15 - osg20 - Unassigned For example, the activities in your batch system corresponding to the user osg02 will always be associated with the GLOW resource pool.","title":"User accounts"},{"location":"compute-element/hosted-ce/#security","text":"OSG takes multiple precautions to maintain security and prevent unauthorized usage of resources: Access to the OSG system with SSH keys are restricted to the OSG staff maintaining them Users are carefully vetted before they are allowed to submit jobs to OSG Jobs running through OSG can be traced back to the user that submitted them Job submission can quickly be disabled if needed Our security team is readily contactable in case of an emergency: https://osg-htc.org/security/#reporting-a-security-incident","title":"Security"},{"location":"compute-element/hosted-ce/#how-to-get-help","text":"Is your site not receiving 
jobs from an OSG Hosted CE? Consult our status page for Hosted CE outages. If there isn't an outage, you need help with setup, or otherwise have questions, contact us .","title":"How to Get Help"},{"location":"compute-element/htcondor-ce-overview/","text":"HTCondor-CE Overview \u00b6 This document serves as an introduction to HTCondor-CE and how it works. Before continuing with the overview, make sure that you are familiar with the following concepts: An OSG site plan What is a batch system and which one will you use ( HTCondor , PBS, LSF, SGE, or SLURM )? Security via host certificates to authenticate servers and bearer tokens to authenticate clients Pilot jobs, frontends, and factories (i.e., GlideinWMS , AutoPyFactory) What is a Compute Entrypoint? \u00b6 An OSG Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. At the heart of the CE is the job gateway software, which is responsible for handling incoming jobs, authenticating and authorizing them, and delegating them to your batch system for execution. Most jobs that arrive at a CE (here referred to as \"CE jobs\") are not end-user jobs, but rather pilot jobs submitted from factories. Successful pilot jobs create and make available an environment for actual end-user jobs to match and ultimately run within the pilot job container. Eventually pilot jobs remove themselves, typically after a period of inactivity. Note The Compute Entrypoint was previously known as the \"Compute Element\". What is HTCondor-CE? \u00b6 HTCondor-CE is a special configuration of the HTCondor software designed to be a job gateway solution for the OSG Fabric of Services. It is configured to use the JobRouter daemon to delegate jobs by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting job workloads of large sites Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with jobs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit jobs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs How CE Jobs Run \u00b6 Once an incoming CE job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the JobRouter creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original CE job, which in turn notifies the original submitter (e.g., job factory) of any updates. When the job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter. Hosted CE over SSH \u00b6 The Hosted CE is intended for small sites or as an introduction to providing capacity to collaborations. OSG staff configure and maintain an HTCondor-CE on behalf of the site. The Hosted CE is a special configuration of HTCondor-CE that can submit jobs to a remote cluster over SSH. It provides a simple starting point for opportunistic resource owners that want to start contributing capacity with minimal effort: an organization will be able to accept CE jobs by allowing SSH access to a login node in their cluster. 
If your site intends to run over 10,000 concurrent CE jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. If you are interested in a Hosted CE solution, please follow the instructions on this page . On HTCondor batch systems \u00b6 For a site with an HTCondor batch system , the JobRouter can use HTCondor protocols to place a transformed copy of the CE job directly into the batch system\u2019s scheduler, meaning that the routed and batch system jobs are one and the same. Thus, there are three representations of your job, each with its own ID (see diagram below): Access point: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, files are transferred from HTCondor-CE\u2019s spool directory to the batch system\u2019s spool directory using internal HTCondor protocols. Note The JobRouter copies the job directly into the batch system and does not make use of condor_submit . This means that if the HTCondor batch system is configured to add attributes to incoming jobs when they are submitted (i.e., SUBMIT_EXPRS ), these attributes will not be added to the routed jobs. On other batch systems \u00b6 For non-HTCondor batch systems, the JobRouter transforms the CE job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Login node: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID and the routed job\u2019s ID HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its spool directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes. How the CE is Customized \u00b6 Aside from the basic configuration required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which collaborations are allowed to run at your site: collaborations will submit resource allocation requests to your CE using bearer tokens, and you can configure which collaboration's tokens you are willing to accept. How to filter and transform the CE jobs to be run on your batch system: Filtering and transforming CE jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. For examples of common job routes, consult the JobRouter recipes page. Note If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when editing any configuration. How Security Works \u00b6 Among OSG services, communication is secured between various parties using a combination of PKI infrastructure involving Certificate Authorities (CAs) and bearer tokens. Services such as a Compute Entrypoint, present host certificates to prove their identity to clients, much like your browser verifies websites that you may visit. 
And to use these services, clients present bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. In turn, the service may be configured to authorize the client based on their collaboration. Next steps \u00b6 Once the basic installation is done, additional activities include: Setting up job routes to customize incoming jobs Submitting jobs to a HTCondor-CE Troubleshooting the HTCondor-CE Register the CE Register with the OSG GlideinWMS factories and/or the ATLAS AutoPyFactory","title":"HTCondor-CE Overview"},{"location":"compute-element/htcondor-ce-overview/#htcondor-ce-overview","text":"This document serves as an introduction to HTCondor-CE and how it works. Before continuing with the overview, make sure that you are familiar with the following concepts: An OSG site plan What is a batch system and which one will you use ( HTCondor , PBS, LSF, SGE, or SLURM )? Security via host certificates to authenticate servers and bearer tokens to authenticate clients Pilot jobs, frontends, and factories (i.e., GlideinWMS , AutoPyFactory)","title":"HTCondor-CE Overview"},{"location":"compute-element/htcondor-ce-overview/#what-is-a-compute-entrypoint","text":"An OSG Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. At the heart of the CE is the job gateway software, which is responsible for handling incoming jobs, authenticating and authorizing them, and delegating them to your batch system for execution. Most jobs that arrive at a CE (here referred to as \"CE jobs\") are not end-user jobs, but rather pilot jobs submitted from factories. Successful pilot jobs create and make available an environment for actual end-user jobs to match and ultimately run within the pilot job container. Eventually pilot jobs remove themselves, typically after a period of inactivity. Note The Compute Entrypoint was previously known as the \"Compute Element\".","title":"What is a Compute Entrypoint?"},{"location":"compute-element/htcondor-ce-overview/#what-is-htcondor-ce","text":"HTCondor-CE is a special configuration of the HTCondor software designed to be a job gateway solution for the OSG Fabric of Services. It is configured to use the JobRouter daemon to delegate jobs by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting job workloads of large sites Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with jobs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit jobs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs","title":"What is HTCondor-CE?"},{"location":"compute-element/htcondor-ce-overview/#how-ce-jobs-run","text":"Once an incoming CE job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the JobRouter creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original CE job, which in turn notifies the original submitter (e.g., job factory) of any updates. 
When the job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter.","title":"How CE Jobs Run"},{"location":"compute-element/htcondor-ce-overview/#hosted-ce-over-ssh","text":"The Hosted CE is intended for small sites or as an introduction to providing capacity to collaborations. OSG staff configure and maintain an HTCondor-CE on behalf of the site. The Hosted CE is a special configuration of HTCondor-CE that can submit jobs to a remote cluster over SSH. It provides a simple starting point for opportunistic resource owners that want to start contributing capacity with minimal effort: an organization will be able to accept CE jobs by allowing SSH access to a login node in their cluster. If your site intends to run over 10,000 concurrent CE jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. If you are interested in a Hosted CE solution, please follow the instructions on this page .","title":"Hosted CE over SSH"},{"location":"compute-element/htcondor-ce-overview/#on-htcondor-batch-systems","text":"For a site with an HTCondor batch system , the JobRouter can use HTCondor protocols to place a transformed copy of the CE job directly into the batch system\u2019s scheduler, meaning that the routed and batch system jobs are one and the same. Thus, there are three representations of your job, each with its own ID (see diagram below): Access point: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, files are transferred from HTCondor-CE\u2019s spool directory to the batch system\u2019s spool directory using internal HTCondor protocols. Note The JobRouter copies the job directly into the batch system and does not make use of condor_submit . This means that if the HTCondor batch system is configured to add attributes to incoming jobs when they are submitted (i.e., SUBMIT_EXPRS ), these attributes will not be added to the routed jobs.","title":"On HTCondor batch systems"},{"location":"compute-element/htcondor-ce-overview/#on-other-batch-systems","text":"For non-HTCondor batch systems, the JobRouter transforms the CE job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Login node: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID and the routed job\u2019s ID HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its spool directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes.","title":"On other batch systems"},{"location":"compute-element/htcondor-ce-overview/#how-the-ce-is-customized","text":"Aside from the basic configuration required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which collaborations are allowed to run at your site: collaborations will submit resource allocation requests to your CE using bearer tokens, and you can configure which collaboration's tokens you are willing to accept. 
How to filter and transform the CE jobs to be run on your batch system: Filtering and transforming CE jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. For examples of common job routes, consult the JobRouter recipes page. Note If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when editing any configuration.","title":"How the CE is Customized"},{"location":"compute-element/htcondor-ce-overview/#how-security-works","text":"Among OSG services, communication is secured between various parties using a combination of PKI infrastructure involving Certificate Authorities (CAs) and bearer tokens. Services such as a Compute Entrypoint, present host certificates to prove their identity to clients, much like your browser verifies websites that you may visit. And to use these services, clients present bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. In turn, the service may be configured to authorize the client based on their collaboration.","title":"How Security Works"},{"location":"compute-element/htcondor-ce-overview/#next-steps","text":"Once the basic installation is done, additional activities include: Setting up job routes to customize incoming jobs Submitting jobs to a HTCondor-CE Troubleshooting the HTCondor-CE Register the CE Register with the OSG GlideinWMS factories and/or the ATLAS AutoPyFactory","title":"Next steps"},{"location":"compute-element/install-htcondor-ce/","text":"Installing and Maintaining HTCondor-CE \u00b6 The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). As such, the OSG will submit resource allocation requests (RARs) jobs to your HTCondor-CE and it will handle authorization and delegation of RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory, which in turn are able to accept and run end-user jobs. See the upstream documentation for a more detailed introduction. Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE. OSG Hosted CE Unless you plan on running more than 10k concurrently running RARs or plan on making frequent configuration changes, we suggest requesting an OSG Hosted CE . Note If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead. Before Starting \u00b6 Before starting the installation process, consider the following points, consulting the upstream references as needed ( HTCondor-CE 23 ): User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below . SSL certificate: The HTCondor-CE service uses a host certificate and an accompanying key. If using a Let's Encrypt cert, install these as /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key If using an IGTF cert, install these as /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem See details in the Host Certificates overview . 
DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Install the appropriate EPEL and OSG Yum repositories for your operating system. Obtain root access to the host Install CA certificates Installing HTCondor-CE \u00b6 An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure , a Gratia probe for OSG accounting). To simplify installation, OSG provides convenience RPMs that install all required software. Clean yum cache: root@host # yum clean all --enablerepo=* Update software: root@host # yum update This command will update all packages (Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step. If your batch system is\u2026 Then run the following command\u2026 HTCondor yum install empty-condor --enablerepo=osg-empty SLURM yum install empty-slurm --enablerepo=osg-empty (Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo . Otherwise, skip to the next step. exclude=condor Select the appropriate convenience RPM: If your batch system is... Then use the following package... HTCondor osg-ce-condor LSF osg-ce-lsf PBS osg-ce-pbs SGE osg-ce-sge SLURM osg-ce-slurm Install the CE software, where <package> is the package you selected in the above step: root@host # yum install <package> Configuring HTCondor-CE \u00b6 There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations . Configuring the local batch system \u00b6 To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation . Configuring authentication \u00b6 HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. The osg-scitokens-mapfile , pulled in by the osg-ce package, provides default token-to-local-user mappings. To accept RARs from a particular collaboration: Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf . For example, to add support for the OSPool, create the osg user account on the CE and across your cluster. (Optional) If you wish to change the user mapping, copy the relevant mapping from /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf to a .conf file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username.
For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following to /etc/condor-ce/mapfiles.d/50-ospool.conf : # OSG SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot For more details of the mapfile format, consult the \"SciTokens\" section of the upstream documentation . Banning a collaboration \u00b6 Implicit banning Note that if you have not created the mapped user per the above section , it is not strictly necessary to add a ban mapping. HTCondor-CE will only authenticate remote RAR submission for the relevant credential if the Unix user exists. To explicitly ban a remote submitter from your HTCondor-CE, add a line like the following to a file in /etc/condor-ce/mapfiles.d/*.conf : SCITOKENS /,/ @banned.htcondor.org Replacing with a regular expression and with an arbitrary user name. For example, to ban OSPool pilots from your site, you could add the following to /etc/condor-ce/config.d/99-bans.conf : SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot@banned.htcondor.org Automatic configuration \u00b6 The OSG CE metapackage brings along a configuration tool, osg-configure , that is designed to automatically configure the different pieces of software required for an OSG HTCondor-CE: Enable your batch system in the HTCondor-CE configuration by editing the enabled field in the /etc/osg/config.d/20-.ini : enabled = True Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details. Validate the configuration settings root@host # osg-configure -v Fix any errors (at least) that osg-configure reports. Once the validation command succeeds without errors, apply the configuration settings: root@host # osg-configure -c Optional configuration \u00b6 In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation: Configuring the Job Router Optional configuration Accounting with multiple CEs or local user jobs \u00b6 Note For non-HTCondor batch systems only If your site has multiple CEs or you have local users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over-report the number of jobs. Modify the value of SuppressNoDNRecords in /etc/gratia/htcondor-ce/ProbeConfig on each of your CEs so that it reads: SuppressNoDNRecords=\"1\" Starting and Validating HTCondor-CE \u00b6 For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation Troubleshooting HTCondor-CE \u00b6 For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation: Common issues Debugging tools Helpful logs Registering the CE \u00b6 To contribute capacity, your CE must be registered with the OSG Consortium . To register your resource: Identify the facility, site, and resource group where your HTCondor-CE is hosted. For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information: Facility: University of Wisconsin Site: CHTC Resource Group: CHTC Using the above information, create or update the appropriate YAML file, using this template as a guide. 
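Purely as an illustrative sketch (the resource name, FQDN, and description below are hypothetical placeholders; the linked template and your Topology registration remain authoritative), a CE resource entry in such a YAML file follows the same FQDN/Services pattern used in the Frontier Squid registration example elsewhere in this documentation: CHTC-Example-CE: Active: true Description: Example HTCondor-CE FQDN: ce.example.edu Services: CE: Description: Compute Entrypoint 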
Getting Help \u00b6 To get assistance, please use the this page .","title":"Install HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#installing-and-maintaining-htcondor-ce","text":"The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). As such, the OSG will submit resource allocation requests (RARs) jobs to your HTCondor-CE and it will handle authorization and delegation of RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory, which in turn are able to accept and run end-user jobs. See the upstream documentation for a more detailed introduction. Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE. OSG Hosted CE Unless you plan on running more than 10k concurrently running RARs or plan on making frequent configuration changes, we suggest requesting an OSG Hosted CE . Note If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead.","title":"Installing and Maintaining HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#before-starting","text":"Before starting the installation process, consider the following points, consulting the upstream references as needed ( HTCondor-CE 23 ): User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below . SSL certificate: The HTCondor-CE service uses a host certificate and an accompanying key. If using a Let's Encrypt cert, install these as /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key If using an IGTF cert, install these as /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem See details in the Host Certificates overview . DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Install the appropriate EPEL and OSG Yum repositories for your operating system. Obtain root access to the host Install CA certificates","title":"Before Starting"},{"location":"compute-element/install-htcondor-ce/#installing-htcondor-ce","text":"An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure , a Gratia probe for OSG accounting). To simplify installation, OSG provides convenience RPMs that install all required software. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages (Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step. 
If your batch system is\u2026 Then run the following command\u2026 HTCondor yum install empty-condor --enablerepo=osg-empty SLURM yum install empty-slurm --enablerepo=osg-empty (Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo . Otherwise, skip to the next step. exclude=condor Select the appropriate convenience RPM: If your batch system is... Then use the following package... HTCondor osg-ce-condor LSF osg-ce-lsf PBS osg-ce-pbs SGE osg-ce-sge SLURM osg-ce-slurm Install the CE software where is the package you selected in the above step.: root@host # yum install ","title":"Installing HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#configuring-htcondor-ce","text":"There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations .","title":"Configuring HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#configuring-the-local-batch-system","text":"To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation .","title":"Configuring the local batch system"},{"location":"compute-element/install-htcondor-ce/#configuring-authentication","text":"HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client The osg-scitokens-mapfile , pulled in by the osg-ce package, provides default token to local user mappings. To accept RARs from a particular collaboration: Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf . For example, to add support for the OSPool, create the osg user account on the CE and across your cluster. (Optional) if you wish to change the user mapping, copy the relevant mapping from /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf to a .conf file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username. For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following to /etc/condor-ce/mapfiles.d/50-ospool.conf : # OSG SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot For more details of the mapfile format, consult the \"SciTokens\" section of the upstream documentation .","title":"Configuring authentication"},{"location":"compute-element/install-htcondor-ce/#bannning-a-collaboration","text":"Implicit banning Note that if you have not created the mapped user per the above section , it is not strictly necessary to add a ban mapping. HTCondor-CE will only authenticate remote RAR submission for the relevant credential if the Unix user exists. To explicitly ban a remote submitter from your HTCondor-CE, add a line like the following to a file in /etc/condor-ce/mapfiles.d/*.conf : SCITOKENS /,/ @banned.htcondor.org Replacing with a regular expression and with an arbitrary user name. 
For example, to ban OSPool pilots from your site, you could add the following to /etc/condor-ce/config.d/99-bans.conf : SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot@banned.htcondor.org","title":"Bannning a collaboration"},{"location":"compute-element/install-htcondor-ce/#automatic-configuration","text":"The OSG CE metapackage brings along a configuration tool, osg-configure , that is designed to automatically configure the different pieces of software required for an OSG HTCondor-CE: Enable your batch system in the HTCondor-CE configuration by editing the enabled field in the /etc/osg/config.d/20-.ini : enabled = True Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details. Validate the configuration settings root@host # osg-configure -v Fix any errors (at least) that osg-configure reports. Once the validation command succeeds without errors, apply the configuration settings: root@host # osg-configure -c","title":"Automatic configuration"},{"location":"compute-element/install-htcondor-ce/#optional-configuration","text":"In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation: Configuring the Job Router Optional configuration","title":"Optional configuration"},{"location":"compute-element/install-htcondor-ce/#accounting-with-multiple-ces-or-local-user-jobs","text":"Note For non-HTCondor batch systems only If your site has multiple CEs or you have local users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over report the number of jobs. Modify the value of SuppressNoDNRecords in /etc/gratia/htcondor-ce/ProbeConfig on each of your CE's so that it reads: SuppressNoDNRecords=\"1\"","title":"Accounting with multiple CEs or local user jobs"},{"location":"compute-element/install-htcondor-ce/#starting-and-validating-htcondor-ce","text":"For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation","title":"Starting and Validating HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#troubleshooting-htcondor-ce","text":"For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation: Common issues Debugging tools Helpful logs","title":"Troubleshooting HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#registering-the-ce","text":"To contribute capacity, your CE must be registered with the OSG Consortium . To register your resource: Identify the facility, site, and resource group where your HTCondor-CE is hosted. 
For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information: Facility: University of Wisconsin Site: CHTC Resource Group: CHTC Using the above information, create or update the appropriate YAML file, using this template as a guide.","title":"Registering the CE"},{"location":"compute-element/install-htcondor-ce/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"compute-element/job-router-recipes/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Job router recipes"},{"location":"compute-element/slurm-recipes/","text":"Slurm Configuration Recipes \u00b6 This document contains examples of common Slurm configurations used by sites to contribute capacity to the OSPool. Contributing X% of Your Cluster \u00b6 To contribute a percentage of your Slurm cluster to the OSPool, set aside a number of whole nodes for a dedicated OSPool partition : Determine the percentage of your cluster that you would like to contribute and use that to calculate the number of cores to meet that percentage Select nodes and sum the number of cores to meet your desired contribution In slurm.conf , configure the NodeName for each type of chassis and assign specific nodes to PartitionName=ospool For example, if your cluster is 5120 cores and you wanted to contribute 10% of the cluster to the OSPool, your slurm.conf could contain the following: # Dell PowerEdge C6525, AMD EPYC 7513 32-Core Processor @ 2.6GHz NodeName=spark-a[002-004,006-028] CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=256000 State=UNKNOWN Features=amd,avx,avx2 # Dell PowerEdge R6525, AMD EPYC 7763 64-Core Processor NodeName=spark-a[029-071,204-206] CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=1 RealMemory=512000 State=UNKNOWN Features=amd,avx,avx2 # OSPool Partition, -- 10% of Shared is approx 512 cores; 6x64cores + 1x128 cores = 512 PartitionName=ospool State=UP Nodes=spark-a[002-004,006-008,029] DefaultTime=0-04:00:00 MaxTime=1-00:00:00 PreemptMode=OFF Priority=50 AllowGroups=slurm-admin,osg01","title":"Slurm recipes"},{"location":"compute-element/slurm-recipes/#slurm-configuration-recipes","text":"This document contains examples of common Slurm configurations used by sites to contribute capacity to the OSPool.","title":"Slurm Configuration Recipes"},{"location":"compute-element/slurm-recipes/#contributing-x-of-your-cluster","text":"To contribute a percentage of your Slurm cluster to the OSPool, set aside a number of whole nodes for a dedicated OSPool partition : Determine the percentage of your cluster that you would like to contribute and use that to calculate the number of cores to meet that percentage Select nodes and sum the number of cores to meet your desired contribution In slurm.conf , configure the NodeName for each type of chassis and assign specific nodes to PartitionName=ospool For example, if your cluster is 5120 cores and you wanted to contribute 10% of the cluster to the OSPool, your slurm.conf could contain the following: # Dell PowerEdge C6525, AMD EPYC 7513 32-Core Processor @ 2.6GHz NodeName=spark-a[002-004,006-028] CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=256000 State=UNKNOWN Features=amd,avx,avx2 # Dell PowerEdge R6525, AMD EPYC 7763 64-Core Processor NodeName=spark-a[029-071,204-206] CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 
ThreadsPerCore=1 RealMemory=512000 State=UNKNOWN Features=amd,avx,avx2 # OSPool Partition, -- 10% of Shared is approx 512 cores; 6x64cores + 1x128 cores = 512 PartitionName=ospool State=UP Nodes=spark-a[002-004,006-008,029] DefaultTime=0-04:00:00 MaxTime=1-00:00:00 PreemptMode=OFF Priority=50 AllowGroups=slurm-admin,osg01","title":"Contributing X% of Your Cluster"},{"location":"compute-element/submit-htcondor-ce/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Submit htcondor ce"},{"location":"compute-element/troubleshoot-htcondor-ce/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Troubleshoot htcondor ce"},{"location":"data/external-oasis-repos/","text":"Install an OASIS Repository \u00b6 OASIS (the OSG Application Software Installation Service) is an infrastructure, based on CVMFS , for distributing software throughout the OSG. Once software is installed into an OASIS repository, the goal is to make it available across about 90% of the OSG within an hour. OASIS consists of keysigning infrastructure, a content distribution network (CDN), and a shared CVMFS repository that is hosted by the OSG. Many use cases will be covered by utilizing the shared repository ; this document covers how to install, configure, and host your own CVMFS repository server . This server will distribute software via OASIS, but will be hosted and operated externally from the OSG project. OASIS-based distribution and key signing is available to OSG VOs or repositories affiliated with an OSG VO. See the policy page for more information on what repositories OSG is willing to distribute. Before Starting \u00b6 The host OS must be: RHEL7 or RHEL8 (or equivalent). Additionally, User IDs: If it does not exist already, the installation will create the cvmfs Linux user Group IDs: If they do not exist already, the installation will create the Linux groups cvmfs and fuse Network ports: This page will configure the repository to distribute using Apache HTTPD on port 8000. At a minimum, the repository needs in-bound access from the OASIS CDN. Disk space: This host will need enough free disk space to host two copies of the software: one compressed and one uncompressed. /srv/cvmfs will hold all the published data (compressed and de-duplicated). The /var/spool/cvmfs directory will contain all the data in all current transactions (uncompressed). Root access will be needed to install. Installation of software into the repository itself will be done as an unprivileged user. Yum will need to be configured to use the OSG repositories . Overlay-FS limitations CVMFS on RHEL7 only supports Overlay-FS if the underlying filesystem is ext3 or ext4 ; make sure /var/spool/cvmfs is one of these filesystem types. If this is not possible, add CVMFS_DONT_CHECK_OVERLAYFS_VERSION=yes to your CVMFS configuration. 
Using xfs will work if it was created with ftype=1 Installation \u00b6 Installation is a straightforward install via yum : root@host # yum install cvmfs-server osg-oasis Apache and Repository Mounts \u00b6 For all installs, we recommend mounting all the local repositories on startup: root@host # echo \"cvmfs_server mount -a\" >>/etc/rc.local root@host # chmod +x /etc/rc.local The Apache HTTPD service should be configured to listen on port 8000, have the KeepAlive option enabled, and be started: root@host # echo Listen 8000 >>/etc/httpd/conf.d/cvmfs.conf root@host # echo KeepAlive on >>/etc/httpd/conf.d/cvmfs.conf root@host # chkconfig httpd on root@host # service httpd start Check Firewalls Make sure that port 8000 is available to the Internet. Check the setting of the host- and site-level firewalls. The next steps will fail if the web server is not accessible. Creating a Repository \u00b6 Prior to creation, the repository administrator will need to make two decisions: Select a repository name ; typically, this is derived from the VO or project's name and ends in opensciencegrid.org . For example, the NoVA VO runs the repository nova.opensciencegrid.org . For this section, we will use . Select a repository owner : Software publication will need to run by a non- root Unix user account; for this document, we will use as the account name of the repository owner. The initial repository creation must be run as root : root@host # echo -e \"\\*\\\\t\\\\t-\\\\tnofile\\\\t\\\\t16384\" >>/etc/security/limits.conf root@host # ulimit -n 16384 root@host # cvmfs_server mkfs -o root@host # cat >/srv/cvmfs//.htaccess <>/etc/cvmfs/repositories.d//server.conf <>/etc/cvmfs/repositories.d//server.conf </.cvmfswhitelist | cat -v That should print several lines including some gibberish at the end. Hosting a Repository on OASIS \u00b6 In order to host a repository on OASIS, perform the following steps: Verify your VO's registration is up-to-date . All repositories need to be associated with a VO; the VO needs to assign an OASIS manager in Topology who would be responsible for the contents of any of the VO's repositories and will be contacted in case of issues. To designate an OASIS manager, have the VO manager update the Topology registration . Send a message to OSG support using the following template: Please add a new CVMFS repository to OASIS for VO using the URL http://:8000/cvmfs/ The VO responsible manager will be . Replace the items with the appropriate values. If the repository name matches *.opensciencegrid.org or *.osgstorage.org , wait for the go-ahead from the OSG representative before continuing with the remaining instructions; for all other repositories (such as *.egi.eu ), you are done. When you are told in the ticket to proceed to the next step, first if the repository might be in a transaction abort it: root@host # su -c \"cvmfs_server abort \" Then execute the following commands: root@host # wget -O /srv/cvmfs//.cvmfswhitelist \\ http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist root@host # cp /etc/cvmfs/keys/opensciencegrid.org/opensciencegrid.org.pub \\ /etc/cvmfs/keys/.pub Replace as appropriate. If the cp command prompts about overwriting an existing file, type 'y'. Verify that publishing operation succeeds: root@host # su -c \"cvmfs_server transaction \" root@host # su -c \"cvmfs_server publish \" Within an hour, the repository updates should appear at the OSG Operations and FNAL Stratum-1 servers. 
On success, make sure the whitelist update happens daily by creating /etc/cron.d/fetch-cvmfs-whitelist with the following contents: 5 4 * * * cd /srv/cvmfs/ && wget -qO .cvmfswhitelist.new http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist && mv .cvmfswhitelist.new .cvmfswhitelist Note This cronjob eliminates the need for the repository service administrator to periodically use cvmfs_server resign to update .cvmfswhitelist as described in the upstream CVMFS documentation. Update the open support ticket to indicate that the previous steps have been completed Once the repository is fully replicated on the OSG, the VO may proceed in publishing into CVMFS using the account on the repository server. Tip We strongly recommend the repository maintainer read through the upstream documentation on maintaining repositories and content limitations . Finally, if the new repository will be used outside of the U.S., the VO should open a GGUS ticket following EGI's PROC20 to get the repository replicated onto worldwide Stratum 1s. Replacing an Existing OASIS Repository Server \u00b6 If a need arises to replace a server for an existing *.opensciencegrid.org or *.osgstorage.org repository, there are two ways to do it: one without changing the DNS name and one with changing it. The latter can take longer because it requires OSG Operations intervention. Revision numbers must increase CVMFS does not allow repository revision numbers to decrease, so the instructions below make sure the revision numbers only go up. Without changing the server DNS name \u00b6 If you are recreating the repository on the same machine, use the following command to remove the repository configuration while preserving the data and keys: root@host # cvmfs_server rmfs -p Otherwise if it is a new machine, copy the keys from /etc/cvmfs/keys/ .* and the data from /srv/cvmfs/ from the old server to the new, making sure that no publish operations happen on the old server while you copy the data. Then in either case use cvmfs_server import instead of cvmfs_server mkfs in the above instructions for Creating the Repository , in order to reuse old data and keys. Note that you wil need to reapply any custom configuration changes under /etc/cvmfs/repositories.d/ ` that was on the old server. If you run an old and a new machine in parallel for a while, make sure that when you put the new machine into production (by moving the DNS name) that the new machine has had at least as many publishes as the old machine, so the revision number does not decrease. With changing the server DNS name \u00b6 Note If you create a repository from scratch, as opposed to copying the data and keys from an old server, it is in fact better to change the DNS name of the server because that causes the OSG Operations server to reinitialize the .cvmfswhitelist. If you create a replacement repository on a new machine from scratch, follow the normal instructions on this page above, but with the following differences in the Hosting a Repository on OASIS section: In step 2, instead of asking in the support ticket to create a new repository, give the new URL and ask them to change the repository registration to that URL. When you do the publish in step 5, add a -n NNNN option where NNNN is a revision number greater than the number on the existing repository. That number can be found by this command on a client machine: user@host $ attr -qg revision /cvmfs/ Skip step 6; there is no need to tell OSG Operations when you are finished. 
After enough time has elapsed for the publish to propagate to clients, typically around 15 minutes, verify that the new chosen revision has reached a client. Removing a Repository from OASIS \u00b6 In order to remove a repository that is being hosted on OASIS, perform the following steps: If the repository has been replicated outside of the U.S., open a GGUS ticket assigned to support unit \"Software and Data Distribution (CVMFS)\" asking that the replication be removed from EGI Stratum-1s. Remind them in the ticket that there are worldwide Stratum-1s that automatically replicate all OSG repositories that RAL replicates, so those Stratum-1s cannot remove their replicas before RAL does but their administrators will need to be notified to remove their replicas within 8 hours after RAL does to avoid alarms. Wait until this ticket is resolved before proceeding. Open a support ticket asking to shut down the repository, giving the repository name (e.g., ), and the corresponding VO.","title":"Install an OASIS Repo"},{"location":"data/external-oasis-repos/#install-an-oasis-repository","text":"OASIS (the OSG A pplication S oftware I nstallation S ervice) is an infrastructure, based on CVMFS , for distributing software throughout the OSG. Once software is installed into an OASIS repository, the goal is to make it available across about 90% of the OSG within an hour. OASIS consists of keysigning infrastructure, a content distribution network (CDN), and a shared CVMFS repository that is hosted by the OSG. Many use cases will be covered by utilizing the shared repository ; this document covers how to install, configure, and host your own CVMFS repository server . This server will distribute software via OASIS, but will be hosted and operated externally from the OSG project. OASIS-based distribution and key signing is available to OSG VOs or repositories affiliated with an OSG VO. See the policy page for more information on what repositories OSG is willing to distribute.","title":"Install an OASIS Repository"},{"location":"data/external-oasis-repos/#before-starting","text":"The host OS must be: RHEL7 or RHEL8 (or equivalent). Additionally, User IDs: If it does not exist already, the installation will create the cvmfs Linux user Group IDs: If they do not exist already, the installation will create the Linux groups cvmfs and fuse Network ports: This page will configure the repository to distribute using Apache HTTPD on port 8000. At the minimum, the repository needs in-bound access from the OASIS CDN. Disk space: This host will need enough free disk space to host two copies of the software: one compressed and one uncompressed. /srv/cvmfs will hold all the published data (compressed and de-deuplicated). The /var/spool/cvmfs directory will contain all the data in all current transactions (uncompressed). Root access will be needed to install. Installation of software into the repository itself will be done as an unprivileged user. Yum will need to be configured to use the OSG repositories . Overlay-FS limitations CVMFS on RHEL7 only supports Overlay-FS if the underlying filesystem is ext3 or ext4 ; make sure /var/spool/cvmfs is one of these filesystem types. If this is not possible, add CVMFS_DONT_CHECK_OVERLAYFS_VERSION=yes to your CVMFS configuration. 
Using xfs will work if it was created with ftype=1","title":"Before Starting"},{"location":"data/external-oasis-repos/#installation","text":"Installation is a straightforward install via yum : root@host # yum install cvmfs-server osg-oasis","title":"Installation"},{"location":"data/external-oasis-repos/#apache-and-repository-mounts","text":"For all installs, we recommend mounting all the local repositories on startup: root@host # echo \"cvmfs_server mount -a\" >>/etc/rc.local root@host # chmod +x /etc/rc.local The Apache HTTPD service should be configured to listen on port 8000, have the KeepAlive option enabled, and be started: root@host # echo Listen 8000 >>/etc/httpd/conf.d/cvmfs.conf root@host # echo KeepAlive on >>/etc/httpd/conf.d/cvmfs.conf root@host # chkconfig httpd on root@host # service httpd start Check Firewalls Make sure that port 8000 is available to the Internet. Check the setting of the host- and site-level firewalls. The next steps will fail if the web server is not accessible.","title":"Apache and Repository Mounts"},{"location":"data/external-oasis-repos/#creating-a-repository","text":"Prior to creation, the repository administrator will need to make two decisions: Select a repository name ; typically, this is derived from the VO or project's name and ends in opensciencegrid.org . For example, the NoVA VO runs the repository nova.opensciencegrid.org . For this section, we will use . Select a repository owner : Software publication will need to run by a non- root Unix user account; for this document, we will use as the account name of the repository owner. The initial repository creation must be run as root : root@host # echo -e \"\\*\\\\t\\\\t-\\\\tnofile\\\\t\\\\t16384\" >>/etc/security/limits.conf root@host # ulimit -n 16384 root@host # cvmfs_server mkfs -o root@host # cat >/srv/cvmfs//.htaccess <>/etc/cvmfs/repositories.d//server.conf <>/etc/cvmfs/repositories.d//server.conf </.cvmfswhitelist | cat -v That should print several lines including some gibberish at the end.","title":"Creating a Repository"},{"location":"data/external-oasis-repos/#hosting-a-repository-on-oasis","text":"In order to host a repository on OASIS, perform the following steps: Verify your VO's registration is up-to-date . All repositories need to be associated with a VO; the VO needs to assign an OASIS manager in Topology who would be responsible for the contents of any of the VO's repositories and will be contacted in case of issues. To designate an OASIS manager, have the VO manager update the Topology registration . Send a message to OSG support using the following template: Please add a new CVMFS repository to OASIS for VO using the URL http://:8000/cvmfs/ The VO responsible manager will be . Replace the items with the appropriate values. If the repository name matches *.opensciencegrid.org or *.osgstorage.org , wait for the go-ahead from the OSG representative before continuing with the remaining instructions; for all other repositories (such as *.egi.eu ), you are done. When you are told in the ticket to proceed to the next step, first if the repository might be in a transaction abort it: root@host # su -c \"cvmfs_server abort \" Then execute the following commands: root@host # wget -O /srv/cvmfs//.cvmfswhitelist \\ http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist root@host # cp /etc/cvmfs/keys/opensciencegrid.org/opensciencegrid.org.pub \\ /etc/cvmfs/keys/.pub Replace as appropriate. If the cp command prompts about overwriting an existing file, type 'y'. 
Verify that the publishing operation succeeds: root@host # su -c \"cvmfs_server transaction \" root@host # su -c \"cvmfs_server publish \" Within an hour, the repository updates should appear at the OSG Operations and FNAL Stratum-1 servers. On success, make sure the whitelist update happens daily by creating /etc/cron.d/fetch-cvmfs-whitelist with the following contents: 5 4 * * * cd /srv/cvmfs/ && wget -qO .cvmfswhitelist.new http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist && mv .cvmfswhitelist.new .cvmfswhitelist Note This cronjob eliminates the need for the repository service administrator to periodically use cvmfs_server resign to update .cvmfswhitelist as described in the upstream CVMFS documentation. Update the open support ticket to indicate that the previous steps have been completed. Once the repository is fully replicated on the OSG, the VO may proceed with publishing into CVMFS using the account on the repository server. Tip We strongly recommend the repository maintainer read through the upstream documentation on maintaining repositories and content limitations . Finally, if the new repository will be used outside of the U.S., the VO should open a GGUS ticket following EGI's PROC20 to get the repository replicated onto worldwide Stratum 1s.","title":"Hosting a Repository on OASIS"},{"location":"data/external-oasis-repos/#replacing-an-existing-oasis-repository-server","text":"If a need arises to replace a server for an existing *.opensciencegrid.org or *.osgstorage.org repository, there are two ways to do it: one without changing the DNS name and one with changing it. The latter can take longer because it requires OSG Operations intervention. Revision numbers must increase CVMFS does not allow repository revision numbers to decrease, so the instructions below make sure the revision numbers only go up.","title":"Replacing an Existing OASIS Repository Server"},{"location":"data/external-oasis-repos/#without-changing-the-server-dns-name","text":"If you are recreating the repository on the same machine, use the following command to remove the repository configuration while preserving the data and keys: root@host # cvmfs_server rmfs -p Otherwise, if it is a new machine, copy the keys from /etc/cvmfs/keys/ .* and the data from /srv/cvmfs/ from the old server to the new, making sure that no publish operations happen on the old server while you copy the data. Then in either case use cvmfs_server import instead of cvmfs_server mkfs in the above instructions for Creating the Repository , in order to reuse old data and keys. Note that you will need to reapply any custom configuration changes under /etc/cvmfs/repositories.d/ that were on the old server. If you run an old and a new machine in parallel for a while, make sure that when you put the new machine into production (by moving the DNS name) that the new machine has had at least as many publishes as the old machine, so the revision number does not decrease.","title":"Without changing the server DNS name"},{"location":"data/external-oasis-repos/#with-changing-the-server-dns-name","text":"Note If you create a repository from scratch, as opposed to copying the data and keys from an old server, it is in fact better to change the DNS name of the server because that causes the OSG Operations server to reinitialize the .cvmfswhitelist. 
If you create a replacement repository on a new machine from scratch, follow the normal instructions on this page above, but with the following differences in the Hosting a Repository on OASIS section: In step 2, instead of asking in the support ticket to create a new repository, give the new URL and ask them to change the repository registration to that URL. When you do the publish in step 5, add a -n NNNN option where NNNN is a revision number greater than the number on the existing repository. That number can be found by this command on a client machine: user@host $ attr -qg revision /cvmfs/ Skip step 6; there is no need to tell OSG Operations when you are finished. After enough time has elapsed for the publish to propagate to clients, typically around 15 minutes, verify that the new chosen revision has reached a client.","title":"With changing the server DNS name"},{"location":"data/external-oasis-repos/#removing-a-repository-from-oasis","text":"In order to remove a repository that is being hosted on OASIS, perform the following steps: If the repository has been replicated outside of the U.S., open a GGUS ticket assigned to support unit \"Software and Data Distribution (CVMFS)\" asking that the replication be removed from EGI Stratum-1s. Remind them in the ticket that there are worldwide Stratum-1s that automatically replicate all OSG repositories that RAL replicates, so those Stratum-1s cannot remove their replicas before RAL does but their administrators will need to be notified to remove their replicas within 8 hours after RAL does to avoid alarms. Wait until this ticket is resolved before proceeding. Open a support ticket asking to shut down the repository, giving the repository name (e.g., ), and the corresponding VO.","title":"Removing a Repository from OASIS"},{"location":"data/frontier-squid/","text":"Install the Frontier Squid HTTP Caching Proxy \u00b6 Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. This document is intended for System Administrators who are installing frontier-squid , the OSG distribution of the Frontier Squid software. Frontier Squid Is Recommended \u00b6 OSG recommends that all sites run a caching proxy for HTTP and HTTPS to help reduce bandwidth and improve throughput. To that end, Compute Element (CE) installations include Frontier Squid automatically. We encourage all sites to configure and use this service, as described below. For large sites that expect heavy load on the proxy, it is best to run the proxy on its own host. If you are unsure if your site qualifies, we recommend initially running the proxy on your CE host and monitoring its bandwidth. If the network usage regularly peaks at over one third of the bandwidth capacity, move the proxy to a new host. Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the Reference section below as needed): Hardware requirements: If you will be supporting the Frontier application at your site, review the hardware recommendations to determine how to size your equipment. 
User IDs: If it does not exist already, the installation will create the squid Linux user Network ports: Clients within your cluster (e.g., OSG user jobs) will communicate with Frontier Squid on port 3128 (TCP). Additionally, central infrastructure will monitor Frontier Squid through port 3401 (UDP); see this section for more details. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Installing Frontier Squid \u00b6 To install Frontier Squid, make sure that your host is up to date before installing the required packages: Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install Frontier Squid: root@host # yum install frontier-squid Configuring Frontier Squid \u00b6 Configuring the Frontier Squid Service \u00b6 To configure the Frontier Squid service itself: Follow the Configuration section of the upstream Frontier Squid documentation . Enable, start, and test the service (as described below). Register the squid (also as described below ). Note An important difference between the standard Squid software and the Frontier Squid variant is that Frontier Squid changes are in /etc/squid/customize.sh instead of /etc/squid/squid.conf . Configuring the OSG CE \u00b6 To configure the OSG Compute Entrypoint (CE) to know about your Frontier Squid service: On your CE host (which may be different than your Frontier Squid host), edit /etc/osg/config.d/01-squid.ini Make sure that enabled is set to True Set location to the hostname and port of your Frontier Squid service (e.g., my.squid.host.edu:3128 ) Leave the other settings at DEFAULT unless you have specific reasons to change them Run osg-configure -c to propagate the changes on your CE. Note You may want to finish other CE configuration tasks before running osg-configure . Just be sure to run it once before starting CE services. Using Frontier-Squid \u00b6 Start the frontier-squid service and enable it to start at boot time. As a reminder, here are common service commands (all run as root ): To... Run the command... Start the service systemctl start frontier-squid Stop the service systemctl stop frontier-squid Enable the service to start on boot systemctl enable frontier-squid Disable the service from starting on boot systemctl disable frontier-squid Validating Frontier Squid \u00b6 As any user on another computer, do the following (where is the fully qualified domain name of your squid server): user@host $ export http_proxy = http://:3128 user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from If the grep doesn't print anything, try removing it from the pipeline to see if errors are obvious. If the second try says MISS again, something is probably wrong with the squid cache writes. Look at the squid access.log file to try to see what's wrong. Registering Frontier Squid \u00b6 To register your Frontier Squid host, follow the general registration instructions here with the following Frontier Squid-specific details. Alternatively, contact us for assistance with the registration process. Add a Squid: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: Squid: Description: Generic squid service ... 
Replacing with your Frontier Squid server's DNS entry or in the case of multiple Frontier Squid servers for a single resource, the round-robin DNS entry. See the BNL_ATLAS_Frontier_Squid for a complete example. Normally registered squids will be monitored by WLCG. This is strongly recommended even for non-WLCG sites so operations experts can help with diagnosing problems. However, if a site declines monitoring, that can be indicated by setting Monitored: false in a Details: section below Description: . Registration is still important for the sake of excluding squids from worker node failover monitors. The default if Details: Monitored: is not set is true . If you set Monitored to true, also enable monitoring as described in the upstream documentation on enabling monitoring . A few hours after a squid is registered and marked Active (and not marked Monitored: false ), verify that it is monitored by WLCG . Reference \u00b6 Users \u00b6 The frontier-squid installation will create one user account unless it already exists. User Comment squid Reduced privilege user that the squid process runs under. Set the default gid of the \"squid\" user to be a group that is also called \"squid\". The package can instead use another user name of your choice if you create a configuration file before installation. Details are in the upstream documentation Preparation section . Networking \u00b6 Open the following ports on your Frontier Squid hosts: Port Number Protocol WAN LAN Comment 3128 tcp \u2713 Also limited in squid ACLs. Should be limited to access from your worker nodes 3401 udp \u2713 Also limited in squid ACLs. Should be limited to public monitoring server addresses The addresses of the WLCG monitoring servers for use in firewalls are listed in the upstream documentation Enabling monitoring section . Frontier Squid Log Files \u00b6 Log file contents are explained in the upstream documentation Log file contents section .","title":"Install Frontier Squid RPM"},{"location":"data/frontier-squid/#install-the-frontier-squid-http-caching-proxy","text":"Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. This document is intended for System Administrators who are installing frontier-squid , the OSG distribution of the Frontier Squid software.","title":"Install the Frontier Squid HTTP Caching Proxy"},{"location":"data/frontier-squid/#frontier-squid-is-recommended","text":"OSG recommends that all sites run a caching proxy for HTTP and HTTPS to help reduce bandwidth and improve throughput. To that end, Compute Element (CE) installations include Frontier Squid automatically. We encourage all sites to configure and use this service, as described below. For large sites that expect heavy load on the proxy, it is best to run the proxy on its own host. If you are unsure if your site qualifies, we recommend initially running the proxy on your CE host and monitoring its bandwidth. 
If the network usage regularly peaks at over one third of the bandwidth capacity, move the proxy to a new host.","title":"Frontier Squid Is Recommended"},{"location":"data/frontier-squid/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Reference section below as needed): Hardware requirements: If you will be supporting the Frontier application at your site, review the hardware recommendations to determine how to size your equipment. User IDs: If it does not exist already, the installation will create the squid Linux user Network ports: Clients within your cluster (e.g., OSG user jobs) will communicate with Frontier Squid on port 3128 (TCP). Additionally, central infrastructure will monitor Frontier Squid through port 3401 (UDP); see this section for more details. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories","title":"Before Starting"},{"location":"data/frontier-squid/#installing-frontier-squid","text":"To install Frontier Squid, make sure that your host is up to date before installing the required packages: Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install Frontier Squid: root@host # yum install frontier-squid","title":"Installing Frontier Squid"},{"location":"data/frontier-squid/#configuring-frontier-squid","text":"","title":"Configuring Frontier Squid"},{"location":"data/frontier-squid/#configuring-the-frontier-squid-service","text":"To configure the Frontier Squid service itself: Follow the Configuration section of the upstream Frontier Squid documentation . Enable, start, and test the service (as described below). Register the squid (also as described below ). Note An important difference between the standard Squid software and the Frontier Squid variant is that Frontier Squid changes are in /etc/squid/customize.sh instead of /etc/squid/squid.conf .","title":"Configuring the Frontier Squid Service"},{"location":"data/frontier-squid/#configuring-the-osg-ce","text":"To configure the OSG Compute Entrypoint (CE) to know about your Frontier Squid service: On your CE host (which may be different than your Frontier Squid host), edit /etc/osg/config.d/01-squid.ini Make sure that enabled is set to True Set location to the hostname and port of your Frontier Squid service (e.g., my.squid.host.edu:3128 ) Leave the other settings at DEFAULT unless you have specific reasons to change them Run osg-configure -c to propagate the changes on your CE. Note You may want to finish other CE configuration tasks before running osg-configure . Just be sure to run it once before starting CE services.","title":"Configuring the OSG CE"},{"location":"data/frontier-squid/#using-frontier-squid","text":"Start the frontier-squid service and enable it to start at boot time. As a reminder, here are common service commands (all run as root ): To... Run the command... 
Start the service systemctl start frontier-squid Stop the service systemctl stop frontier-squid Enable the service to start on boot systemctl enable frontier-squid Disable the service from starting on boot systemctl disable frontier-squid","title":"Using Frontier-Squid"},{"location":"data/frontier-squid/#validating-frontier-squid","text":"As any user on another computer, do the following (where is the fully qualified domain name of your squid server): user@host $ export http_proxy = http://:3128 user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from If the grep doesn't print anything, try removing it from the pipeline to see if errors are obvious. If the second try says MISS again, something is probably wrong with the squid cache writes. Look at the squid access.log file to try to see what's wrong.","title":"Validating Frontier Squid"},{"location":"data/frontier-squid/#registering-frontier-squid","text":"To register your Frontier Squid host, follow the general registration instructions here with the following Frontier Squid-specific details. Alternatively, contact us for assistance with the registration process. Add a Squid: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: Squid: Description: Generic squid service ... Replacing with your Frontier Squid server's DNS entry or in the case of multiple Frontier Squid servers for a single resource, the round-robin DNS entry. See the BNL_ATLAS_Frontier_Squid for a complete example. Normally registered squids will be monitored by WLCG. This is strongly recommended even for non-WLCG sites so operations experts can help with diagnosing problems. However, if a site declines monitoring, that can be indicated by setting Monitored: false in a Details: section below Description: . Registration is still important for the sake of excluding squids from worker node failover monitors. The default if Details: Monitored: is not set is true . If you set Monitored to true, also enable monitoring as described in the upstream documentation on enabling monitoring . A few hours after a squid is registered and marked Active (and not marked Monitored: false ), verify that it is monitored by WLCG .","title":"Registering Frontier Squid"},{"location":"data/frontier-squid/#reference","text":"","title":"Reference"},{"location":"data/frontier-squid/#users","text":"The frontier-squid installation will create one user account unless it already exists. User Comment squid Reduced privilege user that the squid process runs under. Set the default gid of the \"squid\" user to be a group that is also called \"squid\". The package can instead use another user name of your choice if you create a configuration file before installation. Details are in the upstream documentation Preparation section .","title":"Users"},{"location":"data/frontier-squid/#networking","text":"Open the following ports on your Frontier Squid hosts: Port Number Protocol WAN LAN Comment 3128 tcp \u2713 Also limited in squid ACLs. Should be limited to access from your worker nodes 3401 udp \u2713 Also limited in squid ACLs. 
Should be limited to public monitoring server addresses The addresses of the WLCG monitoring servers for use in firewalls are listed in the upstream documentation Enabling monitoring section .","title":"Networking"},{"location":"data/frontier-squid/#frontier-squid-log-files","text":"Log file contents are explained in the upstream documentation Log file contents section .","title":"Frontier Squid Log Files"},{"location":"data/run-frontier-squid-container/","text":"Running Frontier Squid in a Container \u00b6 Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. Tip OSG recommends that all sites run a caching proxy for HTTP to help reduce bandwidth and improve throughput. This document outlines how to run Frontier Squid in a Docker container. Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the Frontier Squid Reference section as needed): Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: Frontier squid communicates on ports 3128 (TCP) and 3401 (UDP). We encourage sites to allow monitoring on port 3401 via UDP from CERN IP address ranges, 128.142.0.0/16, 188.184.128.0/17, 188.185.48.0/20 and 188.185.128.0/17. See the CERN monitoring documentation for additional details. If outgoing connections are filtered, note that CVMFS always uses ports 8000, 80, or 8080. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment. Configuring Squid \u00b6 Environment variables (optional) \u00b6 In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following environment variables: Variable name Description Defaults SQUID_IPRANGE Limits the incoming connections to the provided whitelist. By default only standard private network addresses are whitelisted. SQUID_CACHE_DISK Sets the cache_dir option which determines the disk size squid uses. Must be an integer value, and its unit is MBs. Note: The cache disk area is located at /var/cache/squid. Defaults to 10000. SQUID_CACHE_MEM Sets the cache_mem option which regulates the size squid reserves for caching small objects in memory. Includes a space and unit, e.g. \"MB\". Defaults to \"128 MB\". Cache Disk Size For production deployments, OSG recommends allocating at least 50 to 100 GB (50000 to 100000 MB) to SQUID_CACHE_DISK. Mount points \u00b6 In order to preserve the cache between redeployments, you should map the following areas to persistent storage outside the container: Mountpoint Description Example docker mount /var/cache/squid This directory contains the cache for squid. See also SQUID_CACHE_DISK above. -v /tmp/squid:/var/cache/squid /var/log/squid This directory contains the squid logs. -v /tmp/log:/var/log/squid For more details, see the Frontier Squid documentation . 
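As a minimal sketch only (the IP range and cache sizes below are illustrative assumptions, not site recommendations), an environment file passed to the container with --env-file could set the variables described above, one per line: SQUID_IPRANGE=10.0.0.0/8 192.168.0.0/16 , SQUID_CACHE_DISK=60000 , and SQUID_CACHE_MEM=256 MB . Adjust the IP range to cover your worker nodes and size the cache values to the disk and memory you can dedicate to the proxy. 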
Configuration customization (optional) \u00b6 More complicated configuration customization can be done by mounting .sh and .awk files into /etc/squid/customize.d. For details on the names and content of those files see the comments in the customization script and see the upstream documentation on configuration customization. Running a Frontier Squid Container \u00b6 To run a Frontier Squid container with the defaults: user@host $ docker run --rm --name frontier-squid \\ -v :/var/cache/squid \\ -v :/var/log/squid \\ -p :3128 opensciencegrid/frontier-squid:23-release You may pass configuration variables in KEY=VALUE format with either docker -e options or in a file specified with --env-file= . Running a Frontier Squid container with systemd \u00b6 An example systemd service file for Frontier Squid. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 3128 and is /tmp/squid and is /tmp/log . Create the systemd service file /etc/systemd/system/docker.frontier-squid.service as follows: [Unit] Description=Stash Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/frontier-squid:23-release ExecStart=/usr/bin/docker run --rm --name %n --publish 3128:3128 -v /tmp/squid:/var/cache/squid -v /tmp/log:/var/log/squid --env-file /opt/xcache/.env opensciencegrid/frontier-squid:23-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.frontier-squid root@host $ systemctl start docker.frontier-squid Validating the Frontier Squid Cache \u00b6 The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl or wget . Here, is the port chosen in the docker run command, 3128 by default. user@host $ export http_proxy = http://localhost: user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from 797a56e426cf user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from 797a56e426cf Registering Frontier Squid \u00b6 See the Registering Frontier Squid instructions to register your Frontier Squid host. Getting Help \u00b6 To get assistance, please use the this page .","title":"Running Frontier Squid in a Container"},{"location":"data/run-frontier-squid-container/#running-frontier-squid-in-a-container","text":"Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. Tip OSG recommends that all sites run a caching proxy for HTTP to help reduce bandwidth and improve throughput. 
This document outlines how to run Frontier Squid in a Docker container.","title":"Running Frontier Squid in a Container"},{"location":"data/run-frontier-squid-container/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Frontier Squid Reference section as needed): Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: Frontier squid communicates on ports 3128 (TCP) and 3401 (UDP). We encourage sites to allow monitoring on port 3401 via UDP from CERN IP address ranges, 128.142.0.0/16, 188.184.128.0/17, 188.185.48.0/20 and 188.185.128.0/17. See the CERN monitoring documentation for additional details. If outgoing connections are filtered, note that CVMFS always uses ports 8000, 80, or 8080. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment.","title":"Before Starting"},{"location":"data/run-frontier-squid-container/#configuring-squid","text":"","title":"Configuring Squid"},{"location":"data/run-frontier-squid-container/#environment-variables-optional","text":"In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following environment variables: Variable name Description Defaults SQUID_IPRANGE Limits the incoming connections to the provided whitelist. By default only standard private network addresses are whitelisted. SQUID_CACHE_DISK Sets the cache_dir option which determines the disk size squid uses. Must be an integer value, and its unit is MBs. Note: The cache disk area is located at /var/cache/squid. Defaults to 10000. SQUID_CACHE_MEM Sets the cache_mem option which regulates the size squid reserves for caching small objects in memory. Includes a space and unit, e.g. \"MB\". Defaults to \"128 MB\". Cache Disk Size For production deployments, OSG recommends allocating at least 50 to 100 GB (50000 to 100000 MB) to SQUID_CACHE_DISK.","title":"Environment variables (optional)"},{"location":"data/run-frontier-squid-container/#mount-points","text":"In order to preserve the cache between redeployments, you should map the following areas to persistent storage outside the container: Mountpoint Description Example docker mount /var/cache/squid This directory contains the cache for squid. See also SQUID_CACHE_DISK above. -v /tmp/squid:/var/cache/squid /var/log/squid This directory contains the squid logs. -v /tmp/log:/var/log/squid For more details, see the Frontier Squid documentation .","title":"Mount points"},{"location":"data/run-frontier-squid-container/#configuration-customization-optional","text":"More complicated configuration customization can be done by mounting .sh and .awk files into /etc/squid/customize.d. 
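For instance, if the customization files live in a directory on the host, they might be supplied to the container with an extra bind mount; a hedged sketch (the host path /srv/squid/customize.d is a hypothetical example):
user@host $ docker run --rm --name frontier-squid \
             -v /srv/squid/customize.d:/etc/squid/customize.d \
             -v /srv/squid/cache:/var/cache/squid \
             -v /srv/squid/log:/var/log/squid \
             -p 3128:3128 opensciencegrid/frontier-squid:23-release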
For details on the names and content of those files see the comments in the customization script and see the upstream documentation on configuration customization.","title":"Configuration customization (optional)"},{"location":"data/run-frontier-squid-container/#running-a-frontier-squid-container","text":"To run a Frontier Squid container with the defaults: user@host $ docker run --rm --name frontier-squid \\ -v :/var/cache/squid \\ -v :/var/log/squid \\ -p :3128 opensciencegrid/frontier-squid:23-release You may pass configuration variables in KEY=VALUE format with either docker -e options or in a file specified with --env-file= .","title":"Running a Frontier Squid Container"},{"location":"data/run-frontier-squid-container/#running-a-frontier-squid-container-with-systemd","text":"An example systemd service file for Frontier Squid. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 3128 and is /tmp/squid and is /tmp/log . Create the systemd service file /etc/systemd/system/docker.frontier-squid.service as follows: [Unit] Description=Stash Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/frontier-squid:23-release ExecStart=/usr/bin/docker run --rm --name %n --publish 3128:3128 -v /tmp/squid:/var/cache/squid -v /tmp/log:/var/log/squid --env-file /opt/xcache/.env opensciencegrid/frontier-squid:23-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.frontier-squid root@host $ systemctl start docker.frontier-squid","title":"Running a Frontier Squid container with systemd"},{"location":"data/run-frontier-squid-container/#validating-the-frontier-squid-cache","text":"The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl or wget . Here, is the port chosen in the docker run command, 3128 by default. user@host $ export http_proxy = http://localhost: user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from 797a56e426cf user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from 797a56e426cf","title":"Validating the Frontier Squid Cache"},{"location":"data/run-frontier-squid-container/#registering-frontier-squid","text":"See the Registering Frontier Squid instructions to register your Frontier Squid host.","title":"Registering Frontier Squid"},{"location":"data/run-frontier-squid-container/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"data/update-oasis/","text":"Updating Software in OASIS \u00b6 OASIS is the OSG Application Software Installation Service that can be used to publish and update software on OSG Worker Nodes under /cvmfs/oasis.opensciencegrid.org . It is implemented using CernVM FileSystem (CVMFS) technology and is the recommended method to make software available to researchers in the OSG Consortium. This document is a step by step explanation of how a member of a Virtual Organization (VO) can become an OASIS manager for their VO and gain access to the shared OASIS service for software management. The shared OASIS service is especially appropropriate for VOs that have a relatively small number of members and a relatively small amount of software to distribute. 
Larger VOs should consider hosting their own separate repositories . Note For information on how to configure an OASIS client see the CVMFS installation documentation . Requirements \u00b6 To begin the process to distribute software on OASIS using the service, you must: Register as an OSG contact and upload your SSH Key . Submit a request to help@osg-htc.org to become an OASIS manager with the following: The names of the VO(s) whose software that you would like to manage with the shared OASIS login host The names of any other VO members that should be OASIS managers The name of a member of the VO(s) that can verify your affiliation, and Cc that person on your emailed request How to use OASIS \u00b6 Log in with SSH \u00b6 The shared OASIS login server is accessible via SSH for all OASIS managers with registered SSH keys: user@host $ ssh -i ouser.@oasis-login.opensciencegrid.org Change for the name of the Virtual Organization you are trying to access and with the path to the private part of the SSH key whose public part you registered with the OSG . Instead of putting -i or ouser.@ on the command line, you can put it in your ~/.ssh/config : Host oasis-login.opensciencegrid.org User ouser. IdentityFile Install and update software \u00b6 Once you log in, you can add/modify/remove content on a staging area at /stage/oasis/$VO where $VO is the name of the VO represented by the manager. Files here are visible to both oasis-login and the Stratum 0 server (oasis.opensciencegrid.org). There is a symbolic link at /cvmfs/oasis.opensciencegrid.org/$VO that points to the same staging area. Request an oasis publish with this command: user@host $ osg-oasis-update This command queues a process to sync the content of OASIS with the content of /stage/oasis/$VO osg-oasis-update returns immediately, but only one update can run at a time (across all VOs); your request may be queued behind a different VO. If you encounter severe delays before the update is finished being published (more than 4 hours), please file a support ticket . Limitations on repository content \u00b6 Although CVMFS provides a POSIX filesystem, it does not work well with all types of content. Content in OASIS is expected to adhere to the CVMFS repository content limitations so please review those guidelines carefully. Testing \u00b6 After osg-oasis-update completes and the changes have been propagated to the CVMFS stratum 1 servers (typically between 0 and 60 minutes, but possibly longer if the servers are busy with updates of other repositories) then the changes can be visible under /cvmfs/oasis.opensciencegrid.org on a computer that has the CVMFS client installed . A client normally only checks for updates if at least an hour has passed since it last checked, but people who have superuser access on the client machine can force it to check again with root@host # cvmfs_talk -i oasis.opensciencegrid.org remount This can be done while the filesystem is mounted (despite the name, it does not do an OS-level umount/mount of the filesystem). If the filesystem is not mounted, it will automatically check for new updates the next time it is mounted. 
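As a quick sanity check on such a client, you can probe the repository and look for newly published content (the VO directory name myvo is a hypothetical placeholder):
user@host $ cvmfs_config probe oasis.opensciencegrid.org
user@host $ ls /cvmfs/oasis.opensciencegrid.org/myvo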
In order to find out if an update has reached the CVMFS stratum 1 server, you can find out the latest osg-oasis-update time seen by the stratum 1 most favored by your CVMFS client with the following long command on your client machine: user@host $ date -d \"1970-1-1 GMT + $( wget -qO- $( attr -qg host /cvmfs/oasis.opensciencegrid.org ) /.cvmfspublished | \\ cat -v | sed -n '/^T/{s/^T//p;q;}' ) sec\" References \u00b6 CVMFS Documentation","title":"Update OASIS Shared Repo"},{"location":"data/update-oasis/#updating-software-in-oasis","text":"OASIS is the OSG Application Software Installation Service that can be used to publish and update software on OSG Worker Nodes under /cvmfs/oasis.opensciencegrid.org . It is implemented using CernVM FileSystem (CVMFS) technology and is the recommended method to make software available to researchers in the OSG Consortium. This document is a step by step explanation of how a member of a Virtual Organization (VO) can become an OASIS manager for their VO and gain access to the shared OASIS service for software management. The shared OASIS service is especially appropropriate for VOs that have a relatively small number of members and a relatively small amount of software to distribute. Larger VOs should consider hosting their own separate repositories . Note For information on how to configure an OASIS client see the CVMFS installation documentation .","title":"Updating Software in OASIS"},{"location":"data/update-oasis/#requirements","text":"To begin the process to distribute software on OASIS using the service, you must: Register as an OSG contact and upload your SSH Key . Submit a request to help@osg-htc.org to become an OASIS manager with the following: The names of the VO(s) whose software that you would like to manage with the shared OASIS login host The names of any other VO members that should be OASIS managers The name of a member of the VO(s) that can verify your affiliation, and Cc that person on your emailed request","title":"Requirements"},{"location":"data/update-oasis/#how-to-use-oasis","text":"","title":"How to use OASIS"},{"location":"data/update-oasis/#log-in-with-ssh","text":"The shared OASIS login server is accessible via SSH for all OASIS managers with registered SSH keys: user@host $ ssh -i ouser.@oasis-login.opensciencegrid.org Change for the name of the Virtual Organization you are trying to access and with the path to the private part of the SSH key whose public part you registered with the OSG . Instead of putting -i or ouser.@ on the command line, you can put it in your ~/.ssh/config : Host oasis-login.opensciencegrid.org User ouser. IdentityFile ","title":"Log in with SSH"},{"location":"data/update-oasis/#install-and-update-software","text":"Once you log in, you can add/modify/remove content on a staging area at /stage/oasis/$VO where $VO is the name of the VO represented by the manager. Files here are visible to both oasis-login and the Stratum 0 server (oasis.opensciencegrid.org). There is a symbolic link at /cvmfs/oasis.opensciencegrid.org/$VO that points to the same staging area. Request an oasis publish with this command: user@host $ osg-oasis-update This command queues a process to sync the content of OASIS with the content of /stage/oasis/$VO osg-oasis-update returns immediately, but only one update can run at a time (across all VOs); your request may be queued behind a different VO. 
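As a concrete sketch of a typical update cycle (the VO name myvo and the directory bin are hypothetical placeholders):
user@host $ cp -r ~/bin /stage/oasis/myvo/
user@host $ osg-oasis-update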
If you encounter severe delays before the update is finished being published (more than 4 hours), please file a support ticket .","title":"Install and update software"},{"location":"data/update-oasis/#limitations-on-repository-content","text":"Although CVMFS provides a POSIX filesystem, it does not work well with all types of content. Content in OASIS is expected to adhere to the CVMFS repository content limitations so please review those guidelines carefully.","title":"Limitations on repository content"},{"location":"data/update-oasis/#testing","text":"After osg-oasis-update completes and the changes have been propagated to the CVMFS stratum 1 servers (typically between 0 and 60 minutes, but possibly longer if the servers are busy with updates of other repositories) then the changes can be visible under /cvmfs/oasis.opensciencegrid.org on a computer that has the CVMFS client installed . A client normally only checks for updates if at least an hour has passed since it last checked, but people who have superuser access on the client machine can force it to check again with root@host # cvmfs_talk -i oasis.opensciencegrid.org remount This can be done while the filesystem is mounted (despite the name, it does not do an OS-level umount/mount of the filesystem). If the filesystem is not mounted, it will automatically check for new updates the next time it is mounted. In order to find out if an update has reached the CVMFS stratum 1 server, you can find out the latest osg-oasis-update time seen by the stratum 1 most favored by your CVMFS client with the following long command on your client machine: user@host $ date -d \"1970-1-1 GMT + $( wget -qO- $( attr -qg host /cvmfs/oasis.opensciencegrid.org ) /.cvmfspublished | \\ cat -v | sed -n '/^T/{s/^T//p;q;}' ) sec\"","title":"Testing"},{"location":"data/update-oasis/#references","text":"CVMFS Documentation","title":"References"},{"location":"data/stashcache/install-cache/","text":"Installing the OSDF Cache \u00b6 This document describes how to install an Open Science Data Federation (OSDF) cache service. This service allows a site or regional network to cache data frequently used on the OSG, reducing data transfer over the wide-area network and decreasing access latency. Minimum version for this documentation This document describes features introduced in XCache 3.3.0, released on 2022-12-08. When installing, ensure that your version of the stash-cache RPM is at least 3.3.0. Note The OSDF cache was previously named \"Stash Cache\" and some documentation and software may use the old name. Before Starting \u00b6 Before starting the installation process, consider the following requirements: Operating system: Ensure the host has a supported operating system User IDs: If they do not exist already, the installation will create the Linux user IDs condor and xrootd Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request and install host certificates. Network ports: Your host may run a public cache instance (for serving public data only), an authenticated cache instance (for serving protected data), or both. 
A public cache instance requires the following ports open: Inbound TCP port 1094 for file access via the XRootD protocol Inbound TCP port 8000 for file access via HTTP(S) Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring An authenticated cache instance requires the following ports open: Inbound TCP port 8443 for authenticated file access via HTTPS Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 1TB of disk space for the cache directory, and 12GB of RAM. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates Registering the Cache \u00b6 To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts. Initial registration \u00b6 To register your cache host, follow the general registration instructions here . The service type is XRootD cache server . Info This step must be completed before installation. In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache. There are special values you may use in AllowedVOs : ANY_PUBLIC indicates that the cache is willing to serve public data from any VO. ANY indicates that the cache is willing to serve data from any VO, both public and protected. ANY implies ANY_PUBLIC . There are extra requirements for serving protected data: In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF . There must be an authenticated XRootD instance on the cache server. There must be a DN attribute in the resource registration with the subject DN of the host certificate This is an example registration for a cache server that serves all public data: MY_OSDF_CACHE : FQDN : my-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - ANY_PUBLIC This is an example registration for a cache server that only serves protected data for the Open Science Pool: MY_AUTH_OSDF_CACHE : FQDN : my-auth-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net This is an example registration for a cache server that serves all public data and protected data from the OSG VO: MY_COMBO_OSDF_CACHE : FQDN : my-combo-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG - ANY_PUBLIC DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net Non-standard ports \u00b6 By default, an unauthenticated cache instance serves public data on port 8000, and an authenticated cache instance serves protected data on port 8443. 
If you change the ports for your cache instances, you must specify the new endpoints under the service, as follows: MY_COMBO_OSDF_CACHE2 : FQDN : my-combo-cache2.example.net Services : XRootD cache server : Description : OSDF cache server Details : endpoint_override : my-combo-cache2.example.net:8080 auth_endpoint_override : my-combo-cache2.example.net:8444 Finalizing registration \u00b6 Once initial registration is complete, you may start the installation process. In the meantime, open a help ticket with your cache name. Mention in your ticket that you would like to \"Finalize the cache registration.\" Installing the Cache \u00b6 The OSDF software consists of an XRootD server with special configuration and supporting services. To simplify installation, OSG provides convenience RPMs that install all required packages with a single command: root@host # yum install stash-cache Configuring the Cache \u00b6 First, you must create a \"cache directory\", which will be used to store downloaded files. By default this is /mnt/stash . We recommend using a separate file system for the cache directory, with at least 1 TB of storage available. Note The cache directory must be writable by the xrootd:xrootd user and group. The stash-cache package provides default configuration files in /etc/xrootd/xrootd-stash-cache.cfg and /etc/xrootd/config.d/ . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d/1*.cfg (for files that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for files that need to be processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG. Ensure the xrootd service has a certificate \u00b6 The service will need a certificate for reporting and to authenticate to origins. The easiest solution for this is to use your host certificate and key as follows: Copy the host certificate to /etc/grid-security/xrd/xrd{cert,key}.pem Set the owner of the directory and contents /etc/grid-security/xrd/ to xrootd:xrootd : root@host # chown -R xrootd:xrootd /etc/grid-security/xrd/ Note You must repeat the above steps whenever you renew your host certificate. If you automate certificate renewal, you should automate copying as well. In addition, you will need to restart the XRootD services ( xrootd@stash-cache and/or xrootd@stash-cache-auth ) so they load the updated certificates. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site . Configuring Optional Features \u00b6 Adjust disk utilization \u00b6 To adjust the disk utilization of your cache, create or edit a file named /etc/xrootd/config.d/90-local.cfg and set the values of pfc.diskusage . pfc.diskusage 0.90 0.95 The two values correspond to the low and high usage water marks, respectively. When usage goes above the high water mark, the XRootD service will delete cached files until usage goes below the low water mark. Enable remote debugging \u00b6 XRootD provides remote debugging via a read-only file system named digFS. This feature is disabled by default, but you may enable it if you need help troubleshooting your server. 
Warning Remote debugging should only be enabled for as long as it is needed to troubleshoot your server. To enable remote debugging, edit /etc/xrootd/digauth.cfg and specify the authorizations for reading digFS. An example of authorizations: all allow gsi g=/glow h=*.cs.wisc.edu This gives access to the config file, log files, core files, and process information to anyone from *.cs.wisc.edu in the /glow VOMS group. See the XRootD manual for the full syntax. Remote debugging should only be enabled for as long as you need assistance. As soon as your issue has been resolved, revert any changes you have made to /etc/xrootd/digauth.cfg . Enable HTTPS on the unauthenticated cache \u00b6 By default, the unauthenticated cache instance uses plain HTTP, not HTTPS. To use HTTPS: Add a certificate according to the instructions above Uncomment set EnableVoms = 1 in /etc/xrootd/config.d/10-osg-xrdvoms.cfg Upgrading from OSG 3.5 If upgrading from OSG 3.5, you may have a file with the following contents in /etc/xrootd/config.d : # Support HTTPS access to unauthenticated cache if named stash-cache http.cadir /etc/grid-security/certificates http.cert /etc/grid-security/xrd/xrdcert.pem http.key /etc/grid-security/xrd/xrdkey.pem http.secxtractor /usr/lib64/libXrdLcmaps.so fi You must delete this config block or XRootD will fail to start. Manually Setting the FQDN (optional) \u00b6 The FQDN of the cache server that you registered in Topology may be different than its internal hostname (as reported by hostname -f ). For example, this may be the case if your cache is behind a load balancer such as LVS. In this case, you must manually tell the cache services which FQDN to use for topology lookups. Create the file /etc/systemd/system/stash-authfile@.service.d/override.conf (note the @ in the directory name) with the following contents: [Service] Environment = CACHE_FQDN=