
Releases: dmwm/WMCore

WMCore 2.3.7.1 production WMAgent release

22 Oct 14:39

This stable WMAgent release comes with changes to the software stack: a minor update of the Python base image from 3.8.16 to 3.8.20, an update of MariaDB from version 10.6.5 to 10.6.19, and an update of the dbs3-client to version 4.0.19.
In addition, the CouchDB replication filter functions have been replaced by native selector functionality, and file stage-out has been instrumented to provide more debugging information. Finally, there are many enhancements and quality-of-life changes.

Release date: 22 October 2024.
Changes since release: 2.3.3.

WMAgent

Software stack

  • Update dbs3-client version to 4.0.19 (Valentin Kuznetsov) #12098
  • Fix dbs3-client to version 4.0.19 and update imports for Microservices (Dario Mapelli) #12102
  • Update python version to 3.8.20, with dmwm-base and wmagent-base images (Alan Malta Rodrigues) dmwm/CMSKubernetes#1547
  • Update WM Dockerfiles to tag pypi-20240923-stable (Alan Malta Rodrigues) dmwm/CMSKubernetes#1551

Features and/or feature changes

  • Update workqueue selector filter to no longer use ParentQueueUrl (Alan Malta Rodrigues) #12154
  • Replace WorkQueue replication filter by selector function (Alan Malta Rodrigues) #12143
  • Augmentation of gfal-cp debugging for StageOutImpl and GFAL2Impl scripts (Andrea Piccinelli) #12081
  • check UserDrainMode before updating AgentDrainMode (Muhammad Hassan Ahmed) #12142
  • Update restartComponent.sh script to notify everyone in the WMCore team (Alan Malta Rodrigues) #12080 (backported to 2.3.4)
  • Only dump CMSSW process attributes in the logs when needed (Alan Malta Rodrigues) #12049 (backported to 2.3.4)
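PRs #12143 and #12154 replace the WorkQueue replication filter with a selector. For illustration only: CouchDB (2.0+) can accept a Mango-style `selector` in a `_replicator` document. The server then evaluates it natively and does not call a JavaScript filter function stored in a design document. In the sketch below, the database URLs, the `ChildQueueUrl` selector field and the `make_replication_doc` helper are hypothetical and not taken from WMCore.

```python
import json


def make_replication_doc(source, target, selector, continuous=True):
    """Build a CouchDB _replicator document that filters with a selector.

    Hypothetical helper: the "selector" entry replaces the older
    "filter"/"query_params" pair that pointed at a design-document function.
    """
    return {
        "source": source,
        "target": target,
        "continuous": continuous,
        # Native selector, evaluated by CouchDB itself.
        "selector": selector,
    }


doc = make_replication_doc(
    source="https://globalqueue.example.org/workqueue",       # hypothetical URL
    target="http://localhost:5984/workqueue_inbox",            # hypothetical URL
    selector={"ChildQueueUrl": "https://agent.example.org"},   # assumed field name
)
print(json.dumps(doc, indent=2))
```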

Bug Fixes

  • wmstats - consistent reporting of wmagent disk use (Dario Mapelli) #12140
  • WorkQueue: convert PNN to PSN for pileup location (Alan Malta Rodrigues) #12066 (backported to 2.3.4)
  • Use PNN to PSN conversion in WQE mapping (Kenyi Hurtado) #12094 (backported to 2.3.4)
  • Check if elements from workqueue workRestrictions are indeed Available (Alan Malta Rodrigues) #12050 (backported to 2.3.4)
  • Persist only logArch1 step output for jobs marked as failed inside JobAccountant (Alan Malta Rodrigues) #12093 (backported to 2.3.4)
  • skip possible JVM msg in voms-proxy-info output. Fix #12075 (Stefano Belforte) #12076 #12084
  • Catch exit code for the xrdcp command and use it to generate the exit code of the script (Andrea Piccinelli) #12058
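PR #12058 makes the stage-out script return the exit code of the `xrdcp` command. The Python sketch below shows the same idea for an arbitrary copy command. `run_copy` is a hypothetical helper, and the actual WMCore change may be structured differently.

```python
import subprocess
import sys


def run_copy(cmd):
    """Run a copy command and return its exit code unchanged.

    Hypothetical helper: a non-zero status from the copy tool (e.g. xrdcp)
    becomes the caller's own status instead of being masked.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # Conventional shell exit code for "command not found".
        return 127
    if proc.returncode != 0:
        print(f"copy failed with exit code {proc.returncode}: {proc.stderr.strip()}",
              file=sys.stderr)
    return proc.returncode
```

A caller would typically finish with `sys.exit(run_copy(["xrdcp", source, destination]))`. That way the script's exit status is exactly the copy tool's status.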

Enhancements

  • make documentation a list of the code review (Alan Malta Rodrigues)
  • Add first draft of wmagent-component-standalone script (Todor Ivanov) #12024
  • Fix patchComponent logic to find the correct python lib && Add logic to zero the codebase before every patch to the version/tag deployed at the host (Todor Ivanov) #12074
  • Remove CMSCouch functions unused since 2016, see #6928 (Fredrik) #12053

WMCore 2.3.6 production central services release

10 Oct 19:34

With this release, Release Validation output data placement has been disabled in MSOutput (it now relies on Rucio subscriptions).
It also disables alerts for Rucio containers not found in the Rucio server. Additionally, the base Python docker image has been updated,
bringing Python to version 3.8.20 (from 3.8.16). As a requirement, the dbs3-client package has also been updated.

Release date: 10 October 2024.
Changes since release: 2.3.5.

Central services

Software stack

Features and/or feature changes

  • Disable relval workflows going to disk by default in MSOutput. (Kenyi Hurtado) #12090
  • Fix dbs3-client to version 4.0.19 and update imports for Microservices (Dario Mapelli) #12102
  • Disable MS-output DID not found (Muhammad Hassan Ahmed) #12135

Bug Fixes

  • Use PNN to PSN conversion in WQE mapping (Kenyi Hurtado) #12094
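In CMS terminology, a PNN (PhEDEx Node Name) is a storage endpoint such as `T1_US_FNAL_Disk`. A PSN (Processing Site Name), such as `T1_US_FNAL`, is a site where jobs can run, and one PNN may map to several PSNs. WMCore obtains the real mapping from the CRIC service. The sketch below instead assumes a small hard-coded table, and `PNN_TO_PSN` and `pnns_to_psns` are hypothetical names.

```python
# Hypothetical, hard-coded PNN -> PSN mapping for illustration only.
PNN_TO_PSN = {
    "T1_US_FNAL_Disk": {"T1_US_FNAL"},
    "T2_CH_CERN": {"T2_CH_CERN", "T2_CH_CERN_HLT"},
    "T2_DE_DESY": {"T2_DE_DESY"},
}


def pnns_to_psns(pnns):
    """Convert data locations (PNNs) into the sites (PSNs) that can run jobs.

    PNNs missing from the mapping (e.g. tape endpoints) contribute no PSN.
    """
    psns = set()
    for pnn in pnns:
        psns.update(PNN_TO_PSN.get(pnn, set()))
    return sorted(psns)


print(pnns_to_psns(["T1_US_FNAL_Disk", "T2_CH_CERN", "T1_US_FNAL_Tape"]))
# ['T1_US_FNAL', 'T2_CH_CERN', 'T2_CH_CERN_HLT']
```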

Enhancements

  • New docker wrapper script to manage docker actions on stable images, and its implementation in the CI/CD template (Valentin Kuznetsov) #12005 #12088

WMCore 2.3.5 production central services release

08 Aug 11:19

It has been more than 3 months since we last upgraded WM central services, so here we go!
This production release brings in some important bug fixes to WM microservices and global workqueue.
In addition, the PyPI CD pipeline has been updated to use 2FA and Trusted Publishers.

Release date: 8 August 2024.
Changes since release: 2.3.2.

Central services

Software stack

  • Update CD pipeline to use trusted publishers (Alan Malta Rodrigues) #12071

Features and/or feature changes

  • Remove dependency on deprecated method xml..getchildren (Todor Ivanov) #11984
  • MSOutput: Do not consider data size for Tape data placement based on dm_weight (Alan Malta Rodrigues) #12045
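PR #12045 (together with #11940 in release 2.3.2) bases Tape output placement on the `dm_weight` RSE attribute and no longer factors in data size. A weighted random choice of that kind might look like the sketch below. The RSE names and weights are made up, and `pick_tape_rse` is a hypothetical helper. MSOutput's own logic is expressed through a Rucio RSE expression and may differ.

```python
import random


def pick_tape_rse(rse_weights, rng=random):
    """Pick one Tape RSE with probability proportional to its dm_weight.

    rse_weights: mapping of RSE name -> dm_weight (non-negative numbers).
    The size of the data to be placed is deliberately ignored.
    """
    candidates = {rse: w for rse, w in rse_weights.items() if w > 0}
    if not candidates:
        raise ValueError("no Tape RSE with a positive dm_weight")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Hypothetical RSE names and weights; a zero weight is never selected.
weights = {"T1_US_FNAL_Tape": 50, "T1_DE_KIT_Tape": 30, "T0_CH_CERN_Tape": 0}
print(pick_tape_rse(weights, rng=random.Random(42)))
```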

Bug Fixes

  • MSPileup: Validate Pileup document with expectedRSEs instead of currentRSEs (Alan Malta Rodrigues) #11995
  • MSPileup: Check status of attachDIDs API and act upon it (Valentin Kuznetsov) #11995
  • MSPileup: Typo for customName in MSPileupData module (Fredrik) #12052
  • WorkQueue: convert PNN to PSN for pileup location (Alan Malta Rodrigues) #12066

Enhancements

  • Unused older Unified functions removed (Fredrik) #12047
  • Fix log message misspellings (Fredrik) #12054

WMCore 2.3.3 production WMAgent release

01 May 18:36

This WMAgent release has a major change affecting stage-in/out: the storage JSON format has been adopted and the XML format is now deprecated.
In addition, it provides a fully functional WorkflowUpdater component, which continuously updates workflow sandboxes with up-to-date pileup information (a component of the partial pileup feature). Regarding pileup, the agent no longer resolves pileup data location via Rucio; it now fetches this information solely from the MSPileup service.
Furthermore, the Lexicon file has been updated to support data tiers with digits (e.g. L1SCOUT). This release also adds support for EL9 workflows and automatic detection of the container OS at runtime.
Lastly, a few important bug fixes and enhancements are provided with this release.

Release date: 30 April 2024.
Changes since release: 2.3.1.

WMAgent

Software stack

Features and/or feature changes

  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • add support for rhel9. Fix #11985 (Stefano Belforte) #11897
  • Add pileup availability logic in WorkflowUpdater component (Valentin Kuznetsov) #11884
  • Abandon storage.xml in favor of storage.json for stage-in and stage-out (nhduongvn) #11869 #11917
  • Initial implementation of DBSConcurrency module (Valentin Kuznetsov) #11913
  • Allow digits in data tier names (German Giraldo) #11930
  • Fix regex for Spec generation with alphanumerical data tiers (German Giraldo) #11951
  • Add new Merge Repack special cases to AccountantWorker (German Giraldo) #11952
  • Allow numbers in DataTier fields in all regular expressions (Todor Ivanov) #11962
  • [Container] Addition of the sites with HPC resources for the WMAgent docker deployment (Andrea Piccinelli) #11966
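PRs #11930, #11951 and #11962 relax data tier validation so that names containing digits, such as `L1SCOUT`, are accepted. The two regular expressions below illustrate the before/after difference. They are assumptions for illustration, not the actual patterns in WMCore's Lexicon module.

```python
import re

# Hypothetical patterns: the old style accepts only upper-case letters and
# dashes; the relaxed one also accepts digits (e.g. L1SCOUT).
OLD_TIER_RE = re.compile(r"^[A-Z-]+$")
NEW_TIER_RE = re.compile(r"^[A-Z0-9-]+$")


def is_valid_tier(tier, pattern=NEW_TIER_RE):
    """Return True if the whole data tier name matches the pattern."""
    return pattern.fullmatch(tier) is not None


for tier in ("AODSIM", "GEN-SIM-RAW", "L1SCOUT", "nanoaod"):
    # Columns: tier name, accepted by old pattern, accepted by new pattern.
    print(tier, is_valid_tier(tier, OLD_TIER_RE), is_valid_tier(tier))
```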

Bug Fixes

  • Don't cross check location when data is reported without location (Alan Malta Rodrigues) #11878
  • Discover the singularity container OS at runtime. (Todor Ivanov) #11896
  • Decode cmsRun stdout (German Giraldo) #11933
  • Fix print function in WorkflowUpdater (Alan Malta Rodrigues) #11971
  • Add DBS3_READER_URL and WorkflowUpdater.dbsUrl (Valentin Kuznetsov) #11972
  • Add check for creating updateBlockInfo return dictionary (Todor Ivanov) #11975

Enhancements

  • Increase JobSubmitter queue size from 50k to 100k (Alan Malta Rodrigues) #11889

WMCore 2.3.2 production central services release

30 Apr 03:10

This central services release provides full functionality for partial pileup, including the relevant changes on the agent side with the WorkflowUpdater component. We have also refactored the Rucio RSE expression for Tape output data placement, which now uses the dm_weight attribute instead of the legacy ddm_quota. Lastly, data tiers with digits (e.g. L1SCOUT) are now supported across the system.

Release date: 30 April 2024.
Changes since release: 2.3.1.

Central services

Software stack

Features and/or feature changes

  • Use MSPileupUtils getPileupDocs in MSTransferor, WorkflowUpdaterPoller (Dennis Lee) #11910
  • Provide concurrent implementation for REST jobdetail API (Alan Malta Rodrigues) #11885
  • Revisit logic of transition records, put code into stand-alone function (Valentin Kuznetsov) #11921
  • Update Tape RSE attribute from ddm_quota to dm_weight in MSOutput (Alan Malta Rodrigues) #11940

Bug Fixes

  • Fix interval for WMStats DataCacheUpdate CherryPy thread (Alan Malta Rodrigues) #11924
  • add BossAir.Plugins.BasePlugin modules to crabtaskworker deps (Thanayut Seethongchuen) #11926
  • Use POST method for getting pileup documents in MSTransferor (Alan Malta Rodrigues) #11936
  • Adjust scopes use in MSPileup (Valentin Kuznetsov) #11938
  • Fix issue with customDID (Valentin Kuznetsov) #11948

Enhancements

  • Added MSPileupUtils getPileupDocs mock and emulator (Dennis Lee) #11905
  • Add code to insert transition record; code re-factoring (Valentin Kuznetsov) #11947

WMAgent

Features and/or feature changes

  • Add pileup availability logic in WorkflowUpdater component (Valentin Kuznetsov) #11884
  • Abandon storage.xml in favor of storage.json for stage-in and stage-out (nhduongvn) #11869 #11917
  • Initial implementation of DBSConcurrency module (Valentin Kuznetsov) #11913
  • Allow digits in data tier names (German Giraldo) #11930
  • Fix regex for Spec generation with alphanumerical data tiers (German Giraldo) #11951
  • Add new Merge Repack special cases to AccountantWorker (German Giraldo) #11952
  • Allow numbers in DataTier fields in all regular expressions (Todor Ivanov) #11962
  • [Container] Addition of the sites with HPC resources for the WMAgent docker deployment (Andrea Piccinelli) #11966

Bug Fixes

  • Decode cmsRun stdout (German Giraldo) #11933
  • Fix print function in WorkflowUpdater (Alan Malta Rodrigues) #11971
  • Add DBS3_READER_URL and WorkflowUpdater.dbsUrl (Valentin Kuznetsov) #11972
  • Add check for creating updateBlockInfo return dictionary (Todor Ivanov) #11975

Enhancements

WMCore 2.3.1 production central services release

21 Feb 13:44

This release brings in full functionality for partial pileup data placement. Note, however, that further developments and the deployment of a new WMAgent release are required before it can be adopted in operations.
We have also refactored pileup data location across WM services, which now rely solely on MSPileup. In addition, DQMHarvest workflows now have full container-level input data placement, with corresponding changes in the WorkQueue Dataset start policy.
On the WMAgent side, the system now supports workflows requesting the EL9 operating system (and its variations). The default ScramArch has been removed for Cleanup jobs, which now auto-discover the OS and architecture and bootstrap the code accordingly.

Release date: 21 Feb 2024.
Changes since release: 2.2.6.

Central services

Software stack

Features and/or feature changes

  • Implement partialPileupTask task logic (Valentin Kuznetsov) #11807
  • Python (standard library) implementation of update pileup object script (Valentin Kuznetsov) #11872
  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • Make container level data placement for DQMHarvest; update Start.Policy.Dataset to container too (Alan Malta Rodrigues) #11894

Bug Fixes

  • Don't cross check location when data is reported without location (Alan Malta Rodrigues) #11878
  • Parse MSPileup filter as string if only listing one (Dennis Lee) #11886
  • Correctly pass dbsUrl to the locationsFromMSPileup method (Todor Ivanov) #11906
  • Fix variable name in log record; complement to #11879 (Alan Malta Rodrigues) #11908

Enhancements

  • providing a checklist for code review (Alan Malta Rodrigues)

WMAgent

Features and/or feature changes

  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • add support for rhel9. Fix #11985 (Stefano Belforte) #11897

Bug Fixes

  • Discover the singularity container OS at runtime. (Todor Ivanov) #11896

Enhancements

  • Increase JobSubmitter queue size from 50k to 100k (Alan Malta Rodrigues) #11889

WMAgent 2.3.0 WMAgent production release

18 Jan 03:01

This version brings in an initial implementation of a new agent component, called WorkflowUpdater, which will be used to continuously update the pileup location and the workflow sandbox.
It also changes the behavior of the site disallowed list, which will now be enforced across all tasks of a workflow. Concerning the job runtime, PSet tweaks have been made more verbose, and we now dump some basic information about the environment and cmsRun steps, to be used in the future for job customization.
This release also contains other important bug fixes and overall enhancements of the agent.

Release date: 17 January 2024.
Changes since release: 2.2.5.

WMAgent

Software stack

Features and/or feature changes

  • Inherit siteLists from upper level task while creating WMBS subscriptions (Todor Trendafilov Ivanov) #11724
  • Initial implementation for WorkflowUpdater component (Alan Malta Rodrigues) #11795 #11859
  • Give priority to older workflows when fetching from database for JobSubmitter (German Giraldo) #11804
  • Add logging and decode output of pre-scripts (German Giraldo) #11803
  • Add a generic script for deploying wmagent inside a virtual environment. (Todor Ivanov) #11624
  • Add runtime information json. (Kenyi Hurtado) #11812
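PR #11724 makes WMBS subscriptions inherit the site lists of the upper-level task. As a result, a site disallowed for the workflow is also excluded from its child tasks. The sketch below shows one way that resolution could work. The `Task` class, its `site_white_list`/`site_black_list` fields and `effective_sites` are hypothetical names, and the exact inheritance rules in WMCore may differ.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Task:
    """Hypothetical task node holding optional site lists."""
    name: str
    parent: Optional["Task"] = None
    site_white_list: Set[str] = field(default_factory=set)
    site_black_list: Set[str] = field(default_factory=set)


def effective_sites(task, all_sites):
    """Resolve the sites a task's subscription may use.

    The allowed list comes from the nearest task (self or ancestor) that
    defines one; the disallowed lists of the task and all ancestors are merged.
    """
    white = set()
    black = set()
    node = task
    while node is not None:
        if not white and node.site_white_list:
            white = set(node.site_white_list)
        black |= node.site_black_list
        node = node.parent
    allowed = white or set(all_sites)
    return sorted(allowed - black)


sites = {"T1_US_FNAL", "T2_CH_CERN", "T2_DE_DESY"}
top = Task("Production", site_black_list={"T2_DE_DESY"})
merge = Task("ProductionMerge", parent=top)
print(effective_sites(merge, sites))  # ['T1_US_FNAL', 'T2_CH_CERN']
```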

Bug Fixes

  • Fix logic for updating task-level site thresholds (Alan Malta Rodrigues) #11776
  • Fix ChangeState logic for limiting number of docs to commit in bulk (Alan Malta Rodrigues) #11786
  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862
  • Deal with invalid blocks in WMBS for Dataset start policy (Alan Malta Rodrigues) #11838

Enhancements

  • Bump deploy-wmagent script to version 2.2.5; insert T3_US_Ookami (Alan Malta Rodrigues) #11766
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Make DBS3Upload slightly more verbose (Alan Malta Rodrigues) #11768
  • Fixed Inappropriate Logical Expression (fabihatasneem) #11808

WMCore 2.2.6 production central services release

16 Jan 11:39

Although not yet available to end users, initial changes for supporting partial pileup have been integrated into this release. The StepChain parentage CherryPy thread has been refactored to support partial block resolution and to be more resilient. MSUnmerged received some performance improvements. Cleanup of unneeded rules (MSRuleCleaner) no longer depends on the status of the StepChain parentage resolution. Global WorkQueue will no longer fail workflows with an incomplete input data placement, or whose data has not yet arrived at its final destination; instead, it relies on a continuous data location daemon. Lastly, this release brings in many WMAgent and central services improvements.

Release date: 16 Jan 2024.
Changes since release: 2.2.4.

Central services

Software stack

  • Replace svn by git in GitHub actions (Alan Malta Rodrigues) #11858

Features and/or feature changes

  • Refactor StepChainParentage thread to resolve by workflow (Alan Malta Rodrigues) #11694
  • Deal with partial block in the parentage fix (Alan Malta Rodrigues) #11757 #11779
  • Remove unused Unified configuration and code; add schema and validation (Alan Malta Rodrigues) #11770
  • MSUnmerged: Try to remove the base directory first and avoid recursive operations (#11781) (Todor Ivanov) #11781
  • MSUnmerged: Handle gfal exceptions while listing baseDirEntry && Avoid extra stat operations during recursion. (#11794) (Todor Ivanov) #11794
  • MSPileup: Add support for customName in MSPileup (Valentin Kuznetsov) #11765
  • MSPileup: Adjust to use customName along with pileupName (Valentin Kuznetsov) #11769
  • MSPileup: Introduce transition attribute in MSPileup record (Valentin Kuznetsov) #11802
  • MSRuleCleaner: Moved ParentageResolved check from dispatch to archive (Dennis Lee) #11805
  • MSTransferor: Ensure MSTransferor does not request more copies than RSEs available (Alan Malta Rodrigues) #11844
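PR #11844 ensures that MSTransferor never asks for more copies than there are RSEs available to hold them. In its simplest form this is a clamp. The function name and arguments below are hypothetical.

```python
def num_copies(requested, available_rses):
    """Clamp the requested number of replicas to the distinct RSEs available.

    A result of 0 means no RSE is available and no placement is possible.
    """
    if requested < 1:
        raise ValueError("at least one copy must be requested")
    return min(requested, len(set(available_rses)))


print(num_copies(2, ["T1_US_FNAL_Disk"]))  # 1
```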

Bug Fixes

  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862

Enhancements

  • Update test json templates with MINIAODSIM; fix DQMHarvest ; GPU StepChain; ReReco 2022C; etc (Alan Malta Rodrigues) #11238
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Add MSPileup and refactor MongoDB in the WM Schematic (Alan Malta Rodrigues) #11789
  • New test campaigns added to parseUnifiedCampaigns (Alan Malta Rodrigues) #11806
  • Added aborted-completed to the state transition diagram (Alan Malta Rodrigues) #11809

WMAgent

Features and/or feature changes

  • Inherit siteLists from upper level task while creating WMBS subscriptions (Todor Trendafilov Ivanov) #11724
  • Initial implementation for WorkflowUpdater component (Alan Malta Rodrigues) #11795 #11859
  • Give priority to older workflows when fetching from database for JobSubmitter (German Giraldo) #11804
  • Add logging and decode output of pre-scripts (German Giraldo) #11803
  • Add a generic script for deploying wmagent inside a virtual environment. (Todor Ivanov) #11624
  • Add runtime information json. (Kenyi Hurtado) #11812

Bug Fixes

  • Fix logic for updating task-level site thresholds (Alan Malta Rodrigues) #11776
  • Fix ChangeState logic for limiting number of docs to commit in bulk (Alan Malta Rodrigues) #11786
  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862
  • Deal with invalid blocks in WMBS for Dataset start policy (Alan Malta Rodrigues) #11838

Enhancements

  • Bump deploy-wmagent script to version 2.2.5; insert T3_US_Ookami (Alan Malta Rodrigues) #11766
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Make DBS3Upload slightly more verbose (Alan Malta Rodrigues) #11768
  • Fixed Inappropriate Logical Expression (fabihatasneem) #11808

WMAgent 2.2.5 WMAgent production release

13 Oct 18:24

Most of the changes in this cycle have been integrated into WMAgent, starting with grid jobs, which now carry two new job classads:

  • CMS_extendedJobType: used to characterize the physics task type of the job (StepChain could have a comma separated list).
  • CMS_CampaignName: propagates the request's high-level description all the way to the job (StepChain could have a comma separated list).
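For StepChain workflows, both classads above can hold a comma-separated list with one entry per step. The sketch below renders such values as quoted strings in HTCondor submit-description syntax (`+Attribute = "value"`). The per-step values are made up, and `classad_list_value` is a hypothetical helper. Whether WMCore deduplicates or reorders entries is not stated in these notes; this sketch keeps them as given.

```python
def classad_list_value(values):
    """Render per-step values as a quoted, comma-separated ClassAd string.

    Order is preserved and empty entries are dropped; duplicates are kept.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    # ClassAd string literals are double-quoted; escape backslashes and quotes.
    joined = ",".join(cleaned).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{joined}"'


# Hypothetical per-step values for a three-step StepChain.
steps = {
    "CMS_extendedJobType": ["GENSIM", "DIGI", "RECO"],
    "CMS_CampaignName": ["CampaignA", "CampaignB", "CampaignC"],
}
for name, values in steps.items():
    print(f"+{name} = {classad_list_value(values)}")
```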

On WMAgent job monitoring, the agent now propagates all of the CMSSW FJR performance metrics to MonIT (through the WMArchive index).
In addition, two new sections with performance metrics are uploaded to WMArchive:

  • WMTiming: it contains timing information captured by the job wrapper, thus relative to the whole grid job.
  • WMCMSSWSubprocess: it contains timing information related to a given cmsRun step executed by the job.

Release date: 12 October 2023.
Changes since release: 2.2.3.1.

WMAgent

Software stack

Features and/or feature changes

  • Add characterization and propagation of task type based on cmsDriver step arguments (Kenyi Hurtado Anampa) #11680
  • Add campaign name attribute to base WMTask object and propagate it to the job level as a classad (Kenyi Hurtado Anampa) #11710 #11760
  • Add campaign names support for stepchain workflows (Kenyi Hurtado Anampa) #11738
  • Add CMSSW metrics to FJR (Valentin Kuznetsov) #11663
  • Add CMSSW performance metrics to WMArchive document (Valentin Kuznetsov) #11696
  • Adds WMCMSSWSubprocess metrics to FJR document (Valentin Kuznetsov) #11716
  • Add WMCMSSWSubprocess and WMTiming metrics to WMArchive document (Valentin Kuznetsov) #11692
  • Provide in the FWJR the CPU and wallclock time for CMSSW subprocesses (Valentin Kuznetsov) #11665
  • Provide timestamps metrics about WM job (Valentin Kuznetsov) #11656 #11726
  • Change default rucio pileup account (khurtado) #11673
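PRs #11656/#11726 and #11665 add job timestamps and per-subprocess CPU and wallclock times to the job report. On POSIX systems, one way to measure the CPU time of a child process is to take deltas of `resource.getrusage(resource.RUSAGE_CHILDREN)` around the call. The sketch below does this for an arbitrary command. It is not WMCore's implementation, and the key names in the returned dictionary are hypothetical.

```python
import resource
import subprocess
import sys
import time


def run_and_time(cmd):
    """Run cmd and return its exit code, timestamps, wallclock and CPU seconds.

    CPU time is the change in RUSAGE_CHILDREN user+system time, i.e. CPU used
    by child processes that have terminated and been waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start_ts = time.time()          # epoch timestamp for reporting
    t0 = time.monotonic()           # monotonic clock for the duration
    proc = subprocess.run(cmd)
    wall = time.monotonic() - t0
    stop_ts = time.time()
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "exitCode": proc.returncode,
        "startTime": start_ts,
        "stopTime": stop_ts,
        "wallClockTime": wall,
        "cpuTime": cpu,
    }


print(run_and_time([sys.executable, "-c", "sum(range(10**6))"]))
```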

Bug Fixes

  • Parse /proc//smaps_rollup if present && Reduce string concatenation operations (Todor Ivanov) #11676
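PR #11676 reads the per-process `smaps_rollup` file under `/proc` when it exists (Linux 4.14 and later). That file aggregates the per-mapping figures of `smaps` into a single block. The sketch below parses its `Key: value kB` lines into a dictionary and returns `None` when the file is absent. `parse_smaps_rollup` and `read_rollup` are hypothetical names, and the fallback behaviour is an assumption.

```python
import os


def parse_smaps_rollup(text):
    """Parse smaps_rollup content into {field: kilobytes}.

    The first line is an address-range header and is skipped naturally,
    because it does not have the "Key: <number> kB" shape.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def read_rollup(pid):
    """Return parsed smaps_rollup for pid, or None if the file is missing."""
    path = f"/proc/{pid}/smaps_rollup"
    if not os.path.exists(path):
        return None  # older kernel or non-Linux; caller may fall back to smaps
    with open(path) as fd:
        return parse_smaps_rollup(fd.read())


sample = """55a5e0c00000-7ffd8a5fe000 ---p 00000000 00:00 0    [rollup]
Rss:                4340 kB
Pss:                1234 kB
Swap:                  0 kB
"""
print(parse_smaps_rollup(sample))  # {'Rss': 4340, 'Pss': 1234, 'Swap': 0}
```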

Enhancements

  • Fix use of input function in unregister-wmstats script (Alan Malta Rodrigues) #11688

WMCore 2.2.4 production central services release

04 Oct 08:26

This release supports two new workload attributes, physics task type and campaign name, which characterize jobs on the worker nodes via condor job classads.
In addition, new performance metrics called WMTiming and WMCMSSWSubprocess have been added to the job report file and propagated all the way to WMArchive; all of the CMSSW performance metrics are now also fetched from the Framework Job Report and published to WMArchive.
It also includes a few bug fixes and the usual code enhancements.

Release date: 4 Oct 2023.
Changes since release: 2.2.2.

Central services

Software stack

  • Update requirements for HTCondor 10.2.3 (Alan Malta Rodrigues) #11691

Features and/or feature changes

  • Add characterization and propagation of task type based on cmsDriver step arguments (Kenyi Hurtado Anampa) #11680
  • Add campaign name attribute to base WMTask object and propagate it to the job level as a classad (Kenyi Hurtado Anampa) #11710 #11760
  • Add campaign names support for stepchain workflows (Kenyi Hurtado Anampa) #11738
  • Adopt wmcore_pileup Rucio account in workqueue (Alan Malta Rodrigues) #11670

Bug Fixes

  • Update McM client (Geovanny Gonzalez-Rodriguez) #11672

Enhancements

  • Retry svn checkout in the CD pipeline up to 5 times (Alan Malta Rodrigues) #11752
  • Changes in src/Utils and test/Utils_t to remove py2 compatibilities (anpicci) #11618

WMAgent

Features and/or feature changes

  • Add CMSSW metrics to FJR (Valentin Kuznetsov) #11663
  • Add CMSSW performance metrics to WMArchive document (Valentin Kuznetsov) #11696
  • Add additional variables to WMAgent.secrets needed for the docker container initialization process. (Todor Ivanov) #11717
  • Adds WMCMSSWSubprocess metrics to FJR document (Valentin Kuznetsov) #11716
  • Add WMCMSSWSubprocess and WMTiming metrics to WMArchive document (Valentin Kuznetsov) #11692
  • Provide in the FWJR the CPU and wallclock time for CMSSW subprocesses (Valentin Kuznetsov) #11665
  • Provide timestamps metrics about WM job (Valentin Kuznetsov) #11656 #11726
  • Change default rucio pileup account (khurtado) #11673

Bug Fixes

  • Parse /proc//smaps_rollup if present && Reduce string concatenation operations (Todor Ivanov) #11676

Enhancements

  • Fix use of input function in unregister-wmstats script (Alan Malta Rodrigues) #11688