Training: Data flow/acquisition in WM #11846

Closed
amaltaro opened this issue Jan 9, 2024 · 8 comments

@amaltaro
Contributor

amaltaro commented Jan 9, 2024

Impact of the new feature
WM in general

Is your feature request related to a problem? Please describe.
Sharing knowledge and empowering WM/Ops to understand and debug the system. Sub-task of #11837

Describe the solution you'd like
The topic of this ticket is: Data flow/acquisition in WM. This covers the workflow states, the sequence of steps/services that deal with them, and an in-depth look at how workflows are acquired within WM.

For the expected solution, please refer to the meta-issue for further details (in short: documentation + recorded presentation).

Describe alternatives you've considered
None

Additional context
None

@vkuznet
Contributor

vkuznet commented Feb 5, 2024

Alan, can you provide more details on how you envision this training being done? Am I correct that it relates to this document: https://gitlab.cern.ch/dmwm/wmcore-docs/-/blob/master/docs/wmcore/Request-Status.md?ref_type=heads, or do you see a different data flow in WM?

@amaltaro
Contributor Author

amaltaro commented Feb 5, 2024

That link explains the state transitions, so it will be useful.
However, when it comes to data acquisition in the system, this is likely a better resource: https://cms-wmcore.docs.cern.ch/wmcore/workflow-for-job-creation-and-submission/

@amaltaro
Contributor Author

amaltaro commented Feb 5, 2024

This WorkQueueManager documentation is useful as well: https://cms-wmcore.docs.cern.ch/wmcore/WorkQueueManager/

@vkuznet
Contributor

vkuznet commented Feb 6, 2024

I reviewed the docs you pointed out and I find that they require (at least for me) the following clarifications:

  • We should start with who places a request for data processing, and how
  • Then, we must define each system (ReqMgr2, GlobalQueue, LocalQueue, WMBS, JobSubmitter Cache and GlideIn (Global pool)), its presence in the infrastructure and its scope.
  • Then we should define the WMAgent components and their role in processing workflows
  • Finally, present an example of placing a request and monitoring its progress through the system

Please let me know your thoughts on how to proceed with this training issue and how you envision it.

@amaltaro
Contributor Author

amaltaro commented Feb 6, 2024

Valentin, I think you captured it well. I am concerned about the last 2 bullets though and I think those should be discussed in their own training session.

We need to keep a narrow scope for this training, otherwise it will become too hard to prepare and will probably not provide the best training experience either.

@vkuznet
Contributor

vkuznet commented Feb 7, 2024

Here is the initial PR with a first draft for this training: https://gitlab.cern.ch/dmwm/wmcore-docs/-/merge_requests/5

@amaltaro amaltaro moved this from Todo to In Progress in WMCore quarterly developments Feb 19, 2024
@amaltaro
Contributor Author

Valentin, as we just discussed over Zoom, these are points that would be great to touch on in this training. It assumes basic prior knowledge of WM (which can likely be found in recent recordings in Indico).

  1. (optional) Who places a request for data processing, and how
  2. Conditions to have the workflow transitioning from assigned to staging (MSTransferor + MSPileup + Campaign + WF description)
  3. Conditions to have the workflow transitioning from staging to staged (MSMonitor + Campaign)
  4. Workflows go into acquired without really any conditions
  5. Conditions to have this workflow (workqueue elements) acquired by any given agent (LQ)
    *** Starting here, debugging becomes specific to ONE agent
  6. Conditions to have this workflow (LQ workqueue elements) acquired by the agent (WMBS)
  7. JobCreator will create these jobs, no special conditions
  8. Conditions to have the freshly created jobs enter the JobSubmitter queue
  9. Conditions to have jobs submitted to condor by JobSubmitter
  10. Conditions to have condor jobs move into the running state
  11. Conditions for getting a workflow into completed status
    *** For running and (potentially) completed workflows
  12. Conditions for getting output data into Rucio (dbsbuffer tables)
  13. Conditions for getting output data into DBS (dbsbuffer tables)
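
When debugging a stuck workflow, the list above can be read as an ordered series of checkpoints: find the last one the workflow passed, then look at the next. A minimal Python sketch of that reading follows. It is purely illustrative: `Checkpoint`, `CHECKLIST`, `next_checkpoint` and `is_agent_specific` are hypothetical names, not WMCore code, and the "involved" column only repeats the services and tables named in the list.

```python
from typing import List, NamedTuple, Optional


class Checkpoint(NamedTuple):
    """One item of the checklist (hypothetical structure, not a WMCore class)."""
    step: int       # item number in the list above
    what: str       # what has to happen at this point
    involved: str   # services/tables the list names for this item


# Items 2-13 of the list, in order. Item 1 is optional and not a checkpoint.
CHECKLIST: List[Checkpoint] = [
    Checkpoint(2, "workflow moves from assigned to staging",
               "MSTransferor, MSPileup, Campaign, WF description"),
    Checkpoint(3, "workflow moves from staging to staged", "MSMonitor, Campaign"),
    Checkpoint(4, "workflow goes into acquired", "no conditions"),
    Checkpoint(5, "workqueue elements acquired by an agent", "LQ"),
    Checkpoint(6, "LQ workqueue elements acquired by the agent", "WMBS"),
    Checkpoint(7, "jobs created", "JobCreator"),
    Checkpoint(8, "jobs enter the JobSubmitter queue", "JobSubmitter"),
    Checkpoint(9, "jobs submitted to condor", "JobSubmitter"),
    Checkpoint(10, "condor jobs reach running state", "condor"),
    Checkpoint(11, "workflow reaches completed status", "not stated in the list"),
    Checkpoint(12, "output data in Rucio", "dbsbuffer tables"),
    Checkpoint(13, "output data in DBS", "dbsbuffer tables"),
]

# Per the first "***" note: from item 6 on, debugging is specific to ONE agent.
FIRST_AGENT_SPECIFIC_STEP = 6


def next_checkpoint(last_passed: int) -> Optional[Checkpoint]:
    """Return the first checkpoint after the last one known to have passed."""
    for checkpoint in CHECKLIST:
        if checkpoint.step > last_passed:
            return checkpoint
    return None  # every checkpoint in the list has been passed


def is_agent_specific(checkpoint: Checkpoint) -> bool:
    """True if investigating this checkpoint means looking at a single agent."""
    return checkpoint.step >= FIRST_AGENT_SPECIFIC_STEP


if __name__ == "__main__":
    # Example: the workflow is known to have reached "staged" (item 3).
    pending = next_checkpoint(3)
    if pending is not None:
        scope = "one agent" if is_agent_specific(pending) else "central services"
        print(f"Next to check: item {pending.step}, {pending.what} "
              f"({pending.involved}; scope: {scope})")
```

The sketch only encodes the order and the agent boundary; the actual conditions behind each item are what the training is meant to cover.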

@amaltaro
Contributor Author

The documentation and training session took place yesterday, Apr 9. The Indico page with all the references and the recording can be found here. I am closing this issue out.

@github-project-automation github-project-automation bot moved this from In Progress to Done in WMCore quarterly developments Apr 10, 2024