-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GPU support to the StepChain spec #11588
Conversation
Jenkins results:
|
Even though it's well covered by unit tests, I still want to run some real tests with this patch in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @amaltaro
I left only one comment inline. Please take a look.
currentCmsswStepHelper.setNumberOfCores(multicore, eventStreams) | ||
|
||
# GPU settings | ||
gpuRequired = self.requiresGPU | ||
gpuParams = json.loads(taskConf.get('GPUParams', None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dangerous. It may raise the following exceptions:
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
- if theGPUParams
is missingJSONDecodeError: Expecting value: line 1 column 1 (char 0)
- if the keyGPUParams
is present but is an empty or non json formatted string
If we have those cases covered in some earlier checks that's ok, but I'd stay on the safe side and would do something like:
if taskConf.get('GPUParams', None):
gpuParams = json.loads(taskConf['GPUParams'])
else:
gpuParams = json.loads(self.gPUParams)
which would also cover the check from two lines bellow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch Todor! Let me set it to the json encoded representation of None, which is 'null'
.
Jenkins results:
|
gpuParams = json.loads(taskConf.get('GPUParams', 'null')) | ||
if taskConf.get('RequiresGPU', None): | ||
gpuRequired = taskConf['RequiresGPU'] | ||
if "GPUParams" not in taskConf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amaltaro,
But in this line here, you may still hit exactly the same set of exceptions, because the 'GPUParams'
key may be present in taskConf
but have an empty string as a value. Maybe we rely on some previous validation for not having empty or misconfigured values for those parameters... I think the safer way would be to have all checks related to the gpuParams
enclosed in a single if/then/else
block. But anyway, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be covered by this attribute definition:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1228
which already does the data type validation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything seems fine to me
fix json.loads Get helper function for CMSSW/step gpu parameters
Jenkins results:
|
This has been tested and it is functional. Meanwhile, new discussions revealed new requirements at the job matchmaking, and it was also confirmed how to deal with different GPU requirements in different steps of the same job (StepChain). |
Fixes #10401
Status
ready
Description
With this PR we will be able to execute StepChain GPU workflows. A few important remarks about this PR are:
Is it backward compatible (if not, which system it affects?)
NO (new feature)
Related PRs
Definition of parameters and supported values are provided in this very first GH issue: #10388
External dependencies / deployment changes
Even though it is a Request Manager change, I suspect that WMAgent will need to have the relevant WMTask/WMStep/CMSSW changes in.