-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SiteList actions to transient statuses in ReqMgr2 #12077
Add SiteList actions to transient statuses in ReqMgr2 #12077
Conversation
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov , the current changes are solely giving permissions to update those 2 attributes in the relevant workflow statuses.
However, we still have to deal with the "setter" functions, such that we can properly update the WMWorkload object.
This is the entry port for updating a workflow configuration (without any state changes):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/Service/Request.py#L403
then it would have to call the setter functions defined in this module, e.g.:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/WMWorkload.py#L676
thanks @amaltaro This helps. I was still developing and on the stage of testing the current minimal change. (Faced some bumps on the road with my long forgotten central services environment though, but this was another story). |
@amaltaro while reading and investigating the code surrounding this change here, I tried to understand this comment in the code:
And it seems to be related to this PR and this particular comment of yours: But the main reason imposing the need of this 'single workflow parameter change' seems to be hidden in this PR (according to your own words in the comment above): It seems now is the time for this to be revisited. Do you remember what exactly were the requirements and why couldn't it be solved back then? And if those are still a factor? |
Todor, in addition to the discussion we had today - and now that I properly read the pointers you provided above - I wanted to say that there is a different payload between ReqMgr2 interaction through script vs web UI.
This is probably not relevant, but just so you know what payload the server can receive. Additionally, it would be ideal to be able to say whether a parameter actually changed or not. In other words, we do not want to update Site lists unless there is really an update to the content. |
Jenkins results:
|
Jenkins results:
|
Hi @amaltaro
|
And here is the reason for the above error. From the [1]
|
Jenkins results:
|
And here (from my last commit f82b8cc) are both the cause and the fix:
Unfortunately I noticed, if the request properties are changed through the web interface, we cannot catch the cases when there is no change of the parameter value from the call compared to what is already in couchDB, and so ReqMgr always goes through the full set of modification calls, rather than only through those relevant to the actually changed parameters. This behaviour is not observed when I use an interactive instance of |
@todor-ivanov I think it is fine to not touch the web UI. The site lists are usually large and the vast majority interactions are performed through scripts. Changing site lists via Web UI can be left for a future development IMO. |
hi @amaltaro :
I'll need examples. BTW the WebUI works just fine - it executes correctly. And the WebUI is actually not a mandatory part, but I look at it rather like a sign that the sequence in which I call the validation routines/methods could be wrong. It would be interesting to understand what is going on before we close the issue. And those scripts, which you mention, I suspect they also use REST calls similar to the web UI, but I'll need to know the entry points they use in order to confirm things work just fine. So far this is what I use for these interactive investigations and tests: https://github.com/todor-ivanov/WMCoreAux/blob/master/bin/WMCStandalone/init.py |
For what concerns getting/creating/updating a request, it is always the same AFAIR, the main difference between Web UI and someone's script can be that WebUI always fetches the original document. In other words, it could be that user A makes a change to a request via script by simply providing the RequestName and SiteWhitelist key/value pairs. While user B makes a change to a request via Web UI, but the whole dictionary is sent back to the backend service, even though the user only wanted to update SiteWhitelist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, please find some comments/notes along the code.
# The first time it was called was during the validation of the REST call. | ||
if 'RequestName' not in reqArgsDiff: | ||
reqArgsDiff['RequestName'] = workload.name() | ||
_, reqArgsDiff = validate_request_update_args(reqArgsDiff, self.config, self.reqmgr_db_service, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed today, I understand this extra validation is going away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @amaltaro , I checked twice and it is actually all good. Upon properly patching my cluster I can see it works as expected. So my concerns about having this extra check here were at end irrelevant. Here are the reqmgr
logs from my test cluster while using the web interface for two consecutive attempts to change the same set of request parameters. As can be seen clearly, on the second attempt nothing was about to be changed. And this check happens at this exact piece of code, while if I remove this section, then we experience the ill-behavior of changing all request parameters upon the call.
[06/Sep/2024:12:12:12] reqmgr2-88b98c6b5-cxhkq 127.0.0.1 "GET /reqmgr2/data/info HTTP/1.1" 200 OK [data: 296 in 662 out 50211 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
[06/Sep/2024:12:12:17] Updating request "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" with these user-provided args: {'RequestPriority': 3, 'SiteWhitelist': ['T2_CERN_CH', 'T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL']}
[06/Sep/2024:12:12:17] Updated priority of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" to: 3
[06/Sep/2024:12:12:17] Updated SiteWhitelist of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572", with: ['T2_CERN_CH', 'T1_US_FNAL']
[06/Sep/2024:12:12:17] Updated SiteBlacklist of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572", with: ['T2_CERN_CH', 'T1_US_FNAL']
[06/Sep/2024:12:12:17] reqmgr2-88b98c6b5-cxhkq 188.185.19.4:37334 "PUT /reqmgr2/data/request/tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572 HTTP/1.1" 200 OK [data: 2278 in 108 out 446906 us ] [auth: ok "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=tivanov/CN=726807/CN=Todor Trendafilov Ivanov" "" ] [ref: "https://cmsweb-test1.cern.ch" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" ]
[06/Sep/2024:12:12:43] reqmgr2-88b98c6b5-cxhkq 127.0.0.1 "GET /reqmgr2/data/info HTTP/1.1" 200 OK [data: 296 in 662 out 65762 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
[06/Sep/2024:12:12:44] Updating request "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" with these user-provided args: {'RequestPriority': 3, 'SiteWhitelist': ['T2_CERN_CH', 'T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL']}
[06/Sep/2024:12:12:44] Nothing to be changed at this stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it and tested without ti still works .. but as I mentioned in our chat yesterday.... the redundant parameters checks won't work properly if one does not go through the whole convoluted REST stack for WMCore.
reqArgsDiff['RequestName'] = workload.name() | ||
_, reqArgsDiff = validate_request_update_args(reqArgsDiff, self.config, self.reqmgr_db_service, None) | ||
|
||
def _reduceReport(report): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood this function correctly, it is supposed to parse a list of dictionaries with outcome of CouchDB operations and decide whether the request was successful or not.
I don't think we need it, but if I am wrong, then I would suggest moving it to src/python/Utils and creating proper unit tests for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
validate_request_priority(reqArgsDiff) | ||
# must update three places: GQ elements, workload_cache and workload spec | ||
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority']) | ||
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be calling updateRequestProperty
multiple times for each different argument, as that will generate separated HTTP requests and be source for failures and inconsistencies.
Having a single update call keeps it simple as well and does not require the reduce custom function that you implemented above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering updating it once per every parameter instead..... with a call of the form:
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), {reqArg: reqArgsDiff[reqArg]}, dn))
Such that we go progressively with those workflow changes and update as much as we can at couch before we give up.... but anyway we can do it in a single go as well. I'll change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and actually I am not sure if when updatgin request statistics we need to call: reqmgr_db_service.updateRequestProperty
, in the previous implementation this was skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about this:
Having a single update call keeps it simple as well and does not require the reduce custom function that you implemented above
We would actually still need it, because we are aggregating reports from two different calls to couchdb:
reqmgr_db_service.updateRequestStats
reqmgr_db_service.updateRequestProperty
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn)) | ||
workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"]) | ||
cherrypy.log('Updated SiteWhitelist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteWhitelist'])) | ||
elif reqArg == "SiteBlacklist": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the setter functions that you are calling are correct, note that this if-elif
will result in a single if block getting executed. Whereas an user can provide multiple arguments to be updated in the same HTTP call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, you are inside a for
cycle here. Every check you do at this stage is always a single possible match during one iteration of the cycle. So it actually spares you extra/redundant steps, because the moment the parameter matches a check, we take the relevant action and we get out of the elif
sequence to continue with the next iteration of checks for the next request parameter from the for
cycle. The other possibility is to use multiple independent if
statements, which would have exactly the same effect, but at the cost of iterating through all checks every time, regardless if we have found a match or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I overlooked the fact that it was inside a for loop.
I see you have separated the call to self.reqmgr_db_service
, which can be either for updateRequestStats
or for updateRequestProperty
. Those are exclusive, so there is no need to deal with report
as a list and provide the extra complexity with the reduceReport function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are exclusive, so there is no need to deal with report as a list
I did not get this comment. We do need to accumulate the reports in a list exactly because of the calls to different methods from the reqmgr_db_service
. All of them use the same upstream mechanisms and functionalities from couchdb, so the set of report values from both types of calls are similar, ergo applying identical criteria for checking the reports from all of them is feasible. And the fact that we can get a request for changing multiple request parameters at once, leads to the need of those reports being accumulated in a single list and later crosschecked if any of the couchdb calls failed. So to me this is the logical and sensible approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, I followed your code once again in a clean view (without GH diff):
https://github.com/dmwm/WMCore/blob/0da38322dfb816bd00b0df079cf24d58a9060966/src/python/WMCore/ReqMgr/Service/Request.py
and I do not see multiple requests being consumed in any moment. It is always returned as a single call report! So please remove the call to reduceReport() in (currently) L455.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well ... before we start this whole process of changing everything back and forth multiple times ... the report results were accumulated while iterating through the request parameters. They have never been intended to be accumulated through iteration over multiple request - because it simply never happens. But none of this matters any more, because as I sent you in a private chat this whole for
cycle here I removed in my next PR, where I make this whole set of calls to the workload
and global workque
a single action. Which is actually the main challenge of the next issue to fight in this line of coding. The whole thing is reduced to just two lines like this:
https://github.com/dmwm/WMCore/pull/12096/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794R428-R462
# reqArgsNothandled = []
# for reqArg in reqArgsDiff.copy():
# if reqArg == 'RequestPriority':
# validate_request_priority(reqArgsDiff)
# # must update three places: GQ elements, workload_cache and workload spec
# self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority'])
# workload.setPriority(reqArgsDiff['RequestPriority'])
# cherrypy.log('Updated priority of "{}" to: {}'.format(workload.name(), reqArgsDiff['RequestPriority']))
# elif reqArg == "SiteWhitelist":
# workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"])
# cherrypy.log('Updated SiteWhitelist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteWhitelist']))
# elif reqArg == "SiteBlacklist":
# workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"])
# cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist']))
# else:
# reqArgsNothandled.append(reqArg)
# cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg)
# TODO: map setters to key names:
# Creating a setter method map - directly in the workload object.
# Update all workload parameters based on the full reqArgsDiff dictionary
workload.updateWorkloadArgs(reqArgsDiff)
# Commit the changes of the current workload object to the database:
workload.saveCouchUrl(workload.specUrl())
# Commit all Global WorkQueue changes per workflow in a single go:
# self.gq_service.updateElementsByWorkflow(workload.name(), reqArgsDiff)
# if reqArgsNothandled:
# msg = "There were unhandled arguments left for no-status update: %s" % reqArgsNothandled
# raise InvalidSpecParameterValue(msg)
P.S. the commented lines I left just for now to point the code evolution. they will all be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I also rearranged the code in this PR such that nothing is persisted in the Database in case of unsuported arguments sent by the user, so that we can merge this and move on.
workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"]) | ||
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist'])) | ||
else: | ||
reqArgsNothandled.append(reqArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I followed this logic properly, here you are simply gathering the list of arguments received by the server but not acted on. For debugging purposes is Okay, but I would say it's unnecessary to push it upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I did not put this only for debugging purposes. It is a completely valid use case. If we get a set of request_arguments
sent here including a request parameter which we do not handle, we need to know once we get out of the cycle and properly log this, and eventually signal to the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comment below as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Please fix the typo in the message below then "Uunhandled".
Actually, we might even delete that line and keep only the final one reporting all unhandled/unexpected parameters.
reqArgsNothandled.append(reqArg) | ||
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg) | ||
|
||
if reqArgsNothandled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be removed. In practice, we would be potentially persisting changes in the workload object and later on failing the request back to the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to disagree... This would be unneeded only if we decide NOT to act on finding a request parameter change which we do not support and silently accept the user's call and never giving a feedback on what we have dropped. Of course this is one way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you raise this exception, it will get mapped to a http status code >=400, which is error.
So we cannot execute some actions in the workload, and in the end report to the end user that his http request failed. It is inconsistent and misleading, at the very least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, give me your suggestion on this. How would you want to address the situation. To me it is all good. just let me know what is your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is not to change this behavior with this change. In other words, if we were not failing an HTTP call that provided us with extra arguments, I would be in favor of keeping that same behavior for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we are failing it, that's the case. look at the current code:
WMCore/src/python/WMCore/ReqMgr/Service/Request.py
Lines 422 to 424 in 3af3dad
else: | |
msg = "There are invalid arguments for no-status update: %s" % request_args | |
raise InvalidSpecParameterValue(msg) |
The only difference is that we currently do not go and iterate on several parameters, simply because so far we already consider only tow parameters as changeable (which as you said are exclusive) and it is done on a single pass. From now on we will have to fail (raise the exception) again but we simply need to accumulate all non-handled parameters provided by the user and properly report them at the end. I was finding it already quite uncomfortable to have a misleading message with this exception and not giving any details about what is wrong with those arguments - which is redundant or which is more or missing etc .... And exactly because of this I decided ok - lets keep all non handled parameters in a single list during the cycle and report them back to the user(or some script used for the action) at the end so he is pointed to his exact mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now I see it can have value in knowing which parameters from the user were dropped, so it's Okay to write a record of those in the logs.
However, this behavior is different than the behavior currently implemented in master. In current master, we only fail the HTTP call if the user didn't provide any of the expected parameters (RequestPriority or stats). While with this PR, it fails for any unsupported parameter provided by the end user (even worse, which could be after persisting changes for supported parameters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro , this is addressed in my next PR. See my comment above where I point to the lines of change in my next PR. We will again warn the user for the unsupported parameters requested, but we will do it as early as the workload update in memory and we will never persist anything in the database, because it will happen not here, but rather where it need be - in the code for workload update.
else: | ||
reqArgsDiffKeys.append(key) | ||
|
||
# NOTE: We need to Explicitly add RequestStatus to reqArgsDiff, since it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can assume that RequestStatus
parameter can be changed at any moment against any current status.
Upon requesting a status change, that is anyways validated in a different place with allowed transition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that different place again loops back to calling this function. Which in the past was not going to ask for allowed actions per status.I did not like to have this workaround in the validation function either. But given that this logic applies not only for RequestStatus
but also to all stat parameters which by themselves are not a valid user defined request parameter I had to add it here. Of course the better way would be to move it up to the ALOWED_ACTIONS_FOR_STATUS
list or even better the whole logic to be moved as an addition to the get modifiable_properties
function and have the ALLOWED_ACTIONS_ALL_STATUS
list properly defined in the RequestStatus.py
file as the rest. Honestly this is my preference if you agree with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
validate_request_priority(request_args) | ||
if 'RequestPriority' in reqArgsDiff: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already performed inside the function, so we can keep the previous and shorter implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you refer to the _handleNoStatusUpdate
function, then no, there we check to find out which is the proper set of actions to be taken, here we validate the request priority value sent IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to validate_request_priority
, in the line below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we need to increase the stack with an extra function call, when we already know at this very stage that a RequestPriority
validation won't be needed. Anyway .. I'll remove it and will let us call the validate_request_priority
constantly
fe03f89
to
a765796
Compare
@amaltaro I have addressed all your comments and the final tests were sucessful. Please take another look |
Jenkins results:
|
a765796
to
e940a00
Compare
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
hi @amaltaro the unit tests failures seem to be not related to the current change. Same for the highlighted pylint notes. Could you please take another look at this code change? |
hi @amaltaro Could we finish this review here. I'd like to switch gears to #12038 and I strongly depend on this change, because the Last time all my tests were successful, so I'd consider this code as good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, I don't think I cover 100% of code changes in my review, but I think there is enough information/material for you to work on.
In addition, please refer to the Jenkins report for a few things that you are expected to resolve in this PR.
|
||
validate_request_priority(request_args) | ||
if 'RequestPriority' in reqArgsDiff: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
reqArgsNothandled.append(reqArg) | ||
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg) | ||
|
||
if reqArgsNothandled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is not to change this behavior with this change. In other words, if we were not failing an HTTP call that provided us with extra arguments, I would be in favor of keeping that same behavior for now.
workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"]) | ||
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist'])) | ||
else: | ||
reqArgsNothandled.append(reqArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comment below as well.
workload.setPriority(reqArgsDiff['RequestPriority']) | ||
workload.saveCouchUrl(workload.specUrl()) | ||
cherrypy.log('Updated priority of "{}" to: {}'.format(workload.name(), reqArgsDiff['RequestPriority'])) | ||
elif workqueue_stat_validation(reqArgsDiff): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For workqueue stats, you are unnecessarily validating the whole user input arguments while you iterate over each of the input arguments.
As I said in one of my previous comments, this action is exclusive to the other updates. So I would remove this check from inside this loop for, move it above in this method and check/act/return before proceeding with the remaining logic of the other arguments.
In other words, I suggest something like:
if workqueue_stat_validation(request_args):
report = self.reqmgr_db_service.updateRequestStats(workload.name(), request_args)
cherrypy.log('Updated workqueue statistics of "{}", with: {}'.format(workload.name(), request_args))
return report
...
report = []
reqArgsDiff = deepcopy(request_args)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
raise InvalidSpecParameterValue(msg) | ||
|
||
return report | ||
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if you rollback your change above - about raise an exception for unsupported parameters - that you also need to change reqArgsDiff
here, otherwise you would be updating unsupported properties as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I do not want to roll it back. I know I am doing the correct and user friendly thing here. And I am doing it on purpose, because we safe what - tow lines of code here and now ... and then for the rest of the lifetime of the universe we give an incomplete and misleading debugging information, and who ever is reading the logs will always need background information in order to make proper sense of them and link his won action and the actual mechanism/reason why his call failed.
# must update three places: GQ elements, workload_cache and workload spec | ||
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority']) | ||
workload.setPriority(reqArgsDiff['RequestPriority']) | ||
workload.saveCouchUrl(workload.specUrl()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this call:
workload.saveCouchUrl(workload.specUrl())
I think it also needs to be executed for site list changes, such that those information get persisted in the workload object. Said that, I would suggest moving it outside of this loop, such that we only call it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, as we also discussed it earlier during the chat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/python/Utils/Utilities.py
Outdated
case when all the entries in the list are identical or | ||
False in the case when any of them deviates from the expected value. | ||
""" | ||
if reduce(lambda x, y: x == y == expectedValue and expectedValue, reportList, expectedValue): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already expressed in my previous comments, I don't think we actually need this function. In addition, this expression looks cumbersome and it is hard to say what exactly it does. Or it could be just me.
Nonetheless, let us keep discussing/reviewing this PR before taking any action here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does a simple thing:
- Takes a list of values (
reportList
) and applies a two parameterized lambda expression to each of them sequentially aggregating the previous result as an argument to the next step - IIUC this should be the standardreduce
behavior: https://docs.python.org/3/library/functools.html#functools.reduce, but more or less this is the 'key' expression for understanding it:((((1+2)+3)+4)+5)
it tells it all. - The lambda expression on its own does the following: It simply compares if the tow arguments which it gets during the call are equal between each other and if they are equal if they match the expected value on top of that and if all of this is true returns the expected value otherwice the result of the comparission (which should be simply
False
- For initialization the sequence of steps in the
reduce
call an initial value is provided, which is the last argument -expectedValue
, such that all comparisons are done against the expected value - even starting from the very first one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this function is agnostic to the list provided... so it is applicable in many other cases as the so provided unittests shows it. I called it reduceReport
only because of the context under which I initially suggested it before we moved it here. If I need to be honest the best name for this function should be reduceList
or something in that line.
|
||
# NOTE: We need to explicitly add all stat keys during validation | ||
# They are needed for the workqueue_stat_validation() calls | ||
ALLOWED_STAT_KEYS = ['total_jobs', 'input_lumis', 'input_events', 'input_num_files'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of proliferating these stat parameters, please create it under:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/DataStructs/RequestStatus.py
and them import it here.
It would be great if you could update this line as well:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/Utils/Validation.py#L26
such that we stop using this hardcoded variable, and use it from the RequestStatus module variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, if you do the separation that I suggested in Request.py, you won't need these extra checks over here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing the ALLOWED_STAT_KEYS
is good. I'll change it, but we will still need the extra line allowedKeys.extend(ALLOWED_STAT_KEYS)
the other option would be to add it to the values returned from get_modifiable_properties(status)
irectly but this I am afraid may change behaviour for others who depend on this. Ergo this is why I used this mechanism to additionally extend the list here. rather than in RequestSatatus
Jenkins results:
|
938f2f4
to
125f4f6
Compare
Jenkins results:
|
Jenkins results:
|
19e1e3b
to
0da3832
Compare
hi @amaltaro I have addressed all of your comments. Please take another look. |
Jenkins results:
|
Jenkins results:
|
validate_request_priority(reqArgsDiff) | ||
# must update three places: GQ elements, workload_cache and workload spec | ||
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority']) | ||
workload.setPriority(reqArgsDiff['RequestPriority']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro let me ask something here about this line of code (not related to the current review process, though) ....
While working on #12038, I tried to follow the sequence of steps taken while updating a request priority and figure out if we could update both the workqueue elements and the workload priority from a single place - like the wmspec
for example (i.e. the workload itself). And then I realized this step is somehow redundant, because workload priority update is a step already performed intrinsically during the call of Global Queue priority update. See:
WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py
Lines 250 to 256 in 803b7d5
# Update the spec, if it exists | |
if self.db.documentExists(wf): | |
wmspec = WMWorkloadHelper() | |
wmspec.load(self.hostWithAuth + "/%s/%s/spec" % (self.db.name, wf)) | |
wmspec.setPriority(priority) | |
dummy_values = {'name': wmspec.name()} | |
wmspec.saveCouch(self.hostWithAuth, self.db.name, dummy_values) |
Is there any particular reason why should we repeat the same step twice in a sequence .... or did we just happened to be living with this redundant step for sometime now and simply not noticing.... Or maybe I am wrong and I cannot grasp some minor detail, which makes a difference between those two calls to update the wmspec
at couchDB:
- the one coming from the workqueue service priority update and
- the other from the
request._handleNoStatutsUpdate
methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great question! At some point in the last ~17 years, it was decided that workqueue would actually have its own copy of the workload spec (as you can see it uses self.db in here as well, which points to workqueue), while the "first" workload spec is created in reqmgr2 during the workflow creation in the system (I guess in reqmgr_workload_cache db). So yes, we need to keep the current logic as those are different updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor not like!!!
Todor think many copies of objects many sources of concurrency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov I still have some follow up to be done here before we can converge with these changes. Please find my comments along the code.
@@ -184,7 +196,10 @@ def get_modifiable_properties(status=None): | |||
TODO: Currently gets the result from hardcoded list. change to get from configuration or db | |||
""" | |||
if status: | |||
return ALLOWED_ACTIONS_FOR_STATUS.get(status, 'all_attributes') | |||
allowedKeys = ALLOWED_ACTIONS_FOR_STATUS.get(status, 'all_attributes') | |||
if not allowedKeys == 'all_attributes': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not if allowedKeys != 'all_attributes':
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a matter of taste. It makes no difference here. If you refer to the object equality in memory check which is done with is
, this is not the case here. The not
here plays a role only to negate the result of the comparison, while the comparison operator is still ==
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a matter of taste, but code readability. If you can read "different than", why would you write "not equal to"?
|
||
if not reqArgsDiff: | ||
cherrypy.log("Nothing to be changed at this stage") | ||
return reduceReport(report) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why calling reduceReport
on an empty list? Won't it return False
from that function back to the end user (or after the REST layer consumes the False value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, calling reduceReport
on an empty list will return the expected value, because the expected value is set as a default.
In [10]: reduceReport([])
Out[10]: 'OK'
return reduceReport(report) | ||
|
||
reqArgsNothandled = [] | ||
for reqArg in reqArgsDiff.copy(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I failed to see where RequestName
is removed inside this loop. Can you please point me to the line/function doing so?
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn)) | ||
workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"]) | ||
cherrypy.log('Updated SiteWhitelist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteWhitelist'])) | ||
elif reqArg == "SiteBlacklist": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, I followed your code once again in a clean view (without GH diff):
https://github.com/dmwm/WMCore/blob/0da38322dfb816bd00b0df079cf24d58a9060966/src/python/WMCore/ReqMgr/Service/Request.py
and I do not see multiple requests being consumed in any moment. It is always returned as a single call report! So please remove the call to reduceReport() in (currently) L455.
workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"]) | ||
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist'])) | ||
else: | ||
reqArgsNothandled.append(reqArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Please fix the typo in the message below then "Uunhandled".
Actually, we might even delete that line and keep only the final one reporting all unhandled/unexpected parameters.
reqArgsNothandled.append(reqArg) | ||
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg) | ||
|
||
if reqArgsNothandled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now I see it can have value in knowing which parameters from the user were dropped, so it's Okay to write a record of those in the logs.
However, this behavior is different than the behavior currently implemented in master. In current master, we only fail the HTTP call if the user didn't provide any of the expected parameters (RequestPriority or stats). While with this PR, it fails for any unsupported parameter provided by the end user (even worse, which could be after persisting changes for supported parameters).
Add proper checks for allowed request properties changes && Stop reducing request_args only to RequestPriority for noStatusTransition actions Extend allowed arguments including stat_keys and RequestStatus && Call the relevant modifiers from _handleNoStatusUpdate Fix missing RequestName from request_args on mutipple calls of validate_request_update_args Add proper log messages for Sitelists changes Remove redundant validation calls && Update reqdb couch with single action && Move reduceReport to Utils. Typo Remove forgotten commented lines of code Review comments Source files pylint fixes Review comments
Unit tests Unit tests pylint fixes Unit tests - remove tests for reduceReport
09de473
to
901463d
Compare
@amaltaro take another look. And lets merge this and move on. |
Jenkins results:
|
Jenkins results:
|
test this please |
Jenkins results:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov , I think you covered most of my comments in this PR.
As you started working in these 2 PRs in parallel and you are now having issues to track these modifications, we can follow up on anything else in the coming PR.
Fixes #12037
Status
Ready
Description
With the current change we allow SiteLists related actions for statuses:
staging, acquired, running-open
in ReqMgr2Before making a call to central services for changing any of the request parameters, an additional step is executed to
to check which are the allowed parameter modifications for the given status and if the so provided new values fromthe rest call actually differ from the workflow parameters already defined in central couchdb.
With the current change we no longer ignore all the rest of the request arguments provided with the REST call in the cases of a change of the Request priority. See: #8457 (comment)
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
None