dbsmigration server monitoring errors #644

Open
yuyiguo opened this issue Jan 4, 2021 · 6 comments

@yuyiguo
Member

yuyiguo commented Jan 4, 2021

I see many errors like the following in the k8s dbsmigration log files:

INFO:cherrypy.access:[01/Jan/2021:00:57:14] dbs-migrate-5d847786d5-jsx5x 127.0.0.1 "GET /dbs/prod/global/DBSMigrate/ HTTP/1.1" 200 OK [data: 306 in 887 out 14474 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
<string>:4: (ERROR/3) Unexpected indentation.

In the VM log files, the entries look like this:

127.0.0.1 - - [01/Jan/2021:02:20:49] "GET / HTTP/1.1" 200 22 "" "ServerMonitor/2.0"

@vkuznet can you take a look at the monitoring?

@vkuznet
Contributor

vkuznet commented Jan 4, 2021

Yuyi, I doubt this is an issue with our monitoring tool. We apply the same tool/command to all DBS servers, and it seems to me that only the DBS migration server has errors in its log. If you log into your pod and run this command:

cmsweb-ping --url=http://localhost:8257/dbs/prod/global/DBSMigrate/ --authz=/etc/hmac/hmac -verbose 0
# or use verbose 1 option to get more output

you'll see that it returns 200 OK, which is what the monitoring relies on. The log, however, is generated by the DBS cherrypy server, so I think this is an issue with the DBS Migration server rather than with the cmsweb-ping command. It may be an issue with the WMCore REST server, which generates the log entry. To check that, look at the WMCore/src/python/WMCore/WebTools/Root.py code.

@vkuznet
Contributor

vkuznet commented Jan 4, 2021

Yuyi, the actual error seems to come from this line:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WebTools/Root.py#L97

This can be seen by running pylint over this piece of code, e.g.:

pylint /Users/vk/CMS/DMWM/GIT/WMCore/src/python/WMCore/WebTools/Root.py
No config file found, using default configuration
************* Module src.python.WMCore.WebTools.Root
W:192, 0: TODO: make this loads better (fixme)
W:300, 0: TODO: Show a maintenance page with a 503 Service Unavailable header (fixme)
W:321, 0: TODO: remove bits we don't need (fixme)
C: 98, 0: Wrong continued indentation (add 2 spaces).
                        and getattr(request.rfile.rfile, "bytes_read", None)
                        ^ | (bad-continuation)
C: 99, 0: Wrong continued indentation (add 2 spaces).
                        and request.rfile.rfile.bytes_read) or "-"
                        ^ | (bad-continuation)
....

I suggest that you ping @amaltaro to resolve this generic error in WMCore code.

@yuyiguo
Member Author

yuyiguo commented Jan 4, 2021

Thanks @vkuznet !
@amaltaro Can you fix this in the next WMCore release?

@amaltaro
Contributor

amaltaro commented Jan 4, 2021

It looks like this issue only got exposed because this Go client does not provide a Content-Length header (or it's 0), so the request falls into the broken try/except code. The issue seems to have been there for almost 2 years now; it was introduced in dmwm/WMCore#9197.

Yuyi, can you please open a WMCore GH issue and clarify which branch this fix needs to be backported to? I assume you will need a new tag as well, right? Before we actually implement this fix, I'd suggest you test the fix Valentin suggests in one of the k8s pods (or in your VM), then restart DBSMigrate and see whether the problem goes away.

@yuyiguo
Member Author

yuyiguo commented Jan 5, 2021

@amaltaro

I opened ticket dmwm/WMCore#10207. The fix can be as simple as this:

rbytes = (getattr(request.rfile, 'rfile', None) \
                        and getattr(request.rfile.rfile, "bytes_read", None) \
                        and request.rfile.rfile.bytes_read) or "-"
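For anyone testing this in a pod, the chained expression can be exercised outside the server. The following is a minimal sketch, assuming simple stand-in objects in place of the real cherrypy request; the helper name `read_bytes` and the `SimpleNamespace` stand-ins are hypothetical and are not part of WMCore.

```python
from types import SimpleNamespace


def read_bytes(request):
    """Return the number of request-body bytes read, or "-" if unknown.

    Same chained expression as in the snippet above, wrapped in a
    hypothetical helper so it can be called on stand-in objects.
    """
    return (getattr(request.rfile, "rfile", None)
            and getattr(request.rfile.rfile, "bytes_read", None)
            and request.rfile.rfile.bytes_read) or "-"


# Stand-ins for the request object; these are NOT cherrypy classes.
with_counter = SimpleNamespace(
    rfile=SimpleNamespace(rfile=SimpleNamespace(bytes_read=128)))
zero_counter = SimpleNamespace(
    rfile=SimpleNamespace(rfile=SimpleNamespace(bytes_read=0)))
no_counter = SimpleNamespace(rfile=SimpleNamespace(rfile=SimpleNamespace()))
no_inner = SimpleNamespace(rfile=SimpleNamespace())

print(read_bytes(with_counter))  # 128
print(read_bytes(zero_counter))  # "-" because 0 is falsy
print(read_bytes(no_counter))    # "-" because bytes_read is missing
print(read_bytes(no_inner))      # "-" because the inner rfile is missing
```

Note that a counter of 0 is reported as "-", and that the expression still raises AttributeError if the request object has no `rfile` attribute at all.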

@vkuznet
I don't know whether you run the monitoring differently in different pods, or whether something was changed recently. Here is the log message from my pods on cmsweb-test3, which are 53 days old. As you can see, there is no error like the one in the prod pods:
INFO:cherrypy.access:[05/Jan/2021:01:00:05] dbs-migrate-8fc48f77c-5z76m 127.0.0.1 "GET /dbs/prod/global/DBSMigrate/ HTTP/1.1" 200 OK [data: 306 in 317 out 1266 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]

@vkuznet
Contributor

vkuznet commented Jan 5, 2021

Yuyi, the error messages come from the DBS Python server, so they are not related to anything in the monitoring. But to clarify the subject: the monitoring is applied via the livenessProbe declaration in the service yaml file. For instance, in dbs-migrate.yaml you'll find it here:
https://github.com/dmwm/CMSKubernetes/blob/master/kubernetes/cmsweb/services/dbs-migrate.yaml#L64-L71
As you can see, it defines a command that k8s will run against the service end-point. In this case the command is:

cmsweb-ping --url=http://localhost:8257/dbs/prod/global/DBSMigrate/ --authz=/etc/hmac/hmac -verbose 0

This tool simply makes an HTTP GET request to the provided URL. Since all WMCore services (including DBS) require authorization, cmsweb-ping takes the deployed hmac file and uses it to set up fake CMS HTTP headers (which are required by the authorization function in WMCore). That's it! We could just as easily use another tool, e.g. curl, to query your service. Since such a tool only makes an HTTP call, the call is processed by your service, which produces the log entries. The error has nothing to do with monitoring.
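To illustrate how little such a probe does, here is a minimal sketch of an equivalent check written with the Python standard library. It is not the cmsweb-ping implementation: the function names `probe` and `is_alive` are hypothetical, and the hmac-derived CMS headers are deliberately left out, so any headers the service needs would have to be passed in by the caller.

```python
import urllib.error
import urllib.request


def probe(url, headers=None, timeout=5.0):
    """Send one HTTP GET and return the status code, or None if unreachable."""
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, etc.
        return None


def is_alive(status):
    """Treat only 200 as healthy, as described for the monitoring above."""
    return status == 200


# Example call (the authorization headers are an assumption and are omitted):
# status = probe("http://localhost:8257/dbs/prod/global/DBSMigrate/")
# print(status, is_alive(status))
```

Whatever client sends the request, the access-log entry is written by the service that handles it.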

Since the issue is clearly in the Python code, you had better inspect how WMCore was deployed. There are many explanations I can come up with for why the error shows up in one log and not in another, but all of them are Python-related and have nothing to do with the monitoring/HTTP requests we make for monitoring purposes.
