Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build dbs3-client/dbs3-pycurl-client on a python3 stack #10306

Closed
amaltaro opened this issue Feb 23, 2021 · 24 comments
Closed

Build dbs3-client/dbs3-pycurl-client on a python3 stack #10306

amaltaro opened this issue Feb 23, 2021 · 24 comments

Comments

@amaltaro
Copy link
Contributor

amaltaro commented Feb 23, 2021

Impact of the new feature
WMCore services using the dbs3-client (DbsApi client), like global workqueue and WMAgent.

Is your feature request related to a problem? Please describe.
Yes, this is another WMAgent dependency that needs to be sorted before we can run any WMCore service depending on it.
Perhaps a similar issue should be created in the DBS repository as well(?)

Describe the solution you'd like
We need to have a python3 compatible spec for the dbs3-client and its dependency dbs3-pycurl-client. Then we can use the new spec in our wmagentpy3 spec and try to deploy and run that service within a python3 stack.

First alternative would be to create a new spec file for those clients, depending on python3 stack, and see if we can successfully build a package. I don't know how big is dbs3-client, but what if we manage to run it on python3 without any source code changes?

Describe alternatives you've considered
To be discovered once we start working on it.

Additional context
Issue spotted and mentioned here: #10302 (comment)

links to the relevant spec files:
https://github.com/cms-sw/cmsdist/blob/comp_gcc630/dbs3-client.spec
https://github.com/cms-sw/cmsdist/blob/comp_gcc630/dbs3-pycurl-client.spec

Link to the client source:
https://github.com/dmwm/DBS/tree/master/Client

@todor-ivanov
Copy link
Contributor

Just for logging purposes and for keeping everything we'll need for this issue at one place, I am putting a pointer here [1] to a message from Shahzad giving the instructions on how to install a previously compiled single package from the cmsdist repository.

p.s. Thanks @amaltaro and @smuzaffar for the instructions provided.

[1]
cms-sw/cmsdist#6440 (comment)

@todor-ivanov
Copy link
Contributor

todor-ivanov commented Apr 23, 2021

Again just to keep track of the process, here is the PR at cmsdist that introduces the py3 version of the dbs3-clinet packages:
cms-sw/cmsdist#6837

FYI @amaltaro

@todor-ivanov
Copy link
Contributor

Finally I managed to build those two packages py3-dbs3-client and py3-dbs3-pycurl-client, but I needed some fixes in the DBS code. Here is the related PR:

dmwm/DBS#649

@todor-ivanov
Copy link
Contributor

After uploading the so build py3-dbs3-client into my repository at cmsrep I was able to safely deploy it into a testing python3 virtual environment created especially for WMCore and py3-dbs3-client [1]. Currently I am trying to source the init.sh for the package and test the client itself.

FYI @amaltaro

[1]

(WMCore.venv3.dbs3) [tivanov@tivanov-unit02 WMCore.venv3.dbs3]$ ./common/cmspkg -a slc7_amd64_gcc630 install cms+py3-dbs3-client+3.16.0
Reading Package Lists...
Get:1 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 cms+py3-dbs3-client+3.16.0-1-1.slc7_amd64_gcc630.rpm
Building cms+py3-dbs3-client+3.16.0 Dependency Tree...
Get:2 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 cms+py3-dbs3-pycurl-client+3.16.0-1-1.slc7_amd64_gcc630.rpm
Get:3 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+bz2lib+1.0.6-1-1.slc7_amd64_gcc630.rpm
Get:4 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+curl+7.59.0-comp-1-1.slc7_amd64_gcc630.rpm
Get:5 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+db6+6.0.30-1-1.slc7_amd64_gcc630.rpm
Get:6 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+expat+2.0.1-1-1.slc7_amd64_gcc630.rpm
Get:7 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+gcc+6.3.0-1-1.slc7_amd64_gcc630.rpm
Get:8 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+gdbm+1.12-1-1.slc7_amd64_gcc630.rpm
Get:9 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+libffi+3.2.1-1-1.slc7_amd64_gcc630.rpm
Get:10 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+libuuid+2.22.2-1-1.slc7_amd64_gcc630.rpm
Get:11 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+py3-pycurl+7.43.0.3-1-1.slc7_amd64_gcc630.rpm
Get:12 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+py3-setuptools+39.2.0-1-1.slc7_amd64_gcc630.rpm
Get:13 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+python3+3.8.2-comp-1-1.slc7_amd64_gcc630.rpm
Get:14 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+sqlite+3.8.10.2-1-1.slc7_amd64_gcc630.rpm
Get:15 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+xz+5.2.2-1-1.slc7_amd64_gcc630.rpm
Get:16 http://cmsrep.cern.ch cmssw/comp.tivanov/slc7_amd64_gcc630 external+zlib+1.2.8-1-1.slc7_amd64_gcc630.rpm
The following NEW packages will be installed:
  cms+py3-dbs3-client+3.16.0
  cms+py3-dbs3-pycurl-client+3.16.0
  external+bz2lib+1.0.6
  external+curl+7.59.0-comp
  external+db6+6.0.30
  external+expat+2.0.1
  external+gcc+6.3.0
  external+gdbm+1.12
  external+libffi+3.2.1
  external+libuuid+2.22.2
  external+py3-pycurl+7.43.0.3
  external+py3-setuptools+39.2.0
  external+python3+3.8.2-comp
  external+sqlite+3.8.10.2
  external+xz+5.2.2
  external+zlib+1.2.8
0 upgraded, 16 newly installed, 0 removed and 0 not upgraded.
TIME: Downlaod: 10.912024974822998 secs for 16 packages
Continue installation (Y/n): y
Downloaded 141MB of archives.
After unpacking 424MB of additional disk space will be used.
Executing RPM (source /afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/WMCore.venv3.dbs3/share/cms/cmspkg/V00-00-44/rpm_env.sh slc7_amd64_gcc630; rpm -U -v -h -r /afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/WMCore.venv3.dbs3 --prefix /afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/WMCore.venv3.dbs3 --force --ignoreos --ignorearch --oldpackage)...
Preparing...                          ################################# [100%]
Updating / installing...
   1:external+gcc+6.3.0-1-1           ################################# [  6%]
   2:external+expat+2.0.1-1-1         ################################# [ 13%]
   3:external+zlib+1.2.8-1-1          ################################# [ 19%]
   4:external+bz2lib+1.0.6-1-1        ################################# [ 25%]
   5:external+db6+6.0.30-1-1          ################################# [ 31%]
   6:external+gdbm+1.12-1-1           ################################# [ 38%]
   7:external+libffi+3.2.1-1-1        ################################# [ 44%]
   8:external+libuuid+2.22.2-1-1      ################################# [ 50%]
   9:external+sqlite+3.8.10.2-1-1     ################################# [ 56%]
  10:external+xz+5.2.2-1-1            ################################# [ 63%]
  11:external+python3+3.8.2-comp-1-1  ################################# [ 69%]
Can't open /afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/WMCore.venv3.dbs3/slc7_amd64_gcc630/external/python3/3.8.2-comp/lib/pkgconfig/python*.pc: No such file or directory.
  12:external+py3-setuptools+39.2.0-1-################################# [ 75%]
  13:external+curl+7.59.0-comp-1-1    ################################# [ 81%]
  14:external+py3-pycurl+7.43.0.3-1-1 ################################# [ 88%]
  15:cms+py3-dbs3-pycurl-client+3.16.0################################# [ 94%]
  16:cms+py3-dbs3-client+3.16.0-1-1   ################################# [100%]
TIME: Install: 64.60148811340332 secs for 16 packages

@klannon
Copy link

klannon commented Apr 29, 2021

It's good to see this progress. Thanks for the updates!

@todor-ivanov
Copy link
Contributor

todor-ivanov commented Apr 29, 2021

Thanks @klannon

Here is some more result from the test with my latest (made in a hurry) fixes to the DBS code [1], which made it possible to even import the client, but unfortunately the lack of cjson in python3, seems to be not that easy to overcome as we thought during the meeting. For some reason the so expected fallback mechanism was not triggered [2], and I am looking deeper now to understand why. @vkuznet were my expectations wrong due to misinterpretting your words? The possibility for me getting your explanation wrong is far from neglectable 😄

[1]
dmwm/DBS@4ec0d9d

[2]

In [5]: from dbs.apis.dbsClient import DbsApi

In [6]: dbsApi = DbsApi(url = 'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader/')

In [7]: dbsApi.listDataTiers()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-2486c5b8464a> in <module>
----> 1 dbsApi.listDataTiers()

/afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/DBS/Client/src/python/dbs/apis/dbsClient.py in listDataTiers(self, **kwargs)
    898         checkInputParameter(method="listDataTiers", parameters=kwargs.keys(), validParameters=validParameters)
    899 
--> 900         return self.__callServer("datatiers", params=kwargs)
    901 
    902     def listDataTypes(self, **kwargs):

/afs/cern.ch/work/t/tivanov/private/WMCoreDev.d/DBS/Client/src/python/dbs/apis/dbsClient.py in __callServer(self, method, params, data, callmethod, content)
    202         method_func = getattr(self.rest_api, callmethod.lower())
    203 
--> 204         data = cjson.encode(data)
    205 
    206         try:

NameError: name 'cjson' is not defined


@vkuznet
Copy link
Contributor

vkuznet commented Apr 29, 2021

@todor-ivanov , I said that if you'll replace cjson with jsonwrapper then it will work. Have a look at lxplus:/afs/cern.ch/user/v/valya/public/jsonparser where I put jsonwrapper and example code t.py

The jsonwrapper provides a wrapper for different json implementation. On my laptop it loads cjson while on lxplus it loads json, etc.

Now, to use it you should take jsonwrapper as is and replace all cjson.XXXX calls in DBS with jsonwrapper.XXXX where XXXX would be json function like encode or decode or load , etc. This will allow DBS code to be transparent to which json module it will use, in python2 it will use cjson, while in py3 it will use json.

@todor-ivanov
Copy link
Contributor

Thanks @vkuznet!
Looking into the code now!

@todor-ivanov
Copy link
Contributor

todor-ivanov commented Apr 30, 2021

Hi @vkuznet Thank you very much for providing the jsonWrapper. I did few tests with the code you suggested. It may do the job indeed, but for sure the situation is much more complicated than I was expecting, especially if we decide to include the logical alternative to cjson (based on performance metrics) which is ujson [0]. And the problem comes from two facts, one of which you tried to address in your code but the other one is still an open issue, which I already foresee where it will blowup [1]:

  • All those modules they have completely different interfaces [2]
  • All those modules they have completely different set of Exceptions [3].

So I allowed myself to take that piece of code and do some changes. I am quite sure they would not be enough to make it universal, but I hope they may at least try to address the needs of the dbs3 client. I am about to upload it in https://github.com/dmwm/DBS/tree/master/Client/utils . If there is another more centralized place where this code is hosted, please let me know so we avoid code duplication.

[0]
https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/

[1]
https://github.com/dmwm/DBS/blob/14df8bbe8ee8f874fe423399b18afef911fe78c7/Client/src/python/dbs/apis/dbsClient.py#L210

[2]

In [17]: dir(ujson)
Out[17]: 
['__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__version__',
 'decode',
 'dump',
 'dumps',
 'encode',
 'load',
 'loads']

In [18]: dir(cjson)
Out[18]: 
['DecodeError',
 'EncodeError',
 'Error',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__version__',
 'decode',
 'encode']

In [22]: dir(json)
Out[22]: 
['JSONDecoder',
 'JSONEncoder',
 '__all__',
 '__author__',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 '__version__',
 '_default_decoder',
 '_default_encoder',
 'decoder',
 'dump',
 'dumps',
 'encoder',
 'load',
 'loads',
 'scanner']

In [25]: dir(yajl)
Out[25]: 
['Decoder',
 'Encoder',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 'dump',
 'dumps',
 'load',
 'loads']

[3]

In [1]: import cjson
In [2]: cjson.decode(u'{')
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-5-6f9ceb1120ac> in <module>()
----> 1 cjson.decode(u'{')
DecodeError: unterminated object starting at position 0
In [9]: import ujson
In [12]: ujson.decode(u'{')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-c7021aad5bec> in <module>()
----> 1 ujson.decode(u'{')
ValueError: Expected object or value
In [32]: import json
In [33]: json.loads(u'{')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-16ff1080a2c0> in <module>()
----> 1 json.loads(u'{')

/usr/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337             parse_int is None and parse_float is None and
    338             parse_constant is None and object_pairs_hook is None and not kw):
--> 339         return _default_decoder.decode(s)
    340     if cls is None:
    341         cls = JSONDecoder

/usr/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    362 
    363         """
--> 364         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    365         end = _w(s, end).end()
    366         if end != len(s):

/usr/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    378         """
    379         try:
--> 380             obj, end = self.scan_once(s, idx)
    381         except StopIteration:
    382             raise ValueError("No JSON object could be decoded")

ValueError: Expecting object: line 1 column 1 (char 0)

In [30]: import yajl
In [31]: yajl.loads(u'{')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-50c3d7f2906a> in <module>()
----> 1 yajl.loads(u'{')

ValueError: eof was met before the parse could complete

@todor-ivanov
Copy link
Contributor

Here [1] are the few minor changes I suggested to your jsonwrapper @vkuznet. I haven't tested them yet, though. Please let me know what you think.

[1]
dmwm/DBS@a0aeed0

@amaltaro
Copy link
Contributor Author

In my opinion, we should only support cjson and the standard library json. Dario has done quite some benchmarking and support analysis of many python JSON libraries:
https://cms-dmwm-test.docs.cern.ch/py2py3/json/

and AFAIR the conclusion was:

  • in python2, cjson performs much better than standard json, and we better not use json in DBS python2
  • in python3, json has one of the best performances. Given that it comes from the standard library, it's our best option as well.

So, if we follow this path, I'd suggest to keep it as simple and robust as possible. Especially after all these inconsistencies that Todor reported above.

@vkuznet
Copy link
Contributor

vkuznet commented Apr 30, 2021

I would agree with Alan, there is no need in another json implementation if standard library in py3 provide the best one.

@todor-ivanov
Copy link
Contributor

Thanks @amaltaro I am about to simplify the wrapper, as I said in my comment to the PR in the DBS repository. Maybe even get rid of the jsonwrapper at all. I do not doubt Dario's results but one thing/library is kind of outstanding in almost all tests he did: orjson:

https://cms-dmwm-test.docs.cern.ch/py2py3/json/analyse_py2py3.html

And it seems quite better than the standard json implementation. My question is then why do we not try to use this one in py3?

@vkuznet
Copy link
Contributor

vkuznet commented Apr 30, 2021

@todor-ivanov , it is decision which should be made by DMWM and DBS teams. The json handling is very critical in DBS as it affects server performance (CPU and RAM wise) as well as time of HTTP response. There are plenty of requests which can be large, I already shown that it is not uncommon that WMCore sends hundreds of MB of JSON to DBS, and this should be taken into account. But also you need to take a responsibility to support extra dependency. I think jsonwrapper allows to achieve both goals, since it can always fall back to standard library json. But as such it also requires to change all code which deals with json to use jsonwrapper. So, at the end as I wrote it is DMWM and DBS team call.

@amaltaro
Copy link
Contributor Author

All the concerns raised by Valentin are pretty valid, and I think most of them - if not all - have been covered by the benchmarks performed by Dario.

orjson seems to be better than stdlib json indeed, but again it's an extra third-party library that we need to support. Dario also reports it to be maintained by a single person, so probably not very good.

Stdlib json seems to be around the 2nd position in that evaluation for python3, and given that it's from the standard library, that would be my preference for python3 (at least that's what I would like to have in WMCore python3).

Regarding DBS, that's not really my call and I'd rather ask Yuyi's input here. @yuyiguo could you please have a look at the JSON libraries evaluation documented by Dario in the link I provided above and share with us what your thoughts are?

Just to be explicit, my opinion is the same conclusion we got when it was discussed in the WMCore chat, also replied here:
#10306 (comment)

@klannon
Copy link

klannon commented Apr 30, 2021

Thanks for the quick and careful work on this @todor-ivanov! Just a quick point to consider: for what we need to do (validate Py3 WMAgent), I don't think we need to chose the optimal option. For this case, I would choose the easiest/quickest path to get us to testing WMAgent in a Py3 environment. We can optimize if necessary in a later step.

@todor-ivanov
Copy link
Contributor

Thank you all for this discussion on cjson alternative libraries. I am working on simplifying the wrapper right now to include a switch only between standard library json and cjson. I am about to address also the @vkuznet remarks on the related PR in DBS.

@todor-ivanov
Copy link
Contributor

So following our meeting from Monday and also the decisions taken in the meeting between me @mapellidario and @amaltaro I was supposed to split that issue in two and follow the two processes separately:

  • Building the packages
  • Fixing the code (only for python3)

Which I did... [1] only not in all the affected repositories/sets of issues. So I followed all those actions only in the DBS repository [1] and forgot to reflect them in WMCore. Please take my apologies for that. And here are [2] the related cmsdist changes reflecting the build from the branch dedicated to the Pyton3 version of DBS Client. Currently they point to the latest version of the code in DBS and are not related to a specific tag - which we must do once the code is merged from those DBS PRs and the proper tags are created @yuyiguo

[1]
dmwm/DBS#653
dmwm/DBS#649

[2]
cms-sw/cmsdist#6837

@todor-ivanov
Copy link
Contributor

So what I basically wanted to stress in my previous comment was that the current issue may be considered resolved for both package build and py3 changes once we have the code merged in DBS into the proper branch and the relevant tags created. Again sorry for not creating two separate issues in WMcore.

@amaltaro
Copy link
Contributor Author

This should have been fixed with: cms-sw/cmsdist#6837
However, we still need to get an official DBS tag and update those client specs in cmsdist repository.

@todor-ivanov shall we keep this issue opened for a day or two more, just so we can update cmsdist with an official DBS tag for the python3 dbs3-client specs?

@todor-ivanov
Copy link
Contributor

As you wish @amaltaro, I am fine both ways.

@amaltaro
Copy link
Contributor Author

@todor-ivanov Yuyi has provided a client tag a few days ago (see dmwm/DBS#653).

Can you please make sure everything we need is in place? If so, please provide a new cmsdist PR with the new tag for the py3-dbs3 clients such that we can close this issue.

@todor-ivanov
Copy link
Contributor

Hi @amaltaro
Here is the relevant PR in cmsdist : cms-sw/cmsdist#6946
Once we get the +1 from jenkins we can merge there and close the current issue here too.

@amaltaro
Copy link
Contributor Author

Perfect! Thanks, Todor. Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants