Building a base Almalinux 9 image for WMCore services; plus specific build for MSUnmerged #1452
base: master
Alan, thanks for putting this together. May I suggest a few things:
Regarding apache rotatelogs. As you are well aware, it is our legacy approach from the VM-based deployment. In k8s the logs can easily be handled by kubernetes itself if we write them to stdout. We'll need to decide whether this is still a mandatory requirement. If it is, I suggest creating another base image for only this package, then building it there from source and installing it into a custom area. Then use |
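As an illustration of the stdout option mentioned in this comment, here is a minimal sketch using only the Python standard library. It is not taken from WMCore; the function name `make_stdout_logger` and the logger name are hypothetical, and the real services may configure logging differently (for example through CherryPy).

```python
import logging
import sys


def make_stdout_logger(name="msunmerged-sketch", level=logging.INFO):
    """Return a logger that writes to stdout instead of a rotated file.

    Hypothetical helper: the name and format are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # keep repeated calls idempotent
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_stdout_logger()
    log.info("service started")  # ends up in the container's stdout stream
```

With output going to stdout, the container runtime collects the log stream, so no in-container rotation tool is needed for this path.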
We still have to investigate whether I refactored the PR description and also cleaned up this development branch. @vkuznet I didn't address all your points yet, but you might want to revisit the description. Thanks |
Alan, thanks for the description update; it looks good and properly states the issue. |
@amaltaro I made use of I haven't tested this yet, since I'm figuring out how to use |
@arooshap Aroosha, can you please review a new yaml file that I pushed in (named reqmgr2ms-unmerged-cern.yaml) and confirm if that is all that I need in order to install another service flavor under the reqmgr2ms-unmerged umbrella? I want to test a new docker image (based on Alma9), so I wanted to have a specific configuration for it for the moment. |
I updated the Dockerfile and built/uploaded a new image for MSUnmerged, with tag The service crashes (and is automatically restarted by CherryPy) whenever it tries to remove a non-empty directory. This happened for all 4 of the RSEs that I have enabled for the service so far, and the log signature looks like:
Google suggests this is due to a SEGV. In addition, the container also got killed with:
which seems to be caused by excessive memory usage. The current memory limit is set to 2GB, so I really doubt this is the actual problem... For the record, I did not manage to reproduce this interactively inside the POD with the following script:
|
I want to provide additional insight into pod failure on k8s cluster:
Finally, I tried to measure time on
In the above output it took 8 seconds to respond to According to the deployed k8s manifest file we have
which means that the liveness probe will time out after 5 seconds. Therefore, I conclude that we see pods restart due to poor performance of
|
To test the effect of the liveness probe, I adjusted |
I checked the pod and found no restarts after 40min,
So, it seems we correctly identified the cause of the pod crashes. |
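The liveness-probe reasoning in this thread (a response taking about 8 seconds against a 5 second probe timeout) can be sketched as below. This is only an illustration: `timed_call` and `probe_times_out` are hypothetical helper names, and the `time.sleep` call is a stand-in for whatever endpoint the probe actually queries.

```python
import time


def timed_call(func):
    """Run func() and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = func()
    return result, time.monotonic() - start


def probe_times_out(elapsed_seconds, timeout_seconds=5.0):
    """Return True if a probe with this timeout gives up before the reply."""
    return elapsed_seconds > timeout_seconds


if __name__ == "__main__":
    # Numbers quoted in the thread: ~8 s response vs a 5 s probe timeout.
    print(probe_times_out(8.0, timeout_seconds=5.0))  # prints True

    # Hypothetical stand-in for a call to the slow endpoint.
    _, elapsed = timed_call(lambda: time.sleep(0.1))
    print(probe_times_out(elapsed, timeout_seconds=5.0))
```

A probe that repeatedly times out this way leads kubernetes to restart the container, which matches the restarts described above and their disappearance once the probe settings were adjusted.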
Install EPEL repository and a few CA-related packages
Use latest image
[PLEASE DO NOT MERGE]
This PR provides 2 dockerfiles:
- cmsweb/pypi/alma-base:alma9-20240305
- cmsweb/pypi/reqmgr2ms-unmerged:2.3.1-20240305

This is still not finished, and these images rely on the latest OS version (which likely explains the lack of security vulnerabilities), but current stats for these images are:
- alma-base has no vulnerabilities and a compressed size (in harbor) of 109MiB (while the dmwm-base image has 384MiB, with no security scan available)
- reqmgr2ms-unmerged has a total of 4 vulnerabilities, with an image size of 142MiB (while the debian-based image has 2468 vulnerabilities and a size of 880MiB)

For the moment, I copied all the manage/run/monitor scripts from the pypi/dmwm-base folder to the pypi/alma-base one. The only change on those scripts is that, for now, they do not use rotatelogs to start the service up. It needs further discussion.

NOTE though that rotatelogs is not available in Almalinux (on Debian it is provided by the apache2-utils package). So this is something that we must change if adopting Almalinux; or find an alternative way to deploy that package.

========== Update as of Apr/18 ===========
The reqmgr2ms-unmerged Dockerfile has been updated with the gfal2-plugins and a new image created with 2.3.2rc6-20240419. These are the plugins available to GFAL2 now:

Important references: