Can't run megalinter container image on OpenShift since v7.4.0 #3176
Comments
In all cases, that is a real bug for the runner/platform you are using, since it throws a segmentation violation where other runners don't. The only change I can suspect is that the base image changed from an alpine3.17 python image to an alpine3.18 python image: v7.3.0...v7.4.0#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R52. It also comes with a newer kernel version, 6.1. Is there anything here that raises flags? https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.18.0

If you're thinking of permission errors, we had an issue once, but it was before these releases. Some node packages have file user id / group id values too big for some environments where they are more limited. The "fix" was to map uid/gid when building the image. When I encountered it, I didn't have the problem when building the image inside the problematic environment, only when pulling the image. You may want to try building the image inside that environment to make sure.

But maybe before that, to make sure: does 7.3.0 still work as of today on that environment, while 7.4.0/7.5.0/7.6.0 all don't? And what about the beta images (that is, the main branch)? Is there a specific flavor that works? Or even a very simple single-linter image that works? Something like https://hub.docker.com/r/oxsecurity/megalinter-only-go_revive, which has nothing complicated inside.
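As a rough way to check whether a pulled image actually contains files with very high uid/gid values, something like the sketch below could be used (the 65535 threshold, the image tag, and GNU tar/awk being available on the host are all assumptions, not MegaLinter specifics):

```sh
# Export the image's filesystem and flag entries whose numeric uid or gid exceeds 65535.
# Threshold and host tooling (GNU tar, awk) are assumptions; adjust the tag as needed.
docker create --name ml-inspect oxsecurity/megalinter:v7.4.0
docker export ml-inspect | tar -tvf - | awk '{
  split($2, ids, "/")
  if (ids[1] ~ /^[0-9]+$/ && (ids[1] + 0 > 65535 || ids[2] + 0 > 65535)) print
}'
docker rm ml-inspect
```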
I don't think it's caused by the runtime per se, since v7.3.0 and all versions below it work without any problems. I have also taken a look at the changes from v7.3.0 to v7.4.0 (especially the Dockerfile, of course) but couldn't identify anything in particular that might cause a segmentation fault during startup. The beta image doesn't work either. You might actually be on to something with the uid/gid stuff, since OpenShift assigns rather large ids in that respect. However, this is done for all images at deploy time, so I don't think setting any uids during the image build would solve the issue in this particular case. Can you point me to the issue/PR for that past problem so I can dig a little deeper? I will test the single-linter images next week and report back here.
But I'm not that sure it would be that, since if your environment is able to pull (and extract) the image, then it's probably not it. A test for this is to have a container that you can go into and that has docker inside, in order to pull from there. There was also this that was useful in understanding it at the time.
Other than that, I would have difficulty helping you out, as you didn't mention any version numbers of the failing software, and all the forums/docs for OpenShift are paywalled, so it's quite difficult to figure out what it is. None of your three environments are something I have access to or experience with :S. Maybe someone else can help.
Okay, so I have tested with all 50, and got the following output:

[Sarif] ERROR: there is no SARIF output file found, and stdout doesn't contain SARIF
[Sarif] stdout: Fatal error while calling devskim: [Errno 13] Permission denied: 'devskim'
Error while getting total errors from SARIF output.
Error: while parsing a block mapping
  in "<unicode string>", line 1, column 1:
    Fatal error while calling devski ...
    ^
expected <block end>, but found '<scalar>'
  in "<unicode string>", line 1, column 47:
    ... ile calling devskim: [Errno 13] Permission denied: 'devskim'
                                                          ^
stdout: Fatal error while calling devskim: [Errno 13] Permission denied: 'devskim'
Unable to process reporter CONSOLE
[Errno 13] Permission denied: 'devskim'

This should be something unrelated to the original issue, because in this job the script execution actually started, whereas for the full image we can't even enter the script section. The original problem only occurs when using the normal, i.e. the full, image. I am at a loss.
Well, the "
We'll get there by elimination I think :)
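If it helps to narrow down the devskim [Errno 13] above, one quick check is the ownership and mode of the devskim binary inside the image that produced it. A sketch only; the image name and tag below are assumptions based on the single-linter naming pattern, so substitute whatever image you actually ran:

```sh
# Print where devskim resolves to and its numeric owner/group/mode inside the image.
# Image name and tag are assumptions; replace them with the image that produced the error.
docker run --rm --entrypoint sh oxsecurity/megalinter-only-repository_devskim:v7.6.0 \
  -c 'command -v devskim && ls -ln "$(command -v devskim)"'
```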
True, I didn't see those images because I was searching for
These all ran successfully as well 🤔
Ok, that's a good thing. Since cupcake works and full doesn't, that narrows the scope further ;). Just to check before going too far: does the full image work now, under these conditions?
The full image still doesn't work, same error as before.
And do any of the beta tags, created about 2 hours ago, work? Depending on your workloads, does the cupcake flavor cover enough to be usable for you temporarily?
Tested again using the
The changes are all the commits in main since the last release. If you want to pinpoint the commit, to know whether it was the last one (this morning), which largely changed the way dotnet is installed, or any other, you can take the sha256 from the action logs here, https://github.com/oxsecurity/megalinter/actions/workflows/deploy-BETA.yml, more specifically from this output for each run, https://github.com/oxsecurity/megalinter/actions/runs/7085248249/job/19281122693#step:15:7209, and use something like this to get the older betas:
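For example, pulling a specific beta build by digest instead of the mutable beta tag could look like the following (the digest is only a placeholder, to be replaced by the sha256 value taken from those action logs):

```sh
# Pull an exact beta build by digest; the digest below is a placeholder, not a real value.
docker pull oxsecurity/megalinter@sha256:<digest-from-the-deploy-BETA-logs>
```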
The one you tried, and which worked, was:
Commits of interest: The others are really just continuous updates of the tool versions, not changes to the structure or installed packages. That means that if the breaking point is between the commits of interest, a linter update should have caused your bug. Did your environment really pull the latest images when you tried the latest, i.e., really pulling 7.6.0 instead of an old tag (like when you run docker locally, where you have to run docker pull to refresh a tag you already have)?
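One way to double-check which build actually ran is to compare the repo digest of the image cached in the environment against the published one, e.g. (the tag here is just an example):

```sh
# Show the repo digest(s) recorded for the locally cached image.
docker image inspect --format '{{.RepoDigests}}' oxsecurity/megalinter:v7.6.0
```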
Thanks for reporting the issue, and thanks @echoix for this great analysis and support as usual :) If the issue is solved in beta, I suggest we release a new minor version next weekend :)
Yes, I pinned the image using both tag and digest, so it must be the correct one.
We're using GitLab and GitLab CI pipelines in our organisation. Here we defined a linter job, which uses the megalinter image provided by this repository. Jobs are then executed by GitLab runners of type kubernetes executor, which then spawn a pod that subsequently executes the aforementioned job.

Since v7.4.0, however, we can't run those jobs anymore, since the pod which executes the job can't be created successfully anymore. It exits with the following error message:

Log Output

On AWS with GitLab runners of type docker executor running on EC2 instances, however, it still works as expected. On AWS in an EKS cluster with runners of type kubernetes executor it works as well.

I therefore suspect a problem with the underlying container runtime, which is docker for the docker executors, containerd for EKS, and cri-o for OpenShift. It could also be permission-related, as in our OpenShift clusters a lot of capabilities are prohibited. We haven't changed those restrictions, however, and past versions of the image still work.

This happens only with versions v7.4.0 and higher; all versions below that work just fine.
To Reproduce
Steps to reproduce the behavior:
Use a k8s executor runner in OpenShift to run it

Expected behavior
Image works as it did before in all runtimes.