Make buildah's handling of ulimits match podman #5590

chris-reeves · 2024-06-14T14:58:57Z

There was a bug in #5275 which assumed that non-rootless environments always had the ability to increase limits past the hard limit. This is not always the case. This was resolved by more accurately reflecting podman's handling of ulimits. The initial fix for this didn't account for the fact that privileged users can (usually) increase limits past the hard limit. This wasn't picked up when the tests were run locally, so they have been modified to ensure that this particular failure mode would be identified in future.

What type of PR is this?

/kind bug

What this PR does / why we need it:

There was a bug in #5275 which assumed that non-rootless environments always had the ability to increase limits past the hard limit. This is not always the case. For example:

[2/2] STEP 8/13: RUN addgroup -g 1000 -S app && adduser -u 1000 -h /app -S -G app app
Error: building at STEP "RUN addgroup -g 1000 -S app && adduser -u 1000 -h /app -S -G app app": setting "RLIMIT_NPROC" limit to soft=104857600,hard=104857600 (was soft=31723,hard=31723): operation not permitted

This was resolved by more accurately reflecting podman's handling of ulimits.

How to verify it

The original bug can be replicated by running the test suite (specifically bats tests/run.bats -f limit) in a non-rootless mode where nproc limits are less than RLimitDefaultValue (1048576). For me the easiest way to do this was to run the test as root with RLimitDefaultValue increased to 10485760.

Which issue(s) this PR fixes:

None - no issue has been opened for this bug.

Special notes for your reviewer:

No tests have been added for the scenario that was originally encountered (non-rootless environment where the user doesn't have privileges to increase limits past the hard limit) due to the difficulty in manipulating limits and recreating this scenario in a test environment, however the tests which fail when RLimitDefaultValue is increased to 10485760 now pass with this fix in place.

The initial fix for this didn't account for the fact that privileged users can (usually) increase limits past the hard limit. This wasn't picked up when the tests were run locally, so they have been modified to ensure that this particular failure mode would be identified in future.

The PR does make a call to Setrlimit() without checking the returned error value. This is intentional as the way podman handles the failure of this call is to try again with the current hard limit, but we effectively do this a little later with the call to AddProcessRlimits() making another get/set round trip redundant.

Does this PR introduce a user-facing change?

None

packit-as-a-service · 2024-06-14T15:02:55Z

Ephemeral COPR build failed. @containers/packit-build please check.

flouthoc

Integration tests are failing for multiple envs.

chris-reeves · 2024-06-17T00:13:25Z

The initial fix for this didn't account for the fact that privileged users can (usually) increase limits past the hard limit. This wasn't picked up when the tests were run locally, so they have been modified to ensure that this particular failure mode would be identified in future.

TomSweeneyRedHat · 2024-06-17T19:54:04Z

@chris-reeves thanks for the PR! The tests are all very unhappy and it looks like you need to update your branch.

chris-reeves · 2024-06-17T20:42:06Z

@chris-reeves thanks for the PR! The tests are all very unhappy and it looks like you need to update your branch.

@TomSweeneyRedHat It looks like this was just the linter complaining. I did wonder whether not checking Setrlimit()'s return value would annoy the linter so I ran make lint locally before committing and it didn't complain. I've tried again and can't replicate the lint output that Cirrus is returning, so rather than trying to set lint ignores blindly or seeing if I can get away with assigning to _, I've taken a fairly blunt approach and simply write a debug-level log message if this fails.

Also rebased and signed the commit that I missed.

TomSweeneyRedHat · 2024-06-20T21:50:00Z

@chris-reeves I kicked off the test again, a quick look seemed to show network hiccups. The branch looks like it needs a rebase too.

chris-reeves · 2024-06-20T22:03:29Z

@chris-reeves I kicked off the test again, a quick look seemed to show network hiccups. The branch looks like it needs a rebase too.

Could you point me towards these network hiccups? The only two failing builds are the same two that appear to fail for most PRs (and this appears to be an issue with missing build tools on those platforms).

I can rebase again if you like, but it looks like you're taking a branch/merge approach so I'm not sure what it gives us (other than another build to wait on!).

There was a bug in containers#5275 which meant that in some cases (not rootless, with low limits) buildah tried to set limits to more than the hard limit. Fixing that issue made some of the tests fail as they only passed due to this bug. This was resolved by more accurately reflecting podman's handling of ulmits. Signed-off-by: Chris Reeves <[email protected]>

This is really a no-op to keep the validate_test check happy... Signed-off-by: Chris Reeves <[email protected]>

Try to set nofile limit to RLimitDefaultValue - this could potentially increase the limit past the current hard limit in non-rootless environments. This makes buildah behaviour match podman when a non-rootless environment has lower limits set. Signed-off-by: Chris Reeves <[email protected]>

Signed-off-by: Chris Reeves <[email protected]>

rhatdan · 2024-06-21T08:34:26Z

LGTM

rhatdan · 2024-06-21T08:34:32Z

Thanks @chris-reeves

chris-reeves · 2024-07-02T22:18:57Z

Anything else I can do to help get this PR merged?

rhatdan · 2024-07-15T15:43:06Z

@nalind @flouthoc @giuseppe @mheon @Luap99 @giuseppe PTAL

giuseppe

LGTM

openshift-ci · 2024-07-15T20:23:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chris-reeves, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

nalind · 2024-07-15T20:29:16Z

run_linux.go

 		var rlimit unix.Rlimit
+		rlimit.Cur = define.RLimitDefaultValue
+		rlimit.Max = define.RLimitDefaultValue
+		err := unix.Setrlimit(unix.RLIMIT_NOFILE, &rlimit)


If the limits are set higher than this on entry, are we lowering them for the calling process? I don't know that I want that as a side-effect.

If the limits are currently greater than define.RLimitDefaultValue then yes, they will be lowered (but for the current buildah process, not for any calling process).

There are three reasons why I don't think this is an issue:

define.RLimitDefaultValue is 1048576, which matches the default kernel hard limit;

the purpose of Make buildah match podman for handling of ulimits #5275 (which this PR is a continuation of) was to match the behaviour of podman (and this is what podman does);

if a user does need a higher limit for some reason, they can simply set the --ulimit option.

rhatdan · 2024-07-27T09:57:14Z

@chris-reeves Are you still working on this?

chris-reeves · 2024-07-27T22:18:07Z

@chris-reeves Are you still working on this?

Sorry, I missed the previous comment (which I've now responded to). I'm still keen to see this merged as our current workaround is rather hacky.

github-actions · 2024-08-27T00:11:57Z

A friendly reminder that this PR had no activity for 30 days.

rhatdan · 2024-08-27T09:32:43Z

@nalind PTAL

github-actions · 2024-09-27T00:13:19Z

A friendly reminder that this PR had no activity for 30 days.

openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 14, 2024

chris-reeves changed the title ~~Make buildah handling of ulimits match podman~~ Make buildah's handling of ulimits match podman Jun 14, 2024

flouthoc reviewed Jun 16, 2024

View reviewed changes

chris-reeves requested a review from flouthoc June 17, 2024 00:16

chris-reeves force-pushed the match-podman-ulimits branch from d054820 to 6589a2e Compare June 17, 2024 20:34

chris-reeves added 4 commits June 20, 2024 23:04

Make limits tests have consistent wording

1036cfd

This is really a no-op to keep the validate_test check happy... Signed-off-by: Chris Reeves <[email protected]>

Check return value of Setrlimit() to keep linter happy

6646b57

Signed-off-by: Chris Reeves <[email protected]>

chris-reeves force-pushed the match-podman-ulimits branch from 6589a2e to 6646b57 Compare June 20, 2024 22:05

giuseppe approved these changes Jul 15, 2024

View reviewed changes

openshift-ci bot added the approved label Jul 15, 2024

nalind reviewed Jul 15, 2024

View reviewed changes

github-actions bot added the stale-pr label Aug 27, 2024

rhatdan removed the stale-pr label Aug 27, 2024

github-actions bot added the stale-pr label Sep 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make buildah's handling of ulimits match podman #5590

Make buildah's handling of ulimits match podman #5590

chris-reeves commented Jun 14, 2024 •

edited

Loading

packit-as-a-service bot commented Jun 14, 2024

flouthoc left a comment

chris-reeves commented Jun 17, 2024

TomSweeneyRedHat commented Jun 17, 2024

chris-reeves commented Jun 17, 2024

TomSweeneyRedHat commented Jun 20, 2024

chris-reeves commented Jun 20, 2024

rhatdan commented Jun 21, 2024

rhatdan commented Jun 21, 2024

chris-reeves commented Jul 2, 2024

rhatdan commented Jul 15, 2024

giuseppe left a comment

openshift-ci bot commented Jul 15, 2024

nalind Jul 15, 2024

chris-reeves Jul 27, 2024

rhatdan commented Jul 27, 2024

chris-reeves commented Jul 27, 2024

github-actions bot commented Aug 27, 2024

rhatdan commented Aug 27, 2024

github-actions bot commented Sep 27, 2024

Make buildah's handling of ulimits match podman #5590

Are you sure you want to change the base?

Make buildah's handling of ulimits match podman #5590

Conversation

chris-reeves commented Jun 14, 2024 • edited Loading

What type of PR is this?

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

packit-as-a-service bot commented Jun 14, 2024

flouthoc left a comment

Choose a reason for hiding this comment

chris-reeves commented Jun 17, 2024

TomSweeneyRedHat commented Jun 17, 2024

chris-reeves commented Jun 17, 2024

TomSweeneyRedHat commented Jun 20, 2024

chris-reeves commented Jun 20, 2024

rhatdan commented Jun 21, 2024

rhatdan commented Jun 21, 2024

chris-reeves commented Jul 2, 2024

rhatdan commented Jul 15, 2024

giuseppe left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Jul 15, 2024

nalind Jul 15, 2024

Choose a reason for hiding this comment

chris-reeves Jul 27, 2024

Choose a reason for hiding this comment

rhatdan commented Jul 27, 2024

chris-reeves commented Jul 27, 2024

github-actions bot commented Aug 27, 2024

rhatdan commented Aug 27, 2024

github-actions bot commented Sep 27, 2024

chris-reeves commented Jun 14, 2024 •

edited

Loading