-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overlay: use private merged directory for AIS #2036
overlay: use private merged directory for AIS #2036
Conversation
not ready for review, I am still validating it |
6fa7b85
to
b4ad83d
Compare
b4ad83d
to
5c98373
Compare
store.go
Outdated
store.stopReading() | ||
if exists { | ||
return nil, fmt.Errorf("mounting read/only store images is not allowed: %w", ErrLayerUnknown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t read this carefully, just a skim: Doesn’t this mean that it will be completely impossible to push an image which contains base layers in an additional image store and some child layers in the primary store, because the base layers will not be exportable?
Or is the assumption that the graph driver will iterate through the additional stores without the top levels knowing? In that case 1) this might not be reachable, but also 2) that completely gives up on store locking; why is that OK?
(I know I don’t understand the AIS indirections, possible combinations, and locking, and quite possibly I would save everyone time by just staying out of this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Diff/Changes operation can trigger a mount, we didn't notice the race before because native overlay doesn't need it (we just access the layer diff), but how can we ensure two processes do not step on each other when mount/unmount are involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I don’t know how to solve this.
I continue to be confused about how AIS is supposed to work, I’m just noting that the PR, as is now, means that either things that used to work will stop working, or the PR is adding unnecessary code (and perhaps locking problems).
It seems to me that the basic situation is not even really additional-store related that much. See the LOCKING BUG
comments in layers.go
, isn’t that the same issue? (Again, saying that doesn’t mean I know how to fix it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I relax the condition for allow mount from other stores but keep the writing lock for the rw store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the primary driver is consulting AIS layers anyway, I don’t think that checking the AIS layers here, again, makes things any better.
But really my first intuition is that the graph driver should not be accessing the AIS layers itself, in the first place — but then that’s clearly not reasonable for mounting a child layer depending on a parent in AIS.
I’m just too confused about the code to have an opinion.
@@ -1875,13 +1877,21 @@ func (d *Driver) get(id string, disableShifting bool, options graphdriver.MountO | |||
return mergedDir, nil | |||
} | |||
|
|||
func (d *Driver) getMergedDir(id, dir string, inAdditionalStore bool) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems worthy of a doc comment, say
/// getMergedDir returns the directory path that should be used as the mount point for the overlayfs
@@ -1875,13 +1877,21 @@ func (d *Driver) get(id string, disableShifting bool, options graphdriver.MountO | |||
return mergedDir, nil | |||
} | |||
|
|||
func (d *Driver) getMergedDir(id, dir string, inAdditionalStore bool) string { | |||
if inAdditionalStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also
if inAdditionalStore { | |
// The merged directory may not exist in an additional store, so use a transient mount point in `/run` instead. | |
if inAdditionalStore { |
or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… and if locking is a concern, and is happening in a non-obvious way, that also should be written down somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason for using the path under /run
is that we hold only a read-only lock for the additional stores, so different processes can race when mounting/unmounting there and that is the root cause for #2033
if inAdditionalStore { | ||
// check the base name for extra safety | ||
if strings.HasPrefix(mountpoint, d.runhome) && filepath.Base(mountpoint) == "merged" { | ||
err := os.RemoveAll(filepath.Dir(mountpoint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should never actually be any content underneath the merged
directory, right? We should be able to safely just use os.Remove
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split this one in os.Remove(mountpoint); os.Remove(filepath.Dir(mountpoint))
, I thought the RemoveAll
version was more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's clearly more readable to use RemoveAll
, but the two-step Remove
is a much stronger "extra safety" than checking the pathname. I don't have a strong opinion though.
Signed-off-by: Giuseppe Scrivano <[email protected]>
5c98373
to
120642a
Compare
120642a
to
05659bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A higher level comment, looking at the store
changes:
-
This adds exclusive locking to
Diff
. IIRC previously we have tried fairly hard to make that only hold a shared lock. I don’t know anything specific about the performance impact, apart from vaguely remembering this is supposed to be a big deal. Do we know that this is actually fine? -
A blocker for formal reasons: If I understand it correctly, the overlay graph driver how requires
layerStore.Diff
/layerStore.Changes
to hold the layer store lock for writing for the primary store. If that’s correct, I think that must be documented in thelayerStore
methods (// Requires startReading or startWriting.
) — and due to the unusual design, including the reason; typically we only require “writing” when modifyinglayerStore
state, so if this is the overlay driver requiring exclusivity, that reason must be documented there as well.- On second thought, wouldn’t it be viable for
overlay
to introduce its own special-purpose lock? Such a lock could only be used use it in the relevant cases, instead of this being visible at the top level.
- On second thought, wouldn’t it be viable for
@@ -2930,15 +2930,40 @@ func (s *store) Unmount(id string, force bool) (bool, error) { | |||
} | |||
|
|||
func (s *store) Changes(from, to string) ([]archive.Change, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn’t find anything, apart from cmd
here, that is using Changes
. It’s tempting to think that it should be deprecated or somehow made private instead of expanding functionality. That’s probably not this PR.
The obvious impact is that serializing the
Thinking about this a bit more, for concurrency, we would want the mounts to be reference-counted; and automatically cleaned up on crash; and at that point sharing cross-process is ~free. And we already have that kind of infrastructure in … except that it does not apply to the read-only AIS layers, at all. And, of course, the |
… and it’s actually being used in this |
when NaiveDiff is used, the Diff/Changes operations can trigger the mount of the layer. Prevent that multiple processes step on each other and one of them performs an unmount while the other one is still accessing the mount. Closes: containers#2033 Signed-off-by: Giuseppe Scrivano <[email protected]>
05659bf
to
1e60d87
Compare
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
but a Diff can trigger a mount, as it is in the race condition I am trying to fix; so the locking here shouldn't be too different from what we do in
it is every driver except plain native overlay to require that, and vfs since there is no "mount" operation for vfs.
Every driver, except vfs and native overlay, require a "mount" operation to access the files in the layer, so IMO it is better to make it the default behavior |
use a private "merged" directory when mounting from an additional store. Operations like "Diff()" and "Changes()" cause an implicit mount when the naive differ is used. The issue was not observed earlier because native overlay can achieve these operations without using a mount. Since these mounts are performed read-only, and overlay supports multiple mounts using the same lowerdirs, use a private location for the "merged" directory. The location is owned by the current writeable store, that is locked for writing. Closes: containers#2033 Signed-off-by: Giuseppe Scrivano <[email protected]>
1e60d87
to
683e806
Compare
tests pass both for Podmand and Buildah: |
@cgwalters @rhatdan @mtrmac if there are no blockers, let's get this one and #2031 merged so we can enable the Podman CI, we can fix/improve things later but at least we have a more solid base knowing the e2e tests pass |
I'm not going to pretend I understand all this code well, but narrowing in on this comment:
The model we have with AIS is basically that there's only one writable store, correct? In that case, I think I understand how this fixes the issue.
And you're saying we're not using that code though here? I'm not sure if we are or not, but it looks to me like this patch is mostly just modifying the location of the mounts. |
I guess bottom line I can just say |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Right, and we can’t do an exclusive lock to mount, then downgrade to a shared lock for creating the diff (that part could work), and then upgrade again to unmount (we don’t have code that could do this, at least ATM, and I’m not immediately sure that can be done without risking deadlocks). I understand, at least in general terms, the difficulty.
Right now, making If composefs So I think we need to figure out a way to keep this read-locked for overlay, and until we do, composefs just isn’t ready. But I’m fully open to being overridden on this.
The reason I am considering this paragraph “blocker” is that I want (This is the only “blocker” from my side, to the extent that I’m ignorant about |
I was mistaken here. We are using the mounts reference counting … but that reference counting at the primary store also, AFAICS, assumes that layer records exist on the primary store; looking at that yesterday, I didn’t see how it can work for counting additional-store layers that are not known to the primary store. But, also, I’m unfamiliar with the details, and strongly time-constrained right now; this might well be fine. |
One thing I would say here is I see an intersection with #2018 ...if what we mounted was always a single flattened composefs image, the only cross-store dependency would be refcounting of the underlying files behind the composefs (cc containers/composefs#125 ), but with a read-only additional store we would be guaranteed that the underlying objects wouldn't go away. |
this is probably already broken with fuse-overlayfs, just nobody hit the race, because with fuse-overlayfs we are using the naive diff (thus causing the mount) |
If this is pointing at the start of … but I’m a bit lost at how that happens. AFAICS all calls (in Podman) are |
… and mistaken again? The I’m sorry. I’ve not been helping. |
/lgtm |
... as modified by #2036 . Also document a LOCKING BUG I don't now care to fix. Signed-off-by: Miloslav Trmač <[email protected]>
#2080 . |
use a private "merged" directory when mounting from an additional store.
Operations like "Diff()" and "Changes()" cause an implicit mount when the naive differ is used.
The issue was not observed earlier because native overlay can achieve these operations without using a mount.
Since these mounts are performed read-only, and overlay supports multiple mounts using the same lowerdirs, use a private location for the "merged" directory. The location is owned by the current writeable store, that is locked for writing.
Closes: #2033