Auth: Fix missing snapshots and backups from storage pool used-by URLs #14324
Heads up @mionaalex - the "Documentation" label was applied to this issue. |
tests are sad |
a51ac04
to
86d9a32
Compare
@tomponline tests are mostly green except for one: https://github.com/canonical/lxd/actions/runs/11463430112/job/31913802942#step:12:52804 I'm not certain why this is failing as it doesn't seem to have anything to do with this PR. It is potentially related to #14315 since Edit: Note that this also doesn't fail locally. I'll have to get a tmate session running. |
@hamistao please can you check this out, thanks Seems like a panic. |
@tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic. |
I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this. |
Did you identify which commit introduced it yet? Did you try reverting the earlier profiles PR? |
With it being intermittent I didn't think reverting the profiles PR would tell me very much (e.g. I'll need to figure out where that panic is occurring in either case). I've added a commit to check LXD logs for panics. It's failing on standalone tests but not in the cluster tests which is a bit odd. Still investigating. |
4b56433
to
bea0552
Compare
I've re-run the test 8 times now and the panic only occurred on the first two runs. I've added a PR to handle panics a bit more cleanly in the future (#14346). If it happens again it should be obvious where it occurred. |
Of course it fails again as soon as I move the panic checker work into another PR 🤦 |
320408e
to
6b8b335
Compare
Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
The auth.ValidateEntitlement function validates all entitlements that can be granted via the API. Since the new entitlements on snapshots and backups cannot be granted via the API, this check fails. The OpenFGA server will return an error if an invalid query is performed, based on its own understanding of the authorization model. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
Previously the only entities that had inherited relations were project and server. Now that we are linking instances and storage volumes to their snapshots and backups, the OpenFGADatastore implementation needs to handle these relations. On Read, we can connect a snapshot or backup to its parent instance or storage volume using the information stored in its URL. For example, the storage volume backup URL: /1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1 is related to its parent: /1.0/storage-pools/default/volumes/custom/vol1?project=project1 via the `storage_volume` relation. Signed-off-by: Mark Laing <[email protected]>
…tartingWithUser. Previously the only entities that had inherited relations were project and server. Now that we are linking instances and storage volumes to their snapshots and backups, the OpenFGADatastore implementation needs to handle these relations. On ReadStartingWithUser, the function needs to return all backups or snapshots that are related to a parent instance or storage volume. This is used in the `ListObjects` call to the OpenFGA server, which is used by `(auth.Authorizer).GetPermissionChecker`. To do this, I have naively queried for all snapshots or backups in the project, and filtered out those that don't have the correct parent. This keeps the implementation simple and makes use of `GetEntityURLs`, which performs as few queries as possible. Further optimisation may be needed. Signed-off-by: Mark Laing <[email protected]>
We can now use the `can_view`, `can_edit`, and `can_delete` entitlements with instance backups and snapshots. We should do this so that our checks more accurately reflect the authorization model. Signed-off-by: Mark Laing <[email protected]>
The access handler was performing some logic to determine the location of the storage volume for use in the access check. This was based on whether the storage pool is remote, and if not, the cluster member where the volume is located. This commit removes that logic and adds a "location" field to `storageVolumeDetails` so that it can be used in the handlers. The logic for determining the location is modified to suit the call site. It is only set when the pool is not remote. Signed-off-by: Mark Laing <[email protected]>
The storage volume snapshot and backup access handlers need to share almost identical logic with the storage volume access handler, including getting the storage pool, understanding whether the storage volume is located on another cluster member, and so forth. This commit parameterises the function so that it can be used by the snapshot and backup entity types as well, creating and checking against the correct URL when called. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
We can now check `can_view`, `can_edit`, and `can_delete` against the backup/snapshot itself. We should do so to more accurately reflect the authorization model. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
ca1786e
to
16fc177
Compare
a3a40ab
to
0f45b26
Compare
A suspected panic occurred in #14324. This happened twice but is intermittent and does not occur locally. The test suite has now passed for that PR 6/8 times. To surface panics more cleanly when investigating the issue, I added a Python script to search LXD logs for a panic. Panics are logged at info level because the net/http package has a default `recover()` when a handler is run. The script is run after each test suite to ensure the test fails if it panics. However, the script is only run when `LXD_VERBOSE` or `LXD_DEBUG` is set, because the panic log entry is at info level and is not written unless `--verbose` or `--debug` is set.
Out of interest, what prevents this?
These entitlements cannot be assigned to identities, service accounts, or group members.
This is enforced by OpenFGA. Notice that normally an entitlement has `[identity, service_account, group#member]` next to it. This means that those types can have that relation. When generating the map of entity type to entitlement, that's what we look for. So `auth.ValidateEntitlement` is checking that the entitlement can be granted to a group member.
In this case we've defined the relations such that they can only be granted via the parent instance. In fact, no types can be directly related to an instance snapshot except for instances.
Thanks!
@@ -134,11 +134,6 @@ func (e *embeddedOpenFGA) CheckPermission(ctx context.Context, entityURL *api.UR
		return fmt.Errorf("Failed to parse entity URL: %w", err)
	}

	err = auth.ValidateEntitlement(entityType, entitlement)
Do we have a test for trying to assign an invalid entitlement (to check that OpenFGA does indeed block this)?
Do you mean a check for say:
! lxc auth group permission add instance_snapshot c1-snap can_view || false # Cannot be assigned via API
Or a check where we manipulate the `auth_groups_permissions` table to insert a permission that is invalid from the perspective of OpenFGA?
Signed-off-by: Mark Laing <[email protected]>
0f45b26
to
a5de620
Compare
Signed-off-by: Mark Laing <[email protected]>
defer func() {
	// Only set the location if the pool is not remote.
	if details.pool.Driver() != nil && !details.pool.Driver().Info().Remote {
Just a nit, but you're calling `.pool.Driver()` twice here; it would be more efficient to call it once and assign it to a local var, right?
LGTM, great commit and code comments. Thanks!
The underlying cause of this bug was that general filtering of used-by URLs makes the assumption that the `can_view` entitlement is available for all entity types. It is a fair assumption, but wasn't true for storage volume or instance backups or snapshots.

To fix this, four new entity types have been added to the authorization model:
- `instance_backup`
- `instance_snapshot`
- `storage_volume_backup`
- `storage_volume_snapshot`

Each has associated entitlements:
- `can_edit`
- `can_view`
- `can_delete`

It is still not possible to grant these entitlements via the API. Instead, they are granted via `can_manage_snapshots` or `can_manage_backups` on the associated instance or storage volume.

The OpenFGADatastore implementation has been updated to handle `instance` and `storage_volume` relations between the parent and its snapshots/backups.

Closes #14291