Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anpl 1704 s3 folders use task queue #1204

Merged
merged 21 commits into from
Sep 15, 2023

Conversation

michaeljcollinsuk
Copy link
Contributor

@michaeljcollinsuk michaeljcollinsuk commented Sep 11, 2023

📝 Summary

This PR closes #ANPL-1204

This PR updates the task handlers to run actions related to create folders, and grant/revoke access to folders.
Revoking access to S3 buckets (users and apps) will also be handled by the task queue

🔍 What should the reviewer concentrate on?

  • Changes to the task handlers, which moves logic around granting/revoking access to the task handlers
  • The send_task logic when creating a datasource, which allows us to wait for a db transaction to complete before sending tasks to create the datasource in AWS

🧑‍💻 How should the reviewer test these changes?

  • Set feature flag for folders to true in settings.yaml
  • Make sure you have the celery worker running (using redis recommended)
  • Create a new folder, click “open with AWS” to check you can access
  • In the AWS console, create a new folder - this should be permitted as you have write access
  • Back in Control Panel, update access for the user to readonly. Reopen in AWS, try to create a new folder/add file, check that this fails (may take a few secs for permissions to update)
  • Back in CP, revoke access for the user. Wait a few secs, then use “open with AWS” to try to access the folder, check this fails
  • Back in CP, add access to the user again. Click open in AWS and keep the tab open. Then in CP, delete the datasource - wait a few secs, and then in the other tab refresh (may need to refresh couple of times) and check that permission has been revoked as part of the delete.
  • Repeat these steps with the feature flag disabled to test with S3 buckets.

📚 Documentation status

  • No changes to the documentation are required
  • This PR includes all relevant documentation
  • Documentation will be added in the future because ... (see #ANPL-...)

self.complete()


class S3BucketRevokeUserAccess(BaseTaskHandler):
Copy link
Contributor Author

@michaeljcollinsuk michaeljcollinsuk Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ymao2 wanted to discuss with you the need for checking the user running the task when removing access, but also more broadly.

  1. In the case that a user revokes access to an S3 bucket (or folder) then it is possible that we can use the ID of the user making this action to look them up, lookup the datasource object from the DB, and check that the user is a bucket admin for that object. Based on this, we can then grant/reject the request to revoke access to the user.

  2. In the case that a bucket admin deletes a datasource (bucket or folder), we will also call the revoke access action for every user that has access to the datasource. However, by the time the message is picked out of the queue and run, the DB object will already have been deleted - so there is no way that we can check that the user that triggered the task to be created is a bucket admin.

Due to this, as you can see in the code I have not implemented any logic within the task handler to check that the user who triggered the task is a bucket admin. Doing so is not possible due to the cascading nature of the deletes that occurs when a user deletes a datasource.

However, I think this is actually ok because the only way for a user to trigger the delete (or revoke access) is via the control panel views - and these views already implement permission checks to ensure that the user has correct permissions to revoke/delete a datasource. Which means that by the time that a delete/revoke occurs, we should already have checked the user has permissions. But what do you think, is this a reasonable assumption to make?

This also made me think more generally about the other tasks, and if we really need to pass the user ID when sending a task message (which we currently do by passing the current_user in various places)? Because wherever we are exposing an action that triggers a task to be added to the queue through a user action, permissions should already have been checked. Of course for some tasks we may want to enforce some verification before the action is run, but everything we have implemented so far I am not sure that it is necessary.

Long message so can discuss on slack/call if easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few things raised from the above case, probably need some discussions

  • How the Db object life cycle related to the related external resources (tasks)?
    if we consider the db as the place for keeping the facts, then it would be ideal all the db records should be in placed before firing the tasking, the records will be removed only when the external resources are removed. --> but I know it may make the coding more complicated.

  • What sort of validation should be done on tasks? this one is related to your question whether we should validate the permission, I think it is really depending on individual task, for our cases so far, as we validate them against the Control panel which in a way are duplicated as you said, I am happy to not validate it
    but just bear in mind, in the future we will separate the tasks handler part out, and also task may be triggered by other components other than cpanel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on call, I have removed the permission checks in the task handlers, as these are currently repeating checks that have already occurred. Instead, validation/permission checks should be optionally added per-task based on the requirements of running the task itself. There is further refactoring that could be done to simplify the handlers, but will tackle this seperately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when the soft_delete functionality is added, this will make it easier to add any checks e.g. when revoking access after deleting a datasource.

@michaeljcollinsuk michaeljcollinsuk force-pushed the ANPL-1704-s3-folders-use-task-queue branch from 70059a7 to 07550e5 Compare September 13, 2023 16:01
@michaeljcollinsuk michaeljcollinsuk force-pushed the ANPL-1704-s3-folders-use-task-queue branch from 07550e5 to 928238f Compare September 13, 2023 16:02
@michaeljcollinsuk michaeljcollinsuk force-pushed the ANPL-1704-s3-folders-use-task-queue branch from 22a503e to 901a2b8 Compare September 13, 2023 16:35
Current implementation of the permission checks are unnecessary as
they are simply repeating checks that have already occurred in the
views that trigger creating tasks. Permissions should instead be
implemented as required, where necessary.
@michaeljcollinsuk michaeljcollinsuk marked this pull request as ready for review September 14, 2023 11:35
Copy link
Contributor

@ymao2 ymao2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes, looks great! 👍 👍
I tested your PR locally, all look good ( but didn't do full set of testings 😊 )

LGTM :)

@michaeljcollinsuk michaeljcollinsuk merged commit 1bb86dd into main Sep 15, 2023
4 checks passed
@michaeljcollinsuk michaeljcollinsuk deleted the ANPL-1704-s3-folders-use-task-queue branch September 15, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants