Restrict media privacy until a referencing page is published #46
LGTM. The doc was simple to understand and follow. I also tested by following the "How to review" steps.
Everything was as described, except in "Testing media privacy after a page has been published", where
media/images/{image filename}.fill-446x390.format-webp-.original.png
showed as media/images/{image filename}.original.png
instead. I presume this is because I didn't set any fill/format for my images, so I don't believe this to be important.
Just requesting an update on the hyperlink :D
Nice work.
I'm happy for it to be separate, but I suggest we disable the Media library from Admin. It will add an unnecessary layer of complexity.
cms/private_media/bulk_operations.py
Outdated
results[file] = handler(file)

with concurrent.futures.ThreadPoolExecutor(
    max_workers=int(settings.PRIVATE_MEDIA_PERMISSION_SETTING_MAX_WORKERS)
Minor, could be something shorter like: MEDIA_PERMISSION_MAX_WORKERS
I'd like to stick with the PRIVATE_MEDIA_ prefix, as it makes it easier to mentally map settings to apps in the project. But, I do find the current name a little confusing, so have changed to PRIVATE_MEDIA_BULK_UPDATE_MAX_WORKERS in ce09c50.
Whilst it's still a bit lengthy, I think clarity is more important when it comes to settings.
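For context, the setting under discussion caps the thread pool used for bulk permission updates. A minimal sketch of that pattern follows. It is not the PR's actual code: `bulk_apply` is a hypothetical helper, and the plain `max_workers` argument stands in for the Django setting.

```python
import concurrent.futures
from typing import Callable, Iterable


def bulk_apply(files: Iterable[str], handler: Callable[[str], bool], max_workers: int = 5) -> dict[str, bool]:
    """Run `handler` for each file in a bounded thread pool and collect the results."""
    results: dict[str, bool] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its file so results can be keyed by file.
        future_to_file = {executor.submit(handler, file): file for file in files}
        for future in concurrent.futures.as_completed(future_to_file):
            file = future_to_file[future]
            try:
                results[file] = future.result()
            except Exception:
                # In this sketch a failed update is recorded as False, not raised.
                results[file] = False
    return results
```

Keeping the worker count in a setting lets each environment tune it to what its storage backend tolerates.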
cms/private_media/bulk_operations.py
Outdated
if intended_privacy is Privacy.PRIVATE:
    handler = getattr(storage, "make_private", None)
elif intended_privacy is Privacy.PUBLIC:
    handler = getattr(storage, "make_public", None)

if handler is None:
For private media, I would like to enforce that the storage backend must have these handlers. For local dev, the handlers can simply log the action. I would rather it did not pass silently when the storage doesn't support it; the storage backend must implement them.
Funnily enough, that's how I had it originally, but I changed my mind at some point. Should be very easy to restore.
Resolved in 16a05fb
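The stricter behaviour agreed in this thread could look roughly like the sketch below. Only the `make_private` / `make_public` method names come from the diff; `get_privacy_handler`, `LoggingLocalStorage` and `StorageNotSupported` are hypothetical names used for illustration.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class Privacy(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


class StorageNotSupported(Exception):
    """Raised when a storage backend lacks the required privacy handlers."""


class LoggingLocalStorage:
    """Hypothetical local-dev backend: logs the action instead of setting ACLs."""

    def make_private(self, file_name: str) -> bool:
        logger.info("Would make %s private", file_name)
        return True

    def make_public(self, file_name: str) -> bool:
        logger.info("Would make %s public", file_name)
        return True


def get_privacy_handler(storage: object, intended_privacy: Privacy):
    """Return the storage method for the intended privacy, failing loudly if it is missing."""
    method_name = "make_private" if intended_privacy is Privacy.PRIVATE else "make_public"
    handler = getattr(storage, method_name, None)
    if handler is None:
        raise StorageNotSupported(f"{type(storage).__name__} must implement {method_name}()")
    return handler
```

Raising instead of returning silently means a misconfigured backend is caught the first time a privacy change is attempted.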
    )

def handle(self, *args: Any, **options: Any) -> None:
    self.dry_run = options["dry_run"]  # pylint: disable=attribute-defined-outside-init
Minor: add it as a class attribute, like the publish_bundles.py command does, instead of the pylint disable? The key point is consistency.
This has been resolved
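One way to read the suggestion: declare the attribute on the class with a default, so that assigning it in `handle()` no longer needs the pylint disable. The sketch uses a plain stand-in class rather than Django's `BaseCommand` so that it runs without Django; the class body is illustrative only.

```python
from typing import Any


class Command:
    """Stand-in for a management command (a real one would subclass Django's BaseCommand)."""

    # Declared at class level, so every instance has the attribute from the start.
    dry_run: bool = False

    def handle(self, *args: Any, **options: Any) -> None:
        # Assigning here overrides the class-level default on this instance only.
        self.dry_run = options["dry_run"]
```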
for model in get_private_media_models():
    queryset = model.objects.filter(file_permissions_last_set__lt=F("privacy_last_changed"))
    if not self.dry_run and issubclass(model, PrivateImageMixin):
        queryset = queryset.prefetch_related("renditions")
    permissions_outdated = list(queryset)
    self.stdout.write(f"{len(permissions_outdated)} {model.__name__} instances have outdated file permissions.")
    if permissions_outdated:
        make_private = []
        make_public = []
        for obj in permissions_outdated:
            if obj.privacy is Privacy.PRIVATE:
                make_private.append(obj)
            elif obj.privacy is Privacy.PUBLIC:
                make_public.append(obj)

        self.update_file_permissions(model, make_private, Privacy.PRIVATE)
        self.update_file_permissions(model, make_public, Privacy.PUBLIC)
Suggestion: I think it is easier to read if we added some newlines and exit early to reduce indentation. It feels a bit crowded.
for model in get_private_media_models():
    queryset = model.objects.filter(file_permissions_last_set__lt=F("privacy_last_changed"))
    if not self.dry_run and issubclass(model, PrivateImageMixin):
        queryset = queryset.prefetch_related("renditions")

    permissions_outdated = list(queryset)
    self.stdout.write(f"{len(permissions_outdated)} {model.__name__} instances have outdated file permissions.")
    if not permissions_outdated:
        return None

    make_private = []
    make_public = []
    for obj in permissions_outdated:
        if obj.privacy is Privacy.PRIVATE:
            make_private.append(obj)
        elif obj.privacy is Privacy.PUBLIC:
            make_public.append(obj)

    self.update_file_permissions(model, make_private, Privacy.PRIVATE)
    self.update_file_permissions(model, make_public, Privacy.PUBLIC)
Funnily enough, that's how I had it originally, but it was before I broke handling off to update_file_permissions, and pylint had a problem with the number of return statements. It should be okay to do that again now that update_file_permissions exists.
Resolved in 7f0de81
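A sketch of the early-exit shape discussed in this thread, using plain Python objects in place of querysets. It uses `continue` rather than `return`, on the assumption that the check sits inside the loop over models and that later models should still be processed. `Item`, `partition_by_privacy` and `process` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Privacy(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class Item:
    name: str
    privacy: Privacy


def partition_by_privacy(items: list[Item]) -> tuple[list[Item], list[Item]]:
    """Split items into (make_private, make_public) lists."""
    make_private = [obj for obj in items if obj.privacy is Privacy.PRIVATE]
    make_public = [obj for obj in items if obj.privacy is Privacy.PUBLIC]
    return make_private, make_public


def process(outdated_by_model: dict[str, list[Item]]) -> list[str]:
    """Return a log of the updates that would be made for each model."""
    log: list[str] = []
    for model_name, permissions_outdated in outdated_by_model.items():
        if not permissions_outdated:
            continue  # nothing to do for this model; move on to the next one

        make_private, make_public = partition_by_privacy(permissions_outdated)
        log.append(f"{model_name}: {len(make_private)} private, {len(make_public)} public")
    return log
```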
        self.update_file_permissions(model, make_public, Privacy.PUBLIC)

    def update_file_permissions(
        self, model_class: type["PrivateMediaMixin"], items: list["PrivateMediaMixin"], privacy: Privacy
Type annotation for args should be abstract unless the method needs it to be a list.
I definitely agree when it comes to reusable code, but in the case of management commands (which are self-contained, and only really reusable if you're defining a base class for other commands to inherit), I think being explicit adds value. Less... 'how you might use me' and more 'this is exactly how I'm being used'.
Updated in 5198943
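For comparison, the reviewer's point in generic terms: annotate a parameter with the least specific type the body actually needs. In this sketch the function only iterates, so `Iterable` is enough; `describe_updates` is a hypothetical name and not part of the PR.

```python
from collections.abc import Iterable


def describe_updates(items: Iterable[str], privacy: str) -> list[str]:
    """Accept any iterable (list, tuple, generator), since only iteration is needed."""
    return [f"{name} -> {privacy}" for name in items]
```

A caller can then pass a list, a tuple or a generator without converting first.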
    .values_list("int_object_id", flat=True)
    .distinct()
)
queryset = model_class.objects.filter(pk__in=referenced_pks, _privacy=Privacy.PRIVATE)
If referenced_pks is empty, continue to the next model instead of eventually coming out of the loop?
The issue there is that checking the result at this point would evaluate the query (a separate database round trip). Currently, all of the subqueries remain lazy until the final queryset is evaluated, and Postgres handles everything in one round trip.
I suggest we avoid this change based on the above reasoning. @MebinAbraham are you happy with that?
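The trade-off can be illustrated outside Django with the standard library's `sqlite3` module. This is an analogy rather than the PR's code: the table and column names are made up. The second function corresponds to passing an unevaluated queryset to an `__in` lookup, which Django turns into a nested `IN (SELECT ...)` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media (id INTEGER PRIMARY KEY, privacy TEXT);
    CREATE TABLE reference (media_id INTEGER, page_id INTEGER);
    INSERT INTO media VALUES (1, 'private'), (2, 'private'), (3, 'public');
    INSERT INTO reference VALUES (1, 10), (3, 10);
    """
)


def private_media_two_round_trips(page_id: int) -> list[int]:
    """Evaluate the inner query first so emptiness can be checked (an extra round trip)."""
    referenced = [
        row[0] for row in conn.execute("SELECT media_id FROM reference WHERE page_id = ?", (page_id,))
    ]
    if not referenced:
        return []
    placeholders = ",".join("?" for _ in referenced)
    rows = conn.execute(
        f"SELECT id FROM media WHERE privacy = 'private' AND id IN ({placeholders}) ORDER BY id",
        referenced,
    )
    return [row[0] for row in rows]


def private_media_one_round_trip(page_id: int) -> list[int]:
    """Leave the inner query as a subquery so the database resolves everything at once."""
    rows = conn.execute(
        "SELECT id FROM media WHERE privacy = 'private' "
        "AND id IN (SELECT media_id FROM reference WHERE page_id = ?) ORDER BY id",
        (page_id,),
    )
    return [row[0] for row in rows]
```

Both functions return the same rows; the first saves the second query only when nothing is referenced, and pays an extra round trip in every other case.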
# Identify references from other pages and extract their IDs, so that we can
# figure out which of those pages is live
referencing_page_ids = (
    ReferenceIndex.objects.filter(to_content_type=model_ct, to_object_id__in=referenced_pks)
    .exclude(pk__in=references.values_list("pk", flat=True))
    .filter(base_content_type=page_ct)
    .annotate(page_id=Cast("object_id", output_field=IntegerField()))
    .values_list("page_id", flat=True)
    .distinct()
)

# Out of the pages that are referencing the media, identify the ids of those
# that are currently live
live_page_ids = (
    Page.objects.filter(pk__in=referencing_page_ids)
    .live()
    .annotate(str_id=Cast("id", output_field=CharField()))
    .values_list("str_id", flat=True)
)

# Now we can identify references from live pages only, and
# generate a list of media item ids that should not be made private
live_page_referenced_media_pks = (
    ReferenceIndex.objects.filter(to_content_type=model_ct, to_object_id__in=referenced_pks)
    .filter(base_content_type=page_ct, object_id__in=live_page_ids)
    .annotate(int_id=Cast("to_object_id", output_field=IntegerField()))
    .values_list("int_id", flat=True)
    .distinct()
)

queryset = (
    model_class.objects.all()
    .filter(pk__in=[int(pk) for pk in referenced_pks], _privacy=Privacy.PUBLIC)
    .exclude(pk__in=live_page_referenced_media_pks)
)
I'm not sure these queries will be very performant in the long term. One to revisit, as this might get quite large in the future. Unless there is a user need, it might be better to treat images as unique and accept the additional storage, meaning we can replace this involved check with a simpler one; worst case, it should only allow reusability within the same model, not across models.
This has now been refactored so that the potential list of pages only includes those that reference the same media referenced by the current object. If, like you say, that case is rare, it should be very small indeed.
It's always better for important code like this to assume as little as possible about what other things might be doing, so even if we did manage to somehow restrict media use to a single object (I currently have no idea how that could be achieved nicely), I'd still very much recommend keeping this.
Where the queryset result isn't explicitly evaluated to a list/set beforehand, Django usually keeps everything lazy and has the database evaluate everything in one go, so this is really quite performant (there should be some assertNumQueries items in the tests to confirm this). So, it isn't that bad at all performance-wise.
@MebinAbraham Happy to mark this as resolved based on the above explanation?
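In plain Python terms, the check in this thread reduces to set arithmetic. The sketch below is illustrative only: the function and argument names are hypothetical, and the real implementation works on lazy querysets against `ReferenceIndex` rather than in-memory sets.

```python
def media_to_make_private(
    referenced_media: set[int],
    public_media: set[int],
    references_by_page: dict[int, set[int]],
    live_pages: set[int],
) -> set[int]:
    """Return media referenced by the current object that no live page still references."""
    # Collect media referenced by any live page. The caller is assumed to have
    # left the object being unpublished out of `live_pages`.
    still_needed: set[int] = set()
    for page_id, media_ids in references_by_page.items():
        if page_id in live_pages:
            still_needed |= media_ids
    # Only currently public media needs changing, and only if nothing live uses it.
    return (referenced_media & public_media) - still_needed
```

For example, if page 1 (being unpublished) references media 100 and 101, and live page 2 references 101, only 100 is made private.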
cms/private_media/views.py
Outdated
if image.is_public and not image.file_permissions_are_outdated():
    return redirect(rendition.file.url)

# Block access to private images if the user has insufficient permissions
user = self.request.user
if image.is_private and (
    not user.is_authenticated
    or not permission_policy.user_has_any_permission_for_instance(user, ["choose", "add", "change"], image)
):
    raise PermissionDenied
We need to ensure that the file can still be served if the image is private but has public pages referencing it. For example, if a page is published but the ACLs are not properly set due to network issues etc, the application should still serve the image until the next scheduled task attempts to fix and retry the ACL settings. We cannot have a page that features broken assets. The same principle applies to documents.
This is covered by signal handlers. If a page features any media that is private, it is made public when those changes are published, and whilst the page remains published, media referenced by it will always remain public.
Also, the _privacy field should always be the canonical indicator of privacy for a media item. If we're having to make additional queries at the time of serving to figure out if that is really the case, then we've failed terribly.
if a page is published but the ACLs are not properly set due to network issues etc, the application should still serve the image until the next scheduled task attempts to fix and retry the ACL settings.
This particular case is handled by the last line of the method. The image is 'public' according to the system, so won't be permission-checked. It'll just be served, using the server's AWS creds to stream the file contents.
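The order of checks described in this thread can be summarised as a small decision function. It is a sketch under stated assumptions: `Outcome`, `ImageState` and `decide` are hypothetical names, and the final branch assumes (per the reply) that the method ends by streaming the file through the storage backend.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REDIRECT = "redirect to the public file URL"
    DENY = "raise PermissionDenied"
    STREAM = "stream the file via the storage backend"


@dataclass
class ImageState:
    is_public: bool
    permissions_outdated: bool


def decide(image: ImageState, user_is_authenticated: bool, user_has_permission: bool) -> Outcome:
    """Mirror the order of checks in the serve view excerpt."""
    if image.is_public and not image.permissions_outdated:
        return Outcome.REDIRECT
    if not image.is_public and (not user_is_authenticated or not user_has_permission):
        return Outcome.DENY
    # Public image whose file permissions are outdated, or private image with an authorised user.
    return Outcome.STREAM
```

The case raised by the reviewer (page published, ACL update failed) is the public-but-outdated state, which falls through to `STREAM` without a permission check.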
def protect_private_documents(document: "PrivateDocumentMixin", request: "HttpRequest") -> None:
    """Block access to private documents if the user has insufficient permissions."""
    if document.is_private and (
        not request.user.is_authenticated
        or not permission_policy.user_has_any_permission_for_instance(
            request.user, ["choose", "add", "change"], document
        )
    ):
As above, we need to be able to serve the document even if it is private, just in case setting the ACL failed.
Documents are different to images in that they are ALWAYS served by the view, regardless of privacy values. So, in essence, the ACL is irrelevant here, as the storage backend uses its credentials to stream the file from AWS.
cms/private_media/storages.py
Outdated
logger = logging.getLogger(__name__)


class PrivacySettingS3Storage(S3Storage):
Nit: I don't find PrivacySetting very clear; what about something like PrivacyAwareS3Storage or AccessControlledS3Storage?
Ah, naming things!! I like the 2nd suggestion. Let's go with that
Resolved in 60fdb15
…t the necessary methods
…ment a version of FileSystemStorage to use in local dev (and by extension, tests)
…ernate in-memory storages to reduce need for mocking in tests
…eMixin.get_privacy_controlled_serve_urls() to return an empty iterable
What is the context of this PR?
This is a follow-up to #42 that focuses solely on the public/private status of media, without attempting to tie objects to pages to aid with permission policy customisation.
It's fully functional and has lots of tests (which are worth checking out to get a feel for the intended behaviour).
A brief summary:
When a media item's privacy changes:
- If the file permissions are updated successfully, the file_permissions_last_set timestamp is updated for the media item, and the object's file_permissions_are_outdated() method will return False.
- If the update fails, file_permissions_last_set is not updated, and the file_permissions_are_outdated() method will return True.
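The relationship between the two timestamps can be sketched as follows, based on the `file_permissions_last_set__lt=F("privacy_last_changed")` filter used by the management command in this PR. `MediaItem` is a hypothetical stand-in for the real mixin, and treating a missing timestamp as outdated is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class MediaItem:
    privacy_last_changed: datetime
    file_permissions_last_set: Optional[datetime] = None

    def file_permissions_are_outdated(self) -> bool:
        """True when permissions were never set, or were set before the last privacy change."""
        if self.file_permissions_last_set is None:
            return True  # assumption: never set counts as outdated
        return self.file_permissions_last_set < self.privacy_last_changed


# Usage: permissions last set one minute before the privacy change are outdated.
changed = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = MediaItem(privacy_last_changed=changed, file_permissions_last_set=changed - timedelta(minutes=1))
```

A retry job can then select items where this method returns True and attempt the update again.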
How to review
Testing media privacy for a draft page
- The src value of the image rendition should look something like /images/{secure key}/2/original/{image filename}, which means it's being served by the 'serve' view.
- The href value of the document link should look something like /documents/
- src URL in new tabs (they should still work).
Testing media privacy after a page has been published
- The src value should have changed to something like: media/images/{image filename}.fill-446x390.format-webp-.original.png.
- The src value should look the same as it did on the live version.
- The src value should now again look like: /images/{secure key}/2/original/{image filename}.
Testing image privacy once a draft page has been published
- The src value should look something like: media/images/{image filename}.fill-446x390.format-webp-.original.png.
- The src value should look the same as it did on the live version.
- The src value should now again look like: /images/{secure key}/2/original/{image filename}.
Follow-up Actions
- PRIVATE_MEDIA_PERMISSION_SETTING_MAX_WORKERS is set appropriately for all target environments
- python manage.py retry_file_permission_set_attempts is set to run on a cron every 10 minutes in all target environments