Create sensitive and deleted media models for decisions #4544

dhruvkb · 2024-06-24T10:17:06Z

Fixes

Description

This PR ensures that SensitiveImage/SensitiveAudio and DeletedImage/DeletedAudio models are created for every decision.

Testing Instructions

Create a report for an image.
Take a deindexed-sensitive or deindexed-copyright action on the report.
- You should see a new DeletedImage object.
- The image should be deleted.
- The report should be resolved with a linked decision.
Create a report for an image.
Take a marked-sensitive action on the report.
- You should see a new SensitiveImage object.
- The image should be marked as sensitive.
- The report should be resolved with a linked decision.

Repeat these steps for audio.
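The testing steps above reduce to a small action-by-media-type matrix. This sketch writes that matrix down as a module-level lookup so the expected record for each step is explicit. It assumes the actions are identified by underscore-separated strings (`deindexed_sensitive`, `deindexed_copyright`, `marked_sensitive`). Those spellings and the helper name `expected_record` are hypothetical and do not come from the PR.

```python
from typing import Optional

# Hypothetical lookup: each decision action from the testing steps mapped to
# the side-effect model it should create. "{media}" becomes Image or Audio.
ACTION_TO_RECORD_TEMPLATE = {
    "deindexed_sensitive": "Deleted{media}",
    "deindexed_copyright": "Deleted{media}",
    "marked_sensitive": "Sensitive{media}",
}


def expected_record(action: str, media_type: str) -> Optional[str]:
    """Return the model name a decision should create, or None if it creates none."""
    template = ACTION_TO_RECORD_TEMPLATE.get(action)
    if template is None:
        return None
    return template.format(media=media_type.capitalize())


# Print one line per testing step, covering both media types.
for media_type in ("image", "audio"):
    for action in ACTION_TO_RECORD_TEMPLATE:
        print(media_type, action, "->", expected_record(action, media_type))
```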

Checklist

My pull request has a descriptive title (not a vague title like "Update index.md").
My pull request targets the default branch of the repository (main) or a parent feature branch.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added or updated tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.
I ran the DAG documentation generator (./ov just catalog/generate-docs for catalog
PRs) or the media properties generator (./ov just catalog/generate-docs media-props
for the catalog or ./ov just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.


github-actions · 2024-06-24T17:30:42Z

This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced.

AetherUnbound

Makes sense, and looks good to me! I tested this locally and confirmed that the deindexed image gets deleted from the API database, and the sensitive image gets a record in the sensitive table.


sarayourfriend

The blocking change is to fix the issue with non-performant bulk decision creation.

I think we should exclude fixing the backfill from this PR and instead just fix the media admin so that it's working again. And then address the backfill (and lay groundwork for #3840, which needs this anyway) in a separate PR.

sarayourfriend · 2024-06-25T05:58:42Z

api/api/admin/media_report.py

+            through_model = {
+                "image": ImageDecisionThrough,
+                "audio": AudioDecisionThrough,
+            }[self.media_type]


Bit of a nit, but I've seen this pattern a few times in our code and I don't really understand it. Why not use match/case or if/else for this? The inline object approach is a little too clever (and I've never understood defining a static object inside a function body like this, either).

Both match and if/else require explicitly raising if self.media_type doesn't match... but isn't that better? It's certainly easier for me to read and understand (and I'm sure you know the old saying about how much more often code is read than written).

Suggested change

-            through_model = {
-                "image": ImageDecisionThrough,
-                "audio": AudioDecisionThrough,
-            }[self.media_type]
+            match self.media_type:
+                case IMAGE:
+                    through_model = ImageDecisionThrough
+                case AUDIO:
+                    through_model = AudioDecisionThrough
+                case _:
+                    raise ValueError(f"Unknown media type {self.media_type}")

Suggested change

-            through_model = {
-                "image": ImageDecisionThrough,
-                "audio": AudioDecisionThrough,
-            }[self.media_type]
+            if self.media_type == IMAGE:
+                through_model = ImageDecisionThrough
+            elif self.media_type == AUDIO:
+                through_model = AudioDecisionThrough
+            else:
+                raise ValueError(f"Unknown media type {self.media_type}")

Even better would be to configure it on the admin itself, along with the media type.
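Configuring the relationship on the admin itself might look roughly like this sketch. The class names and the `through_model` attribute are hypothetical. The real admins would subclass Django's `ModelAdmin`, but plain classes stand in here so the example runs without Django.

```python
# Stand-ins for the Django through models (hypothetical).
class ImageDecisionThrough:
    pass


class AudioDecisionThrough:
    pass


class MediaReportAdminBase:
    # Each concrete admin declares its media type and matching through model
    # side by side, so methods can read self.through_model instead of
    # building a dict locally.
    media_type: str
    through_model: type

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fail when the subclass is defined if part of the configuration is missing.
        for attr in ("media_type", "through_model"):
            if not hasattr(cls, attr):
                raise TypeError(f"{cls.__name__} must define {attr}")


class ImageReportAdmin(MediaReportAdminBase):
    media_type = "image"
    through_model = ImageDecisionThrough


class AudioReportAdmin(MediaReportAdminBase):
    media_type = "audio"
    through_model = AudioDecisionThrough
```

The `__init_subclass__` check is optional. It turns a forgotten attribute into an error at class definition time rather than a runtime failure in the admin view.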

Alternatively, if you want to remove all explicit configuration:

Suggested change

-            through_model = {
-                "image": ImageDecisionThrough,
-                "audio": AudioDecisionThrough,
-            }[self.media_type]
+            through_model = getattr(media_obj, f"{self.media_type}decisionthrough_set").model

But it really would be better if it was just a @property of media_obj._meta or something...

Suggested change

-            through_model = {
-                "image": ImageDecisionThrough,
-                "audio": AudioDecisionThrough,
-            }[self.media_type]
+            through_model = media_obj._meta.decision_through_model

Anyway, any of those would be expected and easier to understand when reading, I think. (Except the getattr one: that's similarly too clever and basically not worth including as is. It would be improved if it didn't need getattr and could just be media_obj.decisionthrough_set.model, which would be possible if the media type were removed from the name; that would simplify other code too, not just here.)

The reason I use the = {}[] pattern is that it is the most compact of all the alternatives and also raises an exception when none of the keys match the input.

I've done this a few times in this file already. Would you prefer that I change this pattern across the entire file, keep it as is in the interest of consistency, or just change it here to one of the alternatives you've suggested?

If the dict-key-access approach is preferred for any reason at all (it is nice that it implicitly throws if the key doesn't exist), I'd at the very least consider defining the dictionary outside the runtime scope of the function a reasonable requirement, if only so that it's in a shared location. It's hard for me to justify establishing a pattern of defining otherwise static dictionaries, especially ones encoding relationships between static objects, entirely dynamically in the runtime of a function. From a performance perspective it's fine here, but in a tight loop it's just silly, right? From a shared data/relationship encoding perspective (and for discoverability, clarity, etc.) it's definitely the worst option I can think of 😕. It just seems like an anti-pattern to me 🤷. I also don't think compactness is necessarily a virtue, and certainly not in Python, which in my experience actively resists compactness.
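A minimal sketch of the "static, shared location" option, assuming the mapping moves to a shared constants module: it is built once at import time, wrapped in `types.MappingProxyType` so callers cannot mutate it, and read through a helper that turns the implicit KeyError into an explicit ValueError. The names `DECISION_THROUGH_MODELS` and `get_decision_through_model` are illustrative, not existing Openverse code, and the model classes are stand-ins.

```python
from types import MappingProxyType


class ImageDecisionThrough:  # stand-in for the real Django model
    pass


class AudioDecisionThrough:  # stand-in for the real Django model
    pass


# Built once when the module is imported and shared by every caller; the
# proxy makes the mapping read-only so it cannot be patched at runtime.
DECISION_THROUGH_MODELS = MappingProxyType(
    {
        "image": ImageDecisionThrough,
        "audio": AudioDecisionThrough,
    }
)


def get_decision_through_model(media_type: str) -> type:
    try:
        return DECISION_THROUGH_MODELS[media_type]
    except KeyError:
        # Re-raise as ValueError, which states the problem more directly.
        raise ValueError(f"Unknown media type {media_type}") from None
```

Call sites then shrink to `through_model = get_decision_through_model(self.media_type)`, and new code can import the same mapping instead of redefining it.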

So my request here, to clarify, is to move the dictionary to a static location outside the function, or to define the relationship in some other concrete manner that isn't specific to this function. That could be either changing the field names on the models so that they can be referenced generically (without the media type prefix), adding a class property to the base class that resolves these models for the media type based on the fields, or something else like that. Also, please open a follow-up issue to remove that pattern anywhere it's been added and replace it with the generic approach (whether that's statically defined dicts or the dynamic-but-shared approach of class properties).
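One way to realise the "class property on the base class" option is a sketch like the following, with plain classes standing in for the Django media models and every name hypothetical. It avoids `_meta` because Django rejects unknown attributes in a model's `class Meta`, so `_meta.decision_through_model` would need extra plumbing. It also uses a classmethod rather than a true class property, because stacking `@classmethod` on `@property` was deprecated in Python 3.11.

```python
class ImageDecisionThrough:  # stand-in for the Django through model
    pass


class AudioDecisionThrough:  # stand-in for the Django through model
    pass


class AbstractMedia:
    # Each concrete media model names its own decision through model.
    decision_through_model: type

    @classmethod
    def get_decision_through_model(cls) -> type:
        try:
            return cls.decision_through_model
        except AttributeError:
            raise NotImplementedError(
                f"{cls.__name__} does not define decision_through_model"
            ) from None


class Image(AbstractMedia):
    decision_through_model = ImageDecisionThrough


class Audio(AbstractMedia):
    decision_through_model = AudioDecisionThrough


# The admin no longer needs to branch on the media type at all.
media_obj = Image()
through_model = type(media_obj).get_decision_through_model()
```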

In general: these static relationships between media type and the data models should not be defined within a local function context, even ignoring all the issues with legibility, ergonomics, and performance of this local dict approach. At the very least, this static relationship should be defined statically and in a shared location, so that new code automatically references it and the risk of someone copy/pasting this function-local definition of the relationship is reduced.

The dict-accessor pattern is fine on its own; it's the inline dict definition that I think is an anti-pattern (though I think match/case with an explicit ValueError is clearer than a KeyError; that's an aesthetic judgement, I know, since at the end of the day both signal the same underlying problem).

Edit: I realise I'm blocking this PR that fixes a bug in the admin on a code-style/quality issue. I do think this needs to change and believe it's an anti-pattern, but won't block the PR. I'll write an issue to address this more widely later today instead.

api/api/management/commands/backfillmoderationdecision.py

api/test/unit/management/commands/test_backfillmoderationdecision.py

api/api/models/media.py

obulat

After you delete a media object using a deindexed_[...] action, the admin tries to open the same change_view form for the same media object. However, because this media object has been deleted, you get an error on the line if tags := media_obj.tags:. To prevent this, I added a redirect to change_view:

        media_obj = self.get_object(request, object_id)
        if media_obj is None:
            return redirect(f"admin:api_{self.media_type}_changelist")

If you submit a decision (e.g., mark_sensitive) without selecting any report, you get no indication of the error except for a warning in the logs (which a moderator using the Admin UI will probably not see). This can be a follow-up issue since this PR is critical, but we should display the error ("report_id" is required) in the form.

sarayourfriend · 2024-06-25T23:32:59Z

@stacimc pointed out that the backfill doesn't need to "perform" the action at all, because it's just creating the decisions for actions that have already been performed. Glad we removed it already! I'll re-review this today.

openverse-bot · 2024-06-26T00:00:14Z

Based on the critical urgency of this PR, the following reviewers are being gently reminded to review this PR:

@sarayourfriend
@obulat
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend¹ days, this PR was ready for review 1 day(s) ago. PRs labelled with critical urgency are expected to be reviewed within 1 weekday(s)².

@dhruvkb, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

¹ Specifically, Saturday and Sunday.
² For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

sarayourfriend

LGTM, tests fine locally and I'm unblocking on my requested changes to get the fix into the admin.

Changes implemented

dhruvkb added 4 commits June 24, 2024 14:02

Record models for media, sensitive media and deleted media in decision-media through model

610622a

Create deleted media and sensitive media models on save

86d013a

Create through models when performing moderation from the admin

b6d944f

Update backfill command to perform action associated with through models

bcb9417

openverse-bot added the 🧱 stack: api, 🟥 priority: critical, 🛠 goal: fix and 💻 aspect: code labels Jun 24, 2024

dhruvkb added 2 commits June 24, 2024 18:13

Allow media objects to be deleted without affecting past decisions

c9a1fdf

Update tests to account for media items being deleted

8cea252

dhruvkb force-pushed the decision_action branch from 46cf460 to 8cea252 Compare June 24, 2024 14:19

WordPress deleted a comment from github-actions bot Jun 24, 2024

Mock ES to prevent breaking other tests

7737df2

WordPress deleted a comment from github-actions bot Jun 24, 2024

dhruvkb mentioned this pull request Jun 24, 2024

Drop FK constraint on media_obj in MediaDecisionThrough, update backfillmoderationdecision command #4530

Merged


dhruvkb marked this pull request as ready for review June 24, 2024 17:42

dhruvkb requested a review from a team as a code owner June 24, 2024 17:42

dhruvkb requested review from obulat and sarayourfriend June 24, 2024 17:42

AetherUnbound approved these changes Jun 24, 2024


Merge branch 'main' of https://github.com/WordPress/openverse into decision_action

9e2e54e

sarayourfriend requested changes Jun 25, 2024


obulat previously requested changes Jun 25, 2024


dhruvkb added 3 commits June 25, 2024 20:25

Undo fixes to the backfill command

c5adb41

Avoid unnecessary queries

fb8495f

Add helpful messages when redirecting

63ce31c

dhruvkb requested review from sarayourfriend and obulat June 25, 2024 17:39

sarayourfriend approved these changes Jun 26, 2024


sarayourfriend mentioned this pull request Jun 26, 2024

Replace function-local dict media type -> model relationship configurations with static module dicts or an otherwise shared approach #4553

Open

dhruvkb merged commit 73c0ad5 into main Jun 26, 2024
52 checks passed

dhruvkb deleted the decision_action branch June 26, 2024 02:52



dhruvkb commented Jun 24, 2024 (edited)

github-actions bot commented Jun 24, 2024

AetherUnbound left a comment

sarayourfriend left a comment

sarayourfriend Jun 25, 2024 (edited)

dhruvkb Jun 25, 2024 (edited)

sarayourfriend Jun 25, 2024

sarayourfriend Jun 26, 2024 (edited)

obulat left a comment

sarayourfriend commented Jun 25, 2024

openverse-bot commented Jun 26, 2024

sarayourfriend left a comment

