Create sensitive and deleted media models for decisions #4544

Merged: 11 commits, Jun 26, 2024

Member

@dhruvkb dhruvkb commented Jun 24, 2024

Fixes

Fixes #4513 by @krysal

Description

This PR ensures that SensitiveImage/SensitiveAudio and DeletedImage/DeletedAudio models are created for every decision.

Testing Instructions

  1. Create a report for an image.
  2. Take a deindexed-sensitive or deindexed-copyright action on the report.
    • You should see a new DeletedImage object.
    • The image should be deleted.
    • The report should be resolved with a linked decision.
  3. Create a report for an image.
  4. Take a marked-sensitive action on the report.
    • You should see a new SensitiveImage object.
    • The image should be marked as sensitive.
    • The report should be resolved with a linked decision.

Repeat these steps for audio.
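
The expected outcomes above can be summarized with a small in-memory model. This is only a sketch and not the Openverse implementation: the action strings, the `apply_decision` and `new_store` helpers, and the dictionary-based store are hypothetical stand-ins for the Django models named in the description.

```python
# Hypothetical in-memory stand-in for the behaviour described in the
# testing instructions; none of these names come from the Openverse code.
DEINDEX_ACTIONS = {"deindexed_sensitive", "deindexed_copyright"}
SENSITIVE_ACTIONS = {"marked_sensitive"}


def new_store(media):
    """Build a fake database holding the given (media_type, id) pairs."""
    return {
        "media": set(media),
        "deleted": set(),
        "sensitive": set(),
        "decisions": [],
        "reports": {},
    }


def apply_decision(store, media_type, media_id, report_id, action):
    """Record a decision and apply its side effect to the fake store."""
    if action not in DEINDEX_ACTIONS | SENSITIVE_ACTIONS:
        raise ValueError(f"Unknown action {action}")

    key = (media_type, media_id)
    decision = {"action": action, "media": key}
    store["decisions"].append(decision)

    if action in DEINDEX_ACTIONS:
        # Stand-in for creating DeletedImage/DeletedAudio and deleting the media.
        store["deleted"].add(key)
        store["media"].discard(key)
    else:
        # Stand-in for creating SensitiveImage/SensitiveAudio.
        store["sensitive"].add(key)

    # The report is resolved and linked to the decision.
    store["reports"][report_id] = {"status": "resolved", "decision": decision}
    return decision
```

In this sketch every accepted action leaves behind both a decision and a matching deleted or sensitive record, which is the invariant the testing steps check by hand.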

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (./ov just catalog/generate-docs for catalog
    PRs) or the media properties generator (./ov just catalog/generate-docs media-props
    for the catalog or ./ov just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@openverse-bot openverse-bot added 🧱 stack: api Related to the Django API 🟥 priority: critical Must be addressed ASAP 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository labels Jun 24, 2024
@WordPress WordPress deleted a comment from github-actions bot Jun 24, 2024

This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced.

Collaborator

@AetherUnbound AetherUnbound left a comment

Makes sense, and looks good to me! I tested this locally and confirmed that the deindexed image gets deleted from the API database, and the sensitive image gets a record in the sensitive table.

Collaborator

@sarayourfriend sarayourfriend left a comment

The blocking change is to fix the issue with non-performant bulk decision creation.

I think we should exclude fixing the backfill from this PR and instead just fix the media admin so that it's working again. And then address the backfill (and lay groundwork for #3840, which needs this anyway) in a separate PR.

Comment on lines +443 to +446
through_model = {
    "image": ImageDecisionThrough,
    "audio": AudioDecisionThrough,
}[self.media_type]
Collaborator

@sarayourfriend sarayourfriend Jun 25, 2024

Bit of a nit, but I've seen this pattern a few times in our code and I don't really understand it. Why not use match/case or if/else for this? The inline object approach is a little too clever (and I never understood defining a static object inline in a function body like this either).

Both match and if/else require explicitly raising if self.media_type doesn't match... but isn't that better? It's certainly easier to read and understand to me (and you know the old phrase about how often code is read compared to written, I'm sure).

Suggested change
- through_model = {
-     "image": ImageDecisionThrough,
-     "audio": AudioDecisionThrough,
- }[self.media_type]
+ match self.media_type:
+     case IMAGE:
+         through_model = ImageDecisionThrough
+     case AUDIO:
+         through_model = AudioDecisionThrough
+     case _:
+         raise ValueError(f"Unknown media type {self.media_type}")

Suggested change
- through_model = {
-     "image": ImageDecisionThrough,
-     "audio": AudioDecisionThrough,
- }[self.media_type]
+ if self.media_type == IMAGE:
+     through_model = ImageDecisionThrough
+ elif self.media_type == AUDIO:
+     through_model = AudioDecisionThrough
+ else:
+     raise ValueError(f"Unknown media type {self.media_type}")

Even better would be to configure it on the admin itself, along with the media type.

Alternatively, if you want to remove all explicit configuration:

Suggested change
- through_model = {
-     "image": ImageDecisionThrough,
-     "audio": AudioDecisionThrough,
- }[self.media_type]
+ through_model = getattr(media_obj, f"{self.media_type}decisionthrough_set").model

But it really would be better if it was just a @property of media_obj._meta or something...

Suggested change
- through_model = {
-     "image": ImageDecisionThrough,
-     "audio": AudioDecisionThrough,
- }[self.media_type]
+ through_model = media_obj._meta.decision_through_model

Anyway, any of those would be expected and easier to understand when reading, I think. (Except the getattr one, which is similarly too clever and basically not worth including as is. It would be improved if it didn't need getattr and could just be media_obj.decisionthrough_set.model, with the media type removed from the name, which would simplify other code too, not just here.)
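
The suggestion to configure the through model on the admin itself, alongside the media type, could look roughly like the following. This is a plain-Python sketch under assumptions: `MediaAdminSketch` and its subclasses are hypothetical names, and the two through-model classes are empty stand-ins for the real Django models.

```python
class ImageDecisionThrough:
    """Empty stand-in for the real Django through model."""


class AudioDecisionThrough:
    """Empty stand-in for the real Django through model."""


class MediaAdminSketch:
    """Hypothetical base admin; each subclass declares its own configuration."""

    media_type = None
    through_model = None

    def get_through_model(self):
        # Raise explicitly when a subclass has not been configured.
        if self.through_model is None:
            raise NotImplementedError(
                f"{type(self).__name__} must set through_model"
            )
        return self.through_model


class ImageAdminSketch(MediaAdminSketch):
    media_type = "image"
    through_model = ImageDecisionThrough


class AudioAdminSketch(MediaAdminSketch):
    media_type = "audio"
    through_model = AudioDecisionThrough
```

With this shape the method body needs no lookup on `self.media_type` at all; it reads `self.through_model` directly.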

Member Author

@dhruvkb dhruvkb Jun 25, 2024

The reason I use the = {}[] pattern is that it is the most compact of all the alternatives and also raises an exception when none of the keys match the input.

I've done this a few times in this file itself. Would you prefer I change this pattern across the entire file, keep it as is in the interest of consistency, or just change it here to one of the alternatives you've suggested?

Collaborator

If the dict-key-access approach is preferred for any reason at all (it is nice it implicitly throws if the key doesn't exist), I'd at the very least think defining the dictionary outside the runtime scope of the function is a reasonable requirement, if only so that it's in a shared location. It's far-fetched to me to establish a pattern of defining otherwise static dictionaries, especially one encoding relationships between static objects, entirely dynamically in the runtime of a function. From a performance perspective it's fine here, but in a tight loop it's just silly, right? From a shared data/relationship encoding perspective (and discoverability, clarity, etc) it's definitely the worst option I can think of 😕. Just seems like an antipattern to me 🤷 I also don't think compactness is necessarily a virtue, and certainly not in Python, which actively resists compactness in my experience.
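
As a minimal sketch of the module-level alternative described here, the mapping can be defined once at import time and shared by every caller, while plain key access keeps the implicit KeyError. The constant name `DECISION_THROUGH_MODELS` and the helper `get_decision_through_model` are assumptions for illustration, and the model classes are empty stand-ins.

```python
class ImageDecisionThrough:
    """Empty stand-in for the real Django through model."""


class AudioDecisionThrough:
    """Empty stand-in for the real Django through model."""


# Built once when the module is imported, not on every function call.
DECISION_THROUGH_MODELS = {
    "image": ImageDecisionThrough,
    "audio": AudioDecisionThrough,
}


def get_decision_through_model(media_type):
    """Look up the through model; unknown media types raise KeyError."""
    return DECISION_THROUGH_MODELS[media_type]
```

Other modules can then import the constant or the helper instead of redefining the relationship locally.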

Collaborator

@sarayourfriend sarayourfriend Jun 26, 2024

So my request here, to clarify, is to move the dictionary to a static location, out of the function, or define the relationship in some other concrete manner that isn't specific to this function. That can be either: changing the field names on the models so that they can be referenced generically (without the media type prefix), or adding a class property to the base class that resolves these models for the media type based on the fields, or something else like that. Also, a follow-up issue to remove that pattern anywhere it's been added and replace it with the generic approach (whether that's statically defined dicts or the dynamic-but-shared approach of class properties).

In general: these static relationships between media type and the data models should not be defined within a local function context, even ignoring all issues with legibility, ergonomics, and performance of this local dict approach. At the very least, this static relationship should be defined statically, and in a shared location, so that new code automatically references it, which reduces the risk of someone just copy/pasting this function-local definition of the relationship.

The dict-accessor pattern is fine on its own, it's the inline dict definition I think is an anti-pattern (though I think match/case and an explicit raise of ValueError is clearer than KeyError, but that's an aesthetic judgement, I know, as at the end of the day it's more or less intelligible as the same underlying problem).

Edit: I realise I'm blocking this PR that fixes a bug in the admin on a code-style/quality issue. I do think this needs to change and believe it's an anti-pattern, but won't block the PR. I'll write an issue to address this more widely later today instead.
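
One way to read the "class property on the base class" option is sketched below. It is an assumption-laden illustration, not the follow-up that was actually implemented: `AbstractMediaSketch`, `ImageSketch`, `AudioSketch`, and the `decision_through_model` attribute are hypothetical, and the through-model classes are empty stand-ins.

```python
class ImageDecisionThrough:
    """Empty stand-in for the real Django through model."""


class AudioDecisionThrough:
    """Empty stand-in for the real Django through model."""


class AbstractMediaSketch:
    """Hypothetical base class that owns the media-to-through-model relationship."""

    decision_through_model = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fail at class-definition time if a media class omits the relationship.
        if cls.decision_through_model is None:
            raise TypeError(f"{cls.__name__} must define decision_through_model")


class ImageSketch(AbstractMediaSketch):
    decision_through_model = ImageDecisionThrough


class AudioSketch(AbstractMediaSketch):
    decision_through_model = AudioDecisionThrough


def through_model_for(media_obj):
    """Generic access that does not need to know the media type."""
    return media_obj.decision_through_model
```

Because the relationship lives on the media classes, callers such as the admin can resolve it from any media object without a local dictionary.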

api/api/management/commands/backfillmoderationdecision.py (outdated, resolved)
api/api/models/media.py (outdated, resolved)
api/api/models/media.py (outdated, resolved)
obulat previously requested changes Jun 25, 2024
Contributor

@obulat obulat left a comment

After you delete a media object using a deindexed_[...] action, the admin tries to reopen the same change_view form for that media object. Because the object has been deleted, you get an error on the line if tags := media_obj.tags:. To prevent this, I added a redirect to the change_view:

        media_obj = self.get_object(request, object_id)
        if media_obj is None:
            return redirect(f"admin:api_{self.media_type}_changelist")

If you do not select any report, and submit a decision (e.g., mark_sensitive), you get no indication of the error except for the warning in the logs (which the moderator using the Admin UI will probably not see). This can be a follow-up issue since this PR is critical, but we should add the error display ("report_id" is required) to the form.

@sarayourfriend
Collaborator

@stacimc pointed out that the backfill doesn't need to "perform" the action at all, because it's just creating the decisions for actions that have already been performed. Glad we removed it already! I'll re-review this today.

@openverse-bot
Collaborator

Based on the critical urgency of this PR, the following reviewers are being gently reminded to review this PR:

@sarayourfriend
@obulat
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend[1] days, this PR was ready for review 1 day(s) ago. PRs labelled with critical urgency are expected to be reviewed within 1 weekday(s)[2].

@dhruvkb, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Collaborator

@sarayourfriend sarayourfriend left a comment

LGTM, tests fine locally and I'm unblocking on my requested changes to get the fix into the admin.

Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Creating MediaDecision has no effect on deindexed actions
5 participants