Do not validate_duplicate_content for APT repositories #633

Merged
merged 1 commit into from
Aug 31, 2022
1 change: 1 addition & 0 deletions CHANGES/632.bugfix
@@ -0,0 +1 @@
Fixed a bug preventing the synchronization of repos referencing a single package from multiple package indices.
4 changes: 2 additions & 2 deletions pulp_deb/app/models/repository.py
@@ -1,6 +1,6 @@
from pulpcore.plugin.models import Repository

-from pulpcore.plugin.repo_version_utils import remove_duplicates, validate_repo_version
+from pulpcore.plugin.repo_version_utils import remove_duplicates, validate_version_paths

from pulp_deb.app.models import (
AptRemote,
@@ -67,7 +67,7 @@ def finalize_new_version(self, new_version):
from pulp_deb.app.tasks.exceptions import DuplicateDistributionException

remove_duplicates(new_version)
-validate_repo_version(new_version)
Member

I may not understand this, but validate_duplicate_content should prevent two content units with the same "natural almost key" but different artifacts from entering the same repository version. It is only checked for content that declares repo_key_fields. So maybe you can revisit those attributes and relax the rule based on the content type.
I would still think it is invalid to have two packages with the same (name, version, architecture) in one repo-version.
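
To make the rule under discussion concrete, here is a minimal plain-Python sketch of a duplicate check keyed on repo key fields. It does not use pulpcore and is not its implementation; the `Package` class, its `sha256` field, the `REPO_KEY_FIELDS` tuple, and the `find_repo_key_conflicts` helper are hypothetical names introduced for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical stand-in for a content unit; not a pulp_deb model.
    name: str
    version: str
    architecture: str
    sha256: str  # stands in for the unit's artifact


# Assumed repo key, taken from the (name, version, architecture) tuple above.
REPO_KEY_FIELDS = ("name", "version", "architecture")


def find_repo_key_conflicts(packages):
    """Return the repo keys that are shared by more than one distinct unit."""
    by_key = defaultdict(set)
    for pkg in packages:
        key = tuple(getattr(pkg, field) for field in REPO_KEY_FIELDS)
        # A set is used so that the same unit listed twice counts only once.
        by_key[key].add(pkg)
    return {key for key, units in by_key.items() if len(units) > 1}
```

In this model, listing the same unit twice yields no conflict, while two units that share a repo key but differ in `sha256` do conflict.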

Collaborator Author

What we found is that it also appears to prevent syncs where two package indices both reference the same pool package, because it somehow counts those as duplicate content units at the time of the check. This is odd, since it does not actually result in that package being saved twice and colliding on publish. This is observed behavior; I am not certain I fully understand why it happens. (I feel like what should happen is that the first of these two declarative_content units is saved to the DB, and the second one then sees that this package already exists and does not save anything, resulting in no duplicates in the new repo version, but that is not what we are seeing...)

Not sure if completely disabling the check is a good solution, but it looks like the check was not running correctly anyway until we introduced optimize sync, so this seemed like the least disruptive stopgap measure to avoid breaking syncs that have so far been working without issue.
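
The expected behavior described above (save the first unit, then reuse it for the second reference) can be sketched in plain Python. This only models the idea; it is not how pulpcore's sync pipeline is implemented, and `ContentStore`, `get_or_create`, and `collect_units` are hypothetical names.

```python
class ContentStore:
    """Hypothetical in-memory stand-in for the content table (not a pulpcore class)."""

    def __init__(self):
        self._units = {}

    def get_or_create(self, **fields):
        # The sorted field items act as the unit's natural key.
        key = tuple(sorted(fields.items()))
        created = key not in self._units
        if created:
            self._units[key] = dict(fields)
        return self._units[key], created


def collect_units(store, package_indices):
    """Gather the units referenced by all indices, each unit at most once."""
    seen_ids = set()
    units = []
    for index in package_indices:
        for entry in index:
            unit, _created = store.get_or_create(**entry)
            # The second reference resolves to the already stored unit.
            if id(unit) not in seen_ids:
                seen_ids.add(id(unit))
                units.append(unit)
    return units
```

Under this model, two package indices that reference the same pool package contribute a single unit to the new version, which is the outcome the comment expects.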

Member

Interesting. Yes, that should not happen. Can you maybe find a (unittest-like) reproducer for this that creates a repo-version and adds the same content twice? Then we may see what's actually happening.

Collaborator Author

I have created a follow-up task so we do not forget about this: #640

+validate_version_paths(new_version)
releases = new_version.get_content(Release.objects.all())
distributions = []
for release in releases: