Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not validate_duplicate_content for APT repositories #633

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

quba42
Copy link
Collaborator

@quba42 quba42 commented Aug 31, 2022

closes #632

Using validate_duplicate_content prevents successfull synchronization
when multiple package indices contain references to the same package.
However, this is an entirely legitimate situation for separate
distributions/releases within a single APT repo. Further more, the so
created duplicate package content, does not cause any adverse effects,
since it is actually identical and only one of each duplicate will be
saved.

Note that this issue was masked prior to the introduction of the
optimize sync feature, since we were inadvertently using the Repository
class from pulpcore instead of the AptRepository class from pulp_deb.

closes pulp#632

Using validate_duplicate_content prevents successfull synchronization
when multiple package indices contain references to the same package.
However, this is an entirely legitimate situation for separate
distributions/releases within a single APT repo. Further more, the so
created duplicate package content, does not cause any adverse effects,
since it is actually identical and only one of each duplicate will be
saved.

Note that this issue was masked prior to the introduction of the
optimize sync feature, since we were inadvertently using the Repository
class from pulpcore instead of the AptRepository class from pulp_deb.
@quba42 quba42 added the .bugfix CHANGES/<issue_number>.bugfix label Aug 31, 2022
Copy link
Contributor

@m-bucher m-bucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

@quba42 quba42 merged commit cb31fd8 into pulp:main Aug 31, 2022
@quba42 quba42 deleted the handle_multiple_package_references branch August 31, 2022 13:07
@quba42
Copy link
Collaborator Author

quba42 commented Aug 31, 2022

Does not require backporting, since the issue is masked on all existing release branches.

@@ -67,7 +67,7 @@ def finalize_new_version(self, new_version):
from pulp_deb.app.tasks.exceptions import DuplicateDistributionException

remove_duplicates(new_version)
validate_repo_version(new_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not understand this. But the validate_duplicate_content should prevent two content units with the same "natural almost key" but different artifacts to enter the same repository version. It is only checked for content that declares repo_key_fields. So maybe you can revisit those attributes and relax the rule based on the content type.
I would still think it is invalid to have two packages with the same (name, version, architecture) in one repo-version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we found, is that it also appears to prevent syncs were two package indices both reference the same pool package, because it somehow counts those as duplicate content units at the time of the check. This is odd since it does not actually result in that package being saved twice and colliding on the publish. This is observed behavior, I am not certain I fully understand why this happens. (I feel like what should happen is that the first of these two declarative_content units is saved to the DB, and the second one then sees that this package already exists and does not save anything, resulting in no duplicates in the new repo version, but that is not what we are seeing...)

Not sure if completely disabling the check is a good solution, but it looks like the check was not running correctly anyway until we introduced optimize sync, so this seemed like the least disruptive stopgap measure to avoid breaking syncs that have so far been working without issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yes, that should not happen. Can you maybe find a (unittest like) reproducer for this that creates a repo-version and adds the same content twice and then we may see, what's actually happening?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a follow on task, so we do not forget about this: #640

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.bugfix CHANGES/<issue_number>.bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not validate_duplicate_content for APT repositories
3 participants