bernt-matthias
Collaborator

the value should be the git URL to the tool, i.e. it should have a
common suffix with the path of the repo that is under consideration


def lint_shed_remote_repository_url(realized_repository: "RealizedRepository", lint_ctx):
    path = realized_repository.real_path.rstrip(" /")
Member

Why is this doing rstrip with a whitespace and a slash? Can you add a comment explaining this longest-common-suffix heuristic?

Member

I appreciate the extra comments, but I'm still lost on what the while loop does, so I asked an AI assistant and this is what I got:

Current Implementation Issues
This code attempts to find a common suffix between a file path and a URL by iterating backwards through both strings. However, there are several problems:

Logic Error: The condition checks if characters match, but path[i:] captures everything from position i to the end, which grows longer as i becomes more negative. This doesn't correctly identify the longest common suffix.
String Comparison Confusion: Comparing individual characters at negative indices doesn't guarantee meaningful path segment matching. For example, /tool in a path might accidentally match ool in "school" in the URL.
Unclear Purpose: The docstring mentions checking for "common prefix" but the code looks for a suffix, creating confusion.
Weak Validation: Only checking for "/" in the common part is insufficient - it could match arbitrary substrings.

This was a replacement suggestion:

from pathlib import PurePosixPath

def lint_shed_remote_repository_url(realized_repository: "RealizedRepository", lint_ctx):
    """
    Verify that remote_repository_url contains the repository path as a suffix.
    Expected URL format: https://gitserver/organisation/tree/main/path
    where 'path' should match the repository's filesystem path.
    """
    path = PurePosixPath(realized_repository.real_path)
    remote_repository_url = realized_repository.config.get("remote_repository_url", "").rstrip(" /")
    
    if not remote_repository_url:
        return  # No URL to validate
    
    # Get path parts (segments) excluding empty strings
    path_parts = path.parts
    
    # Check if URL ends with a reasonable portion of the path
    # Look for at least 2 path segments to avoid false positives
    min_segments = min(2, len(path_parts))
    
    for i in range(len(path_parts) - min_segments + 1):
        suffix = "/".join(path_parts[i:])
        if remote_repository_url.endswith(suffix):
            # Found a match with at least min_segments
            return
    
    # If no match found, issue warning
    lint_ctx.warn(
        f"remote_repository_url may be incorrect: expected it to end with "
        f"repository path '{path}' or a significant portion of it"
    )

Collaborator Author

Logic Error: The condition checks if characters match, but path[i:] captures everything from position i to the end, which grows longer as i becomes more negative. This doesn't correctly identify the longest common suffix.

This is why I'm not yet convinced by AI :) Of course, checking equality of the last, second-to-last, third-to-last ... characters determines the longest common suffix. Even though efficiency is not relevant here, note that this is also more efficient than repeatedly constructing candidate suffixes and comparing them (O(n) vs. O(n^2)) ... but I should move longest_common_suffix = path[i:] to the else branch :)
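
For readers following along, here is a minimal sketch of the character-wise idea described above. It is not the code from this PR: the helper name `longest_common_suffix` and the example path and URL in the note below are assumptions for illustration only.

```python
def longest_common_suffix(a: str, b: str) -> str:
    """Return the longest string that both a and b end with."""
    n = 0  # number of trailing characters known to match
    limit = min(len(a), len(b))
    # Compare the last, second-to-last, ... characters until they differ.
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    # Slice once at the end; a[len(a) - n:] is "" when n == 0.
    return a[len(a) - n:]
```

With the hypothetical inputs "/home/user/tools-iuc/tools/bwa" and "https://github.com/galaxyproject/tools-iuc/tree/main/tools/bwa" this returns "/tools/bwa". It also shows the weakness raised in the review: "/repos/tool" and "https://example.org/school" share the suffix "ool", which is not a meaningful path match.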

String Comparison Confusion: ...
Unclear Purpose: ...
Weak Validation: ...

This is why I still make use of it: checking for the longest common suffix of path segments is indeed a better idea.
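
A rough sketch of what a segment-wise comparison could look like, assuming the URL's path component is what should be compared. The helper name `common_suffix_segments` is hypothetical and this is not the implementation that was merged.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def common_suffix_segments(path: str, url: str) -> list[str]:
    """Return the trailing path segments shared by a local path and a URL."""
    # Drop the root marker so only real directory names are compared.
    path_parts = [p for p in PurePosixPath(path).parts if p != "/"]
    # Only the path component of the URL is relevant; skip empty segments
    # produced by leading or trailing slashes.
    url_parts = [p for p in urlparse(url).path.split("/") if p]
    common = []
    # Walk both segment lists from the end and stop at the first mismatch.
    for p, u in zip(reversed(path_parts), reversed(url_parts)):
        if p != u:
            break
        common.append(p)
    common.reverse()
    return common
```

Because whole segments are compared, "/repos/tool" and "https://example.org/school" no longer match at all, while "/home/user/tools-iuc/tools/bwa" against a URL ending in "/tree/main/tools/bwa" yields ["tools", "bwa"]. A lint check could then warn when the returned list is empty.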
