Namespace: explicit parsing errors #455
Open: ppkarwasz wants to merge 5 commits into package-url:main from ppkarwasz:feat/path-parser-namespace
Commits:
- aca3716 Namespace: explicit parsing errors (ppkarwasz)
- 7a97932 Replace `Signal` with `Report` (ppkarwasz)
- 58baca8 Merge branch 'main' into feat/path-parser-namespace (ppkarwasz)
- 9bd2568 Replace `solidus` -> `slash` (ppkarwasz)
- b573897 Merge remote-tracking branch 'package-url/main' into feat/path-parser… (ppkarwasz)
Just for clarity: what is the purpose of this rule, and of reporting an error?
I'm thinking of a couple of things when I read this.
The main reason to report an error here is to protect consumers of PURLs from malformed or malicious input. By error I mean whatever the parser uses to signal a problem (exception, exit status, etc.).
Historically, many vulnerabilities in HTTP servers came from path traversal attacks where characters like `.` or `/` were smuggled in through alternative encodings (e.g. `.` as `%2E`, `%C0%AE`, `%E0%80%AE`, `%F0%80%80%AE`; `/` as `%2F`, `%C0%AF`, `%E0%80%AF`, `%F0%80%80%AF`). Allowing these in PURLs would create ambiguity and open the door to similar exploits.

In the PURL spec today:
- Empty segments (`//`) are not meaningful, but are often an honest mistake in producers. The current parser recommendation is to normalize them away rather than fail.
- A segment containing a slash (`/`) is never valid. Since the parse process splits on `/` before decoding, any literal `/` inside a segment must have been hidden behind percent-encoding, which is a strong signal of an attempt to “escape” the namespace. In this case, failing fast and surfacing an error is IMHO the safer and clearer choice.

This distinction matters because some ecosystems map PURLs directly to URLs. A PURL that hides a `/` behind percent-encoding in its namespace could trick a consumer into resolving `bar/artifact` instead of `foo/artifact` if the parser silently accepts it. By requiring an error, the spec prevents that entire class of misinterpretation.
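The two cases described in this comment (drop empty segments, but fail on a slash that only appears after decoding) can be sketched as follows. This is a minimal illustration, not the reference parser: `parse_namespace` and `NamespaceError` are hypothetical names, and rejecting invalid or overlong UTF-8 via strict decoding is an assumption about how an implementation might handle the alternative encodings listed above.

```python
from urllib.parse import unquote_to_bytes


class NamespaceError(ValueError):
    """Hypothetical error type used to report a malformed namespace."""


def parse_namespace(raw: str) -> list[str]:
    """Split a raw namespace on '/' first, then decode each segment."""
    segments = []
    for raw_segment in raw.split("/"):  # split BEFORE percent-decoding
        if not raw_segment:
            continue  # empty segment ("//"): normalized away, not an error
        try:
            # Strict UTF-8 decoding rejects overlong forms such as %C0%AF.
            segment = unquote_to_bytes(raw_segment).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise NamespaceError(
                f"invalid UTF-8 in namespace segment {raw_segment!r}"
            ) from exc
        if "/" in segment:
            # A slash can only get here via percent-encoding (e.g. %2F).
            raise NamespaceError(
                f"encoded slash in namespace segment {raw_segment!r}"
            )
        segments.append(segment)
    return segments
```

With this sketch, `parse_namespace("foo//bar")` returns `["foo", "bar"]`, while `parse_namespace("foo%2Fbar")` raises `NamespaceError` instead of silently producing a two-level namespace.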
Namespaces should not contain `%2F` because of the way PURL performs encoding and decoding, not because it is an illegal filename character on some operating systems. If you read the PURL `pkg:generic/a%2Fb/c`, the `a%2Fb` cannot be represented and turns into `a/b` while parsing.

However, I don't think it really matters for namespaces and it's probably better not to do this. I guess this avoids potentially confusing parses like `pkg:generic/a%2Fb/c%2Fd/e%2Ff` having a namespace `a/b/c/d` and name `e/f`, and the unnecessary edge case about empty segments created by a previous rule about namespaces. I still believe that namespaces are a mistake that needs to be fixed by treating the part between the type and the version as an opaque path string, similar to how it works in URL, with the meaning defined by the package type. This new rule may be a step in the wrong direction because it forbids certain character sequences from being in that segment in a convoluted way. For example, `pkg:golang` has no namespace but the name often contains slashes, so if namespaces are eliminated then the path would still need to be something like `github.com%2Fpackage-url%2Fexample` for compatibility with namespace+name implementations, which would be allowed because there are no `%2F` characters followed by an unencoded `/` character. However, in some other package type (maybe `pkg:swid`), `Acme A%2FB/Widgets` would need to be forbidden because parsers implementing this proposed addition to the spec would see the `%2F` as being an illegal namespace character, making it complicated to deal with company names ending in "A/B".

Maybe it's too broken already and fixing namespaces would need to wait for a `pkg2` that follows URL parsing semantics, parsing only from the left and not trying to apply special meaning to the path strings used by different backends.
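The confusing parse mentioned in this comment can be reproduced with a small sketch. It assumes a lenient parser that splits the path on `/`, decodes each segment, and stores the namespace as a single `/`-joined string; `lenient_parse` and `serialize` are hypothetical helpers written for illustration, not functions from any PURL library.

```python
from urllib.parse import quote, unquote


def lenient_parse(path: str) -> tuple[str, str]:
    """Split on '/', treat the last segment as the name, decode everything."""
    *ns_segments, name = path.split("/")
    namespace = "/".join(unquote(s) for s in ns_segments if s)
    return namespace, unquote(name)


def serialize(namespace: str, name: str) -> str:
    """Re-encode: namespace segments are split on '/' and encoded one by one."""
    ns = "/".join(quote(s, safe="") for s in namespace.split("/") if s)
    return f"{ns}/{quote(name, safe='')}"


namespace, name = lenient_parse("a%2Fb/c%2Fd/e%2Ff")
print(namespace, name)             # a/b/c/d e/f
print(serialize(namespace, name))  # a/b/c/d/e%2Ff
```

The input had two namespace segments, but after one parse and serialize cycle it has four: the `%2F` inside the namespace is lost, while the one inside the name survives.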
Unlike subpaths, namespaces are not paths, and should not be blanket sanitized as if they are paths.
Per the current spec, namespace segments MUST NOT contain `%2F` anyway (see the grammar in #578), and if they did, the whole thing would not be a valid PURL. So far, there is no rule about what to do when a forbidden character occurs.

This is what this PR tries to fix: it adds a rule requiring the parser to report the error and fail parsing altogether. (In this case, Postel's Law must not be applied: fail and report, with no "try to fix it" approach.)