Ensure special characters in manifest filepaths are urlencoded #35

Merged: whikloj merged 5 commits into main from issue-33 on Feb 22, 2022

whikloj (Owner) commented Feb 17, 2022

Resolves #33

This change:

  • Checks the payload manifests and fetch files of loaded Bags for any URL-encoded characters other than %0D, %0A and %25, and fails validation if it finds one. It also fails if it finds an unencoded newline or carriage return (though this should never happen).
  • Correctly URL-encodes carriage returns, line feeds and percent signs in payload manifests and fetch files written out by this library.
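
The two behaviours above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `encodeManifestPath`, `decodeManifestPath` and `hasInvalidEncoding` are hypothetical, and the rule that only `%`, CR and LF are percent-encoded is taken from the BagIt specification (RFC 8493).

```php
<?php
// Hypothetical helpers for illustration only; not the library's actual API.

/** Percent-encode only "%", CR and LF in a manifest or fetch file path. */
function encodeManifestPath(string $path): string
{
    // strtr() with an array works in a single pass, so a "%" produced by
    // one replacement is never encoded a second time.
    return strtr($path, ['%' => '%25', "\r" => '%0D', "\n" => '%0A']);
}

/** Reverse of encodeManifestPath(); accepts upper- or lower-case hex digits. */
function decodeManifestPath(string $path): string
{
    return strtr($path, [
        '%25' => '%',
        '%0D' => "\r", '%0d' => "\r",
        '%0A' => "\n", '%0a' => "\n",
    ]);
}

/** True if an encoded path should fail validation. */
function hasInvalidEncoding(string $path): bool
{
    // A raw CR or LF must never appear in an encoded path.
    if (strpbrk($path, "\r\n") !== false) {
        return true;
    }
    // Flag any percent-encoded triplet other than %0A, %0D and %25.
    return preg_match('/%(?!0[AD]|25)[0-9A-F]{2}/i', $path) === 1;
}
```

Under these assumptions, a path such as `data/file%20name.txt` is rejected, while `data/a%25b` is accepted and decodes to `data/a%b`.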

whikloj (Owner, Author) commented Feb 17, 2022

@pwinckles not sure if you had a specific test that you had been running that you'd like to retry with this PR. I added some tests but always appreciate more eyes.

pwinckles left a comment

I didn't have any specific tests. I was just eyeballing the libraries to see what they were doing.

Your changes seem good to me.

src/BagUtils.php Outdated
*/
public static function checkUnencodedFilepath(string $filepath): bool
{
return (strpos($filepath, "\n") > -1 || strpos($filepath, "\r") > -1 ||


It's unclear to me what fgets considers a newline, but if you're reading the file line by line as defined by the spec where CR, LF, or CRLF are line endings, then it shouldn't be possible for the filepath to contain a CR or LF at this point.

The encoding check is technically correct. However, it means the code would reject bags created by libraries that haven't been updated to encode paths correctly. I don't know if you're interested in accepting those bags or not though.

whikloj (Owner, Author) replied:

Yeah, I struggled with this, mostly because I don't use PHP on Windows, so I am biased towards LF as the newline character. Looking at it now, I'm not handling CR line endings properly. I'll make a new ticket for that.
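
For reference, one way to treat CR, LF and CRLF uniformly is to read the whole file and split it with a regular expression instead of relying on `fgets`, which by default ends a line at LF. This is only a sketch of the idea discussed here, not the fix tracked in the follow-up ticket, and `splitManifestLines` is a hypothetical name.

```php
<?php
// Hypothetical helper for illustration only; not part of the library.

/**
 * Split manifest text into lines, treating CRLF, CR and LF as line endings.
 *
 * @return string[]
 */
function splitManifestLines(string $contents): array
{
    // CRLF is listed first so it counts as one line ending rather than two.
    // PREG_SPLIT_NO_EMPTY drops blank lines, including the empty piece
    // that follows a trailing line ending.
    return preg_split('/\r\n|\r|\n/', $contents, -1, PREG_SPLIT_NO_EMPTY);
}
```

With lines split this way, a raw CR or LF can no longer appear inside a file path, which matches the reviewer's point that the unencoded-newline check should never trigger.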

whikloj (Owner, Author) replied:

#36

@whikloj whikloj merged commit 683b5bd into main Feb 22, 2022
@whikloj whikloj deleted the issue-33 branch February 22, 2022 19:22
Successfully merging this pull request may close these issues.

Path encoding bug
2 participants