Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escapes for various elements (urls, json, slack payload, directories, commit log contents) #39

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

init-js
Copy link

@init-js init-js commented Oct 5, 2016

This pull request is concerned with escaping and encoding fixes. I've uncovered several problems, trying to set hooks for a cGit web viewer on a gitolite repository system.

  1. Parsing the logs (esp. multiple update changesets):
  • don't use sed for splitting fields. sed doesn't support non-greedy wildcards. this is problematic for sed forms such as 's/foo\(.*\)bar/\1/g'. any "bar" substring would get gobbed up in the group. Alternatives to make the grouping non-greedy, such as [^x]*, introduce ambiguities and negative lookaheads are difficult to maintain. This is an improved version of pull request Fix malformed payload when bundeling multiple commits #36 , Fix broken multiline commit messages and multiple commits #27 (which didn't fix the issue completely).
  • The patches here replace the shell pipeline with a bash tokenizer and a nonce as the field separator. The random separator makes false positives extremely unlikely, the code is easy to read, and the template straightforward to extend.
  • The fields title and value are first extracted, and then inserted into a json string. The previous shell pipeline conflated the two tasks. The resulting code is much easier to extend.
  1. Unsupported characters in URLs:
    • sed "s/%hash%/$url/g" produces unexpected results without escaping $url for special sed-interpreted character commands (like '&'). For these forms, the patch provides workarounds that avoid the need to escape URLs -- but produce the same output.
    • ampersand '&' characters are valid URL query string characters, eg. ?a=b&c=d . the Cgit git viewer uses these forms to display commits ( ?id=abcabc&id2=defdef ). These query strings wreak havoc on the sed substitutions. Users shouldn't have to escape URLs in their config. Fixes issue message garbled with multiple commits in push #24.
  2. Directory path problems:
    • git repo directory names with spaces produce invalid notification messages. the patch provides adequate shell quoting during repo name determination: instances of $(basename $foo) are changed to $(basename "$foo").
  3. JSON encoding problems:
    • Only a subset of special characters were escaped (linefeeds 0x0a, '', and '"') in json strings. Unicode characters should be escaped too. The patch uses a python one-liner to properly escape json (with the json package). It assumes the commit messages from the log are utf-8. Invalid sequences are replaced with unicode replacement char json sequences \ufffd.
    • A comment seemed to indicate that slack did not support newlines in the payload. This is erroneous. The json may contain literal linefeeds (e.g. between array members), but they must be escaped inside json string values.
    • the channel name, emoji, and other string fields of the payload are now also json-encoded.
  4. POST body encoding fixes.
  5. Things not covered.

…eader.

   - if the replacement contains a '&' sed behaves differently. this fix avoids
     feeding URLs on the replacement side of sed commands. query strings may
     choke on it.

     ex.   echo "123" | sed -e 's/2/T&O/g'
           # prints  '1T2O3'

   - sed doesn't support non-greedy .*  tokenization of git-log was broken.
     the command pipeline is replaced with tokenization using bash suffix
     match. easy to follow, fast, and straightforward to maintain.

   - the @,,,,@ and ;,,,,; separators have been replaced with pseudorandom
     boundary strings. safer. 7*15bits of entropy (poorman's uuid).

   - json escapes for just \n \" and \\ was insufficient. using python
     oneliner to escape unicode.
… now.

   URLs containing '&' would trigger 400 invalid_payload slack errors.
   POST body needs to be urlencoded.
@piit79
Copy link

piit79 commented Feb 4, 2017

Any chance this could be merged? I came across the '&' in changeset URL issue which I fixed very crudely (by escaping it in the sed expression). It looks like this fixes a lot of issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants