Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape all literal dollar signs #3

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

BolunThompson
Copy link

@BolunThompson BolunThompson commented Nov 22, 2024

In principle, should resolve binpash/pash#664. If c is a literal dollar sign, it should be parsed as one (with no need to account for escaping & variable interpolation). Yet, this fix is suspiciously simple — is there an edge case here?

The tests for PaSh still pass with this change.

Signed-off-by: Bolun Thompson <[email protected]>
@angelhof
Copy link
Member

Wow, that is indeed suspiciously simple, lol. Does it work for this problematic case (binpash/pash#664 (comment))? Also, as an experiment, what does the parsed AST of this example look like?

I am wondering why we need to diverge from the original libdash pretty printer though (https://github.com/binpash/libdash/blob/master/libdash/printer.py#L412). That printer seems to pass their roundtrip tests (https://github.com/binpash/libdash/tree/master/test). Could we somehow manage to run the roundtrip libdash tests here too? Is this issue exposed anywhere in the libdash tests? If not, should we add a test that exposes it?

@mgree thoughts on this? Am I missing something?

@angelhof
Copy link
Member

One more thought: I guess PaSh's usecase is special (and therefore maybe easier to handle) because when we call the pretty printer it is after the compiler is done and therefore everything has been expanded already, so any spare $ must be escaped?

@mgree
Copy link

mgree commented Nov 22, 2024

No, this looks like the correct printer fix. The libdash printer treats this correctly:

>>> import libdash
>>> cmds = list(libdash.parse('f'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['E', 36], ['C', 120]]]]], []]], 'echo "\\$x"\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "\\$x"'

I don't know how the shasta printer diverged from the libdash one...

@angelhof
Copy link
Member

angelhof commented Nov 23, 2024

I don't think the two printers diverge. I am seeing different behavior @mgree (after making libdash from source):

$ cat ../../../../../f
echo '$1'
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['C', 36], ['C', 49]]]]], []]], "echo '$1'\n", 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "$1"'
>>>

Maybe you had f be echo "\$x"? Like the one below?

$ cat f2 
echo "\$x"
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f2'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['E', 36], ['C', 120]]]]], []]], 'echo "\\$x"\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "\\$x"'
>>>

The issue here seems to be with the single quote, and the fact that $ in the single quote is added as C, even if followed by something. However, the fix that @BolunThompson proposed seems to lead to an issue with the following case (where $ is standalone):

$ cat f4
echo $ a
## This is with the fix!
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f4'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['C', 36]], [['C', 97]]], []]], 'echo $ a\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo \\$ a'

Having the fix, it now always escapes the character $, but if we have a word that only contains $, this is not correct.

Maybe the correct fix is to only escape the (C, $) on the printer if it is followed by something that could be interpreted as a variable by the shell (not only escaped characters as it is currently implemented, but any character). Is there any followup character to $ in which case we don't want to escape $ @mgree ?

A larger issue here IMO is that we have two separate printer implementations. One in libdash and one in shasta (without any tests). My suggestion would be to:
(1) remove libdash's printer and import shasta's printer if we want printer functionality in libdash (we could also remove printing in shasta's printer, but it feels like a better place to have the printer).
(2) add tests in shasta (both locally and in CI), that import libdash, and run libdash's tests using shasta's printer.
(3) add a few tests with corner cases with $ (at least the ones I wrote in this comment) to make sure we won't regress in the future. Maybe there are some failing libdash tests now that fail due to this?

Also, it seems that libdash's tests fail to catch this issue, because they do two roundtrips (and in the second and third this is interpreted as a variable):

$ cat ../test/failing/single-quote.sh
echo '$a'
echo $ a
$ ../test/round_trip.sh ./rt.py ../test/failing/single-quote.sh
PASS '../test/failing/single-quote.sh' (two runs to fixpoint)
## this is the output of `echo "$orig" "$rt" "$rtrt"`
/tmp/tmp.rgHMVGpsHK /tmp/tmp.y6EkOh0NEJ /tmp/tmp.6cK85M8VJb
$ cat /tmp/tmp.rgHMVGpsHK /tmp/tmp.y6EkOh0NEJ /tmp/tmp.6cK85M8VJb
echo "$a"
echo $ a

echo "${a}"
echo $ a

echo "${a}"
echo $ a

@mgree
Copy link

mgree commented Nov 25, 2024

Huh. My f was different, writing echo "\$x". I agree that C '$' should be escaped.

@angelhof
Copy link
Member

So I think the plan of action (to complete this PR) is the following. @BolunThompson let me know if you have any questions:

  1. Add tests in shasta that import and build the latest libdash and run its tests with the shasta printer.
  2. Add one new test (ideally in libdash too so that we have all tests at the same place):
echo '$1'
echo $ a

This test should not be allowed to run two roundtrips to pass, because the bug will be missed.
3. Do the fix that only escapes $ if it is followed by another character (it is possible down the line that we identify an even finer-grained behavior here, but no need to fix it prematurely).
4. Later, and in a separate PR, we can consider removing libdash's python printer completely and just use shasta's printer. There is no need to keep two printer implementations around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

awk argument issues
3 participants