Implementing a solution
There is no complete solution to argument-escaping on Windows, particularly when cmd.exe is involved. The best that can be achieved is a robust compromise that can handle most cases without introducing a set of complex rules.
From How Windows parses the command-line it is clear that the only character that might cause unexpected results is a double-quote. So we need a convention for handling these and arguments in general:
- The argument is treated as unescaped.
- Any double-quotes in an argument are escaped as literal double-quotes.
- An argument will not be enclosed in double-quotes unless absolutely necessary.
This will avoid inconsistencies when handling consecutive double-quotes and enable each argument to be included in a command-line without it affecting other items. It will also prevent double-quotes breaking batch scripts.
Having defined our convention, the steps to escape an argument are simply:
- Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
- If a space or tab character is found, or the argument is empty:
  - double up trailing backslashes.
  - add surrounding double-quotes.
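function escapeWin($arg)
{
    // Double any backslashes that precede a double-quote, then escape the quote.
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);

    // Whitespace or an empty argument requires enclosing double-quotes.
    if (strpbrk($arg, " \t") !== false || $arg === '') {
        // Double trailing backslashes so they cannot escape the closing quote.
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    return $arg;
}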
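To see the convention in action, the following sketch runs escapeWin over a handful of sample arguments. The sample values are hypothetical and chosen only to exercise each rule once; the function body is repeated so that the snippet runs on its own.

```php
<?php
// Same logic as escapeWin, repeated here so the snippet is self-contained.
function escapeWin($arg)
{
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    if (strpbrk($arg, " \t") !== false || $arg === '') {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }
    return $arg;
}

// Hypothetical sample arguments, one per rule.
$samples = array(
    'simple',          // nothing to do:            simple
    '',                // empty argument:           ""
    'two words',       // whitespace:               "two words"
    'say "hi"',        // embedded double-quotes:   "say \"hi\""
    'C:\\dir name\\',  // trailing backslash:       "C:\dir name\\"
);

foreach ($samples as $sample) {
    echo escapeWin($sample), PHP_EOL;
}
```

Note that an argument is only enclosed in double-quotes when it contains whitespace or is empty, which matches the third point of the convention.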
From How cmd.exe parses a command we know that meta characters have a special meaning. How we deal with these is split into the following sections:
From the point of view of cmd, all double-quotes either start or end a quoted-string, regardless of whether they are backslash-escaped. This could have unexpected consequences if there is an odd number of double quotes, or in other situations.
For example, the argument colors="red & blue" would be escaped as:
"colors=\"red & blue\""
However the & character is no longer protected by the opening double-quote, because the quoted-string has been closed by the first literal (backslash-escaped) double-quote. The result is that the argument is split at the & character and cmd tries to call a program named blue\""
The only way to solve this is to caret-escape the whole argument, which in this case would be:
^"colors=\^"red ^& blue\^"^"
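The quoted-string problem can be illustrated with a small scanner. This is a deliberately simplified model of how cmd tracks quoted-string state, not a reimplementation of its parser: it assumes that every double-quote toggles the state, that a caret outside a quoted-string escapes the next character, and that a caret inside a quoted-string is literal. The function name unprotectedMeta is hypothetical.

```php
<?php
// Hypothetical helper: returns the meta characters that cmd would act on,
// under the simplified quoting model described in the lead-in.
function unprotectedMeta($commandLine)
{
    $inQuotes = false;
    $found = array();
    $len = strlen($commandLine);

    for ($i = 0; $i < $len; $i++) {
        $ch = $commandLine[$i];

        if ($ch === '^' && !$inQuotes) {
            $i++; // the caret escapes the next character, so skip over it
            continue;
        }
        if ($ch === '"') {
            // Backslashes mean nothing to cmd: every double-quote toggles.
            $inQuotes = !$inQuotes;
            continue;
        }
        if (!$inQuotes && strpos('&|<>()', $ch) !== false) {
            $found[] = $ch;
        }
    }

    return $found;
}

// Backslash-escaped only: the & ends up outside the quoted-string.
print_r(unprotectedMeta('"colors=\\"red & blue\\""'));

// Caret-escaped: no meta character is left exposed.
print_r(unprotectedMeta('^"colors=\\^"red ^& blue\\^"^"'));
```

Under this model the first call reports the & character as exposed, while the second reports nothing, which is why the whole argument has to be caret-escaped.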
Environment variable expansion is triggered by the %...% and !...! syntax, regardless of the quoted-string state. Therefore we need to caret-escape the whole argument.
However we cannot do this for exclamation-marks. These require an escape sequence of two carets ^^!, due to the two step parsing that cmd performs, and we have no way of knowing the DelayedExpansion state (other than it is disabled by default):
- If enabled, an escaped ^^!var^^! will be transformed to !var! as intended.
- If disabled, an escaped ^^!var^^! will be transformed to ^!var^! and introduce two unintended carets.
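The two outcomes can be reproduced with a rough model. It assumes that cmd removes one level of carets in its normal pass and, when DelayedExpansion is enabled, removes a second level, which is how the two step parsing is described here. The function names stripCaretPass and modelCmd are hypothetical, and the model ignores quoted-strings and actual variable lookup.

```php
<?php
// Simplified model (an assumption, not cmd's real parser): remove one level
// of caret-escaping from text that lies outside any quoted-string.
function stripCaretPass($text)
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        if ($text[$i] === '^' && $i + 1 < $len) {
            $i++; // drop the caret and keep the character it escapes
        }
        $out .= $text[$i];
    }
    return $out;
}

// Assumption: enabling DelayedExpansion adds a second caret-removal pass.
function modelCmd($text, $delayedExpansion)
{
    $text = stripCaretPass($text);
    return $delayedExpansion ? stripCaretPass($text) : $text;
}

echo modelCmd('^^!var^^!', true), PHP_EOL;  // !var!
echo modelCmd('^^!var^^!', false), PHP_EOL; // ^!var^!
```

Because the same escaped input gives different results depending on a state we cannot detect, no single escape sequence is correct in both cases.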
These are the characters that have not yet been accounted for: ^ & | < > ( )
Since they have no special meaning inside a quoted-string (and we know there are no double-quotes to confuse the quoted-string state) we have two choices:
- Do nothing if there is whitespace in the argument (because these meta characters will be escaped by the enclosing double-quotes).
- Enclose the argument in double-quotes if it contains any of these meta characters.
Note that we do not use caret-escaping in case we come up against its single limitation.
We can condense the above into the following rules:
- If an argument contains double-quotes or %...% syntax, the transformed argument must be caret-escaped.
- Otherwise if it does not contain whitespace but does contain meta characters it will be enclosed in double-quotes.
- The ! meta character is not escaped because it cannot be handled reliably.
We need to set the following flags:
- Set quote to true if a space or tab character is found, or the argument is empty.
- Set dquotes to true if a double-quote character is found.
- Set meta to true if dquotes is true or two % characters surround other characters.
  - We need to caret-escape everything, including any enclosing double-quotes.
- If meta and quote are false, set quote to true if any ^ & | < > ( ) characters are found.
  - We can safely escape these characters using the surrounding double-quotes.
Now we can perform the escaping:
- If dquotes is true:
  - Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
- If quote is true:
  - double up trailing backslashes.
  - add surrounding double-quotes.
- If meta is true:
  - escape all " ^ & | < > ( ) % characters with a caret ^.
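function escapeCmdExe($arg)
{
    // Flags: enclose in double-quotes, contains double-quotes, needs caret-escaping.
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    // Without caret-escaping, protect the remaining meta characters by quoting.
    if (!$meta && !$quote) {
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }

    if ($dquotes) {
        // Double any backslashes that precede a double-quote, then escape the quote.
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }

    if ($quote) {
        // Double trailing backslashes so they cannot escape the closing quote.
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    if ($meta) {
        // Caret-escape everything, including any enclosing double-quotes.
        $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
    }

    return $arg;
}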
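As a quick check of the rules, the sketch below runs escapeCmdExe over a few sample arguments. The samples are hypothetical; the function is repeated so the snippet runs on its own, and it uses the $quote flag consistently in every branch.

```php
<?php
// Same logic as escapeCmdExe, repeated here so the snippet is self-contained.
function escapeCmdExe($arg)
{
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    if (!$meta && !$quote) {
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }
    if ($dquotes) {
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }
    if ($quote) {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }
    if ($meta) {
        $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
    }
    return $arg;
}

// Hypothetical sample arguments, one per rule.
$samples = array(
    'colors="red & blue"', // double-quotes:        ^"colors=\^"red ^& blue\^"^"
    '%PATH%',              // %...% syntax:         ^%PATH^%
    'a&b',                 // meta, no whitespace:  "a&b"
    'two words',           // whitespace only:      "two words"
    '100%',                // a single % is inert:  100%
    'hello!',              // ! is left untouched:  hello!
);

foreach ($samples as $sample) {
    echo escapeCmdExe($sample), PHP_EOL;
}
```

The first sample reproduces the caret-escaped form of the colors example, and the last shows that exclamation-marks pass through unchanged, as the rules require.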