Implementing a solution

John Stevenson edited this page Oct 8, 2021 · 6 revisions

There is no complete solution to argument-escaping on Windows, particularly when cmd.exe is involved. The best that can be achieved is a robust compromise that can handle most cases without introducing a set of complex rules.

Windows command-line

From How Windows parses the command-line it is clear that the only character that might cause unexpected results is a double-quote. So we need a convention for handling these and arguments in general:

  • The argument is treated as unescaped.
  • Any double-quotes in an argument are escaped as literal double-quotes.
  • An argument will not be enclosed in double-quotes unless absolutely necessary.

This will avoid inconsistencies when handling consecutive double-quotes and enable each argument to be included in a command-line without it affecting other items. It will also prevent double-quotes breaking batch scripts.

Outline

Having defined our convention, the steps to escape an argument are simply:

  1. Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
  2. If a space or tab character is found, or the argument is empty:
    1. double up trailing backslashes.
    2. add surrounding double-quotes.

PHP code

function escapeWin($arg)
{
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);

    if (strpbrk($arg, " \t") !== false || $arg === '') {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }
    return $arg;
}
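
To check that an escaped argument survives the trip, it can help to decode it the way the Windows C runtime would. The following is a minimal sketch only: parseWinArgs is a hypothetical name, it models just the backslash and double-quote rules referred to above, and it deliberately ignores the special handling of consecutive double-quotes and of the program name.

```php
<?php
// Simplified model of how an argument string is split back into arguments.
// Assumption: consecutive double-quotes get no special treatment here.
function parseWinArgs(string $commandLine): array
{
    $args = [];
    $current = '';
    $inQuotes = false;
    $hasArg = false; // true once the current argument has started
    $length = strlen($commandLine);

    for ($i = 0; $i < $length; $i++) {
        $char = $commandLine[$i];

        if ($char === '\\') {
            // Count the run of backslashes and look at what follows it.
            $count = strspn($commandLine, '\\', $i);
            $next = $commandLine[$i + $count] ?? '';

            if ($next === '"') {
                // 2n backslashes + quote: n backslashes, quote is a delimiter.
                // 2n+1 backslashes + quote: n backslashes, quote is literal.
                $current .= str_repeat('\\', intdiv($count, 2));
                if ($count % 2 === 1) {
                    $current .= '"';
                } else {
                    $inQuotes = !$inQuotes;
                }
                $i += $count; // skip the backslashes and the quote
            } else {
                // Backslashes not followed by a double-quote are literal.
                $current .= str_repeat('\\', $count);
                $i += $count - 1;
            }
            $hasArg = true;
        } elseif ($char === '"') {
            $inQuotes = !$inQuotes;
            $hasArg = true; // "" yields an empty argument
        } elseif (($char === ' ' || $char === "\t") && !$inQuotes) {
            if ($hasArg) {
                $args[] = $current;
                $current = '';
                $hasArg = false;
            }
        } else {
            $current .= $char;
            $hasArg = true;
        }
    }

    if ($hasArg) {
        $args[] = $current;
    }
    return $args;
}

// The escaped form of colors="red & blue" decodes to the original value.
echo parseWinArgs('"colors=\\"red & blue\\""')[0], PHP_EOL; // colors="red & blue"
```

Doubling the trailing backslashes matters for the same reason: in "C:\my dir\\" the final pair decodes to a single backslash and leaves the closing double-quote as a delimiter.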

Incorporating cmd.exe

From How cmd.exe parses a command we know that meta characters have a special meaning. How we deal with these is split into the following sections:

Double-quotes

From the point of view of cmd, all double-quotes either start or end a quoted-string, regardless of whether they are backslash-escaped. This could have unexpected consequences if there is an odd number of double quotes, or in other situations.

For example, the argument colors="red & blue" would be escaped as:

"colors=\"red & blue\""

However, the & character is no longer protected by the opening double-quote, because the quoted-string has been closed by the first literal (backslash-escaped) double-quote. The result is that the argument is split by the & character and cmd tries to call a program named blue\"".

The only way to solve this is to caret-escape the whole argument, which in this case would be ^"colors=\^"red ^& blue\^"^".
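
The quoted-string problem can be illustrated with a rough model of how cmd tracks its state. This is a sketch under stated assumptions, not a full cmd parser: findUnprotectedMeta is a hypothetical helper, every double-quote toggles the quoted-string state, and a caret outside a quoted-string escapes the next character.

```php
<?php
// Returns the offsets of meta characters that this simplified model
// of cmd would treat as special (i.e. not quoted and not caret-escaped).
function findUnprotectedMeta(string $commandLine): array
{
    $offsets = [];
    $inQuotes = false;
    $length = strlen($commandLine);

    for ($i = 0; $i < $length; $i++) {
        $char = $commandLine[$i];

        if ($inQuotes) {
            // Inside a quoted-string only a double-quote is significant,
            // and a preceding backslash makes no difference to cmd.
            if ($char === '"') {
                $inQuotes = false;
            }
            continue;
        }

        if ($char === '^') {
            $i++; // the next character is escaped, so skip over it
        } elseif ($char === '"') {
            $inQuotes = true;
        } elseif (strpos('&|<>()', $char) !== false) {
            $offsets[] = $i;
        }
    }
    return $offsets;
}

// The backslash-escaped form leaves the & exposed ...
var_dump(count(findUnprotectedMeta('"colors=\\"red & blue\\""')) > 0);      // bool(true)
// ... while the fully caret-escaped form exposes nothing.
var_dump(count(findUnprotectedMeta('^"colors=\\^"red ^& blue\\^"^"')) > 0); // bool(false)
```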

Variable expansion

Environment variable expansion is triggered by the %...% and !...! syntax, regardless of the quoted-string state. Therefore we need to caret-escape the whole argument.

However, we cannot do this for exclamation-marks. These require an escape sequence of two carets ^^!, due to the two step parsing that cmd performs, and we have no way of knowing the DelayedExpansion state (other than that it is disabled by default):

  • If enabled, an escaped ^^!var^^! will be transformed to !var! as intended.
  • If disabled, an escaped ^^!var^^! will be transformed to ^!var^! and introduce two unintended carets.
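
The two outcomes above can be reproduced with a small model of the caret handling. This is only a sketch: stripCaretLevel and simulateCmdCarets are hypothetical names, each parsing step is assumed to remove exactly one level of caret-escaping, and no variable expansion is performed.

```php
<?php
// Removes one level of caret-escaping: each caret is dropped and the
// character that follows it is kept as a literal.
function stripCaretLevel(string $text): string
{
    return preg_replace('/\^(.)/s', '$1', $text);
}

// Models the two parsing steps: the first always runs, the second is
// assumed to run only when DelayedExpansion is enabled.
function simulateCmdCarets(string $text, bool $delayedExpansion): string
{
    $text = stripCaretLevel($text);
    if ($delayedExpansion) {
        $text = stripCaretLevel($text);
    }
    return $text;
}

echo simulateCmdCarets('^^!var^^!', true), PHP_EOL;  // !var!
echo simulateCmdCarets('^^!var^^!', false), PHP_EOL; // ^!var^!
```

Because the escaping code cannot tell which of the two branches will apply, neither one caret nor two gives a correct result in both states.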

Other meta characters

These are the characters that have not yet been accounted for: ^ & | < > ( )

Since they have no special meaning inside a quoted-string (and we know there are no double-quotes to confuse the quoted-string state) we have two choices:

  1. Do nothing if there is whitespace in the argument (because these meta characters will be escaped by the enclosing double-quotes).
  2. Enclose the argument in double-quotes if it contains any of these meta characters.

Note that we do not use caret-escaping in case we come up against its single limitation.

Meta escaping rules

We can condense the above into the following rules:

  • If an argument contains double-quotes or %...% syntax, the transformed argument must be caret-escaped.
  • Otherwise, if it does not contain whitespace but does contain meta characters, it will be enclosed in double-quotes.
  • The ! meta character is not escaped because it cannot be handled reliably.

Outline

We need to set the following flags:

  1. Set quote to true if a space or tab character is found, or the argument is empty.
  2. Set dquotes to true if a double-quote character is found.
  3. Set meta to true if dquotes is true or two % characters surround other characters.
    • We need to caret-escape everything, including any enclosing double-quotes.
  4. If meta and quote are false, set quote to true if any ^ & | < > ( ) characters are found.
    • We can safely escape these characters using the surrounding double-quotes.

Now we can perform the escaping:

  1. If dquotes is true:
    • Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
  2. If quote is true:
    • double up trailing backslashes.
    • add surrounding double-quotes.
  3. If meta is true:
    • escape all " ^ & | < > ( ) % characters with a caret ^.

PHP code

function escapeCmdExe($arg)
{
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    if (!$meta && !$quote) {
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }

    if ($dquotes) {
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }

    if ($quote) {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    if ($meta) {
        $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
    }
    return $arg;
}