How Windows parses the command line
In Windows the system does not automatically split the command-line into an array of arguments:
- If the program uses the C/C++ runtime there will be an argv array and/or an __argv global variable. The compiler inserts the code to automatically create this when the program starts.
- Otherwise the program must specifically call CommandLineToArgvW with the value returned from GetCommandLine.
- Alternatively the program may use its own parsing function.
This section describes the C/C++ runtime and CommandLineToArgvW implementations, which are broadly similar.
By convention the module name, or the program to run, is the first argument in the command-line and this will be placed in argv[0]. Note that this is only a convention and is not enforced in any way. The C/C++ runtime and CommandLineToArgvW behave quite differently for this element.
In the C/C++ runtime, the only characters that have any special meaning when parsing argv[0] are:
- double-quote " (U+0022)
- whitespace, which is either a space (U+0020) or a tab (U+0009)
The following rules apply:
- A double-quote signifies the start or end of a quoted-string.
- Whitespace inside a quoted-string is treated as literal whitespace and is copied to argv[0].
- Whitespace outside a quoted-string signifies the end of argv[0].
- Any other character is copied to argv[0].
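The quote-toggling rules directly above can be sketched in a few lines of Python. This is a minimal model, assuming those rules describe the C/C++ runtime; the function name parse_argv0_crt is hypothetical and is not part of any Windows or runtime API.

```python
def parse_argv0_crt(cmdline):
    """Return (argv0, rest) using the quote-toggling rules (illustrative only)."""
    out = []
    in_quotes = False
    i = 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == '"':
            # A double-quote starts or ends a quoted-string and is not copied.
            in_quotes = not in_quotes
        elif ch in " \t" and not in_quotes:
            # Whitespace outside a quoted-string ends argv[0].
            break
        else:
            # Everything else, including quoted whitespace, is copied.
            out.append(ch)
        i += 1
    return "".join(out), cmdline[i:]
```

With this sketch, a quote in the middle of the program name still toggles the quoted-string, so my" "app.exe x yields my app.exe as argv[0].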
In CommandLineToArgvW, the only characters that have any special meaning when parsing argv[0] are:
- double-quote " (U+0022)
- whitespace, which is any character in the range U+0001 to U+0020
The following rules apply:
- If the first character is a double-quote then all subsequent characters (including whitespace) are copied to argv[0] up until another double-quote (or the end of the command-line).
- If the first character is not a double-quote then all characters (including double-quotes) are copied to argv[0] up until a whitespace character (or the end of the command-line).
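The two first-character rules directly above can be modelled as follows. This is a sketch under the assumption that those rules describe CommandLineToArgvW; the name parse_argv0_shell is hypothetical, and the code does not call the real Windows function.

```python
def parse_argv0_shell(cmdline):
    """Return (argv0, rest) using the first-character rules (illustrative only)."""
    if cmdline.startswith('"'):
        # Copy everything up to the next double-quote or the end of the line.
        end = cmdline.find('"', 1)
        if end == -1:
            return cmdline[1:], ""
        return cmdline[1:end], cmdline[end + 1:]
    for i, ch in enumerate(cmdline):
        # Whitespace here is any character from U+0001 to U+0020.
        if "\u0001" <= ch <= "\u0020":
            return cmdline[:i], cmdline[i:]
    return cmdline, ""
```

Because only the first character decides the mode, a later double-quote is copied literally: my" "app.exe x yields my" as argv[0], with the remainder left for normal argument parsing.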
For the remaining arguments, the C/C++ runtime and CommandLineToArgvW behave identically, except when handling consecutive double-quotes.
The only characters that have any special meaning are:
- double-quote " (U+0022)
- whitespace, which is either a space (U+0020) or a tab (U+0009)
- backslash \ (U+005C)
The role of these characters and whether they are copied to the argv array is explained below:
- A double-quote preceded by a backslash \" is treated as a literal double-quote character " and is placed in the argv item. In this instance the preceding backslash escapes the double-quote.
- A double-quote not preceded by a backslash generally signifies the start or end of a quoted-string. See the rules for consecutive double-quotes below.
- Whitespace inside a quoted-string is treated as literal whitespace and is copied to the argv item.
- Whitespace outside a quoted-string acts as an argument delimiter.
- Backslashes not followed by a double-quote are interpreted literally and copied to the argv item.
- If an even number of backslashes is followed by a double-quote, one backslash is placed in the argv item for every pair of backslashes, and the double-quote generally signifies the start or end of a quoted-string. See the rules for consecutive double-quotes below.
- If an odd number of backslashes is followed by a double-quote, one backslash is placed in the argv item for every pair of backslashes, and the double-quote is escaped by the remaining backslash, causing a literal double-quote character " to be placed in the argv item.
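The even/odd backslash rule is easy to get wrong, so a small helper may make it concrete. This is only an illustration of the arithmetic; the function name backslashes_before_quote is hypothetical.

```python
def backslashes_before_quote(n):
    """For n backslashes immediately followed by a double-quote, return
    (text copied to the argv item, True if the quote is a literal character)."""
    quote_is_literal = n % 2 == 1          # an odd count escapes the quote
    copied = "\\" * (n // 2)               # one backslash per pair
    if quote_is_literal:
        copied += '"'
    return copied, quote_is_literal
```

For example, two backslashes before a double-quote copy one backslash and leave the quote as a delimiter, while three copy one backslash followed by a literal double-quote.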
Consecutive double-quotes are handled as follows:
- Inside a quoted-string, consecutive double-quotes "" cause a single, literal double-quote character " to be placed in the argv item and the quoted-string continues. This is the C/C++ runtime behavior. CommandLineToArgvW (and the C/C++ runtime prior to Visual Studio 2008) behaves differently by ending the quoted-string.
- Outside a quoted-string, consecutive double-quotes simply toggle the quoted-string as having started then ended, with nothing added to the argv item.
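The rules for the remaining arguments can be combined into one rough parser. This is a sketch, not the actual runtime or CommandLineToArgvW code. The name split_args and the crt_style flag are hypothetical. Two points are assumptions rather than statements from the text above: with crt_style=False the literal double-quote is still emitted before the quoted-string ends, and an empty quoted-string still produces an empty argument.

```python
def split_args(rest, crt_style=True):
    """Split the text after argv[0] into arguments (illustrative only)."""
    args, cur = [], []
    in_quotes = False
    have_arg = False   # assumption: "" on its own still yields an empty argument
    i, n = 0, len(rest)
    while i < n:
        ch = rest[i]
        if ch == "\\":
            j = i
            while j < n and rest[j] == "\\":
                j += 1
            count = j - i
            if j < n and rest[j] == '"':
                cur.append("\\" * (count // 2))   # one backslash per pair
                if count % 2 == 1:
                    cur.append('"')               # odd count: escaped, literal quote
                    j += 1
                # even count: the quote is handled on the next iteration
            else:
                cur.append("\\" * count)          # not before a quote: all literal
            have_arg = True
            i = j
        elif ch == '"':
            if in_quotes and i + 1 < n and rest[i + 1] == '"':
                cur.append('"')                   # "" inside a quoted-string
                i += 2
                if not crt_style:
                    in_quotes = False             # assumed legacy behavior
            else:
                in_quotes = not in_quotes         # start or end a quoted-string
                i += 1
            have_arg = True
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args
```

Under these assumptions, the input "a""b c" gives the single argument a"b c with crt_style=True, and the two arguments a"b and c with crt_style=False, which illustrates why the same command line can split differently depending on which parser a program uses.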