
How Windows parses the command line

John Stevenson edited this page Oct 8, 2021 · 2 revisions

Windows passes a program its command line as a single string; the system does not automatically split it into an array of arguments:

  • If the program uses the C/C++ runtime there will be an argv array and/or an __argv global variable. The runtime's startup code creates this automatically when the program starts.
  • Otherwise the program must specifically call CommandLineToArgvW with the value returned from GetCommandLine.
  • Alternatively the program may use its own parsing function.

This section describes the C/C++ runtime and CommandLineToArgvW implementations, which are broadly similar.

Parsing the module name

By convention the module name, or the program to run, is the first argument in the command-line and this will be placed in argv[0]. Note that this is only a convention and is not enforced in any way. The C/C++ runtime and CommandLineToArgvW behave quite differently for this element.

C/C++ Runtime

The only characters that have any special meaning are:

  • double-quote " (U+0022)
  • whitespace, which is either a space (U+0020) or a tab (U+0009)

The following rules apply:

  • A double-quote signifies the start or end of a quoted-string.
  • Whitespace inside a quoted-string is treated as literal whitespace and is copied to argv[0].
  • Whitespace outside a quoted-string signifies the end of argv[0].
  • Any other character is copied to argv[0].
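
The four rules above can be sketched as a small state machine. This is only an illustrative Python model of the behavior described here, not the runtime's actual code; the function name parse_module_name_crt and the choice to return the unparsed remainder are assumptions made for the example.

```python
def parse_module_name_crt(cmdline: str) -> tuple[str, str]:
    """Model of the C/C++ runtime rules for argv[0] (hypothetical helper).

    Returns (argv0, rest), where rest is the unparsed remainder.
    """
    in_quotes = False
    chars = []
    i = 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == '"':
            # A double-quote starts or ends a quoted-string and is not copied.
            in_quotes = not in_quotes
        elif ch in " \t" and not in_quotes:
            # Whitespace outside a quoted-string ends argv[0].
            break
        else:
            # Everything else, including quoted whitespace, is copied.
            chars.append(ch)
        i += 1
    return "".join(chars), cmdline[i:]
```

Under this model a double-quote in the middle of the module name is dropped rather than copied, so app"1".exe yields app1.exe.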

CommandLineToArgvW

The only characters that have any special meaning are:

  • double-quote " (U+0022)
  • whitespace, which is any character in the range U+0001 to U+0020

The following rules apply:

  • If the first character is a double-quote then all subsequent characters (including whitespace) are copied to argv[0] up until another double-quote (or the end of the command-line).
  • If the first character is not a double-quote then all characters (including double-quotes) are copied to argv[0] up until a whitespace character (or the end of the command-line).
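
As a rough illustration, the two rules above can be modeled in Python as follows. The function name parse_module_name_cltaw and the returned remainder are assumptions for this sketch, and it does not attempt to cover edge cases such as an empty command-line; it is not the Windows implementation.

```python
def parse_module_name_cltaw(cmdline: str) -> tuple[str, str]:
    """Model of the CommandLineToArgvW rules for argv[0] (hypothetical helper).

    Returns (argv0, rest), where rest is the unparsed remainder.
    """
    if cmdline.startswith('"'):
        # Copy everything up to the next double-quote, or to the end.
        end = cmdline.find('"', 1)
        if end == -1:
            return cmdline[1:], ""
        return cmdline[1:end], cmdline[end + 1:]
    # Otherwise copy everything, double-quotes included, up to the first
    # whitespace character, here any character from U+0001 to U+0020.
    i = 0
    while i < len(cmdline) and not ("\u0001" <= cmdline[i] <= "\u0020"):
        i += 1
    return cmdline[:i], cmdline[i:]
```

In this model a double-quote that is not the first character has no special meaning, so app"1".exe is copied to argv[0] unchanged.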

Parsing the remaining arguments

The C/C++ runtime and CommandLineToArgvW behave identically, except when handling consecutive double-quotes (see Consecutive double-quotes). The only characters that have any special meaning are:

  • double-quote " (U+0022)
  • whitespace, which is either a space (U+0020) or a tab (U+0009)
  • backslash \ (U+005C)

The role of these characters and whether they are copied to the argv array is explained below:

  • A double-quote preceded by a backslash \" is treated as a literal double-quote character " and is placed in the argv item. In this instance the preceding backslash escapes the double-quote.
  • A double-quote not preceded by a backslash generally signifies the start or end of a quoted-string. See Consecutive double-quotes.
  • Whitespace inside a quoted-string is treated as literal whitespace and is copied to the argv item.
  • Whitespace outside a quoted-string acts as an argument delimiter.
  • Backslashes not followed by a double-quote are interpreted literally and copied to the argv item.
  • If an even number of backslashes is followed by a double-quote, one backslash is placed in the argv item for every pair of backslashes, and the double-quote generally signifies the start or end of a quoted-string. See Consecutive double-quotes.
  • If an odd number of backslashes is followed by a double-quote, one backslash is placed in the argv item for every pair of backslashes, and the double-quote is escaped by the remaining backslash, causing a literal double-quote character " to be placed in the argv item.
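
The backslash rules reduce to simple arithmetic on the length of a run of backslashes. The helper below is a minimal sketch of that arithmetic only; the name resolve_backslashes and its return shape are invented for illustration.

```python
def resolve_backslashes(count: int, before_quote: bool) -> tuple[str, bool]:
    """Apply the backslash rules to a run of `count` backslashes.

    Returns (backslashes copied to the argv item, whether the following
    double-quote is a literal character). Hypothetical helper.
    """
    if not before_quote:
        # Not followed by a double-quote: every backslash is literal.
        return "\\" * count, False
    # Followed by a double-quote: one backslash per pair is copied, and an
    # odd leftover backslash escapes the double-quote.
    return "\\" * (count // 2), count % 2 == 1
```

For example, three backslashes followed by a double-quote give one literal backslash plus a literal double-quote, while two backslashes followed by a double-quote give one literal backslash and leave the double-quote as a quoted-string delimiter.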

Consecutive double-quotes

  • Inside a quoted-string, consecutive double-quotes "" cause a single, literal double-quote character " to be placed in the argv item and the quoted-string continues. This is the C/C++ runtime behavior. CommandLineToArgvW (and the C/C++ runtime prior to Visual Studio 2008) behaves differently by ending the quoted-string.
  • Outside a quoted-string, consecutive double-quotes simply toggle the quoted-string as having started then ended, with nothing added to the argv item.
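
Putting the rules for the remaining arguments together, the following Python sketch models the C/C++ runtime behavior described in this section, including consecutive double-quotes inside a quoted-string. It is an illustrative model only: the name split_args_crt is hypothetical, the input is assumed to be the command-line with the module name already removed, and treating a bare "" as an empty argument is an assumption of the sketch. The CommandLineToArgvW variant for consecutive double-quotes is not modeled.

```python
def split_args_crt(rest: str) -> list[str]:
    """Split the arguments after argv[0] (hypothetical model of the rules)."""
    args, current = [], []
    in_quotes = False
    have_arg = False  # assumption: lets a bare "" produce an empty argument
    i, n = 0, len(rest)
    while i < n:
        ch = rest[i]
        if ch == "\\":
            start = i
            while i < n and rest[i] == "\\":
                i += 1
            count = i - start
            if i < n and rest[i] == '"':
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')  # odd run: the double-quote is escaped
                    i += 1
                # even run: the double-quote is handled on the next iteration
            else:
                current.append("\\" * count)  # no double-quote: all literal
            have_arg = True
        elif ch == '"':
            if in_quotes and i + 1 < n and rest[i + 1] == '"':
                current.append('"')  # "" inside a quoted-string: literal "
                i += 2
            else:
                in_quotes = not in_quotes  # start or end a quoted-string
                i += 1
            have_arg = True
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args
```

With this model, "a""b" produces the single argument a"b, whereas a""b (outside a quoted-string) produces ab.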