Clarify SYS_GET_CMDLINE return string format #276
Comments
Good catch! I agree that the text "that is, argc and argv" is confusing and misleading. The intention, and every implementation I've seen, is that
Thanks for the quick response!
This is actually what my question is about. The documentation does not define what "the whole command line" or "the command line" means. My original comment lists some interpretations I could come up with for what that term could mean. Each interpretation has different consequences for what an embedded program might be able to do with the received string. For example, qemu currently passes just the command name and arguments separated by spaces as "the command line" string, which is different from what POSIX would define as the command line string.
POSIX doesn't have any definition of a single-string command line at all. In POSIX, the command line is communicated across each

On the other hand, on Windows, the command line is communicated across a

The semihosting API follows Windows's convention in this respect. The command line passed across semihosting is a single string, with a single NUL terminator at the end. Questions of its semantics are left to each application to define.

If you have a tool like

A tool like that on Windows would surely do better to get the semihosting command line directly from the whole-string command line passed to the Windows tool. Trying to recombine words from the argv generated by its crt0 would lose a lot of precision that it could have avoided losing.
Thinking about it a bit more, it sounds as if what you're really after is a specification of the convention used for breaking up the

If every libc's startup code did that in the same way, then tools like

Unfortunately, libc implementations don't agree on a standard convention for this. For example,

So there's no convention
The documentation for the SYS_GET_CMDLINE semihosting operation mentions that the operation "Returns the command line that is used for the call to the executable, that is, argc and argv".
The return fields are then defined as:
It seems to me there are three interpretations of this definition:

1. field 1 is supposed to contain the command string before argument splitting. When using a POSIX command string, the string is parsed in steps, where field splitting (converting the single command string into a command-name string and a list of command argument strings) is somewhere in the middle. It is unclear whether the string in field 1 should be the raw, unprocessed command string or should already have gone through the processing steps before field splitting (or anything in between).

   For example: for the command `./app.elf "hello $(echo world)"`, the unprocessed command string would be `./app.elf "hello $(echo world)"\0`, whereas the command string processed up to field splitting would be `./app.elf "hello world"\0`.

   Regardless of the level of processing, this is different from argv. Field splitting and quote removal need to happen on the returned string before it can be used as argv.

2. field 1 is supposed to contain a list of null-terminated strings, concatenated together. Although technically a null-terminated string is not forbidden to contain null characters, this feels like stretching the definition of field 1.

   For example: for the command `./app.elf "hello $(echo world)"`, field 1 would contain `./app.elf\0hello world\0`.

3. field 1 is supposed to contain a list of strings, separated by spaces. This seems to be qemu's interpretation.

   For example: for the command `./app.elf "hello $(echo world)"`, field 1 would contain `./app.elf hello world\0`.

   This form yields a null-terminated string without embedded null characters. However, splitting it back up into the original arguments is ambiguous. This can be seen from the example, where `argv = {"./app.elf", "hello", "world"}` or `argv = {"./app.elf", "hello world"}` or even `argv = {"./app.elf hello world"}` could be correct argument vectors that would all yield the given argument string.

The examples assume POSIX commands, but I think it's trivial to see how Windows cmd, PowerShell, or any other command line spec yields similar situations.
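The ambiguity of the space-separated form (interpretation 3) can be reproduced in a few lines. This is only an illustrative sketch: `join_with_spaces` is a hypothetical helper that mimics the behaviour described above, not code taken from qemu or any other host.

```python
# Hypothetical helper: build the string interpretation 3 describes,
# i.e. all arguments joined by single spaces plus a terminating NUL.
def join_with_spaces(argv):
    return " ".join(argv) + "\0"

# The three argument vectors from the example above.
candidates = [
    ["./app.elf", "hello", "world"],
    ["./app.elf", "hello world"],
    ["./app.elf hello world"],
]

# Collect the distinct encodings; a set drops duplicates.
encoded = {join_with_spaces(argv) for argv in candidates}
print(encoded)
```

All three vectors collapse to the same string, so the receiving program has no way to tell which one the host started from.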
I think more clarity is needed about the format of the string returned by SYS_GET_CMDLINE. The uncertainty about the format, in my view, defeats the purpose of standardizing the command in the first place, since the string can only be parsed by making assumptions about its provider (the host machine).
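To illustrate how much the result depends on such assumptions, the sketch below parses one and the same received string under two different guesses about the host. The string and the variable names are made up for illustration. `shlex.split` is Python's standard POSIX-shell-style splitter and stands in for "assume POSIX quoting".

```python
import shlex

# Example string: interpretation 1 after command substitution,
# without the terminating NUL.
received = './app.elf "hello world"'

# Assumption A: the host sent a string with POSIX-shell-style quoting.
posix_argv = shlex.split(received)

# Assumption B: the host sent plain whitespace-separated words.
whitespace_argv = received.split()

print(posix_argv)
print(whitespace_argv)
```

The first guess yields two arguments, the second yields three with stray quote characters, and nothing in the string itself says which guess is right.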
Personally, I think interpretation 2 (a list of strings separated by null characters) is the simplest and most useful one. Since command names and argument strings cannot contain null characters, parsing such a string back into a list of strings is trivial.
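A minimal sketch of that round trip is shown here, assuming the total length of the buffer is known to the receiver (a plain C `strlen` would stop at the first NUL). Both function names are hypothetical and only illustrate the proposed format.

```python
# Hypothetical encoder for interpretation 2: every argument is followed
# by its own NUL terminator, and the pieces are concatenated.
def encode_nul_separated(argv):
    for arg in argv:
        if "\0" in arg:
            raise ValueError("arguments cannot contain NUL characters")
    return "".join(arg + "\0" for arg in argv)

# Hypothetical decoder: drop the final terminator, then split on NUL.
def decode_nul_separated(field):
    if not field:
        return []
    if not field.endswith("\0"):
        raise ValueError("missing final NUL terminator")
    return field[:-1].split("\0")

original = ["./app.elf", "hello world"]
field_1 = encode_nul_separated(original)
print(repr(field_1))
print(decode_nul_separated(field_1))
```

Unlike the space-separated form, decoding recovers exactly the argument vector that was encoded, including arguments that contain spaces or are empty.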