CSTR Proposal
Related: TSV2 Proposal
Update: This is done, and it's now called QSN: Quoted String Notation. See the qsn/ directory.
Issue 582 is to implement CSTR
Rationale: ls --escaped and stat print filenames with 0xFF bytes differently! We want to document and formalize this small format.
CSTR doesn't stand for anything; it's basically short for "C String". It's spelled a bit like JSON.
It's basically a single-quoted string with \ escapes that can express any byte string. We use single rather than double quotes to reduce confusion with JSON.
These are valid strings in the CSTR format:
- '' - empty string
- 'foo'
- '\t\n'
- 'foo \xFF'
- 'nul bytes \0 ok \0'
It could be easier to describe CSTR as a "diff" from the JSON string format.
- Take a JSON string: "foo bar\n"
- Change double quotes to single quotes: 'foo bar\n'
  - This also means that JSON's "\"" becomes '"'
  - Conversely, CSTR's '\'' is "'" in JSON.
- And add the ability to express bytes: 'foo bar \xFF \n'. We should probably keep the ability to express code points like \u00FF.
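The quote-swapping step of that "diff" can be sketched in Python. This is only an illustration, not part of the proposal: the function name json_literal_to_cstr is made up here, and the sketch passes JSON escapes such as \b, \f and \/ through unchanged, which assumes a CSTR reader would accept them. It only rewrites the quoting; it does not add \xFF byte escapes, since JSON strings cannot contain raw bytes in the first place.

```python
import json
import re


def json_literal_to_cstr(json_literal: str) -> str:
    """Rewrite a JSON string literal as a CSTR literal (hypothetical helper)."""
    json_literal = json_literal.strip()
    # Validate the input with the standard JSON parser before touching it.
    if not isinstance(json.loads(json_literal), str):
        raise ValueError("expected a JSON string literal")
    body = json_literal[1:-1]  # drop the surrounding double quotes

    def swap(match):
        token = match.group(0)
        if token == '\\"':   # JSON's \" needs no escape inside single quotes
            return '"'
        if token == "'":     # a bare ' must be escaped inside single quotes
            return "\\'"
        return token         # every other escape pair is kept as-is

    # Scan escape pairs first so that "\\" followed by a quote is not misread.
    return "'" + re.sub(r"\\.|'", swap, body, flags=re.DOTALL) + "'"
```

For example, the JSON literal "\"" comes out as '"', and the JSON literal "'" comes out as '\''.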
CSTR can be implemented in any number of ways, but it's a regular language, so Oil's common style with re2c should work very well.
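Because the format is regular, a decoder can be a loop over one alternation of token patterns. The sketch below uses Python's re module rather than re2c, purely for illustration. The names cstr_decode, _TOKEN and _SIMPLE are hypothetical, the set of one-character escapes is a guess based on the examples above, and encoding \uXXXX code points as UTF-8 is an assumption that the proposal does not state.

```python
import re

# One alternative per token kind; the body of a CSTR is a sequence of these.
_TOKEN = re.compile(
    rb"\\x([0-9A-Fa-f]{2})"   # group 1: byte escape such as \xFF
    rb"|\\u([0-9A-Fa-f]{4})"  # group 2: code point such as \u00FF
    rb"|\\([tnr0'\\])"        # group 3: one-character escapes (assumed set)
    rb"|([^'\\])"             # group 4: any literal byte except ' and \
)

_SIMPLE = {
    b"t": b"\t", b"n": b"\n", b"r": b"\r",
    b"0": b"\0", b"'": b"'", b"\\": b"\\",
}


def cstr_decode(text: bytes) -> bytes:
    """Decode a CSTR literal, including its quotes, into raw bytes."""
    if len(text) < 2 or text[:1] != b"'" or text[-1:] != b"'":
        raise ValueError("CSTR must be wrapped in single quotes")
    body = text[1:-1]
    out = bytearray()
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError("invalid CSTR at offset %d" % (pos + 1))
        hex_byte, code_point, simple, literal = m.groups()
        if hex_byte is not None:
            out.append(int(hex_byte, 16))
        elif code_point is not None:
            # Assumption: code points are emitted as UTF-8.
            out += chr(int(code_point, 16)).encode("utf-8")
        elif simple is not None:
            out += _SIMPLE[simple]
        else:
            out += literal
        pos = m.end()
    return bytes(out)
```

An unescaped single quote or a trailing backslash inside the body matches none of the alternatives, so it is rejected with ValueError.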
- If the encoder doesn't know the encoding, it will always print byte escapes like \x00 for non-printable characters.
  - Common special cases: \t \r \n \'. Not sure about \0.
- If it does know the encoding, it can print code points like \u1234.
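A minimal sketch of the encoding-agnostic case, in Python. The name cstr_encode is hypothetical. Escaping the backslash itself as \\ is an assumption (the list above doesn't mention it, but the output would be ambiguous without it), and NUL is written as \x00 because \0 is still an open question.

```python
def cstr_encode(data: bytes) -> str:
    """Encode arbitrary bytes as a CSTR literal without assuming an encoding."""
    special = {
        ord("\t"): "\\t",
        ord("\r"): "\\r",
        ord("\n"): "\\n",
        ord("'"): "\\'",
        ord("\\"): "\\\\",  # assumption: the backslash must be escaped too
    }
    parts = ["'"]
    for b in data:
        if b in special:
            parts.append(special[b])
        elif 0x20 <= b < 0x7F:
            parts.append(chr(b))          # printable ASCII is emitted as-is
        else:
            parts.append("\\x%02X" % b)   # everything else becomes \xNN
    parts.append("'")
    return "".join(parts)
```

With this sketch, the bytes foo followed by a space and 0xFF are printed as 'foo \xFF', matching the example near the top of the page.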
CSTR is a subset of TSV2. TSV2 might not be implemented in Oil v1, but CSTR is necessary for basic shell functionality like displaying filenames and argv arrays.
Unquoted variant. "bob" is valid because it doesn't contain TABs.
name age
bob 10
name age
'bob' 10
- I think the main difference is that in Python, "'" is valid. In CSTR it has to be '\''.
https://docs.python.org/2/library/codecs.html#python-specific-encodings
Unix Tools lists tools like find which understand backslash escapes.