Oils Reference — Chapter JSON / J8 Notation
This chapter describes JSON, and its J8 Notation superset.
See the J8 Notation doc for more background. This doc is a quick reference, not the official spec.
(in progress)
J8 strings are an upgrade of JSON strings that solve the JSON-Unix Mismatch.
That is, Unix deals with byte strings, but JSON can't represent byte strings.
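The mismatch can be seen with a minimal Python sketch that uses only the standard library. The byte string below is an illustrative value (think of a Unix filename), not something taken from Oils.

```python
import json

raw = b"hi \xff"  # a byte string that is not valid UTF-8

# Python's json module refuses to serialize bytes.
try:
    json.dumps(raw)
    encoded_ok = True
except TypeError:
    encoded_ok = False

# Decoding to text first fails too, because 0xff never appears in UTF-8.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(encoded_ok, decoded_ok)  # False False
```

Workarounds such as base64 or Latin-1 round-tripping exist, but they change what the string looks like to every other JSON consumer, which is the gap J8 strings aim to close.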
"hi"All JSON strings are valid J8 strings!
This is important for compatibility.  Encoders may prefer to emit JSON-style
"" strings rather than u'' or b'' strings.
Example:
"hi μ \n"
To be explicit, you can prefix JSON strings with j:
j"hi μ \n"  # same as above
Of course, the j"" prefix is accepted by our json8 builtin, but not the
json builtin.
\" \n \u1234
As a reminder, the backslash escapes valid in JSON strings are:
\" \\ \/
\b \f \n \r \t
\u1234
Additional J8 escapes are valid in u'' and b'' strings, described below.
\ud83e\udd26
JSON's \u1234 escapes can't represent code points at or above U+10000 (2^16), so JSON also has a "surrogate pair hack".
That is, there are special code points in the "surrogate range" that can be paired to represent larger numbers.
See the Surrogate Pair Blog Post for an example:
"\ud83e\udd26"
Because JSON strings are valid J8 strings, surrogate pairs are also part of J8 notation. Decoders must accept them, but encoders should avoid them.
You can emit u'\u{1f926}' or b'\u{1f926}' instead of "\ud83e\udd26".
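The pairing arithmetic is the standard UTF-16 rule. The sketch below, in Python, combines the two halves by hand and then checks the result against Python's own `json` decoder. The helper name `combine_surrogates` is made up for illustration.

```python
import json

def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # Each half contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

code_point = combine_surrogates(0xD83E, 0xDD26)
print(hex(code_point))  # 0x1f926

# A JSON decoder performs the same pairing when it sees two \u escapes.
decoded = json.loads('"\\ud83e\\udd26"')
print(decoded == chr(code_point))  # True
```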
u'hi'
A type of J8 string.
u'hi μ \n'
It's never necessary to emit, but it can be used to express that a string is valid Unicode. JSON strings can represent strings that aren't Unicode because they may contain surrogate halves.
In contrast, u'' strings can only have escapes like \u{1f926}, with no
surrogate pairs or halves.
Escaping:
u'' strings may not contain \u1234 escapes.  They must be written as \u{1234} or \u{1f926}.
They may not contain \yff escapes, because those would represent a string that's not UTF-8 or Unicode.
Surrogate pairs are not needed in u'' or b'' strings.  Use the longer form \u{1f926}.
\u{1f926} escapes aren't strictly necessary, because strings can contain literal UTF-8.  Decoders must accept these escapes.
A single quote is escaped as \'.  Decoders also accept \", but encoders don't emit it.
b'hi'
Another J8 string.  These b'' strings are identical to u'' strings, but
they can also contain \yff escapes.
Examples:
b'hi μ \n'
b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
\u{1f926} \yff
To summarize, the valid J8 escapes are:
\'
\yff   # only valid in b'' strings
\u{3bc} \u{1f926} etc.
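To make the escape rules concrete, here is a rough Python sketch of a decoder for the text between the quotes of a u'' or b'' string. It is not the Oils implementation. The function name `decode_j8_body` and its `allow_bytes` flag are hypothetical, and the sketch assumes the one-character JSON escapes (\\ \b \f \n \r \t) carry over unchanged.

```python
import re

# One-character escapes: assumed to carry over from JSON, plus \' from J8.
SIMPLE = {"\\": b"\\", '"': b'"', "'": b"'", "b": b"\b", "f": b"\f",
          "n": b"\n", "r": b"\r", "t": b"\t"}

# A backslash followed by \u{...}, \yHH, or any single character (or nothing).
ESCAPE = re.compile(
    r"\\(?:u\{([0-9a-fA-F]{1,6})\}|y([0-9a-fA-F]{2})|(.?))", re.DOTALL)

def decode_j8_body(body: str, allow_bytes: bool) -> bytes:
    """Decode the text between the quotes of a u'' or b'' string to bytes."""
    out = bytearray()
    pos = 0
    for m in ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # literal text before the escape
        code_point, byte_hex, simple = m.groups()
        if code_point is not None:
            n = int(code_point, 16)
            if n > 0x10FFFF or 0xD800 <= n <= 0xDFFF:
                raise ValueError("code point out of range or a surrogate")
            out += chr(n).encode("utf-8")
        elif byte_hex is not None:
            if not allow_bytes:
                raise ValueError(r"\yHH escapes are only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif simple in SIMPLE:
            out += SIMPLE[simple]
        else:
            # Includes \u1234 without braces and a trailing lone backslash.
            raise ValueError("invalid escape at offset %d" % m.start())
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)

print(decode_j8_body(r"hi \u{3bc} \yff", allow_bytes=True))
```

The sketch returns bytes in both modes so that \yff has somewhere to go. It does not check for unescaped quotes or control characters, which a real decoder would need to reject.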
'hi'
Single-quoted strings without a u or b prefix are implicitly u''.
u'hi μ \n'  
 'hi μ \n'  # same as above, no \yff escapes accepted
They should be avoided in contexts where "" strings may also appear, because
it's easy to confuse single quotes and double quotes.
"J8 Lines" is a format built on top of J8 strings. Each line is either an unquoted string or a quoted J8 string (JSON-style "" or J8-style b'' u'' '').
In all cases, leading and trailing whitespace is ignored.
Any line that doesn't begin with " or b' or u' is an unquoted line.
Examples:
foo bar
C:\Program Files\
internal "quotes" aren't special
In contrast, these are quoted lines, and must be valid J8 strings:
"json-style J8 string"
b'this is b style'
u'this is u style'
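The quoted/unquoted rule can be sketched in a few lines of Python. The function name `classify_j8_line` is hypothetical. The sketch follows the prefix rule exactly as stated above (" or b' or u') and does not validate the J8 string itself.

```python
def classify_j8_line(line: str):
    """Return ("quoted" | "unquoted", text) for one J8 line."""
    text = line.strip()  # leading and trailing whitespace is ignored
    if text.startswith(('"', "b'", "u'")):
        return ("quoted", text)  # must then be parsed as a J8 string
    return ("unquoted", text)

for example in ["  foo bar  ",
                "C:\\Program Files\\",
                'internal "quotes" aren\'t special',
                "b'this is b style'"]:
    print(classify_j8_line(example))
```

A quoted line would next go through a J8 string decoder, while an unquoted line is used as-is.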
JSON8 is JSON with 4 more things allowed: J8 strings, trailing commas, unquoted object keys, and end-of-line comments.
JSON8 numbers are identical to JSON numbers.
Here is a decoding detail, specific to Oils:
If there's a decimal point or an exponent suffix like e-10, then it's decoded into a YSH
Float.  Otherwise it's a YSH Int.
42       # decoded to Int
42.0     # decoded to Float
42e1     # decoded to Float
42.0e1   # decoded to Float
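The Int/Float rule above can be mirrored in Python, with `int` and `float` standing in for the YSH types. This is a sketch of the rule, not the Oils decoder. The name `decode_json8_number` is hypothetical, and the regular expression is the JSON number grammar from RFC 8259.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")

def decode_json8_number(token: str):
    """Decode a JSON number token: Float if it has a fraction or exponent, else Int."""
    m = JSON_NUMBER.fullmatch(token)
    if m is None:
        raise ValueError("not a JSON number: %r" % token)
    fraction, exponent = m.groups()
    if fraction or exponent:
        return float(token)
    return int(token)

for token in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_number(token)
    print(token, type(value).__name__, value)
```

Note that Python integers are arbitrary precision, so this sketch says nothing about how large an Int Oils accepts.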
JSON8 strings are J8 strings:
"hi 🤦 \u03bc"
u'hi 🤦 \u{3bc}'
b'hi 🤦 \u{3bc} \yff'
TODO:
Like JSON lists, but can have a trailing comma. Examples:
[42, 43]
[42, 43,]   # same as above
TODO:
Like JSON "objects", but keys may be unquoted and a trailing comma is allowed.
Examples:
{"json8": "message"}
{json8: "message"}     # same as above
{json8: "message",}    # same as above
End-of-line comments in the same style as shell:
{"json8": "message"}   # comment
TSV8 cells are J8 Primitives (Bool, Int, Float, Str), separated by tabs.
!tsv8    name    age
!type    Str     Int
!other   x       y
         Alice   42
         Bob     25
The primitives:
Note: Can null be in all cells?  Maybe except Bool?
It can stand in for NA?
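As a rough illustration of the !tsv8 example above, the Python sketch below splits tab-separated text into header rows and data rows. It rests on assumptions read off the example layout rather than any spec: rows whose first cell starts with ! are treated as headers, data rows are assumed to have an empty first cell, and cells are left as undecoded text. The name `parse_tsv8` is hypothetical.

```python
def parse_tsv8(text: str):
    """Split TSV8 text into header rows (keyed by their !name) and data rows."""
    headers = {}
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        first, *cells = line.split("\t")
        if first.startswith("!"):
            headers[first] = cells  # e.g. "!type" -> ["Str", "Int"]
        else:
            rows.append(cells)      # first cell is assumed empty for data rows
    return headers, rows

sample = ("!tsv8\tname\tage\n"
          "!type\tStr\tInt\n"
          "!other\tx\ty\n"
          "\tAlice\t42\n"
          "\tBob\t25\n")
headers, rows = parse_tsv8(sample)
print(headers["!type"])
print(rows)
```

Decoding each cell according to the !type row, and deciding how null is handled, is left open here, as it is in the notes above.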