---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

JSON / J8 Notation
==================

This chapter in the [Oils Reference](index.html) describes [JSON]($xref), and
its **J8 Notation** superset.

See the [J8 Notation](../j8-notation.html) doc for more background. This doc
is a quick reference, not the official spec.

<div id="toc">
</div>

## J8 Strings

J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.

That is, Unix deals with byte strings, but JSON can't represent byte strings.

<h3 id="json-string">json-string <code>"hi"</code></h3>

All JSON strings are valid J8 strings!

This is important. Encoders often emit JSON-style `""` strings rather than
`u''` or `b''` strings.

Example:

    "hi μ \n"

<h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>

As a reminder, the backslash escapes valid in [JSON]($xref) strings are:

    \" \\ \/
    \b \f \n \r \t
    \u1234

Additional J8 escapes are valid in `u''` and `b''` strings, described below.
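
The following Python snippet uses the standard library's `json` module to show these escapes in practice. It only illustrates how one common JSON implementation behaves. It isn't part of Oils.

```python
import json

# By default (ensure_ascii=True), Python escapes non-ASCII characters as \uXXXX
# and uses the short escapes for control characters like newline.
assert json.dumps('hi μ \n') == '"hi \\u03bc \\n"'

# With ensure_ascii=False, μ is emitted as a literal UTF-8 character.
assert json.dumps('hi μ \n', ensure_ascii=False) == '"hi μ \\n"'

# Double quotes and backslashes are escaped as \" and \\
assert json.dumps('say "hi" \\') == '"say \\"hi\\" \\\\"'

# The decoder accepts the quote, backslash, short, and \u1234 escapes.
assert json.loads('"\\" \\\\ \\b \\f \\n \\r \\t \\u03bc"') == '" \\ \b \f \n \r \t μ'
```

Python's encoder escapes non-ASCII characters by default. This is one reason decoders see `\u1234` escapes so often in practice.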

<h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>

JSON's `\u1234` escapes can only represent code points below `U+10000`
(2<sup>16</sup>), so JSON also has a "surrogate pair hack".

That is, code points in the "surrogate range" (`U+D800` to `U+DFFF`) can be
paired to represent larger code points.

See the [Surrogate Pair Blog
Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
example:

    "\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
notation. Decoders must accept them, but encoders should avoid them.

You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
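
The surrogate pair arithmetic is small enough to show directly. This Python sketch splits `U+1F926` into the pair from the example above. The `to_surrogate_pair` helper is hypothetical and not an Oils API. The sketch also shows that Python's `json` module emits and accepts that same pair.

```python
import json

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('code point does not need a surrogate pair')
    offset = code_point - 0x10000        # fits in 20 bits
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x1F926)
print(f'"\\u{high:04x}\\u{low:04x}"')    # JSON form: "\ud83e\udd26"
print("u'\\u{%x}'" % 0x1F926)            # J8 form:   u'\u{1f926}'

# With the default ensure_ascii=True, Python's encoder emits the pair,
# and its decoder joins the pair back into one code point.
assert json.dumps('\U0001F926') == '"\\ud83e\\udd26"'
assert json.loads('"\\ud83e\\udd26"') == '\U0001F926'
```

The J8 form needs one escape and no arithmetic. That's why encoders should prefer it.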

<h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>

A type of J8 string.

    u'hi μ \n'

It's never necessary to **emit** a `u''` string, but it can be used to express
that a string is **valid Unicode**. JSON strings can represent strings that
aren't valid Unicode, because they may contain unpaired surrogate halves.

In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
surrogate pairs or halves.

- The **encoded** bytes must be valid UTF-8, like JSON strings.
- The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.
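
The second bullet can be demonstrated in Python. There, a JSON string with an unpaired surrogate half decodes to a `str` that can't be encoded as UTF-8. The `is_valid_unicode` helper is hypothetical, written only for this illustration.

```python
import json

def is_valid_unicode(s: str) -> bool:
    """Hypothetical helper: True if s has no surrogate code points, i.e. is UTF-8 encodable."""
    try:
        s.encode('utf-8')   # strict by default: surrogates raise an error
    except UnicodeEncodeError:
        return False
    return True

# Python's json module decodes a lone surrogate half without complaint...
half = json.loads('"\\ud83e"')
assert len(half) == 1 and not is_valid_unicode(half)

# ...while a complete pair decodes to one valid code point.
pair = json.loads('"\\ud83e\\udd26"')
assert pair == '\U0001F926' and is_valid_unicode(pair)
```

A string like `half` has no `u''` equivalent. It can only be written as a JSON string.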

Escaping:

- `u''` strings may **not** contain `\u1234` escapes. They must be written as
  `\u{1234}` or `\u{1f926}`.
- They may not contain `\yff` escapes, because those would represent a string
  that's not UTF-8 or Unicode.
- Surrogate pairs are never necessary in `u''` or `b''` strings. Use a single
  escape like `\u{1f926}` instead.
- You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
  necessary. Decoders must accept these escapes.
- A literal single quote is escaped with `\'`.
  - Decoders still accept `\"`, but encoders don't emit it.
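
Here is a minimal Python sketch of a `u''` encoder that follows these rules. It emits literal UTF-8 where it can and escapes `\` and `'`. It uses `\u{...}` (never `\u1234`) for other control characters. The function name and the set of characters that get short escapes are assumptions, not the Oils encoder.

```python
def encode_u_string(s: str) -> str:
    """Hypothetical u'' encoder: literal UTF-8 where possible, J8 escapes otherwise."""
    short = {'\\': '\\\\', "'": "\\'", '\n': '\\n', '\r': '\\r', '\t': '\\t'}
    out = []
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A surrogate half isn't valid Unicode, so it has no u'' form.
            raise ValueError("surrogate half can't appear in a u'' string")
        if ch in short:
            out.append(short[ch])
        elif cp < 0x20:
            out.append('\\u{%x}' % cp)   # other control chars, e.g. \u{1}
        else:
            out.append(ch)               # literal UTF-8, including " and μ
    return "u'" + ''.join(out) + "'"

print(encode_u_string("it's μ\n"))     # u'it\'s μ\n'
```

Double quotes pass through unescaped, matching the rule that encoders don't emit `\"`.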

<h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>

Another J8 string. These `b''` strings are identical to `u''` strings, but
they can also contain `\yff` escapes.

Examples:

    b'hi μ \n'
    b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
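
A `b''` encoder differs mainly in how it handles bytes that aren't valid UTF-8. This Python sketch uses a hypothetical `encode_b_string`, which is not the Oils code. It relies on Python's `surrogateescape` error handler, which decodes each invalid byte `0xXX` to the code point `U+DCXX`. The sketch turns those code points back into `\yXX` escapes.

```python
def encode_b_string(data: bytes) -> str:
    """Hypothetical b'' encoder: valid UTF-8 stays literal, bad bytes become \\yXX."""
    short = {'\\': '\\\\', "'": "\\'", '\n': '\\n', '\r': '\\r', '\t': '\\t'}
    out = []
    # surrogateescape decodes each invalid byte 0xXX to the code point U+DCXX
    for ch in data.decode('utf-8', errors='surrogateescape'):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append('\\y%02x' % (cp - 0xDC00))   # recover the original byte
        elif ch in short:
            out.append(short[ch])
        elif cp < 0x20:
            out.append('\\u{%x}' % cp)
        else:
            out.append(ch)
    return "b'" + ''.join(out) + "'"

print(encode_b_string(b"isn't \xff\xfe \xce\xbc"))   # b'isn\'t \yff\yfe μ'
```

Valid multi-byte sequences like `\xce\xbc` (μ) stay literal. Only the stray bytes become `\y` escapes.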

<h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>

To summarize, the valid J8 escapes are:

    \'
    \yff             # only valid in b'' strings
    \u{3bc} \u{1f926} etc.

<h3 id="no-prefix">no-prefix <code>'hi'</code></h3>

Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.

    u'hi μ \n'
    'hi μ \n'        # same as above, no \yff escapes accepted

They should be avoided in contexts where `""` strings may also appear, because
it's easy to confuse single quotes and double quotes.
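
Combining the escape summary with the prefix rule gives a decoder for single-quoted J8 strings, sketched here in Python. The name `decode_j8_single_quoted` is hypothetical. For simplicity, it returns bytes in every case. It accepts the JSON short escapes plus the J8 ones. Accepting `\b` and `\f` in single-quoted strings is an assumption. This is not the Oils decoder.

```python
import re

# Escapes accepted in u'', b'', and '' strings, besides \u{...} and \yff
SHORT = {"'": b"'", '"': b'"', '\\': b'\\', 'b': b'\b', 'f': b'\f',
         'n': b'\n', 'r': b'\r', 't': b'\t'}

def decode_j8_single_quoted(literal: str) -> bytes:
    """Hypothetical decoder for u'...', b'...', and unprefixed '...' strings."""
    m = re.fullmatch(r"([ub]?)'(.*)'", literal, re.DOTALL)
    if not m:
        raise ValueError('not a single-quoted J8 string')
    allow_y = m.group(1) == 'b'      # no prefix behaves like u''
    body = m.group(2)
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError('unescaped single quote')
        if ch != '\\':
            out += ch.encode('utf-8')
            i += 1
            continue
        esc = body[i + 1:i + 2]
        if esc in SHORT:                                   # \' \" \\ \n etc.
            out += SHORT[esc]
            i += 2
        elif esc == 'y' and allow_y:                       # \yff: one raw byte
            digits = body[i + 2:i + 4]
            if not re.fullmatch(r'[0-9a-fA-F]{2}', digits):
                raise ValueError('bad \\y escape')
            out.append(int(digits, 16))
            i += 4
        elif esc == 'u' and body[i + 2:i + 3] == '{':      # \u{1f926}
            end = body.find('}', i)
            digits = body[i + 3:end] if end != -1 else ''
            if not re.fullmatch(r'[0-9a-fA-F]{1,6}', digits):
                raise ValueError('bad \\u{...} escape')
            # chr() rejects values above 0x10FFFF; encode() rejects surrogates
            out += chr(int(digits, 16)).encode('utf-8')
            i = end + 1
        else:
            raise ValueError(f'invalid escape at index {i}')  # e.g. \u1234
    return bytes(out)

print(decode_j8_single_quoted("b'\\yff \\u{3bc}'"))   # b'\xff \xce\xbc'
```

The same `\yff` escape is rejected in `u'...'` and in `'...'`, because both are treated as `u''` strings.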

## JSON8

JSON8 is JSON with 4 more things allowed:

1. J8 strings in addition to JSON strings
1. Comments
1. Unquoted keys (TODO)
1. Trailing commas (TODO)

### json8-num

Decoding detail, specific to Oils:

If there's a decimal point or an exponent like `e-10`, then it's decoded into a
YSH `Float`. Otherwise, it's a YSH `Int`.

    42        # decoded to Int
    42.0      # decoded to Float
    42e1      # decoded to Float
    42.0e1    # decoded to Float
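
The Int-versus-Float rule comes down to a check on the number's text. This Python sketch assumes the text has already been validated as a JSON number. It also assumes an uppercase `E` is treated like `e`. `decode_json8_num` is a hypothetical name, and Python's `int` and `float` stand in for YSH `Int` and `Float`.

```python
def decode_json8_num(text: str):
    """Return an int, or a float if there's a decimal point or an exponent."""
    if any(c in text for c in '.eE'):
        return float(text)
    return int(text)

for text in ['42', '42.0', '42e1', '42.0e1']:
    value = decode_json8_num(text)
    print(text, type(value).__name__, value)

# 42 int 42
# 42.0 float 42.0
# 42e1 float 420.0
# 42.0e1 float 420.0
```

Note that `42e1` becomes a `Float` even though its value is a whole number. The rule looks only at the syntax.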

### json8-str

JSON8 strings are exactly J8 strings:

<pre>
"hi &#x1f926; \u03bc"
u'hi &#x1f926; \u{3bc}'
b'hi &#x1f926; \u{3bc} \yff'
</pre>

### json8-list

Like JSON lists, but can have a trailing comma. Examples:

    [42, 43]
    [42, 43,]   # same as above

### json8-dict

Like JSON "objects", but:

- Can have a trailing comma.
- Can have unquoted keys, as long as each key is an identifier.

Examples:

    {"json8": "message"}
    {json8: "message"}    # same as above
    {json8: "message",}   # same as above

### json8-comment

End-of-line comments in the same style as shell:

    {"json8": "message"}  # comment
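
The following rough Python sketch makes the list, dict, and comment rules concrete. It rewrites those three JSON8 features into plain JSON, then calls `json.loads`. It's a hypothetical preprocessor, not how Oils parses JSON8. Unquoted keys and trailing commas are still marked TODO above. The sketch assumes every string is a double-quoted JSON string, so `u''` and `b''` strings aren't handled. It also assumes identifiers match `[A-Za-z_][A-Za-z0-9_]*`.

```python
import json
import re

IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def json8_to_json(text: str) -> str:
    """Hypothetical: rewrite comments, trailing commas, and unquoted keys as JSON."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Copy a JSON string verbatim, skipping over backslash escapes.
            j = i + 1
            while text[j] != '"':
                j += 2 if text[j] == '\\' else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif ch == '#':
            # End-of-line comment: skip to the newline (which is kept).
            while i < len(text) and text[i] != '\n':
                i += 1
        elif ch in ']}':
            # Drop a trailing comma, looking back past whitespace.
            k = len(out) - 1
            while k >= 0 and out[k].isspace():
                k -= 1
            if k >= 0 and out[k] == ',':
                del out[k]
            out.append(ch)
            i += 1
        else:
            m = IDENT.match(text, i)
            if m:
                word = m.group()
                # An identifier followed by ':' is an unquoted key; quote it.
                is_key = text[m.end():].lstrip().startswith(':')
                out.append(json.dumps(word) if is_key else word)
                i = m.end()
            else:
                out.append(ch)
                i += 1
    return ''.join(out)

doc = '''
{json8: "message",   # a comment, with a "quote" inside
 list: [42, 43,],
}
'''
print(json.loads(json8_to_json(doc)))   # {'json8': 'message', 'list': [42, 43]}
```

Tracking whether the scanner is inside a string is what keeps `#` and `,]` inside strings from being treated as syntax.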

## TSV8

TSV8 cells are the J8 primitives (Bool, Int, Float, Str), separated by tabs.

### column-attrs

```
!tsv8   name    age
!type   Str     Int
!other  x       y
        Alice   42
        Bob     25
```
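
This Python sketch shows one way the attribute rows might be used, parsing the example above into row dicts. Because TSV8 is still in progress, everything about it is an assumption read off the example:

- `!`-prefixed rows hold column attributes.
- `!tsv8` names the columns, and `!type` gives their types.
- Data rows start with an empty first cell.
- `Bool` cells are spelled `true` or `false`.

`parse_tsv8` is a hypothetical name, and `null` cells aren't handled.

```python
def parse_tsv8(text: str) -> list[dict]:
    """Hypothetical TSV8 reader: attribute rows start with '!', data rows with a tab."""
    attrs = {}   # e.g. {'tsv8': ['name', 'age'], 'type': ['Str', 'Int'], ...}
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        cells = line.split('\t')
        if cells[0].startswith('!'):
            attrs[cells[0][1:]] = cells[1:]
        else:
            rows.append(cells[1:])   # data rows have an empty first cell
    names = attrs['tsv8']
    types = attrs.get('type', ['Str'] * len(names))
    convert = {
        'Bool': lambda s: {'true': True, 'false': False}[s],
        'Int': int,
        'Float': float,
        'Str': str,
    }
    result = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f'expected {len(names)} cells, got {len(row)}')
        result.append({n: convert[t](c) for n, t, c in zip(names, types, row)})
    return result

example = '!tsv8\tname\tage\n!type\tStr\tInt\n!other\tx\ty\n\tAlice\t42\n\tBob\t25\n'
print(parse_tsv8(example))   # [{'name': 'Alice', 'age': 42}, {'name': 'Bob', 'age': 25}]
```

Rows like `!other` are kept in `attrs` but otherwise ignored. This is a guess at how extra column attributes would be treated.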

### column-types

The primitives:

- Bool
- Int
- Float
- Str

Note: Can `null` appear in all cells, maybe except `Bool` cells? Could it stand
in for `NA`?

[JSON]: https://json.org