---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

JSON / J8 Notation
==================

This chapter in the [Oils Reference](index.html) describes [JSON]($xref), and
its **J8 Notation** superset.

See the [J8 Notation](../j8-notation.html) doc for more background. This doc
is a quick reference, not the official spec.

<div id="toc">
</div>


## J8 Strings

J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.

That is, Unix deals with byte strings, but JSON can't represent byte strings.

<h3 id="json-string">json-string <code>"hi"</code></h3>

All JSON strings are valid J8 strings!

This is important. Encoders often emit JSON-style `""` strings rather than
`u''` or `b''` strings.

Example:

    "hi μ \n"

<h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>

As a reminder, the backslash escapes valid in [JSON]($xref) strings are:

    \" \\ \/
    \b \f \n \r \t
    \u1234

Additional J8 escapes are valid in `u''` and `b''` strings, described below.
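
For reference, Python's standard `json` module decodes each standard JSON escape as shown below. This only illustrates plain JSON behavior; it isn't Oils code.

```python
import json

# Each standard JSON escape, paired with the character it decodes to.
ESCAPES = {
    r'"\""': '"',
    r'"\\"': "\\",
    r'"\/"': "/",
    r'"\b"': "\b",
    r'"\f"': "\f",
    r'"\n"': "\n",
    r'"\r"': "\r",
    r'"\t"': "\t",
    r'"\u1234"': "\u1234",
}

for literal, expected in ESCAPES.items():
    assert json.loads(literal) == expected, literal
print("all", len(ESCAPES), "escapes decoded as expected")
```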

<h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>

JSON's `\u1234` escapes can't represent code points at or above `U+10000`
(2<sup>16</sup>), so JSON also has a "surrogate pair hack".

That is, code points in the "surrogate range" (`U+D800` to `U+DFFF`) can be
paired to represent larger code points.

See the [Surrogate Pair Blog
Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
example:

    "\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
notation. Decoders must accept them, but encoders should avoid them.

You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
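
The surrogate pair arithmetic is the standard UTF-16 scheme: subtract `0x10000`, then split the remaining 20 bits into a high half and a low half. A minimal Python sketch follows; the function names are hypothetical, not part of Oils.

```python
import json


def to_surrogate_pair(cp: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    v = cp - 0x10000                # 20 bits remain
    high = 0xD800 + (v >> 10)       # top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F926)
print(f"\\u{high:04x}\\u{low:04x}")  # prints \ud83e\udd26

# Cross-check: Python's JSON decoder combines the pair into one character.
assert json.loads(r'"\ud83e\udd26"') == chr(from_surrogate_pair(high, low))
```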

<h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>

A type of J8 string.

    u'hi μ \n'

It's never necessary to **emit** a `u''` string, but it can be used to express
that a string is **valid Unicode**. JSON strings can represent strings that
aren't Unicode because they may contain surrogate halves.

In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
surrogate pairs or halves.

- The **encoded** bytes must be valid UTF-8, like JSON strings.
- The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.

Escaping:

- `u''` strings may **not** contain `\u1234` escapes. They must use the braced
  form, like `\u{1234}` or `\u{1f926}`.
- They may not contain `\yff` escapes, because those would represent a string
  that's not UTF-8 or Unicode.
- Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
  longer form `\u{1f926}`.
- You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
  necessary. Decoders must accept these escapes.
- A literal single quote is escaped with `\'`.
  - Decoders still accept `\"`, but encoders don't emit it.
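
A minimal sketch of a `u''` encoder in Python, following the escaping rules above. The name `encode_u_string` is hypothetical, and the sketch assumes that printable characters are emitted as literal UTF-8 while other control characters below `U+20` use the braced `\u{...}` form; the exact set of characters Oils escapes may differ.

```python
# Short escapes shared with JSON. Assumption: an encoder prefers these
# over the longer braced form.
SIMPLE_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def encode_u_string(s: str) -> str:
    """Encode a Python str as a J8 u'' literal (hypothetical helper)."""
    out = ["u'"]
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A lone surrogate isn't valid Unicode, so u'' can't represent it.
            raise ValueError(f"surrogate U+{cp:04X} can't appear in a u'' string")
        if ch == "'":
            out.append("\\'")  # encoders escape single quotes, never double
        elif ch == "\\":
            out.append("\\\\")
        elif ch in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[ch])
        elif cp < 0x20:
            out.append(f"\\u{{{cp:x}}}")  # braced form, never 4-digit
        else:
            out.append(ch)  # everything else is literal UTF-8
    out.append("'")
    return "".join(out)


print(encode_u_string("hi μ \n"))  # prints u'hi μ \n'
```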

<h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>

Another J8 string. These `b''` strings are identical to `u''` strings, but
they can also contain `\yff` escapes.

Examples:

    b'hi μ \n'
    b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
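
One way to produce `b''` strings from arbitrary bytes, sketched in Python: emit valid UTF-8 literally, and write each byte that isn't part of valid UTF-8 as a `\yXX` escape. This relies on Python's `surrogateescape` error handler; `encode_b_string` is a hypothetical name, not an Oils API.

```python
# Short escapes shared with JSON (assumed to be preferred by the encoder).
SIMPLE_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def encode_b_string(data: bytes) -> str:
    """Encode arbitrary bytes as a J8 b'' literal (hypothetical helper)."""
    # surrogateescape maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF,
    # so those code points mark exactly the bytes that need a \y escape.
    text = data.decode("utf-8", errors="surrogateescape")
    out = ["b'"]
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(f"\\y{cp - 0xDC00:02x}")  # recover the original byte
        elif ch == "'":
            out.append("\\'")
        elif ch == "\\":
            out.append("\\\\")
        elif ch in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[ch])
        elif cp < 0x20:
            out.append(f"\\u{{{cp:x}}}")
        else:
            out.append(ch)  # valid UTF-8 is emitted literally
    out.append("'")
    return "".join(out)


print(encode_b_string(b"\xff\xfe \xce\xbc"))  # prints b'\yff\yfe μ'
```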

<h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>

To summarize, the valid J8 escapes are:

    \'
    \yff          # only valid in b'' strings
    \u{3bc} \u{1f926} etc.
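
The escape rules can be sketched as a small Python decoder for single-quoted J8 strings. This is a hypothetical illustration, not the Oils implementation: `decode_j8_single_quoted` is a made-up name, it returns `bytes` because `b''` strings needn't be valid UTF-8, and it assumes the input is exactly one complete literal.

```python
import re

# JSON-style escapes that stay valid in u'' and b'' strings. The 4-digit
# JSON form is deliberately absent: these strings use the braced form.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
                  "\\": "\\", "'": "'", '"': '"'}


def decode_j8_single_quoted(lit: str) -> bytes:
    """Decode a u'', b'', or unprefixed '' literal into bytes (sketch)."""
    if lit[:2] in ("u'", "b'") and len(lit) >= 3 and lit.endswith("'"):
        prefix, body = lit[0], lit[2:-1]
    elif lit[:1] == "'" and len(lit) >= 2 and lit.endswith("'"):
        prefix, body = "u", lit[1:-1]  # no prefix means u''
    else:
        raise ValueError("not a single-quoted J8 string")

    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError("unescaped single quote")
        if ch != "\\":
            out += ch.encode("utf-8")  # literal text passes through as UTF-8
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt and nxt in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[nxt].encode("utf-8")
            i += 2
        elif nxt == "y":
            if prefix != "b":
                raise ValueError("\\yXX is only valid in b'' strings")
            hex_digits = body[i + 2:i + 4]
            if not re.fullmatch(r"[0-9a-fA-F]{2}", hex_digits):
                raise ValueError("\\y needs two hex digits")
            out.append(int(hex_digits, 16))
            i += 4
        elif nxt == "u" and body[i + 2:i + 3] == "{":
            end = body.find("}", i)
            hex_digits = body[i + 3:end] if end != -1 else ""
            if not re.fullmatch(r"[0-9a-fA-F]{1,6}", hex_digits):
                raise ValueError("braced escape needs 1 to 6 hex digits")
            cp = int(hex_digits, 16)
            if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                raise ValueError("braced escape must be a non-surrogate code point")
            out += chr(cp).encode("utf-8")
            i = end + 1
        else:
            raise ValueError(f"invalid escape: \\{nxt}")
    # Without \y escapes, a u'' body can only produce valid UTF-8.
    return bytes(out)


print(decode_j8_single_quoted(r"u'hi \u{3bc} \n'"))
```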

<h3 id="no-prefix">no-prefix <code>'hi'</code></h3>

Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.

    u'hi μ \n'
    'hi μ \n'     # same as above, no \yff escapes accepted

They should be avoided in contexts where `""` strings may also appear, because
it's easy to confuse single quotes and double quotes.

## J8 Lines

"J8 Lines" is a format built on top of J8 strings. Each line is one of:

1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
   not other ASCII control characters.
2. A quoted J8 string (JSON-style `""` or J8-style `b'' u'' ''`)
3. An **ignored** empty line

In all cases, leading and trailing whitespace is ignored.

### unquoted-line

Any line that doesn't begin with `"` or `b'` or `u'` is an unquoted line.
Examples:

    foo bar
    C:\Program Files\
    internal "quotes" aren't special

In contrast, these are quoted lines, and must be valid J8 strings:

    "json-style J8 string"
    b'this is b style'
    u'this is u style'
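
The line rules can be sketched in Python as a classifier for a single line. `parse_j8_line` is a hypothetical name; it follows the unquoted-line rule (only `"`, `b'`, and `u'` start a quoted line), decodes `""` lines with the standard `json` module, and leaves `b''`/`u''` literals undecoded for a full J8 string decoder. Treating only space, tab, CR, and LF as whitespace is an assumption.

```python
import json
from typing import Optional, Tuple


def parse_j8_line(line: str) -> Optional[Tuple[str, str]]:
    """Classify one line of J8 Lines (hypothetical sketch).

    Returns None for an ignored empty line, otherwise (kind, value).
    """
    # Assumption: "whitespace" here means spaces, tabs, CR, and LF.
    s = line.strip(" \t\r\n")
    if not s:
        return None  # empty lines are ignored
    if s.startswith('"'):
        # JSON strings are valid J8 strings, so json.loads can decode them.
        return ("quoted", json.loads(s))
    if s.startswith(("b'", "u'")):
        # Left undecoded here; a full J8 string decoder would handle these.
        return ("j8-literal", s)
    for ch in s:
        if (ord(ch) < 0x20 and ch != "\t") or ord(ch) == 0x7F:
            raise ValueError(f"control character {ch!r} in unquoted line")
    return ("unquoted", s)


for line in ["  foo bar  ", "", '"json-style J8 string"', "b'this is b style'"]:
    print(repr(line), "->", parse_j8_line(line))
```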

## JSON8

JSON8 is JSON with 4 more things allowed:

1. J8 strings in addition to JSON strings
1. Comments
1. Unquoted keys (TODO)
1. Trailing commas (TODO)

### json8-num

Decoding detail, specific to Oils:

If the number has a decimal point or an exponent like `e1` or `e-10`, it's
decoded into a YSH `Float`. Otherwise it's a YSH `Int`.

    42        # decoded to Int
    42.0      # decoded to Float
    42e1      # decoded to Float
    42.0e1    # decoded to Float
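
The Int-versus-Float rule, sketched in Python. `decode_json8_num` is a hypothetical name, and it assumes the token has already been validated as a JSON number. Python's `json` module happens to apply the same rule, which the loop checks.

```python
import json
from typing import Union


def decode_json8_num(token: str) -> Union[int, float]:
    """Apply the Int-vs-Float rule to an already-validated number token."""
    if any(c in token for c in ".eE"):
        return float(token)  # decimal point or exponent: Float
    return int(token)        # otherwise: Int


for token in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_num(token)
    # Python's json module uses the same rule, so the types should agree.
    assert type(value) is type(json.loads(token))
    print(token, type(value).__name__)
```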

### json8-str

JSON8 strings are exactly J8 strings:

<pre>
"hi &#x1f926; \u03bc"
u'hi &#x1f926; \u{3bc}'
b'hi &#x1f926; \u{3bc} \yff'
</pre>

### json8-list

Like JSON lists, but may have a trailing comma. Examples:

    [42, 43]
    [42, 43,]     # same as above

### json8-dict

Like JSON "objects", but:

- May have a trailing comma.
- May have unquoted keys, as long as each key is an identifier.

Examples:

    {"json8": "message"}
    {json8: "message"}      # same as above
    {json8: "message",}     # same as above

### json8-comment

End-of-line comments in the same style as shell:

    {"json8": "message"}   # comment
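
Trailing commas and comments can be prototyped by rewriting JSON8 text into plain JSON before handing it to Python's `json` module. This is a hypothetical preprocessor, not how Oils parses JSON8: it handles only `""` strings (no `u''`/`b''` strings or unquoted keys), and it treats every `#` outside a string as a comment, which is looser than shell's rule.

```python
import json


def strip_json8_extras(text: str) -> str:
    """Remove # comments and trailing commas outside "" strings (sketch)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy a JSON string verbatim, skipping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif ch == "#":
            # End-of-line comment: skip up to, but not including, the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif ch == ",":
            # Look past whitespace and comments for the next real character.
            j = i + 1
            while j < n:
                if text[j] in " \t\r\n":
                    j += 1
                elif text[j] == "#":
                    while j < n and text[j] != "\n":
                        j += 1
                else:
                    break
            if not (j < n and text[j] in "]}"):
                out.append(ch)  # keep commas that separate items
            i += 1              # a trailing comma before ] or } is dropped
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(json.loads(strip_json8_extras('{"json8": "message",}  # comment')))
```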

## TSV8

TSV8 rows contain J8 primitives (Bool, Int, Float, Str), separated by tabs.


### column-attrs

```
!tsv8   name    age
!type   Str     Int
!other  x       y
        Alice   42
        Bob     25
```
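
A hypothetical Python sketch of reading the table above. It assumes that fields are separated by single tab characters, that the first field holds the `!` attribute name on header rows and is empty on data rows, and that cells are unquoted. `parse_tsv8` and `CONVERTERS` are illustrative names, not Oils APIs, and the `Bool` spelling is borrowed from JSON as an assumption.

```python
from typing import Any, Callable, Dict, List

# Assumed converters for the primitive column types; the Bool spelling
# ("true"/"false") is an assumption borrowed from JSON.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "Str": str,
    "Int": int,
    "Float": float,
    "Bool": lambda cell: {"true": True, "false": False}[cell],
}


def parse_tsv8(text: str) -> List[Dict[str, Any]]:
    """Parse the assumed TSV8 layout into a list of row dicts (sketch)."""
    names: List[str] = []
    types: List[str] = []
    rows: List[Dict[str, Any]] = []
    for line in text.splitlines():
        if not line:
            continue
        marker, *cells = line.split("\t")
        if marker == "!tsv8":
            names = cells  # column names
        elif marker == "!type":
            types = cells  # one primitive type per column
        elif marker.startswith("!"):
            continue  # other attribute rows, like !other, are ignored here
        else:
            # Data rows: an empty first field, then one cell per column.
            rows.append({name: CONVERTERS[t](cell)
                         for name, t, cell in zip(names, types, cells)})
    return rows


sample = "!tsv8\tname\tage\n!type\tStr\tInt\n!other\tx\ty\n\tAlice\t42\n\tBob\t25\n"
print(parse_tsv8(sample))
```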

### column-types

The primitives:

- Bool
- Int
- Float
- Str

Note: Can `null` appear in every cell, except perhaps `Bool` cells?

Could it stand in for `NA`?

[JSON]: https://json.org
