---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

JSON / J8 Notation
==================

This chapter in the [Oils Reference](index.html) describes [JSON]($xref), and
its **J8 Notation** superset.

See the [J8 Notation](../j8-notation.html) doc for more background. This doc
is a quick reference, not the official spec.

<div id="toc">
</div>

## J8 Strings

J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.

That is, Unix deals with byte strings, but JSON can't represent byte strings.

<h3 id="json-string">json-string <code>"hi"</code></h3>

All JSON strings are valid J8 strings!

This is important. Encoders often emit JSON-style `""` strings rather than
`u''` or `b''` strings.

Example:

    "hi μ \n"

<h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>

As a reminder, the backslash escapes valid in [JSON]($xref) strings are:

    \" \\ \/
    \b \f \n \r \t
    \u1234

Additional J8 escapes are valid in `u''` and `b''` strings, described below.

<h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>

JSON's `\u1234` escapes have only four hex digits, so they can't represent
code points at or above `U+10000` (2<sup>16</sup>). To work around this, JSON
has a "surrogate pair hack".

That is, code points in the "surrogate range" (`U+D800` to `U+DFFF`) can be
paired to represent larger code points.

See the [Surrogate Pair Blog
Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
example:

    "\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
notation. Decoders must accept them, but encoders should avoid them.

You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.

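
The surrogate pair arithmetic is short enough to spell out. This Python sketch splits a code point above `U+FFFF` into a high and a low surrogate, and joins them back together. `to_surrogate_pair` and `from_surrogate_pair` are hypothetical helper names used only for illustration. The last line checks the result against Python's standard `json` module, which decodes the pair into a single character.

```python
import json

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F926)
print(f"\\u{high:04x}\\u{low:04x}")      # JSON-style escapes for U+1F926
print(hex(from_surrogate_pair(high, low)))
# Python's json module joins the pair into one character when decoding.
print(json.loads('"\\ud83e\\udd26"') == "\U0001F926")
```

For `U+1F926`, this produces the escapes `\ud83e\udd26` from the example above.
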

<h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>

A type of J8 string.

    u'hi μ \n'

It's never necessary to **emit** a `u''` string, but doing so expresses that
the string is **valid Unicode**. JSON strings can represent strings that aren't
Unicode, because they may contain surrogate halves.

In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
surrogate pairs or halves.

- The **encoded** bytes must be valid UTF-8, like JSON strings.
- The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.

Escaping:

- `u''` strings may **not** contain `\u1234` escapes. They must be written as
  `\u{1234}` or `\u{1f926}`.
- They may not contain `\yff` escapes, because those would represent a string
  that's not UTF-8 or Unicode.
- Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
  longer form `\u{1f926}`.
- You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
  necessary. Decoders must accept these escapes.
- A literal single quote is escaped with `\'`.
- Decoders still accept `\"`, but encoders don't emit it.

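
Here's a minimal Python sketch of an encoder that follows the escaping rules above, assuming its input is a Python `str`. It escapes the backslash, the single quote, and common control characters, writes other ASCII control characters in the braced `\u{...}` form, leaves everything else as literal UTF-8, and rejects lone surrogates because they aren't valid Unicode. The name `encode_u_string` and the exact set of escaped characters are assumptions for illustration, not the Oils encoder.

```python
# Hypothetical encoder sketch: turn a Python str into a J8 u'' string.
SIMPLE_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}

def encode_u_string(s: str) -> str:
    parts = ["u'"]
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates aren't valid Unicode, so u'' can't hold them.
            raise ValueError(f"surrogate U+{cp:04X} can't appear in u''")
        if ch in SIMPLE_ESCAPES:
            parts.append(SIMPLE_ESCAPES[ch])
        elif cp < 0x20 or cp == 0x7F:
            # Other ASCII control chars use the braced form, never \u1234.
            parts.append(f"\\u{{{cp:x}}}")
        else:
            # Everything else, including non-ASCII, is emitted as literal UTF-8.
            parts.append(ch)
    parts.append("'")
    return "".join(parts)

print(encode_u_string("hi μ \n"))
print(encode_u_string("isn't \x01"))
```

The two calls print `u'hi μ \n'` and `u'isn\'t \u{1}'`.
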

<h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>

Another J8 string. These `b''` strings are identical to `u''` strings, but
they can also contain `\yff` escapes.

Examples:

    b'hi μ \n'
    b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'

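
One way to produce a `b''` string from arbitrary bytes is to decode the valid UTF-8 runs normally and write each leftover byte as `\yXX`. This Python sketch leans on the standard `surrogateescape` error handler, which maps each byte that isn't part of valid UTF-8 to the lone surrogate `U+DC00` plus the byte value. The function name `encode_b_string` and its escaping choices are hypothetical.

```python
# Hypothetical sketch: encode arbitrary bytes as a J8 b'' string.
SIMPLE_ESCAPES = {
    "\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r",
    "\t": "\\t", "\b": "\\b", "\f": "\\f",
}

def encode_b_string(data: bytes) -> str:
    # surrogateescape turns each byte that isn't valid UTF-8 into the lone
    # surrogate U+DC00 + byte (always in U+DC80..U+DCFF), so we can spot it.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = ["b'"]
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append(f"\\y{cp - 0xDC00:02x}")   # raw byte, e.g. \yff
        elif ch in SIMPLE_ESCAPES:
            parts.append(SIMPLE_ESCAPES[ch])
        elif cp < 0x20 or cp == 0x7F:
            parts.append(f"\\u{{{cp:x}}}")          # other control chars
        else:
            parts.append(ch)                        # literal UTF-8
    parts.append("'")
    return "".join(parts)

print(encode_b_string(b"hi \xce\xbc \n"))
print(encode_b_string(b"isn't \xff\xfe"))
```

Valid UTF-8 input never yields a `\y` escape. So an encoder built this way could use `u''` or `""` for such strings and reserve `b''` for the rest. That policy is a design choice, not something this reference specifies.
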

<h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>

To summarize, the valid J8 escapes are:

    \'
    \yff               # only valid in b'' strings
    \u{3bc} \u{1f926}  etc.

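
In the other direction, a decoder has to handle the JSON escapes plus these J8 additions. This Python sketch decodes the *body* of a `u''` or `b''` string (the text between the quotes) into bytes. The function name `decode_j8_body` and its `allow_y` flag are hypothetical. With `allow_y=False`, every output byte comes from encoding a Python `str`, so the result is always valid UTF-8, as the `u''` rules require.

```python
import re

# Hypothetical decoder sketch for the body of a u'' or b'' string.
SIMPLE = {"'": "'", '"': '"', "\\": "\\", "/": "/",
          "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# \yXX, \u{X...}, or any other single char after a backslash (possibly none).
ESCAPE = re.compile(r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.?))", re.S)

def decode_j8_body(body: str, allow_y: bool) -> bytes:
    out = bytearray()
    pos = 0
    for m in ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")   # literal text
        byte_hex, code_hex, simple = m.groups()
        if byte_hex is not None:
            if not allow_y:
                raise ValueError("\\y escapes are only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif code_hex is not None:
            cp = int(code_hex, 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"invalid code point in \\u{{{code_hex}}}")
            out += chr(cp).encode("utf-8")
        elif simple in SIMPLE:
            out += SIMPLE[simple].encode("utf-8")
        else:
            # Includes \u1234, which u'' and b'' strings reject.
            raise ValueError(f"invalid escape \\{simple}")
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)

print(decode_j8_body(r"hi \u{3bc} \yff", allow_y=True))
```
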

<h3 id="no-prefix">no-prefix <code>'hi'</code></h3>

Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.

    u'hi μ \n'
    'hi μ \n'    # same as above, no \yff escapes accepted

They should be avoided in contexts where `""` strings may also appear, because
it's easy to confuse single quotes and double quotes.

## J8 Lines

"J8 Lines" is a format built on top of J8 strings. Each line is either:

1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
   not other ASCII control chars.
2. A quoted J8 string (JSON-style `""` or J8-style `b'' u'' ''`)
3. An **ignored** empty line

In all cases, leading and trailing whitespace is ignored.

### unquoted-line

Any line that doesn't begin with `"` or `b'` or `u'` is an unquoted line.
Examples:

    foo bar
    C:\Program Files\
    internal "quotes" aren't special

In contrast, these are quoted lines, and must be valid J8 strings:

    "json-style J8 string"
    b'this is b style'
    u'this is u style'

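
A J8 Lines reader mostly needs to classify each line before decoding it. This Python sketch strips surrounding whitespace, skips empty lines, and applies the unquoted-line rule above. It returns quoted lines undecoded, since decoding them is the J8 string decoder's job. The function name `classify_j8_lines` is hypothetical, the input is assumed to be already decoded from UTF-8, and treating tab as the only whitespace control char allowed inside an unquoted line is an assumption.

```python
# Hypothetical sketch: classify the lines of a J8 Lines document.
def has_bad_control_char(s: str) -> bool:
    # Assumption: tab is allowed; other ASCII control chars (and DEL) aren't.
    return any((ord(c) < 0x20 and c != "\t") or ord(c) == 0x7F for c in s)

def classify_j8_lines(text: str) -> list[tuple[str, str]]:
    results = []
    for line in text.split("\n"):
        line = line.strip()       # leading and trailing whitespace is ignored
        if not line:
            continue              # empty lines are ignored
        if line.startswith(('"', "b'", "u'")):
            # Must be a valid J8 string; a real reader would decode it here.
            results.append(("quoted", line))
        else:
            if has_bad_control_char(line):
                raise ValueError(f"control char in unquoted line: {line!r}")
            results.append(("unquoted", line))
    return results

doc = "foo bar\n\n  C:\\Program Files\\  \n\"json-style J8 string\"\n"
for kind, value in classify_j8_lines(doc):
    print(kind, value)
```
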

## JSON8

JSON8 is JSON with 4 more things allowed:

1. J8 strings in addition to JSON strings
1. Comments
1. Unquoted keys (TODO)
1. Trailing commas (TODO)

### json8-num

Decoding detail, specific to Oils:

If there's a decimal point or an exponent suffix like `e-10`, then the number
is decoded into a YSH `Float`. Otherwise it's a YSH `Int`.

    42      # decoded to Int
    42.0    # decoded to Float
    42e1    # decoded to Float
    42.0e1  # decoded to Float

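
The rule amounts to a check on the number's text. This Python sketch mirrors it, with Python's `int` and `float` standing in for YSH `Int` and `Float`. The function name `decode_json8_num` is hypothetical, and it assumes the token has already been validated as a JSON number.

```python
from typing import Union

# Hypothetical sketch of the Int vs. Float decision for a JSON8 number token.
def decode_json8_num(token: str) -> Union[int, float]:
    # A decimal point or an exponent (e or E) makes it a Float.
    if any(c in token for c in ".eE"):
        return float(token)
    return int(token)

for tok in ["42", "42.0", "42e1", "42.0e1", "-7"]:
    value = decode_json8_num(tok)
    print(tok, type(value).__name__, value)
```
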

### json8-str

JSON8 strings are exactly J8 strings:

<pre>
"hi 🤦 \u03bc"
u'hi 🤦 \u{3bc}'
b'hi 🤦 \u{3bc} \yff'
</pre>

### json8-list

Like JSON lists, but can have a trailing comma. Examples:

    [42, 43]
    [42, 43,]  # same as above

### json8-dict

Like JSON "objects", but:

- Can have a trailing comma.
- Can have unquoted keys, as long as they're identifiers.

Examples:

    {"json8": "message"}
    {json8: "message"}   # same as above
    {json8: "message",}  # same as above

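
An encoder that wants to take advantage of unquoted keys has to decide which keys are safe to leave bare. This Python sketch assumes an "identifier" means an ASCII letter or underscore followed by letters, digits, or underscores; that definition and the name `format_json8_key` are assumptions, and the real JSON8 rule may differ. Keys that aren't identifiers fall back to JSON-style `""` strings.

```python
import json
import re

# Assumption: identifiers look like [A-Za-z_][A-Za-z0-9_]*.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def format_json8_key(key: str) -> str:
    """Emit a dict key unquoted when it's an identifier, else as a JSON string."""
    if IDENTIFIER.fullmatch(key):
        return key
    return json.dumps(key, ensure_ascii=False)

pairs = {"json8": "message", "two words": 1}
body = ", ".join(f"{format_json8_key(k)}: {json.dumps(v)}" for k, v in pairs.items())
print("{" + body + "}")
```
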

### json8-comment

End-of-line comments in the same style as shell:

    {"json8": "message"}  # comment

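
A `#` only starts a comment outside of a string. This Python sketch shows one way to remove comments before handing the text to a plain JSON parser: it tracks whether it's inside a `""` or `''` string (skipping over backslash escapes) and drops everything from an unquoted `#` to the end of the line. The name `strip_json8_comments` is hypothetical, and a real decoder would more likely skip comments in its lexer.

```python
import json

def strip_json8_comments(text: str) -> str:
    out = []
    quote = None          # the quote char of the string we're inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])   # keep the escaped char, e.g. \" or \'
                i += 1
            elif ch == quote:
                quote = None              # end of the string
        elif ch in "\"'":
            quote = ch                    # start of a "" or '' string
            out.append(ch)
        elif ch == "#":
            # Skip to the end of the line, but keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

src = '{"json8": "message # not a comment"}  # comment\n'
print(json.loads(strip_json8_comments(src)))
```
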

## TSV8

TSV8 consists of the J8 primitives (Bool, Int, Float, Str), separated by tabs.

### column-attrs

```
!tsv8 name age
!type Str Int
!other x y
Alice 42
Bob 25
```

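
The header rows above suggest a simple reading loop. This Python sketch assumes single tabs between cells, treats rows starting with `!` as attribute rows (`!tsv8` names the columns, `!type` gives their types, anything else is kept as an extra attribute), and converts every other row cell by cell. The function name `read_tsv8`, the `true`/`false` spelling for `Bool`, and the lack of support for quoted J8 strings or `null` cells are all assumptions of the sketch.

```python
# Hypothetical TSV8 reader sketch; see the assumptions in the text above.
CONVERTERS = {
    "Str": str,
    "Int": int,
    "Float": float,
    "Bool": lambda cell: {"true": True, "false": False}[cell],
}

def read_tsv8(text: str):
    names, types, attrs, rows = [], [], {}, []
    for line in text.splitlines():
        if not line:
            continue
        cells = line.split("\t")
        if cells[0] == "!tsv8":
            names = cells[1:]                 # column names
        elif cells[0] == "!type":
            types = cells[1:]                 # one type per column
        elif cells[0].startswith("!"):
            attrs[cells[0][1:]] = cells[1:]   # other attributes, kept as text
        else:
            # Data row: one cell per column, converted by the column's type.
            row = {name: CONVERTERS[typ](cell)
                   for name, typ, cell in zip(names, types, cells)}
            rows.append(row)
    return names, types, attrs, rows

doc = "!tsv8\tname\tage\n!type\tStr\tInt\n!other\tx\ty\nAlice\t42\nBob\t25\n"
names, types, attrs, rows = read_tsv8(doc)
print(rows)
```
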

### column-types

The primitives:

- Bool
- Int
- Float
- Str

Note: Can `null` appear in any cell, except maybe `Bool` cells?

Could it stand in for `NA`?

[JSON]: https://json.org