| 1 | ---
|
| 2 | title: JSON / J8 Notation (Oils Reference)
|
| 3 | all_docs_url: ..
|
| 4 | body_css_class: width40
|
| 5 | default_highlighter: oils-sh
|
| 6 | preserve_anchor_case: yes
|
| 7 | ---
|
| 8 |
|
| 9 | <div class="doc-ref-header">
|
| 10 |
|
| 11 | [Oils Reference](index.html) —
|
| 12 | Chapter **JSON / J8 Notation**
|
| 13 |
|
| 14 | </div>
|
| 15 |
|
| 16 | This chapter describes [JSON]($xref), and its **J8 Notation** superset.
|
| 17 |
|
| 18 | See the [J8 Notation doc](../j8-notation.html) for more background. This doc
|
| 19 | is a quick reference, not the official spec.
|
| 20 |
|
| 21 | <span class="in-progress">(in progress)</span>
|
| 22 |
|
| 23 | <div id="dense-toc">
|
| 24 | </div>
|
| 25 |
|
| 26 |
|
| 27 | ## J8 Strings
|
| 28 |
|
| 29 | J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.
|
| 30 |
|
| 31 | That is, Unix deals with byte strings, but JSON can't represent byte strings.
|
| 32 |
|
| 33 | <h3 id="json-string">json-string <code>"hi"</code></h3>
|
| 34 |
|
| 35 | All JSON strings are valid J8 strings!
|
| 36 |
|
| 37 | This is important for compatibility. Encoders may prefer to emit JSON-style
|
| 38 | `""` strings rather than `u''` or `b''` strings.
|
| 39 |
|
| 40 | Example:
|
| 41 |
|
| 42 | "hi μ \n"
|
| 43 |
|
| 44 | To be explicit, you can prefix JSON strings with `j`:
|
| 45 |
|
| 46 | j"hi μ \n" # same as above
|
| 47 |
|
| 48 | The `j""` prefix is accepted by our `json8` builtin, but not by the
|
| 49 | `json` builtin.
|
| 50 |
|
| 51 | <h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>
|
| 52 |
|
| 53 | As a reminder, the backslash escapes valid in [JSON]($xref) strings are:
|
| 54 |
|
| 55 | \" \\ \/
|
| 56 | \b \f \n \r \t
|
| 57 | \u1234
|
| 58 |
|
| 59 | Additional J8 escapes are valid in `u''` and `b''` strings, described below.
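
As a quick check of the escapes listed above, here is a minimal sketch that uses Python's standard `json` module rather than the Oils builtins. It only illustrates standard JSON behavior, and the variable names are arbitrary.

```python
import json

# Each JSON backslash escape decodes to a single character.
decoded = json.loads(r'"\" \\ \b \f \n \r \t \u03bc"')
assert decoded == '" \\ \b \f \n \r \t \u03bc'

# The j"" prefix is a J8 extension, so a plain JSON parser rejects it.
try:
    json.loads('j"hi"')
    j_prefix_accepted = True
except json.JSONDecodeError:
    j_prefix_accepted = False
```

The second half mirrors the note about the `j""` prefix: a strict JSON decoder treats the leading `j` as a syntax error.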
|
| 60 |
|
| 61 | <h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>
|
| 62 |
|
| 63 | JSON's `\u1234` escapes can't represent code points at or above `U+10000`
|
| 64 | (2<sup>16</sup>), so JSON also has a "surrogate pair hack".
|
| 65 |
|
| 66 | That is, there are special code points in the "surrogate range" that can be
|
| 67 | paired to represent larger numbers.
|
| 68 |
|
| 69 | See the [Surrogate Pair Blog
|
| 70 | Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
|
| 71 | example:
|
| 72 |
|
| 73 | "\ud83e\udd26"
|
| 74 |
|
| 75 | Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
|
| 76 | notation. Decoders must accept them, but encoders should avoid them.
|
| 77 |
|
| 78 | You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
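
The pairing arithmetic can be sketched in a few lines of Python. This is an illustration of the standard UTF-16 surrogate calculation, not code from Oils; `to_surrogate_pair` is a hypothetical helper name.

```python
import json

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) surrogates."""
    assert 0x10000 <= code_point <= 0x10FFFF
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F926)
json_text = '"\\u%04x\\u%04x"' % (high, low)

# A JSON decoder joins the pair back into one code point.
assert json_text == '"\\ud83e\\udd26"'
assert json.loads(json_text) == "\U0001f926"
```

The `\u{1f926}` form avoids this arithmetic entirely, which is one reason encoders should prefer it.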
|
| 79 |
|
| 80 | <h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>
|
| 81 |
|
| 82 | A type of J8 string.
|
| 83 |
|
| 84 | u'hi μ \n'
|
| 85 |
|
| 86 | It's never necessary to **emit**, but it can be used to express that a string
|
| 87 | is **valid Unicode**. JSON strings can represent strings that aren't Unicode
|
| 88 | because they may contain surrogate halves.
|
| 89 |
|
| 90 | In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
|
| 91 | surrogate pairs or halves.
|
| 92 |
|
| 93 | - The **encoded** bytes must be valid UTF-8, like JSON strings.
|
| 94 | - The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.
|
| 95 |
|
| 96 | Escaping:
|
| 97 |
|
| 98 | - `u''` strings may **not** contain `\u1234` escapes. Use the braced form
|
| 99 | instead, like `\u{1234}` or `\u{1f926}`.
|
| 100 | - They may not contain `\yff` escapes, because those would represent a string
|
| 101 | that's not UTF-8 or Unicode.
|
| 102 | - Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
|
| 103 | longer form `\u{1f926}`.
|
| 104 | - You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
|
| 105 | necessary. Decoders must accept these escapes.
|
| 106 | - A literal single quote is escaped with `\'`
|
| 107 | - Decoders still accept `\"`, but encoders don't emit it.
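
To see why JSON strings can hold data that isn't valid Unicode, here is a small sketch using Python's standard `json` module (an assumption for illustration; the Oils decoder is a separate implementation). A lone surrogate half decodes successfully, but the result can't be encoded as UTF-8.

```python
import json

# A lone surrogate half is accepted by Python's JSON decoder ...
half = json.loads('"\\ud83e"')
assert len(half) == 1 and ord(half) == 0xD83E

# ... but the result is not valid Unicode text, so UTF-8 encoding fails.
try:
    half.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

A `u''` string rules this case out, because `\u{...}` escapes name whole code points rather than surrogate halves.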
|
| 108 |
|
| 109 | <h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>
|
| 110 |
|
| 111 | Another J8 string. These `b''` strings are identical to `u''` strings, but
|
| 112 | they can also contain `\yff` escapes.
|
| 113 |
|
| 114 | Examples:
|
| 115 |
|
| 116 | b'hi μ \n'
|
| 117 | b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
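
Below is a minimal sketch of how an encoder might produce a `b''` string from arbitrary bytes. It is not the Oils encoder: `encode_b_string` is a hypothetical helper, and the choice of which characters to escape (single quote, backslash, newline, other control characters) is an assumption made for the example.

```python
def encode_b_string(data: bytes) -> str:
    """Encode bytes as a b'' J8 string (hypothetical helper)."""
    out = []
    # surrogateescape maps each undecodable byte 0xNN to U+DC00 + 0xNN,
    # so invalid UTF-8 bytes can be told apart from real characters.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\y%02x" % (cp - 0xDC00))  # raw byte escape
        elif ch == "'":
            out.append("\\'")
        elif ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif cp < 0x20:
            out.append("\\u{%x}" % cp)  # other control characters
        else:
            out.append(ch)  # literal UTF-8 is always allowed
    return "b'" + "".join(out) + "'"

encoded = encode_b_string(b"isn't valid \xff\xfe \xce\xbc")
```

Valid UTF-8 such as `μ` passes through literally, while the bytes `0xff` and `0xfe` become `\yff` and `\yfe`.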
|
| 118 |
|
| 119 | <h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>
|
| 120 |
|
| 121 | To summarize, the valid J8 escapes are:
|
| 122 |
|
| 123 | \'
|
| 124 | \yff # only valid in b'' strings
|
| 125 | \u{3bc} \u{1f926} etc.
|
| 126 |
|
| 127 | <h3 id="no-prefix">no-prefix <code>'hi'</code></h3>
|
| 128 |
|
| 129 | Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.
|
| 130 |
|
| 131 | u'hi μ \n'
|
| 132 | 'hi μ \n' # same as above, no \yff escapes accepted
|
| 133 |
|
| 134 | They should be avoided in contexts where `""` strings may also appear, because
|
| 135 | it's easy to confuse single quotes and double quotes.
|
| 136 |
|
| 137 | ## J8 Lines
|
| 138 |
|
| 139 | "J8 Lines" is a format built on top of J8 strings. Each line is either:
|
| 140 |
|
| 141 | 1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
|
| 142 | not other ASCII control chars.
|
| 143 | 2. A quoted J8 string (JSON style `""` or J8-style `b'' u'' ''`)
|
| 144 | 3. An **ignored** empty line
|
| 145 |
|
| 146 | In all cases, leading and trailing whitespace is ignored.
|
| 147 |
|
| 148 | ### unquoted-line
|
| 149 |
|
| 150 | Any line that doesn't begin with `"` or `b'` or `u'` or `'` is an unquoted line.
|
| 151 | Examples:
|
| 152 |
|
| 153 | foo bar
|
| 154 | C:\Program Files\
|
| 155 | internal "quotes" aren't special
|
| 156 |
|
| 157 | In contrast, these are quoted lines, and must be valid J8 strings:
|
| 158 |
|
| 159 | "json-style J8 string"
|
| 160 | b'this is b style'
|
| 161 | u'this is u style'
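
The line kinds can be told apart with a small classifier. This is a sketch under stated assumptions, not the Oils parser: `classify_j8_line` is a hypothetical name, it only classifies lines without decoding them, and it treats a bare `'` as the start of a quoted line because `''` appears in the list of quoted forms.

```python
def classify_j8_line(line: str) -> str:
    """Return 'empty', 'quoted', or 'unquoted' for one J8 line."""
    stripped = line.strip()  # leading and trailing whitespace is ignored
    if not stripped:
        return "empty"
    # Assumption: a bare single quote also begins a quoted J8 string.
    if stripped.startswith(('"', "b'", "u'", "'")):
        return "quoted"
    return "unquoted"

kinds = [classify_j8_line(s) for s in [
    "foo bar",
    "C:\\Program Files\\",
    'internal "quotes" aren\'t special',
    '"json-style J8 string"',
    "  b'this is b style'  ",
    "",
]]
```

A real decoder would go on to validate quoted lines as J8 strings and reject control characters in unquoted lines.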
|
| 162 |
|
| 163 | ## JSON8
|
| 164 |
|
| 165 | JSON8 is JSON with 4 more things allowed:
|
| 166 |
|
| 167 | 1. J8 strings in addition to JSON strings
|
| 168 | 1. Comments
|
| 169 | 1. Unquoted keys (TODO)
|
| 170 | 1. Trailing commas (TODO)
|
| 171 |
|
| 172 | ### json8-num
|
| 173 |
|
| 174 | JSON8 numbers are identical to JSON numbers.
|
| 175 |
|
| 176 | Here is a decoding detail, specific to Oils:
|
| 177 |
|
| 178 | If there's a decimal point or an exponent (like `e-10`), the number is
|
| 179 | decoded into a YSH `Float`. Otherwise it's a YSH `Int`.
|
| 180 |
|
| 181 | 42 # decoded to Int
|
| 182 | 42.0 # decoded to Float
|
| 183 | 42e1 # decoded to Float
|
| 184 | 42.0e1 # decoded to Float
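
The decoding rule above can be sketched in Python as follows. `decode_json8_number` is a hypothetical helper, and Python's `int` and `float` stand in for YSH `Int` and `Float`, whose ranges may differ.

```python
def decode_json8_number(token: str):
    """Decode a JSON number token to int or float (illustrative only)."""
    # A decimal point or an exponent marker selects the float path.
    if any(c in token for c in ".eE"):
        return float(token)
    return int(token)

values = [decode_json8_number(t) for t in ["42", "42.0", "42e1", "42.0e1"]]
```

Note that `42e1` becomes a float even though its value, 420, is a whole number: the decision is made on the token's syntax, not on its value.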
|
| 185 |
|
| 186 | ### json8-str
|
| 187 |
|
| 188 | JSON8 strings are J8 strings:
|
| 189 |
|
| 190 | <pre>
|
| 191 | "hi 🤦 \u03bc"
|
| 192 | u'hi 🤦 \u{3bc}'
|
| 193 | b'hi 🤦 \u{3bc} \yff'
|
| 194 | </pre>
|
| 195 |
|
| 196 | ### json8-list
|
| 197 |
|
| 198 | TODO:
|
| 199 |
|
| 200 | Like JSON lists, but can have a trailing comma. Examples:
|
| 201 |
|
| 202 | [42, 43]
|
| 203 | [42, 43,] # same as above
|
| 204 |
|
| 205 | ### json8-dict
|
| 206 |
|
| 207 | TODO:
|
| 208 |
|
| 209 | Like JSON "objects", but:
|
| 210 |
|
| 211 | - Can have a trailing comma.
|
| 212 | - Can have unquoted keys, as long as each key is an identifier.
|
| 213 |
|
| 214 | Examples:
|
| 215 |
|
| 216 | {"json8": "message"}
|
| 217 | {json8: "message"} # same as above
|
| 218 | {json8: "message",} # same as above
|
| 219 |
|
| 220 | ### json8-comment
|
| 221 |
|
| 222 | End-of-line comments in the same style as shell:
|
| 223 |
|
| 224 | {"json8": "message"} # comment
|
| 225 |
|
| 226 | ## TSV8
|
| 227 |
|
| 228 | These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.
|
| 229 |
|
| 230 | ### column-attrs
|
| 231 |
|
| 232 | <!-- Consider #.tsv8 and 'type' perhaps
|
| 233 |
|
| 234 | #.tsv8 name age
|
| 235 | type Str Int
|
| 236 | other x y
|
| 237 | Alice 42
|
| 238 |
|
| 239 | Also consider alignment.
|
| 240 | -->
|
| 241 |
|
| 242 |
|
| 243 | ```
|
| 244 | !tsv8 name age
|
| 245 | !type Str Int
|
| 246 | !other x y
|
| 247 | Alice 42
|
| 248 | Bob 25
|
| 249 | ```
|
| 250 |
|
| 251 | ### column-types
|
| 252 |
|
| 253 | The primitives:
|
| 254 |
|
| 255 | - Bool
|
| 256 | - Int
|
| 257 | - Float
|
| 258 | - Str
|
| 259 |
|
| 260 | Note: Can `null` be in all cells? Maybe except `Bool`?
|
| 261 |
|
| 262 | It can stand in for `NA`?
|
| 263 |
|
| 264 | [JSON]: https://json.org
|
| 265 |
|