| 1 | ---
|
| 2 | in_progress: yes
|
| 3 | body_css_class: width40 help-body
|
| 4 | default_highlighter: oils-sh
|
| 5 | preserve_anchor_case: yes
|
| 6 | ---
|
| 7 |
|
| 8 | JSON / J8 Notation
|
| 9 | ==================
|
| 10 |
|
| 11 | This chapter in the [Oils Reference](index.html) describes [JSON]($xref), and
|
| 12 | its **J8 Notation** superset.
|
| 13 |
|
| 14 | See the [J8 Notation](../j8-notation.html) doc for more background. This doc
|
| 15 | is a quick reference, not the official spec.
|
| 16 |
|
| 17 | <div id="toc">
|
| 18 | </div>
|
| 19 |
|
| 20 |
|
| 21 | ## J8 Strings
|
| 22 |
|
| 23 | J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.
|
| 24 |
|
| 25 | That is, Unix deals with byte strings, but JSON can't represent byte strings.
|
| 26 |
|
| 27 | <h3 id="json-string">json-string <code>"hi"</code></h3>
|
| 28 |
|
| 29 | All JSON strings are valid J8 strings!
|
| 30 |
|
| 31 | This is important. Encoders often emit JSON-style `""` strings rather than
|
| 32 | `u''` or `b''` strings.
|
| 33 |
|
| 34 | Example:
|
| 35 |
|
| 36 | "hi μ \n"
|
| 37 |
|
| 38 | <h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>
|
| 39 |
|
| 40 | As a reminder, the backslash escapes valid in [JSON]($xref) strings are:
|
| 41 |
|
| 42 | \" \\
|
| 43 | \b \f \n \r \t
|
| 44 | \u1234
|
| 45 |
|
| 46 | Additional J8 escapes are valid in `u''` and `b''` strings, described below.
|
| 47 |
|
| 48 | <h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>
|
| 49 |
|
| 50 | JSON's `\u1234` escapes can't represent code points at or above `U+10000`
|
| 51 | (2<sup>16</sup>), so JSON also has a "surrogate pair hack".
|
| 52 |
|
| 53 | That is, there are special code points in the "surrogate range" that can be
|
| 54 | paired to represent larger numbers.
|
| 55 |
|
| 56 | See the [Surrogate Pair Blog
|
| 57 | Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
|
| 58 | example:
|
| 59 |
|
| 60 | "\ud83e\udd26"
|
| 61 |
|
| 62 | Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
|
| 63 | notation. Decoders must accept them, but encoders should avoid them.
|
| 64 |
|
| 65 | You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
|
| 66 |
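The arithmetic behind the pair can be sketched in Python. This is only an illustration of the standard UTF-16 surrogate math, not code from Oils; the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogate halves."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F926)
json_escape = "\\u%04x\\u%04x" % (high, low)              # JSON-style pair
j8_escape = "\\u{%x}" % from_surrogate_pair(high, low)    # J8-style escape
```

Here `json_escape` holds the JSON-style pair and `j8_escape` holds the single braced escape that `u''` and `b''` strings use.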
|
| 67 | <h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>
|
| 68 |
|
| 69 | A type of J8 string.
|
| 70 |
|
| 71 | u'hi μ \n'
|
| 72 |
|
| 73 | It's never necessary to **emit**, but it can be used to express that a string
|
| 74 | is **valid Unicode**. JSON strings can represent strings that aren't Unicode
|
| 75 | because they may contain surrogate halves.
|
| 76 |
|
| 77 | In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
|
| 78 | surrogate pairs or halves.
|
| 79 |
|
| 80 | - The **encoded** bytes must be valid UTF-8, like JSON strings.
|
| 81 | - The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.
|
| 82 |
|
| 83 | Escaping:
|
| 84 |
|
| 85 | - `u''` strings may **not** contain `\u1234` escapes. They must use the braced
|
| 86 | form, like `\u{1234}` or `\u{1f926}`.
|
| 87 | - They may not contain `\yff` escapes, because those would represent a string
|
| 88 | that's not UTF-8 or Unicode.
|
| 89 | - Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
|
| 90 | longer form `\u{1f926}`.
|
| 91 | - You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
|
| 92 | necessary. Decoders must accept these escapes.
|
| 93 | - A literal single quote is escaped with `\'`
|
| 94 | - Decoders still accept `\"`, but encoders don't emit it.
|
| 95 |
|
| 96 | <h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>
|
| 97 |
|
| 98 | Another J8 string. These `b''` strings are identical to `u''` strings, but
|
| 99 | they can also contain `\yff` escapes.
|
| 100 |
|
| 101 | Examples:
|
| 102 |
|
| 103 | b'hi μ \n'
|
| 104 | b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
|
| 105 |
|
| 106 | <h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>
|
| 107 |
|
| 108 | To summarize, the valid J8 escapes are:
|
| 109 |
|
| 110 | \'
|
| 111 | \yff # only valid in b'' strings
|
| 112 | \u{3bc} \u{1f926} etc.
|
| 113 |
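As a rough illustration of how these escapes fit together, here is a minimal Python sketch of a `b''` encoder. It is not the Oils implementation. The function name `encode_b_string` is hypothetical, and escaping low control bytes as `\yXX` is an assumption made for this example.

```python
def encode_b_string(data: bytes) -> str:
    """Encode arbitrary bytes as a b'' J8 string (illustrative sketch only)."""
    simple = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    parts = []
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN,
    # so valid UTF-8 sequences come through as ordinary characters.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if ch in simple:
            parts.append(simple[ch])
        elif 0xDC80 <= cp <= 0xDCFF:
            # A byte that was not part of valid UTF-8: emit a \yXX byte escape.
            parts.append("\\y%02x" % (cp - 0xDC00))
        elif cp < 0x20:
            # Assumption: remaining control bytes are also written as \yXX.
            parts.append("\\y%02x" % cp)
        else:
            # Valid text is emitted as literal UTF-8.
            parts.append(ch)
    return "b'" + "".join(parts) + "'"
```

For example, `encode_b_string(b"hi \xff\xfe")` produces a string with two `\y` escapes, while valid UTF-8 such as `μ` passes through literally.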
|
| 114 | <h3 id="no-prefix">no-prefix <code>'hi'</code></h3>
|
| 115 |
|
| 116 | Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.
|
| 117 |
|
| 118 | u'hi μ \n'
|
| 119 | 'hi μ \n' # same as above, no \yff escapes accepted
|
| 120 |
|
| 121 | They should be avoided in contexts where `""` strings may also appear, because
|
| 122 | it's easy to confuse single quotes and double quotes.
|
| 123 |
|
| 124 | ## J8 Lines
|
| 125 |
|
| 126 | "J8 Lines" is a format built on top of J8 strings. Each line is either:
|
| 127 |
|
| 128 | 1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
|
| 129 | not other ASCII control chars.
|
| 130 | 2. A quoted J8 string (JSON style `""` or J8-style `b'' u'' ''`)
|
| 131 | 3. An **ignored** empty line
|
| 132 |
|
| 133 | In all cases, leading and trailing whitespace is ignored.
|
| 134 |
|
| 135 | ### unquoted-line
|
| 136 |
|
| 137 | Any line that doesn't begin with `"` or `'` or `b'` or `u'` is an unquoted line.
|
| 138 | Examples:
|
| 139 |
|
| 140 | foo bar
|
| 141 | C:\Program Files\
|
| 142 | internal "quotes" aren't special
|
| 143 |
|
| 144 | In contrast, these are quoted lines, and must be valid J8 strings:
|
| 145 |
|
| 146 | "json-style J8 string"
|
| 147 | b'this is b style'
|
| 148 | u'this is u style'
|
| 149 |
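The line rules in this section can be sketched as a small Python classifier. This is a hedged illustration, not the Oils parser. The name `classify_j8_line` is hypothetical. It treats a bare `'` prefix as quoted, based on the `''` form listed at the start of this section, and it assumes "whitespace" means space, tab, and line endings. It does not validate the string body.

```python
def classify_j8_line(line: str) -> str:
    """Return 'empty', 'quoted', or 'unquoted' for one J8 Lines input line."""
    # Leading and trailing whitespace is ignored in all cases.
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "empty"       # ignored by a decoder
    if stripped.startswith(('"', "'", "b'", "u'")):
        return "quoted"      # must then parse as a valid J8 string
    return "unquoted"        # taken literally; must be valid UTF-8


for example in ['foo bar', 'C:\\Program Files\\', "  b'this is b style'  ", '']:
    print(repr(example), '->', classify_j8_line(example))
```

A real decoder would go on to parse quoted lines as J8 strings and reject unquoted lines that contain disallowed control characters.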
|
| 150 | ## JSON8
|
| 151 |
|
| 152 | JSON8 is JSON with 4 more things allowed:
|
| 153 |
|
| 154 | 1. J8 strings in addition to JSON strings
|
| 155 | 1. Comments
|
| 156 | 1. Unquoted keys (TODO)
|
| 157 | 1. Trailing commas (TODO)
|
| 158 |
|
| 159 | ### json8-num
|
| 160 |
|
| 161 | Decoding detail, specific to Oils:
|
| 162 |
|
| 163 | If the number has a decimal point or an exponent suffix like `e-10`, it's
|
| 164 | decoded into a YSH `Float`. Otherwise it's decoded into a YSH `Int`.
|
| 165 |
|
| 166 | 42 # decoded to Int
|
| 167 | 42.0 # decoded to Float
|
| 168 | 42e1 # decoded to Float
|
| 169 | 42.0e1 # decoded to Float
|
| 170 |
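The same rule can be sketched in Python, with Python's `int` and `float` standing in for YSH `Int` and `Float`. The regular expression follows the JSON number grammar. The name `decode_number` is hypothetical, and this is not how Oils itself is implemented.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction, optional exponent.
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def decode_number(text: str):
    """Decode a JSON number: float if it has a fraction or exponent, else int."""
    m = _NUMBER.fullmatch(text)
    if m is None:
        raise ValueError("not a JSON number: %r" % text)
    if m.group(2) or m.group(3):    # decimal point or exponent present
        return float(text)
    return int(text)


for s in ["42", "42.0", "42e1", "42.0e1"]:
    print(s, "->", type(decode_number(s)).__name__)
```

The decision depends only on the spelling of the number, so `42.0` becomes a float even though its value is integral.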
|
| 171 | ### json8-str
|
| 172 |
|
| 173 | JSON8 strings are exactly J8 strings:
|
| 174 |
|
| 175 | <pre>
|
| 176 | "hi 🤦 \u03bc"
|
| 177 | u'hi 🤦 \u{3bc}'
|
| 178 | b'hi 🤦 \u{3bc} \yff'
|
| 179 | </pre>
|
| 180 |
|
| 181 | ### json8-list
|
| 182 |
|
| 183 | Like JSON lists, but can have a trailing comma. Examples:
|
| 184 |
|
| 185 | [42, 43]
|
| 186 | [42, 43,] # same as above
|
| 187 |
|
| 188 | ### json8-dict
|
| 189 |
|
| 190 | Like JSON "objects", but:
|
| 191 |
|
| 192 | - Can have a trailing comma.
|
| 193 | - Can have unquoted keys, as long as each key is an identifier.
|
| 194 |
|
| 195 | Examples:
|
| 196 |
|
| 197 | {"json8": "message"}
|
| 198 | {json8: "message"} # same as above
|
| 199 | {json8: "message",} # same as above
|
| 200 |
|
| 201 | ### json8-comment
|
| 202 |
|
| 203 | End-of-line comments in the same style as shell:
|
| 204 |
|
| 205 | {"json8": "message"} # comment
|
| 206 |
|
| 207 | ## TSV8
|
| 208 |
|
| 209 | These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.
|
| 210 |
|
| 211 |
|
| 212 | ### column-attrs
|
| 213 |
|
| 214 | ```
|
| 215 | !tsv8 name age
|
| 216 | !type Str Int
|
| 217 | !other x y
|
| 218 | Alice 42
|
| 219 | Bob 25
|
| 220 | ```
|
| 221 |
|
| 222 | ### column-types
|
| 223 |
|
| 224 | The primitives:
|
| 225 |
|
| 226 | - Bool
|
| 227 | - Int
|
| 228 | - Float
|
| 229 | - Str
|
| 230 |
|
| 231 | Note: Can `null` be in all cells? Maybe except `Bool`?
|
| 232 |
|
| 233 | It can stand in for `NA`?
|
| 234 |
|
| 235 | [JSON]: https://json.org
|
| 236 |
|