| 1 | ---
|
| 2 | in_progress: yes
|
| 3 | body_css_class: width40 help-body
|
| 4 | default_highlighter: oils-sh
|
| 5 | preserve_anchor_case: yes
|
| 6 | ---
|
| 7 |
|
| 8 | JSON / J8 Notation
|
| 9 | ==================
|
| 10 |
|
| 11 | This chapter in the [Oils Reference](index.html) describes [JSON]($xref), and
|
| 12 | its **J8 Notation** superset.
|
| 13 |
|
| 14 | See the [J8 Notation](../j8-notation.html) doc for more background. This doc
|
| 15 | is a quick reference, not the official spec.
|
| 16 |
|
| 17 | <div id="toc">
|
| 18 | </div>
|
| 19 |
|
| 20 |
|
| 21 | ## J8 Strings
|
| 22 |
|
| 23 | J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.
|
| 24 |
|
| 25 | That is, Unix deals with byte strings, but JSON can't represent byte strings.
|
| 26 |
|
| 27 | <h3 id="json-string">json-string <code>"hi"</code></h3>
|
| 28 |
|
| 29 | All JSON strings are valid J8 strings!
|
| 30 |
|
| 31 | This is important. Encoders often emit JSON-style `""` strings rather than
|
| 32 | `u''` or `b''` strings.
|
| 33 |
|
| 34 | Example:
|
| 35 |
|
| 36 | "hi μ \n"
|
| 37 |
|
| 38 | <h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>
|
| 39 |
|
| 40 | As a reminder, the backslash escapes valid in [JSON]($xref) strings are:
|
| 41 |
|
| 42 | \" \\ \/
|
| 43 | \b \f \n \r \t
|
| 44 | \u1234
|
| 45 |
|
| 46 | Additional J8 escapes are valid in `u''` and `b''` strings, described below.
|
| 47 |
|
| 48 | <h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>
|
| 49 |
|
| 50 | JSON's `\u1234` escapes can't represent code points at or above `U+10000`
|
| 51 | (2<sup>16</sup>), so JSON also has a "surrogate pair hack".
|
| 52 |
|
| 53 | That is, there are special code points in the "surrogate range" that can be
|
| 54 | paired to represent larger numbers.
|
| 55 |
|
| 56 | See the [Surrogate Pair Blog
|
| 57 | Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
|
| 58 | example:
|
| 59 |
|
| 60 | "\ud83e\udd26"
|
| 61 |
|
| 62 | Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
|
| 63 | notation. Decoders must accept them, but encoders should avoid them.
|
| 64 |
|
| 65 | You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
|
| 66 |
|
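As a rough illustration of the arithmetic behind a surrogate pair, the Python sketch below combines the two halves from the example above into a single code point. The helper name `combine_surrogates` is hypothetical and not part of Oils. Python's standard `json` module is used only to cross-check the result.

```python
import json

def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a high and a low surrogate into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each half carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

code_point = combine_surrogates(0xD83E, 0xDD26)

# Python's JSON decoder joins the two halves into one character.
decoded = json.loads(r'"\ud83e\udd26"')
```

Here `code_point` is `0x1f926`, which is the value written directly in the `\u{1f926}` form.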
| 67 | <h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>
|
| 68 |
|
| 69 | A type of J8 string.
|
| 70 |
|
| 71 | u'hi μ \n'
|
| 72 |
|
| 73 | It's never necessary to **emit**, but it can be used to express that a string
|
| 74 | is **valid Unicode**. JSON strings can represent strings that aren't Unicode
|
| 75 | because they may contain surrogate halves.
|
| 76 |
|
| 77 | In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
|
| 78 | surrogate pairs or halves.
|
| 79 |
|
| 80 | - The **encoded** bytes must be valid UTF-8, like JSON strings.
|
| 81 | - The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.
|
| 82 |
|
| 83 | Escaping:
|
| 84 |
|
| 85 | - `u''` strings may **not** contain `\u1234` escapes. They must use the braced
|
| 86 | form instead, like `\u{1234}` or `\u{1f926}`.
|
| 87 | - They may not contain `\yff` escapes, because those would represent a string
|
| 88 | that's not UTF-8 or Unicode.
|
| 89 | - Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
|
| 90 | longer form `\u{1f926}`.
|
| 91 | - You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
|
| 92 | necessary. Decoders must accept these escapes.
|
| 93 | - A literal single quote is escaped with `\'`
|
| 94 | - Decoders still accept `\"`, but encoders don't emit it.
|
| 95 |
|
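The escaping rules above can be sketched as a small encoder. This is a minimal Python illustration, not the Oils encoder: the function name `encode_u_string` is hypothetical, and the choice to write control characters as `\u{...}` escapes is an assumption made for the sketch.

```python
def encode_u_string(s: str) -> str:
    """Encode a Python str as a u'' literal (illustrative sketch only)."""
    # A u'' string must be valid Unicode. This raises UnicodeEncodeError
    # if s contains a surrogate half.
    s.encode("utf-8")
    parts = []
    for ch in s:
        if ch == "'":
            parts.append("\\'")          # literal single quote
        elif ch == "\\":
            parts.append("\\\\")
        elif ch == "\n":
            parts.append("\\n")
        elif ord(ch) < 0x20:
            # Assumption: other control characters use the braced form.
            parts.append("\\u{%x}" % ord(ch))
        else:
            parts.append(ch)             # literal UTF-8 is always allowed
    return "u'" + "".join(parts) + "'"
```

For example, `encode_u_string("isn't")` yields `u'isn\'t'`, and non-ASCII characters such as `μ` pass through unescaped.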
| 96 | <h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>
|
| 97 |
|
| 98 | Another J8 string. These `b''` strings are identical to `u''` strings, but
|
| 99 | they can also contain `\yff` escapes, which represent arbitrary bytes.
|
| 100 |
|
| 101 | Examples:
|
| 102 |
|
| 103 | b'hi μ \n'
|
| 104 | b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
|
| 105 |
|
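To show how `\yff` escapes let a `b''` string carry bytes that aren't UTF-8, here is a hedged Python sketch of an encoder for arbitrary byte strings. The name `encode_b_string` is hypothetical, and writing control bytes as `\yXX` is an assumption of this sketch rather than documented Oils behavior.

```python
def encode_b_string(data: bytes) -> str:
    """Encode arbitrary bytes as a b'' literal (illustrative sketch only)."""
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so invalid bytes can be found after decoding.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))  # byte that isn't UTF-8
        elif ch == "'":
            parts.append("\\'")
        elif ch == "\\":
            parts.append("\\\\")
        elif cp < 0x20:
            parts.append("\\y%02x" % cp)  # assumption: control bytes as \yXX
        else:
            parts.append(ch)              # valid UTF-8 stays literal
    return "b'" + "".join(parts) + "'"
```

With this sketch, `encode_b_string(b"hi \xff\xfe")` produces `b'hi \yff\yfe'`, while valid UTF-8 input needs no `\y` escapes at all.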
| 106 | <h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>
|
| 107 |
|
| 108 | To summarize, the valid J8 escapes are:
|
| 109 |
|
| 110 | \'
|
| 111 | \yff # only valid in b'' strings
|
| 112 | \u{3bc} \u{1f926} etc.
|
| 113 |
|
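The escapes summarized above, together with the JSON-style escapes, can be decoded with a short routine. The following Python sketch is an assumption-laden illustration, not the Oils decoder: `decode_j8_body` is a hypothetical helper that takes the text between the quotes and returns bytes, and its `allow_bytes` flag stands for the difference between `b''` and `u''` strings.

```python
import re

# One escape: \yXX, \u{X...}, or a backslash followed by one character.
_ESCAPE = re.compile(
    r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.?))", re.DOTALL)
_SIMPLE = {"'": "'", '"': '"', "\\": "\\", "/": "/",
           "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

def decode_j8_body(body: str, allow_bytes: bool) -> bytes:
    """Decode the inside of a u'' or b'' string (illustrative sketch)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # literal text
        byte_hex, cp_hex, simple = m.groups()
        if byte_hex is not None:
            if not allow_bytes:
                raise ValueError("\\y escapes are only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif cp_hex is not None:
            cp = int(cp_hex, 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("invalid code point in \\u{} escape")
            out += chr(cp).encode("utf-8")
        elif simple in _SIMPLE:
            out += _SIMPLE[simple].encode("utf-8")
        else:
            # Includes JSON-style \u1234, which is rejected here.
            raise ValueError("invalid escape: " + m.group(0))
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)
```

Calling `decode_j8_body(r"hi \u{3bc} \yff", allow_bytes=True)` returns the UTF-8 bytes for `hi μ ` followed by the single byte `0xff`. The same call with `allow_bytes=False` raises `ValueError`.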
| 114 | <h3 id="no-prefix">no-prefix <code>'hi'</code></h3>
|
| 115 |
|
| 116 | Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.
|
| 117 |
|
| 118 | u'hi μ \n'
|
| 119 | 'hi μ \n' # same as above, no \yff escapes accepted
|
| 120 |
|
| 121 | They should be avoided in contexts where `""` strings may also appear, because
|
| 122 | it's easy to confuse single quotes and double quotes.
|
| 123 |
|
| 124 | ## JSON8
|
| 125 |
|
| 126 | JSON8 is JSON with 4 more things allowed:
|
| 127 |
|
| 128 | 1. J8 strings in addition to JSON strings
|
| 129 | 1. Comments
|
| 130 | 1. Unquoted keys (TODO)
|
| 131 | 1. Trailing commas (TODO)
|
| 132 |
|
| 133 | ### json8-num
|
| 134 |
|
| 135 | Decoding detail, specific to Oils:
|
| 136 |
|
| 137 | If there's a decimal point or an exponent suffix like `e-10`, then it's
|
| 138 | decoded into a YSH `Float`. Otherwise it's a YSH `Int`.
|
| 139 |
|
| 140 | 42 # decoded to Int
|
| 141 | 42.0 # decoded to Float
|
| 142 | 42e1 # decoded to Float
|
| 143 | 42.0e1 # decoded to Float
|
| 144 |
|
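The rule above amounts to a check on the token's text. A minimal Python sketch follows, using Python's `int` and `float` as stand-ins for YSH `Int` and `Float`. The name `decode_json8_num` is hypothetical, and the sketch assumes the token has already been validated as a JSON number.

```python
def decode_json8_num(token: str):
    """Pick Int or Float from the spelling of a number token (sketch)."""
    # A decimal point or an exponent marker means Float.
    if any(c in token for c in ".eE"):
        return float(token)
    return int(token)

values = [decode_json8_num(t) for t in ["42", "42.0", "42e1", "42.0e1"]]
```

Note that the decision depends on how the number is written, not on its value: `42.0` decodes to a float even though it is a whole number.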
| 145 | ### json8-str
|
| 146 |
|
| 147 | JSON8 strings are exactly J8 strings:
|
| 148 |
|
| 149 | <pre>
|
| 150 | "hi 🤦 \u03bc"
|
| 151 | u'hi 🤦 \u{3bc}'
|
| 152 | b'hi 🤦 \u{3bc} \yff'
|
| 153 | </pre>
|
| 154 |
|
| 155 | ### json8-list
|
| 156 |
|
| 157 | Like JSON lists, but can have a trailing comma. Examples:
|
| 158 |
|
| 159 | [42, 43]
|
| 160 | [42, 43,] # same as above
|
| 161 |
|
| 162 | ### json8-dict
|
| 163 |
|
| 164 | Like JSON "objects", but:
|
| 165 |
|
| 166 | - Can have a trailing comma.
|
| 167 | - Can have unquoted keys, as long as each key is an identifier.
|
| 168 |
|
| 169 | Examples:
|
| 170 |
|
| 171 | {"json8": "message"}
|
| 172 | {json8: "message"} # same as above
|
| 173 | {json8: "message",} # same as above
|
| 174 |
|
| 175 | ### json8-comment
|
| 176 |
|
| 177 | End-of-line comments in the same style as shell:
|
| 178 |
|
| 179 | {"json8": "message"} # comment
|
| 180 |
|
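A decoder has to tell a `#` that starts a comment from a `#` inside a string. The Python sketch below shows one way to do that for a single line. It is an assumption for illustration, not the Oils lexer, and `strip_comment` is a hypothetical name.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing # comment, ignoring # inside quoted strings."""
    quote = None  # the quote character of the string we're inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 2        # skip the escaped character
                continue
            if ch == quote:
                quote = None  # end of string
        elif ch in "\"'":
            quote = ch        # start of a "" or '' string
        elif ch == "#":
            return line[:i].rstrip()
        i += 1
    return line
```

For the example above, `strip_comment('{"json8": "message"}  # comment')` returns `{"json8": "message"}`, and a `#` inside a string value is left alone.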
| 181 | ## TSV8
|
| 182 |
|
| 183 | These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.
|
| 184 |
|
| 185 |
|
| 186 | ### column-attrs
|
| 187 |
|
| 188 | ```
|
| 189 | !tsv8 name age
|
| 190 | !type Str Int
|
| 191 | !other x y
|
| 192 | Alice 42
|
| 193 | Bob 25
|
| 194 | ```
|
| 195 |
|
| 196 | ### column-types
|
| 197 |
|
| 198 | The primitives:
|
| 199 |
|
| 200 | - Bool
|
| 201 | - Int
|
| 202 | - Float
|
| 203 | - Str
|
| 204 |
|
| 205 | Note: Can `null` be in all cells? Maybe except `Bool`?
|
| 206 |
|
| 207 | It can stand in for `NA`?
|
| 208 |
|
| 209 | [JSON]: https://json.org
|
| 210 |
|