This chapter describes errors for data languages.  An error checklist is
often a nice, concise way to describe a language.
Related: Oils Error Catalog, With Hints describes
errors in code.
(in progress)
UTF8
J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.
err-utf8-encode
Oils stores strings as UTF-8 in memory, so it doesn't often need to encode UTF-8.
But it may have a function that encodes UTF-8 from a List[Int].  Such a function
would have to handle these errors:
- Integer greater than the max code point, U+10FFFF
- Code point in the surrogate range, U+D800 to U+DFFF
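A minimal sketch of such a function in Python, where encode_utf8() and Utf8EncodeError are hypothetical names, not an Oils API.  The two checks above run before the usual bit packing.  The negative-integer check is an extra assumption, since a negative number isn't a code point either.

```python
class Utf8EncodeError(ValueError):
    """Hypothetical error type for this sketch."""


MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def encode_utf8(code_points: list[int]) -> bytes:
    """Encode a list of integers as UTF-8, rejecting invalid code points."""
    out = bytearray()
    for cp in code_points:
        if cp > MAX_CODE_POINT:
            raise Utf8EncodeError(f"integer {cp:#x} is greater than max code point")
        if SURROGATE_FIRST <= cp <= SURROGATE_LAST:
            raise Utf8EncodeError(f"code point {cp:#x} is in the surrogate range")
        if cp < 0:
            # Assumption: not in the list above, but also not a code point.
            raise Utf8EncodeError(f"negative integer {cp}")
        if cp < 0x80:  # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)
```

For example, encode_utf8([0x41, 0xE9]) gives the bytes 41 C3 A9, while encode_utf8([0xD800]) raises.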
err-utf8-decode
A UTF-8 decoder should handle these errors:
- Overlong encoding.  In UTF-8, each code point should be represented with the
fewest possible bytes.
  - Overlong encodings are the equivalent of writing the integer 42 as 042, 0042,
    00042, etc.  This is not allowed.
- Surrogate code point.  The sequence decodes to a code point in the surrogate
range (U+D800 to U+DFFF), which is used only for the UTF-16 encoding, not for
string data.
- Exceeds max code point.  The sequence decodes to an integer that's larger
than the maximum code point, U+10FFFF.
- Bad encoding.  A byte is not encoded like a UTF-8 start byte or a
continuation byte.
- Incomplete sequence.  Too few continuation bytes appeared after the start
byte.
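A sketch of a decoder that reports each of these errors, assuming a hypothetical decode_utf8() that returns a list of code points.  The start byte gives the sequence length and the smallest code point that length may encode, which is how overlong encodings are caught.

```python
class Utf8DecodeError(ValueError):
    """Hypothetical error type; the message names which check failed."""


def decode_utf8(data: bytes) -> list[int]:
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII: a 1-byte sequence
            code_points.append(b)
            i += 1
            continue
        # Start byte -> (sequence length, payload bits, smallest valid code point)
        if 0xC0 <= b <= 0xDF:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # A continuation byte (0x80-0xBF) where a start byte belongs, or 0xF8-0xFF.
            raise Utf8DecodeError(f"bad encoding: byte {b:#04x} at offset {i}")
        for j in range(i + 1, i + length):
            if j >= n or (data[j] & 0xC0) != 0x80:
                raise Utf8DecodeError(f"incomplete sequence at offset {i}")
            cp = (cp << 6) | (data[j] & 0x3F)
        if cp < min_cp:
            raise Utf8DecodeError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise Utf8DecodeError(f"surrogate code point {cp:#x} at offset {i}")
        if cp > 0x10FFFF:
            raise Utf8DecodeError(f"exceeds max code point: {cp:#x} at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

For instance, the bytes C0 AF decode to 0x2F, which fits in one byte, so they are reported as an overlong encoding.  The bytes ED A0 80 decode to U+D800 and are reported as a surrogate.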
J8 String
J8 strings extend JSON strings, and are a primary building block of J8
Notation.
err-j8-str-encode
J8 strings can represent any string, whether bytes or Unicode text, so there
are no encoding errors.
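A sketch of why encoding can't fail, assuming a hypothetical encode_j8_str() and a simplified subset of J8 escapes: \\, \', and \u{...} in both u'' and b'' strings, and \yHH only in b'' strings.  Python's surrogateescape error handler marks the bytes that aren't valid UTF-8, and any such byte makes the result a b'' string.

```python
def encode_j8_str(data: bytes) -> str:
    """Encode arbitrary bytes as a J8 string; this never raises."""
    # surrogateescape maps each invalid byte 0xHH to the lone surrogate U+DCHH.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    needs_b = False
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append(f"\\y{cp - 0xDC00:02x}")  # byte escape, b'' only
            needs_b = True
        elif ch == "\\" or ch == "'":
            parts.append("\\" + ch)
        elif cp < 0x20 or cp == 0x7F:
            parts.append(f"\\u{{{cp:x}}}")  # control characters
        else:
            parts.append(ch)  # valid UTF-8 passes through unchanged
    prefix = "b" if needs_b else "u"
    return prefix + "'" + "".join(parts) + "'"
```

Every byte has some spelling, in the worst case \yHH inside b'', so the encoder has no error path.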
err-j8-str-decode
- An escape sequence like \u{dc00} should not be in the surrogate range.
  - Such a code point doesn't represent a real character.  Byte escapes like
    \yff should be used instead.
- An escape sequence like \u{110000} is greater than the maximum Unicode code
point, U+10FFFF.
- Byte escapes like \yff should not be in a u'' string.
  - By design, they're only valid in b'' strings.
 
Implementation-defined limit:
- Max string length (NYI)
  - e.g. more than 4 billion bytes could overflow a length field in some
    implementations
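The escape checks in the list above could look like this in a decoder.  This is a sketch, not the Oils implementation: decode_j8_body() is a hypothetical function that takes the text between the quotes plus the string's prefix ('u' or 'b'), and it handles only a subset of J8's single-character escapes.

```python
import re


class J8DecodeError(ValueError):
    """Hypothetical error type for this sketch."""


# A subset of J8 escapes: \yHH, \u{H...}, and a few single-character escapes.
_ESCAPE = re.compile(r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|([\\'\"nrt]))")
_SIMPLE = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "r": "\r", "t": "\t"}


def decode_j8_body(body: str, prefix: str) -> bytes:
    """Decode the text between the quotes of a u'' or b'' string."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")  # literal character
            i += 1
            continue
        m = _ESCAPE.match(body, i)
        if m is None:
            raise J8DecodeError(f"invalid backslash escape at offset {i}")
        byte_hex, cp_hex, simple = m.groups()
        if byte_hex is not None:
            if prefix != "b":
                raise J8DecodeError(f"byte escape \\y{byte_hex} is only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif cp_hex is not None:
            cp = int(cp_hex, 16)
            if 0xD800 <= cp <= 0xDFFF:
                raise J8DecodeError(f"\\u{{{cp_hex}}} is in the surrogate range")
            if cp > 0x10FFFF:
                raise J8DecodeError(f"\\u{{{cp_hex}}} is greater than the max code point")
            out += chr(cp).encode("utf-8")
        else:
            out += _SIMPLE[simple].encode("utf-8")
        i = m.end()
    return bytes(out)
```

Because the result is bytes, a b'' body can produce any byte sequence, while a u'' body can only produce valid UTF-8.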
 
J8 Lines
Roughly speaking, J8 Lines are an encoding for a stream of J8 strings.  In
YSH, it's used by @(split command sub).
err-j8-lines-encode
Like J8 strings, J8 Lines have no encoding errors by design.
err-j8-lines-decode
- Any error in a J8 quoted string.
  - e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
- A line with a quoted string has extra text after it.
- An unquoted line is not valid UTF-8.
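A sketch of the per-line checks, assuming a hypothetical check_j8_line() that receives one line without its trailing newline.  It assumes a line is quoted when it starts with ", ', u', or b', and it ignores whitespace handling.  Validating the escapes inside the quotes is left to a J8 string decoder.

```python
class J8LinesError(ValueError):
    """Hypothetical error type for this sketch."""


def _find_closing_quote(line: str, start: int, quote: str) -> int:
    """Return the index of the unescaped closing quote, or -1 if there isn't one."""
    i = start
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped char; escape validity is a J8 string check
        elif line[i] == quote:
            return i
        else:
            i += 1
    return -1


def check_j8_line(raw: bytes) -> str:
    """Classify one line as 'quoted' or 'unquoted', or raise J8LinesError."""
    try:
        line = raw.decode("utf-8")
    except UnicodeDecodeError:
        raise J8LinesError("line is not valid UTF-8") from None
    if line[:2] in ("u'", "b'"):
        start, quote = 2, "'"
    elif line[:1] in ("'", '"'):
        start, quote = 1, line[0]
    else:
        return "unquoted"
    end = _find_closing_quote(line, start, quote)
    if end == -1:
        raise J8LinesError("quoted string has no closing quote")
    if line[end + 1:]:
        raise J8LinesError(f"extra text after quoted string: {line[end + 1:]!r}")
    return "quoted"
```

Invalid UTF-8 is checked on the whole line first, because a J8 quoted string must itself be valid UTF-8; arbitrary bytes inside quotes are spelled with escapes.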
JSON
err-json-encode
JSON encoding has these errors:
- Object of this type can't be serialized.
  - For example, Str, List, and Dict are Oils objects that can be serialized,
    but Eggex, Func, and Range can't.
- Circular reference.
  - e.g. a Dict that points to itself, a List that points to itself, and other
    permutations
- Float values of NaN, Inf, and -Inf can't be encoded.
  - (These encode to null in Oils, following JavaScript.)
 
Note that invalid UTF-8 bytes like 0xfe produce a Unicode replacement
character, not a hard error.
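Here is a minimal sketch of a JSON encoder that handles each of these cases.  It uses Python values as hypothetical stand-ins for Oils values (bytes for Str, list for List, dict for Dict), and anything else, such as a function, is rejected.  It tracks the containers on the current path to detect cycles, writes null for non-finite floats, and substitutes U+FFFD for invalid UTF-8.

```python
import json
import math


class JsonEncodeError(ValueError):
    """Hypothetical error type for this sketch."""


def encode_json(value: object) -> str:
    """Encode Python stand-ins for Oils values: bytes ~ Str, list ~ List, dict ~ Dict."""
    return _encode(value, set())


def _encode(value: object, active: set) -> str:
    if value is None:
        return "null"
    if value is True or value is False:
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return "null"  # NaN, Inf, -Inf have no JSON form; Oils writes null
        return repr(value)
    if isinstance(value, bytes):
        # Invalid UTF-8 becomes U+FFFD rather than raising.
        return json.dumps(value.decode("utf-8", errors="replace"), ensure_ascii=False)
    if isinstance(value, (list, dict)):
        if id(value) in active:
            raise JsonEncodeError("circular reference")
        active.add(id(value))  # only containers on the current path are "active"
        try:
            if isinstance(value, list):
                return "[" + ",".join(_encode(v, active) for v in value) + "]"
            items = []
            for k, v in value.items():
                if not isinstance(k, bytes):
                    raise JsonEncodeError("Dict keys must be strings")
                items.append(_encode(k, active) + ":" + _encode(v, active))
            return "{" + ",".join(items) + "}"
        finally:
            active.discard(id(value))
    raise JsonEncodeError(f"can't serialize object of type {type(value).__name__}")
```

Tracking only the current path, rather than every container seen so far, means a List that appears twice, like [x, x], is still allowed; only a true cycle is an error.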
err-json-decode
- The encoded message itself is not valid UTF-8.
  - (Typically, you need to check the unescaped bytes in string literals like
    "abc\n".)
- Lexical error, like
  - the message +
  - an invalid escape "\z" or a truncated escape "\u1"
  - a single-quoted string like u''
- Grammatical error
- Unexpected trailing input
  - like the message 42] or {}]
 
Implementation-defined limits, i.e. outside the grammar:
- Integer too big
  - implementations may decode to a 64-bit integer
- Floats that are too big
- Max array length (NYI)
  - e.g. more than 4 billion objects in an array could overflow a length
    field in some implementations
- Max object length (NYI)
- Max depth for arrays and objects (NYI)
  - to avoid a recursive parser blowing the stack
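Python's json module can stand in for the lexical and grammatical checks, so a sketch only has to add the rest.  Here decode_json() and JsonError are hypothetical names.  Strict UTF-8 decoding comes first; NaN and Infinity, which Python accepts by default, are rejected; and the 64-bit integer range and float overflow are assumed limits.  Python's own recursion limit stands in for a configurable max depth.

```python
import json
import math


class JsonError(ValueError):
    """Hypothetical error type for this sketch."""


INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def _parse_int(text: str) -> int:
    n = int(text)
    if not INT64_MIN <= n <= INT64_MAX:
        raise JsonError(f"integer too big: {text}")
    return n


def _parse_float(text: str) -> float:
    f = float(text)
    if math.isinf(f):  # e.g. 1e999 overflows a double
        raise JsonError(f"float too big: {text}")
    return f


def _reject_constant(name: str):
    # Python's json module accepts NaN and Infinity by default; JSON doesn't.
    raise JsonError(f"lexical error: {name} is not valid JSON")


def decode_json(data: bytes):
    try:
        text = data.decode("utf-8")  # strict: any invalid byte is an error
    except UnicodeDecodeError as e:
        raise JsonError(f"invalid UTF-8: {e}") from None
    try:
        return json.loads(text, parse_int=_parse_int, parse_float=_parse_float,
                          parse_constant=_reject_constant)
    except json.JSONDecodeError as e:
        # Covers lexical errors, grammatical errors, and trailing input ("Extra data").
        raise JsonError(f"syntax error: {e}") from None
    except RecursionError:
        raise JsonError("max depth exceeded") from None
```

For example, decode_json(b"42]") fails with Python's "Extra data" message, which corresponds to the unexpected trailing input case.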
 
JSON8
err-json8-encode
JSON8 has the same encoding errors as JSON.
However, the encoding is lossless by design.  Instead of turning invalid UTF-8
into a Unicode replacement character, it can use J8 strings with byte escapes
like b'byte \yfe\yff'.
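To make the lossy vs. lossless difference concrete, here is a sketch with two hypothetical functions.  It assumes a JSON8 encoder keeps JSON's "..." form when the string is valid UTF-8 and otherwise falls back to a b'' string.  The escape handling in the fallback is simplified.

```python
import json


def encode_json_str(data: bytes) -> str:
    """Lossy: invalid UTF-8 becomes U+FFFD, as described for JSON."""
    return json.dumps(data.decode("utf-8", errors="replace"), ensure_ascii=False)


def encode_json8_str(data: bytes) -> str:
    r"""Lossless: fall back to a b'' string with \yHH escapes for invalid bytes."""
    try:
        return json.dumps(data.decode("utf-8"), ensure_ascii=False)
    except UnicodeDecodeError:
        pass
    parts = []
    # surrogateescape maps each invalid byte 0xHH to the lone surrogate U+DCHH.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append(f"\\y{cp - 0xDC00:02x}")
        elif ch in "\\'":
            parts.append("\\" + ch)
        elif cp < 0x20:
            parts.append(f"\\u{{{cp:x}}}")
        else:
            parts.append(ch)
    return "b'" + "".join(parts) + "'"
```

For the bytes "byte \xfe\xff", the JSON output is "byte " followed by two U+FFFD characters, and the original bytes can't be recovered.  The JSON8 output is b'byte \yfe\yff', which decodes back to exactly the same bytes.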
err-json8-decode
JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.
See err-j8-str-decode.
    
      Generated on Sat, 10 Aug 2024 23:50:22 +0000