Oils Reference — Chapter Errors

This chapter describes errors for data languages. An error checklist is often a nice, concise way to describe a language.

Related: Oils Error Catalog, With Hints describes errors in code.

(in progress)

J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.


Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.

But it may have a function to encode UTF-8 from a List[Int]. These errors would be handled:

  1. Integer greater than max code point
  2. Code point in the surrogate range
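As an illustration of the two checks above, here is a minimal Python sketch of an encoder that takes a list of integers. The name `encode_utf8` and the use of `ValueError` are assumptions made for this example; they are not an Oils API.

```python
MAX_CODE_POINT = 0x10FFFF

def encode_utf8(code_points):
    """Encode a list of ints as UTF-8 bytes (hypothetical helper).

    Raises ValueError for the two encoding errors listed above.
    """
    out = bytearray()
    for cp in code_points:
        if cp < 0:
            raise ValueError("negative integer %d" % cp)
        if cp > MAX_CODE_POINT:
            raise ValueError("integer 0x%x is greater than max code point" % cp)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("code point 0x%x is in the surrogate range" % cp)

        if cp < 0x80:  # 1 byte
            out.append(cp)
        elif cp < 0x800:  # 2 bytes
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:  # 3 bytes
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:  # 4 bytes
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

With this sketch, `encode_utf8([0x20AC])` yields the three bytes `e2 82 ac`, while `encode_utf8([0xD800])` and `encode_utf8([0x110000])` both raise.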


A UTF-8 decoder should handle these errors:

  1. Overlong encoding. In UTF-8, each code point should be represented with the fewest possible bytes.
  2. Surrogate code point. The sequence decodes to a code point in the surrogate range, which is used only for the UTF-16 encoding, not for string data.
  3. Exceeds max code point. The sequence decodes to an integer that's larger than the maximum code point.
  4. Bad encoding. A byte is not encoded like a UTF-8 start byte or a continuation byte.
  5. Incomplete sequence. Too few continuation bytes appeared after the start byte.
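The five decoding errors above can be told apart with a small hand-written decoder. The following Python sketch is only an illustration: the name `decode_utf8` and the error message wording are assumptions, and it is not how Oils implements decoding.

```python
def decode_utf8(data):
    """Decode bytes into a list of code points (hypothetical helper).

    Raises ValueError whose message starts with the name of the error.
    """
    result = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII, one byte
            result.append(b)
            i += 1
            continue
        if 0xC0 <= b < 0xE0:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b < 0xF0:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b < 0xF8:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # 0x80..0xBF is a continuation byte with no start byte;
            # 0xF8..0xFF never appears in UTF-8.
            raise ValueError("bad encoding at byte %d" % i)

        for j in range(i + 1, i + length):
            if j >= n or (data[j] & 0xC0) != 0x80:
                raise ValueError("incomplete sequence at byte %d" % i)
            cp = (cp << 6) | (data[j] & 0x3F)

        if cp < min_cp:
            raise ValueError("overlong encoding at byte %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point at byte %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("exceeds max code point at byte %d" % i)

        result.append(cp)
        i += length
    return result
```

For example, `b'\xc0\x80'` is reported as an overlong encoding, `b'\xed\xa0\x80'` as a surrogate, `b'\xf4\x90\x80\x80'` as exceeding the max code point, `b'\xff'` as a bad encoding, and `b'\xe2\x82'` as an incomplete sequence.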

J8 String

J8 strings extend JSON strings, and are a primary building block of J8 Notation.


J8 strings can represent any string — bytes or unicode — so there are no encoding errors.


J8 string decoding has these errors:

  1. Escape sequence like \u{dc00} should not be in the surrogate range.
  2. Escape sequence like \u{110000} is greater than the maximum Unicode code point.
  3. Byte escapes like \yff should not be in a u'' string.
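The three escape checks above can be sketched in Python as follows. This is not a J8 lexer: the function `check_j8_escapes`, its `prefix` argument ('u' or 'b'), and the convention that `body` is the text between the quotes are all assumptions made for this example.

```python
import re

# Matches \u{...}, \yXX, or any other backslash escape (so that "\\"
# is consumed as a unit and does not hide a following "u{...}").
_ESCAPE = re.compile(r'\\(?:u\{([0-9a-fA-F]+)\}|y([0-9a-fA-F]{2})|.)', re.DOTALL)

def check_j8_escapes(body, prefix):
    """Validate escapes in the body of a J8 string (hypothetical helper).

    body:   the characters between the quotes, escapes not yet decoded
    prefix: 'u' for a u'' string, 'b' for a b'' string
    """
    for m in _ESCAPE.finditer(body):
        u_hex, y_hex = m.group(1), m.group(2)
        if u_hex is not None:
            cp = int(u_hex, 16)
            if cp > 0x10FFFF:
                raise ValueError("%s is greater than max code point" % m.group(0))
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError("%s is in the surrogate range" % m.group(0))
        elif y_hex is not None and prefix == 'u':
            raise ValueError("byte escape %s not allowed in u'' string" % m.group(0))
```

Under these assumptions, `check_j8_escapes(r'\yff', 'b')` passes, while the same body with prefix `'u'` raises.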

Implementation-defined limit:

  1. Max string length (NYI)

J8 Lines

Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In YSH, it's used by @(split command sub).


Like J8 strings, J8 Lines have no encoding errors by design.


J8 Lines decoding has these errors:

  1. Any error in a J8 quoted string.
  2. A line with a quoted string has extra text after it.
  3. An unquoted line is not valid UTF-8.



JSON encoding has these errors:

  1. Object of this type can't be serialized.
  2. Circular reference.
  3. Float values of NaN, Inf, and -Inf can't be encoded.
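For comparison only, the same three encoding errors can be observed with Python's standard `json` module. This says nothing about how Oils reports them; the helper `json_encode_error` is a hypothetical name, and `allow_nan=False` is passed because Python's encoder otherwise emits `NaN` and `Infinity` literals.

```python
import json

def json_encode_error(value):
    """Return the name of the exception json.dumps raises, or None.

    Illustration with Python's json module, not the Oils encoder.
    """
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as e:
        return type(e).__name__
    return None

# 1. Object of this type can't be serialized: a set has no JSON form.
unserializable = {1, 2, 3}

# 2. Circular reference: a list that contains itself.
circular = []
circular.append(circular)

# 3. NaN, Inf, and -Inf can't be encoded.
not_finite = [float("nan"), float("inf"), float("-inf")]
```

In Python, the first case raises `TypeError`, and the other two raise `ValueError`.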

Note that invalid UTF-8 bytes like 0xfe produce a Unicode replacement character, not a hard error.


JSON decoding has these errors:

  1. The encoded message itself is not valid UTF-8.
  2. Lexical error, like
  3. Grammatical error
  4. Unexpected trailing input
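As a rough illustration, the decoding stages above can be separated with Python's standard `json` module: UTF-8 validation first, then parsing. The helper `json_decode_error` is a hypothetical name. Python does not distinguish lexical from grammatical errors, so this sketch reports both as a syntax error, and it relies on CPython's "Extra data" message to detect trailing input.

```python
import json

def json_decode_error(data):
    """Classify why a bytes message fails to decode as JSON, or return None.

    Illustration with Python's json module, not the Oils decoder.
    """
    # Stage 1: the message itself must be valid UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid UTF-8"

    # Stage 2: lexing and parsing.
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        # Assumption: CPython uses this exact message for trailing input.
        if e.msg == "Extra data":
            return "trailing input"
        return "syntax error"
    return None
```

For example, `json_decode_error(b'{} x')` returns `"trailing input"`, and `json_decode_error(b'"\xff"')` returns `"invalid UTF-8"`.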

Implementation-defined limits, i.e. outside the grammar:

  1. Integer too big
  2. Floats that are too big
  3. Max array length (NYI)
  4. Max object length (NYI)
  5. Max depth for arrays and objects (NYI)



JSON8 has the same encoding errors as JSON.

However, the encoding is lossless by design. Instead of invalid UTF-8 being turned into a Unicode replacement character, it can use J8 strings with byte escapes like b'byte \yfe\yff'.
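To make the difference concrete, here is a small Python sketch that contrasts the lossy replacement-character behavior with byte escapes. The function `to_json8_string` is hypothetical, and its choices (a plain JSON string for valid UTF-8, `\u{...}` for control characters, `\'` for a single quote) are assumptions for this example rather than a description of the Oils encoder.

```python
import json

def to_json8_string(data):
    """Render bytes as a JSON8-style string literal (hypothetical helper)."""
    # surrogateescape maps each invalid byte 0xNN to U+DCNN, so no
    # information is lost and the bad bytes are easy to find afterwards.
    text = data.decode("utf-8", errors="surrogateescape")
    if not any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text):
        # Valid UTF-8: an ordinary JSON string is enough.
        return json.dumps(text, ensure_ascii=False)

    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))  # original byte value
        elif ch == "\\":
            parts.append("\\\\")
        elif ch == "'":
            parts.append("\\'")
        elif cp < 0x20:
            parts.append("\\u{%x}" % cp)
        else:
            parts.append(ch)
    return "b'" + "".join(parts) + "'"

# Lossy: both invalid bytes become U+FFFD and can't be recovered.
lossy = b"byte \xfe\xff".decode("utf-8", errors="replace")

# Lossless: the byte values survive in the escapes.
lossless = to_json8_string(b"byte \xfe\xff")
```

Here `lossless` is the literal `b'byte \yfe\yff'`, matching the example in the text, while `lossy` ends in two replacement characters.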


JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.

See the J8 string decoding errors above (err-j8-str-decode).

Generated on Thu, 01 Aug 2024 23:32:50 +0000