1 | ---
2 | default_highlighter: oils-sh
3 | ---
4 |
5 | Hay - Custom Languages for Unix Systems
6 | =======================================
7 |
8 | *Hay* lets you use the syntax of YSH to declare **data** and
9 | interleaved **code**. It allows the shell to better serve its role as
10 | essential **glue**. For example, these systems all combine Unix processes in
11 | various ways:
12 |
13 | - local build systems (Ninja, CMake, Debian package builds, Docker/OCI builds)
14 | - remote build services (VM-based continuous integration like sourcehut, GitHub
15 | Actions)
16 | - local process supervisors (SysV init, systemd)
17 | - remote process supervisors / cluster managers (Slurm, Kubernetes)
18 |
19 | Slogans:
20 |
21 | - *Hay Ain't YAML*.
22 | - It evaluates to [JSON][] + Shell Scripts.
23 | - *We need a better **control plane** language for the cloud*.
24 | - *YSH adds the missing declarative part to shell*.
25 |
26 | This doc describes how to use Hay, with motivating examples.
27 |
28 | As of 2022, this is a new feature of YSH, and **it needs user feedback**.
29 | Nothing is set in stone, so you can influence the language and its features!
30 |
31 |
32 | [JSON]: $xref:JSON
33 |
34 | <!--
35 | - although also Tcl, Lua, Python, Ruby
36 | - DSLs, Config Files, and More
37 | - For Dialects of YSH
38 |
39 | Use case examples
40 | -->
41 |
42 | <!-- cmark.py expands this -->
43 | <div id="toc">
44 | </div>
45 |
46 | ## Example
47 |
48 | Hay could be used to configure a hypothetical Linux package manager:
49 |
50 | # cpython.hay -- A package definition
51 |
52 | hay define Package/TASK # define a tree of Hay node types
53 |
54 | Package cpython { # a node with attributes, and children
55 |
56 | version = '3.9'
57 | url = 'https://python.org'
58 |
59 | TASK build { # a child node, with YSH code
60 | ./configure
61 | make
62 | }
63 | }
64 |
65 | This program evaluates to a JSON tree, which you can consume from programs in
66 | any language, including YSH:
67 |
68 | { "type": "Package",
69 | "args": [ "cpython" ],
70 | "attrs": { "version": "3.9", "url": "https://python.org" },
71 | "children": [
72 | { "type": "TASK",
73 | "args": [ "build" ],
74 | "code_str": " ./configure\n make\n"
75 | }
76 | ]
77 | }
78 |
79 | That is, a package manager can use the attributes to create a build
80 | environment, then execute shell code within it. This is a *staged evaluation
81 | model*.
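
As a rough illustration of the second stage, the Python sketch below walks the JSON tree shown above, reads the attributes, and collects the shell code of each `TASK` child. The helper name `collect_tasks` is hypothetical, and the sketch only prints the code instead of running it in a build environment.

```python
import json

# The JSON tree from the example above, as a package manager might receive it.
RESULT_JSON = r'''
{ "type": "Package",
  "args": [ "cpython" ],
  "attrs": { "version": "3.9", "url": "https://python.org" },
  "children": [
    { "type": "TASK",
      "args": [ "build" ],
      "code_str": " ./configure\n make\n"
    }
  ]
}
'''


def collect_tasks(node):
    """Return {task_name: code_str} for the TASK children of a node (hypothetical helper)."""
    tasks = {}
    for child in node.get("children", []):
        if child["type"] == "TASK":
            tasks[child["args"][0]] = child["code_str"]
    return tasks


package = json.loads(RESULT_JSON)
tasks = collect_tasks(package)

# Stage 2 would set up an environment from the attributes, then run each task.
print(package["args"][0], package["attrs"]["version"])
for name, code in tasks.items():
    print(f"--- {name} ---")
    print(code, end="")
```

A real package manager would hand each `code_str` to a shell inside the prepared environment; that part is left out here.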
82 |
83 | ## Understanding Hay
84 |
85 | A goal of Hay is to restore the **simplicity** of Unix to distributed systems.
86 | It's all just **code and data**!
87 |
88 | This means that it's a bit abstract, so here are a few ways of understanding
89 | it.
90 |
91 | ### Analogies
92 |
93 | The relation between Hay and YSH is like the relationship between these pairs
94 | of languages:
95 |
96 | - [YAML][] / [Go templates][], which are used in Helm config for Kubernetes.
97 | - YAML data specifies a **service**, and templates specify **variants**.
98 | - Two common ways of building C and C++ code:
99 | - [Make]($xref:make) / [Autotools]($xref:autotools)
100 | - [Ninja]($xref:ninja) / [CMake][]
101 | - Make and Ninja specify a **build graph**, while autotools and CMake detect
102 | a **configured variant** with respect to your system.
103 |
104 | Each of these is *70's-style macro programming* — a stringly-typed
105 | language generating another stringly-typed language, with all the associated
106 | problems.
107 |
108 | In contrast, Hay and YSH are really the same language, with the same syntax,
109 | and the same Python- and JavaScript-like dynamic **types**. Hay is just YSH
110 | that **builds up data** instead of executing commands.
111 |
112 | (Counterpoint: Ninja is intended for code generation, and it makes sense for
113 | YSH to generate simple languages.)
114 |
115 |
116 | [Go templates]: https://pkg.go.dev/text/template
117 | [CMake]: https://cmake.org
118 |
119 | ### Prior Art
120 |
121 | See the [Survey of Config Languages]($wiki) on the wiki, which puts them in
122 | these categories:
123 |
124 | 1. Languages for String Data
125 | - INI, XML, [YAML][], ...
126 | 1. Languages for Typed Data
127 | - [JSON][], TOML, ...
128 | 1. Programmable String-ish Languages
129 | - Go templates, CMake, autotools/m4, ...
130 | 1. Programmable Typed Data
131 | - Nix expressions, Starlark, Cue, ...
132 | 1. Internal DSLs in General Purpose Languages
133 | - Hay, Guile Scheme for Guix, Ruby blocks, ...
134 |
135 | Excerpts:
136 |
137 | [YAML][] is a data format that is (surprisingly) the de-facto control plane
138 | language for the cloud. It's an approximate superset of [JSON][].
139 |
140 | [UCL][] (universal config language) and [HCL][] (HashiCorp config language) are
141 | influenced by the [Nginx][] config file syntax. If you can read any of these
142 | languages, you can read Hay.
143 |
144 | [Nix][] has a [functional language][nix-lang] to configure Linux distros. In
145 | contrast, Hay is multi-paradigm and imperative.
146 |
147 | [nix-lang]: https://nixos.wiki/wiki/Nix_Expression_Language
148 |
149 | The [Starlark][] language is a dialect of Python used by the [Bazel][] build
150 | system. It uses imperative code to specify build graph variants, and you can
151 | use this same pattern in Hay. That is, if statements, for loops, and functions
152 | are useful in Starlark and Hay.
153 |
154 | [Ruby][]'s use of [first-class
155 | blocks](http://radar.oreilly.com/2014/04/make-magic-with-ruby-dsls.html)
156 | inspired YSH. They're used in systems like Vagrant (VM dev environments) and
157 | Rake (a build system).
158 |
159 | In [Lisp][], code and data are expressed with the same syntax, and can be
160 | interleaved.
161 | [G-Expressions](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html)
162 | in Guix use a *staged evaluation model*, like Hay.
163 |
164 | [YAML]: $xref:YAML
165 | [UCL]: https://github.com/vstakhov/libucl
166 | [Nginx]: https://en.wikipedia.org/wiki/Nginx
167 | [HCL]: https://github.com/hashicorp/hcl
168 | [Nix]: $xref:nix
169 |
170 | [Starlark]: https://github.com/bazelbuild/starlark
171 | [Bazel]: https://bazel.build/
172 |
173 | [Ruby]: https://www.ruby-lang.org/en/
174 | [Lisp]: https://en.wikipedia.org/wiki/Lisp_(programming_language)
175 |
176 |
177 | ### Comparison
178 |
179 | The biggest difference between Hay and [UCL][] / [HCL][] is that it's
180 | **embedded in a shell**. In other words, Hay languages are *internal DSLs*,
181 | while those languages are *external*.
182 |
183 | This means:
184 |
185 | 1. You can **interleave** shell code with Hay data. We'll discuss the many
186 | uses of this below.
187 | - On the other hand, it's OK to configure simple systems with plain data
188 | like [JSON][]. Hay is for when that stops working!
189 | 1. Hay isn't a library you embed in another program. Instead, you use
190 | Unix-style **process-based** composition.
191 | - For example, [HCL][] is written in Go, which may be hard to embed in a C
192 | or Rust program.
193 | - Note that a process is a good **security** boundary. It can be
194 | additionally run in an OS container or VM.
195 |
196 | <!--
197 | - Code on the **outside** of Hay blocks may use the ["staged programming" / "graph metaprogramming" pattern][build-ci-comments] mentioned above.
198 | - Code on the **inside** is *unevaluated*. You can execute it in another
199 | context, like a remote machine, Linux container, or virtual machine.
200 | -->
201 |
202 | The sections below elaborate on these points.
203 |
204 | [shell-pipelines]: https://www.oilshell.org/blog/2017/01/15.html
205 |
206 | <!--
207 | - YSH has an imperative programming model. It's a little like Starlark.
208 | - Guile / GNU Make.
209 | - Tensorflow.
210 | -->
211 |
212 |
213 | ## Overview
214 |
215 | Hay nodes have a regular structure:
216 |
217 | - They start with a "command", which is called the **type**.
218 | - They accept **string** arguments and **block** arguments. There must be at
219 | least one argument.
220 |
221 | ### Two Kinds of Nodes, and Three Kinds of Evaluation
222 |
223 | There are two kinds of node with this structure.
224 |
225 | (1) `SHELL` nodes contain **unevaluated** code, and their type is ALL CAPS.
226 | The code is turned into a string that can be executed elsewhere.
227 |
228 | TASK build {
229 | ./configure
230 | make
231 | }
232 | # =>
233 | # ... {"code_str": " ./configure\n make\n"}
234 |
235 | (2) `Attr` nodes contain **data**, and their type starts with a capital letter.
236 | They eagerly evaluate a block in a new **stack frame** and turn it into an
237 | **attributes dict**.
238 |
239 | Package cpython {
240 | version = '3.9'
241 | }
242 | # =>
243 | # ... {"attrs": {"version": "3.9"}} ...
244 |
245 | These blocks have a special rule to allow *bare assignments* like `version =
246 | '3.9'`. That is, you don't need keywords like `const` or `var`.
247 |
248 | (3) In contrast to these two types of Hay nodes, YSH builtins that take a block
249 | usually evaluate it eagerly:
250 |
251 | cd /tmp { # run in a new directory
252 | echo $PWD
253 | }
254 |
255 | Builtins are spelled with `lower` case letters, so `SHELL` and `Attr` nodes
256 | won't be confused with them.
257 |
258 | ### Two Stages of Evaluation
259 |
260 | So Hay is designed to be used with a *staged evaluation model*:
261 |
262 | 1. The first stage follows the rules above:
263 | - Tree of Hay nodes → [JSON]($xref) + Unevaluated shell.
264 | - You can use variables, conditionals, loops, and more.
265 | 2. Your app or system controls the second stage. You can invoke YSH again to
266 | execute shell inside a VM, inside a Linux container, or on a remote machine.
267 |
268 | These two stages are conceptually different, but use the **same** syntax and
269 | evaluator! Again, the evaluator runs in a mode where it **builds up data**
270 | rather than executing commands.
271 |
272 | ### Result Schema
273 |
274 | Here's a description of the result of Hay evaluation (the first stage).
275 |
276 | # The source may be "cpython.hay"
277 | FileResult = (source Str, children List[NodeResult])
278 |
279 | NodeResult =
280 | # Package cpython { version = '3.9' }
281 | Attr (type Str,
282 | args List[Str],
283 | attrs Map[Str, Any],
284 | children List[NodeResult])
285 |
286 | # TASK build { ./configure; make }
287 | | Shell(type Str,
288 | args List[Str],
289 | location_str Str,
290 | location_start_line Int,
291 | code_str Str)
292 |
293 |
294 | Notes:
295 |
296 | - Except for user-defined attributes, the result is statically typed.
297 | - Shell nodes are always leaf nodes.
298 | - Attr nodes may or may not be leaf nodes.
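
A consumer in another language can mirror this schema with its own types. The Python sketch below is one possible mapping, not part of Hay itself: the class names `AttrNode` and `ShellNode` and the helper `node_from_dict` are hypothetical, and it assumes that a JSON object with a `code_str` key is a Shell node.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class ShellNode:
    type: str
    args: List[str]
    location_str: str
    location_start_line: int
    code_str: str


@dataclass
class AttrNode:
    type: str
    args: List[str]
    attrs: Dict[str, Any]
    children: List[Union["AttrNode", ShellNode]] = field(default_factory=list)


def node_from_dict(d):
    """Build a typed node from one decoded JSON object.

    Assumption: an object with a "code_str" key is a Shell node;
    anything else is treated as an Attr node.
    """
    if "code_str" in d:
        return ShellNode(
            type=d["type"],
            args=d["args"],
            location_str=d.get("location_str", ""),
            location_start_line=d.get("location_start_line", 0),
            code_str=d["code_str"],
        )
    return AttrNode(
        type=d["type"],
        args=d["args"],
        attrs=d.get("attrs", {}),
        children=[node_from_dict(c) for c in d.get("children", [])],
    )


example = {
    "type": "Package",
    "args": ["cpython"],
    "attrs": {"version": "3.9"},
    "children": [
        {"type": "TASK", "args": ["build"], "code_str": "./configure\nmake\n"},
    ],
}
node = node_from_dict(example)
print(node.type, node.args, node.attrs)
print(type(node.children[0]).__name__)
```

Because Shell nodes are always leaves, only `AttrNode` carries a `children` list.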
299 |
300 | ## Three Ways to Invoke Hay
301 |
302 | ### Inline Hay Has No Restrictions
303 |
304 | You can put Hay blocks and normal shell code in the same file. Retrieve the
305 | result of Hay evaluation with the `_hay()` function.
306 |
307 | # myscript.ysh
308 |
309 | hay define Rule
310 |
311 | Rule mylib.o {
312 | inputs = ['mylib.c']
313 |
314 | # not recommended, but allowed
315 | echo 'hi'
316 | ls /tmp/$(whoami)
317 | }
318 |
319 | echo 'bye' # other shell code
320 |
321 | const result = _hay()
322 | json write (result)
323 |
324 | In this case, there are no restrictions on the commands you can run.
325 |
326 | ### In Separate Files
327 |
328 | You can put Hay definitions in their own file:
329 |
330 | # my-config.hay
331 |
332 | Rule mylib.o {
333 | inputs = ['mylib.c']
334 | }
335 |
336 | echo 'hi' # allowed for debugging
337 | # ls /tmp/$(whoami) would fail due to restrictions on Hay evaluation
338 |
339 | In this case, you can use `echo` and `write`, but the interpreter is
340 | **restricted** (see below).
341 |
342 | Parse it with `parse_hay()`, and evaluate it with `eval_hay()`:
343 |
344 | # my-evaluator.ysh
345 |
346 | hay define Rule # node types for the file
347 | const h = parse_hay('my-config.hay')
348 | const result = eval_hay(h)
349 |
350 | json write (result)
351 | # =>
352 | # {
353 | # "children": [
354 | # { "type": "Rule",
355 | # "args": ["mylib.o"],
356 | # "attrs": {"inputs": ["mylib.c"]}
357 | # }
358 | # ]
359 | # }
360 |
361 | ### In A Block
362 |
363 | Instead of creating separate files, you can also use the `hay eval` builtin:
364 |
365 | hay define Rule
366 |
367 | hay eval :result { # assign to the variable 'result'
368 | Rule mylib.o {
369 | inputs = ['mylib.c']
370 | }
371 | }
372 |
373 | json write (result) # same as above
374 |
375 | This is mainly for testing and demos.
376 |
377 | ## Security Model: Restricted != Sandboxed
378 |
379 | The "restrictions" are **not** a security boundary! (They could be, but we're
380 | not making promises now.)
381 |
382 | Even with `eval_hay()` and `hay eval`, the config file is evaluated in the
383 | **same interpreter**. But the following restrictions apply:
384 |
385 | - External commands aren't allowed
386 | - Builtins other than `echo` and `write` aren't allowed
387 | - For example, the `.hay` file can't invoke `shopt` to change global shell
388 | options
389 | - A new stack frame is created, so the `.hay` file can't mutate your locals
390 | - However it can still mutate globals with `setglobal`!
391 |
392 | In summary, Hay evaluation is restricted to prevent basic mistakes, but your
393 | code isn't completely separate from the evaluated Hay file.
394 |
395 | If you want to evaluate untrusted code, use a **separate process**, and run it
396 | in a container or VM.
397 |
398 | ## Reference
399 |
400 | Here is a list of all the mechanisms mentioned.
401 |
402 | ### Shell Builtins
403 |
404 | - `hay`
405 | - `hay define` to define node types.
406 | - `hay pp` to pretty print the node types.
407 | - `hay reset` to delete both the node types **and** the current evaluation
408 | result.
409 | - `hay eval :result { ... }` to evaluate in restricted mode, and put the
410 | result in a variable.
411 | - Implementation detail: the `haynode` builtin is run when types like
412 | `Package` and `TASK` are invoked. That is, all node types are aliases for
413 | this same builtin.
414 |
415 | ### Functions
416 |
417 | - `parse_hay()` parses a file, just as `bin/ysh` does.
418 | - `eval_hay()` evaluates the parsed file in restricted mode, like `hay eval`.
419 | - `_hay()` retrieves the current result
420 | - It's useful for interactive debugging.
421 | - The name starts with `_` because it's a "register" mutated by the
422 | interpreter.
423 |
424 | ### Options
425 |
426 | Hay is parsed and evaluated with option group `ysh:all`, which includes
427 | `parse_proc` and `parse_equals`.
428 |
429 | <!--
430 |
431 | - The `parse_brace` and `parse_equals` options are what let us inside attribute nodes
432 | - `_running_hay`
433 |
434 | -->
435 |
436 |
437 | ## Usage: Interleaving Hay and YSH
438 |
439 | Why would you want to interleave data and code? One reason is to naturally
440 | express variants of a configuration. Here are some examples.
441 |
442 | **Build variants**. There are many variants of the YSH binary:
443 |
444 | - `dbg` and `opt`. The compiler optimization level, and whether debug symbols
445 | are included.
446 | - `asan` and `ubsan`. Dynamic analysis with Clang sanitizers.
447 | - `-D GC_EVERY_ALLOC`. Make a build that helps debug the garbage collector.
448 |
449 | So the Ninja build graph to produce these binaries is **shaped** similarly, but
450 | it **varies** with compiler and linker flags.
451 |
452 | **Service variants**. A common problem in distributed systems is how to
453 | develop and debug services locally.
454 |
455 | Do your service dependencies live in the cloud, or are they run locally? What
456 | about state? Common variants:
457 |
458 | - `local`. Part or all of the service runs locally, so you may pass flags like
459 | `--auth-service localhost:8001` to binaries.
460 | - `staging`. A complete copy of the service, in a different cloud, with a
461 | different database.
462 | - `prod`. The live instance running with user data.
463 |
464 | Again, these collections of services are all **shaped** similarly, but the
465 | flags **vary** based on where binaries are physically running.
466 |
467 | ---
468 |
469 | This model can be referred to as ["graph metaprogramming" or "staged
470 | programming"][build-ci-comments]. In YSH, it's done with dynamically typed
471 | data like integers and dictionaries. In contrast, systems like CMake and
472 | autotools are more stringly typed.
473 |
474 | [build-ci-comments]: https://www.oilshell.org/blog/2021/04/build-ci-comments.html
475 |
476 | The following **examples** are meant to be "evocative"; they're not based on
477 | real code. Again, user feedback can improve them!
478 |
479 | ### Conditionals
480 |
481 | Conditionals can go on the inside of a block:
482 |
483 | Service auth.example.com { # node taking a block
484 | if (variant === 'local') { # condition
485 | port = 8001
486 | } else {
487 | port = 80
488 | }
489 | }
490 |
491 | Or on the outside:
492 |
493 | Service web { # node
494 | root = '/home/www'
495 | }
496 |
497 | if (variant === 'local') { # condition
498 | Service auth-local { # node
499 | port = 8001
500 | }
501 | }
502 |
503 |
504 | ### Iteration
505 |
506 | Iteration can also go on the inside of a block:
507 |
508 | Rule foo.o { # node
509 | inputs = [] # populate with all .cc files except one
510 |
511 | # variables ending with _ are "hidden" from block evaluation
512 | for name_ in *.cc {
513 | if (name_ !== 'skipped.cc') {
514 | call inputs->append(name_)
515 | }
516 | }
517 | }
518 |
519 | Or on the outside:
520 |
521 | for name_ in *.cc { # loop
522 | Rule $(basename $name_ .cc).o { # node
523 | inputs = [name_]
524 | }
525 | }
526 |
527 |
528 | ### Remove Duplication with `proc`
529 |
530 | Procs can wrap blocks:
531 |
532 | proc myrule(name) {
533 |
534 | # needed for blocks to use variables higher on the stack
535 | shopt --set dynamic_scope {
536 |
537 | Rule dbg/$name.o { # node
538 | inputs = ["$name.c"]
539 | flags = ['-O0']
540 | }
541 |
542 | Rule opt/$name.o { # node
543 | inputs = ["$name.c"]
544 | flags = ['-O2']
545 | }
546 |
547 | }
548 | }
549 |
550 | myrule foo # call proc
551 | myrule bar # call proc
552 |
553 | Or they can be invoked from within blocks:
554 |
555 | proc set-port (port_num; out) {
556 | call out->setValue("localhost:$port_num")
557 | }
558 |
559 | Service foo { # node
560 | set-port 80 :p1 # call proc
561 | set-port 81 :p2 # call proc
562 | }
563 |
564 | ## More Usage Patterns
565 |
566 | ### Using YSH for the Second Stage
567 |
568 | The general pattern is:
569 |
570 | ./my-evaluator.ysh my-config.hay | json read :result
571 |
572 | The evaluator does the following:
573 |
574 | 1. Sets up the execution context with `hay define`
575 | 1. Parses `my-config.hay` with `parse_hay()`
576 | 1. Evaluates it with `eval_hay()`
577 | 1. Prints the result as JSON.
578 |
579 | Then a separate YSH process reads this JSON and executes application code.
580 |
581 | TODO: Show code example.
582 |
583 | ### Using Python for the Second Stage
584 |
585 | In Python, you would:
586 |
587 | 1. Use the `subprocess` module to invoke `./my-evaluator.ysh my-config.hay`.
588 | 2. Use the `json` module to parse the result.
589 | 3. Then execute application code using the data.
590 |
591 | TODO: Show code example.
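
The three steps above might look like the following Python sketch. The helper names `load_hay` and `iter_nodes` are hypothetical. So that the sketch runs without YSH installed, it invokes a stand-in child process that prints JSON shaped like the evaluator output shown earlier. In real use you would pass `["./my-evaluator.ysh", "my-config.hay"]` instead.

```python
import json
import subprocess
import sys


def load_hay(argv):
    """Run an evaluator command and decode the JSON it prints on stdout."""
    proc = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(proc.stdout)


def iter_nodes(nodes):
    """Yield every node in the tree, depth first."""
    for node in nodes:
        yield node
        yield from iter_nodes(node.get("children", []))


# Stand-in for the evaluator (an assumption for this sketch): a child Python
# process that echoes a fixed JSON document to stdout.
STUB_JSON = (
    '{"children": [{"type": "Rule", "args": ["mylib.o"], '
    '"attrs": {"inputs": ["mylib.c"]}}]}'
)
stub_argv = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", STUB_JSON]

result = load_hay(stub_argv)

# Application code: here we only print each node.
for node in iter_nodes(result["children"]):
    print(node["type"], node["args"], node.get("attrs", {}))
```

Passing `check=True` makes a failed evaluation raise `subprocess.CalledProcessError` instead of silently producing empty output.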
592 |
593 | ### Locating Errors in the Original `.hay` File
594 |
595 | The YSH interpreter has 2 flags starting with `--location` that give you
596 | control over error messages.
597 |
598 | ysh --location-str 'foo.hay' --location-start-line 42 -- stage2.ysh
599 |
600 | Set them to the values of fields `location_str` and `location_start_line` in
601 | the result of `SHELL` node evaluation.
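
As a small sketch of how a Python second stage could build that command line from a `SHELL` node, consider the helper below. The name `stage2_argv` is hypothetical, and the sketch assumes the node's `code_str` has already been written to `stage2.ysh`. It only constructs the argument list and does not run it.

```python
def stage2_argv(shell_node, script="stage2.ysh"):
    """Build the argv for stage 2, passing error locations from a SHELL node."""
    return [
        "ysh",
        "--location-str", shell_node["location_str"],
        "--location-start-line", str(shell_node["location_start_line"]),
        "--", script,
    ]


# A SHELL node as it might appear in the evaluation result (values are made up).
node = {
    "type": "TASK",
    "args": ["build"],
    "location_str": "foo.hay",
    "location_start_line": 42,
    "code_str": "./configure\nmake\n",
}
print(stage2_argv(node))
```

Building a list, as opposed to a single string, avoids quoting problems when `location_str` contains spaces.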
602 |
603 | ### Debian `.d` Dirs
604 |
605 | Debian has a pattern of splitting configuration into a **directory** of
606 | concatenated files. It's easier for shell scripts to add to a directory than
607 | add to a file.
608 |
609 | This can be done with an evaluator that simply enumerates all files:
610 |
611 | var results = []
612 | for path in myconfig.d/*.hay {
613 | const code = parse_hay(path)
614 | const result = eval_hay(code)
615 | call results->append(result)
616 | }
617 |
618 | # Now iterate through results
619 |
620 | ### Parallel Loading
621 |
622 | TODO: Example of using `xargs -P` to spawn processes with `parse_hay()` and
623 | `eval_hay()`. Then merge the JSON results.
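
Until that example exists, here is a sketch of the same idea driven from Python. It uses a thread pool in place of `xargs -P` and spawns one evaluator process per file. The helper names `eval_file` and `load_all` are hypothetical, and `eval_file` assumes that `./my-evaluator.ysh` prints its result as JSON. The demo at the end swaps in a stub so it runs without YSH.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def eval_file(path):
    """Evaluate one .hay file in its own process.

    Assumption: ./my-evaluator.ysh takes a path and prints JSON on stdout.
    """
    out = subprocess.run(
        ["./my-evaluator.ysh", path], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)


def load_all(paths, evaluate=eval_file, max_workers=4):
    """Evaluate files concurrently, then merge their children in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, paths))  # map() preserves input order
    merged = {"children": []}
    for result in results:
        merged["children"].extend(result.get("children", []))
    return merged


def stub_evaluate(path):
    """Stand-in evaluator for the demo: one Rule node named after the file."""
    return {"children": [{"type": "Rule", "args": [path], "attrs": {}}]}


merged = load_all(["a.hay", "b.hay"], evaluate=stub_evaluate)
print([child["args"][0] for child in merged["children"]])
```

Threads are enough here because each worker spends its time waiting on a child process; the parallelism comes from the processes themselves.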
624 |
625 | ## Style
626 |
627 | ### Attributes vs. Procs
628 |
629 | Assigning attributes and invoking procs can look similar:
630 |
631 | Package grep {
632 | version = '1.0' # An attribute?
633 |
634 | version 1.0 # or call proc 'version'?
635 | }
636 |
637 | The first style is better for typed data like integers and dictionaries. The
638 | latter style isn't useful here, but it could be if `version 1.0` created
639 | complex Hay nodes.
640 |
641 | ### Attributes vs. Flags
642 |
643 | Hay nodes shouldn't take flags or `--`. Flags are for key-value pairs, and
644 | blocks are better for expressing such data.
645 |
646 | No:
647 |
648 | Package --version 1.0 grep {
649 | license = 'GPL'
650 | }
651 |
652 | Yes:
653 |
654 | Package grep {
655 | version = '1.0'
656 | license = 'GPL'
657 | }
658 |
659 | ### Dicts vs. Blocks
660 |
661 | Superficially, dicts and blocks are similar:
662 |
663 | Package grep {
664 | mydict = {name: 'value'} # a dict
665 |
666 | mynode foo { # a node taking a block
667 | name = 'value'
668 | }
669 | }
670 |
671 | Use dicts in cases where you don't know the names or types up front, like
672 |
673 | files = {'README.md': true, '__init__.py': false}
674 |
675 | Use blocks when there's a **schema**. Blocks are also different because:
676 |
677 | - You can use `if` statements and `for` loops in them.
678 | - You can call `TASK build; TASK test` within a block, creating multiple
679 | objects of the same type.
680 | - Later: custom validation
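
Until Hay has custom validation, a consumer can check the evaluated result against a schema itself. The Python sketch below is one hypothetical way to do that; the `SCHEMA` table and the `validate` helper are illustrative and not part of Hay.

```python
# Hypothetical schema: node type -> {attribute name: expected Python type}
SCHEMA = {
    "Package": {"version": str, "license": str},
}


def validate(node, schema=SCHEMA):
    """Return a list of error strings for one node and its children."""
    errors = []
    label = f"{node['type']} {node['args'][0]}"
    attrs = node.get("attrs", {})
    for name, typ in schema.get(node["type"], {}).items():
        if name not in attrs:
            errors.append(f"{label}: missing {name!r}")
        elif not isinstance(attrs[name], typ):
            errors.append(f"{label}: {name!r} should be {typ.__name__}")
    for child in node.get("children", []):
        errors.extend(validate(child, schema))
    return errors


good = {"type": "Package", "args": ["grep"],
        "attrs": {"version": "1.0", "license": "GPL"}}
bad = {"type": "Package", "args": ["grep"], "attrs": {"version": 1.0}}

print(validate(good))
print(validate(bad))
```

Node types that are absent from the schema, such as `TASK` nodes, pass unchecked in this sketch.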
681 |
682 | ### YSH vs. Shell
683 |
684 | Hay files are parsed as YSH, not OSH. That includes `SHELL` nodes:
685 |
686 | TASK build {
687 | cp @deps /tmp # YSH splicing syntax
688 | }
689 |
690 | If you want to use POSIX shell or bash, use two arguments, the second of which
691 | is a multi-line string:
692 |
693 | TASK build '''
694 | cp "${deps[@]}" /tmp
695 | '''
696 |
697 | The YSH style gives you *static parsing*, which catches some errors earlier.
698 |
699 | ## Future Work
700 |
701 | - `hay proc` for arbitrary schema validation, including JSON schema
702 | - Examples of running Hay in a secure process / container, in various languages
703 | - Sandboxing:
704 | - More fine-grained rules?
705 | - "restricted" could come with a security guarantee. I've avoided making
706 | such guarantees, but I think it's possible as YSH matures. The
707 | interpreter uses dependency inversion to isolate I/O.
708 | - More location info, including the source file.
709 |
710 | [Please send
711 | feedback](https://github.com/oilshell/oil/wiki/Where-To-Send-Feedback) about
712 | Hay. It will inform and prioritize this work!
713 |
714 | ## Links
715 |
716 | - Blog posts tagged #[hay]($blog-tag). Hay is a general mechanism, so it's
717 | useful to explain it with concrete examples.
718 | - [Data Definition and Code Generation in Tcl](https://trs.jpl.nasa.gov/bitstream/handle/2014/7660/03-1728.pdf) (2003, PDF)
719 | - Like Hay, it has the (Type, Name, Attributes) data model.
720 | - <https://github.com/oilshell/oil/wiki/Config-Dialect>. Design notes and related links on the wiki.