doc/hay.md

1	---
2	default_highlighter: oils-sh
3	---
4
5	Hay - Custom Languages for Unix Systems
6	=======================================
7
8	Hay lets you use the syntax of YSH to declare data and
9	interleaved code. It allows the shell to better serve its role as
10	essential glue. For example, these systems all combine Unix processes in
11	various ways:
12
13	- local build systems (Ninja, CMake, Debian package builds, Docker/OCI builds)
14	- remote build services (VM-based continuous integration like sourcehut, GitHub
15	Actions)
16	- local process supervisors (SysV init, systemd)
17	- remote process supervisors / cluster managers (Slurm, Kubernetes)
18
19	Slogans:
20
21	- Hay Ain't YAML.
22	- It evaluates to [JSON][] + Shell Scripts.
23	- We need a better control plane language for the cloud.
24	- YSH adds the missing declarative part to shell.
25
26	This doc describes how to use Hay, with motivating examples.
27
28	As of 2022, this is a new feature of YSH, and it needs user feedback.
29	Nothing is set in stone, so you can influence the language and its features!
30
31
32	[JSON]: $xref:JSON
33
34	<!--
35	- although also Tcl, Lua, Python, Ruby
36	- DSLs, Config Files, and More
37	- For Dialects of YSH
38
39	Use case examples
40	-->
41
42	<!-- cmark.py expands this -->
43	<div id="toc">
44	</div>
45
46	## Example
47
48	Hay could be used to configure a hypothetical Linux package manager:
49
50	# cpython.hay -- A package definition
51
52	hay define Package/TASK # define a tree of Hay node types
53
54	Package cpython { # a node with attributes, and children
55
56	version = '3.9'
57	url = 'https://python.org'
58
59	TASK build { # a child node, with YSH code
60	./configure
61	make
62	}
63	}
64
65	This program evaluates to a JSON tree, which you can consume from programs in
66	any language, including YSH:
67
68	{ "type": "Package",
69	"args": [ "cpython" ],
70	"attrs": { "version": "3.9", "url": "https://python.org" },
71	"children": [
72	{ "type": "TASK",
73	"args": [ "build" ],
74	"code_str": " ./configure\n make\n"
75	}
76	]
77	}
78
79	That is, a package manager can use the attributes to create a build
80	environment, then execute shell code within it. This is a *staged evaluation
81	model*.
82
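The staged model can be seen from the consumer's side. The following Python sketch is not part of Hay. It assumes the JSON tree above has already been produced, and the helper name `plan` is hypothetical. It separates the first-stage data (attributes) from the unevaluated shell code that a package manager would run later.

```python
import json

# The JSON tree from the example above, copied as shown in this doc.
TREE = json.loads(r'''
{ "type": "Package",
  "args": [ "cpython" ],
  "attrs": { "version": "3.9", "url": "https://python.org" },
  "children": [
    { "type": "TASK",
      "args": [ "build" ],
      "code_str": " ./configure\n make\n"
    }
  ]
}
''')


def plan(node):
    """Return (attrs, tasks); tasks maps a task name to its shell code."""
    attrs = node.get("attrs", {})
    tasks = {}
    for child in node.get("children", []):
        # Assumption: a child carrying "code_str" is an unevaluated shell node.
        if "code_str" in child:
            tasks[child["args"][0]] = child["code_str"]
    return attrs, tasks


attrs, tasks = plan(TREE)
print("version:", attrs["version"])
print("build steps:", tasks["build"].split())
```

A real package manager would set up the build environment from `attrs` before handing each `code_str` to a shell.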
83	## Understanding Hay
84
85	A goal of Hay is to restore the simplicity of Unix to distributed systems.
86	It's all just code and data!
87
88	This means that it's a bit abstract, so here are a few ways of understanding
89	it.
90
91	### Analogies
92
93	The relation between Hay and YSH is like the relationship between these pairs
94	of languages:
95
96	- [YAML][] / [Go templates][], which are used in Helm config for Kubernetes.
97	  - YAML data specifies a service, and templates specify variants.
98	- Two common ways of building C and C++ code:
99	  - [Make]($xref:make) / [Autotools]($xref:autotools)
100	  - [Ninja]($xref:ninja) / [CMake][]
101	  - Make and Ninja specify a build graph, while autotools and CMake detect
102	    a configured variant with respect to your system.
103
104	Each of these is 70's-style macro programming — a stringly-typed
105	language generating another stringly-typed language, with all the associated
106	problems.
107
108	In contrast, Hay and YSH are really the same language, with the same syntax,
109	and the same Python- and JavaScript-like dynamic types. Hay is just YSH
110	that builds up data instead of executing commands.
111
112	(Counterpoint: Ninja is intended for code generation, and it makes sense for
113	YSH to generate simple languages.)
114
115
116	[Go templates]: https://pkg.go.dev/text/template
117	[CMake]: https://cmake.org
118
119	### Prior Art
120
121	See the [Survey of Config Languages]($wiki) on the wiki, which puts them in
122	these categories:
123
124	1. Languages for String Data
125	   - INI, XML, [YAML][], ...
126	1. Languages for Typed Data
127	   - [JSON][], TOML, ...
128	1. Programmable String-ish Languages
129	   - Go templates, CMake, autotools/m4, ...
130	1. Programmable Typed Data
131	   - Nix expressions, Starlark, Cue, ...
132	1. Internal DSLs in General Purpose Languages
133	   - Hay, Guile Scheme for Guix, Ruby blocks, ...
134
135	Excerpts:
136
137	[YAML][] is a data format that is (surprisingly) the de-facto control plane
138	language for the cloud. It's an approximate superset of [JSON][].
139
140	[UCL][] (universal config language) and [HCL][] (HashiCorp config language) are
141	influenced by the [Nginx][] config file syntax. If you can read any of these
142	languages, you can read Hay.
143
144	[Nix][] has a [functional language][nix-lang] to configure Linux distros. In
145	contrast, Hay is multi-paradigm and imperative.
146
147	[nix-lang]: https://wiki.nixos.org/wiki/Nix_Expression_Language
148
149	The [Starlark][] language is a dialect of Python used by the [Bazel][] build
150	system. It uses imperative code to specify build graph variants, and you can
151	use this same pattern in Hay. That is, if statements, for loops, and functions
152	are useful in Starlark and Hay.
153
154	[Ruby][]'s use of [first-class
155	blocks](http://radar.oreilly.com/2014/04/make-magic-with-ruby-dsls.html)
156	inspired YSH. They're used in systems like Vagrant (VM dev environments) and
157	Rake (a build system).
158
159	In [Lisp][], code and data are expressed with the same syntax, and can be
160	interleaved.
161	[G-Expressions](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html)
162	in Guix use a staged evaluation model, like Hay.
163
164	[YAML]: $xref:YAML
165	[UCL]: https://github.com/vstakhov/libucl
166	[Nginx]: https://en.wikipedia.org/wiki/Nginx
167	[HCL]: https://github.com/hashicorp/hcl
168	[Nix]: $xref:nix
169
170	[Starlark]: https://github.com/bazelbuild/starlark
171	[Bazel]: https://bazel.build/
172
173	[Ruby]: https://www.ruby-lang.org/en/
174	[Lisp]: https://en.wikipedia.org/wiki/Lisp_(programming_language)
175
176
177	### Comparison
178
179	The biggest difference between Hay and [UCL][] / [HCL][] is that Hay is
180	embedded in a shell. In other words, Hay languages are internal DSLs,
181	while those languages are external.
182
183	This means:
184
185	1. You can interleave shell code with Hay data. We'll discuss the many
186	   uses of this below.
187	   - On the other hand, it's OK to configure simple systems with plain data
188	     like [JSON][]. Hay is for when that stops working!
189	1. Hay isn't a library you embed in another program. Instead, you use
190	   Unix-style process-based composition.
191	   - For example, [HCL][] is written in Go, which may be hard to embed in a C
192	     or Rust program.
193	   - Note that a process is a good security boundary. It can be
194	     additionally run in an OS container or VM.
195
196	<!--
197	- Code on the outside of Hay blocks may use the ["staged programming" / "graph metaprogramming" pattern][build-ci-comments] mentioned above.
198	- Code on the inside is unevaluated. You can execute it in another
199	context, like a remote machine, Linux container, or virtual machine.
200	-->
201
202	The sections below elaborate on these points.
203
204	[shell-pipelines]: https://www.oilshell.org/blog/2017/01/15.html
205
206	<!--
207	- YSH has an imperative programming model. It's a little like Starlark.
208	- Guile / GNU Make.
209	- Tensorflow.
210	-->
211
212
213	## Overview
214
215	Hay nodes have a regular structure:
216
217	- They start with a "command", which is called the type.
218	- They accept string arguments and block arguments. There must be at
219	least one argument.
220
221	### Two Kinds of Nodes, and Three Kinds of Evaluation
222
223	There are two kinds of node with this structure.
224
225	(1) `SHELL` nodes contain unevaluated code, and their type is ALL CAPS.
226	The code is turned into a string that can be executed elsewhere.
227
228	TASK build {
229	./configure
230	make
231	}
232	# =>
233	# ... {"code_str": " ./configure\n make\n"}
234
235	(2) `Attr` nodes contain data, and their type starts with a capital letter.
236	They eagerly evaluate a block in a new stack frame and turn it into an
237	attributes dict.
238
239	Package cpython {
240	version = '3.9'
241	}
242	# =>
243	# ... {"attrs": {"version": "3.9"}} ...
244
245	These blocks have a special rule to allow bare assignments like `version =
246	'3.9'`. That is, you don't need keywords like `const` or `var`.
247
248	(3) In contrast to these two types of Hay nodes, YSH builtins that take a block
249	usually evaluate it eagerly:
250
251	cd /tmp { # run in a new directory
252	echo $PWD
253	}
254
255	Builtins are spelled with `lower` case letters, so `SHELL` and `Attr` nodes
256	won't be confused with them.
257
258	### Two Stages of Evaluation
259
260	So Hay is designed to be used with a staged evaluation model:
261
262	1. The first stage follows the rules above:
263	- Tree of Hay nodes → [JSON]($xref) + Unevaluated shell.
264	- You can use variables, conditionals, loops, and more.
265	2. Your app or system controls the second stage. You can invoke YSH again to
266	execute shell inside a VM, inside a Linux container, or on a remote machine.
267
268	These two stages are conceptually different, but use the same syntax and
269	evaluator! Again, the evaluator runs in a mode where it builds up data
270	rather than executing commands.
271
272	### Result Schema
273
274	Here's a description of the result of Hay evaluation (the first stage).
275
276	# The source may be "cpython.hay"
277	FileResult = (source Str, children List[NodeResult])
278
279	NodeResult =
280	# Package cpython { version = '3.9' }
281	Attr (type Str,
282	args List[Str],
283	attrs Map[Str, Any],
284	children List[NodeResult])
285
286	# TASK build { ./configure; make }
287	| Shell(type Str,
288	args List[Str],
289	location_str Str,
290	location_start_line Int,
291	code_str Str)
292
293
294	Notes:
295
296	- Except for user-defined attributes, the result is statically typed.
297	- Shell nodes are always leaf nodes.
298	- Attr nodes may or may not be leaf nodes.
299
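For consumers written in a typed style, the schema above might be mirrored as follows. This is a Python sketch under one assumption not stated in this doc: a node is treated as a Shell node when it carries a `code_str` field. The class and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Shell:
    type: str
    args: List[str]
    code_str: str
    location_str: str = ""
    location_start_line: int = 0


@dataclass
class Attr:
    type: str
    args: List[str]
    attrs: Dict[str, Any] = field(default_factory=dict)
    children: List[Union["Attr", Shell]] = field(default_factory=list)


def node_from_json(d: Dict[str, Any]) -> Union[Attr, Shell]:
    # Assumption: only Shell nodes have a "code_str" field.
    if "code_str" in d:
        return Shell(
            type=d["type"],
            args=d["args"],
            code_str=d["code_str"],
            location_str=d.get("location_str", ""),
            location_start_line=d.get("location_start_line", 0),
        )
    # Attr nodes may have children, so recurse.
    return Attr(
        type=d["type"],
        args=d["args"],
        attrs=d.get("attrs", {}),
        children=[node_from_json(c) for c in d.get("children", [])],
    )


pkg = node_from_json({
    "type": "Package",
    "args": ["cpython"],
    "attrs": {"version": "3.9"},
    "children": [{"type": "TASK", "args": ["build"], "code_str": "make\n"}],
})
print(type(pkg).__name__, [type(c).__name__ for c in pkg.children])
```

Because Shell nodes are always leaves, only the `Attr` branch recurses.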
300	## Three Ways to Invoke Hay
301
302	### Inline Hay Has No Restrictions
303
304	You can put Hay blocks and normal shell code in the same file. Retrieve the
305	result of Hay evaluation with the `_hay()` function.
306
307	# myscript.ysh
308
309	hay define Rule
310
311	Rule mylib.o {
312	inputs = ['mylib.c']
313
314	# not recommended, but allowed
315	echo 'hi'
316	ls /tmp/$(whoami)
317	}
318
319	echo 'bye' # other shell code
320
321	const result = _hay()
322	json write (result)
323
324	In this case, there are no restrictions on the commands you can run.
325
326	### In Separate Files
327
328	You can put Hay definitions in their own file:
329
330	# my-config.hay
331
332	Rule mylib.o {
333	inputs = ['mylib.c']
334	}
335
336	echo 'hi' # allowed for debugging
337	# ls /tmp/$(whoami) would fail due to restrictions on hay evaluation
338
339	In this case, you can use `echo` and `write`, but the interpreter is
340	restricted (see below).
341
342	Parse it with `parseHay()`, and evaluate it with `evalHay()`:
343
344	# my-evaluator.ysh
345
346	hay define Rule # node types for the file
347	const h = parseHay('my-config.hay')
348	const result = evalHay(h)
349
350	json write (result)
351	# =>
352	# {
353	# "children": [
354	# { "type": "Rule",
355	# "args": ["mylib.o"],
356	# "attrs": {"inputs": ["mylib.c"]}
357	# }
358	# ]
359	# }
360
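Once the result is JSON, the second stage is ordinary data processing. As a sketch in Python (the function name `rule_inputs` is made up for this example), a build tool could turn the `Rule` nodes above into a map from target to inputs:

```python
import json

# The evaluation result shown above, as a consumer would receive it.
result = json.loads('''
{
  "children": [
    { "type": "Rule",
      "args": ["mylib.o"],
      "attrs": {"inputs": ["mylib.c"]}
    }
  ]
}
''')


def rule_inputs(file_result):
    """Map each Rule's first argument (the target) to its list of inputs."""
    graph = {}
    for node in file_result["children"]:
        if node["type"] == "Rule":
            graph[node["args"][0]] = list(node["attrs"].get("inputs", []))
    return graph


graph = rule_inputs(result)
print(graph)
```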
361	### In A Block
362
363	Instead of creating separate files, you can also use the `hay eval` builtin:
364
365	hay define Rule
366
367	hay eval :result { # assign to the variable 'result'
368	Rule mylib.o {
369	inputs = ['mylib.c']
370	}
371	}
372
373	json write (result) # same as above
374
375	This is mainly for testing and demos.
376
377	## Security Model: Restricted != Sandboxed
378
379	The "restrictions" are not a security boundary! (They could be, but we're
380	not making promises now.)
381
382	Even with `evalHay()` and `hay eval`, the config file is evaluated in the
383	same interpreter. But the following restrictions apply:
384
385	- External commands aren't allowed
386	- Builtins other than `echo` and `write` aren't allowed
387	  - For example, the `.hay` file can't invoke `shopt` to change global shell
388	    options
389	- A new stack frame is created, so the `.hay` file can't mutate your locals
390	  - However it can still mutate globals with `setglobal`!
391
392	In summary, Hay evaluation is restricted to prevent basic mistakes, but your
393	code isn't completely separate from the evaluated Hay file.
394
395	If you want to evaluate untrusted code, use a separate process, and run it
396	in a container or VM.
397
398	## Reference
399
400	Here is a list of all the mechanisms mentioned.
401
402	### Shell Builtins
403
404	- `hay`
405	  - `hay define` to define node types.
406	  - `hay pp` to pretty print the node types.
407	  - `hay reset` to delete both the node types and the current evaluation
408	    result.
409	  - `hay eval :result { ... }` to evaluate in restricted mode, and put the
410	    result in a variable.
411	- Implementation detail: the `haynode` builtin is run when types like
412	  `Package` and `TASK` are invoked. That is, all node types are aliases for
413	  this same builtin.
414
415	### Functions
416
417	- `parseHay()` parses a file, just as `bin/ysh` does.
418	- `evalHay()` evaluates the parsed file in restricted mode, like `hay eval`.
419	- `_hay()` retrieves the current result.
420	  - It's useful for interactive debugging.
421	  - The name starts with `_` because it's a "register" mutated by the
422	    interpreter.
423
424	### Options
425
426	Hay is parsed and evaluated with option group `ysh:all`, which includes
427	`parse_proc` and `parse_equals`.
428
429	<!--
430
431	- The `parse_brace` and `parse_equals` options are what let us inside attribute nodes
432	- `_running_hay`
433
434	-->
435
436
437	## Usage: Interleaving Hay and YSH
438
439	Why would you want to interleave data and code? One reason is to naturally
440	express variants of a configuration. Here are some examples.
441
442	Build variants. There are many variants of the YSH binary:
443
444	- `dbg` and `opt`. The compiler optimization level, and whether debug symbols
445	are included.
446	- `asan` and `ubsan`. Dynamic analysis with Clang sanitizers.
447	- `-D GC_EVERY_ALLOC`. Make a build that helps debug the garbage collector.
448
449	So the Ninja build graph to produce these binaries is shaped similarly, but
450	it varies with compiler and linker flags.
451
452	Service variants. A common problem in distributed systems is how to
453	develop and debug services locally.
454
455	Do your service dependencies live in the cloud, or are they run locally? What
456	about state? Common variants:
457
458	- `local`. Part or all of the service runs locally, so you may pass flags like
459	`--auth-service localhost:8001` to binaries.
460	- `staging`. A complete copy of the service, in a different cloud, with a
461	different database.
462	- `prod`. The live instance running with user data.
463
464	Again, these collections of services are all shaped similarly, but the
465	flags vary based on where binaries are physically running.
466
467	---
468
469	This model can be referred to as ["graph metaprogramming" or "staged
470	programming"][build-ci-comments]. In YSH, it's done with dynamically typed
471	data like integers and dictionaries. In contrast, systems like CMake and
472	autotools are more stringly typed.
473
474	[build-ci-comments]: https://www.oilshell.org/blog/2021/04/build-ci-comments.html
475
476	The following examples are meant to be "evocative"; they're not based on
477	real code. Again, user feedback can improve them!
478
479	### Conditionals
480
481	Conditionals can go on the inside of a block:
482
483	Service auth.example.com { # node taking a block
484	if (variant === 'local') { # condition
485	port = 8001
486	} else {
487	port = 80
488	}
489	}
490
491	Or on the outside:
492
493	Service web { # node
494	root = '/home/www'
495	}
496
497	if (variant === 'local') { # condition
498	Service auth-local { # node
499	port = 8001
500	}
501	}
502
503
504	### Iteration
505
506	Iteration can also go on the inside of a block:
507
508	Rule foo.o { # node
509	inputs = [] # populate with all .cc files except one
510
511	# variables ending with _ are "hidden" from block evaluation
512	for name_ in *.cc {
513	if (name_ !== 'skipped.cc') {
514	call inputs->append(name_)
515	}
516	}
517	}
518
519	Or on the outside:
520
521	for name_ in *.cc { # loop
522	Rule $(basename $name_ .cc).o { # node
523	inputs = [name_]
524	}
525	}
526
527
528	### Remove Duplication with `proc`
529
530	Procs can wrap blocks:
531
532	proc myrule(name) {
533
534	# needed for blocks to use variables higher on the stack
535	shopt --set dynamic_scope {
536
537	Rule dbg/$name.o { # node
538	inputs = ["$name.c"]
539	flags = ['-O0']
540	}
541
542	Rule opt/$name.o { # node
543	inputs = ["$name.c"]
544	flags = ['-O2']
545	}
546
547	}
548	}
549
550	myrule foo # call proc
551	myrule bar # call proc
552
553	Or they can be invoked from within blocks:
554
555	proc set-port (port_num; out) {
556	call out->setValue("localhost:$port_num")
557	}
558
559	Service foo { # node
560	set-port 80 :p1 # call proc
561	set-port 81 :p2 # call proc
562	}
563
564	## More Usage Patterns
565
566	### Using YSH for the Second Stage
567
568	The general pattern is:
569
570	./my-evaluator.ysh my-config.hay | json read :result
571
572	The evaluator does the following:
573
574	1. Sets up the execution context with `hay define`
575	1. Parses `my-config.hay` with `parseHay()`
576	1. Evaluates it with `evalHay()`
577	1. Prints the result as JSON.
578
579	Then a separate YSH process reads this JSON and executes application code.
580
581	TODO: Show code example.
582
583	### Using Python for the Second Stage
584
585	In Python, you would:
586
587	1. Use the `subprocess` module to invoke `./my-evaluator.ysh my-config.hay`.
588	2. Use the `json` module to parse the result.
589	3. Then execute application code using the data.
590
591	TODO: Show code example.
592
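One possible shape for these three steps, as a minimal Python sketch. The function names are hypothetical. So that the sketch runs without YSH installed, `STAND_IN` replaces the real command `["./my-evaluator.ysh", "my-config.hay"]` with a small Python process that prints a sample result.

```python
import json
import subprocess
import sys


def load_hay(evaluator_argv):
    """Steps 1 and 2: run the first-stage evaluator and parse its JSON output."""
    proc = subprocess.run(
        evaluator_argv, capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)


def run_app(result):
    """Step 3: application code; here it only summarizes the top-level nodes."""
    return ["%s %s" % (n["type"], " ".join(n["args"])) for n in result["children"]]


# Stand-in for ["./my-evaluator.ysh", "my-config.hay"] (an assumption for demo purposes).
SAMPLE = {"children": [{"type": "Rule", "args": ["mylib.o"],
                        "attrs": {"inputs": ["mylib.c"]}}]}
STAND_IN = [sys.executable, "-c",
            "import sys; sys.stdout.write(%r)" % json.dumps(SAMPLE)]

result = load_hay(STAND_IN)
summary = run_app(result)
print(summary)
```

With `check=True`, a failing evaluator raises `subprocess.CalledProcessError` instead of yielding partial JSON.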
593	### Locating Errors in the Original `.hay` File
594
595	The YSH interpreter has 2 flags starting with `--location` that give you
596	control over error messages.
597
598	ysh --location-str 'foo.hay' --location-start-line 42 -- stage2.ysh
599
600	Set them to the values of fields `location_str` and `location_start_line` in
601	the result of `SHELL` node evaluation.
602
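A second-stage runner could build that command line from a `SHELL` node. The Python sketch below assumes the runner first writes `code_str` to a temporary script file, which this doc does not prescribe. The helper name `stage2_command` and the sample node are hypothetical. Only the two `--location` flags come from the text above.

```python
import os
import tempfile


def stage2_command(node, ysh="ysh"):
    """Write a SHELL node's code to a temp file; return (argv, script_path)."""
    fd, path = tempfile.mkstemp(suffix=".ysh")
    with os.fdopen(fd, "w") as f:
        f.write(node["code_str"])
    argv = [
        ysh,
        "--location-str", node["location_str"],
        "--location-start-line", str(node["location_start_line"]),
        "--", path,
    ]
    return argv, path


# A sample node with the fields from the result schema.
node = {
    "type": "TASK",
    "args": ["build"],
    "location_str": "foo.hay",
    "location_start_line": 42,
    "code_str": "./configure\nmake\n",
}

argv, path = stage2_command(node)
print(argv[:5])
os.remove(path)  # clean up; a real runner would execute argv first
```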
603	### Debian `.d` Dirs
604
605	Debian has a pattern of splitting configuration into a directory of
606	concatenated files. It's easier for shell scripts to add to a directory than
607	add to a file.
608
609	This can be done with an evaluator that simply enumerates all files:
610
611	var results = []
612	for path in myconfig.d/*.hay {
613	const code = parseHay(path)
614	const result = evalHay(code)
615	call results->append(result)
616	}
617
618	# Now iterate through results
619
620	### Parallel Loading
621
622	TODO: Example of using `xargs -P` to spawn processes with `parseHay()` and
623	`evalHay()`. Then merge the JSON results.
624
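Until that example exists, here is a sketch of the same idea in Python. It swaps `xargs -P` for a thread pool that spawns one evaluator process per file. The merge rule (concatenating `children` in input order) is an assumption, as are all names here. `stand_in()` replaces a real evaluator command so the sketch runs without YSH.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def evaluate(argv):
    """Run one evaluator process and parse the JSON it prints."""
    out = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def load_all(commands, max_workers=4):
    """Evaluate files in parallel, then merge results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, commands))  # map() preserves order
    merged = {"children": []}
    for r in results:
        merged["children"].extend(r.get("children", []))
    return merged


def stand_in(name):
    # Stand-in for a command like ["./my-evaluator.ysh", "a.hay"].
    payload = json.dumps(
        {"children": [{"type": "Rule", "args": [name], "attrs": {}}]}
    )
    return [sys.executable, "-c", "import sys; sys.stdout.write(%r)" % payload]


merged = load_all([stand_in("a.o"), stand_in("b.o")])
names = [n["args"][0] for n in merged["children"]]
print(names)
```

Threads are enough here because the work happens in child processes. Each process is also a separate evaluation context.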
625	## Style
626
627	### Attributes vs. Procs
628
629	Assigning attributes and invoking procs can look similar:
630
631	Package grep {
632	version = '1.0' # An attribute?
633
634	version 1.0 # or call proc 'version'?
635	}
636
637	The first style is better for typed data like integers and dictionaries. The
638	latter style isn't useful here, but it could be if `version 1.0` created
639	complex Hay nodes.
640
641	### Attributes vs. Flags
642
643	Hay nodes shouldn't take flags or `--`. Flags are for key-value pairs, and
644	blocks are better for expressing such data.
645
646	No:
647
648	Package --version 1.0 grep {
649	license = 'GPL'
650	}
651
652	Yes:
653
654	Package grep {
655	version = '1.0'
656	license = 'GPL'
657	}
658
659	### Dicts vs. Blocks
660
661	Superficially, dicts and blocks are similar:
662
663	Package grep {
664	mydict = {name: 'value'} # a dict
665
666	mynode foo { # a node taking a block
667	name = 'value'
668	}
669	}
670
671	Use dicts in cases where you don't know the names or types up front, like
672
673	files = {'README.md': true, '__init__.py': false}
674
675	Use blocks when there's a schema. Blocks are also different because:
676
677	- You can use `if` statements and `for` loops in them.
678	- You can call `TASK build; TASK test` within a block, creating multiple
679	objects of the same type.
680	- Later: custom validation
681
682	### YSH vs. Shell
683
684	Hay files are parsed as YSH, not OSH. That includes `SHELL` nodes:
685
686	TASK build {
687	cp @deps /tmp # YSH splicing syntax
688	}
689
690	If you want to use POSIX shell or bash, use two arguments, the second of which
691	is a multi-line string:
692
693	TASK build '''
694	cp "${deps[@]}" /tmp
695	'''
696
697	The YSH style gives you static parsing, which catches some errors earlier.
698
699	## Future Work
700
701	- `hay proc` for arbitrary schema validation, including JSON schema
702	- Examples of running Hay in a secure process / container, in various languages
703	- Sandboxing:
704	  - More fine-grained rules?
705	  - "restricted" could come with a security guarantee. I've avoided making
706	    such guarantees, but I think it's possible as YSH matures. The
707	    interpreter uses dependency inversion to isolate I/O.
708	- More location info, including the source file.
709
710	[Please send
711	feedback](https://github.com/oilshell/oil/wiki/Where-To-Send-Feedback) about
712	Hay. It will inform and prioritize this work!
713
714	## Links
715
716	- Blog posts tagged #[hay]($blog-tag). Hay is a general mechanism, so it's
717	  useful to explain it with concrete examples.
718	- [Data Definition and Code Generation in Tcl](https://trs.jpl.nasa.gov/bitstream/handle/2014/7660/03-1728.pdf) (2003, PDF)
719	  - Like Hay, it has the (Type, Name, Attributes) data model.
720	- <https://github.com/oilshell/oil/wiki/Config-Dialect>. Design notes and related links on the wiki.