---
default_highlighter: oils-sh
---

YSH Fixes Shell's Error Handling (`errexit`)
============================================

<style>
  .faq {
    font-style: italic;
    color: purple;
  }

  /* copied from web/blog.css */
  .attention {
    text-align: center;
    background-color: #DEE;
    padding: 1px 0.5em;

    /* to match p tag etc. */
    margin-left: 2em;
  }
</style>

YSH is unlike other shells:

- It never silently ignores an error, and it never loses an exit code.
- There's no reason to write a YSH script without `errexit`, which is on by
  default.

This document explains how YSH makes these guarantees. We first review shell
error handling, and discuss its fundamental problems. Then we show idiomatic
YSH code, and look under the hood at the underlying mechanisms.

[file a bug]: https://github.com/oilshell/oil/issues

<div id="toc">
</div>

## Review of Shell Error Handling Mechanisms

POSIX shell has fundamental problems with error handling. With `set -e` aka
`errexit`, you're [damned if you do and damned if you don't][bash-faq].

GNU [bash]($xref) fixes some of the problems, but **adds its own**, e.g. with
respect to process subs, command subs, and assignment builtins.

YSH fixes all the problems by adding new builtin commands, special variables,
and global options. But you see a simple interface with `try` and `_status`.

Let's review a few concepts before discussing YSH.

### POSIX Shell

- The special variable `$?` is the exit status of the "last command". It's a
  number between `0` and `255`.
- If `errexit` is enabled, the shell will abort if `$?` is nonzero.
  - This is subject to the *Disabled `errexit` Quirk*, which I describe below.

These mechanisms are fundamentally incomplete.

### Bash

Bash improves error handling for pipelines like `ls /bad | wc`.

- `${PIPESTATUS[@]}` stores the exit codes of all processes in a pipeline.
- When `set -o pipefail` is enabled, `$?` takes into account every process in a
  pipeline.
  - Without this setting, the failure of `ls` would be ignored.
- `shopt -s inherit_errexit` was introduced in bash 4.4 to re-introduce error
  handling in command sub child processes. This fixes a bash-specific bug.
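
The first two mechanisms can be observed by running the pipeline from the text
in a fresh bash. This is a minimal sketch: the path `/nonexistent-dir` and the
variable names are made up for illustration, and it assumes `bash` is on
`PATH`.

```sh
# Without pipefail, $? only reflects the last process (wc), so it is 0.
default_status=$(bash -c 'ls /nonexistent-dir 2>/dev/null | wc -l >/dev/null; echo $?')

# With pipefail, the failure of ls is folded into $?, so it is non-zero.
pipefail_status=$(bash -c 'set -o pipefail
  ls /nonexistent-dir 2>/dev/null | wc -l >/dev/null; echo $?')

# PIPESTATUS keeps one exit code per process; the last entry belongs to wc.
all_statuses=$(bash -c 'ls /nonexistent-dir 2>/dev/null | wc -l >/dev/null
  echo "${PIPESTATUS[@]}"')

echo "default=$default_status pipefail=$pipefail_status all=$all_statuses"
```

The exact non-zero status of `ls` differs between implementations, so the
sketch only distinguishes zero from non-zero.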

But there are still places where bash will lose an exit code.

&nbsp;

## Fundamental Problems

Let's look at **four** fundamental issues with shell error handling. They
underlie the **nine** [shell pitfalls enumerated in the
appendix](#list-of-pitfalls).

### When Is `$?` Set?

Each external process and shell builtin has one exit status. But the
definition of `$?` is obscure: it's tied to the `pipeline` rule in the POSIX
shell grammar, which does **not** correspond to a single process or builtin.

We saw that `pipefail` fixes one case:

    ls /nonexistent | wc   # 2 processes, 2 exit codes, but just one $?

But there are others:

    local x=$(false)                 # 2 exit codes, but just one $?
    diff <(sort left) <(sort right)  # 3 exit codes, but just one $?

This issue means that shell scripts fundamentally **lose errors**. The
language is unreliable.

### What Does `$?` Mean?

Each process or builtin decides the meaning of its exit status independently.
Here are two common choices:

1. **The Failure Paradigm**
   - `0` for success, or non-zero for an error.
   - Examples: most shell builtins, `ls`, `cp`, ...
1. **The Boolean Paradigm**
   - `0` for true, `1` for false, or a different number like `2` for an error.
   - Examples: the `test` builtin, `grep`, `diff`, ...

New error handling constructs in YSH deal with this fundamental inconsistency.
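
To make the boolean paradigm concrete, here is a small sketch in plain shell
that maps the three kinds of `grep` status to a word. The helper name
`classify` and the path `/nonexistent-file` are hypothetical, chosen only for
this illustration.

```sh
# Classify an exit status under the boolean paradigm.
classify() {
  case $1 in
    0) echo true ;;
    1) echo false ;;
    *) echo error ;;
  esac
}

status=0; printf 'class Foo\n' | grep -q 'class' || status=$?
found=$(classify "$status")      # the pattern matches

status=0; printf 'def foo\n' | grep -q 'class' || status=$?
missing=$(classify "$status")    # the pattern doesn't match

status=0; grep -q 'class' /nonexistent-file 2>/dev/null || status=$?
broken=$(classify "$status")     # grep itself failed

echo "$found $missing $broken"
# => true false error
```

A plain `if grep ...` collapses the last two cases into one `else` branch.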

### The Meaning of `if`

Shell's `if` statement tests whether a command exits zero or non-zero:

    if grep class *.py; then
      echo 'found class'
    else
      echo 'not found'  # is this true?
    fi

So while you'd expect `if` to work in the boolean paradigm, it's closer to
the failure paradigm. This means that using `if` with certain commands can
cause the *Error or False Pitfall*:

    if grep 'class\(' *.py; then  # grep syntax error, status 2
      echo 'found class('
    else
      echo 'not found is a lie'
    fi
    # => grep: Unmatched ( or \(
    # => not found is a lie

That is, the `else` clause conflates grep's **error** status 2 and **false**
status 1.

Strangely enough, I encountered this pitfall while trying to disallow shell's
error handling pitfalls in YSH! I describe this in another appendix as the
"[meta pitfall](#the-meta-pitfall)".

### Design Mistake: The Disabled `errexit` Quirk

There's more bad news about the design of shell's `if` statement. It's subject
to the *Disabled `errexit` Quirk*, which means when you use a **shell function**
in a conditional context, errors are unexpectedly **ignored**.

That is, while `if ls /tmp` is useful, `if my-ls-function /tmp` should be
avoided. It yields surprising results.

I call this the *`if myfunc` Pitfall*, and show an example in [the
appendix](#disabled-errexit-quirk-if-myfunc-pitfall).

We can't fix this decades-old bug in shell. Instead we disallow dangerous code
with `strict_errexit`, and add new error handling mechanisms.

&nbsp;

## YSH Error Handling: The Big Picture

We've reviewed how POSIX shell and bash work, and showed fundamental problems
with the shell language.

But when you're using YSH, **you don't have to worry about any of this**!

### YSH Fails On Every Error

This means you don't have to explicitly check for errors. Examples:

    shopt --set ysh:upgrade     # Enable good error handling in bin/osh
                                # It's the default in bin/ysh.
    shopt --set strict_errexit  # Disallow bad shell error handling.
                                # Also the default in bin/ysh.

    local date=$(date X)        # 'date' failure is fatal
    # => date: invalid date 'X'

    echo $(date X)              # ditto

    echo $(date X) $(ls > F)    # 'ls' isn't executed; 'date' fails first

    ls /bad | wc                # 'ls' failure is fatal

    diff <(sort A) <(sort B)    # 'sort' failure is fatal

On the other hand, you won't experience this problem caused by `pipefail`:

    yes | head                  # doesn't fail due to SIGPIPE

The details are explained below.

### `try` Handles Command and Expression Errors

You may want to **handle failure** instead of aborting the shell. In this
case, use the `try` builtin and inspect the `_status` variable it sets.

    try {                 # try takes a block of commands
      ls /etc
      ls /BAD             # it stops at the first failure
      ls /lib
    }                     # After try, $? is always 0
    if (_status !== 0) {  # Now check _status
      echo 'failed'
    }

Note that:

- The `_status` variable is different than `$?`.
  - The leading `_` is a PHP-like convention for special variables /
    "registers" in YSH.
- Idiomatic YSH programs don't look at `$?`.

You can omit `{ }` when invoking a single command. Here's how to invoke a
function without the *`if myfunc` Pitfall*:

    try myfunc            # Unlike 'myfunc', doesn't abort on error
    if (_status !== 0) {
      echo 'failed'
    }

You also have fine-grained control over every process in a pipeline:

    try {
      ls /bad | wc
    }
    write -- @_pipeline_status  # every exit status

And each process substitution:

    try {
      diff <(sort left.txt) <(sort right.txt)
    }
    write -- @_process_sub_status  # every exit status


&nbsp;

<div class="attention">

See [YSH vs. Shell Idioms > Error Handling](idioms.html#error-handling) for
more examples.

</div>

&nbsp;

Certain expressions produce fatal errors, like:

    var x = 42 / 0  # divide by zero will abort shell

The `try` builtin also handles them:

    try {
       var x = 42 / 0
    }
    if (_status !== 0) {
      echo 'divide by zero'
    }

More examples:

- Index out of bounds `a[i]`
- Nonexistent key `d->foo` or `d['foo']`.

Such expression evaluation errors result in status `3`, which is an arbitrary non-zero
status that's not used by other shells. Status `2` is generally for syntax
errors and status `1` is for most runtime failures.

### `boolstatus` Enforces 0 or 1 Status

The `boolstatus` builtin addresses the *Error or False Pitfall*:

    if boolstatus grep 'class' *.py {  # may abort the program
      echo 'found'      # status 0 means 'found'
    } else {
      echo 'not found'  # status 1 means 'not found'
    }

Rather than confusing **error** with **false**, `boolstatus` will abort the
program if `grep` doesn't return 0 or 1.

You can think of this as a shortcut for

    try grep 'class' *.py
    case $_status {
      (0) echo 'found'
          ;;
      (1) echo 'not found'
          ;;
      (*) echo 'fatal'
          exit $_status
          ;;
    }
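
For scripts that still have to run under bash or another POSIX-style shell, the
same idea can be approximated with a function. This is only a sketch of the
concept, not how YSH implements `boolstatus`; the name `boolstatus_sh` and the
temporary file are hypothetical.

```sh
# Hypothetical helper, not part of YSH or bash: pass through status 0 or 1,
# and abort the whole shell on any other status.
boolstatus_sh() {
  local status=0
  "$@" || status=$?          # '||' keeps errexit from firing on status 1
  if [ "$status" -gt 1 ]; then
    echo "fatal: '$1' exited with status $status" >&2
    exit "$status"
  fi
  return "$status"
}

src=$(mktemp)                # throwaway input file for the demo
printf 'class Foo\n' > "$src"

if boolstatus_sh grep -q 'class' "$src"; then
  result='found'
else
  result='not found'
fi
rm -f "$src"
echo "$result"
# => found
```

The helper calls `exit` explicitly, so it doesn't depend on `errexit`, which is
disabled inside an `if` condition. It should not be the last stage of a
pipeline, because there `exit` would only leave a subshell.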

### FAQ on Language Design

<div class="faq">

Why is there `try` but no `catch`?

</div>

First, it offers more flexibility:

- The handler usually inspects `_status`, but it may also inspect
  `_pipeline_status` or `_process_sub_status`.
- The handler may use `case` instead of `if`, e.g. to distinguish true / false
  / error.

Second, it makes the language smaller:

- `try` / `catch` would require specially parsed keywords. But our `try` is a
  shell builtin that takes a block, like `cd` or `shopt`.
- The builtin also lets us write either `try ls` or `try { ls }`, which is hard
  with a keyword.

Another way to remember this is that there are **three parts** to handling an
error, each of which has independent choices:

1. Does `try` take a simple command or a block? For example, `try ls` versus
   `try { ls; var x = 42 / n }`
2. Which status do you want to inspect?
3. Inspect it with `if` or `case`? As mentioned, `boolstatus` is a special
   case of `try / case`.

<div class="faq">

Why is `_status` different from `$?`?

</div>

This avoids special cases in the interpreter for `try`, which is again a
builtin that takes a block.

The exit status of `try` is always `0`. If it returned a non-zero status, the
`errexit` rule would trigger, and you wouldn't be able to handle the error!

Generally, [errors occur *inside* blocks, not
outside](proc-block-func.html#errors).

Again, idiomatic YSH scripts never look at `$?`, which is only used to trigger
shell's `errexit` rule. Instead they invoke `try` and inspect `_status` when
they want to handle errors.

<div class="faq">

Why `boolstatus`? Can't you just change what `if` means in YSH?

</div>

I've learned the hard way that when there's a shell **semantics** change, there
must be a **syntax** change. In general, you should be able to read code on
its own, without context.

Readers shouldn't have to constantly look up whether `ysh:upgrade` is on. There
are some cases where this is necessary, but it should be minimized.

Also, both `if foo` and `if boolstatus foo` are useful in idiomatic YSH code.

&nbsp;

<div class="attention">

**Most users can skip to [the summary](#summary).** You don't need to know all
the details to use YSH.

</div>

&nbsp;

## Reference: Global Options


Under the hood, we implement the `errexit` option from POSIX, bash options like
`pipefail` and `inherit_errexit`, and add more options of our
own. They're all hidden behind [option groups](options.html) like `strict:all`
and `ysh:upgrade`.

The following sections explain new YSH options.

### `command_sub_errexit` Adds More Errors

In all Bourne shells, the status of command subs is lost, so errors are ignored
(details in the [appendix](#quirky-behavior-of)). For example:

    echo $(date X) $(date Y)  # 2 failures, both ignored
    echo                      # program continues

The `command_sub_errexit` option makes both `date` invocations an error.
The status `$?` of the parent `echo` command will be `1`, so if `errexit` is
on, the shell will abort.

(Other shells should implement `command_sub_errexit`!)
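
The behavior that `command_sub_errexit` corrects is easy to reproduce in bash.
The sketch below uses `false` in place of `date X` so the result doesn't depend
on the `date` implementation; it assumes `bash` is on `PATH`, and the variable
name `output` is arbitrary.

```sh
# Run the snippet in a fresh bash with errexit on.
# Both command subs fail, yet echo succeeds and the script keeps going.
output=$(bash -c 'set -o errexit
  echo $(false) $(false)
  echo "status=$?"
  echo "still running"')

printf '%s\n' "$output"
```

The captured output contains `status=0` and `still running`, which shows that
neither failure reached `errexit`.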

### `process_sub_fail` Is Analogous to `pipefail`

Similarly, in this example, `sort` will fail if the file doesn't exist.

    diff <(sort left.txt) <(sort right.txt)  # any failures are ignored

But there's no way to see this error in bash. YSH adds `process_sub_fail`,
which folds the failure into `$?` so `errexit` can do its job.

You can also inspect the special `_process_sub_status` array variable to
implement custom error logic.
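
As a point of comparison, this sketch shows the bash behavior that
`process_sub_fail` is meant to fix. The process sub deliberately exits with
status 3, an arbitrary value chosen for the demo, and neither `errexit` nor
`pipefail` notices it. It assumes `bash` is on `PATH`.

```sh
# The process sub prints one line and then fails with status 3.
# cat still succeeds, so $? is 0 and errexit never triggers.
output=$(bash -c 'set -o errexit -o pipefail
  cat <(echo data; exit 3)
  echo "status=$?"')

printf '%s\n' "$output"
# => data
# => status=0
```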

### `strict_errexit` Flags Two Problems

Like other `strict_*` options, YSH `strict_errexit` improves your shell
programs, even if you run them under another shell like [bash]($xref)! It's
like a linter *at runtime*, so it can catch things that [ShellCheck][] can't.

[ShellCheck]: https://www.shellcheck.net/

`strict_errexit` disallows code that exhibits these problems:

1. The `if myfunc` Pitfall
1. The `local x=$(false)` Pitfall

See the appendix for examples of each.

#### Rules to Prevent the `if myfunc` Pitfall

In any conditional context, `strict_errexit` disallows:

1. All commands except `((`, `[[`, and some simple commands (e.g. `echo foo`).
   - Detail: `! ls` is considered a pipeline in the shell grammar. We have to
     allow it, while disallowing `ls | grep foo`.
2. Function/proc invocations (which are a special case of simple
   commands).
3. Command sub and process sub (`shopt --unset allow_csub_psub`)

This means that you should check the exit status of functions and pipelines
differently. See [Does a Function
Succeed?](idioms.html#does-a-function-succeed), [Does a Pipeline
Succeed?](idioms.html#does-a-pipeline-succeed), and other [YSH vs. Shell
Idioms](idioms.html).

#### Rule to Prevent the `local x=$(false)` Pitfall

- Command subs and process subs are disallowed in assignment builtins: `local`,
  `declare` aka `typeset`, `readonly`, and `export`.

No:

    local x=$(false)

Yes:

    var x = $(false)   # YSH style

    local x            # Shell style
    x=$(false)

### `sigpipe_status_ok` Ignores an Issue With `pipefail`

When you turn on `pipefail`, you may inadvertently run into this behavior:

    yes | head
    # => y
    # ...

    echo ${PIPESTATUS[@]}
    # => 141 0

That is, `head` closes the pipe after 10 lines, causing the `yes` command to
**fail** with `SIGPIPE` status `141`.

This error shouldn't be fatal, so OSH has a `sigpipe_status_ok` option, which
is on by default in YSH.

### `verbose_errexit`

When `verbose_errexit` is on, the shell prints errors to `stderr` when the
`errexit` rule is triggered.

### FAQ on Options

<div class="faq">

Why is there no `_command_sub_status`? And why is `command_sub_errexit` named
differently than `process_sub_fail` and `pipefail`?

</div>

Command subs are executed **serially**, while process subs and pipeline parts
run **in parallel**.

So a command sub can "abort" its parent command, setting `$?` immediately.
The parallel constructs must wait until all parts are done and save statuses in
an array. Afterward, they determine `$?` based on the value of `pipefail` and
`process_sub_fail`.

<div class="faq">

Why are `strict_errexit` and `command_sub_errexit` different options?

</div>

Because `shopt --set strict:all` can be used to improve scripts that are run
under other shells like [bash]($xref). It's like a runtime linter that
disallows dangerous constructs.

On the other hand, if you write code with `command_sub_errexit` on, it's
impossible to get the same failures under bash. So `command_sub_errexit` is
not a `strict_*` option, and it's meant for code that runs only under YSH.

<div class="faq">

What's the difference between bash's `inherit_errexit` and YSH
`command_sub_errexit`? Don't they both relate to command subs?

</div>

- `inherit_errexit` enables failure in the **child** process running the
  command sub.
- `command_sub_errexit` enables failure in the **parent** process, after the
  command sub has finished.

&nbsp;

## Summary

YSH uses three mechanisms to fix error handling once and for all.

It has two new **builtins** that relate to errors:

1. `try` lets you explicitly handle errors when `errexit` is on.
1. `boolstatus` enforces a true/false meaning. (This builtin is less common).

It has three **special variables**:

1. The `_status` integer, which is set by `try`.
   - Remember that it's distinct from `$?`, and that idiomatic YSH programs
     don't use `$?`.
1. The `_pipeline_status` array (another name for bash's `PIPESTATUS`)
1. The `_process_sub_status` array for process substitutions.

Finally, it supports all of these **global options**:

- From POSIX shell:
  - `errexit`
- From [bash]($xref):
  - `pipefail`
  - `inherit_errexit` aborts the child process of a command sub.
- New:
  - `command_sub_errexit` aborts the parent process immediately after a failed
    command sub.
  - `process_sub_fail` is analogous to `pipefail`.
  - `strict_errexit` flags two common problems.
  - `sigpipe_status_ok` ignores a spurious "broken pipe" failure.
  - `verbose_errexit` controls whether error messages are printed.

When using `bin/osh`, set all options at once with `shopt --set ysh:upgrade
strict:all`. Or use `bin/ysh`, where they're set by default.

<!--
Related 2020 blog post [Reliable Error
Handling](https://www.oilshell.org/blog/2020/10/osh-features.html#reliable-error-handling).
-->


## Related Docs

- [YSH vs. Shell Idioms](idioms.html) shows more examples of `try` and `boolstatus`.
- [Shell Idioms](shell-idioms.html) has a section on fixing `strict_errexit`
  problems in Bourne shell.

Good articles on `errexit`:

- Bash FAQ: [Why doesn't `set -e` do what I expected?][bash-faq]
- [Bash: Error Handling](http://fvue.nl/wiki/Bash:_Error_handling) from
  `fvue.nl`

[bash-faq]: http://mywiki.wooledge.org/BashFAQ/105

Spec Test Suites:

- <https://www.oilshell.org/release/latest/test/spec.wwz/survey/errexit.html>
- <https://www.oilshell.org/release/latest/test/spec.wwz/survey/errexit-oil.html>

These docs aren't about error handling, but they're also painstaking
backward-compatible overhauls of shell!

- [Simple Word Evaluation in Unix Shell](simple-word-eval.html)
- [Egg Expressions (YSH Regexes)](eggex.html)

For reference, this work on error handling was described in [Four Features That
Justify a New Unix
Shell](https://www.oilshell.org/blog/2020/10/osh-features.html) (October 2020).
Since then, we changed `try` and `_status` to be more powerful and general.

&nbsp;

## Appendices

### List Of Pitfalls

We mentioned some of these pitfalls:

1. The `if myfunc` Pitfall, caused by the Disabled `errexit` Quirk (`strict_errexit`)
1. The `local x=$(false)` Pitfall (`strict_errexit`)
1. The Error or False Pitfall (`boolstatus`, `try` / `case`)
   - Special case: When the child process is another instance of the shell, the
     Meta Pitfall is possible.
1. The Process Sub Pitfall (`process_sub_fail` and `_process_sub_status`)
1. The `yes | head` Pitfall (`sigpipe_status_ok`)

There are two pitfalls related to command subs:

6. The `echo $(false)` Pitfall (`command_sub_errexit`)
6. Bash's `inherit_errexit` pitfall.
   - As mentioned, this bash 4.4 option fixed a bug in earlier versions of
     bash. YSH reimplements it and turns it on by default.

Here are two more pitfalls that don't require changes to YSH:

8. The Trailing `&&` Pitfall
   - When `test -d /bin && echo found` is at the end of a function, the exit
     code is surprising.
   - Solution: always use `if` rather than `&&`.
   - More reasons: the `if` is easier to read, and `&&` isn't useful when
     `errexit` is on.
8. The surprising return value of `(( i++ ))`, `let`, `expr`, etc.
   - Solution: Use `i=$((i + 1))`, which is valid POSIX shell.
   - In YSH, use `setvar i += 1`.
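
The last two pitfalls can be checked with a few lines of shell. In this sketch
the function names, the variable names, and the path `/nonexistent-dir` are
hypothetical; the `(( i++ ))` case runs in a separate `bash` because it isn't
POSIX syntax.

```sh
# Trailing && Pitfall: the function "fails" just because the test was false.
check_dir() {
  test -d /nonexistent-dir && echo found
}
and_status=0; check_dir || and_status=$?

# Same logic with 'if': a false condition with no else branch yields status 0.
check_dir_if() {
  if test -d /nonexistent-dir; then
    echo found
  fi
}
if_status=0; check_dir_if || if_status=$?

# (( i++ )) evaluates to the old value 0, which bash reports as status 1.
incr_status=$(bash -c 'i=0; (( i++ )); echo $?')

# The POSIX form is a plain assignment, so its status is 0.
posix_status=$(sh -c 'i=0; i=$((i + 1)); echo $?')

echo "and=$and_status if=$if_status incr=$incr_status posix=$posix_status"
# => and=1 if=0 incr=1 posix=0
```

With `errexit` on, a bare call to `check_dir` or a bare `(( i++ ))` would abort
the script even though nothing went wrong.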

#### Example of `inherit_errexit` Pitfall

In bash, `errexit` is disabled in command sub child processes:

    set -e
    shopt -s inherit_errexit  # needed to avoid 'touch two'
    echo $(touch one; false; touch two)

Without the option, it will touch both files, even though `false` fails after
the first `touch`.

#### Bash has a grammatical quirk with `set -o failglob`

This isn't a pitfall, but a quirk that also relates to errors and shell's
**grammar**. Recall that the definition of `$?` is tied to the grammar.

Consider this program:

    set -o failglob
    echo *.ZZ        # no files match
    echo status=$?   # show failure
    # => status=1

This is the same program with a newline replaced by a semicolon:

    set -o failglob

    # Surprisingly, bash doesn't execute what's after ;
    echo *.ZZ; echo status=$?
    # => (no output)

But it behaves differently. This is because newlines and semicolons are handled
in different **productions of the grammar**, and produce distinct syntax trees.

(A related quirk is that this same difference can affect the number of
processes that shells start!)

### Disabled `errexit` Quirk / `if myfunc` Pitfall

This quirk is a bad interaction between the `if` statement, shell functions,
and `errexit`. It's a **mistake** in the design of the shell language.
Example:

    set -o errexit     # don't ignore errors

    myfunc() {
      ls /bad          # fails with a non-zero status
      echo 'should not get here'
    }

    myfunc             # Good: script aborts before echo
    # => ls: '/bad': no such file or directory

    if myfunc; then    # Surprise! It behaves differently in a condition.
      echo OK
    fi
    # => ls: '/bad': no such file or directory
    # => should not get here

We see "should not get here" because the shell **silently disables** `errexit`
while executing the condition of `if`. This relates to the fundamental
problems above:

1. Does the function use the failure paradigm or the boolean paradigm?
2. `if` tests a single exit status, but every command in a function has an exit
   status. Which one should we consider?

This quirk occurs in all **conditional contexts**:

1. The condition of the `if`, `while`, and `until` constructs
2. A command/pipeline prefixed by `!` (negation)
3. Every clause in `||` and `&&` except the last.
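
The `if` case is shown above; the sketch below exercises the third context, the
left side of `||`. It runs in a child `sh` so that `errexit` doesn't affect the
calling shell. The function body and the `handled` message are made up for the
demo.

```sh
# errexit is on, but it is silently disabled while myfunc runs,
# because myfunc is on the left side of ||.
output=$(sh -c 'set -o errexit
  myfunc() {
    false                        # would normally abort the script
    echo "should not get here"
  }
  myfunc || echo "handled"
')

echo "$output"
# => should not get here
```

Note that `handled` is never printed: the function's status is that of its last
command, the successful `echo`, so the failure of `false` is lost entirely.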

### The Meta Pitfall

I encountered the *Error or False Pitfall* while trying to disallow other error
handling pitfalls! The *meta pitfall* arises from a combination of the issues
discussed:

1. The `if` statement tests for zero or non-zero status.
1. The condition of an `if` may start child processes. For example, in `if
   myfunc | grep foo`, the `myfunc` invocation must be run in a subshell.
1. You may want an external process to use the **boolean paradigm**, and
   that includes **the shell itself**. When any of the `strict_` options
   encounters bad code, it aborts the shell with **error** status `1`, not
   boolean **false** `1`.

The result of this fundamental issue is that `strict_errexit` is quite strict.
On the other hand, the resulting style is straightforward and explicit.
Earlier attempts allowed code that is too subtle.

### Quirky Behavior of `$?`

This is a different way of summarizing the information above.

Simple commands have an obvious behavior:

    echo hi           # $? is 0
    false             # $? is 1

But the parent process loses errors from failed command subs:

    echo $(false)     # $? is 0
                      # YSH makes it fail with command_sub_errexit

Surprisingly, bare assignments take on the value of any command subs:

    x=$(false)        # $? is 1 -- we did NOT lose the exit code

But assignment builtins have the problem again:

    local x=$(false)  # $? is 0 -- exit code is clobbered
                      # disallowed by YSH strict_errexit

So shell is confusing and inconsistent, but YSH fixes all these problems. You
never lose the exit code of `false`.


&nbsp;

## Acknowledgments

- Thank you to `ca2013` for extensive review and proofreading of this doc.