Simple Word Evaluation in Unix Shell
====================================

This document describes the YSH word evaluation semantics (`shopt -s
simple_word_eval`) for experienced shell users. It may also be useful to
those who want to implement this behavior in another shell.

The main idea is that YSH behaves like a traditional programming language:

1. It's **parsed** from start to end [in a single pass][parsing-shell].
2. It's **evaluated** in a single step too.

That is, parsing and evaluation aren't interleaved, and code and data aren't
confused.

[parsing-shell]: https://www.oilshell.org/blog/2019/02/07.html

[posix-spec]: https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06


<div id="toc">
</div>

## An Analogy: Word Expressions Should Be Like Arithmetic Expressions

In YSH, "word expressions" like

    $x
    "hello $name"
    $(hostname)
    'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"

are parsed and evaluated in a straightforward way, like this expression when
`x == 2`:

```sh-prompt
1 + x / 2 + x * 3 → 8  # Python, JS, Ruby, etc. work this way
```

In contrast, in shell, words are "expanded" in multiple stages, like this:

```sh-prompt
1 + "x / 2 + \"x * 3\"" → 8  # Hypothetical, confusing language
```

That is, it would be odd if Python looked *inside a program's strings* for
expressions to evaluate, but that's exactly what shell does! There are
multiple places where there's a silent `eval`, and you need **quoting** to
inhibit it. Neglecting this can cause security problems due to confusing code
and data (links below).

In other words, the **defaults are wrong**. Programmers are surprised by shell's
behavior, and it leads to incorrect programs.

So in YSH, you can opt out of the multiple "word expansion" stages described in
the [POSIX shell spec][posix-spec]. Instead, there's only **one stage**:
evaluation.

## Design Goals

The new semantics should be easily adoptable by existing shell scripts.

- Importantly, `bin/osh` is POSIX-compatible and runs real [bash]($xref)
  scripts. You can gradually opt into **stricter and saner** behavior with
  `shopt` options (or by running `bin/ysh`). The most important one is
  [simple_word_eval]($help), and the others are listed below.
- Even after opting in, the new syntax shouldn't break many scripts. If it
  does break, the change to fix it should be small. For example, `echo @foo`
  is not too common, and it can be made bash-compatible by quoting it: `echo
  '@foo'`.

<!--
It's technically incompatible but I think it will break very few scripts.

-->

## Examples

In the following examples, the [argv][] command prints the `argv` array it
receives in a readable format:

```sh-prompt
$ argv one "two three"
['one', 'two three']
```

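To try the examples without the Oils source tree, a rough stand-in for `argv` can be written as a shell function. This is only a sketch and not the real `spec/bin/argv.py`: it assumes a POSIX shell, uses global variables named `out`, `sep`, and `arg`, and doesn't escape quotes that appear inside arguments.

```shell
# Hypothetical stand-in for spec/bin/argv.py.
# Prints each argument in single quotes, separated by commas, inside brackets.
argv() {
  out='['
  sep=''
  for arg in "$@"; do
    out="$out$sep'$arg'"
    sep=', '
  done
  printf '%s]\n' "$out"
}

argv one "two three"    # prints ['one', 'two three']
```
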
I also use the YSH [var]($help) keyword for assignments. *(TODO: This could be
rewritten with shell assignment for the benefit of shell implementers)*

[argv]: $oils-src:spec/bin/argv.py

### No Implicit Splitting, Dynamic Globbing, or Empty Elision

In YSH, the following constructs always evaluate to **one argument**:

- Variable / "parameter" substitution: `$x`, `${y}`
- Command sub: `$(echo hi)` or backticks
- Arithmetic sub: `$(( 1 + 2 ))`


<!--
Related help topics: [command-sub]($help), [var-sub]($help), [arith-sub]($help).
Not shown: [tilde-sub]($help).
-->

That is, quotes aren't necessary to avoid:

- **Word Splitting**, which uses `$IFS`.
- **Empty Elision**. For example, `x=''; ls $x` passes `ls` no arguments.
- **Dynamic Globbing**. Globs are *dynamic* when the pattern comes from
  program data rather than the source code.

<!-- - Tilde Sub: `~bob/src` -->

Here's an example showing that each construct evaluates to one arg in YSH:

```sh-prompt
ysh$ var pic = 'my pic.jpg'  # filename with spaces
ysh$ var empty = ''
ysh$ var pat = '*.py'        # pattern stored in a string

ysh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

In contrast, shell applies splitting, globbing, and empty elision after the
substitutions. Each of these operations returns an indeterminate number of
strings:

```sh-prompt
sh$ pic='my pic.jpg'  # filename with spaces
sh$ empty=
sh$ pat='*.py'        # pattern stored in a string

sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']
```

To get the desired behavior, you have to use double quotes:

```sh-prompt
sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

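The difference between the unquoted and quoted forms can be checked by counting arguments. The following is a sketch for a POSIX shell with the default `$IFS`. The `count_args` helper and the variable names are hypothetical, and the scratch directory exists only to make the glob result predictable.

```shell
# count_args: hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

tmp=$(mktemp -d)                  # scratch directory holding exactly two .py files
touch "$tmp/a.py" "$tmp/b.py"

pic='my pic.jpg'
empty=''
pat='*.py'

# Unquoted: 2 words from splitting + 0 from elision + 2 from globbing.
unquoted=$(cd "$tmp" && count_args $pic $empty $pat)

# Quoted: each substitution stays a single argument.
quoted=$(cd "$tmp" && count_args "$pic" "$empty" "$pat")

echo "unquoted: $unquoted"        # prints unquoted: 4
echo "quoted: $quoted"            # prints quoted: 3

rm -r "$tmp"
```
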
### Splicing, Static Globbing, and Brace Expansion

The constructs in the last section evaluate to a **single argument**. In
contrast, these three constructs evaluate to **0 to N arguments**:

1. **Splicing** an array: `"$@"` and `"${myarray[@]}"`
2. **Static Globbing**: `echo *.py`. Globs are *static* when they occur in the
   program text.
3. **Brace expansion**: `{alice,bob}@example.com`

In YSH, `shopt -s parse_at` enables these shortcuts for splicing:

- `@myarray` for `"${myarray[@]}"`
- `@ARGV` for `"$@"`

Example:

```sh-prompt
ysh$ var myarray = :| 'a b' c |  # array with 2 elements
ysh$ set -- 'd e' f              # 2 arguments

ysh$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

is just like:


```sh-prompt
bash$ myarray=('a b' c)
bash$ set -- 'd e' f

bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

Unchanged: quotes disable globbing and brace expansion:

```sh-prompt
$ echo *.py
foo.py bar.py

$ echo "*.py"            # globbing disabled with quotes
*.py

$ echo {spam,eggs}.sh
spam.sh eggs.sh

$ echo "{spam,eggs}.sh"  # brace expansion disabled with quotes
{spam,eggs}.sh
```

<!--
help topics:

- braces
- glob
- splice

More:
- inline-call

-->

## Where These Rules Apply

These rules apply when a **sequence** of words is being evaluated, exactly as
in shell:

1. [Command]($help:simple-command): `echo $x foo`
2. [For loop]($help:for): `for i in $x foo; do ...`
3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([ysh-array]($help))

Shell has other word evaluation contexts like:

```sh-prompt
sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"
```

which aren't affected by [simple_word_eval]($help).

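One way to see why such contexts are left alone is that they already produce a single string in POSIX shell. A minimal sketch for the assignment case, with illustrative variable names:

```shell
y='a  b *'

# The right-hand side of an assignment is not split or globbed, even unquoted,
# so x receives the value of y unchanged.
x=$y

printf '%s\n' "$x"    # prints: a  b *
```
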
<!--
EvalWordSequence
-->

## Opt In to the Old Behavior With Explicit Expressions

YSH can express everything that shell can.

- Split with `@[split(mystr, IFS?)]`
- Glob with `@[glob(mypat)]`
- Elision with `@[maybe(s)]`

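For comparison, here is roughly how a plain POSIX shell script can make splitting and elision explicit at a single call site. This sketch doesn't use the YSH functions listed above, and `split_count`, `count_args`, and the variable `s` are hypothetical names.

```shell
# count_args: hypothetical helper that prints the number of arguments.
count_args() { echo "$#"; }

# split_count STR SEP: split STR on the characters in SEP and print the number
# of fields. The ( ) body runs in a subshell, so IFS and set -f don't leak.
split_count() (
  set -f        # disable globbing, so a field like * stays literal
  IFS=$2
  set -- $1     # deliberately unquoted: this is the explicit split
  echo "$#"
)

split_count 'a:b:*' ':'    # prints 3

# Elision: ${s:+"$s"} yields no word when s is empty, and one word otherwise.
s=''
count_args ${s:+"$s"}      # prints 0
s='a b'
count_args ${s:+"$s"}      # prints 1
```
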
## More Word Evaluation Issues

### More `shopt` Options

- [nullglob]($help) - Globs matching nothing evaluate to zero arguments, rather
  than to the pattern itself.
- [dashglob]($help) is true by default, but **disabled** when YSH is enabled, so that
  files that begin with `-` aren't returned by globs. This avoids [confusing flags and
  files](https://www.oilshell.org/blog/2020/02/dashglob.html).

Strict options cause fatal errors:

- [strict_tilde]($help) - Failed tilde expansions are errors, rather than
  evaluating to the literal `~user` text.
- [strict_word_eval]($help) - Invalid slices and invalid UTF-8 are errors,
  rather than being ignored.

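The behaviors that `nullglob` and `dashglob` address can be reproduced in a shell that has neither option. The sketch below assumes a POSIX shell with default settings and an empty scratch directory. All variable names are illustrative.

```shell
start_dir=$PWD
tmp=$(mktemp -d)         # empty scratch directory
cd "$tmp" || exit 1

# A glob that matches nothing is passed through as the literal pattern.
set -- *.py
no_match_count=$#        # 1
no_match=$1              # the string *.py

# Globs return files that start with a dash.
touch ./-rf
set -- *
dash_first=$1            # -rf, which a command like rm would read as flags

# A common workaround in plain shell is to glob with a ./ prefix.
set -- ./*
prefixed_first=$1        # ./-rf

cd "$start_dir" || exit 1
rm -r "$tmp"

printf '%s\n' "$no_match_count" "$no_match" "$dash_first" "$prefixed_first"
```
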
### Arithmetic Is Statically Parsed

This is an intentional incompatibility described in the [Known
Differences](known-differences.html#static-parsing) doc.

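For background, this is how arithmetic behaves in shells that don't parse it statically: substitutions inside `$(( ))` are expanded to text first, and the resulting text is then parsed. The sketch shows POSIX shell behavior only. How OSH treats the same input is covered in the Known Differences doc, not here.

```shell
x='1 + 2'

# $x is replaced by its text before the expression is parsed, so the shell
# evaluates 1 + 2 * 3 rather than (1 + 2) * 3.
result=$(( $x * 3 ))

echo "$result"    # prints 7
```
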
<!--
TODO: also allow

var parts = @[split(x)]
var python = @[glob('*.py')]
-->

## Summary

YSH word evaluation is enabled with `shopt -s simple_word_eval`, and proceeds
in a single step.

Variable, command, and arithmetic substitutions predictably evaluate to a
**single argument**, regardless of whether they're empty or have spaces.
There's no implicit splitting, globbing, or elision of empty words.

You can opt into those behaviors with explicit expressions like
`@[split(mystr)]`, which evaluates to an array.

YSH also supports shell features that evaluate to **0 to N arguments**:
splicing, globbing, and brace expansion.

There are other options that "clean up" word evaluation. All options are
designed to be gradually adopted by other shells, shell scripts, and eventually
POSIX.

## Notes

### Related Documents

- [The Simplest Explanation of
  Oil](http://www.oilshell.org/blog/2020/01/simplest-explanation.html). Some
  color on the rest of the language.
- [Known Differences Between OSH and Other Shells](known-differences.html).
  Mentioned above: Arithmetic is statically parsed. Arrays and strings are
  kept separate.
- [OSH Word Evaluation Algorithm][wiki-word-eval] on the Wiki. Informally
  describes the data structures, and describes legacy constructs.
- [Security implications of forgetting to quote a variable in bash/POSIX
  shells](https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells)
  by Stéphane Chazelas. Describes the "implicit split+glob" operator, which
  YSH word evaluation removes.
  - This is essentially the same [security
    issue](http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
    I rediscovered in January 2019. It appears in all [ksh]($xref)-derived shells, and some shells
    recently patched it. I wasn't able to exploit it in a "real" context;
    otherwise I'd have made more noise about it.
  - Also described by the Fedora Security team: [Defensive Coding: Shell Double Expansion](https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Shell-Double_Expansion.html)

[wiki-word-eval]: https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm

### Tip: View the Syntax Tree With `-n`

This gives insight into [how Oils parses shell][parsing-shell]:

```sh-prompt
$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>}
  {
    (braced_var_sub
      token: <Id.VSub_Name x>
      suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
    )
    (word_part.ArithSub
      anode:
        (arith_expr.Binary
          op_id: Id.Arith_Plus
          left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
          right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
        )
    )
  }
)
```

You can pass `--ast-format text` for more details.

Evaluation of the syntax tree is a single step.


<!--

### Elision Without @[maybe()]

The `@[maybe(s)]` function is a shortcut for something like:

```
var x = ''  # empty in this case
var tmp = :| |
if (x) {  # test if string is non-empty
  append $x (tmp)  # appends 'x' to the array variable 'tmp'
}
```

This is how it's used:

-->