---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

YSH Expression Language
===

This chapter in the [Oils Reference](index.html) describes the YSH expression
language, which includes [Egg Expressions]($xref:eggex).

<div id="toc">
</div>

## Literals

### bool-literal

YSH uses JavaScript-like spellings for these three "atoms":

    true false null

Note that the empty string is a good "special" value in some cases. The `null`
value can't be interpolated into words.

### int-literal

    var myint = 42
    var myfloat = 3.14
    var float2 = 1e100

### rune-literal

    #'a' #'_' \n \\ \u{3bc}

### ysh-string

Double-quoted strings are identical to shell:

    var dq = "hello $world and $(hostname)"

Single-quoted strings may be raw:

    var s = r'line\n'  # raw string means \n is literal, NOT a newline

Or escaped *J8 strings*:

    var s = u'line\n \u{3bc}'       # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'  # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
is that `b''` strings allow the `\yff` byte escape.

---

There's no way to express a single quote in raw strings. Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'  # raw strings are useful for regexes (not
                            # eggexes)

### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
      hello $world and $(hostname)
      no leading whitespace
      """

    var myRaw = r'''
      raw string
      no leading whitespace
      '''

    var strU = u'''
      string that happens to be unicode \u{3bc}
      no leading whitespace
      '''

    var strB = b'''
      string that happens to be bytes \u{3bc} \yff
      no leading whitespace
      '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
      raw string
      no leading whitespace
      '''

### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', 3]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same syntax that a command can:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate and store words
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

    {name: 'value'}

### range

A range is a sequence of numbers that can be iterated over:

    for i in (0 .. 3) {
      echo $i
    }
    => 0
    => 1
    => 2

As with slices, the last number isn't included. Idiom to iterate from 1 to n:

    for i in (1 .. n+1) {
      echo $i
    }

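Since this chapter often compares YSH to Python, here is a Python analogue of the half-open ranges above, for comparison only. It assumes `(start .. stop)` yields `start` through `stop - 1`, as the outputs above show. The helper `ysh_range` and the value of `n` are hypothetical.

```python
# Python analogue of YSH's half-open ranges (comparison only, not YSH code).
# Assumption: (start .. stop) yields start, start+1, ..., stop-1.

def ysh_range(start: int, stop: int) -> list:
    """List the values a YSH range (start .. stop) iterates over."""
    return list(range(start, stop))


n = 5  # hypothetical bound for the "1 to n" idiom

print(ysh_range(0, 3))      # like: for i in (0 .. 3)
print(ysh_range(1, n + 1))  # like: for i in (1 .. n+1)
```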

### block-literal

    var myblock = ^(echo $PWD)

### expr-lit

    var myexpr = ^[1 + 2*3]

## Operators

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on strings:

    var s = 'hello'
    var t = s ++ ' world'
    = t
    (Str)   "hello world"

and lists:

    var L = ['one', 'two']
    var M = L ++ ['three', '4']
    = M
    (List)  ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-compare

    a == b        # Python-like equality, no type conversion
    3 ~== 3.0     # True, type conversion
    3 ~== '3'     # True, type conversion
    3 ~== '3.0'   # True, type conversion


### ysh-logical

    not and or

Note that these are distinct from `! && ||`.

### ysh-arith

YSH supports most of the arithmetic operators from Python. Notably, `//` and `%`
differ from Python as [they round toward zero, not negative
infinity](https://www.oilshell.org/blog/2024/03/release-0.21.0.html#integers-dont-do-whatever-python-or-c-does).

Use `+ - *` for `Int` or `Float` addition, subtraction and multiplication. If
either operand is a `Float`, then the result is also a `Float`.

Use `/` and `//` for `Float` division and `Int` division, respectively. `/`
_always_ results in a `Float`, whereas `//` _always_ results in an `Int`.

    = 1 / 2   # => (Float) 0.5
    = 1 // 2  # => (Int) 0

Use `%` to compute the _remainder_ of integer division. The left operand must
be an `Int` and the right a _positive_ `Int`.

    = 1 % 2   # => (Int) 1
    = -4 % 2  # => (Int) 0

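Here is a hypothetical Python model of YSH's integer `//` and `%` that shows what rounding toward zero means in practice. It follows the rules stated above: results are truncated toward zero, and `%` requires a positive right operand. It is only a comparison aid, not the Oils implementation, and the function names are made up.

```python
# Hypothetical Python model of YSH integer // and % (rounding toward zero).
# Python's own // and % round toward negative infinity instead.

def ysh_int_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def ysh_mod(a: int, b: int) -> int:
    """Remainder matching ysh_int_div; YSH requires a positive right operand."""
    if b <= 0:
        raise ValueError("right operand of % must be a positive Int")
    return a - b * ysh_int_div(a, b)


print(ysh_int_div(-7, 2), ysh_mod(-7, 2))  # -3 -1
print(-7 // 2, -7 % 2)                     # -4 1 (Python floors)
```

With non-negative operands the two conventions agree. They only diverge when one operand is negative.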

Use `**` for exponentiation. The left operand must be an `Int` and the right a
_positive_ `Int`.

All arithmetic operators may coerce either of their operands from strings to a
number, provided those strings are formatted as numbers.

    = 10 + '1'  # => (Int) 11

Operators like `+ - * /` will coerce strings to _either_ an `Int` or `Float`.
However, operators like `// ** %` and bit shifts will coerce strings _only_ to
an `Int`.

    = '1.14' + '2'  # => (Float) 3.14
    = '1.14' % '2'  # Type Error: Left operand is a Str

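The coercion rules can be sketched as a small Python model. The model is hypothetical and rests on two assumptions. First, a string that parses as an integer becomes an `Int`. Second, other numeric strings become a `Float`, but only for operators that accept `Float`s. The sketch covers `+` only; `// ** %` and the shifts would call `coerce()` with `int_only=True`.

```python
# Hypothetical Python model of YSH's string-to-number coercion rules.
# Assumption: a numeric string becomes an Int if it parses as one; otherwise
# it becomes a Float, but only for operators that accept Floats (+ - * /).

def coerce(x, int_only: bool = False):
    """Convert a string operand to a number; pass numbers through unchanged."""
    if not isinstance(x, str):
        return x
    try:
        return int(x)
    except ValueError:
        if int_only:  # // ** % and shifts: Int-formatted strings only
            raise TypeError(f"can't coerce {x!r} to an Int")
        return float(x)  # raises ValueError if x isn't numeric at all


def add(a, b):
    """Model of `+`, which may produce an Int or a Float."""
    return coerce(a) + coerce(b)


print(add(10, "1"))      # 11
print(add("1.14", "2"))  # a float close to 3.14
```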

### ysh-bitwise

    ~ & | ^

### ysh-ternary

Like Python:

    var display = 'yes' if len(s) else 'empty'

### ysh-index

Like Python:

    myarray[3]
    mystr[3]

TODO: Does string indexing give you an integer back?

### ysh-slice

Like Python:

    myarray[1 : -1]
    mystr[1 : -1]

### func-call

Like Python:

    f(x, y)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

<!--
TODO
var mydict = {name: 'foo'}
call mydict->erase('name')
-->

### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, it looks for free functions. This means it
can be used for "chaining" transformations:

    var x = myFunc() => list() => join()

### match-ops

YSH has four pattern matching operators: `~ !~ ~~ !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ / space /) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]') {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.

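The regex/glob distinction has a rough Python analogue. `re.search()` plays the role of `~` with a regex: it finds a match anywhere in the string, which is what the `x42.py` examples rely on. `fnmatch.fnmatchcase()` plays the role of `~~` with a glob: the pattern must cover the whole string. This is only an analogy. Python's `re` doesn't support POSIX bracket classes like `[[:digit:]]`, so the sketch substitutes `[0-9]`.

```python
import re
from fnmatch import fnmatchcase

filename = "x42.py"

# Like `filename ~ '[[:digit:]]+'`: a regex match may occur anywhere.
# (Python's re has no POSIX [[:digit:]] classes, so [0-9] stands in.)
has_number = re.search(r"[0-9]+", filename) is not None

# Like `filename ~~ '*.py'`: a glob must match the entire string.
is_python = fnmatchcase(filename, "*.py")

# The same text can mean different things: '.' is literal in a glob
# but matches any character in a regex.
glob_hit = fnmatchcase("xzpy", "x.py")
regex_hit = re.fullmatch(r"x.py", "xzpy") is not None

print(has_number, is_python, glob_hit, regex_hit)  # True True False True
```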

- Related doc: [YSH Regex API](../ysh-regex-api.html)

## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

Examples:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semicolon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing. It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start  # translates to ^
    %end    # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'  # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives. Instead, you can splice
strings:

    var dq = "hi $name"
    var eggex = / @dq /

### class-literal

An eggex character class literal specifies a set. It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u0100 ]

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:][:digit:]_]

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d       !s       !w
    !alnum   # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /  # zero or one
    / 'a'* /  # zero or more
    / 'a'+ /  # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions

### re-compound

Sequence expressions are written with spaces between the parts:

    / word digit digit /  # Matches 3 characters in sequence
                          # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /  # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

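For comparison, here are the examples above hand-translated to Python's `re` module. The translation makes two assumptions. First, `word` and `digit` mean `[A-Za-z0-9_]` and `[0-9]`, as in the C locale. Second, eggex parentheses only group, since captures are written with `<capture ...>`. `re.fullmatch()` is used so that each pattern must cover the whole test string.

```python
import re

# Hand-translated Python equivalents of the eggex examples above.
# Assumption: in the C locale, `word` is [A-Za-z0-9_] and `digit` is [0-9].
WORD, DIGIT = "[A-Za-z0-9_]", "[0-9]"

sequence = re.compile(WORD + DIGIT + DIGIT)   # / word digit digit /
# / [ word digit ] /: one set, one char (digit adds nothing; word has 0-9)
one_char = re.compile("[A-Za-z0-9_]")
alternation = re.compile(WORD + "|" + DIGIT)  # / word | digit /
# / (word digit) | \\ /: assumed grouping only, so a non-capturing group
grouped = re.compile("(?:" + WORD + DIGIT + r")|\\")

for s in ["a42", "b51", "a", "9", "a9", "\\"]:
    print(repr(s),
          bool(sequence.fullmatch(s)),
          bool(one_char.fullmatch(s)),
          bool(alternation.fullmatch(s)),
          bool(grouped.fullmatch(s)))
```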

### re-capture

To retrieve a substring of a string that matches an eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>  # access with _group('month')
                           # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int

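Readers who know Python's `re` module may find this analogy useful. A positional capture resembles a numbered group, a named capture resembles `(?P<name>...)`, and a conversion func resembles applying `int()` to the captured text. This is Python only, not the YSH `_group()` API.

```python
import re

# Python analogue of eggex captures (illustration only, not the YSH API).
positional = re.compile(r"hi ([0-9]+)")      # / 'hi ' <capture d+> /
named = re.compile(r"hi (?P<month>[0-9]+)")  # / 'hi ' <capture d+ as month> /

m = positional.search("say hi 42")
digits = m.group(1)              # like _group(1): the string "42"

m2 = named.search("say hi 12")
month = int(m2.group("month"))   # like <capture d+ as month: int>: the int 12

print(repr(digits), month)       # '42' 12
```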

Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)

### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var time2 = / D ':' D /  # same as time

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.

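The difference between splicing and concatenation shows up with alternation. The Python sketch below is an illustration, not the eggex implementation. Pasting the pattern text `a|b` after `x` yields `(xa)|b`, while wrapping the fragment in a group keeps it as one subtree. The sketch also assumes, as the re-primitive section suggests, that a spliced string is matched literally, so it escapes the string with `re.escape()`.

```python
import re

# Why splicing subtrees differs from pasting pattern text (Python illustration).
alt = "a|b"  # hypothetical fragment, like the eggex / 'a' | 'b' /

pasted = re.compile("x" + alt)            # text paste parses as (xa)|(b)
spliced = re.compile("x(?:" + alt + ")")  # subtree splice: 'x' then (a or b)

# A spliced string is assumed to act as a literal, so escape it first.
greeting = "hi."
literal = re.compile(re.escape(greeting) + " world")

print(bool(pasted.fullmatch("b")), bool(spliced.fullmatch("b")))  # True False
print(bool(literal.fullmatch("hi. world")), bool(literal.fullmatch("hiX world")))  # True False
```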

### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - changes 4 matching behaviors related to newlines

See `man regcomp`.

### re-multiline

Multi-line eggexes aren't yet implemented. Splicing makes them less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing Name doesn't require @
    var lexer = / Name | Num | Space /
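
The same lexer idea can be expressed in Python, for comparison: join named sub-patterns with `|`, then read `Match.lastgroup` to see which alternative matched. This is an analogy, not YSH code. Python's `re` tries alternatives left to right, while POSIX ERE prefers the longest match, so the two can disagree when alternatives overlap.

```python
import re

# Python analogue of the spliced lexer above (illustration, not YSH).
NAME = r"(?P<name>[a-z]+)"
NUM = r"(?P<num>[0-9]+)"
SPACE = r"(?P<space>\s+)"

# Like / Name | Num | Space /: join the sub-patterns as alternatives.
LEXER = re.compile("|".join([NAME, NUM, SPACE]))


def tokenize(text: str) -> list:
    """Return (group name, text) pairs; unmatched characters are skipped."""
    return [(m.lastgroup, m.group()) for m in LEXER.finditer(text)]


print(tokenize("abc 42"))  # [('name', 'abc'), ('space', ' '), ('num', '42')]
```

Note that `finditer()` silently skips characters that no alternative matches. A real lexer would usually report them as errors.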