---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

YSH Expression Language
===

This chapter in the [Oils Reference](index.html) describes the YSH expression
language, which includes [Egg Expressions]($xref:eggex).

<div id="toc">
</div>

## Literals

### bool-literal

YSH uses JavaScript-like spellings for these three "atoms":

    true   false   null

Note that the empty string is a good "special" value in some cases. The `null`
value can't be interpolated into words.

### int-literal

    var myint = 42
    var myfloat = 3.14
    var float2 = 1e100

### rune-literal

    #'a'   #'_'   \n   \\   \u{3bc}

### ysh-string

Double-quoted strings are identical to shell:

    var dq = "hello $world and $(hostname)"

Single-quoted strings may be raw:

    var s = r'line\n'      # raw string means \n is literal, NOT a newline

Or escaped *J8 strings*:

    var s = u'line\n \u{3bc}'        # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'   # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
is that `b''` strings allow the `\yff` byte escape.

---

There's no way to express a single quote in raw strings. Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'      # raw strings are useful for regexes (not
                                # eggexes)
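The raw-versus-escaped distinction described in this section has a close analogue in Python, which may help readers coming from that language. The sketch below is Python, not YSH, and the analogy is only approximate: Python spells the code point escape `\u03bc` rather than `\u{3bc}`, and Python's `b''` produces a separate `bytes` type, whereas the YSH forms above all evaluate to `Str`.

```python
# Python analogue of YSH r'' versus u'' strings (illustration only, not YSH).
raw = r'line\n'        # backslash and 'n' stay as two literal characters
escaped = 'line\n'     # \n is decoded to a single newline character

print(len(raw))        # 6
print(len(escaped))    # 5

mu = 'mu = \u03bc'     # Python spelling of the code point written \u{3bc} in YSH
print(mu.encode('utf-8'))  # b'mu = \xce\xbc': one code point, two UTF-8 bytes
```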

### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
        hello $world and $(hostname)
        no leading whitespace
        """

    var myRaw = r'''
        raw string
        no leading whitespace
        '''

    var strU = u'''
        string that happens to be unicode \u{3bc}
        no leading whitespace
        '''

    var strB = b'''
        string that happens to be bytes \u{3bc} \yff
        no leading whitespace
        '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
        raw string
        no leading whitespace
        '''

### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', 3]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same words that a command does:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate and store words
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

    {name: 'value'}

### range

A range is a sequence of numbers that can be iterated over:

    for i in (0 .. 3) {
      echo $i
    }
    => 0
    => 1
    => 2

As with slices, the last number isn't included. An idiom for iterating from 1
to `n`, inclusive:

    for i in (1 .. n+1) {
      echo $i
    }
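The half-open convention described above is the same one Python's `range()` uses, so the two loops can be mirrored in Python. This is an analogy rather than YSH code, and the value of `n` is an arbitrary choice for illustration.

```python
# Python analogue of the YSH ranges above: the upper bound is excluded.
n = 5  # arbitrary value chosen for this illustration

first = list(range(0, 3))          # like (0 .. 3)
one_to_n = list(range(1, n + 1))   # like (1 .. n+1)

print(first)     # [0, 1, 2]
print(one_to_n)  # [1, 2, 3, 4, 5]
```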

### block-literal

    var myblock = ^(echo $PWD)

### expr-lit

    var myexpr = ^[1 + 2*3]

## Operators

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on strings:

    var s = 'hello'
    var t = s ++ ' world'
    = t
    (Str)   "hello world"

and lists:

    var L = ['one', 'two']
    var M = L ++ ['three', '4']
    = M
    (List)   ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-compare

    a == b        # Python-like equality, no type conversion
    3 ~== 3.0     # true, type conversion
    3 ~== '3'     # true, type conversion
    3 ~== '3.0'   # true, type conversion

### ysh-logical

    not  and  or

Note that these are distinct from `! && ||`.

### ysh-arith

YSH supports most of the arithmetic operators from Python. Notably, `//` and
`%` differ from Python: [they round toward zero, not negative
infinity](https://www.oilshell.org/blog/2024/03/release-0.21.0.html#integers-dont-do-whatever-python-or-c-does).

Use `+ - *` for `Int` or `Float` addition, subtraction and multiplication. If
any of the operands are `Float`s, then the output will also be a `Float`.

Use `/` and `//` for `Float` division and `Int` division, respectively. `/`
will _always_ result in a `Float`, while `//` will _always_ result in an
`Int`.

    = 1 / 2   # => (Float) 0.5
    = 1 // 2  # => (Int) 0

Use `%` to compute the _remainder_ of integer division. The left operand must
be an `Int` and the right a _positive_ `Int`.

    = 1 % 2   # => (Int) 1
    = -4 % 2  # => (Int) 0
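To make the rounding rule concrete, here is a small Python sketch that models integer division rounding toward zero and contrasts it with Python's own flooring operators. It assumes only the rule stated above. The helper names `trunc_div` and `trunc_rem` are hypothetical and are not part of YSH or Python.

```python
# Model of integer division that rounds toward zero, contrasted with
# Python's built-in // and %, which round toward negative infinity.
# trunc_div and trunc_rem are hypothetical helper names for this sketch.

def trunc_div(a: int, b: int) -> int:
    """Integer quotient rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def trunc_rem(a: int, b: int) -> int:
    """Remainder paired with trunc_div: a == b * trunc_div(a, b) + trunc_rem(a, b)."""
    return a - b * trunc_div(a, b)


print(trunc_div(-7, 2), trunc_rem(-7, 2))  # -3 -1  (toward zero)
print(-7 // 2, -7 % 2)                     # -4 1   (Python floors)
print(trunc_div(7, 2), trunc_rem(7, 2))    # 3 1    (same as Python for positives)
```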

Use `**` for exponentiation. The left operand must be an `Int` and the right a
_positive_ `Int`.

All arithmetic operators may coerce either of their operands from a string to a
number, provided the string is formatted as a number.

    = 10 + '1'  # => (Int) 11

Operators like `+ - * /` will coerce strings to _either_ an `Int` or `Float`.
However, operators like `// ** %` and bit shifts will coerce strings _only_ to
an `Int`.

    = '1.14' + '2'  # => (Float) 3.14
    = '1.14' % '2'  # Type Error: Left operand is a Str

### ysh-bitwise

    ~  &  |  ^

### ysh-ternary

Like Python:

    var display = 'yes' if len(s) else 'empty'

### ysh-index

Like Python:

    myarray[3]
    mystr[3]

TODO: Does string indexing give you an integer back?

### ysh-slice

Like Python:

    myarray[1 : -1]
    mystr[1 : -1]

### func-call

Like Python:

    f(x, y)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

<!--
TODO
    var mydict = {name: 'foo'}
    call mydict->erase('name')
-->

### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, it looks for free functions. This means it
can be used for "chaining" transformations:

    var x = myFunc() => list() => join()

### match-ops

YSH has four pattern matching operators: `~ !~ ~~ !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ /space/ ) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]' ) {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.
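The following Python sketch illustrates why the two pattern languages are easy to confuse. It uses the standard `fnmatch` and `re` modules as stand-ins: Python's `re` is not POSIX ERE and `fnmatch` only approximates shell globbing, but for the simple patterns shown here the behavior is assumed to carry over. The sample strings are made up.

```python
import fnmatch
import re

# Glob: '*' means "any run of characters" and '.' is a literal dot.
# A glob must match the whole string.
print(fnmatch.fnmatchcase('x42.py', '*.py'))   # True
print(fnmatch.fnmatchcase('happy', '*.py'))    # False: no literal '.py' suffix

# Regex: '.' means "any single character", and a search may match anywhere.
print(bool(re.search(r'.py', 'happy')))        # True: 'ppy' matches
print(bool(re.search(r'\.py$', 'happy')))      # False once the dot is escaped and anchored
print(bool(re.search(r'\.py$', 'x42.py')))     # True
```

The same characters (`.` and `*`) appear in both languages with different meanings, so a pattern written for one operator should not be reused with the other without translation.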

- Related doc: [YSH Regex API](../ysh-regex-api.html)

## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

Examples:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semicolon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing. It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start    # translates to ^
    %end      # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'    # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives. Instead, you can use
splicing of strings:

    var dq = "hi $name"
    var eggex = / @dq /

### class-literal

An eggex character class literal specifies a set. It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u0100 ]

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:][:digit:]_]

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d   !s   !w
    !alnum   # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /      # zero or one
    / 'a'* /      # zero or more
    / 'a'+ /      # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions
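Python's `re` module spells counted repetition the same way, so the bounds above can be checked with a short Python sketch. This is an analogy in Python regex syntax rather than eggex, and it assumes that both bounds in `{2,4}` are inclusive, as they are in POSIX and Python.

```python
import re

# fullmatch() requires the whole string to match, which makes the bounds visible.
exactly_three = re.compile(r'a{3}')
two_to_four = re.compile(r'a{2,4}')

for n in range(1, 6):
    s = 'a' * n
    print(n, bool(exactly_three.fullmatch(s)), bool(two_to_four.fullmatch(s)))
# 1 False False
# 2 False True
# 3 True True
# 4 False True
# 5 False False
```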

### re-compound

Put expressions in sequence by separating them with spaces:

    / word digit digit /   # Matches 3 characters in sequence
                           # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /       # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

### re-capture

To retrieve a substring of a string that matches an eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>       # access with _group('month')
                                # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int

Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)
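For readers who know Python's `re` module, the three kinds of capture described in this section map onto familiar operations. The sketch below is Python, not YSH. Python has no per-capture conversion function, so the conversion step is written out by hand, and the input strings are made up.

```python
import re

# Positional capture: group(1) returns the matched text as a string.
m = re.search(r'hi (\d+)', 'hi 42')
print(m.group(1))            # 42 (a str)

# Named capture: look the group up by name instead of position.
m = re.search(r'(?P<month>\d+)', 'month 11')
print(m.group('month'))      # 11 (a str)

# Python has no per-capture conversion, so convert after matching.
month = int(m.group('month'))
print(month + 1)             # 12 (an int)
```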

### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var ip = / D ':' D /

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.
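The difference between subtree splicing and string concatenation matters most when a spliced piece contains alternation. The Python sketch below shows the failure mode of naive concatenation in a conventional regex library. It is not a description of how YSH implements splicing, and the patterns and variable names are made up for illustration.

```python
import re

alt = r'cat|dog'   # a small pattern containing alternation
suffix = r's'

# Naive string concatenation: the suffix binds only to the last alternative.
naive = alt + suffix                   # 'cat|dogs'
# Subtree-style composition: wrap the piece in a group before combining.
grouped = '(?:' + alt + ')' + suffix   # '(?:cat|dog)s'

print(bool(re.fullmatch(naive, 'cats')))    # False
print(bool(re.fullmatch(naive, 'cat')))     # True, probably not intended
print(bool(re.fullmatch(grouped, 'cats')))  # True
print(bool(re.fullmatch(grouped, 'dogs')))  # True
```

Because an eggex splice inserts a whole subtree, the grouping that has to be added by hand in the `grouped` pattern is preserved automatically.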

### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - 4 matching changes related to newlines

See `man regcomp`.

### re-multiline

Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing @Name doesn't require @
    var lexer = / Name | Num | Space /