---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

YSH Expression Language
===

This chapter in the [Oils Reference](index.html) describes the YSH expression
language, which includes [Egg Expressions]($xref:eggex).

<div id="toc">
</div>

## Literals

### bool-literal

YSH uses JavaScript-like spellings for these three "atoms":

    true   false   null

Note that the empty string is a good "special" value in some cases. The `null`
value can't be interpolated into words.

### int-literal

    var myint = 42
    var myfloat = 3.14
    var float2 = 1e100

### rune-literal

    #'a'   #'_'   \n   \\   \u{3bc}

### ysh-string

Double quoted strings are identical to shell:

    var dq = "hello $world and $(hostname)"

Single quoted strings may be raw:

    var s = r'line\n'      # raw string means \n is literal, NOT a newline

Or escaped *J8 strings*:

    var s = u'line\n \u{3bc}'        # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'   # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
is that `b''` strings allow the `\yff` byte escape.

---

There's no way to express a single quote in raw strings. Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'      # raw strings are useful for regexes (not
                                # eggexes)

### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
        hello $world and $(hostname)
        no leading whitespace
        """

    var myRaw = r'''
        raw string
        no leading whitespace
        '''

    var strU = u'''
        string that happens to be unicode \u{3bc}
        no leading whitespace
        '''

    var strB = b'''
        string that happens to be bytes \u{3bc} \yff
        no leading whitespace
        '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
        raw string
        no leading whitespace
        '''

### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', 3]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same words that a command does:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate and store words
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

    {name: 'value'}

### range

A range is a sequence of numbers that can be iterated over:

    for i in (0 .. 3) {
      echo $i
    }
    => 0
    => 1
    => 2

As with slices, the last number isn't included. Idiom to iterate from 1 to n:

    for i in (1 .. n+1) {
      echo $i
    }

### block-literal

    var myblock = ^(echo $PWD)

### expr-lit

    var myexpr = ^[1 + 2*3]

## Operators

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on strings:

    var s = 'hello'
    var t = s ++ ' world'
    = t
    (Str)   "hello world"

and lists:

    var L = ['one', 'two']
    var M = L ++ ['three', '4']
    = M
    (List)   ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-compare

    a == b        # Python-like equality, no type conversion
    3 ~== 3.0     # True, type conversion
    3 ~== '3'     # True, type conversion
    3 ~== '3.0'   # True, type conversion

### ysh-logical

    not  and  or

Note that these are distinct from the shell operators `! && ||`, which act on
the exit status of commands.
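
As a rough sketch of that distinction, the snippet below uses `and` and `not` inside a parenthesized YSH expression, and `&&` between two commands. The variable names `n` and `finished` are made up for illustration.

```ysh
var n = 3
var finished = false

# Expression operators work on typed values inside ( )
if (n > 0 and not finished) {
  echo 'keep going'
}

# The shell operator && works on the exit status of commands
test -d / && echo 'root exists'
```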

### ysh-arith

    +  -  *  /  //  %  **
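
A short sketch of how the two division operators differ, assuming the usual YSH behavior where `/` always produces a float and `//` is integer division:

```ysh
echo $[7 / 2]     # true division gives a Float: 3.5
echo $[7 // 2]    # integer division: 3
echo $[7 % 2]     # remainder: 1
echo $[2 ** 10]   # exponentiation: 1024
```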

### ysh-bitwise

    ~  &  |  ^

### ysh-ternary

Like Python:

    var display = 'yes' if len(s) else 'empty'

### ysh-index

Like Python:

    myarray[3]
    mystr[3]

TODO: Does string indexing give you an integer back?

### ysh-slice

Like Python:

    myarray[1 : -1]
    mystr[1 : -1]
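
For a concrete picture, here is a minimal sketch assuming the Python-like semantics described above: the start index is included, the end index is excluded, and a negative index counts from the end. The values are chosen only for illustration.

```ysh
var mylist = ['a', 'b', 'c', 'd']
var middle = mylist[1 : -1]     # drops the first and last items
write -- @middle                # prints b and c, one per line

var mystr = 'hello'
echo $[mystr[1 : -1]]           # ell
```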

### func-call

Like Python:

    f(x, y)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

<!--
TODO
    var mydict = {name: 'foo'}
    call mydict->erase('name')
-->

### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, it looks for free functions. This means it
can be used for "chaining" transformations:

    var x = myFunc() => list() => join()
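
A runnable variant of the chaining idea might look like the sketch below. It assumes the `trim()` and `upper()` methods on `Str`; the variable names are illustrative.

```ysh
var s = '  hello  '

# Each => passes the value on its left to the method on its right
var t = s => trim() => upper()
echo $t    # HELLO
```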

### match-ops

YSH has four pattern matching operators: `~   !~   ~~   !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ /space/ ) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]' ) {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.

- Related doc: [YSH Regex API](../ysh-regex-api.html)

## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

Examples:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semi-colon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing. It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start    # translates to ^
    %end      # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'    # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives. Instead, you can use
splicing of strings:

    var dq = "hi $name"
    var eggex = / @dq /

### class-literal

An eggex character class literal specifies a set. It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u0100 ]

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:]][[:digit:]]_

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d   !s   !w
    !alnum   # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /      # zero or one
    / 'a'* /      # zero or more
    / 'a'+ /      # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions

### re-compound

Sequence expressions with a space:

    / word digit digit /   # Matches 3 characters in sequence
                           # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /       # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

### re-capture

To retrieve a substring of a string that matches an Eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>       # access with _group('month')
                                # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int
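
Putting the pieces together, the sketch below matches a string with the `~` operator and then reads named, converted captures with `_group()`. The input string and the names `line`, `year`, and `month` are made up for illustration.

```ysh
var line = 'release 2024-11'
var pat = / <capture d+ as year: int> '-' <capture d+ as month: int> /

if (line ~ pat) {
  var year = _group('year')     # Int, because of the ": int" conversion
  var month = _group('month')
  echo "year=$year month=$month"   # year=2024 month=11
}
```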

Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)

### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var ip = / D ':' D /

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.
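
One practical consequence, sketched below under the assumption that a spliced string is treated like a quoted literal: regex metacharacters in the string match themselves. The names `ext` and `pat` are illustrative.

```ysh
var ext = '.py'
var pat = / @ext %end /    # the spliced dot is a literal dot, not "any char"

if ('main.py' ~ pat) {
  echo 'matches'           # printed: the string ends with .py
}
if ('mainxpy' !~ pat) {
  echo 'no match'          # printed: x is not a literal dot
}
```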

### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - 4 matching changes related to newlines

See `man regcomp`.

### re-multiline

Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing @Name doesn't require @
    var lexer = / Name | Num | Space /