---
in_progress: yes
body_css_class: width40 help-body
default_highlighter: oils-sh
preserve_anchor_case: yes
---

YSH Expression Language
===

This chapter in the [Oils Reference](index.html) describes the YSH expression
language, which includes [Egg Expressions]($xref:eggex).

<div id="toc">
</div>

## Literals

### bool-literal

YSH uses JavaScript-like spellings for these three "atoms":

    true false null

Note that the empty string is a good "special" value in some cases, because the
`null` value can't be interpolated into words.

### int-literal

    var myint = 42
    var myfloat = 3.14
    var float2 = 1e100

### rune-literal

    #'a'   #'_'   \n   \\   \u{3bc}

### ysh-string

Double-quoted strings are identical to shell:

    var dq = "hello $world and $(hostname)"

Single-quoted strings may be raw:

    var s = r'line\n'  # raw string means \n is literal, NOT a newline

Or escaped *J8 strings*:

    var s = u'line\n \u{3bc}'       # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'  # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
is that `b''` strings allow the `\yff` byte escape.

---

There's no way to express a single quote in raw strings. Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'  # raw strings are useful for regexes (not
                            # eggexes)

### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
        hello $world and $(hostname)
        no leading whitespace
        """

    var myRaw = r'''
        raw string
        no leading whitespace
        '''

    var strU = u'''
        string that happens to be unicode \u{3bc}
        no leading whitespace
        '''

    var strB = b'''
        string that happens to be bytes \u{3bc} \yff
        no leading whitespace
        '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
        raw string
        no leading whitespace
        '''

### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', 3]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same words that a command does:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate and store words
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

    {name: 'value'}

### range

A range is a sequence of numbers that can be iterated over:

    for i in (0 .. 3) {
      echo $i
    }
    => 0
    => 1
    => 2

As with slices, the last number isn't included. Idiom to iterate from 1 to n:

    for i in (1 .. n+1) {
      echo $i
    }

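For comparison, Python's built-in `range()` uses the same half-open convention. This is only an analogy, not YSH code, and `n` is an arbitrary example value:

```python
# Python analog of the YSH ranges above: the end bound is excluded.
zero_to_two = list(range(0, 3))    # like (0 .. 3)
assert zero_to_two == [0, 1, 2]

n = 5  # hypothetical upper bound for the 1-to-n idiom
one_to_n = list(range(1, n + 1))   # like (1 .. n+1)
assert one_to_n == [1, 2, 3, 4, 5]
```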
### block-literal

    var myblock = ^(echo $PWD)

### expr-lit

    var myexpr = ^[1 + 2*3]

## Operators

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on strings:

    var s = 'hello'
    var t = s ++ ' world'
    = t
    (Str) "hello world"

and lists:

    var L = ['one', 'two']
    var M = L ++ ['three', '4']
    = M
    (List) ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-compare

    a == b       # Python-like equality, no type conversion
    3 ~== 3.0    # true, type conversion
    3 ~== '3'    # true, type conversion
    3 ~== '3.0'  # true, type conversion

### ysh-logical

    not  and  or

Note that these are distinct from the shell operators `! && ||`.

### ysh-arith

    +  -  *  /  //  %  **

### ysh-bitwise

    ~  &  |  ^

### ysh-ternary

Like Python:

    var display = 'yes' if len(s) else 'empty'

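Here is the same conditional expression in Python itself, wrapped in a hypothetical `describe()` helper so it can be tried on a few sample strings:

```python
def describe(s: str) -> str:
    # Same shape as the YSH ternary: <then-value> if <condition> else <else-value>.
    # len(s) is truthy for any non-empty string.
    return 'yes' if len(s) else 'empty'

assert describe('abc') == 'yes'
assert describe('') == 'empty'
```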
### ysh-index

Like Python:

    myarray[3]
    mystr[3]

TODO: Does string indexing give you an integer back?

### ysh-slice

Like Python:

    myarray[1 : -1]
    mystr[1 : -1]

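Because indexing and slicing are described as "like Python", Python's own behavior is a useful baseline. This sketch uses arbitrary sample values and shows Python only. YSH may differ in details, such as what string indexing returns:

```python
myarray = ['a', 'b', 'c', 'd', 'e']
mystr = 'hello'

# Indexing is zero-based.
assert myarray[3] == 'd'
assert mystr[3] == 'l'   # Python returns a 1-character str here

# Slices are half-open; a negative bound counts from the end.
assert myarray[1:-1] == ['b', 'c', 'd']
assert mystr[1:-1] == 'ell'
```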
### func-call

Like Python:

    f(x, y)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

<!--
TODO
var mydict = {name: 'foo'}
call mydict->erase('name')
-->

### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, `=>` looks for free functions. This means it
can be used to "chain" transformations:

    var x = myFunc() => list() => join()

### match-ops

YSH has four pattern matching operators: `~ !~ ~~ !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ /space/ ) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]' ) {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.

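To see why the distinction matters, here is a Python sketch that uses the standard `re` and `fnmatch` modules as stand-ins for regex and glob matching. These are Python's own pattern engines, not what YSH uses internally, and the sample strings are arbitrary:

```python
import fnmatch
import re

filename = 'x42.py'

# Regex search is unanchored, as in the `~` examples: any digit run matches.
assert re.search(r'[0-9]+', filename)

# A glob must match the whole string, as in `~~ '*.py'`.
assert fnmatch.fnmatchcase(filename, '*.py')
assert not fnmatch.fnmatchcase('happy', '*.py')

# Read as a regex, '.py' means "any character, then 'py'",
# so an unanchored search finds 'ppy' inside 'happy'.
assert re.search(r'.py', 'happy')

# '*.py' isn't even a valid regex: the leading '*' has nothing to repeat.
try:
    re.compile('*.py')
    glob_as_regex_ok = True
except re.error:
    glob_as_regex_ok = False
assert not glob_as_regex_ok
```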
- Related doc: [YSH Regex API](../ysh-regex-api.html)

## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

Examples:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semicolon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing. It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start  # translates to ^
    %end    # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'  # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives. Instead, you can splice a
string:

    var dq = "hi $name"
    var eggex = / @dq /

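As a rough Python analogy (not how eggex is implemented), `re.escape()` plays the role of the literal-string translation, and `^` and `$` are the anchors that `%start` and `%end` become. Python's escaping rules aren't identical to ERE's, so this only illustrates the idea:

```python
import re

literal = 'oh *really*'
# Like / %start 'oh *really*' %end /
pattern = '^' + re.escape(literal) + '$'

# The escaped '*' characters match literally, not as repetition.
assert re.search(pattern, 'oh *really*')
assert not re.search(pattern, 'oh really')
# The ^ anchor rejects text before the literal.
assert not re.search(pattern, 'well, oh *really*')
```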
### class-literal

An eggex character class literal specifies a set. It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u0100 ]

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

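In POSIX or Python regex syntax, the same set-versus-sequence distinction is spelled with brackets. This Python sketch uses arbitrary sample strings to show that `[ 'xyz' ]` corresponds to a one-character class, while the bare string is a three-character sequence:

```python
import re

char_set = '[xyz]'     # like [ 'xyz' ]: one character from the set
sequence = 'xyz'       # like / 'xyz' /: three characters in order
negated = '[^a-fA-F]'  # like ![ a-f A-F ]

assert re.fullmatch(char_set, 'y')
assert not re.fullmatch(char_set, 'xyz')
assert re.fullmatch(sequence, 'xyz')
assert re.fullmatch(negated, 'z')
assert not re.fullmatch(negated, 'B')
```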
### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:][:digit:]_]

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d       !s       !w
    !alnum   # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /  # zero or one
    / 'a'* /  # zero or more
    / 'a'+ /  # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions

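These quantifiers have their usual POSIX meanings, which Python's `re` module shares. Here is a quick Python check of the bounds. It is illustrative only and doesn't show the eggex translation itself:

```python
import re

# ? * + behave the same way in Python's re module.
assert re.fullmatch('a?', '') and re.fullmatch('a?', 'a')
assert not re.fullmatch('a?', 'aa')
assert re.fullmatch('a*', '')
assert not re.fullmatch('a+', '')

# Counted repetition: {3} is exact, {2,4} is an inclusive range.
assert re.fullmatch('a{3}', 'aaa')
assert not re.fullmatch('a{3}', 'aa')
assert all(re.fullmatch('a{2,4}', 'a' * n) for n in (2, 3, 4))
assert not re.fullmatch('a{2,4}', 'a' * 5)
```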
### re-compound

Sequence expressions with a space:

    / word digit digit /  # Matches 3 characters in sequence
                          # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /  # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

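Here are the same shapes written as Python regexes. The sketch uses explicit ASCII classes in place of `word` and `digit`. That substitution is an assumption made for the example, because Python's `\w` also matches non-ASCII letters:

```python
import re

WORD = '[A-Za-z0-9_]'  # ASCII stand-in for eggex `word`
DIGIT = '[0-9]'        # stand-in for eggex `digit`

sequence = WORD + DIGIT + DIGIT         # / word digit digit /
alternation = f'{WORD}|{DIGIT}'         # / word | digit /
grouped = f'(?:{WORD}{DIGIT})|\\\\'     # / (word digit) | \\ /

assert re.fullmatch(sequence, 'a42') and re.fullmatch(sequence, 'b51')
assert not re.fullmatch(sequence, 'a4')
assert re.fullmatch(alternation, 'a') and re.fullmatch(alternation, '9')
assert re.fullmatch(grouped, 'a9')
assert re.fullmatch(grouped, '\\')     # a single literal backslash
```

Eggex parentheses only group, which is why the sketch uses Python's non-capturing `(?:...)`. Captures are written with `<capture ...>`.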
### re-capture

To retrieve a substring of a string that matches an eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>       # access with _group('month')
                                # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int

Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)

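Python's `re` module has the same three ideas, which may help readers who already know it. In this sketch, the conversion is applied by hand with `int()`, whereas eggex attaches it to the capture. The input strings are made up:

```python
import re

# Positional capture, like / 'hi ' <capture d+> /
m = re.search(r'hi ([0-9]+)', 'say hi 42 times')
assert m is not None and m.group(1) == '42'   # captured text is a str

# Named capture, like <capture d+ as month>
m2 = re.search(r'month=(?P<month>[0-9]+)', 'date: month=12')
assert m2 is not None and m2.group('month') == '12'

# Conversion, like <capture d+ as month: int>; Python needs an explicit int()
month = int(m2.group('month'))
assert month == 12
```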
### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var ip = / D ':' D /

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.

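Why does the subtree distinction matter? With plain regex strings, appending `c` to `a|b` changes what the alternation covers. This Python sketch uses a hypothetical `splice()` helper that wraps a piece in a non-capturing group. That approximates treating the piece as a single subtree, but it is not how eggex is implemented:

```python
import re

def splice(sub: str) -> str:
    """Hypothetical helper: wrap a sub-pattern so it behaves as one unit."""
    return f'(?:{sub})'

alt = 'a|b'
naive = alt + 'c'            # 'a|bc': the alternation swallows the suffix
spliced = splice(alt) + 'c'  # '(?:a|b)c'

assert re.fullmatch(naive, 'a')        # bare 'a' matches, probably unintended
assert not re.fullmatch(spliced, 'a')
assert re.fullmatch(spliced, 'ac')
assert re.fullmatch(spliced, 'bc')
```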
### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - 4 matching changes related to newlines: `.` and non-matching
  bracket lists like `[^a]` no longer match a newline, and `^` and `$` also
  match just after and just before a newline

See `man regcomp`.

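Python's `re` flags divide these behaviors differently, so they serve as a point of comparison rather than an exact equivalent. `re.IGNORECASE` corresponds to `reg_icase`. `re.MULTILINE` covers only the `^`/`$` part of `reg_newline`. Python's `.` already excludes newlines unless `re.DOTALL` is set:

```python
import re

text = 'first\nSECOND'

# Like reg_icase: case-insensitive matching.
assert re.search('second', text, re.IGNORECASE)
assert not re.search('second', text)

# Like the ^/$ part of reg_newline: anchors match at line boundaries.
assert re.search('^SECOND$', text, re.MULTILINE)
assert not re.search('^SECOND$', text)

# By default Python's '.' does not match '\n'; re.DOTALL changes that.
assert not re.search('first.SECOND', text)
assert re.search('first.SECOND', text, re.DOTALL)
```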
### re-multiline

Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing @Name doesn't require @
    var lexer = / Name | Num | Space /

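For comparison, Python's `re` module can express the same lexer by joining named groups with `|`. This sketch uses a hypothetical `tokenize()` helper and relies on `Match.lastgroup` to report which alternative matched:

```python
import re

# Named sub-patterns, mirroring Name, Num, and Space above.
NAME = r'(?P<name>[a-z]+)'
NUM = r'(?P<num>[0-9]+)'
SPACE = r'(?P<space>\s+)'
LEXER = re.compile('|'.join([NAME, NUM, SPACE]))

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs; finditer skips characters no alternative matches."""
    return [(m.lastgroup, m.group()) for m in LEXER.finditer(text)]

tokens = tokenize('abc 42')
```

Here `tokens` is `[('name', 'abc'), ('space', ' '), ('num', '42')]`. Unlike a strict lexer, `finditer()` silently skips input that no alternative matches, such as uppercase letters.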