Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.
(July 2024)
This is a long, "unified/orthogonal" design for:
There's also a relation to:
jq, which will be covered elsewhere.
It's a layered design. That means we need some underlying mechanisms:
eval and positional args $1 $2 $3
ctx builtin
It will link to:
Let's introduce this with a text file:
$ seq 4 | xargs -n 2 | tee test.txt
1 2
3 4 
xargs does splitting:
$ echo 'alice bob' | xargs -n 1 -- echo hi | tee test2.txt
hi alice
hi bob
Oils:
# should we use $_ for _word _line _row?  $[_.age] instead of $[_row.age]
$ echo 'alice bob' | each-word { echo "hi $_" } | tee test2.txt
hi alice
hi bob
Normally this should be balanced
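As a rough model of what each-word is meant to do, here is a minimal Python sketch. The function name `each_word` and the callback style are assumptions made for illustration; they are not part of YSH.

```python
def each_word(text, action):
    """Call action once per whitespace-separated word, like xargs -n 1."""
    return [action(word) for word in text.split()]


# Mirrors: echo 'alice bob' | each-word { echo "hi $_" }
for line in each_word("alice bob\n", lambda word: f"hi {word}"):
    print(line)
# prints:
# hi alice
# hi bob
```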
Now let's use awk:
$ cat test.txt | awk '{ print $2 " " $1 }'
2 1
4 3
In YSH:
$ cat test.txt | chop '$2 $1'
2 1
4 3
It's shorter! Here chop is shorthand for split-by (space=true, template='$2 $1').
With a template, for static parsing:
$ cat test.txt | chop (^"$2 $1")
2 1
4 3
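The template form can be modeled in a few lines of Python. This is only a sketch of the intended behavior, assuming fields are split on runs of whitespace and that `$N` refers to the Nth field; the helper name `chop` is borrowed from the doc, but its signature here is hypothetical.

```python
import re


def chop(line, template):
    """Split line on whitespace and substitute $1, $2, ... in template."""
    fields = line.split()

    def field(match):
        index = int(match.group(1))
        return fields[index - 1]  # $1 is the first field

    return re.sub(r"\$(\d+)", field, template)


for line in ["1 2", "3 4"]:
    print(chop(line, "$2 $1"))
# prints:
# 2 1
# 4 3
```

In this sketch a line with too few fields raises IndexError, which is one possible answer to the "not enough columns" question raised later.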
With a block:
$ cat test.txt | chop { mkdir -v -p $2/$1 }
mkdir: created directory '2/1'
mkdir: created directory '4/3'
With no argument, it prints a table:
$ cat test.txt | chop
#.tsv8 $1 $2
       1  2
       3  4
$ cat test.txt | chop (names = :|a b|)
#.tsv8 a  b
       1  2
       3  4
Longer examples with split-by:
$ cat test.txt | split-by (space=true, template='$2 $1')
$ cat test.txt | split-by (space=true, template=^"$2 $1")
$ cat test.txt | split-by (space=true) { mkdir -v -p $2/$1 }
$ cat test.txt | split-by (space=true)
$ cat test.txt | split-by (space=true, names= :|a b|)
$ cat test.txt | split-by (space=true, names= :|a b|) {
    mkdir -v -p $a/$b
  }
With must-match:
$ var p = /<capture d+> s+ <capture d+>/
$ cat test.txt | must-match (p, template='$2 $1')
$ cat test.txt | must-match (p, template=^"$2 $1")
$ cat test.txt | must-match (p) { mkdir -v -p $2/$1 }
$ cat test.txt | must-match (p)
With names:
$ var p = /<capture d+ as a> s+ <capture d+ as b>/
$ cat test.txt | must-match (p, template='$b $a')
$ cat test.txt | must-match (p)
#.tsv8 a b
       1 2
       3 4
$ cat test.txt | must-match (p) {
    mkdir -v -p $a/$b
  }
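For comparison, named captures behave much like named groups in Python's `re` module. The sketch below assumes that must-match fails on the first line that does not match, which is a guess based on the name; `must_match` is a hypothetical helper, not an existing API.

```python
import re

# Rough equivalent of /<capture d+ as a> s+ <capture d+ as b>/
PATTERN = re.compile(r"(?P<a>\d+)\s+(?P<b>\d+)")


def must_match(lines, pattern):
    """Yield a dict of named captures per line; fail on a non-matching line."""
    for line in lines:
        match = pattern.search(line)
        if match is None:
            raise ValueError(f"line does not match: {line!r}")
        yield match.groupdict()


rows = list(must_match(["1 2", "3 4"], PATTERN))
print(rows)
# prints: [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]
```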
Doing it in parallel:
$ cat test.txt | must-match --max-jobs 4 (p) {
    mkdir -v -p $a/$b
  }
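One way to picture --max-jobs is a bounded worker pool. The Python sketch below uses threads rather than the child processes a shell would spawn, so it illustrates only the bounded-concurrency idea; `make_dir` and `run_parallel` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_dir(root, row):
    """Create root/a/b for one row, like mkdir -p $a/$b."""
    path = os.path.join(root, row["a"], row["b"])
    os.makedirs(path, exist_ok=True)
    return path


def run_parallel(root, rows, max_jobs=4):
    """Process rows with at most max_jobs workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda row: make_dir(root, row), rows))


rows = [{"a": "1", "b": "2"}, {"a": "3", "b": "4"}]
with tempfile.TemporaryDirectory() as root:
    created = run_parallel(root, rows)
    print([os.path.relpath(path, root) for path in created])
```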
$ cat table.txt
size path
3    foo.txt
20   bar.jpg
$ R
> t=read.table('table.txt', header=T)
> t
  size    path
1    3 foo.txt
2   20 bar.jpg
We already saw this, because we "compressed" awk and xargs together.
What's not in the streams / awk example above:
BEGIN END - that can be separate
when [$1 ~ /d+/] { }

Shell, Awk, and Make Should be Combined (2016)
What is a Data Frame? (2018)
Sketches of YSH Features (June 2023) - can we express things in YSH?
Language Compositionality Test: J8 Lines
read --split
What is a Data Frame?
jq in jq thread
Old wiki pages:
We're doing all of these.
table with the ctx builtin
read --split feedback
find . -printf '%s %P\n' - size and path
[{bytes: 123, path: "foo"}, {}, ...]
jq
blocks value.Block - ^() and { }
expressions value.Expr - ^[] and 'compute [] where []'
eval (b, vars={}, positional=[])
Buffered for loop
for x in (io.stdin)
"magic awk loop"
with chop { for <README.md *.py> { echo _line_num _line _filename $1 $2 } }
positional args $1 $2 $3
ctx builtin
value.Place
TODO:
split() like Python, not like shell IFS algorithm
string formatting ${bytes %.2f}
${bytes %.2f M} Megabytes
${bytes %.2f Mi} Mebibytes
${timestamp +'%Y-%m-%d'} and strftime
this is for
floating point %e %f %g and printf and strftime
This means we consider all these conversions
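To make the unit suffixes concrete, here is a small Python sketch of one possible reading of `${bytes %.2f M}` and `${bytes %.2f Mi}`: divide by 10^6 or 2^20, then apply the float format. That interpretation and the helper name `format_bytes` are assumptions, not settled YSH behavior.

```python
def format_bytes(n, spec=".2f", unit=""):
    """Scale a byte count by a unit suffix, then apply a float format spec."""
    divisors = {"": 1, "M": 1000**2, "Mi": 1024**2}
    return format(n / divisors[unit], spec)


print(format_bytes(5_000_000, unit="M"))   # 5.00 (megabytes)
print(format_bytes(5_242_880, unit="Mi"))  # 5.00 (mebibytes)
print(format_bytes(5_000_000, unit="Mi"))  # 4.77
```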
The design might seem very general, but we did make some hard choices:
push vs. pull
buffered vs. unbuffered, everything
List vs iterators
THESE ARE ALL THE SAME ALGORITHM. They just have different names.
should we also have: if-split-by ? In case there aren't enough columns?
They all take:
value.Expr
for the block arg, this applies:
-j 4
--max-jobs 4
--max-jobs $(cached-nproc)
--max-jobs $[_nproc - 1]
So we have this:
echo begin
var d = {}
cat -- @files | split-by (ifs=IFS) {
  echo $2 $1
  call d->accum($1, $2)
}
echo end
But then how do we have conditionals?
Filter foo {  # does this define a proc?  Or a data structure
  split-by (ifs=IFS)  # is this possible?  We register the proc itself?
  config split-by (ifs=IFS)  # register it
  BEGIN {
    var d = {}
  }
  END {
    echo d.sum
  }
  when [$1 ~ /d+/] {
    setvar d.sum += $1
  }
}
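The BEGIN / when / END shape in the sketch above is the classic awk loop. A minimal Python model of it follows, assuming whitespace-split fields; it uses a full match on `$1` (stricter than the unanchored `~`) so that the integer conversion is always safe. `run_filter` is a hypothetical name.

```python
import re


def run_filter(lines):
    """Sum the first field of every line whose first field is all digits."""
    d = {"sum": 0}  # BEGIN block: initialize state

    for line in lines:
        fields = line.split()
        # when [$1 ~ /d+/] { setvar d.sum += $1 }
        if fields and re.fullmatch(r"\d+", fields[0]):
            d["sum"] += int(fields[0])

    return d["sum"]  # END block: report the result


print(run_filter(["1 2", "3 4", "x 9"]))
# prints: 4
```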
table to construct
Actions:
table cat
table align / table tabify
table header (cols)
table slice (1, -1)   or (-1, -2) etc.
Subcommands
cols
types
attr units
Partial Parsing / Lazy Parsing - TSV8 is designed for this
# we only decode the columns that are necessary
cat myfile.tsv8 | table --by-col (&out, cols = :|bytes path|)
sort-tsv8 or join-tsv8 with novel algorithms
This is sort of "expanding the scope" of the project, when we want to reduce scope.
But YSH has both tree-shaped JSON, and table-shaped TSV8, and jq is a nice bridge between them.
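Returning to the partial-parsing idea mentioned under table: the point is that only the requested columns need to be decoded. The Python sketch below shows this on plain tab-separated text; it does not implement TSV8 itself, and `select_columns` is a hypothetical helper.

```python
def select_columns(lines, decoders):
    """Decode only the named columns; other cells are never converted.

    decoders maps a column name to the function that decodes its cells.
    """
    header = lines[0].rstrip("\n").split("\t")
    wanted = [(header.index(name), name, fn) for name, fn in decoders.items()]
    for line in lines[1:]:
        cells = line.rstrip("\n").split("\t")
        yield {name: fn(cells[i]) for i, name, fn in wanted}


data = [
    "bytes\tpath\tmtime",
    "3\tfoo.txt\t2024-07-01",
    "20\tbar.jpg\tnot-a-date",  # never decoded, so it cannot cause an error
]
rows = list(select_columns(data, {"bytes": int, "path": str}))
print(rows)
# prints: [{'bytes': 3, 'path': 'foo.txt'}, {'bytes': 20, 'path': 'bar.jpg'}]
```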
Streams of Trees (jq)
empty
this
this[]
=>
select()
a & b  # more than one
Four types of Data Languages:
Four types of query languages:
Considering columns and then rows:
dplyr: