| 1 | 2022-04-19
|
| 2 |
|
| 3 |
|
| 4 | c. Proposal Summary/Scope of Work
|
| 5 | Provide a short summary of the work being proposed (maximum of 500 words)
|
| 6 |
|
| 7 | The Unix shell is a central user interface and glue language in all kinds of
|
| 8 | scientific computing, particularly in bioinformatics.
|
| 9 |
|
| 10 | Oil is a new open source Unix shell. It's meant to be our upgrade path from GNU
|
| 11 | bash, the most popular shell in the world. Home page: https://www.oilshell.org/
|
| 12 |
|
| 13 | It runs your existing scripts, and allows you to upgrade them to the new Oil
|
| 14 | language, which is designed to be familiar to Python and JavaScript users.
|
| 15 |
|
| 16 | In the last few years, we've released a correct but slow implementation dozens
|
| 17 | of times, and gotten regular feedback from users.
|
| 18 |
|
| 19 | Now we need a COMPILER ENGINEER to finish semi-automatically translating the
|
| 20 | code to fast C++. This plot summarizes the progress so far:
|
| 21 | https://www.oilshell.org/blog/2022/03/spec-test-history.png
|
| 22 |
|
| 23 | Blog post with this plot: https://www.oilshell.org/blog/2022/03/middle-out.html
|
| 24 |
|
| 25 | Roughly speaking, we'll have a competitive replacement for / upgrade to bash
|
| 26 | when the red line meets the blue line!
|
| 27 |
|
| 28 | So the work is already more than half done, and I would consider it low risk /
|
| 29 | high reward. Addressing the speed issue will allow us to aggressively add new
|
| 30 | features and polish the documentation.
|
| 31 |
|
| 32 | Our FAQ has over 178K views and has been featured in many places, including Hacker
|
| 33 | News: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html
|
| 34 |
|
| 35 | -----
|
| 36 |
|
| 37 | I've drafted the job requirements here:
|
| 38 | https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job
|
| 39 |
|
| 40 | I will use my professional network (having worked at Google and EA) to find the
|
| 41 | compiler engineer, who will be skilled in compilers, C++ and Python.
|
| 42 |
|
| 43 | Python creator Guido van Rossum knows about Oil:
|
| 44 |
|
| 45 | https://twitter.com/gvanrossum/status/995862193609551872
|
| 46 |
|
| 47 | "Amazing. A bash implementation in Python, by my ex-coworker (at Google) Andy
|
| 48 | Chu"
|
| 49 |
|
| 50 | A few years ago he introduced me to 2 compiler engineers working at Dropbox,
|
| 51 | who may be good candidates for the job. However, they are already well employed
|
| 52 | and would need to be compensated accordingly.
|
| 53 |
|
| 54 |
|
| 55 |
|
| 56 |
|
| 57 |
|
| 58 |
|
| 59 | d. Value to Biomedical Users
|
| 60 | Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words)
|
| 61 |
|
| 62 | If batch computation on Unix systems is a bottleneck in your lab's "scientific
|
| 63 | discovery loop", then a better Unix shell will make you more productive! You
|
| 64 | can run more experiments with less staff.
|
| 65 |
|
| 66 | Oil treats Unix shell like a real programming language, rather than a mystery
|
| 67 | handed off from one researcher to the next.
|
| 68 |
|
| 69 | Moreover, the software that underlies published experiments is heterogeneous: a
|
| 70 | mix of programs written in different languages, at different times, by
|
| 71 | different people.
|
| 72 |
|
| 73 | The Unix shell glues it all together and provides an interactive interface.
|
| 74 | It's also a powerful interface for using remote computers.
|
| 75 |
|
| 76 | But shell is showing its age and has been neglected by industry and academia.
|
| 77 | It has fundamental flaws like a lack of robust error handling, which lead to
|
| 78 | productivity loss, expensive training, and even erroneous scientific results.
|
| 79 |
|
| 80 | Oil fixes these problems, and adds much-needed features that will be familiar
|
| 81 | to Python, JavaScript, and R users.
|
| 82 |
|
| 83 | Four Features That Justify a New Unix Shell:
|
| 84 | http://www.oilshell.org/blog/2020/10/osh-features.html
|
| 85 |
|
| 86 | A Tour of the Oil Language:
|
| 87 | https://www.oilshell.org/release/latest/doc/oil-language-tour.html
|
| 88 |
|
| 89 | ----
|
| 90 |
|
| 91 | Similar sentiments from a third party at https://datacarpentry.org/2015-11-04-ACUNS/shell-intro/
|
| 92 |
|
| 93 | - For most bioinformatics tools, you have to use the shell. There is no
|
| 94 | graphical interface. If you want to work in metagenomics or genomics you're
|
| 95 | going to need to use the shell.
|
| 96 | - The shell gives you power ... When you need to do things tens to hundreds of
|
| 97 | times, knowing how to use the shell is transformative.
|
| 98 | - To use remote computers or cloud computing, you need to use the shell.
|
| 99 |
|
| 100 |
|
| 101 |
|
| 102 | f. Landscape Analysis
|
| 103 |
|
| 104 | Briefly describe the other software tools (either proprietary or open source)
|
| 105 | that the audience for this proposal primarily uses. How do the software
|
| 106 | project(s) in this proposal compare to these other tools in terms of user base
|
| 107 | size, usage, and maturity? How do existing tools and the project(s) in this
|
| 108 | proposal interact? (maximum of 250 words)
|
| 109 |
|
| 110 |
|
| 111 | I made a list of alternative shells:
|
| 112 | https://github.com/oilshell/oil/wiki/Alternative-Shells
|
| 113 |
|
| 114 | Of these, Oil is the ONLY shell that is compatible with bash. This effort took years,
|
| 115 | and the work is largely DONE, and documented extensively on the blog. It runs
|
| 116 | thousands of lines of unmodified bash scripts.
|
| 117 |
|
| 118 | Compatibility is important because users (including scientific users) don't
|
| 119 | have time to rewrite working shell scripts in a different language. It's
|
| 120 | expensive, just as it's expensive to rewrite C code in another language.
|
| 121 |
|
| 122 | But it's easy to run existing code under a new shell, and desirable if it
|
| 123 | provides better error handling, debugging, and new features.
|
| 124 |
|
| 125 | ------
|
| 126 |
|
| 127 | Scientific workflow languages like CWL, WDL, and Snakemake are increasingly
|
| 128 | popular [1]. However, they generally wrap Unix shell rather than replace it.
|
| 129 | So shell is complementary to these higher-level tools.
|
| 130 |
|
| 131 | There are also many such languages, and each one may be especially suited for a
|
| 132 | particular HPC problem domain.
|
| 133 |
|
| 134 | In contrast, Unix shell is ubiquitous in all scientific computing domains, in
|
| 135 | both academia and industry. For example, here are some organizations that are
|
| 136 | teaching shell (found through Google):
|
| 137 |
|
| 138 | https://curriculumfellows.hms.harvard.edu/classes/introduction-command-line-interface-shell-bash-unix-linux
|
| 139 |
|
| 140 | http://chemlabs.princeton.edu/researchcomputing/wp-content/uploads/sites/21/2018/09/hpc-getting-started-chem-workshop.pdf
|
| 141 |
|
| 142 | https://bioinformatics.uconn.edu/unix-basics/#
|
| 143 |
|
| 144 | https://www.melbournebioinformatics.org.au/tutorials/tutorials/unix/unix/
|
| 145 |
|
| 146 | http://williamslab.bscb.cornell.edu/?page_id=235
|
| 147 |
|
| 148 | Shell is also widely used in machine learning. It has the same flavor of
|
| 149 | gluing together disparate data sets and tools that you find in the natural
|
| 150 | sciences.
|
| 151 |
|
| 152 | -----
|
| 153 |
|
| 154 | [1] "A review of bioinformatic pipeline frameworks" https://academic.oup.com/bib/article/18/3/530/2562749
|
| 155 |
|