Acceleration and Scaleout of Software Systems
Shell scripting is used pervasively, partly due to its simplicity in combining components (commands) written in multiple languages. Unfortunately, this language-agnostic composition hinders automated parallelization and distribution, often forcing developers to manually rewrite shell programs (and their components) in other languages that support these features. We have built several systems that, combined, offer automated parallelization of Unix/Linux shell scripts—along with serious correctness and compatibility guarantees.
Papers: Our HotOS15 paper identifies the composition problem with today’s distributed computing software (there is no equivalent of elegant, simple composition in modern distributed environments) and offers a vision for the future. Our EuroSys21 paper describes our PaSh system for parallelizing shell pipelines, and the corresponding ICFP21 paper formalizes the model at the core of PaSh and proves its parallelizing transformations correct. Our HotOS21 paper outlines a vision for the future of the shell, and our HotOS21 panel discusses future avenues for cross-discipline shell-related research.
Our recent OSDI22 paper tackles POSIX-compliant parallelization in the presence of fully dynamic behavior pervasive in the shell—via just-in-time compilation, intermixing evaluation and optimization of individual expressions. Our NSDI23 paper takes this to the distributed level, by offering automated POSIX-compliant scale-out across multiple computers. And our HotOS23 paper identifies speculative out-of-order shell-script execution as a key challenge — and sketches appropriate containment mechanisms that can be used to delay and reorder side effects.
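To make the idea of parallelizing a pipeline stage concrete, here is a minimal sketch of one data-parallel rewrite: splitting the input, sorting each chunk concurrently, and merging the sorted chunks (roughly what `sort -m` does for pre-sorted inputs). This is an illustration only and not PaSh's implementation; the function names `split_lines` and `parallel_sort` are hypothetical, and Python threads are used only to show the structure, not to demonstrate a real speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, parts):
    """Split a list into at most `parts` contiguous, non-empty chunks."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_sort(lines, parts=4):
    """Sort chunks concurrently, then merge the sorted chunks."""
    chunks = split_lines(lines, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge combines already-sorted sequences into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    words = ["pear", "apple", "fig", "kiwi", "apple", "date"]
    # The parallel version must agree with the sequential one.
    assert parallel_sort(words, parts=3) == sorted(words)
    print(parallel_sort(words, parts=3))
```

The rewrite is only valid because sorting admits a merge step that recombines partial results; a command without such a combiner could not be split this way, which is one reason correctness guarantees matter for automated parallelization.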
Ongoing work: Our ongoing research (1) develops an out-of-order execution engine for shell scripts, (2) tackles automated generation of critical runtime components, through a combination of active learning and program synthesis, (3) proposes appropriate fault-tolerance support for distributed shell-script execution, and (4) develops appropriate type systems, formal models, and mathematical proofs targeting environments that support the composition of black-box software components.
Software:
- PaSh is an award-winning just-in-time parallelization system that forms the basis for all our shell-related research.
- DiSh is a system for automatically scaling out shell scripts to multiple computers.
- The try tool allows users to run a command and inspect its effects ahead of time.
Technology transition:
Our PaSh open-source work has joined the Linux Foundation and is available through it, and our try open-source primitive has received significant attention (over 5K GitHub stars).
Press:
- MIT News article on faster computing results without fear of errors
- Press release from the Linux Foundation
- Many discussions and third-party tutorials on PaSh and try, e.g., on ycombinator, i-programmer, and medium