Composing shell scripts

Posted Monday, 25-Apr-2022 by Ingo Karkat

context

I have a metaprogramming shell script that offers to update SUBJECT packages that are outdated, and memorizes the choices for future runs. The instantiating client has to pass the package manager command for package installations; the list of packages is then appended by the script. The corresponding line looks like this:

eval "${SOMETHING_UPDATE_COMMAND:?}" '"${selectedPackages[@]}"'

For npm (the Node.js package manager) that command is:

SOMETHING_UPDATE_COMMAND='npm --global install'

So far, so good. Unfortunately, npm does not offer logging of installations, but for system-wide installations, I'd like to have a persistent log (in /var/log/npm/npm-install.log), similar to what other package managers do. Let's approach this by piping the output to tee:

SOMETHING_UPDATE_COMMAND='npm --global install 2>&1 | tee --append /var/log/npm/npm-install.log'

This does not work because the script appends the list of package names, so these would now be passed to tee, not to npm!
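The effect can be reproduced without npm. The following is a minimal sketch in which fakeInstall and fakeTee are hypothetical stand-ins (not real commands) that only report the arguments they receive; the eval line mirrors the one from the script.

```shell
#!/bin/bash
# Stand-ins (assumptions) for npm and tee that only report the arguments they receive.
fakeInstall() { printf 'install got: %s\n' "$*"; }
fakeTee() { cat >/dev/null; printf 'tee got: %s\n' "$*"; }

selectedPackages=(typescript eslint)
UPDATE_COMMAND='fakeInstall 2>&1 | fakeTee --append /tmp/install.log'

# Same invocation pattern as in the script: the package list is appended last,
# so it ends up behind the final pipeline stage.
result=$(eval "${UPDATE_COMMAND:?}" '"${selectedPackages[@]}"')
printf '%s\n' "$result"
```

The only line that comes out is the one from fakeTee, and it lists both package names after the log file argument.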

placeholders

One common way to solve this would be via placeholders (commonly {}, as in find's -exec command {} +) that are replaced with the packages (or the packages are appended if no placeholder can be found; somehow they have to be handed over). Several of my commands use that approach. In fact, I've recently written a placeholderArguments command that can do this:

SOMETHING_UPDATE_COMMAND='placeholderArguments --command "npm --global install {} 2>&1 | tee --append /var/log/npm/npm-install.log" --'

Though functional, this solution addresses the problem (the arguments are not where they need to be) on a purely technical level. It does not lift the problem to a higher level of abstraction.
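To make the placeholder idea concrete, here is a minimal sketch of such a substitution. The function runWithPlaceholder is hypothetical and is not how placeholderArguments is actually implemented; it only assumes the behavior described above (replace {} with the shell-quoted arguments, or append them if there is no placeholder).

```shell
#!/bin/bash
# Hypothetical helper; NOT the real placeholderArguments command.
runWithPlaceholder() {
    local commandLine="${1:?}"; shift
    local placeholder='{}'
    local quotedArgs=''
    if [ $# -gt 0 ]; then
        # Quote every argument so that it survives the later eval as one word.
        printf -v quotedArgs '%q ' "$@"
        quotedArgs="${quotedArgs% }"
    fi
    if [[ "$commandLine" == *"$placeholder"* ]]; then
        # Put the arguments wherever the placeholder appears.
        commandLine=${commandLine//"$placeholder"/"$quotedArgs"}
    else
        # No placeholder: fall back to appending.
        commandLine+=" $quotedArgs"
    fi
    eval "$commandLine"
}

# The arguments land in front of the pipe, not behind it.
piped=$(runWithPlaceholder 'printf "%s\n" {} | tr a-z A-Z' typescript 'left pad')
appended=$(runWithPlaceholder 'echo install' typescript eslint)
```

In the first call, both package names are printed by printf and upper-cased by tr; in the second call, they are simply appended to echo install.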

abstraction

Though tee is a simple and very common tool that almost everybody knows and understands, there's a higher-level purpose here: Executing a command and duplicating all of its output into a log file. So, how about a new generic command that encapsulates that?

withLoggingTo --help
Execute COMMAND with all of its output logged to LOGFILESPEC.

Usage: withLoggingTo LOGFILESPEC -c|--command "COMMANDLINE" [-c ...] | [--] SIMPLECOMMAND [...]

What kind of variations should this offer? Output may overwrite an existing log file, or append to it. Output may be duplicated, or exclusively go into the log file. We may be interested in all output, or just standard output or standard error. That gives us these command-line options:

Usage: withLoggingTo LOGFILESPEC [-a|--append] [-t|--tee] [-1|--stdout|-2|--stderr] -c|--command "COMMANDLINE" [-c ...] | [--] SIMPLECOMMAND [...] [-?|-h|--help]

The implementation of withLoggingTo is mostly just boilerplate code around command-line argument parsing and the setup of the execution.
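As an illustration of that boilerplate, here is a heavily reduced sketch. The function withLoggingToSketch is hypothetical and covers only --append, --tee and the SIMPLECOMMAND form; the real command additionally offers -c|--command and the stdout / stderr selection from the usage above.

```shell
#!/bin/bash
# Hypothetical, reduced variant; not the actual withLoggingTo implementation.
withLoggingToSketch() {
    local logFilespec="${1:?}"; shift
    local isAppend='' isTee=''
    while [ $# -gt 0 ]; do
        case "$1" in
            -a|--append) isAppend=t; shift;;
            -t|--tee)    isTee=t; shift;;
            --)          shift; break;;
            *)           break;;
        esac
    done
    [ $# -gt 0 ] || { echo 'ERROR: No command passed.' >&2; return 2; }

    # Without --append, start with an empty log file.
    [ "$isAppend" ] || : > "$logFilespec"

    if [ "$isTee" ]; then
        # Duplicate all output: one copy to the log, one to standard output.
        "$@" 2>&1 | tee -a "$logFilespec"
        return "${PIPESTATUS[0]}"    # report the command's status, not tee's
    else
        # Exclusive: all output goes into the log file only.
        "$@" >> "$logFilespec" 2>&1
    fi
}

logFile=$(mktemp)
shown=$(withLoggingToSketch "$logFile" --tee -- echo 'added 1 package')
withLoggingToSketch "$logFile" --append -- echo 'added 2 packages'
```

After these two calls, the first line has been both shown and logged, whereas the second line has only been appended to the log.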

composition

With that syntax, our npm command stays at the end, and there's no problem with directly appending the package names to it. There's no pipeline any longer; the script receives a single command (withLoggingTo) that itself sets up the capturing, and then invokes the original npm command. In other words, the behavior of installing packages and logging the progress is composed of the separate concerns of logging and installing.

printf -v quotedLogFilespec %q "$logFilespec"
SOMETHING_UPDATE_COMMAND="withLoggingTo $quotedLogFilespec --append --tee -- npm --global install"
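The printf %q step matters because the command string is later passed through eval. The following sketch uses a hypothetical showArgs function in place of withLoggingTo and an assumed log path containing a space, just to make the resulting argument boundaries visible.

```shell
#!/bin/bash
# Hypothetical stand-in that prints each received argument in brackets.
showArgs() { printf '[%s]' "$@"; printf '\n'; }

logFilespec='/tmp/my logs/npm-install.log'    # assumed path with a space
printf -v quotedLogFilespec %q "$logFilespec"
UPDATE_COMMAND="showArgs $quotedLogFilespec --append --tee -- npm --global install"

selectedPackages=(typescript 'left pad')
# Same invocation pattern as in the script.
result=$(eval "${UPDATE_COMMAND:?}" '"${selectedPackages[@]}"')
printf '%s\n' "$result"
```

The log path arrives as a single argument despite the space, and the package names end up directly behind install.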

So far, all approaches shown are roughly equal in complexity. A {} placeholder handled by the script itself is very straightforward from the client's perspective, but it is arbitrary syntax that requires consultation of the man page to be clear about the semantics, and its implementation complicates the script. Leveraging placeholderArguments, on the other hand, moves the complexity from the script to its client. Like placeholderArguments, withLoggingTo does not require modification of the script; but its huge benefit is that the functionality is encapsulated, can be tested independently, and can be reused by other clients.

feature 2

Right after that implementation I noticed another needed feature, and this is where the composition approach really begins to shine. The log lines should have the current date prepended, so that one can easily see when an installation happened. I have a timestamp command that can be injected into a pipeline. A naive use in the simple pipeline that ignores the placeholder issue would look like this:

SOMETHING_UPDATE_COMMAND='npm --global install 2>&1 | timestamp --sortable - | tee --append /var/log/npm/npm-install.log'

But the bifurcation of tee is complicating things: The timestamp needs to be inserted after branching off file output; with the above, dates would also appear in the terminal! So the implementation has to replace tee's output file with a process substitution (>(timestamp ...)):

SOMETHING_UPDATE_COMMAND='npm --global install 2>&1 | tee >(timestamp --sortable - >> /var/log/npm/npm-install.log)'

That's now becoming unreadable very fast, and we haven't even added the handling of the argument placeholder, which we've already characterized as being technical and not showing the real intentions!
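The branching can be tried out in isolation. This is a minimal sketch in which stamp is a hypothetical stand-in for my timestamp command (its date format is an assumption) and fakeInstall replaces npm. Note that the process substitution runs asynchronously, so the log may be completed slightly after tee has returned.

```shell
#!/bin/bash
# Hypothetical stand-in for the timestamp command; the format is an assumption.
stamp() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
# Stand-in for "npm --global install": one line on stdout, one on stderr.
fakeInstall() { echo 'added 1 package'; echo 'npm WARN deprecated' >&2; }

logFile=$(mktemp)
# tee passes the plain lines through; only the branch into the log is timestamped.
terminalOutput=$(fakeInstall 2>&1 | tee >(stamp >> "$logFile"))
printf '%s\n' "$terminalOutput"
```

What is shown stays free of dates, while each line in the log file carries one.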

abstraction 2

Using the withLoggingTo abstraction, we can hide all that ugliness there by offering a --timestamp command-line option that then uses a pipeline to the timestamp command instead of a simple redirect into the log file. By forwarding all of timestamp's command-line options, we don't lose any of its functionality, but add a bit of coupling between the two. In object-oriented lingo, I would refer to this as a mixin. The command-line options are the public API, so this should be pretty stable.

With that, the new feature is handled by just adding the corresponding command-line flags; nice and clean:

printf -v quotedLogFilespec %q "$logFilespec"
SOMETHING_UPDATE_COMMAND="withLoggingTo $quotedLogFilespec --append --tee --timestamp --sortable -- npm --global install"

going further?

This could be taken even further through a withOutputToSink generalization of withLoggingTo (yes, I quickly implemented that one, too; never let a good abstraction go to waste). Instead of a LOGFILESPEC, it would take a --sink-exec SINK-COMMAND \; which in this case would be withLoggingTo $quotedLogFilespec --append -- timestamp --sortable -, directing the timestamped output (now exclusively) into the log file.

printf -v quotedLogFilespec %q "$logFilespec"
SOMETHING_UPDATE_COMMAND="withOutputToSink --tee \\
    --sink-exec withLoggingTo $quotedLogFilespec --append -- timestamp --sortable - \; \\
    -- npm --global install"

Though we would then not need to extend withLoggingTo with the --timestamp functionality, this looks much messier for the client. Besides, I don't mind that withLoggingTo is coupled to timestamp, as that looks like a natural fit. It would be another matter if there were needs for other manipulations, e.g. uppercasing (a silly example) or filtering of lines. Even for filtering, I see a higher probability of wanting to filter the output to the terminal rather than omitting stuff from the log file, and that can be done by appending another pipeline step. So I leave it at that (for now).
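For illustration, the find-style --sink-exec ... \; argument list could be collected as in the following minimal sketch. The function withOutputToSinkSketch and the toUpperFile sink are hypothetical and only approximate the interface shown above; the actual withOutputToSink may work differently. As before, the process substitution finishes asynchronously.

```shell
#!/bin/bash
# Hypothetical sketch; not the actual withOutputToSink implementation.
withOutputToSinkSketch() {
    local isTee=''
    local -a sinkCommand=()
    while [ $# -gt 0 ]; do
        case "$1" in
            --tee)
                isTee=t; shift;;
            --sink-exec)
                shift
                # Collect everything up to the terminating ";" (like find -exec).
                while [ $# -gt 0 ] && [ "$1" != ';' ]; do
                    sinkCommand+=("$1"); shift
                done
                [ $# -gt 0 ] && shift    # drop the ";" itself
                ;;
            --) shift; break;;
            *)  break;;
        esac
    done
    [ ${#sinkCommand[@]} -gt 0 ] || { echo 'ERROR: No --sink-exec passed.' >&2; return 2; }

    if [ "$isTee" ]; then
        # Duplicate: one copy stays on standard output, one goes to the sink.
        "$@" 2>&1 | tee >("${sinkCommand[@]}")
    else
        # Exclusive: everything goes to the sink.
        "$@" 2>&1 | "${sinkCommand[@]}"
    fi
}

# Usage with a stand-in sink that upper-cases its input into a file.
sinkFile=$(mktemp)
toUpperFile() { tr '[:lower:]' '[:upper:]' >> "$1"; }
shown=$(withOutputToSinkSketch --tee --sink-exec toUpperFile "$sinkFile" \; -- echo 'added 1 package')
```

The client passes the sink as a complete command of its own, which is flexible, but the additional \; and nested option lists are what make the invocation harder to read.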

conclusion

Scripting languages (and especially "procedural" ones like Bash) are often quickly dismissed as inferior, just to be used for quick-and-dirty one-time tasks, because results are messy and hard to maintain. It doesn't have to be that way, though. Just as you can write spaghetti code with objects, you can do clean abstractions with shell scripts, too. It may be a bit unusual, but if you treat each script as an object and its command-line options as its public API, and write some helper commands (following the approach I've tried to show in this post), it can work out nicely. Of course, the performance is nowhere near that of "real" object-oriented languages (like Java), and you're mostly limited to simple line-based textual data (flowing through the pipeline), but you can exploratorily build up functionality on the command line, and each command / object extends your vocabulary and capabilities there. At the very least, the low barrier to entry (just a terminal and text editor will do) provides a fun environment that's available everywhere (just like in the age of 8-bit home computers I grew up with).
