Ingo Karkat - My use of command-line DSLs

My use of command-line DSLs

Posted Saturday, 14-Dec-2024 by Ingo Karkat

Wintertime is a good time to reflect on the past year and to clean up and organize things. My open source projects needed some love. I had recently published a couple more projects and had noticed that several repetitive manual steps were still required for that. A checklist like the one I had used for my Vim plugins is cumbersome, and in the heat of the moment it's easy to forget a step or two.

Context-sensitive extension points

I still use the original hub CLI for most command-line GitHub interactions (there's now an official gh CLI as well), and I have written a wrapper around its hub create command. One thing this wrapper does is automatically generate the description of the Git repository (unless it's explicitly passed via -d <description>). This is the code for that extension point:

hub-create

# Generate / extract description via the supplied command if not provided.
[ ${#createDescriptionArgs[@]} -eq 0 ] \
    && [ -n "$HUB_CREATE_DESCRIPTION_GENERATOR" ] \
    && generatedDescription="$(eval "$HUB_CREATE_DESCRIPTION_GENERATOR")" \
    && [ -n "$generatedDescription" ] \
    && createDescriptionArgs=(--description "$generatedDescription")

I currently use one extractor for my Vim plugins:

~/.vim/pack/ingo/.lbashcrc

# Extract project description for "hub create" from the help file (according to my style) if not provided.
_hub_create_description_from_help()
{
	local root; root="$(git root)" \
	&& typeset -a helpFiles=("$root"/doc/*.txt) \
	&& [ ${#helpFiles[@]} -eq 1 -a -r "${helpFiles[0]}" ] \
	&& sed -n -e '1{ s/^\*[^[:space:]]\+\*[[:space:]]\+\(.\+\)$/\1/; p; q; }' "${helpFiles[0]}"
}
functionexport _hub_create_description_from_help
export HUB_CREATE_DESCRIPTION_GENERATOR=_hub_create_description_from_help

And another one for my scripting projects, where the description is found at the top of the readme file.

~/bin/.lbashcrc

# Extract project description for "hub create" from the readme file (according to my style) if not provided.
_hub_create_description_from_readme()
{
	local root; root="$(git root)" \
	&& [ -r "${root}/README.md" ] \
	&& sed -n -e '3{ s/^_\(.\+\)_$/\1/; p; q; }' "${root}/README.md"
}
functionexport _hub_create_description_from_readme
export HUB_CREATE_DESCRIPTION_GENERATOR=_hub_create_description_from_readme
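
To see what the two sed expressions extract, here is a small sketch that runs them against made-up sample files. The file contents and the variable names are assumptions for this demo only, and the \+ in the patterns assumes GNU sed.

```shell
# Throwaway sample files; their contents are made up for this demo.
sampleDir="$(mktemp -d)"
printf '%s\n' '# demo' '' '_A short description of the demo project._' '' 'More text.' > "$sampleDir/README.md"
printf '%s\n' '*demo.txt*    Summary of the demo plugin.' '' 'More text.' > "$sampleDir/demo.txt"

# Readme style: on line 3, strip the surrounding underscores, print, and quit.
readmeDescription="$(sed -n -e '3{ s/^_\(.\+\)_$/\1/; p; q; }' "$sampleDir/README.md")"

# Vim help style: on line 1, drop the leading *tag* and the whitespace after it.
helpDescription="$(sed -n -e '1{ s/^\*[^[:space:]]\+\*[[:space:]]\+\(.\+\)$/\1/; p; q; }' "$sampleDir/demo.txt")"

printf '%s\n' "$readmeDescription" "$helpDescription"
rm -rf "$sampleDir"
```

Because both expressions print the addressed line even when the substitution does not match, a file that deviates from the expected style yields its raw line rather than an empty description.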

The .lbashcrc files are automatically sourced when I enter the directory (or a subdirectory) via cd in an interactive shell.

To automatically apply settings to a newly created GitHub repository, a one-line addition at the end of the hub-create wrapper is all that's needed: eval "$HUB_CREATE_POST_COMMAND". The environment variable is again populated by the per-directory configuration:

…/.lbashcrc

_hubPostCommand='hub-reposettings'
case "$1" in
    enter) export HUB_CREATE_POST_COMMAND; commandSequenceMunge HUB_CREATE_POST_COMMAND "$_hubPostCommand";;
    leave) commandSequenceDrop strict HUB_CREATE_POST_COMMAND "$_hubPostCommand";;
esac
unset _hubPostCommand
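
The post does not show how commandSequenceMunge and commandSequenceDrop work internally. The following is only a rough sketch of the general idea (add a command to a sequence on enter, remove it on leave), using the hypothetical helpers sequenceAdd and sequenceDrop and assuming a ";"-joined sequence that can be handed to eval.

```shell
# Hypothetical helpers, not the author's commandSequenceMunge / commandSequenceDrop.
# The variable named by $1 holds commands joined by ";", ready to be passed to eval.

# Append a command unless the sequence already contains it.
sequenceAdd()
{
    local varName="$1" cmd="$2" current
    eval "current=\"\${$varName}\""
    case ";$current;" in
        *";$cmd;"*) return 0;;      # already present: keep the sequence as-is
    esac
    eval "$varName=\"\${current:+\$current;}\$cmd\""
}

# Remove a command again; all other entries are kept in order.
sequenceDrop()
{
    local varName="$1" cmd="$2" current rest entry result=
    eval "current=\"\${$varName}\""
    rest="${current:+$current;}"
    while [ -n "$rest" ]; do
        entry="${rest%%;*}"         # first entry
        rest="${rest#*;}"           # everything after it
        [ "$entry" = "$cmd" ] || result="${result:+$result;}$entry"
    done
    eval "$varName=\"\$result\""
}

POST_COMMANDS=
sequenceAdd POST_COMMANDS 'hub-reposettings'    # "enter" of one directory
sequenceAdd POST_COMMANDS 'echo done'           # some other hook
sequenceAdd POST_COMMANDS 'hub-reposettings'    # entering again adds nothing
printf '%s\n' "$POST_COMMANDS"
sequenceDrop POST_COMMANDS 'hub-reposettings'   # "leave"
printf '%s\n' "$POST_COMMANDS"
```

Keeping the commands in one variable lets several per-directory configurations contribute to the same hook without overwriting each other.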

Because I already have multiple per-directory configurations, and the needed settings will most likely remain identical across them, I extracted them into a separate hub-reposettings command; that's DRY.

repository iteration

Now that will take care of any future projects, but what about the existing ones? To ensure that all settings have been applied, I need to re-execute the reposettings command. Unfortunately, my hub-labels command, which comprises one half of it, was written for the use case of establishing my preferred GitHub issue labels on new repositories, so it clumsily rewrites everything and logs each and every label, even if nothing changed. That's fine for a (re-)run on a single repo, but the huge amount of output isn't ideal for iterating over many repos at once. After all, I'm interested in how many repos I had missed; maybe I actually observed my checklist in most cases?! Adding a comparison with existing data wasn't a big deal. (So why didn't I add this right away when implementing the command? Well, by now the command has proven its worth; back then I wasn't so sure. Maybe I didn't have enough time, and adding this requirement would have made me postpone the implementation altogether. It was better to start with an MVP; with clean code, refactoring this later is easy to do.)
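
The comparison with existing data could look roughly like the following sketch. It is not the actual hub-labels code: the function name reportLabelChanges, the tab-separated "name, color" file format, and the use of exit status 99 for "nothing to do" are assumptions for illustration, and no GitHub API is called.

```shell
# Hypothetical sketch; file format and function name are assumptions for illustration.
# Both files hold one "name<TAB>color" record per line.
# Prints only actual changes; returns 99 when there was nothing to do.
reportLabelChanges()
{
    local existingFile="$1" desiredFile="$2" name color existingColor changed=
    local tab; tab="$(printf '\t')"
    while IFS="$tab" read -r name color; do
        if existingColor="$(awk -F '\t' -v wanted="$name" \
                '$1 == wanted { print $2; found = 1; exit } END { exit !found }' "$existingFile")"; then
            [ "$existingColor" = "$color" ] && continue    # already up to date: stay silent
            printf 'Updated label "%s"\n' "$name"          # a real command would update it here
        else
            printf 'Added label "%s"\n' "$name"            # a real command would create it here
        fi
        changed=t
    done < "$desiredFile"
    [ -n "$changed" ] || return 99
}

workDir="$(mktemp -d)"
printf 'bug\td73a4a\nquestion\tcccccc\n' > "$workDir/existing"
printf 'bug\td73a4a\nquestion\td876e3\nfaq\t0e8a16\n' > "$workDir/desired"
reportLabelChanges "$workDir/existing" "$workDir/desired"
reportLabelChanges "$workDir/desired" "$workDir/desired" || echo "status $?"
rm -rf "$workDir"
```

A distinct exit status for the no-op case lets a caller tell "nothing changed" apart from both success and failure.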

The crux of the matter is the repository iteration. Others have had the same idea; e.g. there's git-bulk, which requires manual registration of working copies. My approach in git-wcdo-core relies on a passed list of working copies (and the corresponding Git or shell commands to execute in each of them). git-wcs-in-dir-do is a specialization that takes parent directories and by itself iterates over contained working copies. And git-wcs-in-dir-do-command is a small metaprogramming helper that takes the original iteration script (which is important for mentioning the real command in usage help and for recursive invocations) and a description of what is being iterated as additional parameters, provided by a concrete iteration command.

That iterator for my scripting projects is called Unixhome-ingo. I have a three-level structure: the bin/ and lib/ directories contain different categories (some of which hold third-party tools, either as Git repositories or as manually extracted tarballs; those need to be skipped), and the categories in turn contain the individual working copies. git-wcs-in-dir-do takes care of the third level; the code at the beginning of the iterator filters the second level:

Unixhome-ingo

#!/bin/bash

readonly scriptName=$(basename -- "${BASH_SOURCE[0]}")

typeset -a rootDirspecs=()
for dirspec in ~/Unixhome/{bin,lib}/*/
do
    [[ "$dirspec" =~ /z*(manual|originals)(-[^/]+)?/$ ]] || rootDirspecs+=("$dirspec")
done

GIT_DOEXTENSIONS_EXCLUDE_FORKS=t \
GIT_WCDO_PROGRESS_WHAT='Unixhome' \
GIT_WCDO_STORE_SUBJECT="$scriptName" \
    exec git-wcs-in-dir-do-command "$scriptName" 'my Unixhome projects' \
	--skip-symlinks --progress addendum \
	"${rootDirspecs[@]}" \
	-- "$@"
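
Stripped of all options, the core of such an iteration is a loop over the subdirectories of a parent directory that runs a command in each working copy. The sketch below is a hypothetical miniature (eachWorkingCopy is an invented name, and a working copy is detected merely by the presence of a .git entry); it is not how git-wcs-in-dir-do is implemented.

```shell
# Hypothetical miniature of an iteration command, for illustration only.
# usage: eachWorkingCopy PARENT_DIR COMMAND [ARG...]
eachWorkingCopy()
{
    local parent="$1" dir output status=0
    shift
    for dir in "$parent"/*/; do
        [ -e "${dir}.git" ] || continue     # skip anything that is not a Git working copy
        # The command substitution runs in a subshell, so the cd does not leak out.
        output="$(cd "$dir" && "$@" 2>&1)" || status=$?
        [ -n "$output" ] || continue        # nothing to report: do not even mention it
        printf '%s:\n%s\n\n' "$(basename -- "$dir")" "$output"
    done
    return "$status"                        # status of the last command that failed, else 0
}

demoRoot="$(mktemp -d)"
mkdir -p "$demoRoot/alpha/.git" "$demoRoot/beta/.git" "$demoRoot/notes"
: > "$demoRoot/alpha/TODO"
eachWorkingCopy "$demoRoot" ls      # only "alpha" has something to list
rm -rf "$demoRoot"
```

Printing the working copy name only when the command produced output keeps bulk runs readable.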

filtering

The iteration commands support a predicate command that omits working copies where that command fails. There's also a set of built-in filters like --dirty and --with[out]-remote and special synthesized Git commands that make sense in the context of iteration (for example, dirty-sh is a command that uses the --dirty filter to only consider working copies with changes, and opens an interactive shell in each, one after the other).
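
The predicate idea can be sketched in a few lines: evaluate a command line inside each directory and keep only those where it succeeds. selectWorkingCopies is a hypothetical name, and the README.md check is just a stand-in predicate; the real iteration commands are not shown in this post.

```shell
# Hypothetical sketch of predicate-based filtering, for illustration only.
# usage: selectWorkingCopies PREDICATE DIR...
# PREDICATE is a shell command line evaluated inside each directory.
selectWorkingCopies()
{
    local predicate="$1" dir
    shift
    for dir in "$@"; do
        # Subshell: neither the cd nor any side effect of the predicate leaks out.
        if (cd "$dir" && eval "$predicate") >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}

demoRoot="$(mktemp -d)"
mkdir -p "$demoRoot/published" "$demoRoot/local-only"
: > "$demoRoot/published/README.md"
selectWorkingCopies '[ -r README.md ]' "$demoRoot/published" "$demoRoot/local-only"
rm -rf "$demoRoot"
```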

Applying missing labels is done via this command:

$ Unixhome-ingo --with-remote --status-99-is-noop reposettings
nowrap:
Added label "abandoned"
Added label "accepted"
Added label "available"
Added label "blocked"
Added label "documentation"
Added label "documentation candidate"
Updated label "duplicate"
Updated label "enhancement"
Added label "potential enhancement"
Added label "faq"
Updated label "good first issue"
Updated label "help wanted"
Added label "help needed"
Added label "information needed"
Updated label "invalid"
Added label "offtopic"
Updated label "question"
Added label "review needed"
Added label "troubleshooting"
Updated label "wontfix"

smartless:
[...]

Working copies where no update is necessary do not produce any output (thanks to the improved update handling) and therefore aren't mentioned here. I can watch the iteration in real-time, and see just what I had missed.

reuse #1

With that straightened out, I could continue with some real improvements. How about running the automated tests on each push in a GitHub workflow; how many repositories would be eligible? To answer that, I need to filter for working copies that have already been published on GitHub (--with-remote) and that have Bats tests, which I can detect with a custom predicate that checks for files with the corresponding file extension anywhere in the working copy:

$ Unixhome-ingo --with-remote --predicate-command 'hasMatchingFile -r "*.bats" tests/'

I found some repos where adding the GitHub workflow was easy; later I was ready to tackle more challenging candidates. I re-ran my query with another added predicate filtering out those repos that already had the workflow:

$ Unixhome-ingo --with-remote --predicate-command 'hasMatchingFile -r "*.bats" tests/' --predicate-command '[ ! -d .github/workflows/ ]'
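
A file-existence predicate like the one used above can be approximated with find. The hasFileMatching function below is a hypothetical stand-in with its own simplified interface, not the hasMatchingFile command itself; the loop shows how the two predicates combine on a made-up directory layout.

```shell
# Hypothetical stand-in for a "does a matching file exist?" predicate.
# usage: hasFileMatching GLOB DIR
hasFileMatching()
{
    # Succeeds if at least one regular file below DIR matches GLOB.
    [ -d "$2" ] && find "$2" -type f -name "$1" 2>/dev/null | grep -q .
}

demoRoot="$(mktemp -d)"
mkdir -p "$demoRoot/one/tests/unit" "$demoRoot/two/tests" "$demoRoot/two/.github/workflows" "$demoRoot/three"
: > "$demoRoot/one/tests/unit/basic.bats"
: > "$demoRoot/two/tests/basic.bats"

# Has Bats tests, but no GitHub workflow yet: only "one" qualifies.
for wc in "$demoRoot"/*/; do
    if (cd "$wc" && hasFileMatching '*.bats' tests/ && [ ! -d .github/workflows/ ]); then
        basename -- "$wc"
    fi
done
rm -rf "$demoRoot"
```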

reuse #2

Just a day later, I extended my VcsMessageRecall.vim plugin to exclude Git commit message trailers (which I've recently started using to automatically record system and tool version information). Previously recorded messages still contained those trailers and needed to be cleaned up. I copied a few messages (some with, a few without trailers) to a temp directory and first executed my git-trailer-parse command on a single file to verify that I got the invocation right:

$ cat ~/tmp/git/msg-20241214_133148 | git-trailer-parse --remove
gitcheats-vimdev: Adapt: labels and reposettings are now hooked into hub-create

No need to do these manually any longer.

No more trailers; great! As the git-trailer-parse command is a classic filter command that reads from standard input and writes to standard output, my pipethrough can be used to process all passed files individually, writing the cleaned contents back to the original record of the commit message. I wrote pipethrough when I started using the Unix command-line in earnest; it's already 20 years old!

$ pipethrough --piped --message-on-change 'Cleaned %q' -- git-trailer-parse --remove -- ~/tmp/git/msg-*
Cleaned /home/inkarkat/tmp/git/msg-20241212_112213
Cleaned /home/inkarkat/tmp/git/msg-20241212_115645
Cleaned /home/inkarkat/tmp/git/msg-20241212_115845
Cleaned /home/inkarkat/tmp/git/msg-20241212_145242
Cleaned /home/inkarkat/tmp/git/msg-20241214_132622
Cleaned /home/inkarkat/tmp/git/msg-20241214_133148
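
The principle of running a stdin/stdout filter over files in place, and reporting only files that actually changed, can be sketched like this. filterEachInPlace is a hypothetical miniature with a simplified interface, and the sed call is a made-up stand-in filter; neither reflects the real pipethrough or git-trailer-parse implementations.

```shell
# Hypothetical miniature of an in-place filter runner, for illustration only.
# usage: filterEachInPlace FILTER_COMMAND_LINE FILE...
filterEachInPlace()
{
    local filter="$1" file tmp
    shift
    for file in "$@"; do
        tmp="$(mktemp)" || return 1
        # Only touch the file when the filter succeeded and actually changed something.
        if eval "$filter" < "$file" > "$tmp" && ! cmp -s "$file" "$tmp"; then
            cat "$tmp" > "$file"    # rewrite the contents, keeping the file's permissions
            printf 'Cleaned %s\n' "$file"
        fi
        rm -f "$tmp"
    done
}

msgDir="$(mktemp -d)"
printf 'Fix typo\n\nTool-Version: 1.2\n' > "$msgDir/msg-1"
printf 'Add feature\n' > "$msgDir/msg-2"
# Stand-in filter: drop lines that look like one particular trailer.
filterEachInPlace "sed -e '/^Tool-Version:/d'" "$msgDir"/msg-*
rm -rf "$msgDir"
```

Writing to a temporary file first means a failing filter never leaves a half-written original behind.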

Okay, now I feel confident that I can do bulk-processing of all of my working copies. (Nonetheless, I'm doing a quick sync of my Unixhome data with the central repository, so that I have the possibility to roll back any unforeseen catastrophic changes.) I limit the file glob to messages from 2024 to avoid needless touching of older messages. Because my iterators change directory to the working copy root dir, I can use relative addressing of the commit message store (in .git/commit-msgs/). Because of the file globbing, I need interpretation as a full command-line, so the --command argument has to be used. (This is just syntactic sugar over --exec sh -c "pipethrough ...", but immensely useful nonetheless.)

$ Unixhome-ingo --command "pipethrough --piped --message-on-change 'Cleaned %q' -- git-trailer-parse --remove -- .git/commit-msgs/msg-2024*"
smartless:
Cleaned .git/commit-msgs/msg-20241112_072427

addOrUpdate:
Cleaned .git/commit-msgs/msg-20241213_125407
Cleaned .git/commit-msgs/msg-20241213_185342

browser:
Cleaned .git/commit-msgs/msg-20241204_190500
[...]
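
Why the glob needs a full command line can be shown in isolation. In this sketch, runVerbatim and runCommandLine are hypothetical helpers standing in for the two ways an iterator can execute what it is given; the file names are made up.

```shell
demoDir="$(mktemp -d)"
: > "$demoDir/msg-2023_1"; : > "$demoDir/msg-2024_1"; : > "$demoDir/msg-2024_2"

# Arguments executed verbatim: no shell looks at them again, so the glob stays literal.
runVerbatim() { "$@"; }
# One string interpreted by a shell: globs, pipes and redirections all work.
runCommandLine() { sh -c "$1"; }

(cd "$demoDir" && runVerbatim echo 'msg-2024*')
(cd "$demoDir" && runCommandLine 'echo msg-2024*')
rm -rf "$demoDir"
```

The glob has to be quoted at the prompt in both cases; otherwise the interactive shell would expand it in the current directory before the iterator ever runs.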

Voilà! Again, I have concise feedback of the progress without distracting unrelated output. Despite its power, the command is still very readable; it needs to be deconstructed from the inside out: Remove Git trailers » from files modified in-place » originating from the glob » executed in all of my Unixhome projects.

conclusion

These queries and commands feel like SQL: You start with a simple one, and then refine and filter until you arrive at just the data you're interested in. The shell REPL with its history makes this very easy to do.

For that, the right abstractions need to be in place. I've needed iteration over my scripting and Vim projects so often that I've created corresponding Unixhome-ingo and vimrc-ingo commands (and a handful more). The underlying metaprogramming commands like git-wcs-in-dir-do offer a powerful and fluent API; yet defining an iteration wrapper is a matter of a few lines of code. Getting that right wasn't easy, but it's been well worth it.

Convenient built-in filters (like --dirty) make the iterators easy to use, and the possibility to pass custom predicate commands allows for flexibility and advanced uses. The iterators accept Git commands (built-in and any of my extensions), shell commands, and can also open an interactive shell in each working copy. That covers all use cases, from ad-hoc one-time uses to utility scripts that non-interactively use the iterations. (For example, I've put a list of pending changes and pull request reviews I'm assigned to on my desktop background. Depending on the type of system, these cover either my work repositories or my personal open source projects.)

Ingo Karkat, 14-Dec-2024

ingo's blog is licensed under Attribution-ShareAlike 4.0 International