Fortuitous find of a compatibility bug leads to a new tool

Posted Friday, 08-Apr-2022 by Ingo Karkat

This actually happened one day before my previously told excursion in tool writing, and is another nice story about what unexpected turns regularly happen (at least to me) during software development. These events wreak havoc on your plans, but if you're able to engage them properly, they can also yield great serendipitous improvements to your tools, capabilities, and knowledge. Read on if you want to know how to make the best out of unforeseen problems…

prelude

It started out pretty normal: I had an idea for a tool that would parse my logbooks for the tracked number of hours spent. For that, I needed a logNormalizeDate filter that would convert the human-readable date format used there into a standard one. The date command can do that, but I needed to invoke it for a certain (tab-separated) field in all input lines. I already have a fieldMap wrapper around AWK, and "just" wanted to check whether it needed extending. (I imagined adding support for a coprocess to avoid repeated invocation via memoization.)
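
The idea behind such a filter fits into a few lines of shell and AWK. This is only a minimal sketch, not the actual logNormalizeDate or fieldMap implementation: the function name normalize_date_field, the example date spelling, and the ISO output format are assumptions, and it relies on GNU date's -d option. An in-memory cache stands in for the coprocess idea, so that date gets invoked just once per distinct value.

```shell
# Hypothetical sketch: normalize the date in one tab-separated field.
# Usage: normalize_date_field [FIELD-NUMBER] < input
# Assumes GNU date (for the -d option); output format %F is an assumption.
normalize_date_field() {
    awk -F '\t' -v OFS='\t' -v field="${1:-1}" '
        {
            raw = $field
            # Leave values alone that cannot be quoted safely for the shell
            # (\047 is the single quote character).
            if (raw !~ /\047/) {
                if (!(raw in cache)) {
                    cmd = "date -d \047" raw "\047 +%F 2>/dev/null"
                    iso = ""
                    # Keep unparsable input as-is.
                    if ((cmd | getline iso) <= 0) iso = raw
                    close(cmd)
                    cache[raw] = iso    # memoize: one date call per distinct value
                }
                $field = cache[raw]
            }
            print
        }
    '
}
```

Feeding it a line like "8 Apr 2022<Tab>2.5" with field 1 selected turns the date into 2022-04-08 and leaves the other fields untouched.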

The tool has test coverage, so the first thing I did was run the tests. And… some failed! In particular, an assertion about the creation of a backup file when doing the replacement in-place failed.

the bug

With such good test coverage, it wasn't difficult to see that AWK's in-place editing was broken, but only if a backup extension is specified. I had found out how to specify one (it unfortunately isn't as easy as passing a -i.bak argument as with sed) in a Stack Overflow answer. Looking at the current GNU AWK manual, the variable for the backup suffix has moved into a namespace, but the old (global) variable is still used as a fallback:

# Please set inplace::suffix to make a backup copy.  For example, you may
# want to set inplace::suffix to .bak on the command line or in a BEGIN rule.

# Before there were namespaces in gawk, this extension used
# INPLACE_SUFFIX as the variable for making backup copies. We allow this
# too, so that any code that used the previous version continues to work.

But not for me! The discrepancy was easy to see, as the extension script is plain AWK code.

So, what likely happened is that the switch to namespaces was implemented first, and only later did someone realize that a fallback to the old variable would be required for backwards compatibility. Somehow, my stable long-term support Ubuntu distribution got an AWK version from in between those two changes. I had initially developed (and last tested) my script under Ubuntu 16.04 (a lot older and therefore likely to have a different major AWK version), so that checks out.

$ gawk --version
GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
Copyright (C) 1989, 1991-2019 Free Software Foundation.

In order to investigate how many versions are affected by the gap in backwards compatibility, I found some forks on GitHub, but none of them had tags. The original Git repository is hosted by the GNU folks, so let's check that out locally in order to dive into the change history:

$ git clone https://git.savannah.gnu.org/git/gawk.git
[...]
$ git lg awklib/eg/lib/inplace.awk
* 8ff0d3a5 (Arnold D. Robbins, 2 years, 9 months ago)  Add backwards compatibility to inplace extension, update doc and tests.
* 663aff4a (Arnold D. Robbins, 3 years, 3 months ago)  Squashed merge of feature/namespaces. Add code and doc.
* 167b3a79 (Arnold D. Robbins, 4 years, 8 months ago)  Apply GPL to inplace.awk.
* 6f10e610 (Arnold D. Robbins, 4 years, 10 months ago)  Expand tab characters in the doc.
* 12e3cbe4 (Andrew J. Schorr, 7 years ago)  For the inplace extension, add inplace variable to control whether it's active.
* e0c1194c (Arnold D. Robbins, 7 years ago)  Bug fix to inplace extension and doc updates.
* 81df0ef6 (Arnold D. Robbins, 9 years ago)  Minor edits to inplace extension doc.
* 1abfe5e8 (Andrew J. Schorr, 9 years ago)  Add inplace file editing extension.

Dang, nothing is tagged! But the project does use (lightweight) tags:

$ git tag -l | grep gawk-5
gawk-5.0.0
gawk-5.0.1
gawk-5.1.0
gawk-5.1.0-docs
gawk-5.1.1

excursion

That means that development happens on a different branch than the tagging. The tags could be included via git lg --full-history awklib/eg/lib/inplace.awk, but then there are an awful lot of merges that totally drown out the few actual commits.

Instead, what if we had a Git extension that can filter the one-line log format for commits that have a tag attached:

$ git lgtagged --color=always | head -n 6
2b1fffef (Arnold D. Robbins, 5 months ago)  (tag: gawk-5.1.1) Make 5.1.1 tar ball and diff file.
d5160ca6 (Arnold D. Robbins, 1 year, 11 months ago)  (tag: gawk-5.1.0-docs) Update build aux files.
40a6d096 (Arnold D. Robbins, 2 years ago)  (tag: gawk-5.1.0) Relase 5.1.0 tarball made.
ef83f3a1 (Arnold D. Robbins, 2 years, 10 months ago)  (tag: gawk-5.0.1) Make 5.0.1 tarball.
b4e4ba46 (Arnold D. Robbins, 3 years ago)  (tag: gawk-5.0.0) Files revised after 'make distcheck'.
bd8a8ad0 (Arnold D. Robbins, 4 years, 1 month ago)  (tag: gawk-4.2.1) Make 4.2.1 tarball.

On top of that, we could then write another extension that joins the log for a particular file under version control (in Git lingo: <path>) with that. It could show all tagged commits in between commits for the file, but there still could be a lot of tags between changes to a file. So let's default to just showing one tag before and after a commit that changes a file (and enable the full tag output via a command-line option):

$ git lgandtagged awklib/eg/lib/inplace.awk
2b1fffef (Arnold D. Robbins, 5 months ago)  (tag: gawk-5.1.1) Make 5.1.1 tar ball and diff file. [...]
40a6d096 (Arnold D. Robbins, 2 years ago)  (tag: gawk-5.1.0) Relase 5.1.0 tarball made.
8ff0d3a5 (Arnold D. Robbins, 2 years, 9 months ago)  Add backwards compatibility to inplace extension, update doc and tests.
ef83f3a1 (Arnold D. Robbins, 2 years, 10 months ago)  (tag: gawk-5.0.1) Make 5.0.1 tarball.
b4e4ba46 (Arnold D. Robbins, 3 years ago)  (tag: gawk-5.0.0) Files revised after 'make distcheck'.
663aff4a (Arnold D. Robbins, 3 years, 3 months ago)  Squashed merge of feature/namespaces. Add code and doc.
bd8a8ad0 (Arnold D. Robbins, 4 years, 1 month ago)  (tag: gawk-4.2.1) Make 4.2.1 tarball.
acab04b6 (Arnold D. Robbins, 4 years, 6 months ago)  (tag: gawk-4.2.0) Make 4.2.0 tarball.
167b3a79 (Arnold D. Robbins, 4 years, 8 months ago)  Apply GPL to inplace.awk.
6f10e610 (Arnold D. Robbins, 4 years, 10 months ago)  Expand tab characters in the doc.
50487de5 (Arnold D. Robbins, 6 years ago)  (tag: gawk-4.1.4) Make 4.1.4 release tar ball.
12e3cbe4 (Andrew J. Schorr, 7 years ago)  For the inplace extension, add inplace variable to control whether it's active.
7100f51d (Arnold D. Robbins, 7 years ago)  (tag: gawk-4.1.3) Use modern @image, fix a .txt image file.
ff9ab7fc (Arnold D. Robbins, 7 years ago)  (tag: gawk-4.1.2) Make 4.1.2 release.
e0c1194c (Arnold D. Robbins, 7 years ago)  Bug fix to inplace extension and doc updates.
db6a69ba (Arnold D. Robbins, 7 years ago)  (tag: eap4-last-fix-3) Minor doc fix. [...]
78193b5c (Arnold D. Robbins, 9 years ago)  (tag: gawk-4.1.0) Gawk 4.1.0 release.
81df0ef6 (Arnold D. Robbins, 9 years ago)  Minor edits to inplace extension doc.
1abfe5e8 (Andrew J. Schorr, 9 years ago)  Add inplace file editing extension.
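
The joining itself can be approximated with stock Git commands as well. Again, this is a sketch under assumptions and not how git-onelinelog-and-decorations works internally: log_path_and_tags is a hypothetical name, and it always merges in all tagged commits (which corresponds to the full tag output), without the reduction to one tag before and after each change. It assumes that the repository has at least one tag.

```shell
# Hypothetical helper: commits that changed the given path(s), interleaved
# with all tagged commits, ordered by commit time (newest first).
log_path_and_tags() {
    {
        git rev-list HEAD -- "$@"       # commits on the current branch touching the path(s)
        git rev-list --no-walk --tags   # commits that tags point to
    } | sort -u |
        # Show exactly the collected commits, without walking their ancestors.
        git log --no-walk --stdin --pretty='tformat:%h (%an, %ar) %d %s'
}
```

Collecting the commit IDs first and then handing them to a single git log keeps the formatting in one place; the price is that each line has to be sorted by commit time and not by topology.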

In reality, I first implemented the latter and then discovered the usefulness of the former. Under the hood, it's just a single git-onelinelog-and-decorations command. The special case is activated via a --only-decorations command-line flag:

$ git-onelinelog-and-decorations --help
One-line author, date, tags and commit summary for files in <path>, plus any ref
names, also of commits not covered.
This can be useful if tagging and development are done on separate branches, but
you want to find out which commits that changed a file were in a release.

Usage: git-onelinelog-and-decorations [--full-decorations] [<log-options>] [<revision range>] [--] <path> [...] [-?|-h|--help]

One-line author, date, tags and commit summary for commits that are decorated
with any ref names. Other (not tagged, not tip of a branch) commits are omitted.

Usage: git-onelinelog-and-decorations --only-decorations [<log-options>] [<revision range>] [--] [<path> ...] [-?|-h|--help]

This long and descriptive command is not meant for direct invocation; rather, it is used to define an lgandrefs alias (lg is my alias for the one-line log format, refs stands for references; I would have loved to use + in between, but that's not a valid character for a Git alias, so it has to be and). Along with that alias, several variants add git log's --decorate-refs filter to limit which refs are considered.

analysis

40a6d096 (Arnold D. Robbins, 2 years ago)  (tag: gawk-5.1.0) Relase 5.1.0 tarball made.
8ff0d3a5 (Arnold D. Robbins, 2 years, 9 months ago)  Add backwards compatibility to inplace extension, update doc and tests.
ef83f3a1 (Arnold D. Robbins, 2 years, 10 months ago)  (tag: gawk-5.0.1) Make 5.0.1 tarball.
b4e4ba46 (Arnold D. Robbins, 3 years ago)  (tag: gawk-5.0.0) Files revised after 'make distcheck'.
663aff4a (Arnold D. Robbins, 3 years, 3 months ago)  Squashed merge of feature/namespaces. Add code and doc.
bd8a8ad0 (Arnold D. Robbins, 4 years, 1 month ago)  (tag: gawk-4.2.1) Make 4.2.1 tarball.

With the change history so nicely laid out, it's easy to see what happened. After the 4.2.1 release, the namespace feature got introduced for the 5.x series, and landed in 5.0.0. Then, 5.0.1 got released, and that made it into Ubuntu 20.04. Backwards compatibility was only introduced after that, and is available from 5.1.0 onwards.

So, there are two versions out there (5.0.0 and 5.0.1) that break old clients that were written against 4.x. Too bad. If only the AWK developers had noticed the compatibility issue earlier!
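
A reading of the log like this can be cross-checked by asking Git which tags contain a given commit. This is just a small sketch around git tag --contains; the helper name tags_containing is made up for illustration.

```shell
# Hypothetical helper: list all tags whose history includes the given commit,
# sorted by version number.
tags_containing() {
    git tag --sort=version:refname --contains "$1"
}
```

In a clone of the gawk repository, tags_containing 8ff0d3a5 should then list only those releases that ship the fallback to the old variable.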

fix

With the problem existing in the long-term support release of Ubuntu, I have to implement a workaround. Fortunately, this is as easy as passing both the old and the new variable name. I've adapted my scripts accordingly, and all is well now.
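
In its simplest form, such a workaround could look like the following. This is a sketch and not the code from my scripts: the wrapper name gawk_inplace_backup is hypothetical, and it assumes gawk 5.0 or later, because older versions do not accept a namespaced name like inplace::suffix in a -v assignment.

```shell
# Hypothetical wrapper: in-place editing with a backup copy.
# Usage: gawk_inplace_backup SUFFIX PROGRAM FILE...
# Assumes gawk 5.0 or later with the bundled inplace extension.
gawk_inplace_backup() {
    suffix=$1
    program=$2
    shift 2
    # Set the old global variable and the new namespaced one, so that the
    # extension finds the suffix no matter which of the two it consults.
    gawk -i inplace \
        -v "INPLACE_SUFFIX=$suffix" \
        -v "inplace::suffix=$suffix" \
        "$program" "$@"
}
```

For example, gawk_inplace_backup .bak '{ gsub(/foo/, "bar"); print }' notes.txt would rewrite notes.txt and keep the original as notes.txt.bak.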

conclusion

It's never the right time for making these extensions; you always have something else to do at the moment you run into them. If you allow yourself the time to deviate and improve your tools, you'll reap these benefits:

  1. the clear requirement is right in front of you — later it will be much harder to remember what is needed
  2. that problem can be directly used to validate your extension (ideally you'll also write automated tests based on it)
  3. solving the problem at hand without the extension also takes time — if you build the tool now, its amortization has already started

If you can't afford to implement the complete extension right now, maybe a crude sketch will do. That still lowers the barrier to improving on it later; at least I feel more uncomfortable with such half-finished work still lying around, whereas a task in my todo list is soon forgotten and sinks ever deeper into the pile.

Ingo Karkat, 08-Apr-2022

ingo's blog is licensed under Attribution-ShareAlike 4.0 International
