Good documentation is priceless blog home

Posted Friday, 29-Mar-2024 by Ingo Karkat

what I need

I needed to filter out some inconsequential warnings from our application's log file. These sounded like severe security issues, but the application itself wasn't aware that security had been turned off on their end by design, because we have a proxy in front of it that dealt with security (similar to SSL offloading). To avoid the confusion, these warnings should be suppressed — we're looking at all errors and warnings as part of our release process.

Luckily, there's RegexFilter in Log4j 2.x for that1. That link to the manual isn't in the top search results, rather a handful of Stack Overflow posts where users struggle with various details of the configuration. The XML configuration is rather tame, and I just have five fixed log messages to filter out, no complex pattern matching required. I just needed to know how to escape the literal text and how to anchor the match so that only the exact full messages are matched.

what I found

Unfortunately, the manual only provides this:

Parameter NameTypeDescription
regexStringThe regular expression.

Okay, since we're in the Java ecosystem, one can guess that this is a Java regular expression, not Posix or PCRE. But is it anchored to the begin and end — many examples add .* ... .* around the pattern? More information is needed.

With another search one soon finds the JavaDocs, and the createFilter() method seems to cover the configuration. Unfortunately, it doesn't provide more information on the regular expression syntax:

Parameters:
regex - The regular expression to match.

So I still have to try that out. But the next parameter is interesting for my use case of literal matching:

patternFlags - An array of Strings where each String is a Pattern.compile(String, int) compilation flag.

With a LITERAL flag, I should be able to just paste my messages without any manual escaping! This would really help with readability and also increases the probability that a literal text search of the warning would find the configuration, something that a future maintainer will do when that warning suppression has to be adapted or extended.

experi­ments

So, I'll have my experiments cut out for me:

  1. Is the regular expression anchored (so .* needs to be added to the beginning and end if only a part of the warning message is given), or do I need to add ^ and $ anchors?
  2. Try out the literal flag, and also Java's quotation via \Q...\E.

As I have 5 warning lines, I can test multiple combinations at once. This helps as the warnings are only issued once on application startup, so I need to shut down and restart the Docker Compose environment for each test, which takes a minute.

Except that… nothing worked the way I expected, I got very confusing behavior, the results seemed to depend on the ordering of the filters. 🤔

learnings

To cut short a long story, after a dozen restarts and a few more web searches, I had learned that:

discussion

Log4j is critical infrastructure, used millions of times all over the planet, every day. Yet it's an open source project, mostly staffed by volunteers in their free time. With a lot of demand (bugs, features, troubleshooting), things fall under the table. Currently, there are 858 open issues, the oldest at 14 years and the most recent one submitted four months ago. Despite heroic efforts, that's not sustainable. The feature started with a very nice and robust implementation; someone should have kept an eye on the whole thing and made sure that it was completed (or not merged at all). Alternatively, there should have been a critical discussion about the perceived value of such a feature — as Java supports embedded flags (for everything but the obscure CANON_EQ and LITERAL, which can use \Q...\E) already. Just one sentence in the manual hinting at (and linking to) those flags might have been enough.

Suppressing some warnings in a log isn't a crucial development activity. This should be quick and easy, but here, due to hidden complexities both technical and related to open source projects, the usual strategy of satisficing failed. From the user's perspective, the (naturally limited) effort was spent on the wrong things. Tragically, the initial part of the implementation was done so well (and already integrated into the library without marking it as "experimental" or "incomplete") that it fooled not just me into trying it out, also because the manual itself is dead silent on these common use cases. A few extra sentences at the beginning of the filter documentation would have prevented frustration and some wasted hours:

For case-insensitive, multi-line, or other special matching, use Java's embedded flags (link). Likewise, literal matching can be done by wrapping in \Q...\E. The regexp matches any text contained in the full log message; you need to explicitly anchor to begin (^) and/or end ($) if desired.
Multiple patterns can be combined via regexp branches (pattern1|pattern2) or (if unwieldy or different inclusions / exclusions are needed) via multiple RegexFilters, but you need to wrap those in <Filters> (link). Also note that DENY aborts further checks and NEUTRAL needs to be used.

From the library developer's point of view, omissions like these are very hard to recognize. These people been working for years on the library, know all of the ins and outs, and therefore have become oblivious to the challenges that newcomers face. (The German word betriebsblind covers this very well.) But it's important to appear friendly and inviting to new developers; first impressions count. Tragically, in this particular case, effort has already been spent, but unfortunately on a half-completed feature that's since been languishing in the code, and even confusing users occasionally. A fraction of that time would have been better spent on a bit of introductory and clarifying documentation.

Ingo Karkat, 29-Mar-2024


  1. Why is StringMatchFilter in the JavaDocs, but not mentioned in the filter manual page? It also doesn't help that its JavaDoc summary clearly is copy-and-pasted from the LevelMatchFilter; it mentions log levels, not strings. An accepted Stack Overflow answer says that (documented) MarkerFilter is the replacement for Log4j 1.x's StringMatchFilter, but that clearly doesn't work.(In my deepening confusion, I've tried even that!) Why are there filter classes that apparently aren't mentioned in the manual? (Also LevelRangeFilter (in the manual) vs. LevelMatchFilter (not).) Are these purely for backwards compatibility with the 1.x release? But why isn't there any deprecation note? I'm confused!

ingo's blog is licensed under Attribution-ShareAlike 4.0 International

blog comments powered by Disqus