Posted Friday, 29-Mar-2024 by Ingo Karkat
I needed to filter out some inconsequential warnings from our application's log file. These sounded like severe security issues, but the application itself wasn't aware that security had been turned off on their end by design, because we have a proxy in front of it that dealt with security (similar to SSL offloading). To avoid the confusion, these warnings should be suppressed — we're looking at all errors and warnings as part of our release process.
Luckily, there's RegexFilter in Log4j 2.x for that1. That link to the manual isn't in the top search results, rather a handful of Stack Overflow posts where users struggle with various details of the configuration. The XML configuration is rather tame, and I just have five fixed log messages to filter out, no complex pattern matching required. I just needed to know how to escape the literal text and how to anchor the match so that only the exact full messages are matched.
Unfortunately, the manual only provides this:
Parameter Name | Type | Description |
---|---|---|
regex | String | The regular expression. |
Okay, since we're in the Java ecosystem, one can guess that this is a Java regular expression, not Posix or PCRE. But is it anchored to the begin and end — many examples add .* ... .* around the pattern? More information is needed.
With another search one soon finds the JavaDocs, and the createFilter() method seems to cover the configuration. Unfortunately, it doesn't provide more information on the regular expression syntax:
Parameters:
regex - The regular expression to match.
So I still have to try that out. But the next parameter is interesting for my use case of literal matching:
patternFlags - An array of Strings where each String is a Pattern.compile(String, int) compilation flag.
With a LITERAL flag, I should be able to just paste my messages without any manual escaping! This would really help with readability and also increases the probability that a literal text search of the warning would find the configuration, something that a future maintainer will do when that warning suppression has to be adapted or extended.
So, I'll have my experiments cut out for me:
As I have 5 warning lines, I can test multiple combinations at once. This helps as the warnings are only issued once on application startup, so I need to shut down and restart the Docker Compose environment for each test, which takes a minute.
Except that… nothing worked the way I expected, I got very confusing behavior, the results seemed to depend on the ordering of the filters. 🤔
To cut short a long story, after a dozen restarts and a few more web searches, I had learned that:
A Filter will be called on one of its filter methods and will return a Result, which is an Enum that has one of 3 values - ACCEPT, DENY or NEUTRAL.So while the information is technically there, it's very easy for the unaware newcomer to overlook this fact and its implications for multiple filters. This is exacerbated by the many examples that just use a single filter (and therefore can get away with a simple ACCEPT / DENY combination that immediately causes a second added filter to be ignored).
<RegexFilter …> <PatternFlags>LITERAL</PatternFlags> <PatternFlags>CASE_INSENSITIVE</PatternFlags> </RegexFilter>A side quest led me to question the casing of that tag, or whether there would be separate <PatternFlag> tags inside a <PatternFlags> one (yay, don't we all love nested XML tags!)
I'm not sure how this will work.
Log4j is critical infrastructure, used millions of times all over the planet, every day. Yet it's an open source project, mostly staffed by volunteers in their free time. With a lot of demand (bugs, features, troubleshooting), things fall under the table. Currently, there are 858 open issues, the oldest at 14 years and the most recent one submitted four months ago. Despite heroic efforts, that's not sustainable. The feature started with a very nice and robust implementation; someone should have kept an eye on the whole thing and made sure that it was completed (or not merged at all). Alternatively, there should have been a critical discussion about the perceived value of such a feature — as Java supports embedded flags (for everything but the obscure CANON_EQ and LITERAL, which can use \Q...\E) already. Just one sentence in the manual hinting at (and linking to) those flags might have been enough.
Suppressing some warnings in a log isn't a crucial development activity. This should be quick and easy, but here, due to hidden complexities both technical and related to open source projects, the usual strategy of satisficing failed. From the user's perspective, the (naturally limited) effort was spent on the wrong things. Tragically, the initial part of the implementation was done so well (and already integrated into the library without marking it as "experimental" or "incomplete") that it fooled not just me into trying it out, also because the manual itself is dead silent on these common use cases. A few extra sentences at the beginning of the filter documentation would have prevented frustration and some wasted hours:
For case-insensitive, multi-line, or other special matching, use Java's embedded flags (link). Likewise, literal matching can be done by wrapping in \Q...\E. The regexp matches any text contained in the full log message; you need to explicitly anchor to begin (^) and/or end ($) if desired.
Multiple patterns can be combined via regexp branches (pattern1|pattern2) or (if unwieldy or different inclusions / exclusions are needed) via multiple RegexFilters, but you need to wrap those in <Filters> (link). Also note that DENY aborts further checks and NEUTRAL needs to be used.
From the library developer's point of view, omissions like these are very hard to recognize. These people been working for years on the library, know all of the ins and outs, and therefore have become oblivious to the challenges that newcomers face. (The German word betriebsblind covers this very well.) But it's important to appear friendly and inviting to new developers; first impressions count. Tragically, in this particular case, effort has already been spent, but unfortunately on a half-completed feature that's since been languishing in the code, and even confusing users occasionally. A fraction of that time would have been better spent on a bit of introductory and clarifying documentation.
Ingo Karkat, 29-Mar-2024
ingo's blog is licensed under Attribution-ShareAlike 4.0 International
blog comments powered by Disqus