linklint enhancements, bugfixes and updates
linklint-2.3.5_ingo_020 from 23-Feb-2008: Duplicate anchor detection, handling non-absolute redirects, new option -no_warn_files, bugfixes.
linklint-2.3.5_ingo_019 from 22-Jan-2008: Fixes to case and orphan checking, new options -ignore_orphan and -no_warn, various small bugfixes.
linklint-2.3.5_ingo_011 from 11-Nov-2007: Improved named anchor handling.
linklint-2.3.5_ingo_010 from 09-Nov-2007: Warning on literal spaces in URL. Small bugfixes.
linklint-2.3.5_ingo_009 from 03-Jul-2007: First published version.
Linklint is an open source Perl program that checks links on websites. The original Linklint site hasn't been updated since August 13, 2001; the latest release is Linklint 2.3.5. I've been using this cute tool to check my website for several years and have found it really helpful. I needed some enhancements (e.g. support for local files whose names contain spaces) and implemented some additional features, which was easy to do because the tool is written in Perl. Its original creator, James B. Bowlin, released the tool under the GNU General Public License. In this spirit of free software, I am sharing my modifications to Linklint in the hope that somebody finds them useful. Please note that this is not an official distribution and not an attempt to continue development of Linklint. If somebody else wants to do that, I'd gladly contribute my changes, though.
The downloadable package consists of the original distribution (zipped Windows version), plus the updated linklint.pl and modified HTML documentation. The original linklint executable is named linklint-2.3.5.
- ENH: Detecting duplicate anchor names in pages.
- ENH: Don't list robots.txt and favicon.ico as orphans.
- Files that are explicitly ignored (via '-ignore ignoreset') are also ignored in the orphan check; a separate '-ignore_orphan ignoreset' does not need to be specified.
- ENH: Added option -no_warn_files linkset (e.g. for generated content).
- ENH: Correct non-absolute redirect by prepending the originating absolute base URL to the incomplete redirect URL. Some web servers are a bit sloppy; we should only warn instead of failing with "not an http link".
- BF: Missing (un-)escaping in &LookupDir(). If seeds with escaped characters were specified, the index file wasn't detected properly, and files with filenames containing spaces in the seed directory were reported both in escaped and unescaped versions.
- Added option -ignore_orphan linkset with which certain files can be excluded from the orphan list.
- Added option -no_warn linkset to filter certain warnings based on the complete warning text or a PATTERN. (This is a generalization of -no_warn_index.)
- ENH: All linkset arguments now support specifying a regular expression in the format
m/PATTERN/
in addition to the literal expression.
- BF: The original orphan detection algorithm reported intermediate directories that had no linked files (but contained directories with linked files) as orphaned.
- ENH: The check for case mismatch now also detects mismatches in the path, not just in the file name.
- BF: The check for case mismatch now handles filenames with spaces. The wrong case is added as an annotation.
- BF: An empty anchor (e.g. "index.html#") was reported as a missing named anchor, though it is neither named nor missing.
- ENH: Recognize anchors defined via the 'id' attribute on any tag, as HTML 4 allows, not just via <a name="...">.
- Check for literal spaces in URLs. A warning is issued, and the literal space is escaped as %20, just like a browser would do.
- BF: Escaping characters to HTML entities when converting plain text report to HTML.
- BF: HTTP site check doesn't work if both -db7 and -db8 are set.
- Report local file:// links as warnings.
- Added <!DOCTYPE> declaration for HTML 3.2.
- Added highlighting of 'warn' in addition to 'ERROR'.
- BF: Do not shorten /../.. to /, so that all accesses above the server root can be reported.
- BF: Trim accesses above the server root directory, like a web browser would do. Report such bad URLs as warnings.
- Added additional image, audio and video file formats.
- BF: Option '-doc' now correctly interprets Windows drive letters as an absolute path.
- Also use the '.html' file extension for HTML reports on Windows.
- Small BF: missing space in urlindex.html heading.
- Now cross-linking index.html and urlindex.html if remote checking is enabled.
- Clumsy workaround for problem on Windows systems with MKS Toolkit Perl.
- BF: In local mode, filenames with spaces are reported as missing.
- Added no-caching meta tag for HTML report pages so that the latest reports are accessed through the webserver.
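
Three of the URL fixes listed above (non-absolute redirects, literal spaces, and accesses above the server root) can be illustrated with a short sketch. This is not code from linklint.pl, which is written in Perl; it is a minimal Python illustration, and the function names, warning texts and example.com URLs are all made up for this example. It also uses standard relative-reference resolution (`urljoin`) for the redirect case, which is an assumption and may differ in detail from simply prepending the base URL.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_redirect(base_url, location):
    """Resolve a redirect target against the URL that issued the redirect.

    Returns (absolute_url, warning). The warning is None when the
    target was already absolute. (Hypothetical helper, not linklint's API.)
    """
    if urlsplit(location).scheme:
        return location, None
    return urljoin(base_url, location), "non-absolute redirect: " + location


def escape_spaces(url):
    """Escape literal spaces as %20 and return a warning if any were found."""
    if " " in url:
        return url.replace(" ", "%20"), "literal space in URL: " + url
    return url, None


def trim_above_root(url):
    """Collapse '..' segments; warn if the path climbs above the server root."""
    parts = urlsplit(url)
    kept = []
    climbed = False
    # Skip the empty string before the leading slash of the path.
    for segment in parts.path.split("/")[1:]:
        if segment == "..":
            if kept:
                kept.pop()
            else:
                climbed = True  # nothing left to pop: access above the root
        else:
            kept.append(segment)
    trimmed = urlunsplit(parts._replace(path="/" + "/".join(kept)))
    warning = "access above server root: " + url if climbed else None
    return trimmed, warning


if __name__ == "__main__":
    print(resolve_redirect("http://example.com/docs/old.html", "new.html"))
    print(escape_spaces("http://example.com/my file.html"))
    print(trim_above_root("http://example.com/../../a/b.html"))
```

Each helper returns the corrected URL together with an optional warning, mirroring the "fix it, but report it" behaviour described in the list: the check can continue with the repaired URL while the sloppy original still shows up in the report.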