Testing a standalone Perl script

Posted Sunday, 03-Mar-2019 by Ingo Karkat

Many home-grown Perl programs at least start off as single-file scripts; at that small size, there's no immediate need to split the code into a set of modules, which means less boilerplate code and very easy distribution. However, as most Perl test libraries are centered around modules, this inhibits automated testing, which is sad, because the easiest way to great test coverage is to start early and grow the tests along with the program. Fortunately, it is possible to implement tests even in the early stages of the program, without hindering future growth. This article presents one approach that combines Test::Script for overall black-box testing of the entire script with internal modules inside the script, which are then unit-tested in the usual way with Test::More. I use my reldate program as an example; I'll only show snippets here, so check out the repository for the full code.

black-box testing with Test::Script

How to write unit test cases for a Perl script clearly presents the problem: Writing a test for a module.pm is simple and straightforward, but how do I do the same for a script.pl? The answer first sidesteps the main issue and throws Test::Script into the ring, which allows basic black-box verification of the entire script. That in itself is certainly important, so my first test ensures that the Perl script has no syntax errors (i.e. it compiles), and that it can be invoked with --help.

t0000-compiles.t
use Modern::Perl;
use Test::More;
use Test::Script;

script_compiles('bin/reldate');
script_runs(['bin/reldate', '--help']);

done_testing;

Next, let's ensure that command-line argument parsing handles invalid arguments correctly. As I'm using Getopt::Long, I don't need to test the library itself, but rather my option definitions. This, too, is best checked by invoking the script directly:

t1000-invalid-arguments.t
[...]
script_runs(['bin/reldate', '--relative-to', 'isNoDate'], {exit => 2}, 'invalid BASE-DATE');
script_stderr_like qr{isNoDate is not a valid BASE-DATE!}, 'invalid BASE-DATE message';
[...]

enabling unit testing with modulino

How can I test a standalone Perl script points to Scripts as Modules, where brian d foy introduces the idea of a modulino, which basically places the script's main functionality in a run() function inside a local package.

#!/usr/bin/perl
package Local::Modulino;

__PACKAGE__->run( @ARGV ) unless caller();

sub run { print "I'm a script!\n" }

__END__

caller() returns the calling package name if another Perl file loads this one with use() or require(), but undef if the script is invoked directly; that check is used to run the script's main functionality only when it is executed as a script. I think it's similar to the frequently seen Python if __name__ == '__main__': idiom.
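
Here's a minimal standalone illustration of that mechanism (callerdemo.pl is just a throwaway example, not part of reldate):

callerdemo.pl
#!/usr/bin/perl
# In scalar context, caller() returns just the calling package name (or undef).
my $caller = caller();
print defined $caller ? "loaded by package $caller\n" : "run directly\n";

$ perl callerdemo.pl
run directly
$ perl -e 'require "./callerdemo.pl"'
loaded by package main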

As my script takes both command-line arguments and files, I actually need to pass references both to the command-line arguments (\@ARGV) and to the special filehandle that iterates over the command-line filenames (\*ARGV). The former is then passed to GetOptionsFromArray() from Getopt::Long, instead of using the default GetOptions(). Likewise, while (<>) becomes while (readline($inputHandleRef)). For separation of concerns, I only do command-line argument parsing inside run(), and delegate the actual script processing to a separate process() subroutine:

skeleton.pl
#!/usr/bin/perl
package Local::Reldate;

use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage;

my ($help, $man, $datePattern);

sub process {
    my ($inputHandleRef) = @_;

    while (readline($inputHandleRef)) {
        # ...
    }
}
sub run {
    my (undef, $argumentsRef, $inputHandleRef) = @_;

    GetOptionsFromArray($argumentsRef,
        'help|h|?' => \$help,
        'man' => \$man,
        'date-pattern=s' => \$datePattern,
    ) or pod2usage(2);
    pod2usage(-exitval => 0, -verbose => 0) if $help;
    pod2usage(-exitval => 0, -verbose => 2) if $man;

    process($inputHandleRef);
}

__PACKAGE__->run(\@ARGV, \*ARGV) unless caller();

unit testing with Test::More

Like many other command-line tools, my program takes input from files (or standard input) and prints the result to standard output. For that, the basic is() assertion from Test::More will do just fine. However, the tests should define their input inline as a multi-line string and capture the output in a string for convenient comparison. As that will be the same for all of the script's functionality (just with different combinations of command-line arguments, input, and expected output), let's write a test library that reduces the boilerplate code in each test to a minimum. This greatly aids maintainability and understandability, and keeps the test code DRY.

Test/Reldate.pm
package Test::Reldate;

use Modern::Perl;
use Test::More;
use autodie 'open';

require 'bin/reldate';

use Exporter 'import';
our @EXPORT = qw(run_produces_output run_with_input_produces_output);

sub run_with_input_produces_output {
    my ($argumentsRef, $input, $expected_output, $test_name) = @_;

    my $output;

    open(my $outputHandle, '>', \$output) or die "Can't open memory file: $!";
    my $originalHandle = select $outputHandle;

    my $inputHandle;
    if (defined $input) {
        open($inputHandle, '<', \$input);
    }

    Local::Reldate->run($argumentsRef, $inputHandle);
    chomp $output;
    is($output, $expected_output, $test_name);

    select $originalHandle;
}
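
# The tests below also call run_produces_output() for cases without any
# input; a minimal implementation just delegates with an undefined input:
sub run_produces_output {
    my ($argumentsRef, $expected_output, $test_name) = @_;
    return run_with_input_produces_output($argumentsRef, undef, $expected_output, $test_name);
}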

1;

Here's the first of many similar tests, covering the --timespan option:

t2000-timespans.t
use Modern::Perl;
use Test::More;
use Test::Reldate;

run_produces_output(['--timespan', '60'], '1 minute', 'timespan of 60 seconds');
run_produces_output(['--timespan', '+60'], '1 minute ago', 'timespan of +60 seconds');
run_produces_output(['--timespan', '-60'], 'in 1 minute', 'timespan of -60 seconds');

run_produces_output(['--timespan', '120'], '2 minutes', 'timespan of 120 seconds');
run_produces_output(['--timespan', '122'], '2 minutes', 'timespan of 122 seconds');

run_produces_output(['--timespan', '1'], '1 second', 'timespan of 1 second');
run_produces_output(['--timespan', '0'], 'just now', 'timespan of 0 seconds');

run_produces_output(['--timespan', '86400'], '1 day', 'timespan of 1 day');
run_produces_output(['--timespan', '+86400'], 'yesterday', 'timespan of +1 day');
run_produces_output(['--timespan', '-86400'], 'tomorrow', 'timespan of -1 day');

done_testing;

If the same input or expected output occurs in multiple test runs, a variable (holding the multi-line content, here built with qq{}) can be used to avoid duplicating it:

my $input = qq{
At 20180501_071159, we said "20180501_083000 will happen".
At 20200601_120000, he said "Between 20200530_120000 and 20200604_120000".};

run_with_input_produces_output(['--relative-to', '20190108_060000'], $input, qq{
At 252 days before it, we said "252 days before it will happen".
At 1 year later than it, he said "Between 1 year later than it and 1 year later than it".}, 'relative to base date');

I'm not completely comfortable calling these tests unit tests, as they still cover all of the script's main logic and just segregate the different options and parameters. It would be easy to skip the command-line argument parsing (and directly invoke process() instead of run()), but we'd need to define a data structure holding the resulting flags and strings (or directly expose those definitions inside the modulino), and that looks like a lot of work, additional code, and indirection. It might make sense if there were a lot of dependencies and rules among the options, though, as we could then test the parsing separately. But in this case, I'd like to keep it simple (YAGNI), and directly using the command-line options as test arguments clearly conveys each test's purpose.
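
For illustration, such a test might look roughly like the following sketch. It presupposes a hypothetical refactoring in which process() accepts a hash reference of already-parsed options; the option structure and the assertion are made up, so take it as a sketch of the idea rather than as working code against the current script:

use Modern::Perl;
use Test::More;

require 'bin/reldate';

# Hypothetical: bypass run() and its option parsing, and hand process()
# an already-parsed options structure plus an in-memory input handle.
my %options = ('date-pattern' => '\d{8}_\d{6}');

my $input = qq{At 20180501_071159, something happened.\n};
open(my $inputHandle, '<', \$input) or die "Can't open memory file: $!";

my $output;
open(my $outputHandle, '>', \$output) or die "Can't open memory file: $!";
my $originalHandle = select $outputHandle;

Local::Reldate::process(\%options, $inputHandle);
select $originalHandle;

like($output, qr{something happened}, 'line got processed');

done_testing;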

running it

Tests can be run individually through perl, with output in the TAP format:

$ perl t/t2000-timespans.t
ok 1 - timespan of 60 seconds
ok 2 - timespan of +60 seconds
ok 3 - timespan of -60 seconds
ok 4 - timespan of 120 seconds
ok 5 - timespan of 122 seconds
ok 6 - timespan of 1 second
ok 7 - timespan of 0 seconds
ok 8 - timespan of 1 day
ok 9 - timespan of +1 day
ok 10 - timespan of -1 day
1..10

Also, the prove command (from core TAP::Harness) can be used to interpret the output and provide a shorter summary, which is nice for running all tests together:

$ prove
t/t0000-compiles.t .................. ok
t/t1000-invalid-arguments.t ......... ok
t/t2000-timespans.t ................. ok
t/t3000-epoch.t ..................... ok
t/t3010-epoch-weekday.t ............. ok
t/t4000-simple-file-contents.t ...... ok
t/t4010-simple-multiple.t ........... ok
t/t5000-delta-to-first.t ............ ok
t/t5010-delta-to-first-within.t ..... ok
t/t5100-delta-each.t ................ ok
t/t5110-delta-each-within.t ......... ok
t/t5200-relative-to-first.t ......... ok
t/t5210-relative-to-first-within.t .. ok
t/t5300-relative-each.t ............. ok
t/t5310-relative-each-within.t ...... ok
t/t5500-reset.t ..................... ok
t/t5900-relative-to.t ............... ok
t/t6000-with-weekday.t .............. ok
t/t6100-prefer.t .................... ok
t/t6200-prefix.t .................... ok
t/t6300-keep-width.t ................ ok
t/t6300-precision.t ................. ok
t/t7000-default-date-patterns.t ..... ok
t/t7100-custom-date-patterns.t ...... ok
t/t7110-weird-date-pattern.t ........ ok
All tests successful.
Files=25, Tests=137,  4 wallclock secs ( 0.10 usr  0.02 sys +  2.83 cusr  0.22 csys =  3.17 CPU)
Result: PASS

This is working pretty well for me, and I plan to cover any future Perl scripts in the same way, too.

Ingo Karkat, 03-Mar-2019

ingo's blog is licensed under Attribution-ShareAlike 4.0 International
