archiver/retriever: Handle USR1 signal to skip delay timers
Implement USR1 signal handlers to allow bypassing current delay timers
to force operations to proceed.

This is useful when doing manual testing or error recovery, allowing
admins to skip the wait incurred while ENDIT tries to perform tape-optimal
operations.

Solves #23
ZNikke committed Feb 19, 2021
1 parent d37bfb1 commit b7a3ad0
Showing 3 changed files with 43 additions and 5 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -207,6 +207,19 @@ To run multiple instances for different tape pools on one host, the `ENDIT_CONFI
 to use a different configuration file. This is not to be confused with enabling parallel/multiple archive and
 retrieve operations for one pool which is done using options in the ENDIT daemon configuration file.

+# Bypassing delays/threshold/timers when testing
+
+The ENDIT daemons are designed to avoid unnecessary tape mounts, and achieve
+this by employing various thresholds and timers as explained in the example
+configuration file.
+
+However, when doing functional tests or error recovery related to the tape
+system, it can be really frustrating having to wait longer than
+necessary. For these situations it's suitable to use the `USR1` signal
+handling in the ENDIT daemons. In general, the `USR1` signal tells the
+daemons to disregard all timers and thresholds and perform any pending
+actions immediately.
+
 # Collaboration

 It's all healthy perl, no icky surprises, we hope. Patches, suggestions, etc are
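The README says what `USR1` does but not how to deliver it. From a shell this is normally `kill -USR1 <pid>` against the running daemon. The sketch below does the same from Perl, under stated assumptions: the child process is a hypothetical stand-in for an ENDIT daemon (it is not tsmarchiver.pl or tsmretriever.pl), the 60-second delay is arbitrary, and the pipe exists only so the parent does not signal the child before its handler is installed (an unhandled `USR1` terminates a process by default).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

pipe(my $rd, my $wr) or die "pipe: $!";
my $pid = fork() // die "fork: $!";

if ($pid == 0) {
    # Child: hypothetical stand-in for a daemon that would hold back
    # work for up to 60 seconds (compare archiver_timeout).
    close $rd;
    my $skipdelays = 0;                    # set by the USR1 handler
    $SIG{USR1} = sub { $skipdelays = 1; }; # handler only records the request
    syswrite($wr, "R");                    # handler installed: tell the parent
    close $wr;
    my $deadline = time() + 60;
    # Re-check the flag at least once a second until the delay expires.
    sleep(1) while !$skipdelays && time() < $deadline;
    POSIX::_exit($skipdelays ? 0 : 1);     # exit code 0 = delay was skipped
}

# Parent: the Perl equivalent of running `kill -USR1 <pid>` in a shell.
close $wr;
sysread($rd, my $ready, 1) or die "child did not start";
kill('USR1', $pid) or die "kill: $!";
waitpid($pid, 0);
my $child_status = $?;

if ($child_status & 127) {
    print "child killed by signal ", $child_status & 127, "\n";
} elsif ($child_status >> 8 == 0) {
    print "child skipped its delay\n";
} else {
    print "child waited out the full delay\n";
}
```

How the PID of a real daemon is found (for example with `pgrep`) is outside the scope of this sketch.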
9 changes: 8 additions & 1 deletion tsmarchiver.pl
@@ -34,6 +34,7 @@
 $Endit::logsuffix = 'tsmarchiver.log';
 my $filelist = "tsm-archive-files.XXXXXX";
 my $dsmcpid;
+my $skipdelays = 0; # Set by USR1 signal handler

 ##################
 # Helper functions
@@ -109,6 +110,7 @@ INIT
 $SIG{QUIT} = sub { warn("Got SIGQUIT, exiting...\n"); killchild(); exit; };
 $SIG{TERM} = sub { warn("Got SIGTERM, exiting...\n"); killchild(); exit; };
 $SIG{HUP} = sub { warn("Got SIGHUP, exiting...\n"); killchild(); exit; };
+$SIG{USR1} = sub { $skipdelays = 1; };

 my $desclong="";
 if($conf{'desc-long'}) {
@@ -125,6 +127,7 @@ INIT
 getdir($dir, \%files);

 if(!%files) {
+$skipdelays = 0; # Ignore irrelevant request by USR1 signal
 printlog "No files, sleeping for $conf{sleeptime} seconds" if($conf{debug});
 sleep($conf{sleeptime});
 next;
@@ -152,7 +155,11 @@ INIT
 if(!defined($timer)) {
 $timer = 0;
 }
-if($timer < $conf{archiver_timeout}) {
+if($skipdelays) {
+$skipdelays = 0; # Reset state set by USR1 signal
+printlog "$usagestr below threshold and only waited $timer seconds, but proceeding anyway as instructed by USR1 signal";
+}
+elsif($timer < $conf{archiver_timeout}) {
 if($conf{debug} || $conf{verbose} && $usagestr ne $lastusagestr) {
 printlog "$usagestr below threshold, waiting for more data (waited $timer seconds)";
 }
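In tsmarchiver.pl the request behaves as a one-shot flag: it is cleared the first time it overrides the `archiver_timeout` wait, and it is discarded when there are no files to archive. A minimal sketch of that decision logic, assuming a hypothetical `decide()` helper that stands in for one pass of the main loop while usage is below the threshold (the real loop also logs, sleeps and evaluates the usage threshold, none of which is modelled here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $skipdelays = 0;                     # set by the USR1 handler
$SIG{USR1} = sub { $skipdelays = 1; };  # handler only records the request

# Hypothetical helper: what one archiver pass decides while usage is
# below the threshold. Returns 'idle', 'wait' or 'archive'.
sub decide {
    my ($nfiles, $timer, $timeout) = @_;
    if (!$nfiles) {
        $skipdelays = 0;                # nothing to archive: drop the request
        return 'idle';
    }
    if ($skipdelays) {
        $skipdelays = 0;                # one-shot: consume the request
        return 'archive';
    }
    return $timer < $timeout ? 'wait' : 'archive';
}

my @log;
push @log, decide(5, 10, 3600);         # no request yet: wait
kill 'USR1', $$;                        # as if an admin ran kill -USR1 <pid>
push @log, decide(5, 20, 3600);         # request pending: archive
push @log, decide(5, 30, 3600);         # request already consumed: wait
kill 'USR1', $$;
push @log, decide(0, 0, 3600);          # no files: request discarded, idle
push @log, decide(5, 0, 3600);          # nothing pending: wait
print join(' ', @log), "\n";            # prints: wait archive wait idle wait
```

Because the handler only sets `$skipdelays`, every decision stays in the main loop, which matches the shape of the diff; sending the signal to the script itself with `kill 'USR1', $$` is just a convenient way to exercise the handler in one file.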
26 changes: 22 additions & 4 deletions tsmretriever.pl
@@ -26,9 +26,13 @@

# Add directory of script to module search path
use lib dirname (__FILE__);

use Endit qw(%conf readconf printlog);

###########
# Variables
$Endit::logsuffix = 'tsmretriever.log';
my $skipdelays = 0; # Set by USR1 signal handler

# Turn off output buffering
$| = 1;
@@ -60,6 +64,7 @@ ()
 $SIG{QUIT} = sub { warn("Got SIGQUIT, exiting...\n"); killchildren(); exit; };
 $SIG{TERM} = sub { warn("Got SIGTERM, exiting...\n"); killchildren(); exit; };
 $SIG{HUP} = sub { warn("Got SIGHUP, exiting...\n"); killchildren(); exit; };
+$SIG{USR1} = sub { $skipdelays = 1; };

 sub checkrequest($) {
 my $req = shift;
@@ -335,13 +340,25 @@ ()
 }

 if($tape ne 'default' && defined $lastmount{$tape} && $lastmount{$tape} > time - $conf{retriever_remountdelay}) {
-printlog "Skipping volume $tape, last mounted at " . strftime("%Y-%m-%d %H:%M:%S",localtime($lastmount{$tape})) . " which is more recent than remountdelay $conf{retriever_remountdelay}s ago" if($conf{verbose});
-next;
+my $msg = "volume $tape, last mounted at " . strftime("%Y-%m-%d %H:%M:%S",localtime($lastmount{$tape})) . " which is more recent than remountdelay $conf{retriever_remountdelay}s ago";
+if($skipdelays) {
+printlog "Proceeding due to USR1 signal despite $msg";
+}
+else {
+printlog "Skipping $msg" if($conf{verbose});
+next;
+}
 }

 if($tape ne 'default' && $job->{$tape}->{tsoldest} > time()-$conf{retriever_reqlistfillwaitmax} && $job->{$tape}->{tsnewest} > time()-$conf{retriever_reqlistfillwait}) {
-printlog "Skipping volume $tape, request list $job->{$tape}->{listsize} entries and still filling, oldest " . strftime("%Y-%m-%d %H:%M:%S",localtime($job->{$tape}->{tsoldest})) . " newest " . strftime("%Y-%m-%d %H:%M:%S",localtime($job->{$tape}->{tsnewest})) if($conf{verbose});
-next;
+my $msg = "volume $tape, request list $job->{$tape}->{listsize} entries and still filling, oldest " . strftime("%Y-%m-%d %H:%M:%S",localtime($job->{$tape}->{tsoldest})) . " newest " . strftime("%Y-%m-%d %H:%M:%S",localtime($job->{$tape}->{tsnewest}));
+if($skipdelays) {
+printlog "Proceeding due to USR1 signal despite $msg";
+}
+else {
+printlog "Skipping $msg" if($conf{verbose});
+next;
+}
 }

 my ($lf, $listfile) = tempfile("$tape.XXXXXX", DIR=>"$conf{dir}/requestlists", UNLINK=>0);
@@ -507,4 +524,5 @@ ()
 }
 }
 }
+$skipdelays = 0; # Reset state set by USR1 signal
 }
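tsmretriever.pl handles the flag differently: both per-volume checks (`retriever_remountdelay` and the request-list fill wait) consult `$skipdelays` without clearing it, and the reset sits after the per-volume processing, near the end of the main loop. One `USR1` therefore appears to release every held-back volume considered in that pass, not just the first. The sketch below models this under simplifying assumptions: each volume is reduced to a hypothetical "held" boolean, the volume names are made up, and started volumes are not removed between passes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $skipdelays = 0;                     # set by the USR1 handler
$SIG{USR1} = sub { $skipdelays = 1; };

# Hypothetical per-volume state: 1 means a remount delay or
# request-list fill wait would normally hold the volume back.
my %held = (VOL001 => 1, VOL002 => 0, VOL003 => 1);

# One retriever-style pass: returns the volumes that would be started.
sub run_pass {
    my @started;
    for my $tape (sort keys %held) {
        next if $held{$tape} && !$skipdelays;  # normally: skip for now
        push @started, $tape;           # delay ignored while USR1 is pending
    }
    $skipdelays = 0;                    # reset once, after the whole pass
    return @started;
}

my @pass1 = run_pass();                 # VOL002
kill 'USR1', $$;                        # as if an admin ran kill -USR1 <pid>
my @pass2 = run_pass();                 # VOL001 VOL002 VOL003
my @pass3 = run_pass();                 # VOL002
print "@pass1 | @pass2 | @pass3\n";
```

Resetting per pass rather than per volume suits the retriever, where a single request from an admin is presumably meant to flush all queued volumes at once.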
