Data Integrator (Perl API)
Command-line tools share a common set of functions and behavior that every tool is required to implement.
The initialization framework is rooted in the Dint::Init class. It provides a set of modules that can be used independently of each other (depending on the required functionality) and that initialize the global state of the program. Typically, each module introduces a single new global object.
The main module, which must be "use"d first, is Dint::Init::Cmd. It creates the global $LOG, $CFG and $ARGS objects (for Log4perl, Dint::Config and Dint::ArgumentParser respectively). All global objects are provided already initialized.
The initialization framework relies on generic boilerplate code present in Dint::Init, which provides common helper functions.
A typical tool starts by including the required modules, beginning with Dint::Init::Cmd.
It then needs to define the following mandatory module variables:
$VERSION: a quoted version string (see the version module reference for documentation).
$CMD: the command-line invocation syntax (used to generate the help).
$DSC: a reasonably short tool description.
Arguments can be added through the global $ARGS object (whose usage is documented in Dint::ArgumentParser). Normally you only need to add new flags through the add() method.
Once parse() is called, the flags and their values are available through the $ARGS->opts() hash.
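The exact signatures of add(), parse() and opts() are documented in Dint::ArgumentParser and are not repeated here. As a rough illustration of the same flow (declare flags, parse, then read a hash keyed by flag name), the sketch below uses the core Getopt::Long module as a stand-in. The flag names and the sample command line are assumptions made for the example, not part of the Dint API.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the command line; a real tool would parse @ARGV.
my @argv = ('-c', '3', '--verbose', 'input.tsv');

# Flag specifications declared up front (the role add() plays above).
my @specs = ('c=i', 'verbose');

# Parse into a hash keyed by flag name (the role of parse() and opts()).
my %opts;
GetOptionsFromArray(\@argv, \%opts, @specs)
    or die "invalid command line\n";

# Non-flag arguments are left behind in @argv.
printf "column=%d verbose=%d remaining=%s\n",
    $opts{c}, ($opts{verbose} ? 1 : 0), "@argv";
```

Flags that were not given on the command line are simply absent from the hash, so a tool should check for them before use.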
After the command-line arguments are checked, but before creating any actual Dint/EnsEMBL object or class, the delayed initialization of the modules should be started by calling Dint::Init::init().
The Dint::Init::Reader module is then typically used to iterate over the input, line by line.
The following example is a simple "cut" replacement that prints the content of the column specified with the -c flag, making use of most of the initialization framework classes. The example, as it is, supports customizable empty cell markers, multiple data versions, common help flags, input error checking, and an automatic progress meter.
Several modules need to perform lengthy initialization (such as Dint::Init::EnsEMBL) and/or use the contents of the command-line arguments (such as Dint::Init::Reader), which are only available after the module has been included and the arguments have been parsed.
To overcome this problem, modules inside Dint::Init make use of Dint::Init::register(), which implements a simple callback mechanism.
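A callback mechanism of this kind can be sketched in a few lines of plain Perl. The package name My::Init and its internals are hypothetical stand-ins chosen for illustration. The actual implementation of Dint::Init::register() and Dint::Init::init() is not shown in this document and may differ.

```perl
use strict;
use warnings;

package My::Init;    # hypothetical stand-in, not Dint::Init itself

my @callbacks;       # delayed initializers, kept in registration order

# Called by modules at load time: remember the work, do not run it yet.
sub register { my ($code) = @_; push @callbacks, $code; return; }

# Called by the tool once its arguments are parsed and checked.
sub init { $_->() for splice @callbacks; return; }

package main;

my @trace;
My::Init::register(sub { push @trace, 'reader ready' });
My::Init::register(sub { push @trace, 'database connected' });

push @trace, 'arguments checked';    # no callback has run so far
My::Init::init();

print join(', ', @trace), "\n";
# prints "arguments checked, reader ready, database connected"
```

Emptying the list inside init() means that a second call does nothing, which keeps repeated initialization harmless in this sketch.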
Practically speaking, a module can register its own command-line arguments normally, through the global $ARGS object. For argument validation, or for setting an internal variable, using the Dint::ArgumentParser::handle() method directly might be sufficient. If not, the initialization can be performed in the callback when the program calls Dint::Init::init(). Letting the tool choose when to execute the initialization has the advantage that each tool can perform its own argument validation before any remote connection is established, and/or skip the initialization entirely when it is not needed.
As a general rule, all modules register their arguments using Dint::ArgumentParser::add(). The flag is validated immediately using Dint::ArgumentParser::handle(), which usually also stores the value in a module-local variable. Then, in the delayed callback, the module-local variable is actually used.
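The three steps of this rule can be illustrated without the Dint modules. In the sketch below, a Getopt::Long option handler stands in for Dint::ArgumentParser::handle(), and a plain array of code references stands in for the Dint::Init registry. Both substitutions, as well as the -c flag semantics and the sample data, are assumptions made for the example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my @delayed;    # hypothetical stand-in for the Dint::Init registry
my $column;     # module-local variable, set by the flag handler
my $result;

# Steps 1 and 2: register the flag and validate it as soon as it is parsed.
my %spec = (
    'c=i' => sub {
        my ($name, $value) = @_;
        die "-$name must be >= 1\n" if $value < 1;
        $column = $value - 1;    # store as a zero-based index
    },
);

# Step 3: register the delayed work, which only reads the validated variable.
push @delayed, sub {
    my @cells = split /\t/, "a\tb\tc";
    $result = $cells[$column];
};

my @argv = ('-c', '2');
GetOptionsFromArray(\@argv, %spec) or die "invalid command line\n";
$_->() for @delayed;    # what the tool's init() call would trigger

print "$result\n";      # prints "b"
```

Because validation happens during parsing, an invalid flag is rejected before any delayed work starts.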