Data Integrator
The tool gcoordsconservation
gets sequence conservation status, GERP scores or GERP "Constrained Elements" for the specified genomic coordinates and supplied method/species set.
Several conservation methods are supported, and are divided into three different categories, which behave slightly differently:
For details on the conservation methods themselves, please consult the official Ensembl Compara documentation.
This tool utilizes the Ensembl Perl API.
The input consists of a single column containing the genomic coordinates. The method/species-set to be extracted must be supplied on the command line through the --mss
option.
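To check an input column before running the tool, the coordinates can be validated with a few lines of Python. This is only a sketch: the layout `assembly:chromosome:start[-end]` is inferred from the example input on this page rather than from a formal specification, and the names `COORD_RE`, `GenomicCoord` and `parse_coord` are hypothetical helpers, not part of the tool.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the examples on this page:
#   assembly:chromosome:start        (single position)
#   assembly:chromosome:start-end    (range)
COORD_RE = re.compile(r"^([^:\s]+):([^:\s]+):(\d+)(?:-(\d+))?$")


class GenomicCoord(NamedTuple):
    assembly: str
    chromosome: str
    start: int
    end: int  # equal to start for a single position


def parse_coord(text: str) -> GenomicCoord:
    """Parse one coordinate string; raise ValueError if it does not match."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized genomic coordinate: {text!r}")
    assembly, chromosome, start, end = match.groups()
    start_pos = int(start)
    # A missing end means the coordinate refers to a single position.
    end_pos = int(end) if end is not None else start_pos
    return GenomicCoord(assembly, chromosome, start_pos, end_pos)
```

A header line such as `GC` does not match the pattern and raises `ValueError`, so it should be skipped before parsing when headers are in use.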
A list of all available methods/species-sets can be obtained by passing --list-mss
on the command line.
Options applicable to more than a single tool are summarized in common command line options.
The sequence conservation status or score is output as a new column. If headers are used, the new column is named after the method/species-set supplied on the command line.
A list of all available methods/species-sets can be obtained by the following command:
$ ./gcoordsconservation.pl --data-version v-01 --list-mss
which results in the following list in Ensembl r100 (GRCh38):
103 eutherian mammals EPO-Low-Coverage
103 eutherian mammals GERP Conservation Scores
103 eutherian mammals GERP Constrained Elements
13 primates EPO
27 primates EPO-Low-Coverage
49 mammals EPO
81 amniota vertebrates GERP Conservation Scores
81 amniota vertebrates GERP Constrained Elements
81 amniota vertebrates Mercator-Pecan
Given the input file /tmp/gc.tsv
GC
GRCh38:1:18916968
GRCh38:1:18917308
GRCh38:14:72904497-73054722
the command
$ ./gcoordsconservation.pl -H --data-version v-01 -c 1 --mss '81 amniota vertebrates Mercator-Pecan' </tmp/gc.tsv
will produce the following output (using Ensembl r100 database):
GC	81 amniota vertebrates Mercator-Pecan
GRCh38:1:18916968	1
GRCh38:1:18917308	0
GRCh38:14:72904497-73054722	1
The same input using the 103 eutherian mammals GERP Conservation Scores
method/species-set instead returns the following GERP scores:
GC	103 eutherian mammals GERP Conservation Scores
GRCh38:1:18916968	0.69
GRCh38:1:18917308	1.88
GRCh38:14:72904497-73054722	4.32
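Downstream of the tool, the added score column can be filtered with a short script. The following is a minimal sketch, not part of the tool itself: it assumes the output is tab-separated with a header line (as produced with -H) and that the score sits in the last column. The function name `rows_at_least` and the threshold of 1.5 are arbitrary illustrations.

```python
import csv
import io
from typing import List, Tuple

# Output of the GERP example above, assumed here to be tab-separated.
SAMPLE = (
    "GC\t103 eutherian mammals GERP Conservation Scores\n"
    "GRCh38:1:18916968\t0.69\n"
    "GRCh38:1:18917308\t1.88\n"
    "GRCh38:14:72904497-73054722\t4.32\n"
)


def rows_at_least(tsv_text: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (coordinate, score) pairs whose score is >= threshold."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header line
    kept = []
    for row in reader:
        if len(row) < 2:
            continue  # no score column on this line
        try:
            score = float(row[-1])
        except ValueError:
            continue  # defensively skip any non-numeric value
        if score >= threshold:
            kept.append((row[0], score))
    return kept


high = rows_at_least(SAMPLE, 1.5)
```

With the sample above, `high` holds the second and third coordinates, whose scores are 1.88 and 4.32.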
Please note that GRCh37 conservation data are not available. Therefore, a command targeting GRCh37 data like
$ ./gcoordsconservation.pl --data-version v-00 --list-mss
will result in a warning message as follows:
gcoordsconservation.pl WARN: Problem during initialization. Compara database missing?
and output NA
on a single line. This value can itself be passed to the --mss
option, which leads to the same warning and no output at all; this is the behavior in the Galaxy installation.
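A wrapper script can guard against this case by inspecting the captured --list-mss output before choosing a method/species-set. The sketch below assumes the list is printed one name per line and that a lone NA line is the only placeholder. The function name `available_mss` is hypothetical.

```python
from typing import List


def available_mss(list_output: str) -> List[str]:
    """Return the method/species-set names from captured --list-mss output.

    An output consisting only of the NA placeholder is treated as
    "no conservation data available" and yields an empty list.
    """
    names = [line.strip() for line in list_output.splitlines() if line.strip()]
    if names == ["NA"]:
        return []
    return names
```

An empty result can then be used to skip the conservation step entirely, for example when working with GRCh37 data.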