Data Integrator
HSGeneOrthologyMapper - Orthology mapping between human and a model organism

Maps between Homo sapiens genes and their orthologs in Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Caenorhabditis elegans (worm). The tool uses Ensembl gene IDs wherever possible, and the mappings are derived from Ensembl's Compara pipeline via BioMart. Note that the Compara pipeline is an established procedure for deriving homology, but it is also highly dynamic: orthologies are not guaranteed to be stable between releases, and it has been observed that orthologs detected in one Ensembl release are no longer present in a later one. Sequence identity percentages also vary slightly between releases.

Input

In contrast to the gene ID mappers (HSGeneIdConverter - Homo sapiens gene ID converter, and DMGeneIdConverter - Drosophila melanogaster gene ID converter), this tool only maps between defined pairs of genes from different species and is centered on human data: it can only map from model organism genes to human genes, or from human genes to model organism genes. It is therefore currently not possible to map, for example, worm genes to mouse genes.
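The human-centered restriction can be sketched as a simple check. This is only an illustration: the species labels and the function name `is_supported_mapping` are assumptions made for this example and are not taken from the tool's actual options.

```python
# Species labels are illustrative; the tool's actual option values may differ.
MODEL_ORGANISMS = {"fly", "mouse", "worm"}


def is_supported_mapping(source: str, target: str) -> bool:
    """Return True if exactly one side is human and the other is a model organism."""
    return (source == "human" and target in MODEL_ORGANISMS) or (
        target == "human" and source in MODEL_ORGANISMS
    )


print(is_supported_mapping("fly", "human"))   # True
print(is_supported_mapping("worm", "mouse"))  # False
```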

Ensembl Compara divides its orthology predictions into two categories: confidently predicted orthologs and possible orthologs. By default, possible orthologs are excluded from the tool's output; they can be included optionally.
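The default behaviour can be illustrated with a minimal sketch. It assumes the confidence encoding listed in the Output section (1 for real, 0 for possible orthologs); the record type, the function name, and the gene IDs are hypothetical placeholders, not part of the tool.

```python
from typing import Iterable, List, NamedTuple


class Ortholog(NamedTuple):
    human_gene: str   # human Ensembl gene ID (placeholder values below)
    model_gene: str   # model organism Ensembl gene ID (placeholder values below)
    confidence: int   # 1 = confidently predicted, 0 = possible


def select_orthologs(
    records: Iterable[Ortholog], include_possible: bool = False
) -> List[Ortholog]:
    """Keep confident orthologs; keep possible ones only when requested."""
    return [r for r in records if r.confidence == 1 or include_possible]


records = [
    Ortholog("HUMAN_GENE_A", "FLY_GENE_A", 1),
    Ortholog("HUMAN_GENE_B", "FLY_GENE_B", 0),
]

default_output = select_orthologs(records)
full_output = select_orthologs(records, include_possible=True)
print(len(default_output), len(full_output))
```

With the default setting only the confidently predicted pair is kept; enabling `include_possible` returns both pairs.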

The mapped genes can optionally be annotated with additional data: the type of homology and the percentage of identity between the protein sequences of the orthologous genes. See the Output section for more details.

Options applicable to more than a single tool are summarized in common command line options.

Output

Depending on which organism was selected, the gene output column, which is the only mandatory output column, will have the name Human Ensembl gene ID, Drosophila Ensembl gene ID, Mouse Ensembl gene ID, or Caenorhabditis elegans Ensembl gene ID, respectively.

Based on the selection of additional columns, any or all of the following columns will be appended to the table. Irrespective of the selection, the order of the output columns is guaranteed to be as listed here, starting with the above-mentioned gene ID column:

  • Homology Type: Describes the relationship and quality of the orthologs. Because the Compara names are somewhat lengthy, they are abbreviated as summarized in the table below. See the Ensembl r68 Gene Orthology/Paralogy prediction method web site for more information on the method and the homology types.

    Compara Homology Type        DIntegrator abbreviation
    ortholog_one2many            1:N
    ortholog_one2one             1:1
    apparent_ortholog_one2one    1:1?
    ortholog_many2many           M:N

  • % Identity with respect to human gene: The percentage of identity is computed at the level of the protein sequence, even though genes are the primary entities for orthology. The value in this column is the number of identical amino acids in an alignment between the human and the model organism's protein sequences, divided by the length of the protein encoded by the human gene. This information is based on an Ensembl developer email from July 2011.

  • % Identity with respect to (model organism) gene: As described above, except that the number of identical amino acids is divided by the length of the protein sequence encoded by the gene in the model organism.

  • Orthology Confidence: Indicates whether an entry is a real or a possible ortholog.

    Orthology Confidence    DIntegrator abbreviation
    0 *                     possible
    1                       real

    * This entry appears only if possible orthologs were selected for inclusion.
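The column definitions above can be summarized in a short sketch. The abbreviation table and the identity formula follow the descriptions in this section; the function name `percent_identity` and the alignment figures are assumptions made purely for illustration and do not reflect the tool's internal code or real data.

```python
# Abbreviations as listed in the Homology Type table.
HOMOLOGY_ABBREVIATIONS = {
    "ortholog_one2many": "1:N",
    "ortholog_one2one": "1:1",
    "apparent_ortholog_one2one": "1:1?",
    "ortholog_many2many": "M:N",
}


def percent_identity(identical_residues: int, protein_length: int) -> float:
    """Identical amino acids in the alignment, relative to one protein's length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * identical_residues / protein_length


# Hypothetical alignment: 150 identical amino acids between a human protein
# of length 300 and a model organism protein of length 200.
identity_wrt_human = percent_identity(150, 300)
identity_wrt_model = percent_identity(150, 200)

print(HOMOLOGY_ABBREVIATIONS["apparent_ortholog_one2one"])  # 1:1?
print(identity_wrt_human, identity_wrt_model)               # 50.0 75.0
```

The same alignment yields two different percentages because each column uses a different protein length as the denominator.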

If there is no ortholog for a certain gene, the empty cell identifier will be output in each column.