Data Integrator
HSGeneIdConverter - Homo sapiens gene ID converter

Human genes are described by many different identifiers. This tool converts between the most common of them: the Ensembl gene ID, the NCBI Entrez gene ID, the HGNC primary gene symbol, and the HGNC gene alias. The first two are considered very stable identifiers, whereas HGNC primary symbols are more subject to change. For example, the gene on chromosome 11 at bp 8932686-8941631 had the HGNC gene symbol "C11orf17" in Ensembl 61, but by Ensembl 62 it was already "AKIP1". The situation is worse for gene aliases: they are inconsistent among themselves, and the HGNC primary symbol of one gene can also be used as an alias for a different gene.

Examples of inconsistencies:

  • HGNC primary symbol "BCAM" (ENSG00000187244) is also used as a gene alias for HGNC primary symbol "BCAT2" (ENSG00000105552). Other examples include "ARX", "CAMP", "GRIN1", and about 450 more as of the date of writing this page (2012-12-08).

  • Gene alias "SPO" is used for both HGNC primary gene symbols "LPO" (ENSG00000167419) and "SYNPR" (ENSG00000163630). The same holds for gene alias "BCAP" (primary symbols "PHF11" and "PIK3AP1"), "BDP" ("ARID3B" and "PPM1K"), "BRCC1" ("BRCA1" and "NARG2"), and many more.
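
Both kinds of inconsistency can be detected mechanically. The following is a minimal Python sketch, not part of the tool itself; the record layout and the function name find_alias_conflicts are assumptions made for illustration, and the data is limited to the genes named above.

```python
from collections import defaultdict

# Illustrative records only: (Ensembl gene ID, HGNC primary symbol, aliases).
# The alias lists are trimmed to the examples mentioned in the text.
RECORDS = [
    ("ENSG00000187244", "BCAM", []),
    ("ENSG00000105552", "BCAT2", ["BCAM"]),
    ("ENSG00000167419", "LPO", ["SPO"]),
    ("ENSG00000163630", "SYNPR", ["SPO"]),
]


def find_alias_conflicts(records):
    """Return (shadowing, ambiguous) alias dictionaries.

    shadowing: alias that is also the primary symbol of a different gene
    ambiguous: alias shared by more than one primary symbol
    """
    primary_symbols = {symbol for _, symbol, _ in records}
    alias_to_symbols = defaultdict(set)
    for _, symbol, aliases in records:
        for alias in aliases:
            alias_to_symbols[alias].add(symbol)

    shadowing = {
        alias: sorted(symbols - {alias})
        for alias, symbols in alias_to_symbols.items()
        if alias in primary_symbols and symbols - {alias}
    }
    ambiguous = {
        alias: sorted(symbols)
        for alias, symbols in alias_to_symbols.items()
        if len(symbols) > 1
    }
    return shadowing, ambiguous


shadowing, ambiguous = find_alias_conflicts(RECORDS)
print(shadowing)  # {'BCAM': ['BCAT2']}
print(ambiguous)  # {'SPO': ['LPO', 'SYNPR']}
```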

Input

The input for this tool is a single column of human gene identifiers of one of the currently supported types, which are the same as those described in the Output section. In addition to the data column itself, the program must be told which type of gene identifier the column contains.

Options applicable to more than a single tool are summarized in common command line options.

Output

Gene identifiers are converted to their target gene identifiers using Ensembl Biomart translation tables. To avoid a combinatorial explosion of translation tables, the tool internally converts the given source gene identifier to the Ensembl gene identifier and from there to the chosen destination gene ID. We have chosen Ensembl gene IDs as the intermediate because EBI provides an excellent and highly comprehensive collection of data relevant to our field. If a gene identifier cannot be converted, either because the source ID is unknown or because there is no matching destination gene ID, an empty string is output in that cell.
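
The two-step scheme can be illustrated with a small Python sketch. This is not the tool's actual code: the table names TO_ENSEMBL and FROM_ENSEMBL, the type keys "ensembl" and "hgnc_symbol", and the function convert are hypothetical, and the tables hold only the genes mentioned on this page. With N identifier types, routing through Ensembl needs 2N tables instead of one table per ordered pair of types.

```python
# Hypothetical, hand-filled tables; the real tool uses Ensembl Biomart exports.
SYMBOL_TO_ENSEMBL = {
    "BCAM": "ENSG00000187244",
    "BCAT2": "ENSG00000105552",
    "LPO": "ENSG00000167419",
    "SYNPR": "ENSG00000163630",
}
ENSEMBL_IDS = set(SYMBOL_TO_ENSEMBL.values())

# One table per identifier type INTO Ensembl gene IDs ...
TO_ENSEMBL = {
    "ensembl": {gene_id: gene_id for gene_id in ENSEMBL_IDS},
    "hgnc_symbol": SYMBOL_TO_ENSEMBL,
}
# ... and one table per identifier type OUT OF Ensembl gene IDs.
FROM_ENSEMBL = {
    "ensembl": {gene_id: gene_id for gene_id in ENSEMBL_IDS},
    "hgnc_symbol": {gene_id: sym for sym, gene_id in SYMBOL_TO_ENSEMBL.items()},
}


def convert(value, source_type, target_type):
    """Convert via the Ensembl gene ID; return "" if either step fails."""
    ensembl_id = TO_ENSEMBL.get(source_type, {}).get(value)
    if ensembl_id is None:
        return ""  # unknown source identifier or unknown source type
    return FROM_ENSEMBL.get(target_type, {}).get(ensembl_id, "")


print(convert("BCAM", "hgnc_symbol", "ensembl"))             # ENSG00000187244
print(convert("ENSG00000167419", "ensembl", "hgnc_symbol"))  # LPO
print(repr(convert("C11orf17", "hgnc_symbol", "ensembl")))   # '' (not in table)
```

How the real tool resolves an alias that maps to several genes is not described here, so the sketch deliberately leaves aliases out.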

The possible output gene identifiers are listed below; the same list applies to input gene IDs. In principle it is also possible to convert from and to the same gene identifier type. The names correspond to the table headers, if headers are included in the output.

  • Ensembl Gene ID: Ensembl gene identifier, starting with ENSG.
  • EntrezGene ID: NIH Entrez gene identifier, an integer value.
  • HGNC Approved Symbol: HGNC primary gene symbol, a short alphanumeric uppercase gene name.
  • HGNC Alias: Gene name alias listed by HGNC. Very redundant and very inconsistent; see the comments above.